AWS re:Invent 2014: Performance Tuning EC2 Instances

Talk for AWS re:Invent 2014 by Brendan Gregg, Netflix.

Video: https://www.youtube.com/watch?v=7Cyd22kOqWc

Description: "Netflix tunes Amazon EC2 instances for maximum performance. In this session, you learn how Netflix configures the fastest possible EC2 instances, while reducing latency outliers. This session explores the various Xen modes (e.g., HVM, PV, etc.) and how they are optimized for different workloads. Hear how Netflix chooses Linux kernel versions based on desired performance characteristics and receive a firsthand look at how they set kernel tunables, including hugepages. You also hear about Netflix’s use of SR-IOV to enable enhanced networking and their approach to observability, which can exonerate EC2 issues and direct attention back to application performance."

PDF: AWSreInvent2014_perf_tuning_EC2_nobkg.pdf

Keywords (from pdftotext):

slide 1:
    PFC306
    Brendan Gregg, Performance Engineering, Netflix
    November 12, 2014 | Las Vegas, NV
    
slide 2:
slide 3:
slide 4:
slide 5:
slide 6:
slide 7:
slide 8:
slide 9:
    EC2
    ELB
    Cassandra
    Applications
    (Services)
    Elasticsearch
    EVCache
    SES
    SQS
    
slide 10:
slide 11:
slide 12:
slide 13:
    Start
    Find best balance
    Select memory to cache working set
    
slide 14:
    ASG Cluster
    prod1
    ELB
    Canary
    ASG-v010
    ASG-v011
    Instance
    Instance
    Instance
    Instance
    Instance
    Instance
    
slide 15:
slide 16:
slide 17:
    Select instance families
    From any desired resource, see types & cost
    Select resources
    
slide 18:
    eg, 8 vCPU:
    
slide 19:
slide 20:
slide 21:
    Acceptable
    Headroom
    Unacceptable
    
slide 22:
slide 23:
slide 24:
slide 25:
slide 26:
    Cost per hour
    Services
    
slide 27:
slide 28:
slide 29:
slide 30:
slide 31:
slide 32:
slide 33:
slide 34:
slide 35:
slide 36:
    # schedtool -B PID
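`schedtool -B` switches a process to the SCHED_BATCH scheduling policy. A minimal bash sketch for checking which policy a PID is actually running under, assuming a Linux /proc filesystem where field 41 of /proc/PID/stat holds the policy number (as documented in proc(5)). The `sched_policy_of` helper is hypothetical; `chrt --batch -p 0 PID` from util-linux is shown only as a commonly available alternative to schedtool for setting the same policy.

```shell
#!/usr/bin/env bash
# sched_policy_of (hypothetical helper): print the scheduling policy of a PID.
# Reads field 41 ("policy") of /proc/PID/stat, as described in proc(5).
sched_policy_of() {
  local pid=$1 stat rest policy
  stat=$(< "/proc/$pid/stat") || return 1
  # Drop "pid (comm) " first: comm is parenthesized and may contain spaces.
  rest=${stat##*") "}
  # With fields 1 and 2 removed, overall field 41 is the 39th remaining field.
  policy=$(awk '{ print $39 }' <<< "$rest")
  case $policy in
    0) echo SCHED_OTHER ;;
    1) echo SCHED_FIFO ;;
    2) echo SCHED_RR ;;
    3) echo SCHED_BATCH ;;
    5) echo SCHED_IDLE ;;
    6) echo SCHED_DEADLINE ;;
    *) echo "UNKNOWN($policy)" ;;
  esac
}

# Example: set SCHED_BATCH with either tool, then verify.
#   schedtool -B PID          (as on the slide)
#   chrt --batch -p 0 PID     (util-linux alternative)
#   sched_policy_of PID       (should then print SCHED_BATCH)
```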
    
slide 37:
    vm.swappiness = 0
    # from 60
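Each sysctl name corresponds to a file under /proc/sys, with the dots turned into slashes, so vm.swappiness is /proc/sys/vm/swappiness. A minimal bash sketch, with hypothetical helper names, for comparing a live single-valued setting against the slide's recommendation before changing it; `sysctl -w` is the standard runtime setter.

```shell
#!/usr/bin/env bash
# sysctl_path (hypothetical): map a dotted sysctl name to its /proc/sys file.
# Caveat: names whose components contain a literal dot (for example a VLAN
# interface such as eth0.100) cannot be mapped this simply.
sysctl_path() {
  echo "/proc/sys/${1//./\/}"
}

# check_sysctl (hypothetical): report the current value against a wanted one.
# Intended for single-valued settings; multi-value files such as tcp_rmem
# are tab-separated in /proc and would not compare equal to a spaced string.
check_sysctl() {
  local name=$1 want=$2 cur
  cur=$(< "$(sysctl_path "$name")") || return 1
  if [[ $cur == "$want" ]]; then
    echo "$name = $cur (ok)"
  else
    echo "$name = $cur (want $want)"
  fi
}

# Example:
#   check_sysctl vm.swappiness 0
#   sysctl -w vm.swappiness=0      (as root, to apply at runtime)
```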
    
slide 38:
    # echo never > /sys/kernel/mm/transparent_hugepage/enabled
    # from madvise
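The transparent_hugepage/enabled file lists every mode and brackets the active one, for example `always [madvise] never`. A minimal bash sketch (the `thp_mode` helper is hypothetical) that extracts the active mode, which is one way to confirm the `echo never` took effect. Note that a write to /sys does not persist across reboots.

```shell
#!/usr/bin/env bash
# thp_mode (hypothetical): print the active (bracketed) THP mode.
# With no argument it reads the live sysfs file; with an argument it parses
# that string instead, which makes the parsing easy to test.
thp_mode() {
  local line
  if (( $# > 0 )); then
    line=$1
  else
    line=$(< /sys/kernel/mm/transparent_hugepage/enabled) || return 1
  fi
  sed -n 's/.*\[\([a-z]*\)\].*/\1/p' <<< "$line"
}

# Example:
#   echo never > /sys/kernel/mm/transparent_hugepage/enabled   (as root)
#   thp_mode      (should then print: never)
```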
    
slide 39:
    vm.dirty_ratio = 80
    # from 40
    vm.dirty_background_ratio = 5
    # from 10
    vm.dirty_expire_centisecs = 12000
    # from 3000
    mount -o defaults,noatime,discard,nobarrier …
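The slide's VM writeback values can be made persistent with a sysctl configuration file. A minimal bash sketch, assuming a procps `sysctl` that accepts a file argument to `-p`; the helper names and the 99-vm-tuning.conf file name are hypothetical. For scale: dirty_expire_centisecs is in hundredths of a second, so 12000 lets dirty data age 120 s (instead of 30 s) before it becomes eligible for writeback.

```shell
#!/usr/bin/env bash
# write_vm_dropin (hypothetical): write this slide's writeback settings to
# the sysctl configuration file named by $1. The mount options on the slide
# (noatime, discard, nobarrier) belong in /etc/fstab, not in this file.
write_vm_dropin() {
  local out=$1
  cat > "$out" <<'EOF'
vm.dirty_ratio = 80
vm.dirty_background_ratio = 5
vm.dirty_expire_centisecs = 12000
EOF
}

# centisecs_to_seconds (hypothetical): integer seconds for a centisecond value.
centisecs_to_seconds() {
  echo $(( $1 / 100 ))
}

# Example (as root), with a hypothetical file name:
#   write_vm_dropin /etc/sysctl.d/99-vm-tuning.conf
#   sysctl -p /etc/sysctl.d/99-vm-tuning.conf
```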
    
slide 40:
    /sys/block/*/queue/rq_affinity
    /sys/block/*/queue/scheduler
    /sys/block/*/queue/nr_requests
    /sys/block/*/queue/read_ahead_kb
    mdadm --chunk=64 ...
    noop
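The queue/scheduler file lists the available I/O schedulers and brackets the active one, for example `noop deadline [cfq]`. A minimal bash sketch that prints the block-layer tunables named on this slide for every device; the helper names are hypothetical, and xvdb in the comment is an assumed Xen device name. On newer multi-queue kernels the scheduler names differ (for example `none` and `mq-deadline` rather than `noop`).

```shell
#!/usr/bin/env bash
# active_choice (hypothetical): print the bracketed entry of a sysfs choice
# list such as "noop deadline [cfq]"; prints nothing if none is bracketed.
active_choice() {
  sed -n 's/.*\[\([^]]*\)\].*/\1/p' <<< "$1"
}

# show_queue_tunables (hypothetical): one line per block device with the
# queue settings from the slide. Unreadable or missing values print "-".
show_queue_tunables() {
  local q dev sched t
  for q in /sys/block/*/queue; do
    [[ -d $q ]] || continue
    dev=${q#/sys/block/}
    dev=${dev%/queue}
    sched=$(active_choice "$(cat "$q/scheduler" 2>/dev/null)")
    printf '%s scheduler=%s' "$dev" "${sched:--}"
    for t in rq_affinity nr_requests read_ahead_kb; do
      printf ' %s=%s' "$t" "$(cat "$q/$t" 2>/dev/null || echo -)"
    done
    printf '\n'
  done
}

# Example change (as root), on an assumed device xvdb:
#   echo noop > /sys/block/xvdb/queue/scheduler
```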
    
slide 41:
    net.core.somaxconn = 1000
    net.core.netdev_max_backlog = 5000
    net.core.rmem_max = 16777216
    net.core.wmem_max = 16777216
    net.ipv4.tcp_wmem = 4096 12582912 16777216
    net.ipv4.tcp_rmem = 4096 12582912 16777216
    net.ipv4.tcp_max_syn_backlog = 8096
    net.ipv4.tcp_slow_start_after_idle = 0
    net.ipv4.tcp_tw_reuse = 1
    net.ipv4.ip_local_port_range = 10240 65535
    net.ipv4.tcp_abort_on_overflow = 1
    # maybe
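Two of these settings carry arithmetic worth checking. The tcp_rmem and tcp_wmem triples are min, default and max buffer sizes in bytes, so 12582912 is a 12 MiB default and 16777216 a 16 MiB max; and ip_local_port_range = 10240 65535 provides 65535 - 10240 + 1 = 55296 local ports. A minimal bash sketch with hypothetical helper names:

```shell
#!/usr/bin/env bash
# check_triple (hypothetical): succeed if a "min default max" triple such as
# net.ipv4.tcp_rmem is ordered min <= default <= max. Spaces or tabs both
# work as separators, so the /proc file contents can be passed directly.
check_triple() {
  local min def max
  read -r min def max <<< "$1"
  (( min <= def && def <= max ))
}

# port_count (hypothetical): number of ports in an inclusive "low high" range.
port_count() {
  local low high
  read -r low high <<< "$1"
  echo $(( high - low + 1 ))
}

# bytes_to_mib (hypothetical): integer MiB for a byte count.
bytes_to_mib() {
  echo $(( $1 / 1048576 ))
}

# Example:
#   check_triple "4096 12582912 16777216" && echo ordered
#   port_count "10240 65535"     # 55296
#   sysctl -w net.ipv4.ip_local_port_range="10240 65535"   (as root)
```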
    
slide 42:
    echo tsc > /sys/devices/system/clocksource/clocksource0/current_clocksource
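Before switching, it is worth confirming that the kernel offers tsc: available_clocksource in the same directory lists the choices, and current_clocksource shows the one in use. A minimal bash sketch; the `has_clocksource` helper is hypothetical, and the list "xen tsc hpet acpi_pm" in the test is only an example of what a Xen guest may report.

```shell
#!/usr/bin/env bash
CS_DIR=/sys/devices/system/clocksource/clocksource0

# has_clocksource (hypothetical): succeed if the word $1 appears in list $2,
# or in the kernel's available_clocksource when no list is given.
has_clocksource() {
  local want=$1 list
  if (( $# > 1 )); then
    list=$2
  else
    list=$(< "$CS_DIR/available_clocksource") || return 1
  fi
  # Pad with spaces so only whole words match ("ts" must not match "tsc").
  case " $list " in
    *" $want "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example (as root): switch only if tsc is available, then confirm.
#   if has_clocksource tsc; then echo tsc > "$CS_DIR/current_clocksource"; fi
#   cat "$CS_DIR/current_clocksource"
```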
    
slide 43:
slide 44:
slide 45:
slide 46:
slide 47:
    Resource Utilization (%)
    
slide 48:
slide 49:
slide 50:
slide 51:
    Application
    System Libraries
    System Calls
    Kernel
    Devices
    
slide 52:
slide 53:
slide 54:
    $ sar -n TCP,ETCP,DEV 1
    Linux 3.2.55 (test-e4f1a80b)  08/18/2014  _x86_64_ (8 CPU)
    09:10:43 PM  IFACE  rxpck/s  txpck/s  rxkB/s   txkB/s    rxcmp/s  txcmp/s  rxmcst/s
    09:10:44 PM  eth0   …        …        4537.46  28513.24  …        …        …
    09:10:43 PM  active/s  passive/s  iseg/s  oseg/s
    09:10:44 PM  …
    09:10:43 PM  atmptf/s  estres/s  retrans/s  isegerr/s  orsts/s
    09:10:44 PM  …
    […]
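The ETCP group's retrans/s is easier to judge relative to traffic. A minimal bash sketch, assuming the common practice of expressing retransmits as a percentage of output segments (retrans/s from ETCP divided by oseg/s from TCP); the helper name is hypothetical and the numbers in the example are illustrative, not taken from the slide.

```shell
#!/usr/bin/env bash
# retrans_pct (hypothetical): retransmitted segments as a percentage of
# output segments, given sar's oseg/s ($1) and retrans/s ($2) values.
retrans_pct() {
  awk -v oseg="$1" -v retrans="$2" 'BEGIN {
    if (oseg <= 0) { print "n/a"; exit }
    printf "%.2f\n", 100 * retrans / oseg
  }'
}

# Example with illustrative numbers:
#   retrans_pct 20000 40      # 0.20
```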
    
slide 55:
slide 56:
slide 57:
slide 58:
slide 59:
    Stack frame
    Ancestry
    Mouse-over frames to quantify
    
slide 60:
    # git clone https://github.com/brendangregg/FlameGraph
    # cd FlameGraph
    # perf record -F 99 -ag -- sleep 60
    # perf script | ./stackcollapse-perf.pl | ./flamegraph.pl > perf.svg
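stackcollapse-perf.pl emits one line per unique stack in "folded" form: frames joined by semicolons, root first, then a space and the sample count. That intermediate text can be queried directly. A minimal bash sketch (the `leaf_totals` helper and the out.folded file name are hypothetical) that sums samples by leaf frame, i.e. the function that was running on-CPU, which corresponds to the exposed top edges in the flame graph.

```shell
#!/usr/bin/env bash
# leaf_totals (hypothetical): read folded stacks on stdin and print
# "samples leaf_function", largest first.
leaf_totals() {
  awk 'NF > 1 {
    n = $NF                        # sample count is the last field
    stack = $0
    sub(/ [0-9]+$/, "", stack)     # strip the count, keep frames (may contain spaces)
    k = split(stack, frames, ";")  # frames[k] is the leaf (top of stack)
    total[frames[k]] += n
  }
  END { for (leaf in total) print total[leaf], leaf }' | sort -k1,1nr -k2,2
}

# Example:
#   perf script | ./stackcollapse-perf.pl > out.folded
#   leaf_totals < out.folded | head
```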
    
slide 61:
slide 62:
    Kernel
    TCP/IP
    Broken Java stacks (missing frame pointer)
    Locks
    epoll
    Time
    Idle thread
    
slide 63:
slide 64:
slide 65:
    # ./iosnoop -ts
    Tracing block I/O. Ctrl-C to end.
    STARTs          ENDs            COMM       PID  TYPE  DEV    BLOCK  BYTES  LATms
    5982800.302061  5982800.302679  supervise  …    …     202,1  …      …      …
    5982800.302423  5982800.302842  supervise  …    …     202,1  …      …      …
    5982800.304962  5982800.305446  supervise  …    …     202,1  …      …      …
    5982800.305250  5982800.305676  supervise  …    …     202,1  …      …      …
    […]
    # ./iosnoop -h
    USAGE: iosnoop [-hQst] [-d device] [-i iotype] [-p PID] [-n name] [duration]
                     -d device       # device string (eg, "202,1)
                     -i iotype       # match type (eg, '*R*' for all reads)
                     -n name         # process name to match on I/O issue
                     -p PID          # PID to match on I/O issue
                     -Q              # include queueing time in LATms
                     -s              # include start time of I/O (s)
                     -t              # include completion time of I/O (s)
    […]
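With -s and -t, each iosnoop row carries its own start and end timestamps in seconds, so latency can be cross-checked by hand: for the first supervise row, 5982800.302679 - 5982800.302061 = 0.000618 s, or 0.618 ms, which should agree with that row's LATms. A minimal bash sketch that does this for every row of saved output; the helper name and io.txt file name are hypothetical, and the column order (STARTs, ENDs, COMM first) is taken from this slide's header.

```shell
#!/usr/bin/env bash
# lat_from_ts (hypothetical): for "iosnoop -ts" output on stdin, print COMM
# and the latency in ms computed as (ENDs - STARTs) * 1000.
# Header lines and anything not starting with two timestamps are skipped.
lat_from_ts() {
  awk '$1 ~ /^[0-9]+\.[0-9]+$/ && $2 ~ /^[0-9]+\.[0-9]+$/ {
    printf "%s %.3f\n", $3, ($2 - $1) * 1000
  }'
}

# Example:
#   ./iosnoop -ts > io.txt      (Ctrl-C to end)
#   lat_from_ts < io.txt
```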
    
slide 66:
slide 67:
    # perf record -e skb:consume_skb -ag -- sleep 10
    # perf report
    [...]
    74.42%  swapper  [kernel.kallsyms]  [k] consume_skb
            --- consume_skb
                arp_process
                arp_rcv
                __netif_receive_skb_core
                __netif_receive_skb
                netif_receive_skb
                virtnet_poll
                net_rx_action
                __do_softirq
                irq_exit
                do_IRQ
                ret_from_intr
    […]
    Summarizing stack traces for a tracepoint
    perf_events can do many things; it is hard to pick just one example
    
slide 68:
slide 69:
    ec2-guest# ./showboost
    CPU MHz     : 2500
    Turbo MHz   : 2900 (10 active)
    Turbo Ratio : 116% (10 active)
    CPU 0 summary every 5 seconds...
    TIME      C0_MCYC  C0_ACYC  UTIL  RATIO  MHz
    06:11:35  …        …        51%   116%   …
    06:11:40  …        …        50%   115%   …
    06:11:45  …        …        49%   115%   …
    [...]
    Real CPU MHz
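The RATIO column relates the two cycle counters. Assuming, as the column names suggest, that C0_MCYC counts cycles at the fixed base frequency and C0_ACYC counts actual cycles (the MPERF and APERF model-specific registers on Intel), the real clock rate is the base MHz times C0_ACYC / C0_MCYC. With the 2500 MHz base and a 116% ratio, that is 2500 x 1.16 = 2900 MHz, matching the Turbo MHz line. A minimal bash sketch; the helper name and the counter values in the example are hypothetical.

```shell
#!/usr/bin/env bash
# real_mhz (hypothetical): base MHz scaled by actual/reference cycles,
# i.e. base * C0_ACYC / C0_MCYC, rounded to whole MHz.
real_mhz() {
  awk -v base="$1" -v mcyc="$2" -v acyc="$3" 'BEGIN {
    if (mcyc <= 0) { print "n/a"; exit }
    printf "%.0f\n", base * acyc / mcyc
  }'
}

# Example with illustrative counter values giving a 116% ratio:
#   real_mhz 2500 1000 1160     # 2900
```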
    
slide 70:
slide 71:
slide 72:
    Region
    Breakdowns
    App
    Interactive
    Graph
    Metrics
    Options
    Summary Statistics
    
slide 73:
slide 74:
slide 75:
    Utilization
    Per device
    Breakdowns
    Saturation
    Errors
    
slide 76:
slide 77:
slide 78:
    http://aws.amazon.com/ec2/instance-types/
    http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instance-types.html
    http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking.html
    http://www.slideshare.net/cpwatson/cpn302-yourlinuxamioptimizationandperformance
    http://www.brendangregg.com/blog/2014-09-27/from-clouds-to-roots.html
    http://www.brendangregg.com/blog/2014-05-07/what-color-is-your-xen.html
    http://www.brendangregg.com/linuxperf.html
    http://www.slideshare.net/brendangregg/linux-performance-tools-2014
    http://www.brendangregg.com/USEmethod/use-linux.html
    http://www.brendangregg.com/blog/2014-06-12/java-flame-graphs.html
    https://github.com/brendangregg/FlameGraph
    https://github.com/brendangregg/perf-tools
    
slide 79:
slide 80:
    Talk     Time                Title
    PFC-305  Wednesday, 1:15pm   Embracing Failure: Fault Injection and Service Reliability
    BDT-403  Wednesday, 2:15pm   Next Generation Big Data Platform at Netflix
    PFC-306  Wednesday, 3:30pm   Performance Tuning EC2
    DEV-309  Wednesday, 3:30pm   From Asgard to Zuul, How Netflix’s proven Open Source Tools can accelerate and scale your services
    ARC-317  Wednesday, 4:30pm   Maintaining a Resilient Front-Door at Massive Scale
    PFC-304  Wednesday, 4:30pm   Effective Inter-process Communications in the Cloud: The Pros and Cons of Micro Services Architectures
    ENT-209  Wednesday, 4:30pm   Cloud Migration, Dev-Ops and Distributed Systems
    APP-310  Friday, 9:00am      Scheduling using Apache Mesos in the Cloud
    
slide 81: