
USENIX ATC 2017: Performance Superpowers with Enhanced BPF

Video: https://www.youtube.com/watch?v=oc9000dM9-k

Talk for USENIX ATC 2017 by Brendan Gregg.

Description: "The Berkeley Packet Filter (BPF) in Linux has been enhanced in very recent versions to do much more than just filter packets, and has become a hot area of operating systems innovation, with much more yet to be discovered. BPF is a sandboxed virtual machine that runs user-level defined programs in kernel context, and is part of many kernels. The Linux enhancements allow it to run custom programs on other events, including kernel- and user-level dynamic tracing (kprobes and uprobes), static tracing (tracepoints), and hardware events. This is finding uses for the generation of new performance analysis tools, network acceleration technologies, and security intrusion detection systems.

This talk will explain the BPF enhancements, then discuss the new performance observability tools that are in use and being created, especially from the BPF compiler collection (bcc) open source project. These tools provide new insights for file system and storage performance, CPU scheduler performance, TCP performance, and much more. This is a major turning point for Linux systems engineering, as custom advanced performance instrumentation can be used safely in production environments, powering a new generation of tools and visualizations.

Because these BPF enhancements are only in very recent Linux (such as Linux 4.9), most companies are not yet running new enough kernels to be exploring BPF yet. This will change in the next year or two, as companies including Netflix upgrade their kernels. This talk will give you a head start on this growing technology, and also discuss areas of future work and unsolved problems."


PDF: USENIX_ATC2017_BPF_superpowers.pdf

Keywords (from pdftotext):

slide 1:
    2017 USENIX Annual Technical Conference
    Performance Analysis Superpowers with Linux eBPF
    Brendan Gregg
    Senior Performance Architect
    Jul 2017
    
slide 2:
slide 3:
    Efficiently trace TCP sessions with PID, bytes, and duration using tcplife
    # /usr/share/bcc/tools/tcplife
    PID    COMM        LADDR       LPORT  RADDR            RPORT  TX_KB  RX_KB  MS
    2509   java        …           8078   100.82.130.159   …      …      0      5.44
    2509   java        …           8078   100.82.78.215    …      …      0      135.32
    2509   java        …           60778  100.82.207.252   …      …      13     15126.87
    2509   java        …           38884  100.82.208.178   …      …      0      15568.25
    2509   java        …           4243   127.0.0.1        …      …      0      0.61
    2509   java        …           42166  127.0.0.1        …      …      0      0.67
    12030  upload-mes  127.0.0.1   34020  127.0.0.1        …      …      0      3.38
    2509   java        …           8078   127.0.0.1        …      …      11     3.41
    12030  upload-mes  127.0.0.1   21196  127.0.0.1        …      …      0      12.61
    3964   mesos-slav  127.0.0.1   7101   127.0.0.1        …      …      0      12.64
    12021  upload-sys  127.0.0.1   34022  127.0.0.1        …      …      0      15.28
    2509   java        …           8078   127.0.0.1        …      …      372    15.31
    2235   dockerd     …           13730  100.82.136.233   …      …      4      18.50
    2235   dockerd     …           34314  100.82.64.53     …      …      8      56.73
    12068  titus-reap  127.0.0.1   46476  127.0.0.1        …      …      0      1.25
    [...]
    
    
slide 4:
    bcc/BPF tools
    
slide 5:
    Agenda
    1. eBPF & bcc
    2. bcc/BPF CLI Tools
    3. bcc/BPF Visualizations
    
slide 6:
    Take aways
    1. Understand Linux tracing components
    2. Understand the role and state of enhanced BPF
    3. Discover opportunities for future development
    
slide 7:
slide 8:
    Who at Netflix will use BPF?
    
slide 9:
    Introducing enhanced BPF for tracing: kernel-level software
    BPF
    
slide 10:
    Ye Olde BPF
    Berkeley Packet Filter
    # tcpdump host 127.0.0.1 and port 22 -d
    (000) ldh   [12]
    (001) jeq   #0x800        jt 2   jf 18
    (002) ld    [26]
    (003) jeq   #0x7f000001   jt 6   jf 4
    (004) ld    [30]
    (005) jeq   #0x7f000001   jt 6   jf 18
    (006) ldb   [23]
    (007) jeq   #0x84         jt 10  jf 8
    (008) jeq   #0x6          jt 10  jf 9
    (009) jeq   #0x11         jt 10  jf 18
    (010) ldh   [20]
    (011) jset  #0x1fff       jt 18  jf 12
    (012) ldxb  4*([14]&0xf)
    (013) ldh   [x + 14]
    [...]
    Optimizes packet filter performance
    2 x 32-bit registers & scratch memory
    User-defined bytecode executed by an in-kernel sandboxed virtual machine
    Steven McCanne and Van Jacobson, 1993
    
slide 11:
    Enhanced BPF
    aka eBPF or just "BPF"
    10 x 64-bit registers
    maps (hashes)
    actions
    Alexei Starovoitov, 2014+
    
slide 12:
    BPF for Tracing, Internals
    Observability Program
    BPF
    bytecode
    BPF
    program
    event config
    output
    per-event
    data
    statistics
    Kernel
    load
    verifier
    static tracing
    tracepoints
    attach
    dynamic tracing
    BPF
    kprobes
    uprobes
    async
    copy
    sampling, PMCs
    maps
    perf_events
    Enhanced BPF is also now used for SDNs, DDOS mitigation, intrusion detection, container security, …
    
slide 13:
    Dynamic Tracing
    
slide 14:
    1999: Kerninst
    http://www.paradyn.org/html/kerninst.html
    
slide 15:
    Event Tracing Efficiency
    E.g., tracing TCP retransmits
    Kernel
    Old way: packet capture
    tcpdump
    Analyzer
    1. read
    2. dump
    buffer
    1. read
    2. process
    3. print
    file system
    send
    receive
    disks
    New way: dynamic tracing
    Tracer
    1. configure
    2. read
    tcp_retransmit_skb()
    
slide 16:
    Linux Events & BPF Support
    (version BPF support arrived)
    kprobes: Linux 4.1
    uprobes: Linux 4.3
    BPF output: Linux 4.4
    BPF stacks: Linux 4.6
    tracepoints: Linux 4.7
    sampling, PMCs: Linux 4.9
    
slide 17:
    A Linux Tracing Timeline
    1990’s: Static tracers, prototype dynamic tracers
    2000: LTT + DProbes (dynamic tracing; not integrated)
    2004: kprobes (2.6.9)
    2005: DTrace (not Linux), SystemTap (out-of-tree)
    2008: ftrace (2.6.27)
    2009: perf_events (2.6.31)
    2009: tracepoints (2.6.32)
    2010-2016: ftrace & perf_events enhancements
    2012: uprobes (3.5)
    2014-2017: enhanced BPF patches: supporting tracing events
    2016-2017: ftrace hist triggers
    also: LTTng, ktap, sysdig, ...
    
slide 18:
    Introducing BPF Compiler Collection: user-level front-end
    BCC
    
slide 19:
    bcc
    • BPF Compiler Collection
    Tracing layers:
    – https://github.com/iovisor/bcc
    – Lead developer: Brenden Blanco
    bcc tool
    • Includes tracing tools
    • Provides BPF front-ends:
    Python
    Lua
    C++
    C helper libraries
    golang (gobpf)
    bcc tool
    bcc
    Python
    user
    kernel
    lua
    front-ends
    Kernel
    Events
    BPF
    
slide 20:
    Raw BPF
    samples/bpf/sock_example.c
    87 lines truncated
    
slide 21:
    C/BPF
    samples/bpf/tracex1_kern.c
    58 lines truncated
    
slide 22:
    bcc/BPF (C & Python)
    bcc examples/tracing/bitehist.py
    entire program
    
slide 23:
    ply/BPF
    hbps://github.com/iovisor/ply/blob/master/README.md
    entire program
    
slide 24:
    The Tracing Landscape, Jul 2017
    (less brutal)
    (my opinion)
    dtrace4L.
    Ease of use
    sysdig
    (alpha)
    (brutal)
    ktap
    (many)
    perf
    LTTng
    recent changes
    ply/BPF
    (hist triggers)
    ftrace
    (mature)
    stap
    bcc/BPF
    C/BPF
    Stage of
    Development
    Raw BPF
    Scope & Capability
    
slide 25:
    Performance analysis
    BCC/BPF CLI TOOLS
    
slide 26:
    Pre-BPF: Linux Perf Analysis in 60s
    1. uptime
    2. dmesg -T | tail
    3. vmstat 1
    4. mpstat -P ALL 1
    5. pidstat 1
    6. iostat -xz 1
    7. free -m
    8. sar -n DEV 1
    9. sar -n TCP,ETCP 1
    10. top
    http://techblog.netflix.com/2015/11/linux-performance-analysis-in-60s.html
    
slide 27:
    bcc Installation
    • https://github.com/iovisor/bcc/blob/master/INSTALL.md
    • eg, Ubuntu Xenial:
    # echo "deb [trusted=yes] https://repo.iovisor.org/apt/xenial xenial-nightly main" |\
    sudo tee /etc/apt/sources.list.d/iovisor.list
    # sudo apt-get update
    # sudo apt-get install bcc-tools
    – Also available as an Ubuntu snap
    – Ubuntu 16.04 is good, 16.10 better: more tools work
    • Installs many tools
    – In /usr/share/bcc/tools, and …/tools/old for older kernels
    
slide 28:
    bcc General Performance Checklist
    1. execsnoop
    2. opensnoop
    3. ext4slower (…)
    4. biolatency
    5. biosnoop
    6. cachestat
    7. tcpconnect
    8. tcpaccept
    9. tcpretrans
    10. gethostlatency
    11. runqlat
    12. profile
    
slide 29:
    Discover short-lived process issues using execsnoop
    # execsnoop -t
    TIME(s)  PCOMM       PID   PPID  RET ARGS
    …        dirname     …     …       0 /usr/bin/dirname /apps/tomcat/bin/catalina.sh
    …        run         …     …       0 ./run
    …        run         …     …      -2 /command/bash
    …        run         …     …      -2 /usr/local/bin/bash
    …        run         …     …      -2 /usr/local/sbin/bash
    …        bash        …     …       0 /bin/bash
    …        svstat      …     …       0 /command/svstat /service/nflx-httpd
    …        perl        …     …       0 /usr/bin/perl -e $l=<>;$l=~/(\d+) sec/;print $1||0;
    …        ps          …     …       0 /bin/ps --ppid 1 -o pid,cmd,args
    …        grep        …     …       0 /bin/grep org.apache.catalina
    …        sed         …     …       0 /bin/sed s/^ *//;
    …        cut         …     …       0 /usr/bin/cut -d -f 1
    …        xargs       …     …       0 /usr/bin/xargs
    …        xargs       …     …      -2 /command/echo
    …        xargs       …     …      -2 /usr/local/bin/echo
    …        xargs       …     …      -2 /usr/local/sbin/echo
    …        echo        …     …       0 /bin/echo
    [...]
    Efficient: only traces exec()
    
slide 30:
    
slide 31:
    Exonerate or confirm storage latency issues and outliers with ext4slower
    # /usr/share/bcc/tools/ext4slower 1
    Tracing ext4 operations slower than 1 ms
    TIME      COMM            PID    T BYTES  OFF_KB  LAT(ms) FILENAME
    17:31:42  postdrop        15523  S 0      …        2.32   5630D406E4
    17:31:42  cleanup         15524  S 0      …        1.89   57BB7406EC
    17:32:09  titus-log-ship  19735  S 0      …        1.94   slurper_checkpoint.db
    17:35:37  dhclient        …      S 0      …        3.32   dhclient.eth0.leases
    17:35:39  systemd-journa  504    S 0      …       26.62   system.journal
    17:35:39  systemd-journa  504    S 0      …        1.56   system.journal
    17:35:39  systemd-journa  504    S 0      …        1.73   system.journal
    17:35:45  postdrop        16187  S 0      …        2.41   C0369406E4
    17:35:45  cleanup         16188  S 0      …        6.52   C1B90406EC
    […]
    Tracing at the file system is a more reliable and complete indicator than measuring disk I/O latency
    Also: btrfsslower, xfsslower, zfsslower
    
slide 32:
    
slide 33:
    Identify multimodal disk I/O latency and outliers with biolatency
    # biolatency -mT 10
    Tracing block device I/O... Hit Ctrl-C to end.
    19:19:04
         msecs           : count  distribution
           0 -> 1        : 238    |*********                               |
           2 -> 3        : 424    |*****************                       |
           4 -> 7        : 834    |*********************************       |
           8 -> 15       : 506    |********************                    |
          16 -> 31       : 986    |****************************************|
          32 -> 63       : 97     |***                                     |
          64 -> 127      : 7      |                                        |
         128 -> 255      : 27     |*                                       |
    19:19:14
         msecs           : count  distribution
           0 -> 1        : 427    |*******************                     |
           2 -> 3        : 424    |******************                      |
    […]
    The "count" column is summarized in-kernel
    Average latency (iostat/sar) may not be representative with multiple modes or outliers
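biolatency can print this distribution cheaply because the summarization happens in kernel context: each measured latency is folded into a power-of-two ("log2") bucket in a BPF map, and only the bucket counts are copied to user space. A minimal Python sketch of that bucketing (illustrative only; in the real tool the aggregation is done in-kernel via a BPF histogram map and a helper similar to bpf_log2l(), whose exact edge behavior may differ):

```python
from collections import Counter

def log2_bucket(v: int) -> int:
    # Power-of-two bucket index: 0 -> 0, 1 -> 1, 2..3 -> 2, 4..7 -> 3, ...
    return v.bit_length()

def summarize(latencies_ms):
    # Aggregate raw latencies into bucket counts: these counts are the
    # only data a tool like biolatency needs to copy out of the kernel.
    hist = Counter(log2_bucket(ms) for ms in latencies_ms)
    rows = []
    for b in range(max(hist) + 1):
        lo = 0 if b == 0 else 1 << (b - 1)
        hi = 0 if b == 0 else (1 << b) - 1
        rows.append((lo, hi, hist.get(b, 0)))
    return rows

for lo, hi, n in summarize([0, 1, 2, 5, 5, 12, 40]):
    print(f"{lo:>4} -> {hi:<4}: {n}")
```

However many I/O events occur, the per-interval data transfer is fixed at one count per bucket, which is why average-latency tools miss the modes that this histogram preserves.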
    
slide 34:
    
slide 35:
    Efficiently trace TCP sessions with PID, bytes, and duration using tcplife
    # /usr/share/bcc/tools/tcplife
    PID    COMM        LADDR       LPORT  RADDR            RPORT  TX_KB  RX_KB  MS
    2509   java        …           8078   100.82.130.159   …      …      0      5.44
    2509   java        …           8078   100.82.78.215    …      …      0      135.32
    2509   java        …           60778  100.82.207.252   …      …      13     15126.87
    2509   java        …           38884  100.82.208.178   …      …      0      15568.25
    2509   java        …           4243   127.0.0.1        …      …      0      0.61
    2509   java        …           42166  127.0.0.1        …      …      0      0.67
    12030  upload-mes  127.0.0.1   34020  127.0.0.1        …      …      0      3.38
    2509   java        …           8078   127.0.0.1        …      …      11     3.41
    12030  upload-mes  127.0.0.1   21196  127.0.0.1        …      …      0      12.61
    3964   mesos-slav  127.0.0.1   7101   127.0.0.1        …      …      0      12.64
    12021  upload-sys  127.0.0.1   34022  127.0.0.1        …      …      0      15.28
    2509   java        …           8078   127.0.0.1        …      …      372    15.31
    2235   dockerd     …           13730  100.82.136.233   …      …      4      18.50
    2235   dockerd     …           34314  100.82.64.53     …      …      8      56.73
    [...]
    Dynamic tracing of TCP set state only; does not trace send/receive
    Also see: tcpconnect, tcpaccept, tcpretrans
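tcplife is efficient because it instruments TCP state transitions rather than every send/receive: it stamps a start time when a session becomes established and emits one record when the session closes. That bookkeeping can be sketched in plain Python (hypothetical sock_id keys and state-name strings; the real tool keys a BPF map by socket and reacts to kernel state-change events):

```python
import time

# Stand-in for the in-kernel BPF map: socket id -> session start time.
sessions = {}

def on_state_change(sock_id, new_state, now=None):
    """Mimic tcplife's logic: stamp on ESTABLISHED, emit on CLOSE.
    Returns a (sock_id, duration_ms) record when a session ends."""
    now = time.monotonic() if now is None else now
    if new_state == "ESTABLISHED":
        sessions[sock_id] = now            # one map write per session
        return None
    if new_state == "CLOSE" and sock_id in sessions:
        start = sessions.pop(sock_id)      # one map delete per session
        return (sock_id, (now - start) * 1000.0)
    return None
```

The cost is two map operations per TCP session, independent of how many packets or bytes the session carries.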
    
slide 36:
    
slide 37:
    Identify DNS latency issues system wide with gethostlatency
    # /usr/share/bcc/tools/gethostlatency
    TIME      PID    COMM         LATms  HOST
    18:56:36  5055   mesos-slave  0.01   100.82.166.217
    18:56:40  5590   java         3.53   ec2-…-79.compute-1.amazonaws.com
    18:56:51  5055   mesos-slave  0.01   100.82.166.217
    18:56:53  30166  ncat         0.21   localhost
    18:56:56  6661   java         2.19   atlas-alert-….prod.netflix.net
    18:56:59  5589   java         1.50   ec2-…-207.compute-1.amazonaws.com
    18:57:03  5370   java         0.04   localhost
    18:57:03  30259  sudo         0.07   titusagent-mainvpc-m…3465
    18:57:06  5055   mesos-slave  0.01   100.82.166.217
    18:57:10  5590   java         3.10   ec2-…-79.compute-1.amazonaws.com
    18:57:21  5055   mesos-slave  0.01   100.82.166.217
    18:57:29  5589   java         52.36  ec2-…-207.compute-1.amazonaws.com
    18:57:36  5055   mesos-slave  0.01   100.82.166.217
    18:57:40  5590   java         1.83   ec2-…-79.compute-1.amazonaws.com
    18:57:51  5055   mesos-slave  0.01   100.82.166.217
    […]
    Instruments using user-level dynamic tracing of getaddrinfo(), gethostbyname(), etc.
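gethostlatency times the gap between entry and return of the resolver library calls. As a rough userspace analog (a sketch only; the real tool uses uprobes/uretprobes to observe libc calls in other processes, with no code changes in the target), a wrapper that records per-call latency for a hypothetical resolver function:

```python
import time

records = []  # (name, latency_ms) pairs, like gethostlatency's output rows

def timed(fn):
    # Wrap fn so each call is timed: the decorator plays the role of the
    # entry probe (t0) paired with the return probe (t1).
    def wrapper(*args, **kwargs):
        t0 = time.monotonic()
        result = fn(*args, **kwargs)
        records.append((fn.__name__, (time.monotonic() - t0) * 1000.0))
        return result
    return wrapper

@timed
def fake_gethostbyname(host):
    # Stand-in for real resolver work; the actual tool traces libc's
    # gethostbyname()/getaddrinfo() system wide.
    time.sleep(0.01)
    return "127.0.0.1"

fake_gethostbyname("localhost")
name, ms = records[0]
print(f"{name}: {ms:.2f} ms")
```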
    
slide 38:
    
slide 39:
    Examine CPU scheduler run queue latency as a histogram with runqlat
    # /usr/share/bcc/tools/runqlat 10
    Tracing run queue latency... Hit Ctrl-C to end.
         usecs           : count  distribution
           0 -> 1        : 2810   |*                                       |
           2 -> 3        : 5248   |**                                      |
           4 -> 7        : 12369  |******                                  |
           8 -> 15       : 71312  |****************************************|
          16 -> 31       : 55705  |*******************************         |
          32 -> 63       : 11775  |******                                  |
          64 -> 127      : 6230   |***                                     |
         128 -> 255      : 2758   |*                                       |
         256 -> 511      : 549    |                                        |
         512 -> 1023     : 46     |                                        |
        1024 -> 2047     : 11     |                                        |
        2048 -> 4095     : 4      |                                        |
        4096 -> 8191     : 5      |                                        |
    […]
    As efficient as possible: scheduler calls can become frequent
    
slide 40:
    
slide 41:
    Advanced Analysis
    • Find/draw a functional diagram
    • Apply performance methods
    http://www.brendangregg.com/methodology.html
    1. Workload Characterization
    2. Latency Analysis
    3. USE Method
    • Start with the Q's, then find the A's
    • Use multi-tools:
    – funccount, trace, argdist, stackcount
    e.g., storage I/O subsystem:
    
slide 42:
    Construct programmatic one-liners with trace
    e.g. reads over 20000 bytes:
    # trace 'sys_read (arg3 > 20000) "read %d bytes", arg3'
    TIME      PID   COMM  FUNC
    05:18:23  4490  …     sys_read  read 1048576 bytes
    05:18:23  4490  …     sys_read  read 1048576 bytes
    05:18:23  4490  …     sys_read  read 1048576 bytes
    # trace -h
    [...]
    trace -K blk_account_io_start
            Trace this kernel function, and print info with a kernel stack trace
    trace 'do_sys_open "%s", arg2'
            Trace the open syscall and print the filename being opened
    trace 'sys_read (arg3 > 20000) "read %d bytes", arg3'
            Trace the read syscall and print a message for reads > 20000 bytes
    trace r::do_sys_return
            Trace the return from the open syscall
    trace 'c:open (arg2 == 42) "%s %d", arg1, arg2'
            Trace the open() call from libc only if the flags (arg2) argument is 42
    [...]
    argdist by Sasha Goldshtein
    
slide 43:
    Create in-kernel summaries with argdist
    e.g. histogram of tcp_cleanup_rbuf() copied:
    # argdist -H 'p::tcp_cleanup_rbuf(struct sock *sk, int copied):int:copied'
    [15:34:45]
         copied          : count  distribution
           0 -> 1        : 15088  |**********************************
           2 -> 3        : 0      |
           4 -> 7        : 0      |
           8 -> 15       : 0      |
          16 -> 31       : 0      |
          32 -> 63       : 0      |
          64 -> 127      : 4786   |***********
         128 -> 255      : 1      |
         256 -> 511      : 1      |
         512 -> 1023     : 4      |
        1024 -> 2047     : 11     |
        2048 -> 4095     : 5      |
        4096 -> 8191     : 27     |
        8192 -> 16383    : 105    |
       16384 -> 32767    : 0      |
    argdist by Sasha Goldshtein
    
slide 44:
    Coming to a GUI near you
    BCC/BPF VISUALIZATIONS
    
slide 45:
    BPF metrics and analysis can be automated in GUIs
    Eg, Netflix Vector (self-service UI):
    Flame Graphs
    Heat Maps
    Tracing Reports
    Should be open sourced; you may also build/buy your own
    
slide 46:
    Latency heatmaps show histograms over time
    
slide 47:
    Efficient CPU and off-CPU flame graphs by counting stacks in kernel context
    CPU
    Off-CPU
    
slide 48:
    Advanced off-CPU analysis: BPF can merge the blocking stack with the waker
    stack in-kernel
    Waker task
    Waker stack
    Stack
    Direction
    Wokeup
    Blocked stack
    Blocked task
    
slide 49:
    BPF
    FUTURE WORK
    
slide 50:
    BCC Improvements
    Challenges:
    • Initialize all variables
    • Extra bpf_probe_read()s
    • BPF_PERF_OUTPUT()
    • Verifier errors
    
slide 51:
    Higher-level Language
    • bcc's Python/C interface is ok, but verbose
    • Alternate higher-level language front end?
    – New front-ends can use existing libbcc, and can be added as part of bcc itself
    – We have a problem in search of a new language (instead of the other way around)
    
slide 52:
    ply
    • A new BPF-based language and tracer for Linux
    – Created by Tobias Waldekranz
    – https://github.com/iovisor/ply https://wkz.github.io/ply/
    • High-level language
    – Simple one-liners
    – Short scripts
    • In development (?)
    – kprobes and tracepoints only, uprobes/perf_events not yet
    – Successful so far as a proof of concept
    – Not production tested yet (bcc is)
    
slide 53:
    File opens can be traced using a short ply one-liner
    # ply -c 'kprobe:do_sys_open { printf("opened: %s\n", mem(arg(1), "128s")); }'
    1 probe active
    opened: /sys/kernel/debug/tracing/events/enable
    opened: /etc/ld.so.cache
    opened: /lib/x86_64-linux-gnu/libselinux.so.1
    opened: /lib/x86_64-linux-gnu/libc.so.6
    opened: /lib/x86_64-linux-gnu/libpcre.so.3
    opened: /lib/x86_64-linux-gnu/libdl.so.2
    opened: /lib/x86_64-linux-gnu/libpthread.so.0
    opened: /proc/filesystems
    opened: /usr/lib/locale/locale-archive
    opened: .
    [...]
    
slide 54:
    ply programs are concise, such as measuring read latency
    # ply -A -c 'kprobe:SyS_read { @start[tid()] = nsecs(); }
    kretprobe:SyS_read /@start[tid()]/ { @ns.quantize(nsecs() - @start[tid()]);
    @start[tid()] = nil; }'
    2 probes active
    ^Cde-activating probes
    [...]
    @ns:
    [ 512,   1k)    3 |########
    [  1k,   2k)    7 |###################
    [  2k,   4k)   12 |################################|
    [  4k,   8k)    3 |########
    [  8k,  16k)    2 |#####
    [ 16k,  32k)    0 |
    [ 32k,  64k)    0 |
    [ 64k, 128k)    3 |########
    [128k, 256k)    1 |###
    [256k, 512k)    1 |###
    [512k,   1M)    2 |#####
    [...]
    
slide 55:
    New Tooling/Metrics
    
slide 56:
    New Visualizations
    
slide 57:
    Case Studies
    Use it
    Solve something
    Write about it
    Talk about it
    
slide 58:
    Take aways
    1. Understand Linux tracing components
    2. Understand the role and state of enhanced BPF
    3. Discover opportunities for future development
    Please contribute:
    - https://github.com/iovisor/bcc
    - https://github.com/iovisor/ply
    BPF Tracing in Linux
    • 3.19: sockets
    • 3.19: maps
    • 4.1: kprobes
    • 4.3: uprobes
    • 4.4: BPF output
    • 4.6: stacks
    • 4.7: tracepoints
    • 4.9: profiling
    • 4.9: PMCs
    
slide 59:
    Links & References
    iovisor bcc:
    - https://github.com/iovisor/bcc https://github.com/iovisor/bcc/tree/master/docs
    - http://www.brendangregg.com/blog/ (search for "bcc")
    - http://www.brendangregg.com/ebpf.html#bcc
    - http://blogs.microsoft.co.il/sasha/2016/02/14/two-new-ebpf-tools-memleak-and-argdist/
    - On designing tracing tools: https://www.youtube.com/watch?v=uibLwoVKjec
    bcc tutorial:
    - https://github.com/iovisor/bcc/blob/master/INSTALL.md
    - …/docs/tutorial.md
    - …/docs/tutorial_bcc_python_developer.md
    - …/docs/reference_guide.md
    - .../CONTRIBUTING-SCRIPTS.md
    ply: https://github.com/iovisor/ply
    BPF:
    - https://www.kernel.org/doc/Documentation/networking/filter.txt
    - https://github.com/iovisor/bpf-docs
    - https://suchakra.wordpress.com/tag/bpf/
    Flame Graphs:
    - http://www.brendangregg.com/flamegraphs.html
    - http://www.brendangregg.com/blog/2016-01-20/ebpf-offcpu-flame-graph.html
    - http://www.brendangregg.com/blog/2016-02-01/linux-wakeup-offwake-profiling.html
    Netflix Tech Blog on Vector:
    - http://techblog.netflix.com/2015/04/introducing-vector-netflixs-on-host.html
    Linux Performance: http://www.brendangregg.com/linuxperf.html
    
slide 60:
    2017 USENIX Annual Technical Conference
    Thank You
    – Questions?
    – iovisor bcc: https://github.com/iovisor/bcc
    – http://www.brendangregg.com
    – http://slideshare.net/brendangregg
    – bgregg@netflix.com
    – @brendangregg
    Thanks to Alexei Starovoitov (Facebook), Brenden Blanco (PLUMgrid/VMware), Sasha Goldshtein (Sela),
    Daniel Borkmann (Cisco), Wang Nan (Huawei), and other BPF and bcc contributors!