AWS re:Invent 2019: BPF Performance Analysis at Netflix

Talk by Brendan Gregg at AWS re:Invent 2019.

Video: https://www.youtube.com/watch?v=16slh29iN1g

Description: "Extended BPF (eBPF) is an open source Linux technology that powers a whole new class of software: mini programs that run on events. Among its many uses, BPF can be used to create powerful performance analysis tools capable of analyzing everything: CPUs, memory, disks, file systems, networking, languages, applications, and more. In this session, Netflix's Brendan Gregg tours BPF tracing capabilities, including many new open source performance analysis tools he developed for his new book "BPF Performance Tools: Linux System and Application Observability." The talk includes examples of using these tools in the Amazon EC2 cloud."


PDF: reInvent2019_BPF_Performance_Analysis.pdf

Keywords (from pdftotext):

slide 1:
slide 2:
    OPN303
    BPF Performance Analysis at Netflix
    Brendan Gregg
    Senior Performance Architect
    Netflix
    
slide 3:
    Superpowers Demo
    
slide 4:
    Agenda
    Why BPF Is Changing Linux
    BPF Internals
    Performance Analysis
    Tool Development
    
slide 5:
    Why BPF Is Changing Linux
    
slide 6:
    50 years, one (dominant) OS model
    Applications
    System Calls
    Kernel
    Hardware
    
slide 7:
    Origins: Multics
    1960s
    Applications
    Supervisor
    Hardware
    Privilege
    Ring 0
    Ring 1
    Ring 2
    
slide 8:
    Modern Linux: a new OS model
    User-mode
    Applications
    Kernel-mode
    Applications (BPF)
    System Calls
    BPF Helper Calls
    Kernel
    Hardware
    
slide 9:
    50 years, one process state model
    States: On-CPU (User, Kernel), Runnable, Off-CPU (Wait, Block, Sleep, Idle), Swapping
    Transitions: schedule; preemption or time quantum expired; resource I/O;
    acquire lock / acquired; sleep; wait for work; wakeup; work arrives;
    swap out / swap in
    Linux groups most sleep states
    
slide 10:
    BPF uses a new program state model
    States: Loaded, Enabled, On-CPU, Off-CPU (kernel BPF helpers), Spinning (spin lock)
    Transitions: attach; event fires; program ended
    
slide 11:
    Netconf 2018
    Alexei Starovoitov
    
slide 12:
    Kernel Recipes 2019, Alexei Starovoitov
    ~40 active BPF programs on every Facebook server
    
slide 13:
    >150K Amazon EC2 server instances
    ~34% US Internet traffic at night
    >130M subscribers
    ~14 active BPF programs on every instance (so far)
    
slide 14:
    Modern Linux: Event-based Applications
    Kernel-mode Applications (BPF); User-mode Applications
    U.E.; Scheduler; Kernel; Kernel Events; Hardware Events (incl. clock)
    
slide 15:
    Modern Linux is becoming microkernel-ish
    Kernel-mode
    Services & Drivers
    User-mode
    Applications
    BPF
    BPF
    BPF
    Smaller
    Kernel
    Hardware
    The word “microkernel” has already been invoked by Jonathan Corbet, Thomas Graf, Greg Kroah-Hartman, ...
    
slide 16:
    BPF Internals
    
slide 17:
    BPF 1992: Berkeley Packet Filter
    # tcpdump -d host 127.0.0.1 and port 80
    (000) ldh      [12]
    (001) jeq      #0x800           jt 2    jf 18
    (002) ld       [26]
    (003) jeq      #0x7f000001      jt 6    jf 4
    (004) ld       [30]
    (005) jeq      #0x7f000001      jt 6    jf 18
    (006) ldb      [23]
    (007) jeq      #0x84            jt 10   jf 8
    (008) jeq      #0x6             jt 10   jf 9
    (009) jeq      #0x11            jt 10   jf 18
    (010) ldh      [20]
    (011) jset     #0x1fff          jt 18   jf 12
    (012) ldxb     4*([14]&0xf)
    (013) ldh      [x + 14]
    (014) jeq      #0x50            jt 17   jf 15
    (015) ldh      [x + 16]
    (016) jeq      #0x50            jt 17   jf 18
    (017) ret      #262144
    (018) ret      #0
    A limited virtual machine for efficient packet filters
    
slide 18:
    BPF 2019: aka extended BPF
    bpftrace
    XDP
    bpfconf
    BPF Microconference
    & Facebook Katran, Google KRSI, Netflix flowsrus,
    and many more
    
slide 19:
    BPF 2019
    User-defined BPF programs: SDN configuration, DDoS mitigation, intrusion
    detection, container security, observability, firewalls, device drivers
    Kernel runtime: verifier, BPF, BPF actions
    Event targets: sockets, kprobes, uprobes, tracepoints, perf_events
slide 20:
    BPF is open source and in the Linux kernel
    (you’re all getting it)
    BPF is also now a technology name,
    and no longer an acronym
    
slide 21:
    BPF Internals
    BPF Instructions -> Verifier -> Interpreter / JIT Compiler -> Machine Code -> Execution
    Registers; Map Storage (Mbytes)
    Events -> BPF Context; BPF Helpers -> Rest of Kernel
    
slide 22:
    Is BPF Turing complete?
    
slide 23:
    BPF: a new type of software
             Execution  User-     Compile     Security       Failure   Resource
             model      defined                              mode      access
    User     task       yes       any         user-based     abort     syscall, fault
    Kernel   task       no        static      none           panic     direct
    BPF      event      yes       JIT, CO-RE  verified, JIT  error     restricted
                                                             message   helpers
    
slide 24:
    Performance Analysis
    
slide 25:
    BPF enables a new class of
    custom, efficient, and production-safe
    performance analysis tools
    
slide 26:
    BPF Performance Tools
    
slide 27:
    Tool Examples by Subsystem
    1. CPUs (scheduling)
    2. Memory
    3. Disks
    4. File Systems
    5. Networking
    6. Languages
    7. Applications
    8. Kernel
    9. Hypervisors
    10. Containers
    
slide 28:
    Tool Extensions & Sources
    .py: BCC (Python)
    .bt: bpftrace
    (some tools exist for both)
    https://github.com/iovisor/bcc
    https://github.com/iovisor/bpftrace
    https://github.com/brendangregg/bpf-perf-tools-book
    
slide 29:
    CPUs: execsnoop
    New process trace
    # execsnoop.py -T
    TIME(s) PCOMM      PID   PPID  RET ARGS
    ...     run        ...   ...     0 ./run
    ...     bash       ...   ...     0 /bin/bash
    ...     svstat     ...   ...     0 /command/svstat /service/nflx-httpd
    ...     perl       ...   ...     0 /usr/bin/perl -e $l=<>;$l=~/(\d+) sec/;pr...
    ...     ps         ...   ...     0 /bin/ps --ppid 1 -o pid,cmd,args
    ...     grep       ...   ...     0 /bin/grep org.apache.catalina
    ...     sed        ...   ...     0 /bin/sed s/^ *//;
    ...     xargs      ...   ...     0 /usr/bin/xargs
    ...     cut        ...   ...     0 /usr/bin/cut -d -f 1
    ...     echo       ...   ...     0 /bin/echo
    ...     mkdir      ...   ...     0 /bin/mkdir -v -p /data/tomcat
    [...]
    ...     run        ...   ...     0 ./run
    ...     bash       ...   ...     0 /bin/bash
    ...     svstat     ...   ...     0 /command/svstat /service/nflx-httpd
    ...     perl       ...   ...     0 /usr/bin/perl -e $l=<>;$l=~/(\d+) sec/;pr...
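    The same event can be traced with a bpftrace one-liner (a minimal sketch;
    execsnoop.py adds the timing, PID, and return-value columns):

    # Trace new processes via the execve tracepoint, printing the full argv
    bpftrace -e 'tracepoint:syscalls:sys_enter_execve { join(args->argv); }'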
    
slide 30:
    CPUs: runqlat
    Scheduler latency (run queue latency)
    # runqlat.py 10 1
    Tracing run queue latency... Hit Ctrl-C to end.
    usecs               : count     distribution
        0 -> 1          : 1906     |***                                     |
        2 -> 3          : 22087    |****************************************|
        4 -> 7          : 21245    |**************************************  |
        8 -> 15         : 7333     |*************                           |
       16 -> 31         : 4902     |********                                |
       32 -> 63         : 6002     |**********                              |
       64 -> 127        : 7370     |*************                           |
      128 -> 255        : 13001    |***********************                 |
      256 -> 511        : 4823     |********                                |
      512 -> 1023       : 1519     |**                                      |
     1024 -> 2047       : 3682     |******                                  |
     2048 -> 4095       : 3170     |*****                                   |
     4096 -> 8191       : 5759     |**********                              |
     8192 -> 16383      : 14549    |**************************              |
    16384 -> 32767      : 5589     |**********                              |
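    For comparison, run queue latency can also be measured in bpftrace; a
    minimal sketch using the scheduler tracepoints (simplified from the
    published runqlat.bt; map names are arbitrary):

    // On wakeup (or preemption), record when the thread was enqueued
    tracepoint:sched:sched_wakeup, tracepoint:sched:sched_wakeup_new
    { @qtime[args->pid] = nsecs; }

    tracepoint:sched:sched_switch
    {
        if (args->prev_state == 0) {        // TASK_RUNNING: preempted, re-enqueued
            @qtime[args->prev_pid] = nsecs;
        }
        if (@qtime[args->next_pid]) {       // next thread: time spent queued, in usecs
            @usecs = hist((nsecs - @qtime[args->next_pid]) / 1000);
            delete(@qtime[args->next_pid]);
        }
    }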
    
slide 31:
    CPUs: runqlen
    Run queue length
    # runqlen.py 10 1
    Sampling run queue length... Hit Ctrl-C to end.
    runqlen             : count     distribution
        0               : 47284    |****************************************|
        1               : 211      |                                        |
        2               : 28       |                                        |
        3               : 6        |                                        |
        4               : 4        |                                        |
        5               : 1        |                                        |
        6               : 1        |                                        |
    
slide 32:
    Memory: ffaults (book)
    Page faults by filename
    # ffaults.bt
    Attaching 1 probe...
    [...]
    @[dpkg]: 18
    @[sudoers.so]: 19
    @[ld.so.cache]: 27
    @[libpthread-2.27.so]: 29
    @[ld-2.27.so]: 32
    @[locale-archive]: 34
    @[system.journal]: 39
    @[libstdc++.so.6.0.25]: 43
    @[libapt-pkg.so.5.0.2]: 47
    @[BrowserMetrics-5D8A6422-77F1.pma]: 86
    @[libc-2.27.so]: 168
    @[i915]: 409
    @[pkgcache.bin]: 860
    @[]: 25038
    
slide 33:
    Disks: biolatency
    Disk I/O latency histograms, per second
    # biolatency.py -mT 1 5
    Tracing block device I/O... Hit Ctrl-C to end.
    06:20:16
    msecs               : count     distribution
        0 -> 1          : 36       |**************************************  |
        2 -> 3          : 1        |*                                       |
        4 -> 7          : 3        |***                                     |
        8 -> 15         : 17       |*****************                       |
       16 -> 31         : 33       |**********************************      |
       32 -> 63         : 7        |*******                                 |
       64 -> 127        : 6        |******                                  |

    06:20:17
    msecs               : count     distribution
        0 -> 1          : 96       |************************************    |
        2 -> 3          : 25       |*********                               |
        4 -> 7          : 29       |***********                             |
    [...]
    
slide 34:
    File Systems: xfsslower
    XFS I/O slower than a threshold (variants for ext4, btrfs, zfs)
    # xfsslower.py 50
    Tracing XFS operations slower than 50 ms
    TIME     COMM PID    T BYTES  OFF_KB LAT(ms) FILENAME
    21:20:46 java 112789 R 8012   ...      60.16 file.out
    21:20:47 java 112789 R 3571   ...     136.60 file.out
    21:20:49 java 112789 R 5152   ...      63.88 file.out
    21:20:52 java 112789 R 5214   ...     108.47 file.out
    21:20:52 java 112789 R 7465   ...      58.09 file.out
    21:20:54 java 112789 R 5326   ...      89.14 file.out
    21:20:55 java 112789 R 4336   ...      67.89 file.out
    [...]
    22:02:39 java 112789 R 65536  ...     182.10 shuffle_6_646_0.data
    22:02:39 java 112789 R 65536  ...      30.10 shuffle_6_646_0.data
    22:02:39 java 112789 R 65536  ...     309.52 shuffle_6_646_0.data
    22:02:39 java 112789 R 65536  ...     400.31 shuffle_6_646_0.data
    22:02:39 java 112789 R 65536  ...     324.92 shuffle_6_646_0.data
    22:02:39 java 112789 R 65536  ...     119.37 shuffle_6_646_0.data
    [...]
    
slide 35:
    File Systems: xfsdist
    XFS I/O latency histograms, by operation
    # xfsdist.py 60
    Tracing XFS operation latency... Hit Ctrl-C to end.
    22:41:24:
    operation = 'read'
    usecs               : count     distribution
        0 -> 1          : 382130   |****************************************|
        2 -> 3          : 85717    |********                                |
        4 -> 7          : 23639    |**                                      |
        8 -> 15         : 5668     |                                        |
       16 -> 31         : 3594     |                                        |
       32 -> 63         : 21387    |**                                      |
    [...]

    operation = 'write'
    usecs               : count     distribution
        0 -> 1          : 12925    |*****                                   |
        2 -> 3          : 83375    |*************************************   |
    [...]
    
slide 36:
    Networking: tcplife
    TCP session lifespans with connection details
    # tcplife.py
    PID   COMM       LADDR      LPORT RADDR         RPORT TX_KB RX_KB MS
    22597 recordProg 127.0.0.1  46644 127.0.0.1     ...     ...     0 0.23
    3277  redis-serv 127.0.0.1  28527 127.0.0.1     ...     ...     0 0.28
    22598 curl       ...        61620 52.205.89.26  ...     ...     1 91.79
    22604 curl       ...        44400 52.204.43.121 ...     ...     1 121.38
    22624 recordProg 127.0.0.1  46648 127.0.0.1     ...     ...     0 0.22
    3277  redis-serv 127.0.0.1  28527 127.0.0.1     ...     ...     0 0.27
    22647 recordProg 127.0.0.1  46650 127.0.0.1     ...     ...     0 0.21
    3277  redis-serv 127.0.0.1  28527 127.0.0.1     ...     ...     0 0.26
    [...]
    
slide 37:
    Networking: tcpsynbl (book)
    TCP SYN backlogs as histograms
    # tcpsynbl.bt
    Attaching 4 probes...
    Tracing SYN backlog size. Ctrl-C to end.
    @backlog[backlog limit]: histogram of backlog size
    @backlog[128]:
    [0]                    2 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|

    @backlog[500]:
    [0]                 2783 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
    [1]                    9 |                                                    |
    [2, 4)                 4 |                                                    |
    [4, 8)                 1 |                                                    |
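    tcpsynbl.bt is one of the book's tools; a sketch of the approach (kprobes
    on the IPv4/IPv6 SYN-handling functions, reading the listen socket's
    backlog fields; close to, but not verbatim, the published tool):

    #include <net/sock.h>
    BEGIN { printf("Tracing SYN backlog size. Ctrl-C to end.\n"); }
    kprobe:tcp_v4_syn_recv_sock, kprobe:tcp_v6_syn_recv_sock
    {
        $sk = (struct sock *)arg0;    // the listen socket is the first argument
        // histogram of current backlog size, keyed by the configured backlog limit
        @backlog[$sk->sk_max_ack_backlog] = hist($sk->sk_ack_backlog);
    }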
    
slide 38:
    Languages: funccount
    Count native function calls (C, C++, Go, etc.)
    # funccount.py 'tcp_s*'
    Tracing 50 functions for "tcp_s*"... Hit Ctrl-C to end.
    FUNC                              COUNT
    [...]
    tcp_setsockopt                      ...
    tcp_shutdown                        ...
    tcp_sndbuf_expand                   ...
    tcp_send_delayed_ack                ...
    tcp_set_state                       ...
    tcp_sync_mss                        ...
    tcp_sendmsg_locked                  ...
    tcp_sendmsg                         ...
    tcp_send_mss                        ...
    tcp_small_queue_check.isra.29       ...
    tcp_schedule_loss_probe             ...
    tcp_send_ack                        ...
    tcp_stream_memory_free              ...
    Detaching...
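    A bpftrace equivalent of this one-liner (kprobe wildcards are supported;
    counts print on Ctrl-C):

    # Count kernel functions matching tcp_s*, keyed by function name
    bpftrace -e 'kprobe:tcp_s* { @[func] = count(); }'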
    
slide 39:
    Applications: mysqld_qslower
    MySQL queries slower than a threshold
    # mysqld_qslower.py $(pgrep mysqld)
    Tracing MySQL server queries for PID 9908 slower than 1 ms...
    TIME(s)   PID  MS      QUERY
    ...       9962 169.032 SELECT * FROM words WHERE word REGEXP '^bre.*n$'
    ...       9962 205.787 SELECT * FROM words WHERE word REGEXP '^bpf.tools$'
    ...       ...   95.276 SELECT COUNT(*) FROM words
    23.723025 9962 186.680 SELECT count(*) AS count FROM words WHERE word REGEXP '^bre.*n$'
    30.343233 9962 181.494 SELECT * FROM words WHERE word REGEXP '^bre.*n$' ORDER BY word
    [...]
    
slide 40:
    Kernel: workq (book)
    Work queue function execution times
    # workq.bt
    Attaching 4 probes...
    Tracing workqueue request latencies. Ctrl-C to end.
    @us[blk_mq_timeout_work]:
    [1]                    1 |@@                                                  |
    [2, 4)                11 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                     |
    [4, 8)                18 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|

    @us[xfs_end_io]:
    [1]                    2 |@@@@@@@@                                            |
    [2, 4)                 6 |@@@@@@@@@@@@@@@@@@@@@@@@@@                          |
    [4, 8)                 6 |@@@@@@@@@@@@@@@@@@@@@@@@@@                          |
    [8, 16)               12 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
    [16, 32)              12 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
    [32, 64)               3 |@@@@@@@@@@@@@                                       |
    [...]
    
slide 41:
    Hypervisor: xenhyper (book)
    Count hypercalls from Xen PV guests
    # xenhyper.bt
    Attaching 1 probe...
    @[mmu_update]: 44
    @[update_va_mapping]: 78
    @[mmuext_op]: 6473
    @[stack_switch]: 23445
    
slide 42:
    Containers: blkthrot (book)
    Count block I/O throttles by blk cgroup
    # blkthrot.bt
    Attaching 3 probes...
    Tracing block I/O throttles by cgroup. Ctrl-C to end
    @notthrottled[1]: 506
    @throttled[1]: 31
    
slide 43:
    That was only 14 out of 150+ tools.
    All are open source.
    Not all 150+ tools shown here.
    
slide 44:
    Coping with so many BPF tools at Netflix
    On Netflix servers, /apps/nflx-bpf-alltools has all the tools
    BCC, bpftrace, my book, Netflix internal
    Open source at: https://github.com/Netflix-Skunkworks/bpftoolkit
    Latest tools are fetched & put in a hierarchy: cpu, disk, …
    We are also building GUIs to front these tools
    
slide 45:
    Tool Development
    
slide 46:
    Only one engineer at your company
    needs to learn tool development
    They can turn everyone’s ideas into tools
    
slide 47:
    The Tracing Landscape, Dec 2019 (my opinion)
    Axes: Ease of use, from (brutal) to (less brutal); Scope & Capability;
    Stage of Development, from (alpha) to (mature)
    Tools plotted: bpftrace (eBPF, 0.9.3), ply/BPF, sysdig, (many), perf, stap,
    LTTng, ftrace (recent changes: hist triggers, synthetic events), bcc/BPF,
    C/BPF, Raw BPF
    
slide 48:
    bcc/BPF (C & Python)
    bcc examples/tracing/bitehist.py
    entire program
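    A bpftrace counterpart fits in a one-liner (a sketch using the block I/O
    tracepoint; the map name is arbitrary):

    # Histogram of block I/O request sizes, in bytes
    bpftrace -e 'tracepoint:block:block_rq_issue { @bytes = hist(args->bytes); }'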
    
slide 49:
    bpftrace/BPF
    bpftrace -e 'kr:vfs_read { @ = hist(retval); }'
    https://github.com/iovisor/bpftrace
    entire program
    
slide 50:
    bpftrace Syntax
    bpftrace -e 'k:do_nanosleep /pid > 100/ { @[comm]++ }'
    Probe
    Filter (optional)
    Action
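    Spelled out as a script, with the three parts labeled (a sketch; the map
    and its key are arbitrary):

    kprobe:do_nanosleep      // Probe: kernel function entry
    /pid > 100/              // Filter (optional): only PIDs above 100
    {
        @[comm] = count();   // Action: count events per process name
    }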
    
slide 51:
    Probe Type Shortcuts
    tracepoint   Kernel static tracepoints
    usdt         User-level statically defined tracing
    kprobe       Kernel function tracing
    kretprobe    Kernel function returns
    uprobe       User-level function tracing
    uretprobe    User-level function returns
    profile      Timed sampling across all CPUs
    interval     Interval output
    software     Kernel software events
    hardware     Processor hardware events
    
slide 52:
    Filters
    /pid == 181/
    /comm != "sshd"/
    /@ts[tid]/
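    A filter gates the action on each event; for example (a sketch; vfs_read
    is one of many traceable kernel functions):

    # Count reads by process name, excluding sshd
    bpftrace -e 'kprobe:vfs_read /comm != "sshd"/ { @[comm] = count(); }'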
    
slide 53:
    Actions
    Per-event output
    printf()
    system()
    join()
    time()
    Map summaries
    @ = count() or @++
    @ = hist()
    These are documented in the bpftrace reference guide:
    https://github.com/iovisor/bpftrace/blob/master/docs/reference_guide.md
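    Both action styles can be combined in one program (a sketch; the openat
    tracepoint and map name are illustrative):

    tracepoint:syscalls:sys_enter_openat
    {
        printf("%s %s\n", comm, str(args->filename));  // per-event output
        @opens[comm] = count();                        // map summary, printed at exit
    }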
    
slide 54:
    Functions
    hist(n)                  Log2 histogram
    lhist(n, min, max, step) Linear histogram
    count()                  Count events
    sum(n)                   Sum value
    min(n)                   Minimum value
    max(n)                   Maximum value
    avg(n)                   Average value
    stats(n)                 Statistics
    str(s)                   String
    ksym(p)                  Resolve kernel addr
    usym(p)                  Resolve user addr
    kaddr(n)                 Resolve kernel symbol
    uaddr(n)                 Resolve user symbol
    printf(fmt, ...)         Print formatted
    print(@x[, top[, div]])  Print map
    delete(@x)               Delete map element
    clear(@x)                Delete all keys/values
    reg(n)                   Register lookup
    join(a)                  Join string array
    time(fmt)                Print formatted time
    system(fmt)              Run shell command
    cat(file)                Print file contents
    exit()                   Quit bpftrace
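    For instance, lhist() builds a linear histogram where hist() would build a
    power-of-2 one (a sketch; the bucket parameters are arbitrary):

    # Linear histogram of read() return sizes: 0 to 16KB in 1KB steps
    bpftrace -e 't:syscalls:sys_exit_read /args->ret > 0/ { @bytes = lhist(args->ret, 0, 16384, 1024); }'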
    
slide 55:
    Variable Types
    Basic Variables
    @global
    @thread_local[tid]
    $scratch
    Associative Arrays
    @array[key] = value
    Builtins
    pid
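    All of these variable types in one sketch (timing vfs_read; the map and
    variable names are arbitrary):

    kprobe:vfs_read { @ts[tid] = nsecs; }    // map keyed per thread
    kretprobe:vfs_read /@ts[tid]/
    {
        $dur = nsecs - @ts[tid];             // $scratch: temporary, this probe only
        @total = @total + $dur;              // @global: a single shared value
        @durs[comm] = sum($dur);             // associative array keyed by comm
        delete(@ts[tid]);
    }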
    
slide 56:
    Builtin Variables
    pid            Process ID (kernel tgid)
    tid            Thread ID (kernel pid)
    cgroup         Current Cgroup ID
    uid            User ID
    gid            Group ID
    nsecs          Nanosecond timestamp
    cpu            Processor ID
    comm           Process name
    kstack         Kernel stack trace
    ustack         User stack trace
    arg0, arg1, …  Function args
    retval         Return value
    args           Tracepoint args
    func           Function name
    probe          Full probe name
    curtask        Curr task_struct (u64)
    rand           Random number (u32)
    
slide 57:
    bpftrace: BPF observability front-end
    # Files opened by process
    bpftrace -e 't:syscalls:sys_enter_open { printf("%s %s\n", comm, str(args->filename)) }'
    # Read size distribution by process
    bpftrace -e 't:syscalls:sys_exit_read { @[comm] = hist(args->ret) }'
    # Count VFS calls
    bpftrace -e 'kprobe:vfs_* { @[func]++ }'
    # Show vfs_read latency as a histogram
    bpftrace -e 'k:vfs_read { @[tid] = nsecs }
        kr:vfs_read /@[tid]/ { @ns = hist(nsecs - @[tid]); delete(@[tid]) }'
    # Trace user-level function
    bpftrace -e 'uretprobe:bash:readline { printf("%s\n", str(retval)) }'
    Linux 4.9+, https://github.com/iovisor/bpftrace
    
slide 58:
    Example: bpftrace biolatency
    Disk I/O latency histograms, per second
    # biolatency.bt
    Attaching 3 probes...
    Tracing block device I/O... Hit Ctrl-C to end.
    @usecs:
    [256, 512)             2 |                                                    |
    [512, 1K)             10 |@                                                   |
    [1K, 2K)             426 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
    [2K, 4K)             230 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@                        |
    [4K, 8K)               9 |@                                                   |
    [8K, 16K)            128 |@@@@@@@@@@@@@@@                                     |
    [16K, 32K)            68 |@@@@@@@@                                            |
    [32K, 64K)             0 |                                                    |
    [64K, 128K)            0 |                                                    |
    [128K, 256K)          10 |@                                                   |
    [...]
    
slide 59:
    Example: bpftrace biolatency
    Implemented in bpftrace; a sketch follows.
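    A sketch of the approach (kprobes on the block-layer accounting functions,
    as in the published biolatency.bt; these function names vary by kernel
    version):

    BEGIN { printf("Tracing block device I/O... Hit Ctrl-C to end.\n"); }
    kprobe:blk_account_io_start
    {
        @start[arg0] = nsecs;                // key on the struct request pointer
    }
    kprobe:blk_account_io_done
    /@start[arg0]/
    {
        @usecs = hist((nsecs - @start[arg0]) / 1000);
        delete(@start[arg0]);
    }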
slide 60:
    Netflix Vector (old)
    
slide 61:
    Grafana at Netflix
    
slide 62:
    Takeaways
    Add BCC & bpftrace
    packages to your servers
    Start using BPF perf tools
    directly or via GUIs
    Identify 1+ engineer at your
    company to develop
    tools & GUIs
    From: BPF Performance Tools: Linux System and Application
    Observability, Brendan Gregg, Addison Wesley 2019
    
slide 63:
    Thanks & URLs
    BPF: Alexei Starovoitov, Daniel Borkmann, David S. Miller, Linus Torvalds, BPF community
    BCC: Brenden Blanco, Yonghong Song, Sasha Goldshtein, BCC community
    bpftrace: Alastair Robertson, Matheus Marchini, Dan Xu, bpftrace community
    https://github.com/iovisor/bcc
    https://github.com/iovisor/bpftrace
    https://github.com/brendangregg/bpf-perf-tools-book
    http://www.brendangregg.com/ebpf.html
    http://www.brendangregg.com/bpf-performance-tools-book.html
    All diagrams and photos (slides 11 & 22) are my own; slide 12 is from KernelRecipes: https://www.youtube.com/watch?v=bbHFg9IsTk8
    
slide 64:
    Thank you!
    Brendan Gregg
    @brendangregg
    bgregg@netflix.com
    
slide 65:
    Please complete the session
    survey in the mobile app.