SCALE17x: eBPF Perf Tools 2019

Video: https://youtu.be/P2hbiWTB2w4?t=158

eBPF Performance Tools 2019, by Brendan Gregg for SCaLE17x. This talk includes a live demo of tracing Minecraft using eBPF (this demo is not in the slides).


PDF: SCALE2019_eBPF_Perf_Tools.pdf

Keywords (from pdftotext):

slide 1:
    eBPF Perf Tools 2019
    SCaLE
    Mar 2019
    Brendan Gregg
    # biolatency.bt
    Attaching 3 probes...
    Tracing block device I/O... Hit Ctrl-C to end.
    @usecs:
    [256, 512)             2 |
    [512, 1K)             10 |@
    [1K, 2K)             426 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
    [2K, 4K)             230 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@
    [4K, 8K)               9 |@
    [8K, 16K)            128 |@@@@@@@@@@@@@@@
    [16K, 32K)            68 |@@@@@@@@
    [32K, 64K)             0 |
    [64K, 128K)            0 |
    [128K, 256K)          10 |@
    
slide 2:
    LIVE DEMO
    eBPF Minecraft Analysis
    
slide 3:
    Enhanced BPF
    also known as just "BPF"
    Linux 4.*
    User-Defined BPF Programs: SDN Configuration, DDoS Mitigation, Intrusion Detection, Container Security, Observability, Firewalls (bpfilter), Device Drivers
    Kernel Runtime: verifier, BPF, BPF actions
    Event Targets: sockets, kprobes, uprobes, tracepoints, perf_events
    
slide 4:
    eBPF bcc
    Linux 4.4+
    https://github.com/iovisor/bcc
    
slide 5:
    eBPF bpftrace (aka BPFtrace)
    Linux 4.9+
    # Files opened by process
    bpftrace -e 't:syscalls:sys_enter_open { printf("%s %s\n", comm, str(args->filename)) }'
    # Read size distribution by process
    bpftrace -e 't:syscalls:sys_exit_read { @[comm] = hist(args->ret) }'
    # Count VFS calls
    bpftrace -e 'kprobe:vfs_* { @[func]++ }'
    # Show vfs_read latency as a histogram
    bpftrace -e 'k:vfs_read { @[tid] = nsecs }
        kr:vfs_read /@[tid]/ { @ns = hist(nsecs - @[tid]); delete(@[tid]) }'
    # Trace user-level function
    bpftrace -e 'uretprobe:bash:readline { printf("%s\n", str(retval)) }'
    https://github.com/iovisor/bpftrace
    
slide 6:
    eBPF is solving new things: off-CPU + wakeup analysis
    
slide 7:
    Raw BPF
    samples/bpf/sock_example.c
    87 lines truncated
    
slide 8:
    C/BPF
    samples/bpf/tracex1_kern.c
    58 lines truncated
    
slide 9:
    bcc/BPF (C & Python)
    bcc examples/tracing/bitehist.py
    entire program
    
slide 10:
    bpftrace
    bpftrace -e 'kr:vfs_read { @ = hist(retval); }'
    https://github.com/iovisor/bpftrace
    entire program
    
slide 11:
    The Tracing Landscape, Mar 2019 (my opinion)
    [Figure axes: Ease of use, from (brutal) to (less brutal); Scope & Capability; Stage of Development, from (alpha) to (mature)]
    Tools: bpftrace, ply/BPF, sysdig, perf, stap, LTTng, ftrace, bcc/BPF, C/BPF, Raw BPF
    Recent changes: (eBPF), (0.9), (many), (hist triggers)
    
slide 12:
    e.g., identify multimodal disk I/O latency and outliers with bcc/eBPF biolatency
    # biolatency -mT 10
    Tracing block device I/O... Hit Ctrl-C to end.
    19:19:04
         msecs           : count     distribution
           0 -> 1        : 238      |*********
           2 -> 3        : 424      |*****************
           4 -> 7        : 834      |*********************************
           8 -> 15       : 506      |********************
          16 -> 31       : 986      |****************************************|
          32 -> 63       : 97       |***
          64 -> 127      : 7
         128 -> 255      : 27
    19:19:14
         msecs           : count     distribution
           0 -> 1        : 427      |*******************
           2 -> 3        : 424      |******************
    […]
    
slide 13:
    bcc/eBPF programs can be laborious: biolatency
    # define BPF program
    bpf_text = """
    #include <uapi/linux/ptrace.h>
    #include <linux/blkdev.h>
    typedef struct disk_key {
        char disk[DISK_NAME_LEN];
        u64 slot;
    } disk_key_t;
    BPF_HASH(start, struct request *);
    STORAGE
    // time block I/O
    int trace_req_start(struct pt_regs *ctx, struct request *req)
    {
        u64 ts = bpf_ktime_get_ns();
        start.update(&req, &ts);
        return 0;
    }
    // output
    int trace_req_completion(struct pt_regs *ctx, struct request *req)
    {
        u64 *tsp, delta;
        // fetch timestamp and calculate delta
        tsp = start.lookup(&req);
        if (tsp == 0) {
            return 0;   // missed issue
        }
        delta = bpf_ktime_get_ns() - *tsp;
        FACTOR
        // store as histogram
        STORE
        start.delete(&req);
        return 0;
    }
    """
    # code substitutions
    if args.milliseconds:
        bpf_text = bpf_text.replace('FACTOR', 'delta /= 1000000;')
        label = "msecs"
    else:
        bpf_text = bpf_text.replace('FACTOR', 'delta /= 1000;')
        label = "usecs"
    if args.disks:
        bpf_text = bpf_text.replace('STORAGE',
            'BPF_HISTOGRAM(dist, disk_key_t);')
        bpf_text = bpf_text.replace('STORE',
            'disk_key_t key = {.slot = bpf_log2l(delta)}; ' +
            'void *__tmp = (void *)req->rq_disk->disk_name; ' +
            'bpf_probe_read(&key.disk, sizeof(key.disk), __tmp); ' +
            'dist.increment(key);')
    else:
        bpf_text = bpf_text.replace('STORAGE', 'BPF_HISTOGRAM(dist);')
        bpf_text = bpf_text.replace('STORE',
            'dist.increment(bpf_log2l(delta));')
    if debug or args.ebpf:
        print(bpf_text)
        if args.ebpf:
            exit()
    # load BPF program
    b = BPF(text=bpf_text)
    if args.queued:
        b.attach_kprobe(event="blk_account_io_start", fn_name="trace_req_start")
    else:
        b.attach_kprobe(event="blk_start_request", fn_name="trace_req_start")
        b.attach_kprobe(event="blk_mq_start_request", fn_name="trace_req_start")
    b.attach_kprobe(event="blk_account_io_completion",
        fn_name="trace_req_completion")
    print("Tracing block device I/O... Hit Ctrl-C to end.")
    # output
    exiting = 0 if args.interval else 1
    dist = b.get_table("dist")
    while (1):
        try:
            sleep(int(args.interval))
        except KeyboardInterrupt:
            exiting = 1
        print()
        if args.timestamp:
            print("%-8s\n" % strftime("%H:%M:%S"), end="")
        dist.print_log2_hist(label, "disk")
        dist.clear()
        countdown -= 1
        if exiting or countdown == 0:
            exit()
    
slide 14:
    … rewritten in bpftrace (launched Oct 2018)!
    #!/usr/local/bin/bpftrace
    BEGIN
    {
        printf("Tracing block device I/O... Hit Ctrl-C to end.\n");
    }
    kprobe:blk_account_io_start
    {
        @start[arg0] = nsecs;
    }
    kprobe:blk_account_io_completion
    /@start[arg0]/
    {
        @usecs = hist((nsecs - @start[arg0]) / 1000);
        delete(@start[arg0]);
    }
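
The timing pattern in this script (save a timestamp keyed by the request when it starts, then compute the delta and delete the key when it completes) can be illustrated in plain Python. This is only a sketch of the logic: it does no kernel tracing, and the event tuples and the name `latencies_usecs` are hypothetical.

```python
def latencies_usecs(events):
    """Compute per-request latencies in microseconds.

    events: iterable of (kind, request_id, nsecs), where kind is
    "start" or "done". This tuple layout is made up for illustration.
    """
    start = {}          # plays the role of the @start[arg0] map
    latencies = []
    for kind, request_id, nsecs in events:
        if kind == "start":
            start[request_id] = nsecs
        elif kind == "done" and request_id in start:
            # Similar in spirit to the /@start[arg0]/ filter: skip
            # completions whose start was never seen.
            latencies.append((nsecs - start[request_id]) // 1000)
            del start[request_id]   # like delete(@start[arg0])
    return latencies


events = [
    ("start", 1, 1000),
    ("start", 2, 2000),
    ("done", 1, 501000),
    ("done", 3, 600000),    # no matching start: ignored
    ("done", 2, 4002000),
]
print(latencies_usecs(events))
```

Deleting the key on completion keeps the map from growing without bound, which is the same reason the script calls delete().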
    
slide 15:
    … rewritten in bpftrace
    # biolatency.bt
    Attaching 3 probes...
    Tracing block device I/O... Hit Ctrl-C to end.
    @usecs:
    [256, 512)             2 |
    [512, 1K)             10 |@
    [1K, 2K)             426 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
    [2K, 4K)             230 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@
    [4K, 8K)               9 |@
    [8K, 16K)            128 |@@@@@@@@@@@@@@@
    [16K, 32K)            68 |@@@@@@@@
    [32K, 64K)             0 |
    [64K, 128K)            0 |
    [128K, 256K)          10 |@
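
The power-of-two buckets in this output can be sketched in plain Python. This is an illustration of log2 bucketing only, not bpftrace's implementation: the function names are hypothetical, it handles positive integers only, and it prints raw numbers rather than the K-suffixed labels shown above.

```python
from collections import Counter


def log2_bucket(value):
    """Return the [low, high) power-of-two bucket holding a positive integer."""
    if value < 1:
        raise ValueError("this sketch only handles positive integers")
    low = 1 << (value.bit_length() - 1)   # largest power of two <= value
    return (low, low * 2)


def log2_hist(values):
    """Count values per power-of-two bucket."""
    return Counter(log2_bucket(v) for v in values)


def format_hist(hist, width=52):
    """Render one line per bucket, scaling bars to the largest count."""
    peak = max(hist.values())
    lines = []
    for (low, high), count in sorted(hist.items()):
        bar = "@" * (count * width // peak)
        lines.append("[%d, %d) %6d |%s" % (low, high, count, bar))
    return lines


for line in format_hist(log2_hist([300, 600, 1500, 1600, 3000])):
    print(line)
```

Power-of-two buckets keep the histogram compact across several orders of magnitude, which is why a latency range from hundreds of microseconds to hundreds of milliseconds fits in ten rows.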
    
slide 16:
    bcc
    canned complex tools, agents
    bpftrace
    one-liners, custom scripts
    
slide 17:
    bcc
    
slide 18:
    eBPF bcc
    Linux 4.4+
    https://github.com/iovisor/bcc
    
slide 19:
    bpftrace
    
slide 20:
    eBPF bpftrace
    Linux 4.9+
    https://github.com/iovisor/bpftrace
    
slide 21:
    bpftrace Development
    Dec 2016
    Oct 2018
    v0.80: Jan-2019
    v0.90: Mar?2019
    v1.0: ?2019
    Major Features (v1)
    Minor Features (v1)
    Known Bug Fixes
    Packaging
    Stable Docs
    API Stability
    More Bug Fixes
    
slide 22:
    bpftrace Syntax
    bpftrace -e 'k:do_nanosleep /pid > 100/ { @[comm]++ }'
    Probe: k:do_nanosleep
    Filter (optional): /pid > 100/
    Action: { @[comm]++ }
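
As a rough mental model (not how bpftrace is implemented), the one-liner on this slide behaves like a filtered per-key counter: each time the probe fires, the filter is checked, and if it passes the action increments a map entry keyed by process name. The sketch below assumes events arrive as hypothetical (pid, comm) tuples.

```python
from collections import Counter


def count_by_comm(events, min_pid=100):
    """Count events per process name, keeping only pids above min_pid.

    events: iterable of (pid, comm) tuples, a made-up stand-in for
    each firing of the probe.
    """
    counts = Counter()              # plays the role of the @ map
    for pid, comm in events:        # probe: one iteration per event
        if pid > min_pid:           # filter: /pid > 100/
            counts[comm] += 1       # action: @[comm]++
    return counts


events = [(1, "systemd"), (250, "sshd"), (250, "sshd"), (311, "java")]
print(count_by_comm(events))
```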
    
slide 23:
    Probes
    
slide 24:
    Probe Type Shortcuts
    tracepoint   Kernel static tracepoints
    usdt         User-level statically defined tracing
    kprobe       Kernel function tracing
    kretprobe    Kernel function returns
    uprobe       User-level function tracing
    uretprobe    User-level function returns
    profile      Timed sampling across all CPUs
    interval     Interval output
    software     Kernel software events
    hardware     Processor hardware events
    
slide 25:
    Filters
    /pid == 181/
    /comm != "sshd"/
    /@ts[tid]/
    
slide 26:
    Actions
    Per-event output
    printf()
    system()
    join()
    time()
    Map Summaries
    @ = count() or @++
    @ = hist()
    The following is in the reference guide: https://github.com/iovisor/bpftrace/blob/master/docs/reference_guide.md
    
slide 27:
    Functions
    hist(n)                    Log2 histogram
    lhist(n, min, max, step)   Linear hist.
    count()                    Count events
    sum(n)                     Sum value
    min(n)                     Minimum value
    max(n)                     Maximum value
    avg(n)                     Average value
    stats(n)                   Statistics
    str(s)                     String
    sym(p)                     Resolve kernel addr
    usym(p)                    Resolve user addr
    kaddr(n)                   Resolve kernel symbol
    uaddr(n)                   Resolve user symbol
    printf(fmt, ...)           Print formatted
    print(@x[, top[, div]])    Print map
    delete(@x)                 Delete map element
    clear(@x)                  Delete all keys/values
    reg(n)                     Register lookup
    join(a)                    Join string array
    time(fmt)                  Print formatted time
    system(fmt)                Run shell command
    exit()                     Quit bpftrace
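
The linear histogram listed here, lhist(n, min, max, step), differs from hist(n) in that it uses fixed-width buckets. The plain-Python sketch below shows one way to compute the bucket label for a value. The function name is hypothetical, and the underflow and overflow labels reflect my understanding of how bpftrace prints them, so treat them as an assumption.

```python
def lhist_bucket(n, lo, hi, step):
    """Return a label for the fixed-width bucket that holds n."""
    if n < lo:
        return "(..., %d)" % lo         # assumed underflow bucket
    if n >= hi:
        return "[%d, ...)" % hi         # assumed overflow bucket
    start = lo + ((n - lo) // step) * step
    return "[%d, %d)" % (start, min(start + step, hi))


for value in (-1, 5, 10, 99, 100):
    print(value, lhist_bucket(value, 0, 100, 10))
```

Fixed-width buckets suit values with a known, narrow range, while the log2 form suits values spanning orders of magnitude, such as latencies.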
    
slide 28:
    Variable Types
    Basic Variables
    @global
    @thread_local[tid]
    $scratch
    Associative Arrays
    @array[key] = value
    Builtins
    pid
    
slide 29:
    Builtin Variables
    pid              Process ID (kernel tgid)
    tid              Thread ID (kernel pid)
    cgroup           Current Cgroup ID
    uid              User ID
    gid              Group ID
    nsecs            Nanosecond timestamp
    cpu              Processor ID
    comm             Process name
    stack            Kernel stack trace
    ustack           User stack trace
    arg0, arg1, ...  Function arguments
    retval           Return value
    func             Function name
    probe            Full name of the probe
    curtask          Current task_struct (u64)
    rand             Random number (u32)
    
slide 30:
    biolatency (again)
    #!/usr/local/bin/bpftrace
    BEGIN
    {
        printf("Tracing block device I/O... Hit Ctrl-C to end.\n");
    }
    kprobe:blk_account_io_start
    {
        @start[arg0] = nsecs;
    }
    kprobe:blk_account_io_completion
    /@start[arg0]/
    {
        @usecs = hist((nsecs - @start[arg0]) / 1000);
        delete(@start[arg0]);
    }
    
slide 31:
    bpftrace Internals
    
slide 32:
    Issues
    All major capabilities exist
    Many minor things
    https://github.com/iovisor/bpftrace/issues
    
slide 33:
    Other Tools
    
slide 34:
    Netflix Vector: BPF heat maps
    https://medium.com/netflix-techblog/extending-vector-with-ebpf-to-inspect-host-and-container-performance-5da3af4c584b
    
slide 35:
    Anticipated Worldwide Audience
    BPF Tool Developers:
    – Raw BPF: <20
    – C (or C++) BPF: <200
    – bcc: >200
    – bpftrace: >5,000
    BPF Tool Users:
    – CLI tools (of any type): >20,000
    – GUIs (fronting any type): >200,000
    
slide 36:
    Other Tools
    cloudflare/ebpf_exporter
    kubectl-trace
    sysdig eBPF support
    
slide 37:
    Take Aways
    Easily explore systems with bcc/bpftrace
    Contribute: see bcc/bpftrace issue list
    Share: posts, talks
    
slide 38:
    URLs
    - https://github.com/iovisor/bcc
    https://github.com/iovisor/bcc/blob/master/docs/tutorial.md
    https://github.com/iovisor/bcc/blob/master/docs/reference_guide.md
    - https://github.com/iovisor/bpftrace
    https://github.com/iovisor/bpftrace/blob/master/docs/tutorial_one_liners.md
    https://github.com/iovisor/bpftrace/blob/master/docs/reference_guide.md
    
slide 39:
    Thanks
    bpftrace
    Alastair Robertson (creator)
    Netflix: myself so far
    Sthima: Mary Marchini, Willian Gaspar
    Facebook: Jon Haslam, Dan Xu
    Augusto Mecking Caringi, Dale Hamel, ...
    eBPF/bcc
    Facebook: Alexei Starovoitov, Teng Qin, Yonghong Song, Martin Lau, Mark Drayton, …
    Netflix: myself
    VMware: Brenden Blanco
    Sasha Goldshtein, Paul Chaignon, ...