
NetConf 2018: BPF Observability

Talk/discussion for NetConf 2018 on BPF Observability by Brendan Gregg.

NetConf 2018: http://vger.kernel.org/netconf2018.html


PDF: NetConf_BPF_observability.pdf

Keywords (from pdftotext):

slide 1:
    BPF Observability
    NetConf 2018
    Brendan Gregg
    May 2018
    
slide 2:
    Netflix & BPF
    Performance Engineering: observability
    Network Engineering: flow tracing, XDP
    Security Team: intrusion detection, whitelist generation
    Container Team: networking, security
    
slide 3:
    eBPF
    bcc
    https://github.com/iovisor/bcc
    
slide 4:
    Linux Events & BPF Support
    (diagram residue: Linux event sources annotated with the kernel version in
    which BPF support arrived — Linux 4.1, 4.3, 4.4, 4.6, 4.7, 4.9; BPF output:
    4.4, 4.7; BPF stacks: 4.6)
    
slide 5:
    Basics like disk I/O latency histograms
    # biolatency -mT 10
    Tracing block device I/O... Hit Ctrl-C to end.

    19:19:04
         msecs           : count     distribution
             0 -> 1      : 238      |*********                               |
             2 -> 3      : 424      |*****************                       |
             4 -> 7      : 834      |*********************************       |
             8 -> 15     : 506      |********************                    |
            16 -> 31     : 986      |****************************************|
            32 -> 63     : 97       |***                                     |
            64 -> 127    : 7        |                                        |
           128 -> 255    : 27       |*                                       |

    19:19:14
         msecs           : count     distribution
             0 -> 1      : 427      |*******************                     |
             2 -> 3      : 424      |******************                      |
    […]
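The rows above are power-of-two latency buckets: biolatency aggregates each completed I/O into a log2 histogram in kernel context, and only the bucket counts cross to user space. A rough sketch of that bucketing in plain Python (illustrative only, not the in-kernel BPF map logic):

```python
# Sketch of the power-of-two bucketing behind log2 histograms such as
# biolatency's. Illustrative plain Python; the real tool aggregates
# in-kernel in a BPF histogram map, and only bucket counts reach
# user space.

def log2_bucket(msecs):
    """Return the inclusive (low, high) power-of-two bucket for a value."""
    if msecs <= 1:
        return (0, 1)
    low = 1
    while low * 2 <= msecs:
        low *= 2
    return (low, low * 2 - 1)

def histogram(latencies_ms):
    """Count latencies per bucket, like the 'count' column above."""
    counts = {}
    for ms in latencies_ms:
        bucket = log2_bucket(ms)
        counts[bucket] = counts.get(bucket, 0) + 1
    return counts

# 0..20 ms values land in the 0->1, 2->3, 4->7, 8->15, 16->31 rows:
print(histogram([0, 1, 2, 5, 9, 20]))
# -> {(0, 1): 2, (2, 3): 1, (4, 7): 1, (8, 15): 1, (16, 31): 1}
```

The log2 scale is the point: it gives fixed, tiny kernel-side state per distribution while still exposing multi-modal latency (e.g. cache hits vs. disk seeks) that an average would hide.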
    
slide 6:
    … over time as a latency heat map
    
slide 7:
    Basics like function counts
    # ./funccount.py 'tcp*'
    Tracing 334 functions for "tcp*"... Hit Ctrl-C to end.
    FUNC
    COUNT
    tcp_init_cwnd
    tcp_fastopen_active_disable_ofo_check
    tcp_finish_connect
    tcp_write_queue_purge
    [...]
    tcp_gro_receive
    tcp4_gro_receive
    tcp_chrono_stop
    tcp_release_cb
    tcp_schedule_loss_probe
    tcp_established_options
    tcp_v4_md5_lookup
    tcp_md5_do_lookup
    tcp_poll
    tcp_stream_memory_free
    Detaching...
    
slide 8:
    … I know
    # echo 'tcp*' > /sys/kernel/debug/tracing/set_ftrace_filter
    # echo 1 > /sys/kernel/debug/tracing/function_profile_enabled
    # cat /sys/kernel/debug/tracing/trace_stat/function0
    Function                   Hit    Time            Avg          s^2
    --------                   ---    ----            ---          ---
    tcp_v4_rcv                        2218.794 us     8.667 us     21.572 us
    tcp_v4_do_rcv                     1557.531 us     6.436 us     14.847 us
    tcp_sendmsg                       1357.115 us     15.599 us    30.813 us
    tcp_rcv_established               1309.926 us     5.597 us     7.151 us
    tcp_sendmsg_locked                1273.319 us     14.635 us    27.509 us
    tcp_push                          845.834 us      9.722 us     14.658 us
    tcp_write_xmit                    808.439 us      9.186 us     13.235 us
    tcp_ack                           603.849 us      3.317 us     3.252 us
    tcp_transmit_skb                  513.866 us      4.588 us     4.871 us
    tcp_clean_rtx_queue               298.300 us      2.043 us     0.480 us
    tcp_recvmsg                       295.443 us      2.710 us     1.768 us
    tcp4_gro_receive                  194.116 us      0.732 us     0.147 us
    tcp_v4_inbound_md5_hash           188.469 us      0.775 us     0.065 us
    tcp_data_queue                    186.623 us      3.732 us     13.787 us
    tcp_send_mss                      158.441 us      1.821 us     0.568 us
    tcp_established_options           142.160 us      0.721 us     0.046 us
    [...]
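Since trace_stat is a plain whitespace-aligned table, it is trivial to post-process. A sketch that ranks a dump by total time (the SAMPLE text below is hypothetical data in the trace_stat layout, with invented values, not the slide's numbers):

```python
# Sketch: post-process an ftrace function_profile dump, ranking functions
# by total time. SAMPLE is hypothetical data in the trace_stat layout
# (Function / Hit / Time / Avg / s^2); values are invented for illustration.
SAMPLE = """\
  Function        Hit    Time          Avg         s^2
  --------        ---    ----          ---         ---
  tcp_sendmsg      87    1300.000 us   14.943 us   30.000 us
  tcp_v4_rcv      256    2200.000 us    8.594 us   21.000 us
  tcp_ack         182     600.000 us    3.297 us    3.000 us
"""

def rank_by_time(text):
    """Return (function, total_us) pairs, busiest first."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # skip blanks, the header row, and the dashed separator row
        if len(parts) < 4 or parts[0].startswith("-") or parts[2] == "Time":
            continue
        rows.append((parts[0], float(parts[2])))   # Time column, in us
    return sorted(rows, key=lambda r: r[1], reverse=True)

print(rank_by_time(SAMPLE))
# -> [('tcp_v4_rcv', 2200.0), ('tcp_sendmsg', 1300.0), ('tcp_ack', 600.0)]
```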
    
slide 9:
    … I know...
    # cat /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/hist
    # trigger info: hist:keys=id.syscall:vals=hitcount:sort=hitcount:size=2048 [active]
    […]
    { id: sys_rt_sigprocmask  [ 14] } hitcount:
    { id: sys_futex           [202] } hitcount:
    { id: sys_write           [  1] } hitcount:
    { id: sys_setitimer       [ 38] } hitcount:
    { id: sys_read            [  0] } hitcount:
    { id: sys_select          [ 23] } hitcount:
    { id: sys_writev          [ 20] } hitcount:
    { id: sys_poll            [  7] } hitcount:
    { id: sys_recvmsg         [ 47] } hitcount:
    { id: sys_ioctl           [ 16] } hitcount:

    Totals:
        Hits: 67612
        Entries: 72
        Dropped: 0
    
slide 10:
    More advanced counts!
    # tcptop
    Tracing... Output every 1 secs. Hit Ctrl-C to end

    19:46:24 loadavg: 1.86 2.67 2.91 3/362 16681

    PID    COMM   LADDR              RADDR                  RX_KB  TX_KB
           sshd   100.66.3.172:22    100.127.69.165:6684
           sshd   100.66.3.172:22    100.127.69.165:6684
           sshd   100.66.3.172:22    100.127.69.165:25219
           sshd   100.66.3.172:22    100.127.69.165:7165
    
slide 11:
    TCP session logging, via tcp_set_state() sock:inet_sock_set_state
    # ./tcplife
    PID    COMM        LADDR        LPORT  RADDR          RPORT  TX_KB  RX_KB  MS
    22597  recordProg  127.0.0.1    46644  127.0.0.1                    0      0.23
    3277   redis-serv  127.0.0.1    28527  127.0.0.1                    0      0.28
    22598  curl                     61620  52.205.89.26                 1      91.79
    22604  curl                     44400  52.204.43.121                1      121.38
    22624  recordProg  127.0.0.1    46648  127.0.0.1                    0      0.22
    3277   redis-serv  127.0.0.1    28527  127.0.0.1                    0      0.27
    22647  recordProg  127.0.0.1    46650  127.0.0.1                    0      0.21
    3277   redis-serv  127.0.0.1    28527  127.0.0.1                    0      0.26
    […]
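tcplife gets the MS column by keeping a per-socket timestamp from the first state transition it sees and computing the span when the socket reaches TCP_CLOSE. A minimal sketch of that bookkeeping (plain Python standing in for the tool's BPF hash maps, not its actual code):

```python
# Sketch of tcplife-style session timing: stash a timestamp on the first
# tcp_set_state() transition for a socket, emit the lifespan on TCP_CLOSE.
# Plain Python stand-in for the BPF hash-map logic.

birth = {}       # socket id -> first-seen timestamp (ns)
completed = []   # (socket id, lifespan in ms)

def on_set_state(sock_id, new_state, now_ns):
    if sock_id not in birth:
        birth[sock_id] = now_ns          # first state transition seen
    if new_state == "TCP_CLOSE":
        span_ms = (now_ns - birth.pop(sock_id)) / 1e6
        completed.append((sock_id, span_ms))

# A short-lived localhost session, like the 0.23 ms rows above:
on_set_state(1, "TCP_ESTABLISHED", 100_000)
on_set_state(1, "TCP_CLOSE", 330_000)
print(completed)
# -> [(1, 0.23)]
```

Hooking state transitions rather than every send/receive is why tcplife is cheap enough to run continuously as a connection logger.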
    
slide 12:
    Kernel drop tracing via tcp_drop() … (written yesterday in the YOTEL Boston)
    # ./tcpdrop.py
    TIME     PID  IP SADDR:SPORT         > DADDR:DPORT          STATE (FLAGS)
    20:49:06 0    4  10.32.119.56:443    > 10.66.65.252:22912   CLOSE (ACK)
        tcp_drop+0x1
        tcp_v4_do_rcv+0x135
        tcp_v4_rcv+0x9c7
        ip_local_deliver_finish+0x62
        ip_local_deliver+0x6f
        ip_rcv_finish+0x129
        ip_rcv+0x28f
        __netif_receive_skb_core+0x432
        __netif_receive_skb+0x18
        netif_receive_skb_internal+0x37
        napi_gro_receive+0xc5
        ena_clean_rx_irq+0x3c3
        ena_io_poll+0x33f
        net_rx_action+0x140
        __softirqentry_text_start+0xdf
        irq_exit+0xb6
        do_IRQ+0x82
    […]
    
slide 13:
    tcp_drop
    Was:
    discard:
    __kfree_skb(skb);
    Now:
    discard:
    tcp_drop(sk, skb);
    Future?:
    discard:
    tcp_drop(sk, skb, reason);
    
slide 14:
    tcp_stats (future?)
    Now:
    __NET_INC_STATS(sock_net(sk), LINUX_MIB_LISTENDROPS);
    __TCP_INC_STATS(net, TCP_MIB_CSUMERRORS);
    Future?:
    tcp_stats(sk, LINUX_MIB_LISTENDROPS);
    […]
    • for custom per-session stats
    
slide 15:
    User-level instrumentation as well, e.g., DNS lookups:
    # /usr/share/bcc/tools/gethostlatency
    TIME      PID    COMM         LATms  HOST
    18:56:36  5055   mesos-slave   0.01  100.82.166.217
    18:56:40  5590   java          3.53  ec2-…-79.compute-1.amazonaws.com
    18:56:51  5055   mesos-slave   0.01  100.82.166.217
    18:56:53  30166  ncat          0.21  localhost
    18:56:56  6661   java          2.19  atlas-alert-….prod.netflix.net
    18:56:59  5589   java          1.50  ec2-…-207.compute-1.amazonaws.com
    18:57:03  5370   java          0.04  localhost
    18:57:03  30259  sudo          0.07  titusagent-mainvpc-m…3465
    18:57:06  5055   mesos-slave   0.01  100.82.166.217
    18:57:10  5590   java          3.10  ec2-…-79.compute-1.amazonaws.com
    18:57:21  5055   mesos-slave   0.01  100.82.166.217
    18:57:29  5589   java         52.36  ec2-…-207.compute-1.amazonaws.com
    18:57:36  5055   mesos-slave   0.01  100.82.166.217
    18:57:40  5590   java          1.83  ec2-…-79.compute-1.amazonaws.com
    18:57:51  5055   mesos-slave   0.01  100.82.166.217
    […]
    Instruments using user-level dynamic tracing of getaddrinfo(), gethostbyname(), etc.
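The LATms column is ordinary wall-clock time around the resolver call; gethostlatency captures it via uprobes on libc's entry points so the target needs no changes. A rough user-space equivalent of what is being measured:

```python
# Rough user-space equivalent of what gethostlatency reports: wall-clock
# time around a resolver call. gethostlatency itself captures this via
# uprobes on getaddrinfo()/gethostbyname(), with no target changes.
import socket
import time

def resolve_timed(host, port=80):
    """Resolve host and return (results, latency in ms)."""
    t0 = time.monotonic()
    results = socket.getaddrinfo(host, port)
    lat_ms = (time.monotonic() - t0) * 1000.0
    return results, lat_ms

results, lat_ms = resolve_timed("localhost")
print("%8.2f ms  localhost" % lat_ms)
```

The uprobe approach scales this to every process on the host at once, which is how the mix of mesos-slave, java, and sudo lookups above lands in one trace.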
    
slide 16:
    Advanced things, like Off-CPU time flame graphs
    
slide 17:
    Even more advanced: off-wake flame graphs
    Waker task
    Waker stack
    Stack
    Direction
    Wokeup
    Blocked stack
    Blocked task
    
slide 18:
    Chain graphs: merge all wakeup stacks
    
slide 19:
    More networking tools todo
    TCP RTT histograms by host
    TCP buffer analysis
    TCP/IP inter-stack latency
    
slide 20:
    A Linux Tracing Timeline
    1990’s: Static tracers, prototype dynamic tracers
    2000: LTT + DProbes (dynamic tracing; not integrated)
    2004: kprobes (2.6.9)
    2005: DTrace (not Linux), SystemTap (out-of-tree)
    2008: ftrace (2.6.27)
    2009: perf_events (2.6.31)
    2009: tracepoints (2.6.32)
    2010-2016: ftrace & perf_events enhancements
    2012: uprobes (3.5)
    2014-2018: enhanced BPF patches: supporting tracing events
    2016-2018: ftrace hist triggers
    2018: ply or bpftrace?
    also: LTTng, ktap, sysdig, ...
    
slide 21:
    The Tracing Landscape, May 2018 (my opinion)
    (chart residue: axes "Ease of use", from (brutal) to (less brutal), vs
    "Scope & Capability"; "Stage of Development" from (alpha) to (mature);
    plotted tools: ply/BPF, bpftrace (eBPF), sysdig, perf, ftrace (hist
    triggers), stap, LTTng, bcc/BPF, C/BPF, Raw BPF; recent changes noted)
    
slide 22:
    ply
    https://github.com/iovisor/ply
    # ply -A -c 'kprobe:SyS_read { @start[tid()] = nsecs(); }
        kretprobe:SyS_read /@start[tid()]/ { @ns.quantize(nsecs() - @start[tid()]);
        @start[tid()] = nil; }'
    2 probes active
    ^Cde-activating probes
    [...]
    @ns:
    [ 512,   1k)     3 |########                        |
    [  1k,   2k)     7 |###################             |
    [  2k,   4k)    12 |################################|
    [  4k,   8k)     3 |########                        |
    [  8k,  16k)     2 |#####                           |
    [ 16k,  32k)     0 |                                |
    [ 32k,  64k)     0 |                                |
    [ 64k, 128k)     3 |########                        |
    [128k, 256k)     1 |###                             |
    [256k, 512k)     1 |###                             |
    [512k,   1M)     2 |#####                           |
    [...]
    
slide 23:
    bpftrace
    https://github.com/ajor/bpftrace
    # bpftrace -e 'kprobe:SyS_read { @start[tid] = nsecs; } kretprobe:SyS_read /@start[tid]/
        { @ns = quantize(nsecs - @start[tid]); @start[tid] = delete(); }'
    Attaching 2 probes...
    @ns:
    [0, 1]         0 |                                                    |
    [2, 4)         0 |                                                    |
    [4, 8)         0 |                                                    |
    [8, 16)        0 |                                                    |
    [16, 32)       0 |                                                    |
    [32, 64)       0 |                                                    |
    [64, 128)      0 |                                                    |
    [128, 256)     0 |                                                    |
    [256, 512)     0 |                                                    |
    [512, 1k)      0 |                                                    |
    [1k, 2k)       6 |@@@@@                                               |
    [2k, 4k)      20 |@@@@@@@@@@@@@@@@@@@                                 |
    [4k, 8k)       4 |@@@                                                 |
    [8k, 16k)     14 |@@@@@@@@@@@@@                                       |
    [16k, 32k)    53 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
    [32k, 64k)     2 |@                                                   |
    
slide 24:
    BTF
    • BPF Type Format
    • Adding in Linux 4.18 (coming soon)
    • Last bit needed for a bpftrace/ply high-level language that can
    walk structs like Kernel C
    
slide 25:
    Resources
    https://github.com/iovisor/bcc
    http://www.brendangregg.com/ebpf.html
    http://www.brendangregg.com/blog/
    https://github.com/ajor/bpftrace
    https://github.com/iovisor/ply