NetConf 2018: BPF Observability
Talk/discussion for NetConf 2018 on BPF Observability by Brendan Gregg.NetConf 2018: http://vger.kernel.org/netconf2018.html
next prev 1/25 | |
next prev 2/25 | |
next prev 3/25 | |
next prev 4/25 | |
next prev 5/25 | |
next prev 6/25 | |
next prev 7/25 | |
next prev 8/25 | |
next prev 9/25 | |
next prev 10/25 | |
next prev 11/25 | |
next prev 12/25 | |
next prev 13/25 | |
next prev 14/25 | |
next prev 15/25 | |
next prev 16/25 | |
next prev 17/25 | |
next prev 18/25 | |
next prev 19/25 | |
next prev 20/25 | |
next prev 21/25 | |
next prev 22/25 | |
next prev 23/25 | |
next prev 24/25 | |
next prev 25/25 |
PDF: NetConf_BPF_observability.pdf
Keywords (from pdftotext):
slide 1:
BPF Observability NetConf 2018 Brendan Gregg May 2018slide 2:
Ne#lix & BPF Performance Engineering: observability Network Engineering: flow tracing, XDP Security Team: intrusion detec?on, whitelist genera?on Container Team: networking, securityslide 3:
eBPF bcc hBps://github.com/iovisor/bccslide 4:
Linux Events & BPF Support BPF output Linux 4.4 Linux 4.7 BPF stacks Linux 4.6 Linux 4.3 Linux 4.1 (version BPF support arrived) Linux 4.9 Linux 4.9slide 5:
Basics like disk I/O latency histograms # biolatency -mT 10 Tracing block device I/O... Hit Ctrl-C to end. 19:19:04 msecs 0 ->gt; 1 2 ->gt; 3 4 ->gt; 7 8 ->gt; 15 16 ->gt; 31 32 ->gt; 63 64 ->gt; 127 128 ->gt; 255 19:19:14 msecs 0 ->gt; 1 2 ->gt; 3 […] : count : 238 : 424 : 834 : 506 : 986 : 97 : 7 : 27 distribution |********* |***************** |********************************* |******************** |****************************************| |*** : count : 427 : 424 distribution |******************* |******************slide 6:
… over Nme as a latency heat mapslide 7:
Basics like funcNon counts # ./funccount.py 'tcp*' Tracing 334 functions for "tcp*"... Hit Ctrl-C to end. FUNC COUNT tcp_init_cwnd tcp_fastopen_active_disable_ofo_check tcp_finish_connect tcp_write_queue_purge [...] tcp_gro_receive tcp4_gro_receive tcp_chrono_stop tcp_release_cb tcp_schedule_loss_probe tcp_established_options tcp_v4_md5_lookup tcp_md5_do_lookup tcp_poll tcp_stream_memory_free Detaching...slide 8:
… I know # echo 'tcp*' >gt; /sys/kernel/debug/tracing/set_ftrace_filter # echo 1 >gt; /sys/kernel/debug/tracing/function_profile_enabled # cat /sys/kernel/debug/tracing/trace_stat/function0 Function Hit Time ------------tcp_v4_rcv 2218.794 us tcp_v4_do_rcv 1557.531 us tcp_sendmsg 1357.115 us tcp_rcv_established 1309.926 us tcp_sendmsg_locked 1273.319 us tcp_push 845.834 us tcp_write_xmit 808.439 us tcp_ack 603.849 us tcp_transmit_skb 513.866 us tcp_clean_rtx_queue 298.300 us tcp_recvmsg 295.443 us tcp4_gro_receive 194.116 us tcp_v4_inbound_md5_hash 188.469 us tcp_data_queue 186.623 us tcp_send_mss 158.441 us tcp_established_options 142.160 us [...] Avg --8.667 us 6.436 us 15.599 us 5.597 us 14.635 us 9.722 us 9.186 us 3.317 us 4.588 us 2.043 us 2.710 us 0.732 us 0.775 us 3.732 us 1.821 us 0.721 us s^2 --21.572 us 14.847 us 30.813 us 7.151 us 27.509 us 14.658 us 13.235 us 3.252 us 4.871 us 0.480 us 1.768 us 0.147 us 0.065 us 13.787 us 0.568 us 0.046 usslide 9:
… I know... # cat /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/hist # trigger info: hist:keys=id.syscall:vals=hitcount:sort=hitcount:size=2048 [active] […] { id: sys_rt_sigprocmask { id: sys_futex { id: sys_write { id: sys_setitimer { id: sys_read { id: sys_select { id: sys_writev { id: sys_poll { id: sys_recvmsg { id: sys_ioctl Totals: Hits: 67612 Entries: 72 Dropped: 0 [ 14] } hitcount: [202] } hitcount: [ 1] } hitcount: [ 38] } hitcount: [ 0] } hitcount: [ 23] } hitcount: [ 20] } hitcount: [ 7] } hitcount: [ 47] } hitcount: [ 16] } hitcount:slide 10:
More advanced counts! # tcptop Tracing... Output every 1 secs. Hit Ctrl-C to endslide 11:gt; 19:46:24 loadavg: 1.86 2.67 2.91 3/362 16681 PID COMM sshd sshd sshd LADDR 100.66.3.172:22 100.66.3.172:22 100.66.3.172:22 100.66.3.172:22 RADDR 100.127.69.165:6684 100.127.69.165:6684 100.127.69.165:25219 100.127.69.165:7165 RX_KB TX_KB
TCP session logging, via tcp_set_state() sock:inet_sock_set_state # ./tcplife PID COMM LADDR 22597 recordProg 127.0.0.1 3277 redis-serv 127.0.0.1 22598 curl 22604 curl 22624 recordProg 127.0.0.1 3277 redis-serv 127.0.0.1 22647 recordProg 127.0.0.1 3277 redis-serv 127.0.0.1 […] LPORT RADDR 46644 127.0.0.1 28527 127.0.0.1 61620 52.205.89.26 44400 52.204.43.121 46648 127.0.0.1 28527 127.0.0.1 46650 127.0.0.1 28527 127.0.0.1 RPORT TX_KB RX_KB MS 0 0.23 0 0.28 1 91.79 1 121.38 0 0.22 0 0.27 0 0.21 0 0.26slide 12:
Kernel drop tracing via tcp_drop() … (wriUen yesterday in the YOTEL Boston) # ./tcpdrop.py TIME PID IP SADDR:SPORT >gt; DADDR:DPORT 20:49:06 0 4 10.32.119.56:443 >gt; 10.66.65.252:22912 tcp_drop+0x1 tcp_v4_do_rcv+0x135 tcp_v4_rcv+0x9c7 ip_local_deliver_finish+0x62 ip_local_deliver+0x6f ip_rcv_finish+0x129 ip_rcv+0x28f __netif_receive_skb_core+0x432 __netif_receive_skb+0x18 netif_receive_skb_internal+0x37 napi_gro_receive+0xc5 ena_clean_rx_irq+0x3c3 ena_io_poll+0x33f net_rx_action+0x140 __softirqentry_text_start+0xdf irq_exit+0xb6 do_IRQ+0x82 […] STATE (FLAGS) CLOSE (ACK)slide 13:
tcp_drop Was: discard: __kfree_skb(skb); Now: discard: tcp_drop(sk, skb); Future?: discard: tcp_drop(sk, skb, reason);slide 14:
tcp_stats (future?) Now: __NET_INC_STATS(sock_net(sk), LINUX_MIB_LISTENDROPS); __TCP_INC_STATS(net, TCP_MIB_CSUMERRORS); Future?: tcp_stats(sk, LINUX_MIB_LISTENDROPS); […] • for custom per-session statsslide 15:
User-level instrumentaNon as well, eg, DNS lookups: # /usr/share/bcc/tools/gethostlatency TIME PID COMM 18:56:36 5055 mesos-slave 18:56:40 5590 java 18:56:51 5055 mesos-slave 18:56:53 30166 ncat 18:56:56 6661 java 18:56:59 5589 java 18:57:03 5370 java 18:57:03 30259 sudo 18:57:06 5055 mesos-slave 18:57:10 5590 java 18:57:21 5055 mesos-slave 18:57:29 5589 java 18:57:36 5055 mesos-slave 18:57:40 5590 java 18:57:51 5055 mesos-slave […] LATms HOST 0.01 100.82.166.217 3.53 ec2-…-79.compute-1.amazonaws.com 0.01 100.82.166.217 0.21 localhost 2.19 atlas-alert-….prod.netflix.net 1.50 ec2-…-207.compute-1.amazonaws.com 0.04 localhost 0.07 titusagent-mainvpc-m…3465 0.01 100.82.166.217 3.10 ec2-…-79.compute-1.amazonaws.com 0.01 100.82.166.217 52.36 ec2-…-207.compute-1.amazonaws.com 0.01 100.82.166.217 1.83 ec2-…-79.compute-1.amazonaws.com 0.01 100.82.166.217 Instruments using user-level dynamic tracing of getaddrinfo(), gethostbyname(), etc.slide 16:
Advanced things, like Off-CPU Nme flame graphsslide 17:
Even more advanced: off-wake flame graphs Waker task Waker stack Stack Direc?on Wokeup Blocked stack Blocked taskslide 18:
Chain graphs: merge all wakeup stacksslide 19:
More networking tools todo TCP RTT histograms by host TCP buffer analysis TCP/IP inter-stack latencyslide 20:
A Linux Tracing Timeline 1990’s: Sta?c tracers, prototype dynamic tracers 2000: LTT + DProbes (dynamic tracing; not integrated) 2004: kprobes (2.6.9) 2005: DTrace (not Linux), SystemTap (out-of-tree) 2008: drace (2.6.27) 2009: perf_events (2.6.31) 2009: tracepoints (2.6.32) 2010-2016: drace & perf_events enhancements 2012: uprobes (3.5) 2014-2018: enhanced BPF patches: supporNng tracing events 2016-2018: drace hist triggers 2018: ply or bparace? also: LTTng, ktap, sysdig, ...slide 21:
The Tracing Landscape, May 2018 Ease of use (less brutal) (my opinion) (eBPF) sysdig (many) perf stap LTTng recent changes (alpha) (brutal) bpdrace ply/BPF (mature) (h i s t t rigge rs) drace bcc/BPF C/BPF Stage of Development Raw BPF Scope & Capabilityslide 22:
ply hBps://github.com/iovisor/ply # ply -A -c 'kprobe:SyS_read { @start[tid()] = nsecs(); } kretprobe:SyS_read /@start[tid()]/ { @ns.quantize(nsecs() - @start[tid()]); @start[tid()] = nil; }' 2 probes active ^Cde-activating probes [...] @ns: [ 512, 1k) [ 1k, 2k) [ 2k, 4k) [ 4k, 8k) [ 8k, 16k) [ 16k, 32k) [ 32k, 64k) [ 64k, 128k) [128k, 256k) [256k, 512k) [512k, 1M) [...] 3 |######## 7 |################### 12 |################################| 3 |######## 2 |##### 0 | 0 | 3 |######## 1 |### 1 |### 2 |#####slide 23:
bpdrace hBps://github.com/ajor/bpdrace # bpftrace -e 'kprobe:SyS_read { @start[tid] = nsecs; } kretprobe:SyS_read /@start[tid]/ { @ns = quantize(nsecs - @start[tid]); @start[tid] = delete(); }' Attaching 2 probes... @ns: [0, 1] [2, 4) [4, 8) [8, 16) [16, 32) [32, 64) [64, 128) [128, 256) [256, 512) [512, 1k) [1k, 2k) [2k, 4k) [4k, 8k) [8k, 16k) [16k, 32k) [32k, 64k) 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 6 |@@@@@ 20 |@@@@@@@@@@@@@@@@@@@ 4 |@@@ 14 |@@@@@@@@@@@@@ 53 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@| 2 |@slide 24:
BTF • BPF Type Format • Adding in Linux 4.18 (coming soon) • Last bit needed for a bpdrace/ply high-level language that can walk structs like Kernel Cslide 25:
Resources hBps://github.com/iovisor/bcc hBp://www.brendangregg.com/ebpf.html hBp://www.brendangregg.com/blog/ hBps://github.com/ajor/bpdrace hBps://github.com/iovisor/ply