SysdigSummit2016_DesigningTracingTools.pdf

sysdig CCWFS 2016: Designing Tracing Tools

Video: https://www.youtube.com/watch?v=uibLwoVKjec

Keynote by Brendan Gregg for Sysdig Camp-Con-World-Fest-Summit 2016.

Description: "You have a system with an advanced programmatic tracer: do you know what to do with it? Brendan has used numerous tracers in production environments, and has published hundreds of tracing-based tools. In this talk he will share tips and know-how for creating CLI tracing tools and GUI visualizations, to solve real problems effectively. Programmatic tracing is an amazing superpower, and this talk will show you how to wield it!"

	next prev 1/40
	next prev 2/40
	next prev 3/40
	next prev 4/40
	next prev 5/40
	next prev 6/40
	next prev 7/40
	next prev 8/40
	next prev 9/40
	next prev 10/40
	next prev 11/40
	next prev 12/40
	next prev 13/40
	next prev 14/40
	next prev 15/40
	next prev 16/40
	next prev 17/40
	next prev 18/40
	next prev 19/40
	next prev 20/40
	next prev 21/40
	next prev 22/40
	next prev 23/40
	next prev 24/40
	next prev 25/40
	next prev 26/40
	next prev 27/40
	next prev 28/40
	next prev 29/40
	next prev 30/40
	next prev 31/40
	next prev 32/40
	next prev 33/40
	next prev 34/40
	next prev 35/40
	next prev 36/40
	next prev 37/40
	next prev 38/40
	next prev 39/40
	next prev 40/40

PDF: SysdigSummit2016_DesigningTracingTools.pdf

Keywords (from pdftotext):

slide 1:

Designing
Tracing
Tools
Brendan Gregg, Senior Performance Architect

slide 2:

Wielding Superpowers

slide 3:

I'm currently developing more tracing tools (bcc/BPF)

slide 4:

Tool Design
• For tool developers
• For everyone else: what you can ask for
– Tool templates
– GUI visualizations
• The following is applicable to all tracers
– sysdig, bcc/BPF, DTrace, SystemTap, LTTng, …

slide 5:

Methodologies

slide 6:

Methodology-driven Design
• Methodologies provide ideas for purposeful tools
• Find/draw a functional diagram, apply methods
Operating Systems
See: http://www.brendangregg.com/methodology.html

slide 7:

Methodology Examples
Eg, at the syscall layer (well known & documented):
• Workload Characterization
– exec() or open() per-event trace
(execsnoop, opensnoop)
– connect()/accept() per-event trace (tcpconnect, tcpaccept)
– read()/write() size histogram
(one-liners)
• Latency Analysis
– read()/write() latency histogram
(funclatency, …)
• USE Method
– network utilization by thread
(not done yet)
– syscall errors
(fserrors, soerrors)

slide 8:

CLI Tracing Tools

slide 9:

CLI Templates
1. Per event output
*snoop, *slower 0, …
2. Filtered event output
*slower
3. Interval summary
*stat, *top
4. Count summary
*count
5. Histogram summary
*dist, *latency
6. Heatmap summary
spectrogram.lua, subsecoffset.lua, …

slide 10:

Template 1: Per Event Output
Examples: *snoop, *slower 0, …
# opensnoop
PID
COMM
10085 sshd
10085 sshd
10085 sshd
10085 sshd
10085 sshd
10085 sshd
10085 sshd
10085 sshd
10085 sshd
10085 sshd
10085 sshd
10085 sshd
10085 sshd
10085 sshd
10085 sshd
10085 sshd
[…]
FD ERR PATH
0 /lib/x86_64-linux-gnu/libkeyutils.so.1
0 /lib/x86_64-linux-gnu/libresolv.so.2
0 /lib/x86_64-linux-gnu/libgpg-error.so.0
0 /dev/urandom
2 /lib/x86_64-linux-gnu/.libcrypto.so.1.0.0.hmac
2 /proc/sys/crypto/fips_enabled
0 /proc/filesystems
0 /dev/null
0 /proc/10085/fd
0 /usr/lib/ssl/openssl.cnf
0 /etc/gai.conf
0 /etc/nsswitch.conf
0 /etc/ld.so.cache
0 /lib/x86_64-linux-gnu/libnss_compat.so.2
0 /etc/ld.so.cache
0 /lib/x86_64-linux-gnu/libnss_nis.so.2

slide 11:

Template 2: Filtered Event Output
Examples: *slower
# sysdig -c fileslower 1
TIME
PROCESS
2014-04-13 20:40:43.973 cksum
2014-04-13 20:40:44.187 cksum
2014-04-13 20:40:44.689 cksum
2014-04-13 20:40:45.005 cksum
2014-04-13 20:40:45.193 cksum
[…]
TYPE
read
read
read
read
read
LAT(ms) FILE
2 /mnt/partial.0.0
1 /mnt/partial.0.0
2 /mnt/partial.0.0
2 /mnt/partial.0.0
1 /mnt/partial.0.0
Tools like this can also do all event output:
# sysdig -c fileslower 0
TIME
PROCESS
2014-04-13 20:59:04.414 ls
2014-04-13 20:59:04.414 ls
2014-04-13 20:59:04.414 ls
2014-04-13 20:59:04.414 ls
2014-04-13 20:59:04.414 ls
2014-04-13 20:59:04.415 ls
2014-04-13 20:59:04.415 ls
[...]
TYPE
read
read
read
read
read
read
read
LAT(ms) FILE
0 /lib/x86_64-linux-gnu/librt.so.1
0 /lib/x86_64-linux-gnu/libacl.so.1
0 /lib/x86_64-linux-gnu/libc.so.6
0 /lib/x86_64-linux-gnu/libdl.so.2
0 /lib/x86_64-linux-gnu/libattr.so.1
0 /proc/filesystems
0 /proc/filesystems

slide 12:

Template 3: Interval Summary
Examples: *stat, *top
# dcstat
TIME
08:11:47:
08:11:48:
08:11:49:
08:11:50:
08:11:51:
08:11:52:
08:11:53:
08:11:54:
08:11:55:
08:11:56:
08:11:57:
[…]
REFS/s
SLOW/s
MISS/s
HIT%

slide 13:

Template 4: Count Summary
Examples: *count
# funccount 'vfs_*'
Tracing... Ctrl-C to end.
ADDR
FUNC
ffffffff811efe81 vfs_create
ffffffff811f24a1 vfs_rename
ffffffff81215191 vfs_fsync_range
ffffffff81231df1 vfs_lock_file
ffffffff811e8dd1 vfs_fstatat
ffffffff811e8d71 vfs_fstat
ffffffff811e4381 vfs_write
ffffffff811e8c71 vfs_getattr_nosec
ffffffff811e8d41 vfs_getattr
ffffffff811e3221 vfs_open
ffffffff811e4251 vfs_read
Detaching...
COUNT

slide 14:

Template 5: Histogram Summary
Examples: *dist, *latency
# biolatency
Tracing block device I/O... Hit Ctrl-C to end.
usecs
: count
distribution
4 ->gt; 7
: 0
8 ->gt; 15
: 0
16 ->gt; 31
: 0
32 ->gt; 63
: 0
64 ->gt; 127
: 1
128 ->gt; 255
: 12
|********
256 ->gt; 511
: 15
|**********
512 ->gt; 1023
: 43
|*******************************
1024 ->gt; 2047
: 52
|**************************************|
2048 ->gt; 4095
: 47
|**********************************
4096 ->gt; 8191
: 52
|**************************************|
8192 ->gt; 16383
: 36
|**************************
16384 ->gt; 32767
: 15
|**********
32768 ->gt; 65535
: 2
65536 ->gt; 131071
: 2

slide 15:

Template 6: Heatmap Summary
Example: subsecoffset.lua (aka "spectrogram")

slide 16:

slide 17:

Valuable
Know what already exists, and what doesn't

slide 18:

Low Overhead (or documented)
sysdig
Kernel
sysdig
driver
1. enable
3. output
lua
program
2. async
read
syscalls
ring
buffer
• Understand tracing internals
– For example, sysdig's design has ~20x lower overhead than strace
(it still has overhead: test and measure to see if it's acceptable)
– Tracing overhead is usually relative to event rate
• Design for low overhead, and document expectations

slide 19:

Documentation
• Good tools have 3 docs:
.TH
Title heading
Code comments
.SH
Section heading
Man page
.IP
Indented paragraph
Examples file
.TP
Indented paragraph with label
Bold
• Man page
– troff, docbook, …
• Examples file:
common man macros (see groff_man(7))
Demonstrations of biosnoop, the Linux eBPF/bcc version.
biosnoop traces block device I/O (disk I/O), and prints a line of output
per I/O. Example:
# ./biosnoop
TIME(s)
[...]
COMM
supervise
PID
DISK
xvda1
SECTOR
BYTES
LAT(ms)

slide 20:

Concise, intuitive, self-explanatory
# ./iolatency
Tracing block I/O. Output every 1 seconds. Ctrl-C to end.
>gt;=(ms) .. gt; 1
1 ->gt; 2
2 ->gt; 4
4 ->gt; 8
8 ->gt; 16
[…]
: I/O
: 4381
: 9
: 5
: 0
: 1
|Distribution
|######################################|
• Useful startup message
– What I'm tracing, when there's output, when I'll end
• Vigorous tooling is concise
– No wasted text; leave less useful output for non-default options
• Unix philosophy: do one thing and do it well

slide 21:

POSIX-style Arguments
# ./biolatency -h
usage: biolatency [-h] [-T] [-Q] [-m] [-D] [interval] [count]
Summarize block device I/O latency as a histogram
positional arguments:
interval
output interval, in seconds
count
number of outputs
optional arguments:
-h, --help
-T, --timestamp
-Q, --queued
-m, --milliseconds
-D, --disks
examples:
./biolatency
./biolatency 1 10
./biolatency -mT 1
./biolatency -Q
./biolatency -D
show this help message and exit
include timestamp on output
include OS queued time in I/O time
millisecond histogram
print a histogram per disk device
# summarize block I/O latency as a histogram
# print 1 second summaries, 10 times
# 1s summaries, milliseconds, and timestamps
# include OS queued time in I/O time
# show each disk device separately

slide 22:

POSIX-style Arguments
Option
Alternate
Expectation
--all
all events
-c CMD
--cmd …
run this command
-d SECONDS
--duration …
duration of tool execution
--help
help
-i FILE
--input …
input file
-i SECONDS
--interval …
summary interval
-n name
--name …
this process name only
-o FILE
--output …
output file
-p PID
--pid …
this process ID only
--by-process
per-process ID breakdown
-P PORT
--port …
this TCP port only
-t or -T
--[no]timestamp
include or exclude timestamps
--verbose
verbose output
--extended, --errors
extended output, or only failures
[interval [count]]
summary interval, and # of outputs

slide 23:

Testing, Testing, Testing
• If you can't write the workload, you can't write the tool
– Be it 10 lines of C, some shell, or dd
– dd if=/dev/urandom of=/dev/null bs=1k count=23
• Known workload testing: create 23 events
• Testing can be time consuming
– I spend more time testing a tool than writing it
– Automatic tool testing is a difficult problem

slide 24:

Example: gethostlatency
# gethostlatency
TIME
PID
COMM
06:10:24 28011 wget
06:10:28 28127 wget
06:10:41 28404 wget
06:10:48 28544 curl
06:11:10 29054 curl
06:11:16 29195 curl
06:11:25 29404 curl
06:11:28 29475 curl
LATms HOST
90.00 www.iovisor.org
0.00 www.iovisor.org
9.00 www.netflix.com
35.00 www.netflix.com.au
31.00 www.plumgrid.com
3.00 www.facebook.com
72.00 foo
1.00 foo

slide 25:

Example: ext4slower
# ext4slower 1
Tracing ext4 operations slower than 1 ms
TIME
COMM
PID
T BYTES
OFF_KB
06:49:17 bash
R 128
06:49:17 cksum
R 39552
06:49:17 cksum
R 96
06:49:17 cksum
R 96
06:49:17 cksum
R 10320
06:49:17 cksum
R 65536
06:49:17 cksum
R 55400
06:49:17 cksum
R 36792
06:49:17 cksum
R 15008
06:49:17 cksum
R 6123
repository
06:49:17 cksum
R 6280
06:49:17 cksum
R 27696
06:49:17 cksum
R 58080
06:49:17 cksum
R 906
[…]
LAT(ms) FILENAME
7.75 cksum
1.34 [
5.36 2to3-2.7
14.94 2to3-3.4
6.82 411toppm
4.01 a2p
8.77 ab
16.34 aclocal-1.14
19.31 acpi_listen
17.23 add-apt18.40 addpart
2.16 addr2line
10.11 ag
6.30 ec2-meta-data

slide 26:

Example: biolatency
# biolatency -m 1 5
Tracing block device I/O... Hit Ctrl-C to end.
msecs
0 ->gt; 1
2 ->gt; 3
4 ->gt; 7
8 ->gt; 15
16 ->gt; 31
32 ->gt; 63
64 ->gt; 127
[…]
: count
: 36
: 1
: 3
: 17
: 33
: 7
: 6
distribution
|**************************************|
|***
|*****************
|**********************************
|*******
|******

slide 27:

GUI Tracing Tools

slide 28:

GUI Visualizations
1. Event logs
2. Tables
3. Line graphs
4. Histograms
5. Heatmaps (spectrographs)
6. Waterfall charts
7. Directed graphs
8. Flame graphs
9. Flame charts

slide 29:

Visualization 1: Event Logs
https://commons.wikimedia.org/wiki/File:Wireshark_screenshot.png

slide 30:

Visualization 2: Tables

slide 31:

Visualization 3: Line Graphs
http://www.paradyn.org/html/screen-shots.html

slide 32:

Visualization 4: Histograms
Or a density plot
Or as a frequency trail (can cascade)

slide 33:

Visualization 5: Heat Maps
eg, Oracle ZFS Storage Appliance Analytics (DTrace-based)

slide 34:

Visualization 5: Spectrograms

slide 35:

Visualization 6: Waterfall Charts

slide 36:

Visualization 7: Directed Graphs

slide 37:

Visualization 8: Flame Graphs
fstat from disk
directory read
from disk
file read
from disk
path read from disk
pipe write
Commonly used with CPU profilers. Also useful for tracers: off-CPU time, ...

slide 38:

Visualization 9: Flame Charts

slide 39:

Desirable Attributes
• Valuable
– Methodologies provide ideas for purposeful metrics
• Documented
– Tool tips, wikis
• Tested
• Real Time
• Dashboards
– To support methodologies

slide 40:

Thank You!
http://www.brendangregg.com
http://slideshare.net/brendangregg
bgregg@netflix.com
@brendangregg
References & Links:
– http://www.brendangregg.com/heatmaps.html
– http://www.brendangregg.com/flamegraphs.html
– http://www.slideshare.net/brendangregg/monitorama-2015-netflix-instance-analysis