Systems Performance 2nd Ed.



BPF Performance Tools book

Recent posts:
Blog index
About
RSS

sysdig CCWFS 2016: Designing Tracing Tools

Video: https://www.youtube.com/watch?v=uibLwoVKjec

Keynote by Brendan Gregg for Sysdig Camp-Con-World-Fest-Summit 2016.

Description: "You have a system with an advanced programmatic tracer: do you know what to do with it? Brendan has used numerous tracers in production environments, and has published hundreds of tracing-based tools. In this talk he will share tips and know-how for creating CLI tracing tools and GUI visualizations, to solve real problems effectively. Programmatic tracing is an amazing superpower, and this talk will show you how to wield it!"

next
prev
1/40
next
prev
2/40
next
prev
3/40
next
prev
4/40
next
prev
5/40
next
prev
6/40
next
prev
7/40
next
prev
8/40
next
prev
9/40
next
prev
10/40
next
prev
11/40
next
prev
12/40
next
prev
13/40
next
prev
14/40
next
prev
15/40
next
prev
16/40
next
prev
17/40
next
prev
18/40
next
prev
19/40
next
prev
20/40
next
prev
21/40
next
prev
22/40
next
prev
23/40
next
prev
24/40
next
prev
25/40
next
prev
26/40
next
prev
27/40
next
prev
28/40
next
prev
29/40
next
prev
30/40
next
prev
31/40
next
prev
32/40
next
prev
33/40
next
prev
34/40
next
prev
35/40
next
prev
36/40
next
prev
37/40
next
prev
38/40
next
prev
39/40
next
prev
40/40

PDF: SysdigSummit2016_DesigningTracingTools.pdf

Keywords (from pdftotext):

slide 1:
    Designing
    Tracing
    Tools
    Brendan Gregg, Senior Performance Architect
    
slide 2:
    Wielding Superpowers
    
slide 3:
    I'm currently developing more tracing tools (bcc/BPF)
    
slide 4:
    Tool Design
    • For tool developers
    • For everyone else: what you can ask for
    – Tool templates
    – GUI visualizations
    • The following is applicable to all tracers
    – sysdig, bcc/BPF, DTrace, SystemTap, LTTng, …
    
slide 5:
    Methodologies
    
slide 6:
    Methodology-driven Design
    • Methodologies provide ideas for purposeful tools
    • Find/draw a functional diagram, apply methods
    Operating Systems
    See: http://www.brendangregg.com/methodology.html
    
slide 7:
    Methodology Examples
    Eg, at the syscall layer (well known & documented):
    • Workload Characterization
    – exec() or open() per-event trace
    (execsnoop, opensnoop)
    – connect()/accept() per-event trace (tcpconnect, tcpaccept)
    – read()/write() size histogram
    (one-liners)
    • Latency Analysis
    – read()/write() latency histogram
    (funclatency, …)
    • USE Method
    – network utilization by thread
    (not done yet)
    – syscall errors
    (fserrors, soerrors)
    
slide 8:
    CLI Tracing Tools
    
slide 9:
    CLI Templates
    1. Per event output
    *snoop, *slower 0, …
    2. Filtered event output
    *slower
    3. Interval summary
    *stat, *top
    4. Count summary
    *count
    5. Histogram summary
    *dist, *latency
    6. Heatmap summary
    spectrogram.lua, subsecoffset.lua, …
    
slide 10:
    Template 1: Per Event Output
    Examples: *snoop, *slower 0, …
    # opensnoop
    PID
    COMM
    10085 sshd
    10085 sshd
    10085 sshd
    10085 sshd
    10085 sshd
    10085 sshd
    10085 sshd
    10085 sshd
    10085 sshd
    10085 sshd
    10085 sshd
    10085 sshd
    10085 sshd
    10085 sshd
    10085 sshd
    10085 sshd
    […]
    FD ERR PATH
    0 /lib/x86_64-linux-gnu/libkeyutils.so.1
    0 /lib/x86_64-linux-gnu/libresolv.so.2
    0 /lib/x86_64-linux-gnu/libgpg-error.so.0
    0 /dev/urandom
    2 /lib/x86_64-linux-gnu/.libcrypto.so.1.0.0.hmac
    2 /proc/sys/crypto/fips_enabled
    0 /proc/filesystems
    0 /dev/null
    0 /proc/10085/fd
    0 /usr/lib/ssl/openssl.cnf
    0 /etc/gai.conf
    0 /etc/nsswitch.conf
    0 /etc/ld.so.cache
    0 /lib/x86_64-linux-gnu/libnss_compat.so.2
    0 /etc/ld.so.cache
    0 /lib/x86_64-linux-gnu/libnss_nis.so.2
    
slide 11:
    Template 2: Filtered Event Output
    Examples: *slower
    # sysdig -c fileslower 1
    TIME
    PROCESS
    2014-04-13 20:40:43.973 cksum
    2014-04-13 20:40:44.187 cksum
    2014-04-13 20:40:44.689 cksum
    2014-04-13 20:40:45.005 cksum
    2014-04-13 20:40:45.193 cksum
    […]
    TYPE
    read
    read
    read
    read
    read
    LAT(ms) FILE
    2 /mnt/partial.0.0
    1 /mnt/partial.0.0
    2 /mnt/partial.0.0
    2 /mnt/partial.0.0
    1 /mnt/partial.0.0
    Tools like this can also do all event output:
    # sysdig -c fileslower 0
    TIME
    PROCESS
    2014-04-13 20:59:04.414 ls
    2014-04-13 20:59:04.414 ls
    2014-04-13 20:59:04.414 ls
    2014-04-13 20:59:04.414 ls
    2014-04-13 20:59:04.414 ls
    2014-04-13 20:59:04.415 ls
    2014-04-13 20:59:04.415 ls
    [...]
    TYPE
    read
    read
    read
    read
    read
    read
    read
    LAT(ms) FILE
    0 /lib/x86_64-linux-gnu/librt.so.1
    0 /lib/x86_64-linux-gnu/libacl.so.1
    0 /lib/x86_64-linux-gnu/libc.so.6
    0 /lib/x86_64-linux-gnu/libdl.so.2
    0 /lib/x86_64-linux-gnu/libattr.so.1
    0 /proc/filesystems
    0 /proc/filesystems
    
slide 12:
    Template 3: Interval Summary
    Examples: *stat, *top
    # dcstat
    TIME
    08:11:47:
    08:11:48:
    08:11:49:
    08:11:50:
    08:11:51:
    08:11:52:
    08:11:53:
    08:11:54:
    08:11:55:
    08:11:56:
    08:11:57:
    […]
    REFS/s
    SLOW/s
    MISS/s
    HIT%
    
slide 13:
    Template 4: Count Summary
    Examples: *count
    # funccount 'vfs_*'
    Tracing... Ctrl-C to end.
    ADDR
    FUNC
    ffffffff811efe81 vfs_create
    ffffffff811f24a1 vfs_rename
    ffffffff81215191 vfs_fsync_range
    ffffffff81231df1 vfs_lock_file
    ffffffff811e8dd1 vfs_fstatat
    ffffffff811e8d71 vfs_fstat
    ffffffff811e4381 vfs_write
    ffffffff811e8c71 vfs_getattr_nosec
    ffffffff811e8d41 vfs_getattr
    ffffffff811e3221 vfs_open
    ffffffff811e4251 vfs_read
    Detaching...
    COUNT
    
slide 14:
    Template 5: Histogram Summary
    Examples: *dist, *latency
    # biolatency
    Tracing block device I/O... Hit Ctrl-C to end.
    usecs
    : count
    distribution
    4 ->gt; 7
    : 0
    8 ->gt; 15
    : 0
    16 ->gt; 31
    : 0
    32 ->gt; 63
    : 0
    64 ->gt; 127
    : 1
    128 ->gt; 255
    : 12
    |********
    256 ->gt; 511
    : 15
    |**********
    512 ->gt; 1023
    : 43
    |*******************************
    1024 ->gt; 2047
    : 52
    |**************************************|
    2048 ->gt; 4095
    : 47
    |**********************************
    4096 ->gt; 8191
    : 52
    |**************************************|
    8192 ->gt; 16383
    : 36
    |**************************
    16384 ->gt; 32767
    : 15
    |**********
    32768 ->gt; 65535
    : 2
    65536 ->gt; 131071
    : 2
    
slide 15:
    Template 6: Heatmap Summary
    Example: subsecoffset.lua (aka "spectrogram")
    
slide 16:
slide 17:
    Valuable
    Know what already exists, and what doesn't
    
slide 18:
    Low Overhead (or documented)
    sysdig
    Kernel
    sysdig
    driver
    1. enable
    3. output
    lua
    program
    2. async
    read
    syscalls
    ring
    buffer
    • Understand tracing internals
    – For example, sysdig's design has ~20x lower overhead than strace
    (it still has overhead: test and measure to see if it's acceptable)
    – Tracing overhead is usually relative to event rate
    • Design for low overhead, and document expectations
    
slide 19:
    Documentation
    • Good tools have 3 docs:
    .TH
    Title heading
    Code comments
    .SH
    Section heading
    Man page
    .IP
    Indented paragraph
    Examples file
    .TP
    Indented paragraph with label
    Bold
    • Man page
    – troff, docbook, …
    • Examples file:
    common man macros (see groff_man(7))
    Demonstrations of biosnoop, the Linux eBPF/bcc version.
    biosnoop traces block device I/O (disk I/O), and prints a line of output
    per I/O. Example:
    # ./biosnoop
    TIME(s)
    [...]
    COMM
    supervise
    PID
    DISK
    xvda1
    SECTOR
    BYTES
    LAT(ms)
    
slide 20:
    Concise, intuitive, self-explanatory
    # ./iolatency
    Tracing block I/O. Output every 1 seconds. Ctrl-C to end.
    >gt;=(ms) .. gt; 1
    1 ->gt; 2
    2 ->gt; 4
    4 ->gt; 8
    8 ->gt; 16
    […]
    : I/O
    : 4381
    : 9
    : 5
    : 0
    : 1
    |Distribution
    |######################################|
    • Useful startup message
    – What I'm tracing, when there's output, when I'll end
    • Vigorous tooling is concise
    – No wasted text; leave less useful output for non-default options
    • Unix philosophy: do one thing and do it well
    
slide 21:
    POSIX-style Arguments
    # ./biolatency -h
    usage: biolatency [-h] [-T] [-Q] [-m] [-D] [interval] [count]
    Summarize block device I/O latency as a histogram
    positional arguments:
    interval
    output interval, in seconds
    count
    number of outputs
    optional arguments:
    -h, --help
    -T, --timestamp
    -Q, --queued
    -m, --milliseconds
    -D, --disks
    examples:
    ./biolatency
    ./biolatency 1 10
    ./biolatency -mT 1
    ./biolatency -Q
    ./biolatency -D
    show this help message and exit
    include timestamp on output
    include OS queued time in I/O time
    millisecond histogram
    print a histogram per disk device
    # summarize block I/O latency as a histogram
    # print 1 second summaries, 10 times
    # 1s summaries, milliseconds, and timestamps
    # include OS queued time in I/O time
    # show each disk device separately
    
slide 22:
    POSIX-style Arguments
    Option
    Alternate
    Expectation
    --all
    all events
    -c CMD
    --cmd …
    run this command
    -d SECONDS
    --duration …
    duration of tool execution
    --help
    help
    -i FILE
    --input …
    input file
    -i SECONDS
    --interval …
    summary interval
    -n name
    --name …
    this process name only
    -o FILE
    --output …
    output file
    -p PID
    --pid …
    this process ID only
    --by-process
    per-process ID breakdown
    -P PORT
    --port …
    this TCP port only
    -t or -T
    --[no]timestamp
    include or exclude timestamps
    --verbose
    verbose output
    --extended, --errors
    extended output, or only failures
    [interval [count]]
    summary interval, and # of outputs
    
slide 23:
    Testing, Testing, Testing
    • If you can't write the workload, you can't write the tool
    – Be it 10 lines of C, some shell, or dd
    – dd if=/dev/urandom of=/dev/null bs=1k count=23
    • Known workload testing: create 23 events
    • Testing can be time consuming
    – I spend more time testing a tool than writing it
    – Automatic tool testing is a difficult problem
    
slide 24:
    Example: gethostlatency
    # gethostlatency
    TIME
    PID
    COMM
    06:10:24 28011 wget
    06:10:28 28127 wget
    06:10:41 28404 wget
    06:10:48 28544 curl
    06:11:10 29054 curl
    06:11:16 29195 curl
    06:11:25 29404 curl
    06:11:28 29475 curl
    LATms HOST
    90.00 www.iovisor.org
    0.00 www.iovisor.org
    9.00 www.netflix.com
    35.00 www.netflix.com.au
    31.00 www.plumgrid.com
    3.00 www.facebook.com
    72.00 foo
    1.00 foo
    
slide 25:
    Example: ext4slower
    # ext4slower 1
    Tracing ext4 operations slower than 1 ms
    TIME
    COMM
    PID
    T BYTES
    OFF_KB
    06:49:17 bash
    R 128
    06:49:17 cksum
    R 39552
    06:49:17 cksum
    R 96
    06:49:17 cksum
    R 96
    06:49:17 cksum
    R 10320
    06:49:17 cksum
    R 65536
    06:49:17 cksum
    R 55400
    06:49:17 cksum
    R 36792
    06:49:17 cksum
    R 15008
    06:49:17 cksum
    R 6123
    repository
    06:49:17 cksum
    R 6280
    06:49:17 cksum
    R 27696
    06:49:17 cksum
    R 58080
    06:49:17 cksum
    R 906
    […]
    LAT(ms) FILENAME
    7.75 cksum
    1.34 [
    5.36 2to3-2.7
    14.94 2to3-3.4
    6.82 411toppm
    4.01 a2p
    8.77 ab
    16.34 aclocal-1.14
    19.31 acpi_listen
    17.23 add-apt18.40 addpart
    2.16 addr2line
    10.11 ag
    6.30 ec2-meta-data
    
slide 26:
    Example: biolatency
    # biolatency -m 1 5
    Tracing block device I/O... Hit Ctrl-C to end.
    msecs
    0 ->gt; 1
    2 ->gt; 3
    4 ->gt; 7
    8 ->gt; 15
    16 ->gt; 31
    32 ->gt; 63
    64 ->gt; 127
    […]
    : count
    : 36
    : 1
    : 3
    : 17
    : 33
    : 7
    : 6
    distribution
    |**************************************|
    |***
    |*****************
    |**********************************
    |*******
    |******
    
slide 27:
    GUI Tracing Tools
    
slide 28:
    GUI Visualizations
    1. Event logs
    2. Tables
    3. Line graphs
    4. Histograms
    5. Heatmaps (spectrographs)
    6. Waterfall charts
    7. Directed graphs
    8. Flame graphs
    9. Flame charts
    
slide 29:
    Visualization 1: Event Logs
    https://commons.wikimedia.org/wiki/File:Wireshark_screenshot.png
    
slide 30:
    Visualization 2: Tables
    
slide 31:
    Visualization 3: Line Graphs
    http://www.paradyn.org/html/screen-shots.html
    
slide 32:
    Visualization 4: Histograms
    Or a density plot
    Or as a frequency trail (can cascade)
    
slide 33:
    Visualization 5: Heat Maps
    eg, Oracle ZFS Storage Appliance Analytics (DTrace-based)
    
slide 34:
    Visualization 5: Spectrograms
    
slide 35:
    Visualization 6: Waterfall Charts
    
slide 36:
    Visualization 7: Directed Graphs
    
slide 37:
    Visualization 8: Flame Graphs
    fstat from disk
    directory read
    from disk
    file read
    from disk
    path read from disk
    pipe write
    Commonly used with CPU profilers. Also useful for tracers: off-CPU time, ...
    
slide 38:
    Visualization 9: Flame Charts
    
slide 39:
    Desirable Attributes
    • Valuable
    – Methodologies provide ideas for purposeful metrics
    • Documented
    – Tool tips, wikis
    • Tested
    • Real Time
    • Dashboards
    – To support methodologies
    
slide 40:
    Thank You!
    http://www.brendangregg.com
    http://slideshare.net/brendangregg
    bgregg@netflix.com
    @brendangregg
    References & Links:
    – http://www.brendangregg.com/heatmaps.html
    – http://www.brendangregg.com/flamegraphs.html
    – http://www.slideshare.net/brendangregg/monitorama-2015-netflix-instance-analysis