CS2006_dtrace_workshop

Context-Switch: DTrace Workshop 2006

Workshop slides for a 4-day Context-Switch class in London by Brendan Gregg.

	next prev 1/233
	next prev 2/233
	next prev 3/233
	next prev 4/233
	next prev 5/233
	next prev 6/233
	next prev 7/233
	next prev 8/233
	next prev 9/233
	next prev 10/233
	next prev 11/233
	next prev 12/233
	next prev 13/233
	next prev 14/233
	next prev 15/233
	next prev 16/233
	next prev 17/233
	next prev 18/233
	next prev 19/233
	next prev 20/233
	next prev 21/233
	next prev 22/233
	next prev 23/233
	next prev 24/233
	next prev 25/233
	next prev 26/233
	next prev 27/233
	next prev 28/233
	next prev 29/233
	next prev 30/233
	next prev 31/233
	next prev 32/233
	next prev 33/233
	next prev 34/233
	next prev 35/233
	next prev 36/233
	next prev 37/233
	next prev 38/233
	next prev 39/233
	next prev 40/233
	next prev 41/233
	next prev 42/233
	next prev 43/233
	next prev 44/233
	next prev 45/233
	next prev 46/233
	next prev 47/233
	next prev 48/233
	next prev 49/233
	next prev 50/233
	next prev 51/233
	next prev 52/233
	next prev 53/233
	next prev 54/233
	next prev 55/233
	next prev 56/233
	next prev 57/233
	next prev 58/233
	next prev 59/233
	next prev 60/233
	next prev 61/233
	next prev 62/233
	next prev 63/233
	next prev 64/233
	next prev 65/233
	next prev 66/233
	next prev 67/233
	next prev 68/233
	next prev 69/233
	next prev 70/233
	next prev 71/233
	next prev 72/233
	next prev 73/233
	next prev 74/233
	next prev 75/233
	next prev 76/233
	next prev 77/233
	next prev 78/233
	next prev 79/233
	next prev 80/233
	next prev 81/233
	next prev 82/233
	next prev 83/233
	next prev 84/233
	next prev 85/233
	next prev 86/233
	next prev 87/233
	next prev 88/233
	next prev 89/233
	next prev 90/233
	next prev 91/233
	next prev 92/233
	next prev 93/233
	next prev 94/233
	next prev 95/233
	next prev 96/233
	next prev 97/233
	next prev 98/233
	next prev 99/233
	next prev 100/233
	next prev 101/233
	next prev 102/233
	next prev 103/233
	next prev 104/233
	next prev 105/233
	next prev 106/233
	next prev 107/233
	next prev 108/233
	next prev 109/233
	next prev 110/233
	next prev 111/233
	next prev 112/233
	next prev 113/233
	next prev 114/233
	next prev 115/233
	next prev 116/233
	next prev 117/233
	next prev 118/233
	next prev 119/233
	next prev 120/233
	next prev 121/233
	next prev 122/233
	next prev 123/233
	next prev 124/233
	next prev 125/233
	next prev 126/233
	next prev 127/233
	next prev 128/233
	next prev 129/233
	next prev 130/233
	next prev 131/233
	next prev 132/233
	next prev 133/233
	next prev 134/233
	next prev 135/233
	next prev 136/233
	next prev 137/233
	next prev 138/233
	next prev 139/233
	next prev 140/233
	next prev 141/233
	next prev 142/233
	next prev 143/233
	next prev 144/233
	next prev 145/233
	next prev 146/233
	next prev 147/233
	next prev 148/233
	next prev 149/233
	next prev 150/233
	next prev 151/233
	next prev 152/233
	next prev 153/233
	next prev 154/233
	next prev 155/233
	next prev 156/233
	next prev 157/233
	next prev 158/233
	next prev 159/233
	next prev 160/233
	next prev 161/233
	next prev 162/233
	next prev 163/233
	next prev 164/233
	next prev 165/233
	next prev 166/233
	next prev 167/233
	next prev 168/233
	next prev 169/233
	next prev 170/233
	next prev 171/233
	next prev 172/233
	next prev 173/233
	next prev 174/233
	next prev 175/233
	next prev 176/233
	next prev 177/233
	next prev 178/233
	next prev 179/233
	next prev 180/233
	next prev 181/233
	next prev 182/233
	next prev 183/233
	next prev 184/233
	next prev 185/233
	next prev 186/233
	next prev 187/233
	next prev 188/233
	next prev 189/233
	next prev 190/233
	next prev 191/233
	next prev 192/233
	next prev 193/233
	next prev 194/233
	next prev 195/233
	next prev 196/233
	next prev 197/233
	next prev 198/233
	next prev 199/233
	next prev 200/233
	next prev 201/233
	next prev 202/233
	next prev 203/233
	next prev 204/233
	next prev 205/233
	next prev 206/233
	next prev 207/233
	next prev 208/233
	next prev 209/233
	next prev 210/233
	next prev 211/233
	next prev 212/233
	next prev 213/233
	next prev 214/233
	next prev 215/233
	next prev 216/233
	next prev 217/233
	next prev 218/233
	next prev 219/233
	next prev 220/233
	next prev 221/233
	next prev 222/233
	next prev 223/233
	next prev 224/233
	next prev 225/233
	next prev 226/233
	next prev 227/233
	next prev 228/233
	next prev 229/233
	next prev 230/233
	next prev 231/233
	next prev 232/233
	next prev 233/233

PDF: CS2006_dtrace_workshop_slides.pdf

Keywords (from pdftotext):

slide 1:

DTrace Workshop
Context-Switch presents,
DTrace on Solaris 10
London, June, 2006
Brendan Gregg

slide 2:

This document has been provided on an “as is” basis, WITHOUT WARRANTY OF ANY KIND, either
expressed or implied, without even the implied warranty of merchantability or fitness for a particular
purpose. This document could include technical inaccuracies, typographical errors and even spelling errors
(or at the very least, Australian spelling).
Various content, including diagrams and material from the DTraceToolkit, are copyright © 2006 by
Brendan Gregg.
This document was NOT written by Sun Microsystems, and opinions expressed are not those of Sun
Microsystems unless by coincidence.

slide 3:

This Workshop
DAY 1
● Solaris 9 Performance Monitoring
● Introducing DTrace
DAY 2
● Programming in D
● The DTrace Mentality
DAY 3
● Fixing DTrace Faults

slide 4:

Chapters
Chapter 1 – Solaris 9 Performance Tools
Chapter 2 – Introducing DTrace
Chapter 3 – Command Line DTrace
Chapter 4 – DTrace one-liners
Chapter 5 – DTrace Mentality 1
Chapter 6 – Providers
Chapter 7 – The D Language
Chapter 8 – Advanced Scripting
Chapter 9 – The DTraceToolkit
Chapter 10 – DTrace Mentality 2

slide 5:

References
Some topics presented are based on,
Solaris Performance and Tools: DTrace
and MDB Techniques for Solaris 10 and
OpenSolaris (McDougall/Mauro/Gregg)

slide 6:

Slides
These slides cover key topics
I will cover many side topics, elaborate
further, answer questions, run demos, ...
as such, when I deviate from the slides –
don't panic! It's part of the course.

slide 7:

Chapter 1
Solaris 9 Performance Monitoring
What can be observed
● What can't be observed
● Key DTrace wins
● Start here

slide 8:

Example Fault
Where has my CPU gone?
using Solaris 9 tools
... interactive demo ...

slide 9:

The Solaris Toolkit
Consists of many, many tools
ps, prstat, ptree, pstack, pmap
• vmstat, mpstat, iostat, netstat
• sar
• kstat, ndd, mdb
• truss, sotruss, apptrace, prex
• cpustat, trapstat, lockstat
• ...
These is no single tool to rule them all
(not even DTrace)

slide 10:

I have tried
sysperfstat
$ sysperfstat 1
------ Utilisation -----Time
%CPU
%Mem %Disk
%Net
23:27:41
0.85 44.11
23:27:42
3.00 80.98
23:27:43
2.00 80.98
23:27:44
17.00 80.98
23:27:45
46.00 80.49 22.05 54.20
23:27:46
50.00 79.83 14.19 78.18
23:27:47
48.00 79.39
8.04 80.94
23:27:48
54.00 79.62
3.06 70.89
23:27:49
39.00 79.43
6.78 74.52
------ Saturation -----CPU
Mem
Disk
Net
Such a tool serves one role but not all.
In this case, it provides the “view from 20,000 feet”.

slide 11:

Coping With Many Tools
Categorise by Approach
● Categorise by Resource Type
Checklists of tools (these slides)
Study some, be aware what else exists
just remembering that something is doable is valuable!

slide 12:

Categorise by Approach
1. Monitoring
Monitoring multiple hosts
Gathering long term data
2. Identification
Examining system-wide health
3. Analysis
Focusing on details

slide 13:

Examples
1. Monitoring
SNMP, sar, SunMC
2. Identification
kstat (vmstat, mpstat, iostat)
procfs (ps, prstat)
mnttab (df)
3. Analysis
truss, apptrace, prex, lockstat, ...

slide 14:

Hints
Be careful when using sar:
sar has sampling issues
● the default configuration needs tuning
● try truss -ft ioctl sar -u 1 5 (yes, sar
reads everything. compare to mpstat)
Solaris 10 now uses Net-SNMP
USM – usernames, passwords, encryption
● VACM – restricting views
(these greatly improve security)

slide 15:

Categorise by Resource Type
CPU
vmstat, mpstat, sar, prstat, ps
Memory
vmstat, swap, ::memstat, prstat, ps, pmap
Disk
iostat, taztool, iosnoop, iotop
Network
netstat, kstat, ndd, nx.se, nicstat

slide 16:

Tools Checklist
The following list of tools serves to:
• provide a checklist
• show what is doable – you'll remember later
• show what can be done – the right tool for
the job
• show what can't be done – where DTrace
can help
• provide starting points for using DTrace

slide 17:

Tools Checklist
uptime
/usr/ucb/ps
prstat
prstat -m
vmstat
mpstat
iostat
prex/tnf*
psio
netstat -i
nx.se
nicstat
netstat -rn
sar
sar -u
sar -q
kstat
K9Toolkit
ndd
checkcable
cpustat
CacheKit
busstat
trapstat
lockstat
lockstat -I
truss
sotruss
apptrace
truss -ua.out
adb
mdb -k
mdb -p
lastcomm
BSM auditing
snmpwalk

slide 18:

uptime
$ uptime
2:38am
up 347 day(s),
9:08,
5 users,
load average: 0.01, 0.02, 0.03
load average: 1, 5 and 15 minute averages.
These were once the average length of the combined
dispatcher queues + currently running threads, sampled
during clock().
● It is now microstate accounting based, and is all the
threads usr + sys + CPU latency times.
● load averages tend to over-simplify CPU issues.
Can't customise the interval
• DTrace can trace scheduler activity
BTW – 347 days is nothing; I maintain the “Sun Book of Records” (search google),
where the current record is 2001 days!

slide 19:

$ ps -ef
UID
PID PPID
root
root
root
root
root 2316
nobody 22112 22111
root 2061
root
root
STIME TTY
May 28 ?
May 28 ?
May 28 ?
May 28 ?
May 28 ?
Feb 15 ?
May 28 ?
May 28 ?
May 28 ?
TIME CMD
0:08 sched
0:48 /etc/init 0:02 pageout
89:39 fsflush
0:00 /usr/lib/saf/sac -t 300
231:44 (squid)
0:12 /usr/sbin/rpcbind
0:00 /usr/lib/sysevent/syseventd
0:01 /usr/lib/picl/picld
Useful to check long-term processes
• Default fields not hugely useful, use -o
• DTrace can access similar procfs statistics, and
gather many additional statistics

slide 20:

/usr/ucb/ps
$ /usr/ucb/ps auxww
USER
PID %CPU %MEM
SZ RSS TT
root
3768 3.0 0.8 3216 2856 pts/2
root
3453 0.2 0.8 4544 2904 ?
root
498 0.1 0.8 4680 2944 ?
brendan 17545 0.1 0.6 2648 2080 pts/2
brendan
3769 0.1 0.3 984 808 pts/2
brendan
2408 0.1 1.4 6800 5112 ?
root
2037 0.1 0.3 2024 800 ?
root
3765 0.1 0.3 1016 816 ?
nobody
22112 0.1 22.79150484352 ?
START TIME COMMAND
O 16:24:40 0:00 /usr/ucb/ps -auxww
S 15:59:10 0:01 /usr/lib/ssh/sshd
May 03 0:29 /usr/lib/ssh/sshd
May 08 0:04 /bin/bash
S 16:24:40 0:00 head
May 28 92:28 SCREEN
May 28 168:21 /usr/sbin/in.routed
S 16:24:28 0:00 sleep 15
Feb 15 231:44 (squid)
Sorting by %CPU is nice
• Fields colliding is not nice
• pargs is better at viewing full arg listings

slide 21:

prstat
$ prstat
PID USERNAME SIZE
RSS STATE PRI NICE
TIME CPU PROCESS/NLWP
3453 root
4544K 2904K sleep
0:00:01 0.7% sshd/1
2408 brendan 6800K 5112K sleep
1:32:29 0.7% screen-3.9.8/1
5558 irc
8152K 3832K sleep
0:15:41 0.4% irssi/1
3944 brendan 4584K 4224K cpu0
0:00:00 0.4% prstat/1
25442 brendan 2312K 296K sleep
0:05:05 0.1% telnet/1
3939 root
1016K 816K sleep
0:00:00 0.0% sleep/1
17545 brendan 2648K 2080K sleep
0:00:04 0.0% bash/1
2037 root
2024K 800K sleep
2:48:21 0.0% in.routed/1
Total: 114 processes, 197 lwps, load averages: 0.04, 0.02, 0.03
Great summary view
• Doesn't wallop the CPU
• DTrace can also provide updating summaries, of
a simple nature

slide 22:

prstat -m
$ prstat -m
PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/NLWP
3999 brendan 0.4 1.3 0.0 0.0 0.0 0.0 98 0.0
0 259
0 prstat/1
2408 brendan 0.7 0.1 0.0 0.0 0.0 0.0 99 0.0
0 30
0 screen-3.9.8/1
19763 brendan 0.0 0.0 0.0 0.0 0.0 0.0 100 0.0
0 13
0 sdtperfmeter/1
2150 root
0.0 0.0 - 100 8
0 26
0 nscd/25
78 root
0.0 0.0 - 100 0
0 picld/4
73 root
0.0 0.0 - 100 0
0 syseventd/14
2061 root
0.0 0.0 - 100 0
0 rpcbind/1
22112 root
0.0 0.0 - 100 2
0 squid/1
Total: 114 processes, 197 lwps, load averages: 0.00, 0.02, 0.02
Microstate accounting – great breakdown of
process time (very useful!)
• Restricted to pre-determined accounting states
• DTrace can measure custom states

slide 23:

vmstat
$ vmstat 1 3
kthr
memory
page
disk
r b w
swap free re mf pi po fr de sr dd dd f0 s3
0 0 21 422864 68800 3 16 5 0 0 0 0 0 0 0 0
0 0 51 434616 55560 32 174 0 0 0 0 0 0 0 0 0
0 0 51 434616 55560 0
0 0 0 0 0 0 0 0 0 0
$ vmstat -p 1 3
memory
swap free re
422864 68800
434616 55560
434616 55560
page
mf fr de
executable
epi epo epf
faults
cpu
cs us sy id
126 106 135 1 0 98
403 442 123 1 3 96
401 250 116 0 0 100
anonymous
api apo apf
filesystem
fpi fpo fpf
Best system-wide view
• Can't view statistics by-process, by-zone, ...
• DTrace can!

slide 24:

mpstat
$ mpstat 5
CPU minf mjf xcal
intr ithr
csw icsw migr smtx
srw syscl
Good by-CPU summary
• DTrace can also measure by-CPU
usr sys
wt idl
0 98
0 96
0 98
0 97

slide 25:

iostat
$ iostat -xnmPz 1
r/s
w/s
extended device statistics
kr/s
kw/s wait actv wsvc_t asvc_t
0.1 0.0 0.0
0.0 0.0 0.0
0.0 0.0 0.0
0.0 0.0 0.0
0.0 0.0 0.0
0.0 0.0 0.0
0.4 0.0 0.0
3.1 0.0 0.0
0.5 0.0 0.0
0.0 0.0 0.0
%b device
0 c0t0d0s0
0 c0t0d0s1
0 c0t0d0s3
0 c0t0d0s4
0 c0t0d0s5
0 c0t0d0s7 (/data)
0 c0t2d0s0 (/)
0 c0t2d0s1 (/export/home)
0 c0t2d0s3 (/squidcache)
0 mars:vold(pid2295)
Excellent per-disk/partition/controller summary
• No by-process summary, or event details
• DTrace solves both

slide 26:

prex/tnfxtract/tnfdump
prex: enables static probes to record event
details to a buffer
tnfxtract: extracts buffer contents
tnfdump: converts buffer contents to text
DTrace has this exact functionality, and
more (dynamic)

slide 27:

prex/tnfxtract/tnfdump
# tnfdump out1.tnf
probe
tnf_name: "pagein" tnf_string: "keys vm pageio io;file ../../
common/os/
bio.c;line 1333;"
probe
tnf_name: "strategy" tnf_string: "keys io blockio;file ../../
common/os/
driver.c;line 411;"
probe
tnf_name: "biodone" tnf_string: "keys io blockio;file ../../
common/os/b
io.c;line 1222;"
---------------- ---------------- ----- ----- ---------- -------------------------- -----------------------Elapsed (ms)
Delta (ms)
PID LWPID
TID
CPU Probe Name
Data / Description . . .
---------------- ---------------- ----- ----- ---------- -------------------------- -----------------------0.000000
0.000000 5926
1 0xd423fe00
0 pagein
vnode: 0xd678de40 offset: 0 size: 8192
0.029433 5926
1 0xd423fe00
0 strategy
device: 26738688 block: 1616 size: 8192 buf: 0xd7c32aa8 flags: 34078801
0 0xd3a00de0
0 biodone
device: 26738688 block: 1616 buf: 0xd7c32aa8

slide 28:

psio
# ./psio 1
UID
PID PPID %I/O
STIME TTY
brendan 13271 10093 65.4 23:20:16 pts/20
root
0 0.0
Mar 16 ?
root
0 0.0
Mar 16 ?
root
0 0.0
Mar 16 ?
TIME CMD
0:01 grep brendan contents
0:16 sched
0:10 /etc/init 0:00 pageout
http://www.brendangregg.com/psio.html
• Solved %Disk I/O by-process
• Could only run for short intervals
• DTrace takes this much further – tracing
events, by-process info, ...

slide 29:

netstat -i
$ netstat -i 1 5
input
hme0
output
packets errs packets errs colls
206999694 32
223490272 0
input (Total)
output
packets errs packets errs colls
326931321 32
343421899 0
An approximation of activity
• No further details for analysis
• Packets are not bytes
• Both kstat and DTrace provide more info

slide 30:

nx.se
$ se nx.se 1
Current tcp RtoMin is 400, interval 1, start Thu May 11 18:22:37 2006
18:22:38 Iseg/s Oseg/s InKB/s OuKB/s Rst/s Atf/s Ret% Icn/s Ocn/s
tcp
Name
Ipkt/s Opkt/s InKB/s OuKB/s IErr/s OErr/s Coll% NoCP/s Defr/s
hme0
0.34 0.000 0.000
18:22:39 Iseg/s Oseg/s InKB/s OuKB/s Rst/s Atf/s Ret% Icn/s Ocn/s
tcp
Name
Ipkt/s Opkt/s InKB/s OuKB/s IErr/s OErr/s Coll% NoCP/s Defr/s
hme0
0.34 0.000 0.000
From the SE Toolkit
• Provides many more useful details

slide 31:

nicstat
$ nicstat 1
Time
Int
18:23:49 hme0
18:23:50 hme0
18:23:51 hme0
18:23:52 hme0
18:23:53 hme0
rKb/s
wKb/s
rPk/s
wPk/s
rAvs
wAvs
%Util
http://www.brendangregg.com/k9toolkit.html
• Uses kstat, written in both C and Perl
• Provides bytes, %Utilisation
Sat

slide 32:

netstat -rn
$ netstat -rn
Routing Table: IPv4
Destination
Gateway
Flags Ref
Use
Interface
-------------------- -------------------- ----- ----- ------ --------220.244.170.56
1 18146 hme0:1
1 12537 hme0:2
1 32151 hme0
0 hme0
default
1 326156
9116268837 lo0
Check for unexpected routes
• You discover problems after the fact
• DTrace can snoop route changes live

slide 33:

sar
$ sar -u 1 5
SunOS mars 5.9 Generic_118558-05 sun4u
18:33:46
18:33:47
18:33:48
18:33:49
18:33:50
18:33:51
%usr
%sys
%wio
05/11/2006
%idle
System Activity Reporter
• Prints a variety of kstats
• Can collect historic data, to identify long
term patterns
• Has several issues

slide 34:

sar -u
$ sar -u
SunOS mars 5.9 Generic_118558-05 sun4u
00:00:01
01:00:01
02:00:01
03:00:01
04:00:00
05:00:01
%usr
%sys
%wio
05/11/2006
%idle
This shows historic data from sar -u
• %wio is always zero in Solaris 10
• Note that the system is idle

slide 35:

sar -q
$ sar -q
SunOS mars 5.9 Generic_118558-05 sun4u
05/11/2006
00:00:01 runq-sz %runocc swpq-sz %swpocc
01:00:01
02:00:01
03:00:01
04:00:00
05:00:01
sar -q shows run queue sizes
• This is the same period as before, and
shows a >gt; 1.0 run queue size. huh?

slide 36:

kstat
$ kstat -pm cpu_info
cpu_info:0:cpu_info0:chip_id
cpu_info:0:cpu_info0:class
misc
cpu_info:0:cpu_info0:clock_MHz 333
cpu_info:0:cpu_info0:cpu_type
sparcv9
cpu_info:0:cpu_info0:crtime
cpu_info:0:cpu_info0:device_ID 0
cpu_info:0:cpu_info0:fpu_type
sparcv9
cpu_info:0:cpu_info0:implementation
UltraSPARC-IIi
Great resource
• libkstat, Sun::Solaris::Kstat, or /usr/bin/kstat
netstat -k is now deprecated
Many kstats are not documented
• kstat finds problems, DTrace analyses them

slide 37:

K9Toolkit
http://www.brendangregg.com/k9toolkit.html
• A Perl kstat collection
• Contains
sysperfstat
• checkcable
• nicstat
• prtdevs
• ...

slide 38:

ndd
# ndd /dev/hme link_speed
# ndd /dev/rtls link_speed
operation failed: Invalid argument
Accesses read/write settings for network
related drivers
• consistancy issues between interface types

slide 39:

checkcable
# checkcable
Interface
Link Duplex
hme0
FULL
hme1
HALF
Speed
AutoNEG
http://www.brendangregg.com
• Translates both ndd and kstats
• May not be needed in the long term
(interface kstats are getting better)

slide 40:

cpustat
# cpustat -c pic0=EC_ref,pic1=EC_hit 1 5
time cpu event
pic0
pic1
0 tick
0 tick
0 tick
0 tick
0 tick
1 total
Examine CPU PICs
• Observe I$, D$, E$ performance
• Use cputrack for per-process info
• Info available is CPU-type dependant

slide 41:

CacheKit
# ccachestat 5 5
--- I-Cache --total
miss
%hit
7424k
107k
7941k
107k
39082k
504k
19926k
248k
11028k
146k
--- D-Cache --total
miss
%hit
2476k
39k
3502k
53k
15243k
293k
4985k
103k
5340k
95k
--- E-Cache --total
miss
%hit
879k
200k
860k
211k
3420k
487k
1531k
277k
1856k
326k
http://www.brendangregg.com/cachekit.html
• Digs out useful info, calculates hit ratios
• Needs updates to measure newer CPUs

slide 42:

busstat
# busstat -w ac,pic0=clock_cycles,pic1=mem_bank0_rds 2 100
time dev
event0
pic0
event1
ac0
clock_cycles
mem_bank0_rds
ac1
clock_cycles
mem_bank0_rds
ac0
clock_cycles
mem_bank0_rds
Measures bus PICs
• No public PIC documentation (yet)
Anyone actually used this outside of Sun?
pic1

slide 43:

trapstat
# trapstat -t 10 1
cpu m| itlb-miss %tim itsb-miss %tim | dtlb-miss %tim dtsb-miss %tim |%tim
-----+-------------------------------+-------------------------------+---0 u|
2393 0.1
3950 0.7 |
11926 0.6
4205 0.8 | 2.2
0 k|
105 0.0
0 0.0 |
96358 5.2
214 0.1 | 5.3
=====+===============================+===============================+====
ttl |
2498 0.2
3950 0.7 |
108284 5.7
4419 0.9 | 7.5
# trapstat
trapstat: not implemented on i86pc
MMU statistics
• Check this to see if MPSS needs tuning

slide 44:

lockstat
# lockstat -k sleep 5
Adaptive mutex block: 6 events in 5.638 seconds (1 events/sec)
Count indv cuml rcnt
nsec Lock
Caller
------------------------------------------------------------------------------2 33% 33% 0.00
71995 0x30000e723c0
clock
1 17% 50% 0.00
85133 0x300055ffcb0
rctl_test_entity
1 17% 67% 0.00
77176 0x300040b2888
rctl_test_entity
1 17% 83% 0.00
70159 0x30000e72280
clock
1 17% 100% 0.00
73192 0x30000e722c0
clock
------------------------------------------------------------------------------•
Kernel lock statistics

slide 45:

lockstat -I
# lockstat -kIi 997 sleep 5
Profiling interrupt: 5199 events in 5.208 seconds (998 events/sec)
Count indv cuml rcnt
nsec CPU+PIL
Caller
-----------------------------------------------------------------------------1962 38% 38% 0.00
6751 cpu[0]
(usermode)
5% 43% 0.00
7605 cpu[0]
mutex_enter
4% 46% 0.00
7970 cpu[0]
dnlc_lookup
3% 50% 0.00
7271 cpu[0]
utl0
3% 52% 0.00
6866 cpu[0]
lookuppnvp
2% 55% 0.00
6476 cpu[0]
syscall_mstate
2% 57% 0.00
7142 cpu[0]
ufs_lookup
2% 59% 0.00
2041 cpu[0]
idle
-I samples the kernel
• DTrace can both sample and trace

slide 46:

truss
$ truss date
execve("/usr/bin/date", 0xFFBFF754, 0xFFBFF75C) argc = 1
resolvepath("/usr/bin/date", "/usr/bin/date", 1023) = 13
resolvepath("/usr/lib/ld.so.1", "/usr/lib/ld.so.1", 1023) = 16
stat("/usr/bin/date", 0xFFBFF528)
= 0
open("/var/ld/ld.config", O_RDONLY)
Err#2 ENOENT
stat("/usr/lib/libc.so.1", 0xFFBFF030)
= 0
resolvepath("/usr/lib/libc.so.1", "/usr/lib/libc.so.1", 1023) = 18
Trace syscalls and signals
truss is “violent”. can slow target by 70%
Can't trace all processes at the same time
truss behaves synchronously to syscalls
DTrace buffers system-wide syscall details and
reads the buffer asynchronously. slows target

slide 47:

sotruss
$ sotruss date
date
date
date
date
date
date
date
date
date
date
->gt;
->gt;
->gt;
->gt;
->gt;
->gt;
->gt;
->gt;
->gt;
->gt;
libc.so.1:*atexit(0xff3bfaac, 0x21800, 0x0)
libc.so.1:*atexit(0x11654, 0x21800, 0x0)
libc.so.1:*setlocale(0x6, 0x116dc, 0xff3509e0)
libc.so.1:*textdomain(0x116e0, 0x116dc, 0xff3509e0)
libc.so.1:*getopt(0x1, 0xffbff684, 0x116f0)
libc.so.1:*time(0x21d98, 0xffbff684, 0x116f0)
libc.so.1:*nl_langinfo(0x3a, 0xffbff684, 0x21d98)
libc.so.1:*localtime(0x21d98, 0x3a, 0x21d98)
libc_psr.so.1:*memcpy(0xffbff5f4, 0xff2427b0, 0x24)
libc.so.1:*strftime(0x21d9c, 0x400, 0xff22b094)
Shared library tracing
• Added in Solaris 2.6
• Not so clever with function arguments

slide 48:

apptrace
$ apptrace date
date
->gt; libc.so.1:atexit(func = 0xff3bfaac) = 0x0
date
->gt; libc.so.1:atexit(func = 0x11654) = 0x0
date
->gt; libc.so.1:setlocale(category = 0x6, locale = "") = "C"
date
->gt; libc.so.1:textdomain(domainname = "SUNW_OST_OSCMD") =
"SUNW_OST_OSCMD"
date
->gt; libc.so.1:getopt(argc = 0x1, argv = 0xffbff67c, optstring = "a:u")
= 0xffffffff errno = 0 (Error 0)
date
->gt; libc.so.1:strftime(s = "Fri May 12 17:16:12 ", maxsize = 0x400,
format = "%a %b %e %T %Z %Y", timeptr = 0xffbff5ec) = 0x1c
Shared library tracing
• Added in Solaris 8
• Evaluates arguments
• DTrace has this functionality

slide 49:

truss -ua.out
$ truss -ua.out banner hello
->gt; _init(0x20f84, 0x21000, 0x0, 0x0)
gt; main(0x2, 0xffbff744, 0xffbff750, 0x21000)
brk(0x000216A0)
brk(0x000236A0)
->gt; banner(0xffbff84b, 0x2144c, 0x0, 0x0)
->gt; banset(0x20, 0x2144c, 0x0, 0x0)
gt; convert(0x68, 0x2144c, 0x0, 0x2164a)

slide 50:

adb/mdb
Perform core dump analysis – processes
• Perform clash dump analysis – kernel
• mdb was added in Solaris 8

slide 51:

mdb -k
# mdb -k
Loading modules: [ unix krtld genunix specfs dtrace ufs ip sctp usba
random fctl nca cpc nfs sppp sd audiosup crypto ]
>gt; ::mappings
BASE
LIMIT
SIZE NAME
18f4000
8f4000 ktextseg
18f4000
1c80000
38c000 kvalloc
10000000 kvseg32
edd00000
f0000000
2300000 kdebugseg
2a100000000
2a13ead8000
3ead8000 kpseg
2a750000000
2a753adc000
3adc000 kmapseg
40000000000 kvseg
916000 kmem64
fffffa0000000000 fffffe0000000000
40000000000 kpmseg
>gt;
Visit kernel-land,

slide 52:

mdb -k
>gt; ::pgrep syslog | ::print proc_t
p_exec = 0x30002a822c0
p_as = 0x300011326c8
p_lockp = 0x30000e72780
p_crlock = {
_opaque = [ 0 ]
p_cred = 0x3000034ebe8
p_swapcnt = 0
p_stat = '\002'
p_wcode = '\0'
p_pidflag = 0
... walk kernel structures
• DTrace does this too

slide 53:

mdb -p
# mdb -p `pgrep -o syslogd`
Loading modules: [ ld.so.1 libc.so.1 ]
>gt; ::mappings
BASE
LIMIT
SIZE NAME
12000 /usr/sbin/syslogd
2000 /usr/sbin/syslogd
30000 /usr/sbin/syslogd
fe87a000 fe87c000
2000 [ anon ]
fe97a000 fe97c000
2000 [ anon ]
fe980000 fe982000
2000 [ anon ]
fe982000 fe99c000
1a000 [ anon ]
fe99c000 fe99e000
2000 [ anon ]
fe99e000 fe9b8000
1a000 [ anon ]
Visit user-land
• DTrace does this too (copyin/copyout)

slide 54:

process accounting
# lastcomm
man
more
root
root
root
root
root
root
term/a
term/a
term/a
term/a
term/a
term/a
0.02 secs Fri May 12 18:25
0.00 secs Fri May 12 18:25
0.01 secs Fri May 12 18:25
0.00 secs Fri May 12 18:25
0.01 secs Fri May 12 18:25
0.00 secs Fri May 12 18:25
Log process creation
• Extended accounting adds task and flow
• Limited details, no process args
• DTrace does this much better for short
intervals

slide 55:

BSM auditing
Logs plenty of details
• Should be customised
• Not really an as-needed troubleshooting tool
• DTrace serves this role

slide 56:

Net-SNMP
$ snmpwalk -v1 -c public localhost|more
SNMPv2-MIB::sysDescr.0 = STRING: SunOS jupiter 5.10 Generic i86pc
SNMPv2-MIB::sysObjectID.0 = OID: NET-SNMP-MIB::netSnmpAgentOIDs.3
DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (774102766) 89 days,
14:17:07.66
SNMPv2-MIB::sysContact.0 = STRING: "System administrator"
SNMPv2-MIB::sysName.0 = STRING: jupiter
SNMPv2-MIB::sysLocation.0 = STRING: "System administrators office"
SNMPv2-MIB::sysServices.0 = INTEGER: 72
SNMPv2-MIB::sysORLastChange.0 = Timeticks: (90) 0:00:00.90
SNMPv2-MIB::sysORID.1 = OID: IF-MIB::ifMIB
SNMPv2-MIB::sysORID.2 = OID: SNMPv2-MIB::snmpMIB
SNMPv2-MIB::sysORID.3 = OID: TCP-MIB::tcpMIB
SNMPv2-MIB::sysORID.4 = OID: IP-MIB::ip
SNMPv2-MIB::sysORID.5 = OID: UDP-MIB::udpMIB
Added to Solaris 10

slide 57:

... Unanswered Questions
disk I/O by process. easily.
• network I/O by process.
short-lived process analysis. easily.
interrupt driver on-CPU time
So what else couldn't you answer on older
Solaris?

slide 58:

End of Chapter 1
Chapter 1 – Solaris 9 Performance Tools
Chapter 2 – Introducing DTrace
Chapter 3 – Command Line DTrace
Chapter 4 – DTrace one-liners
Chapter 5 – DTrace Mentality 1
Chapter 6 – Providers
Chapter 7 – The D Language
Chapter 8 – Advanced Scripting
Chapter 9 – The DTraceToolkit
Chapter 10 – DTrace Mentality 2

slide 59:

Chapter 2
Introducing DTrace
What DTrace is
● What role DTrace plays
● A one-liner Demonstration
● DTrace Scripting
● DTrace resources

slide 60:

DTrace
Dynamic Tracing
Solaris 10 & OpenSolaris
Difficult to fully explain! We will use,
a) features
b) what it is like
c) what is “DTrace”
d) when to use it
e) demonstrations

slide 61:

DTrace Features
A framework for performance observability
and debugging in real-time
• Examines from user space to the kernel,
user-land functions and instructions
• library calls and instructions
• system calls
• kernel functions
• device driver functions
... from the same tool
• Is safe to use in production

slide 62:

DTrace Features
Can examine applications without restarting
them
• Has low impact when running, and zero
impact when not
• Dynamically inserts trace points into the
kernel and running applications, called
probes
• Can trace over 30,000 kernel probes

slide 63:

DTrace Features
Can trace as many application probes as it
finds functions and instructions (possibly
millions)
• Can trace system boot activity – before init!
• Provides a C-like language for writing
custom scripts
• Has solved countless long-term
performance issue mysteries – often easily
• One of the most significant additions to the
field of Operating Systems

slide 64:

What DTrace is like
DTrace has similar features to the following,
truss
• apptrace
• truss -ua.out
• prex/tnf*
• lockstat
• mdb -k
• mdb -p
• C, awk
• ...
tracing system calls
tracing library calls
tracing user functions
tracing kernel functions
profiling the kernel
access to kernel VM
access to user VM
programming languages
+ more

slide 65:

What is “DTrace”
D programs
DTrace one-liners
/usr/sbin
intrstat(1M)
dtrace(1M)
humans
libdtrace(3LIB)
*lockstat(1M)
user-land
kernel-land
(event horizon)
dtrace(7D)
Providers
syscall
Probes
dtrace
The
DTrace
Framework

slide 66:

When to use DTrace
Sysadmins can use DTrace for performance
analysis and troubleshooting,
1. Monitoring
SNMP, sar, SunMC, ...
2. Identification
kstat (vmstat, mpstat, iostat)
procfs (ps, prstat), ...
3. Analysis
-->gt; DTrace

slide 67:

When to use DTrace
Application developers can use DTrace for
code profiling and fault finding,
1. Development
IDEs, vim/emacs
2. Testing
Compiler profiling, coded statistics
-->gt; DTrace gt; DTrace

slide 68:

Demonstration
This demonstration will introduce key
DTrace components and terminology
The aim is to explain the following,
# dtrace -n 'syscall::exec*:return { trace(execname); }'
dtrace: description 'syscall::exec*:return ' matched 2 probes
CPU
FUNCTION:NAME
exece:return
staroffice
exece:return
grep
exece:return
grep
exece:return
soffice
exece:return
dirname
exece:return
expr
exece:return
basename
exece:return
expr
exece:return
sopatchlevel.sh
exece:return
dirname
exece:return
expr

slide 69:

Listing Probes
# dtrace -l
PROVIDER
dtrace
dtrace
dtrace
fbt
fbt
fbt
MODULE
pool
pool
pool
FUNCTION NAME
BEGIN
END
ERROR
pool_info entry
pool_info return
pool_detach entry
# dtrace -l | wc -l
dtrace -l lists probes
• For this demonstration, there were 44796 probes.
This number varies depending on the OS build,
and which providers have been recently used.

slide 70:

Finding Probes
Since there are so many probes, being able to
find useful probes is a crucial task
• grep can be used to filter the dtrace -l
output, since grep is well known and RE's are
powerful
# dtrace -l | grep syscall | grep exec
syscall
syscall
syscall
syscall
Here we matched two forms of exec()
exec entry
exec return
exece entry
exece return

slide 71:

Specifying Probes #1
# dtrace -ln 'syscall::exec:entry'
PROVIDER
MODULE
syscall
FUNCTION NAME
exec entry
# dtrace -ln 'syscall::exec*:'
PROVIDER
MODULE
syscall
syscall
syscall
syscall
FUNCTION NAME
exec entry
exec return
exece entry
exece return
First, a single probe was matched by specifying
its fully qualified probe name
• Second, multiple probes were matched. “*” is a
wildcard, as are blank fields “::”

slide 72:

Specifying Probes #2
Short cuts exist – dropping left-hand fields is
the same as wildcards.
• For example, the following are the same
dtrace:::BEGIN
• :::BEGIN
• BEGIN
dtrace:::BEGIN fires when DTrace begins
tracing, much the same as awk's BEGIN (and
used for the same reasons too – printing headers).

slide 73:

Specifying Probes #3
The following are not the same,
syscall::read:entry
• read:entry
# dtrace -ln 'read:entry'
PROVIDER
MODULE
syscall
fbt
genunix
pid581
libc.so.1
FUNCTION NAME
read entry
read entry
read entry
If we only intended to match the syscall read(),
this shortcut matches others by mistake.
Brendan's Style Hint:
Use fully-qualified probe names, they are safest

slide 74:

Probe Names
# dtrace -ln 'syscall::exec:entry'
Provider
A library of probes.
The provider defines
the remaining fields.
PROVIDER
syscall
Module
Function
Name
To best understand these last fields,
1. choose provider
2. see chapter in the DTrace Guide
MODULE
FUNCTION NAME
exec entry

slide 75:

Tracing
After finding the desired probes (-l), you
are ready to trace activity,
# dtrace -n 'syscall::exec*:'
dtrace: description 'syscall::exec*:' matched 4 probes
CPU
FUNCTION:NAME
exece:entry
exece:return
exece:entry
exece:return
exece:entry
This is the default output
• Each output line represents a probe firing

slide 76:

Default Output
# dtrace -n 'syscall::exec*:'
dtrace: description 'syscall::exec*:' matched 4 probes
CPU
FUNCTION:NAME
exece:entry
exece:return
DTrace
Probe-id.
not so
interesting
The CPU-id.
if this changes,
the output may
be shuffled!
Last 2 fields of
probe that fired
How many probes
your probe description
actually matched

slide 77:

Actions
A custom action can be performed when a
probe fires
# dtrace -n 'syscall::exec*: { trace(execname); }'
dtrace: description 'syscall::exec*: ' matched 4 probes
CPU
FUNCTION:NAME
exece:entry
bash
exece:return
date
exece:entry
bash
exece:return
trace() prints one argument
• execname is the process name

slide 78:

Process Creation
PID 81
bash
fork()
PID 83
bash
exec()
PID 83
fork variants: fork1(2), forkall(2), vfork(2)
• exec variants: exec(2), exece(2)

slide 79:

Syscall Entry and Return
PID 81
bash
fork()
PID 83
bash
PID 83
exec()
entry
return
# dtrace -n 'syscall::exec*: { trace(execname); }'
dtrace: description 'syscall::exec*: ' matched 4 probes
CPU
FUNCTION:NAME
exece:entry
bash
exece:return
Both syscall entry and return can be traced

slide 80:

Tracing Process Creation
# dtrace -n 'syscall::exec*:return { trace(execname); }'
dtrace: description 'syscall::exec*:return ' matched 2 probes
CPU
FUNCTION:NAME
exece:return
staroffice
exece:return
grep
exece:return
grep
exece:return
soffice
exece:return
dirname
exece:return
expr
exece:return
basename
exece:return
expr
exece:return
sopatchlevel.sh
exece:return
dirname
For exec() the return probe is traced, as this is when
the destination execname is available on-CPU
This shows new processes from running “soffice”

slide 81:

Terminology
A summary of DTrace terminology,
Consumer
Probe
Action
# dtrace -n 'syscall::exec*:return { trace(execname); }'
Provider Module Function Name
It's important to be able to talk-the-talk...

slide 82:

Provider Intro
There are many providers,
syscall
• vminfo
• sysinfo
• io
• sched
• profile
• dtrace
• pid
• fbt
• ...
System call entry and return probes
Virtual Memory statistic probes
Classic sysinfo statistic probes
Disk and NFS events
system scheduling events
fixed sampling
program BEGIN/END probes
user-level tracing
raw kernel tracing

slide 83:

Provider Questions
1.Where are syscalls documented?
2.The pid provider traces user-level activity:
Where are library functions documented?
Where are user functions documented?
3.The fbt provider traces kernel-level activity:
Where are kernel functions documented?
4.DTrace provides its own custom providers.
Where would they be documented?

slide 84:

Provider Documentation
Documentation
man(2)
DTrace Guide
Providers
syscall
vminfo
sched
profile
dtrace
library funcs
user funcs
pid
Solaris Internals 2nd Ed kernel
cvs.opensolaris.org
funcs
fbt
man(3C)
source code
“The Best Of”
(Stable)

slide 85:

Action Intro
trace() prints 1 argument
• printf() may also be used
• DTrace variable builtins include,
execname
• pid
• timestamp
• probefunc
• probename
• curthread
• curpsinfo
process name
process ID
time since boot, nanoseconds
probe function component
probe name component
pointer to current thread
pointer to psinfo like structure

slide 86:

printf Example
Enhancing our previous one-liner,
# dtrace -n 'syscall::exec*:return { printf("%6d %s", pid, execname); }'
dtrace: description 'syscall::exec*:return ' matched 2 probes
CPU
FUNCTION:NAME
exece:return 28751 staroffice
exece:return 28752 grep
exece:return 28755 grep
exece:return 28753 soffice
format characters include,
Style:
spaces after
commas.
%d integers
%f floats
%s strings
%S safe strings – escaped binary characters
%Y text formatted time (use with “walltimestamp”)

slide 87:

walltimestamp
# dtrace -n 'syscall::exec*:return { printf("%-20Y %6d %s", walltimestamp, pid,
execname); }'
dtrace: description 'syscall::exec*:return ' matched 2 probes
CPU
FUNCTION:NAME
exece:return 2006 May 22 02:12:36 28762 staroffice
exece:return 2006 May 22 02:12:36 28763 grep
exece:return 2006 May 22 02:12:36 28765 grep
exece:return 2006 May 22 02:12:36 28766 soffice
exece:return 2006 May 22 02:12:36 28767 dirname
exece:return 2006 May 22 02:12:36 28764 expr
exece:return 2006 May 22 02:12:36 28768 basename
Our output still contains the DTrace defaults.
This can be eliminated using -q.
• Our command is >gt; 80 characters.
This can be better managed using a script file.

slide 88:

quiet mode
# dtrace -qn 'syscall::exec*:return { printf("%-20Y %6d %s\n",
walltimestamp, pid, execname); }'
2006 May 22 02:14:36 28773 staroffice
2006 May 22 02:14:36 28774 grep
2006 May 22 02:14:36 28776 grep
2006 May 22 02:14:36 28777 soffice
2006 May 22 02:14:36 28778 dirname
2006 May 22 02:14:36 28775 expr
2006 May 22 02:14:36 28779 basename
2006 May 22 02:14:36 28780 expr
2006 May 22 02:14:36 28781 sopatchlevel.sh
2006 May 22 02:14:36 28782 dirname
dtrace -q suppresses the default output
• A \n is now needed to terminate each line

slide 89:

DTrace Script File
# cat -n exec.d
1 #!/usr/sbin/dtrace -qs
3 syscall::exec*:return
4 {
printf("%-20Y %6d %s\n", walltimestamp, pid, execname);
6 }
# chmod 755 exec.d
# ./exec.d
2006 May 22 02:20:59 28788 staroffice
2006 May 22 02:20:59 28789 grep
2006 May 22 02:20:59 28791 grep
2006 May 22 02:20:59 28792 soffice
2006 May 22 02:20:59 28793 dirname
Style:
tab indent
actions.
So far, so good. The script file is easier to edit.
• Some improvements can be made...

slide 90:

DTrace Scripting
# cat -n exec.d
1 #!/usr/sbin/dtrace -s
3 #pragma D option quiet
5 dtrace:::BEGIN
6 {
printf("%-20s %6s %s\n", "TIME", "PID", "CMD");
8 }
10 syscall::exec*:return
11 {
printf("%-20Y %6d %s\n", walltimestamp, pid, execname);
13 }
# ./exec.d
TIME
PID CMD
2006 May 22 02:24:17 28801 staroffice
2006 May 22 02:24:17 28802 grep
2006 May 22 02:24:17 28804 grep
2006 May 22 02:24:17 28805 soffice

slide 91:

DTrace Scripting
Seems like awk!
• “pragma D option quiet” is the same as -q,
however is more obvious (especially when
more options are used).
• Printing a heading serves to both label the
output, and to indicate when DTrace has
began tracing.
dtrace:::BEGIN probe fires at start (headings)
• dtrace:::END
probe fires at end (reports)

slide 92:

curpsinfo->gt;pr_psargs
To print more than just the execname: try,
# dtrace -qn 'syscall::exec*:return { printf("%6d %s\n", pid,
curpsinfo->gt;pr_psargs); }'
29396 /bin/sh /usr/bin/soffice
29397 grep StarOffice 7 /export/home/brendan/.sversionrc
29398 grep StarOffice 7 /export/home/brendan/.sversionrc
29396 /bin/sh /usr/staroffice7/program/soffice
29400 /usr/bin/sh /usr/bin/dirname /usr/staroffice7/program/soffice
29400 /usr/bin/expr /usr/staroffice7/program/soffice/ : \(/\)/*[^/]*//*$
| /usr/staro
29402 /usr/bin/sh /usr/bin/basename /usr/staroffice7/program/soffice
29402 /usr/bin/expr //usr/staroffice7/program/soffice : \(.*[^/]\)/*$ : .
*/\(..*\) :
29404 /bin/sh /usr/staroffice7/program/sopatchlevel.sh
29405 /usr/bin/sh /usr/bin/dirname /
usr/staroffice7/program/sopatchlevel.sh
29405 /usr/bin/expr /usr/staroffice7/program/sopatchlevel.sh/ : \(/\)/*
[^/]*//*$ | /u
29406 uname -s

slide 93:

DTrace Resources
OpenSolaris DTrace Community
http://www.opensolaris.org/os/community/dtrace
DTrace Guide
http://docs.sun.com/app/docs/doc/817-6223
DTrace Tools
http://www.brendangregg.com/dtrace.html
Solaris Performance and Tools: DTrace and
MDB Techniques for Solaris 10 and
OpenSolaris. McDougall/Mauro/Gregg
ISBN: 0-13-156819-1

slide 94:

End of Chapter 2
Chapter 1 – Solaris 9 Performance Tools
Chapter 2 – Introducing DTrace
Chapter 3 – Command Line DTrace
Chapter 4 – DTrace one-liners
Chapter 5 – DTrace Mentality 1
Chapter 6 – Providers
Chapter 7 – The D Language
Chapter 8 – Advanced Scripting
Chapter 9 – The DTraceToolkit
Chapter 10 – DTrace Mentality 2

slide 95:

Chapter 3
Command Line DTrace
The syscall Provider
● Entry/Return Arguments
● Predicates
● Aggregations
● Stack tracing
● Intro to sysinfo Provider
● Intro to profile Provider

slide 96:

syscall Provider
This provider traces syscall entry and
returns
• syscalls are the interface between user-land
and the kernel, and reflect much of an
applications behaviour
• syscalls are well documented – man(2)
• This provider is a great place to start
learning DTrace

slide 97:

Entry Arguments
The syscall entry arguments are available as
unsigned 64-bit ints, with the names: arg0,
arg1, arg2, ...
These arguments can be casted if need be
to the appropriate types. eg, (int)arg0
• Typed versions may be provided as args[0],
args[1], args[2], ...
Arguments are listed in the syscall's man
page

slide 98:

Entry Argument Examples
syscall::read:entry,
arg0
arg1
arg2
syscall::mkdir:entry,
arg0
arg1
(int) file descriptor
(void *) buffer
(size_t) requested read size
(char *) path
(mode_t) mode
String pointers can't be read unless you
dereference them. Use copyinstr().

slide 99:

Entry Arguments: Integers
# dtrace -n 'syscall::read:entry { trace(arg2); }'
dtrace: description 'syscall::read:entry ' matched 1 probe
CPU
FUNCTION:NAME
read:entry
read:entry
read:entry
read:entry
read:entry
read:entry
read:entry
read:entry
Integer arguments are easy to use.
• Here read()'s arg2 is printed – the requested
read size. Most are over 1 Kbyte.

slide 100:

Entry Arguments: Strings
# dtrace -n 'syscall::open:entry { trace(copyinstr(arg0)); }'
dtrace: description 'syscall::open:entry ' matched 1 probe
CPU
FUNCTION:NAME
open:entry
/var/ld/ld.config
open:entry
/usr/lib/libc.so.1
open:entry
/var/ld/ld.config
open:entry
/usr/lib/libgen.so.1
open:entry
/usr/lib/libc.so.1
open:entry
/var/ld/ld.config
open:entry
/usr/lib/libgen.so.1
open:entry
/usr/lib/libc.so.1
open:entry
/var/ld/ld.config
open:entry
/usr/lib/libc.so.1
copyinstr() is a convienent function to fetch
strings from user-land to the kernel.
• copyin() also exists, for fetching any data.

slide 101:

Return Code
The syscall return code is available from the
return probe as arg0.
• The man page for the syscall should explain
what the return code is.
• In general, the return code is,
>gt;= 0
failure
success
If a syscall fails, the error code is available
as the DTrace built-in “errno”. See
/usr/include/sys/errno.h for a description.

slide 102:

Return Code Example
For example, syscall::read:return has,
arg0
number of successful bytes read,
or -1 for failure
# dtrace -n 'syscall::read:return { trace(arg0); }'
dtrace: description 'syscall::read:return ' matched 1 probe
CPU
FUNCTION:NAME
read:return
read:return
read:return
read:return
read:return
read:return
read:return
Most actual bytes read are

slide 103:

Return Code Hints
The return code is often typed as an int, but
sometimes (x86) isn't,
# dtrace -n 'syscall::open:return { trace(arg0); }'
dtrace: description 'syscall::open:return ' matched 1 probe
CPU
FUNCTION:NAME
open:return
open:return
open64:return
# dtrace -n 'syscall::open:return { trace((int)arg0); }'
dtrace: description 'syscall::open:return ' matched 1 probe
CPU
FUNCTION:NAME
open:return
open:return
open64:return
Here a “cat /etc/shadow” was run as non-root on
x86, with arg0 uncasted and casted.

slide 104:

Questions
Would it be useful to only trace syscalls that
failed? (ie, return arg0 == -1)
• Would it be useful to trace syscalls with
unusual entry arguments: eg, read()s of a
size less than 8 bytes?
• Would it be useful to match on execname
and pid?
• You could just dump everything and grep
later. But there is a better way...

slide 105:

Predicates
Allow filtering of trace data
• Are a boolean expression used to determine
if an action is performed
• For example,
# dtrace -n 'syscall::read:entry /arg2

slide 106:

Predicates
General Syntax,
probe-description
/predicate/
action;
action;
Style:
place the probe,
predicate and
braces on separate
lines.
This is the most common way DTrace does
if-then-else statements (which DTrace does
not have directly).

slide 107:

Predicate Examples
/execname == “ls”/ process is “ls” - will
match multiple processes, all called “ls”
• /(int)arg0

slide 108:

truss-like DTrace
# dtrace -n 'syscall:::entry /execname == "ls"/ { printf("%x %x %x",
arg0, arg1, arg2); }'
dtrace: description 'syscall:::entry ' matched 231 probes
CPU
FUNCTION:NAME
resolvepath:entry 8047fed 804738c 3ff
sysconfig:entry 6 468d fec1d444
resolvepath:entry d27fd9dc 804738c 3ff
xstat:entry 2 8047fed 80477b8
open:entry d27f7a24 0 0
xstat:entry 2 d27fbf38 8047070
resolvepath:entry d27fbf38 80470f8 3ff
open:entry d27fbf38 0 0
This snoops syscalls from all processes called “ls”
It prints three arguments in hex (or tries to)
The output is verbose. Much more so for bigger
apps. Remember truss -c?

slide 109:

Aggregations
A DTrace variable type.
• In rough order of complexity,
char
• int
Common
• long
Programming
• double
Language
• float
Variable Types
• pointer
• array
• associative array
• aggregation
• struct
• object
Complexity

slide 110:

Why Aggregations First
Aggregations are introduced early because
we want you to use them as much as
possible!
• Their advantages are:
They summarise the data in often the most
appropriate way – providing the final report
• Aggregations are fast – they do not lock between
CPUs in the same way as other data types

slide 111:

Aggregation count()
# dtrace -n 'syscall:::entry { @fred = count(); }'
dtrace: description 'syscall:::entry ' matched 231 probes
The Aggregation name is “@fred”
The Aggregation function is “count()” - which counts
occurrences
When Ctrl-C is hit, DTrace prints @fred. This is for
convience. Manually printing aggregations is
possible if desired.
Here, DTrace observed 11783 system calls.

slide 112:

Aggregation by execname
# dtrace -n 'syscall:::entry { @fred[execname] = count(); }'
dtrace: description 'syscall:::entry ' matched 231 probes
svc.configd
svc.startd
nautilus
httpd
nscd
java
bash
dtrace
sshd
acroread
find
The key is now execname, using “@fred[execname]”
This is producing a frequency count by execname report
Here, “find” processes caused 15,068 syscalls

slide 113:

Aggregation by syscall
# dtrace -n 'syscall:::entry /execname == "find"/ { @fred[probefunc] =
count(); }'
dtrace: description 'syscall:::entry ' matched 231 probes
write
fcntl
fsat
close
fstat64
fchdir
getdents64
acl
gtime
lstat64
“probefunc” is the 3rd probe field, which for the syscall
provider is the syscall name
Here, we can see which syscalls find is calling

slide 114:

Aggregation Syntax
The general syntax is,
@name[key] = function(args)
• The name and the key are optional
• Examples,
@num[execname] = count();
• @[execname] = count();
• @total = count();
Multiple keys can be used. Eg,
@num[pid, execname] = count();
Style:
I like to name
my aggregates.

slide 115:

Multiple Keys
# dtrace -n 'syscall:::entry { @num[pid, execname] = count(); }'
dtrace: description 'syscall:::entry ' matched 231 probes
gnome-terminal
gnome-terminal
nautilus
java
sshd
acroread
dtrace
nscd
find
The key is now [pid, execname]
• DTrace has printed them in neat columns
Any fancier key combinations, and DTrace may not be
so neat – we'll need to tweak the output.

slide 116:

Aggregating Functions
So far we have only seen count().
Which is certainly fairly useful.
• Aggregating functions,
count()
count occurances
• sum(value)
sum value
• avg(value)
average value
• min(value)
find value minimum
• max(value)
find value maximum
• quantize(value)
power-of-2 distribution plot
• lquantize(value, min, max, step)
linear distribution plot

slide 117:

Aggregation Demos
The Aggregating functions will be used to
analyse the behaviour of the following “wc”
command:
# wc /usr/share/man/windex
This file is 931886 bytes (910 Kbytes)
• The “wc” command will read through the
input file – so there should be many syscalls
to trace.

slide 118:

Aggregation sum()
# dtrace -n 'syscall::read:entry { @bytes[execname] = sum(arg2); }'
dtrace: description 'syscall::read:entry ' matched 1 probe
bash
sshd
This sums the requested read() bytes by execname
“wc” requested 942080 bytes. This almost matches the
file,
# ls -l /usr/share/man/windex
-rw-r--r-1 root
root
931886 Dec 22 14:23 /usr/share/man/windex
Why would there be a difference?

slide 119:

sum() on return
Summing the return is now attempted,
# dtrace -n 'syscall::read:return { @bytes[execname] = sum(arg0); }'
dtrace: description 'syscall::read:return ' matched 1 probe
sshd
Perfect match!
# ls -l /usr/share/man/windex
-rw-r--r-1 root
root
931886 Dec 22 14:23 /usr/share/man/windex
Although be careful: syscalls can return -1 for
failure – which would wreck our sum() value. A
predicate could be used to avoid this.

slide 120:

Questions
What would be the size of each read?
• How many reads would occur?

slide 121:

Aggregation avg()
The average read size can be aggregated:
# dtrace -n 'syscall::read:return { @bytes[execname] = avg(arg0); }'
dtrace: description 'syscall::read:return ' matched 1 probe
bash
sshd
The average read size is 8103 bytes.
• Since 8 Kbytes == 8192 bytes, it sounds like
there is an 8 Kbyte ceiling. Could we check
this?

slide 122:

Aggregation min(), max()
The following fetches the min then max:
# dtrace -n 'syscall::read:return { @bytes[execname] = min(arg0); }'
dtrace: description 'syscall::read:return ' matched 1 probe
bash
sshd
# dtrace -n 'syscall::read:return { @bytes[execname] = max(arg0); }'
dtrace: description 'syscall::read:return ' matched 1 probe
bash
sshd
8.00 Kbytes is indeed the maximum.

slide 123:

Aggregation count()
Counting syscalls is always useful:
# dtrace -n 'syscall::read:entry { @num[execname] = count(); }'
dtrace: description 'syscall::read:entry ' matched 1 probe
bash
sshd
115 reads()s occurred.
• This makes sense,
average read size: 8103 bytes
• file size: 931886 bytes
• 931886 / 8103 = 115.005

slide 124:

Aggregations so far
By running,
count()
• sum()
• min(), max()
• avg()
we have learnt much about the behaviour of
the wc command.
• Similar knowledge can be gained much
more quickly using quantize().

slide 125:

Aggregation quantize()
# dtrace -n 'syscall::read:return { @dist[execname] = quantize(arg0); }'
dtrace: description 'syscall::read:return ' matched 1 probe
value ------------- Distribution ------------- count
-1 |
0 |
1 |
2 |
4 |
8 |
16 |
32 |
64 |
128 |
256 |
512 |
1024 |
2048 |
4096 |
8192 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 113
16384 |

slide 126:

Distribution Plots
value ------------- Distribution ------------- count
-1 |
0 |
1 |
2 |
4 |
8 |
16 |
32 |
64 |
128 |
256 |
512 |
mostly
byte
reads
1024 |
2048 |
4096 |
8192 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 113
16384 |
occurrence
was at least
this value
number of
occurrences
Conveys a count, min, max, avg, (and with
some mental calculations) a sum; all at once!

slide 127:

Aggregation lquantize()
When power-of-2 distributions are not
suitable, lquantize provides linear
customisable distributions.
• The arguments are,
lquantize(value, min, max, step)

slide 128:

Aggregation lquantize()
# dtrace -n 'syscall::read:entry { @dist[execname] = lquantize(arg0, 0,
128, 1); }'
dtrace: description 'syscall::read:entry ' matched 1 probe
bash
value ------------- Distribution ------------- count

slide 129:

Raw Data
Remember that raw data can always be
dumped, for other post processing:
# dtrace -n 'syscall::read:return /execname == "wc"/ { trace(arg0); }'
dtrace: description 'syscall::read:return ' matched 1 probe
CPU
FUNCTION:NAME
read:return
read:return
read:return
read:return
read:return
read:return
Aggregations are great summaries, but
sometimes you just want the raw data.

slide 130:

Stack Traces
DTrace provides 3 stack trace functions:
● stack()
● ustack()
● jstack()
kernel stack trace
user stack trace
java stack trace
These explain why your event is occurring –
the ancestory.
dtrace(1M) needs the process to still exist to
evaluate symbols for ustack() and jstack().

slide 131:

ustack()
# dtrace -n 'syscall::read:return /execname == "bash"/ { ustack(); }'
dtrace: description 'syscall::read:return ' matched 1 probe
CPU
FUNCTION:NAME
read:return
libc.so.1`_read+0x15
bash`rl_getc+0x1f
bash`rl_read_key+0xad
bash`readline_internal_char+0x5c
bash`0x80abf72
bash`0x80abf8c
bash`readline+0x37
bash`0x80675ad
bash`0x8067525
bash`0x8067d2b
bash`0x80686b4
bash`0x8068275
bash`yyparse+0x12f
bash`parse_command+0x56
bash`read_command+0x8c
bash`reader_loop+0xdd
bash`main+0x638
bash`0x806395a

slide 132:

ustack() short-lived processes
Now we will run ustack() on the read() from
the short-lived “wc” command:
# dtrace -n 'syscall::read:return /execname == "wc"/ { ustack(); }'
dtrace: description 'syscall::read:return ' matched 1 probe
CPU
FUNCTION:NAME
read:return
0xd279e465
0xd27856de
0x8050c70
0x8050a56
wc completes and dissapears before the
dtrace(1M) command can read its symbol
table.

slide 133:

ustack() and mdb
One technique is to use mdb to set a
breakpoint on exit:
$ mdb /usr/bin/wc
>gt; exit:b
>gt; :r /usr/share/man/windex
11821 113846 931886 /usr/share/man/windex
mdb: stop at 0x805090c
mdb: target stopped at:
PLT:exit:
jmp
*0x8062014
>gt;
# dtrace -n 'syscall::read:return /execname == "wc"/ { ustack(); }'
dtrace: description 'syscall::read:return ' matched 1 probe
CPU
FUNCTION:NAME
read:return
libc.so.1`_read+0x15
libc.so.1`fread+0xb6
wc`main+0x184
wc`0x8050a56

slide 134:

ustack() aggregations
The entire stack trace can be used as a key.
• This measures the most frequent stack trace
that caused the probe.
• The stack can be truncated using an integer
argument: eg, ustack(5) for 5 lines only.
# dtrace -n 'syscall::read:return /execname == "zsh"/ { @num[ustack
()] = count(); }'
dtrace: description 'syscall::read:return ' matched 1 probe
libc.so.1`_read+0x15
libc.so.1`getlogin+0x2a
zsh`createparamtable+0x1e2
zsh`setupvals+0x5bd
zsh`zsh_main+0x1ae
zsh`main+0xe
zsh`0x8059b9e
This stack trace
occurred only once

slide 135:

... continued
libc.so.1`_read+0x15
zle.so`0xd251fd22
zle.so`getkey+0x130
zle.so`0xd251f520
zle.so`getkeymapcmd+0x48
zle.so`getkeycmd+0x2c
zle.so`zlecore+0x6b
zle.so`zleread+0x403
zsh`autoload_zleread+0x3e
zsh`0x8079c59
zsh`ingetc+0x6d
zsh`0x80734fa
zsh`gettok+0x18
zsh`yylex+0x17
zsh`parse_event+0x23
zsh`loop+0x91
zsh`zsh_main+0x1d0
zsh`main+0xe
zsh`0x8059b9e
This entire
stack trace
occurred
20 times

slide 136:

Back to the Syscall Provider
So far we have just used the syscall provider
• There are some difficulties to be aware of:
when tracing writes – be aware that there is
write(), writev(), pwrite(), pwrite64()
• when tracing reads – be aware that there is
read(), readv(), pread(), pread64()
There is also readlink() - for symlinks
• other variants exist. open(), open64()...
readv() and writev() cause problems –
fetching the size from an iovec isn't easy

slide 137:

sysinfo Provider Intro
sysinfo provides probes that trace regular
system activity
• sysinfo:::readch traces all the read variants,
and provides the successful bytes read as
arg0
• sysinfo:::writech traces all the write variants,
and provides the successful bytes written as
arg0
• Don't need to worry about syscall return of -1

slide 138:

Bytes Read by Process Name
sysinfo does this easily,
# dtrace -n 'sysinfo:::readch { @bytes[execname] = sum(arg0); }'
dtrace: description 'sysinfo:::readch ' matched 4 probes
bash
sshd
Handles read variants
• This could use quantize(), or trace writech
• We should now be able to construct many
useful one-liners using syscall, sysinfo and
aggregations

slide 139:

profile Provider Intro
profile samples activity at a custom rate
• The maximum sample rate is 5000 Hertz
# dtrace -n 'profile:::profile-100hz { @num[execname] = count(); }'
dtrace: description 'profile:::profile-100hz ' matched 1 probe
dirname
bash
sopatchlevel.sh
uname
expr
soffice
staroffice
javaldx
soffice.bin
pagein
sched
This is sampling execname at 100 Hertz

slide 140:

profile and ustack()
# dtrace -n 'profile:::profile-100hz /execname == "prstat"/ {
@num[ustack(5)] = count(); }'
dtrace: description 'profile:::profile-100hz ' matched 1 probe
libc.so.1`memcpy+0x41
prstat`0x8053a54
prstat`main+0x791
prstat`0x8051dc2
libc.so.1`_pread+0x15
prstat`0x8053606
prstat`0x80537d3
prstat`main+0x791
prstat`0x8051dc2
This can help identify where a process is
spending its time

slide 141:

Using profile
The profile provider allows simple and useful
one-liners and scripts to be written
The overheads of using profile are fixed to
the sampling rate; unlike other providers,
where a high frequency of traced events can
slow the target.
Be aware of sampling issues. If accurate
measurements are desired, use other
providers to measure on an event-by-event
basis.

slide 142:

End of Chapter 3
Chapter 1 – Solaris 9 Performance Tools
Chapter 2 – Introducing DTrace
Chapter 3 – Command Line DTrace
Chapter 4 – DTrace one-liners
Chapter 5 – DTrace Mentality 1
Chapter 6 – Providers
Chapter 7 – The D Language
Chapter 8 – Advanced Scripting
Chapter 9 – The DTraceToolkit
Chapter 10 – DTrace Mentality 2

slide 143:

Chapter 4
DTrace one-liners
List useful one-liners
● Discuss when to use them

slide 144:

One-Liners Are Great
Remember “handy one-liners for sed”?
• One-liners can provide simple solutions to
common problems
• One-liners may not need the same arduous
approval procedures that scripts may require
• Someone once emailed me to say:
thanks for the DTraceToolkit. I got as far as the
one-liners, and found I could do everything I
needed to.
(so much for writing 100+ scripts!)

slide 145:

Useful One-Liners
The DTrace one-liners list is kept in the
DTraceToolkit under Docs/oneliners.txt
(http://www.opensolaris.org/os/community/dtrace/dtracetoolkit)
So far we have covered enough DTrace
syntax to understand most of them.
Many probes they use will be new, but
readily understandable in such a practical
context.
I will demonstrate many of these live...

slide 146:

One-liners #1
# New processes with arguments,
dtrace -n 'proc:::exec-success { trace(curpsinfo->gt;pr_psargs); }'
# Files opened by process name,
dtrace -n 'syscall::open*:entry { printf("%s %s",execname,copyinstr(arg0)); }'
# Files created using creat() by process name,
dtrace -n 'syscall::creat*:entry { printf("%s %s",execname,copyinstr(arg0)); }'
# Syscall count by process name,
dtrace -n 'syscall:::entry { @num[execname] = count(); }'
# Syscall count by syscall,
dtrace -n 'syscall:::entry { @num[probefunc] = count(); }'
# Syscall count by process ID,
dtrace -n 'syscall:::entry { @num[pid,execname] = count(); }'
# Read bytes by process name,
dtrace -n 'sysinfo:::readch { @bytes[execname] = sum(arg0); }'
# Write bytes by process name,
dtrace -n 'sysinfo:::writech { @bytes[execname] = sum(arg0); }'

slide 147:

One-liners #2
# Read size distribution by process name,
dtrace -n 'sysinfo:::readch { @dist[execname] = quantize(arg0); }'
# Write size distribution by process name,
dtrace -n 'sysinfo:::writech { @dist[execname] = quantize(arg0); }'
# Disk size by process ID,
dtrace -n 'io:::start { printf("%d %s %d",pid,execname,args[0]->gt;b_bcount); }'
# Disk size aggregation
dtrace -n 'io:::start { @size[execname] = quantize(args[0]->gt;b_bcount); }'
# Pages paged in by process name,
dtrace -n 'vminfo:::pgpgin { @pg[execname] = sum(arg0); }'
# Minor faults by process name,
dtrace -n 'vminfo:::as_fault { @mem[execname] = sum(arg0); }'
# Interrupts by CPU,
dtrace -n 'sdt:::interrupt-start { @num[cpu] = count(); }'
# CPU cross calls by process name,
dtrace -n 'sysinfo:::xcalls { @num[execname] = count(); }'

slide 148:

One-liners #3
# Lock time by process name,
dtrace -n 'lockstat:::adaptive-block { @time[execname] = sum(arg1); }'
# Lock distribution by process name,
dtrace -n 'lockstat:::adaptive-block { @time[execname] = quantize(arg1); }'
# Kernel funtion calls by module
dtrace -n 'fbt:::entry { @calls[probemod] = count(); }'
# Stack size for processes
dtrace -n 'sched:::on-cpu { @[execname] = max(curthread->gt;t_procp->gt;p_stksize);}'
# Kill all top processes when they are invoked,
dtrace -wn 'syscall::exece:return /execname == "top"/ { raise(9); }'
# New processes with arguments and time,
dtrace -qn 'syscall::exec*:return { printf("%Y %s\n",walltimestamp,
curpsinfo->gt;pr_psargs); }'
# Successful signal details,
dtrace -n 'proc:::signal-send /pid/ { printf("%s -%d %d",execname,args[2],
args[1]->gt;pr_pid); }'

slide 149:

End of Chapter 4
Chapter 1 – Solaris 9 Performance Tools
Chapter 2 – Introducing DTrace
Chapter 3 – Command Line DTrace
Chapter 4 – DTrace one-liners
Chapter 5 – DTrace Mentality 1
Chapter 6 – Providers
Chapter 7 – The D Language
Chapter 8 – Advanced Scripting
Chapter 9 – The DTraceToolkit
Chapter 10 – DTrace Mentality 2

slide 150:

Chapter 5
DTrace Mentality 1
Approach Strategies
● Thinking in DTrace

slide 151:

Strategy #1: snoop or summary
snoop
watch events as they occur
raw data shows all details
may be too verbose
achieved using trace() or printf()
summary
produce a summary report of data
often the fastest way to process data
may lose information due to summarising
achieved using aggregates @

slide 152:

Snoop
# dtrace -n 'syscall::exece:return { trace(execname); }'
dtrace: description 'syscall::exece:return ' matched 1 probe
CPU
FUNCTION:NAME
exece:return
staroffice
exece:return
grep
exece:return
grep
exece:return
soffice
exece:return
dirname
exece:return
expr
Output “chugs” along at 1 Hertz. To improve:
command line
#pragma D option switchrate=10hz
script
dtrace -x switchrate=10hz
Output can be shuffled slightly – check the CPU
column, or print timestamp and post-process.

slide 153:

Summary
# dtrace -n 'syscall::exece:return { @num[execname] = count(); }'
dtrace: description 'syscall::exece:return ' matched 1 probe
soffice.bin
pagein
javaldx
basename
sopatchlevel.sh
soffice
staroffice
dirname
grep
expr
uname
The most frequent occurring item is often an
issue, but not always.

slide 154:

Strategy #2: drill down analysis
Start broad, then focus on potential issues
• For example:
# vmstat 1
kthr
memory
page
disk
r b w
swap free re mf pi po fr de sr cd s0 -- -0 0 41 857772 109964 20 86 24 1 1 0 4 2 0 0 0
0 0 51 707120 75408 0 36 0 0 0 0 0 0 0 0 0
0 0 51 707120 75408 0
0 0 0 0 0 0 0 0 0 0
0 0 51 707120 75408 0
0 0 0 0 0 0 0 0 0 0
0 0 51 707120 75408 0
0 0 0 0 0 0 0 0 0 0
faults
cpu
cs us sy id
277 416 216 1 1 98
290 1218 268 1 3 96
279 1285 269 1 4 95
285 1188 263 0 3 97
276 1154 234 1 3 96
vmstat provides a general system-wide view - a
good starting point. Chapter 1 lists others.
• What may be interesting from this output?

slide 155:

drill down analysis
vmstat showed many syscalls
• DTrace can identify if a single process
(name) is responsible:
# dtrace -n 'syscall:::entry { @num[execname] = count(); }'
dtrace: description 'syscall:::entry ' matched 231 probes
svc.configd
inetd
nscd
httpd
svc.startd
sshd
java
dtrace
acroread
top

slide 156:

drill down analysis
top is calling most of the system calls
• Now the syscall type is identified:
# dtrace -n 'syscall:::entry /execname == "top"/ { @num[probefunc] =
count(); }'
dtrace: description 'syscall:::entry ' matched 231 probes
pollsys
write
gtime
sysconfig
getuid
uadmin
llseek
getdents64
ioctl
open
read
close

slide 157:

drill down analysis
open(), read(), close() were all called 2121
times – perhaps by the same function.
• DTrace can now aggregate the user stack
for these functions:
# dtrace -n 'syscall::open:entry /execname == "top"/ {
@num[ustack()] = count(); }'
dtrace: description 'syscall::open:entry ' matched 1 probe
libc.so.1`__open+0x15
libc.so.1`open+0x77
top`getptable+0xe5
top`get_process_info+0x14
top`main+0x695
top`_start+0x80

slide 158:

drill down analysis
libc.so.1`__open+0x15
libc.so.1`open+0x77
top`getptable+0xe5
top`get_process_info+0x14
top`main+0x695
top`_start+0x80
culprit
The same stack trace is seen for all syscalls
• Great – we have identified the function
responsible for calling so many syscalls
• What can be done next?

slide 159:

further drill down analysis
Other analysis that can then be performed:
examine arguments to syscalls to understand the
nature of the activity.
measure elapsed and on-cpu timestamps to prove
that this large number of syscalls really is an issue (it
may not be!)
read the source code to the application (if available) to
fully understand the issue and suggest a fix.
examine activity from other providers other than the
syscall layer.
A lot more can be done. We have only just begun!

slide 160:

Strategy #3: frequency count
When in doubt, frequency count.
# dtrace -n 'sysinfo::: /execname == "top"/ { @num[probename] =
count(); }'
dtrace: description 'sysinfo::: ' matched 59 probes
trap
writech
syswrite
outch
inv_swtch
pswitch
readch
sysread
namei
Be aware that this is counting occurrences that a
probe was fired. ie, the readch statistic probe was
called 2727 times. Don't infer too much...

slide 161:

Frequency Count
# dtrace -n 'sysinfo::: /execname == "top"/ { @num[probename] =
sum(arg0); }'
dtrace: description 'sysinfo::: ' matched 59 probes
syswrite
inv_swtch
pswitch
sysread
namei
writech
outch
readch
Now we can see the value of the statistic.
readch is at 814464 bytes.

slide 162:

Frequency Count
# dtrace -ln mib:::| wc
Why study 485 probes if most are never
called? If it's never called, it's never called!
# dtrace -n 'mib::: { @num[probename] = count(); }'
dtrace: description 'mib::: ' matched 485 probes
udpInDatagrams
ipInDelivers
tcpInAckBytes
tcpRttUpdate
tcpInDataInorderSegs
tcpInAckSegs
tcpInDataInorderBytes
ipOutRequests
tcpOutDataBytes
tcpOutDataSegs
ipInReceives

slide 163:

Strategy #4: known count
Cause a fault or workload a known number
of times. Probes of interest will occur that
known number of times (or a multiple).
Here a non-root user modifies /etc/motd and
tries to save 17 times:
# dtrace -n 'syscall:::entry /execname == "vi"/ { @num[probefunc] =
count(); }'
dtrace: description 'syscall:::entry ' matched 231 probes
getpid
stat64
ioctl
read
write

slide 164:

Yellow Pig
I like to create 17 events, as 17 is not a
commonly occurring number, and is more
likely to be related to your test. A similar
moderately sized prime would be 23.
• The number of eyelashes on a yellow pig is
supposedly 17
• Hence, the “Yellow Pig number” is 17
• “Yellow Pig Day” is July 17th, celebrated by
mathematicians
• (I'm not making this up - try google!)

slide 165:

Strategy #5: aggregate stacks
Stack Traces can be aggregated in at least
two ways:
A) Event probes
when an event of interest occurs, the stack is
aggregated
• this can explain why the event is taking place – the
functions that lead up to the event
B) Profile sampling
the stack trace is aggregated at 100hz (at least)
• this is a crude but often effective way to show where
the application is spending most of its time

slide 166:

Thinking in DTrace
DTrace itself is an easy language, with the
syntax well defined in the DTrace Guide
• Being successful with DTrace is about the
application – being able to think in DTrace
• Tips for thinking in DTrace
practise, practise, practise
• know the existing tools inside out (chapter 1)
• know the OS well (Solaris Internals 2nd edition)
• study source code, and how applications are
constructed and operate
• think outside the box!

slide 167:

End of Chapter 5
Chapter 1 – Solaris 9 Performance Tools
Chapter 2 – Introducing DTrace
Chapter 3 – Command Line DTrace
Chapter 4 – DTrace one-liners
Chapter 5 – DTrace Mentality 1
Chapter 6 – Providers
Chapter 7 – The D Language
Chapter 8 – Advanced Scripting
Chapter 9 – The DTraceToolkit
Chapter 10 – DTrace Mentality 2

slide 168:

Chapter 6
Providers
Provider Guide
● Using fbt
● Using pid

slide 169:

Providers
syscall
System call entry and return probes
sysinfo
Classic sysinfo statistic probes
vminfo
Virtual Memory statistic probes
Disk and NFS events
proc
Process events such as creation
sched
System scheduling events
lockstat Kernel synchronisation lock events
plockstat User synchronisation lock events
mib
MIB statistic updates
fasttrap User location tracing
fpuinfo
SPARC FPU simulation events

slide 170:

Providers
sdt
profile
dtrace
fbt
pid
Statically defined tracing
Fixed sampling
Program BEGIN/END probes
Raw kernel tracing
User-level tracing
Check the DTrace Guide for additional
providers

slide 171:

syscall
Example one-liner,
# Syscall count by syscall,
dtrace -n 'syscall:::entry { @num[probefunc] = count(); }'
Probes the interface between user-land and
the kernel
• Provides many insights into application
behaivour
• syscalls are documented in section 2 of the
man pages

slide 172:

sysinfo
Example one-liner,
# Read bytes by process name,
dtrace -n 'sysinfo:::readch { @bytes[execname] = sum(arg0); }'
Probes the sysinfo statistics, which are
provided by kstat to tools such as mpstat
• Provides many insights into application and
system behaivour
• Documented in the DTrace Guide (chap 23)
• Also see /usr/include/sys/sysinfo.h,
cpu_sysinfo.h

slide 173:

vminfo
Example one-liner,
# Pages paged in by process name,
dtrace -n 'vminfo:::pgpgin { @pg[execname] = sum(arg0); }'
Probes the vminfo statistics, which are
provided by kstat to tools such as vmstat
• Provides many insights into virtual memory
behaivour
• Documented in the DTrace Guide (chap 24)
• Also see /usr/include/sys/sysinfo.h,
cpu_vminfo.h

slide 174:

Example one-liner,
# Disk size aggregation
dtrace -n 'io:::start { @sz[execname] = quantize(args[0]->gt;b_bcount); }'
Probes I/O events, for both disk and NFS
• Provides a probe for the start and finish of
each I/O transaction
• Provides crucial data for understanding how
applications are driving the disks
• Documented in the DTrace Guide (chap 27)

slide 175:

proc
Example one-liner,
# New processes with arguments,
dtrace -n 'proc:::exec-success { trace(curpsinfo->gt;pr_psargs); }'
Probes process events such as exec, exit,
thread creation and exit, and signals
• Provides high level probes so that process
analysis can be performed easily
• Documented in the DTrace Guide (chap 25)

slide 176:

sched
Example one-liner,
# Times the schedular begins to run a thread,
dtrace -n 'sched:::on-cpu { @on[execname] = count(); }'
Probes thread scheduling events
• Provides high level probes so that schedular
analysis can be performed easily
• Documented in the DTrace Guide (chap 26)
• Also see Solaris Internals 2nd edition

slide 177:

fbt
Example one-liner,
# Kernel funtion calls by module
dtrace -n 'fbt:::entry { @calls[probemod] = count(); }'
Function Boundary Tracing – traces raw
kernel function events
• These probes are “unstable” - their names
and arguments can and do change between
minor releases of Solaris (which is why we
have the higher level providers)
• fbt is usually the bulk of the probes: 30,000+

slide 178:

fbt
Since fbt can trace the entire kernel,
observing just about any kernel behaviour is
possible
• Enabling all fbt probes (30,000+) will slow
the kernel noticeably. Be selective
• Documentation for the fbt probes (which is
really documentation for the kernel) is in:
The DTrace Guide (chap 20)
● Solaris Internals 2nd edition
● cvs.opensolaris.org

slide 179:

pid
Example one-liner,
# Trace user-level function entries,
dtrace -n 'pid$target:a.out::entry' -c 'some_command'
The pid provider runs a command (-c cmd),
or traces a particular process (-p PID).
• The pid provider can:
trace user functions
● trace user instructions
● trace library functions
● trace library instructions
The pid provider can create millions of probes

slide 180:

pid:::
The provider name is,
pidPID
Where PID is the target to trace
• pid81
Trace PID 81
• pid$target
Trace the target of -c or -p
The full syntax is,
pidPID:segment:function:name
segment is the name of the mapping. eg,
libc.so.1 (or “libc”) a library
• a.out
the binary
function is the function name. eg, “main”
• name is either: entry, return, or an instruction
address

slide 181:

pid demonstrations
user functions,
# dtrace -n 'pid$target:a.out::entry' -c 'ping mars'
dtrace: description 'pid$target:a.out::entry' matched 16 probes
mars is alive
dtrace: pid 29890 has exited
CPU
FUNCTION:NAME
0 39187
__fsr:entry
0 39188
main:entry
0 39191
send_scheduled_probe:entry
0 39198
set_IPv4_options:entry
0 39190
schedule_sigalrm:entry
0 39199
check_reply:entry
0 39195
seq_match:entry
0 39191
send_scheduled_probe:entry

slide 182:

pid demonstrations
user instructions,
# dtrace -n 'pid$target:a.out:set_IPv4_options:' -c 'ping mars'
dtrace: description 'pid$target:a.out:set_IPv4_options:' matched 169 probes
mars is alive
dtrace: pid 29900 has exited
CPU
FUNCTION:NAME
set_IPv4_options:entry
set_IPv4_options:0
set_IPv4_options:1
set_IPv4_options:3
set_IPv4_options:6
set_IPv4_options:7
set_IPv4_options:8
set_IPv4_options:9
set_IPv4_options:10
set_IPv4_options:12
set_IPv4_options:19
set_IPv4_options:1b
set_IPv4_options:1f
set_IPv4_options:15d
set_IPv4_options:15e
set_IPv4_options:15f
set_IPv4_options:160
set_IPv4_options:162
set_IPv4_options:163
set_IPv4_options:return

slide 183:

pid demonstrations
library functions,
# dtrace -n 'pid$target:::entry' -c 'ping mars'
dtrace: description 'pid$target:::entry' matched 5318 probes
mars is alive
dtrace: pid 29905 has exited
CPU
FUNCTION:NAME
0 39320
call_array:entry
0 39321
call_init:entry
0 39352
leave:entry
0 39351
fmap_setup:entry
0 39596
munmap:entry
0 39296
rt_bind_clear:entry
0 39287
_rt_bind_clear:entry
0 39294
rt_mutex_unlock:entry
0 39289
_rt_null:entry
0 39296
rt_bind_clear:entry
0 39287
_rt_bind_clear:entry
0 65195
libc_init:entry
0 63314
atexit:entry
0 64852
lmalloc:entry

slide 184:

pid demonstrations
library instructions,
# dtrace -n 'pid$target:libc:getpid:' -c 'ping mars'
dtrace: description 'pid$target:libc:getpid:' matched 9 probes
mars is alive
dtrace: pid 29914 has exited
CPU
FUNCTION:NAME
0 39188
getpid:entry
0 39189
getpid:0
0 39190
getpid:5
0 39191
getpid:6
0 39192
getpid:b
0 39193
getpid:d
0 39194
getpid:13
0 39195
getpid:15
0 39187
getpid:return
0 39188
getpid:entry
0 39189
getpid:0
0 39190
getpid:5
0 39191
getpid:6

slide 185:

End of Chapter 6
Chapter 1 – Solaris 9 Performance Tools
Chapter 2 – Introducing DTrace
Chapter 3 – Command Line DTrace
Chapter 4 – DTrace one-liners
Chapter 5 – DTrace Mentality 1
Chapter 6 – Providers
Chapter 7 – The D Language
Chapter 8 – Advanced Scripting
Chapter 9 – The DTraceToolkit
Chapter 10 – DTrace Mentality 2

slide 186:

Chapter 7
The D Language
Comments
● Variable Types
● Macros
● Script Options
● Pragmas

slide 187:

Comments
The D language is related to the C
programming language. This includes
comment syntax:
/* this is a one line comment */
* This is a
* block comment.

slide 188:

Variable Types
DTrace tries to figure out the variable type
on first declaration
• Variable types can be predeclared
• Types include:
integer
● string
● pointer
● associative array
● aggregate
● thread-local
● clause-local

slide 189:

Integers
Using integers:
int mycount;
● mycount = 1;
● mycount++;
● /mycount >gt; 10/
declaration
declaration / setting
increment by 1
testing in a predicate
A declaration of the form “int mycount;” must
be placed outside of any action block. They
are usually placed at the top of the program.

slide 190:

Strings
Using strings:
string name;
● name = “Fred Nurke”;
● /name == “Fred”/
● name = 0;
declaration
declaration / setting
predicate testing
memory cleanup
Strings are null terminated
• printf() supports,
%s string format
• %S
safe string format (escapes odd chars)

slide 191:

Pointers
Used in the same way as C programming
• DTrace can walk most structures without
casting (eg, vnode_t, proc_t, ...)
• Special builtins,
curthread
● curpsinfo
pointer to the on-cpu thread
poniter to a psinfo like structure
Eg,
curpsinfo->gt;pr_psargs
80 char arg list
● curthread->gt;t_procp->gt;p_lwpcnt
# of LWPs

slide 192:

Associative Arrays
Also known as “hashed arrays”, or
“key/value arrays”
Using associative arrays:
string name[int];
● name[3] = “Fred”;
● /name[3] == “Fred”/
● name[3] = 0;
● name[“sd0”, 1] = 4;
decleration
decleration / setting
predicate testing
memory cleanup
two key array
Try to use aggregates instead

slide 193:

Aggregates
Discussed in Chapter 3
• Using Aggregates:
@name[keys] = func();
setting
● printa(“%s... %@d”, @name);
printing
● trunc(@name);
clearing
● trunc(@name, size);
truncating to size
● clear(@name);
clearing values
● normalize(@name, n);
dividing by n
Aggregates perform well, and produce great reports

slide 194:

Thread-local
A variable stored private to each thread.
• Using thread-local:
self->gt;start = timestamp;
setting
● self int start;
declaration
● /self->gt;start >gt; 0/
predicates
● self->gt;start = 0;
memory cleanup
Thread-local variables are crucial, and are
used heavily when scripting in DTrace.

slide 195:

Clause-local
A temporary variable for calculations within
an action clause
• Using clause-local:
this->gt;delta = 5;
● this int delta;
setting
declaration
These are used to improve performance,
rather than using global integers or strings
that require lock overheads between CPUs
• Only use these within one action

slide 196:

Kernel Variables
Any symbol from the kernel can be read,
along with its struct members
• Example kernel variables:
`freemem
● `utsname.nodename
free mem in pages
hostname
Often only the source code can explain what
these really are
• Module variables can also be read (prefix
with the module name)

slide 197:

Macro Variables
DTrace provides some useful macro
variables for scripting:
$1, $2, ...
arguments to script as ints
● $$1, $$2, ... arguments to script as strings
● $pid
PID of dtrace(1M)
● $uid
real UID of dtrace(1M)
● $target
PID target of dtrace (-c or -p)
For the argument variables to assume
default values (0 or “”), use:
#pragma D option defaultargs

slide 198:

Script Options
When writing complex scripts, it can be
useful for them to process options. Eg:
# ./DTraceToolkit-0.96/dvmstat -h
USAGE: dvmstat [-h] { -p PID | -n name | command }
-p PID
# examine this PID
-n name
# examine this process name
eg,
dvmstat -p 1871
# examine PID 1871
dvmstat -n tar
# examine processes called "tar"
dvmstat df -h
# run and examine "df -h"
Currently this is best achieved by embedding the
DTrace script in a shell script, and using the shell's
getopts function. Be careful to write neatly and follow
programming best practices – such scripts get
messy fast.

slide 199:

Pragmas
These change the behaviour of DTrace, and
often an alternative to command line options
• Useful pragmas:
#pragma D option quiet
● #pragma D option switchrate=10hz
● #pragma D option bufsize=16m
● #pragma D option flowindent
● #pragma D option destructive
Others are listed in the DTrace Guide
(chapter 16)

slide 200:

Destructive Actions
These do more than just read data:
stop()
● raise(sig)
● chill(ns)
● system(cmd)
freeze the current process
send this sig to the current PID
pause for nanoseconds
run this shell command
copyout(buf, addr, bytes)
copy this
data to user-land
Use extreme caution when using destructive
actions

slide 201:

End of Chapter 7
Chapter 1 – Solaris 9 Performance Tools
Chapter 2 – Introducing DTrace
Chapter 3 – Command Line DTrace
Chapter 4 – DTrace one-liners
Chapter 5 – DTrace Mentality 1
Chapter 6 – Providers
Chapter 7 – The D Language
Chapter 8 – Advanced Scripting
Chapter 9 – The DTraceToolkit
Chapter 10 – DTrace Mentality 2

slide 202:

Chapter 8
Advanced Scripting
Step through an example script
● Discuss script internals

slide 203:

Example: shortlived.d
The following is from the DTraceToolkit:
# ./shortlived.d
Tracing... Hit Ctrl-C to stop.
short lived processes:
0.924 secs
total sample duration:
4.308 secs
Total time by process name,
soffice
3 ms
sopatchlevel.sh
10 ms
grep
14 ms
uname
24 ms
javaldx
33 ms
expr
36 ms
soffice.bin
93 ms
pagein
696 ms
Total time by PPID,
30 ms
93 ms
786 ms

slide 204:

shortlived.d
shortlived.d calculates CPU time consumed
by short-lived processes (processes that
began and ended while DTrace was tracing).
• A common performance issue is short-lived
processes hogging the CPU, which can be
difficult to identify using traditional tools
such as prstat.
• It is useful to discuss the entire internals of
this script, from comments to code.

slide 205:

shortlived.d
#!/usr/sbin/dtrace -qs
* shortlived.d - determine time spent by short lived processes.
Written in DTrace (Solaris 10 3/05).
* 27-May-2006, ver 0.94
* USAGE:
shortlived.d
# wait, then hit Ctrl-C
* Applications that run many short lived processes can cause load
* on the system that is difficult to identify - the processes
* aren't sampled in time by programs such as prstat. This program
* illustrates how much time was spent processing those extra
* processes, and a table of process name by total times for each.
* SEE ALSO: execsnoop
* Notes:
* - The measurements are minimum values, not all of the overheads
caused by process generation and destruction are measured (DTrace
can do so, but the script would become seriously complex).
* - The summary values are accurate, the by program and by PPID values
are usually slightly smaller due to rounding errors.

slide 206:

shortlived.d
* COPYRIGHT: Copyright (c) 2005, 2006 Brendan Gregg.
* CDDL HEADER START
* The contents of this file are subject to the terms of the
* Common Development and Distribution License, Version 1.0 only
* (the "License"). You may not use this file except in compliance
* with the License.
* You can obtain a copy of the license at Docs/cddl1.txt
* or http://www.opensolaris.org/os/licensing.
* See the License for the specific language governing permissions
* and limitations under the License.
* CDDL HEADER END
* 22-Apr-2005 Brendan Gregg
Created this.

slide 207:

shortlived.d
* Start
dtrace:::BEGIN
/* save start time */
start = timestamp;
/* this is time spent on shortlived processes */
procs = 0;
/* print header */
printf("Tracing... Hit Ctrl-C to stop.\n");

slide 208:

shortlived.d
* Measure parent fork time
syscall::*fork*:entry
/* save start of fork */
self->gt;fork = vtimestamp;
syscall::*fork*:return
/arg0 != 0 && self->gt;fork/
/* record on-CPU time for the fork syscall */
procs += vtimestamp - self->gt;fork;
self->gt;fork = 0;
* Measure child processes time

slide 209:

shortlived.d
syscall::*fork*:return
/arg0 == 0/
/* save start of child process */
self->gt;start = vtimestamp;
/* memory cleanup */
self->gt;fork = 0;
proc:::exit
/self->gt;start/
/* record on-CPU time for process execution */
this->gt;oncpu = vtimestamp - self->gt;start;
procs += this->gt;oncpu;
/* sum on-CPU by process name and ppid */
@Times_exec[execname] = sum(this->gt;oncpu/1000000);
@Times_ppid[ppid] = sum(this->gt;oncpu/1000000);
/* memory cleanup */
self->gt;start = 0;

slide 210:

shortlived.d
* Print report
dtrace:::END
this->gt;total = timestamp - start;
printf("short lived processes: %6d.%03d secs\n",
procs/1000000000, (procs%1000000000)/1000000);
printf("total sample duration: %6d.%03d secs\n",
this->gt;total/1000000000, (this->gt;total%1000000000)/1000000);
printf("\nTotal time by process name,\n");
printa("%18s %@12d ms\n", @Times_exec);
printf("\nTotal time by PPID,\n");
printa("%18d %@12d ms\n", @Times_ppid);

slide 211:

End of Chapter 8
Chapter 1 – Solaris 9 Performance Tools
Chapter 2 – Introducing DTrace
Chapter 3 – Command Line DTrace
Chapter 4 – DTrace one-liners
Chapter 5 – DTrace Mentality 1
Chapter 6 – Providers
Chapter 7 – The D Language
Chapter 8 – Advanced Scripting
Chapter 9 – The DTraceToolkit
Chapter 10 – DTrace Mentality 2

slide 212:

Chapter 9
The DTraceToolkit
Introduce the DTraceToolkit
● Explain Layout
● A Recipe For Getting Started

slide 213:

DTraceToolkit
http://www.opensolaris.org/os/community/dtrace/dtracetoolkit
http://www.brendangregg.com/dtrace.html
Version 0.96 contains over 100 scripts
• Freeware (CDDL)
• Serves to:
● Provide tools to analyse to common problems
● Provide numerous demonstrations of
programming in D
● Promote DTrace
Written for Solaris 10 3/05 as much as possible

slide 214:

Main Parts
The DTraceToolkit is currently 3 main parts:
The Scripts
● Man Pages
● Examples
Man
Docs/Examples
There is a man page for every script, and an
example file for every script.
• It took considerable time to write 100+ man
pages – so please read them!
• The example files are often more useful
than the man pages

slide 215:

Installation
Isn't required – the scripts can be run
immediately after extraction
• If desired, an install script has been
provided. It prompts for install options, and
defaults to installing under /opt/DTT
# ./install
DTraceToolkit Installation
--------------------------DTraceToolkit version 0.96, 24-Apr-2006
hit Ctrl-C any time you wish to quit.
Enter target directory for installation [/opt/DTT]:

slide 216:

Script Layout
The scripts are positioned in a hierarchy: the
most useful are in the top directory, and
others are in meaningful subdirectories
• The “Bin” directory contains symlinks to all
the scripts
DTraceToolkit-0.96# ls -F
Apps/
Guide
Bin/
Kernel/
Cpu/
License@
Disk/
Locks/
Docs/
Man/
Extra/
Mem/
FS/
Net/
Proc/
README@
System/
User/
Version
Zones/
dappprof*
dapptrace*
dexplorer*
dtruss*
dvmstat*
errinfo*
execsnoop*
install*
iosnoop*
iotop*
opensnoop*
procsystime*
rwsnoop*
rwtop*

slide 217:

The Scripts
If they end in .d, they are pure DTrace. See
the man page or try reading their header.
• If they don't end in .d, they are wrapped in
shell or Perl. See their man page and try
running them with -h.
• If they begin with,
● Tracing...
Hit Ctrl-C to end.
they are tracing events
● Sampling... Hit Ctrl-C to end.
they are sampling events using the profile provider

slide 218:

The Man Pages
Man Pages can be read using,
# MANPATH=$MANPATH:/opt/DTT/Man
# or wherever
# man iosnoop
Reformatting page. Please Wait... done
USER COMMANDS
NAME
iosnoop(1m)
iosnoop - snoop I/O events as they occur. Uses DTrace.
SYNOPSIS
iosnoop [-a|-A|-Deghinostv] [-d device]
mount_point] [-n name] [-p PID]
[-f
filename]
[-m
DESCRIPTION
iosnoop prints I/O events as they happen, with useful
details such as UID, PID, block number, size, filename, etc.
This is useful to determine the process responsible for
using the disks, as well as details on what activity the
process is requesting. Behaviour such as random or sequential I/O can be observed by reading the block numbers.

slide 219:

The Example Files
Read using,
# more Docs/Examples/bitesize_example.txt
In this example, bitesize.d was run for several seconds then Ctrl-C was hit.
As bitesize.d runs it records how processes on the system are accessing the
disks - in particular the size of the I/O operation. It is usually desirable
for processes to be requesting large I/O operations rather than taking many
small "bites".
The final report highlights how processes performed. The find command mostly
read 1K blocks while the tar command was reading large blocks - both as
expected.
# bitesize.d
Tracing... Hit Ctrl-C to end.
PID
CMD
-bash\0
value ------------- Distribution ------------- count
512 |
1024 |@@@@@@@@@@@@@@@@@@@@@@@@@@

slide 220:

Script Subdirectories
So far there is,
Bin/
Apps/
Cpu/
Disk/
Extra/
Kernel/
Locks/
Mem/
Net/
Proc/
System/
User/
Zones/
Symlinks to the scripts
Application specific scripts
Scripts for CPU analysis
Scripts for disk I/O analysis
Misc scripts
Scripts for kernel analysis
Scripts for lock analysis
Scripts for memory analysis
Scripts for network analysis
Scripts for process analysis
Scripts for system analysis
Scripts for user based activity analysis
Scripts for analysis by zone
Remember to check the top-level scripts first

slide 221:

DTraceToolkit Recipe
Running the following scripts in this order
has proved a useful recipe:
1)execsnoop
2)iosnoop
3)opensnoop
4)errinfo
5)procsystime
6)rwtop
7)iotop
8)dvmstat
9)Disk/bitesize.d

slide 222:

dexplorer
This exists to capture DTrace data in the
most broad possible way
• It creates a tar file containing the output from
numerous DTrace commands
• If you only had 5 minutes to “DTrace” a
server, dexplorer may be the best tool to run
– it creates many files for later offline
analysis
• A html-iser for dexplorer has yet to be written
(it won't be hard to do)

slide 223:

Behind the DTraceToolkit
The DTraceToolkit is a product of much
testing – far more time has gone into testing
the scripts than actually writing them
• The scripts need to be tested on:
● SPARC and x86
● Solaris 10 3/05, Solaris 10 1/06, ...
● Different builds of OpenSolaris (makes a huge
difference for any fbt based scripts)
● All possible workloads
If a script's output is 99% correct, is that
“correct”?

slide 224:

Testing
A script's output that is incorrect 1% of the time, is
incorrect.
Imagine a “ps -ef” that has 1 bad line out of every 100.
Only in some cases is this acceptable – where the data
is clearly marked as an estimation.
For example, prstat's “SIZE” and “RSS” are not 100%
accurate, but they are still useful estimates as we are
aware that they are estimates.

slide 225:

End of Chapter 9
Chapter 1 – Solaris 9 Performance Tools
Chapter 2 – Introducing DTrace
Chapter 3 – Command Line DTrace
Chapter 4 – DTrace one-liners
Chapter 5 – DTrace Mentality 1
Chapter 6 – Providers
Chapter 7 – The D Language
Chapter 8 – Advanced Scripting
Chapter 9 – The DTraceToolkit
Chapter 10 – DTrace Mentality 2

slide 226:

Chapter 10
DTrace Mentality 2
More DTrace Strategies

slide 227:

Strategy #6: Elapsed Time
Elapsed time is the calculated using,
start = timestamp;
elapsed = timestamp - start;
Elapsed time explains why an application is
experiencing slow response times
• Elapsed time includes disk I/O times,
network I/O times, scheduling latency, ...
• Elapsed times can be measured at the
application, treating the OS as a black box

slide 228:

Measuring Times
Some difficulties when measuring these:
Knowing what is the start probe and what is the
end probe.
● If you are lucky, the provider will already have it as
:entry and :return, or :start and :done.
● If you are unlucky, you will need to find the probes
from what is available. Try a previous strategy –
“known count”.
Associating the start probe to the end probe
● If you are lucky, they will be in the same thread and
you can use a thread-local variable. Eg, self->gt;start
● If you are unlucky, you'll need associative arrays and
some other ID as the key.

slide 229:

Strategy #7: On-CPU Time
On-CPU time is the calculated using,
start = vtimestamp;
oncpu = vtimestamp - start;
On-CPU time explains why the CPUs are
busy
• On-CPU time excludes disk I/O times,
network I/O times, scheduling latency, ...

slide 230:

Elapsed vs On-CPU
DTraceToolkit-0.96# more Docs/Examples/dtruss_example.txt
In the following example, syscall elapsed and overhead times are measured.
Elapsed times represent the time from syscall start to finish; overhead
times measure the time spent on the CPU,
# dtruss -eon bash
PID/LWP
ELAPSD
3911/1:
3911/1: 1001579
3911/1:
3911/1: 1019129
3911/1:
3911/1:
3911/1:
3911/1: 1094323
3911/1:
3911/1: 1210496
3911/1:
3911/1:
3911/1:
CPU SYSCALL(args)
= return
26 write(0x2, "l\0", 0x1)
= 1 0
43 read(0x0, "s\0", 0x1)
= 1 0
26 write(0x2, "s\0", 0x1)
= 1 0
43 read(0x0, " \001\0", 0x1)
= 1 0
26 write(0x2, " \0", 0x1)
= 1 0
43 read(0x0, "-\0", 0x1)
= 1 0
26 write(0x2, "-\001\0", 0x1)
= 1 0
42 read(0x0, "l\0", 0x1)
= 1 0
27 write(0x2, "l\001\0", 0x1)
= 1 0
44 read(0x0, "\r\0", 0x1)
= 1 0
28 write(0x2, "\n\001\0", 0x1)
= 1 0
1 lwp_sigmask(0x3, 0x2, 0x0)
= 0xFFBFFEFF 0
63 ioctl(0x0, 0x540F, 0x80F6D00)
= 0 0

slide 231:

Strategy #8: Milestones
Try to understand the behaviour of the target
by identifying application-level activity
milestones. For example:
establishing a connection
• beginning a transaction
• completing a transaction
• writing to a log
These may be identified using:
the probe name – ie, the function name
• ustack() when solaris statistics are triggered
• engaging in “known count” strategy
• dereferencing random strings

slide 232:

End of Chapter 10
Chapter 1 – Solaris 9 Performance Tools
Chapter 2 – Introducing DTrace
Chapter 3 – Command Line DTrace
Chapter 4 – DTrace one-liners
Chapter 5 – DTrace Mentality 1
Chapter 6 – Providers
Chapter 7 – The D Language
Chapter 8 – Advanced Scripting
Chapter 9 – The DTraceToolkit
Chapter 10 – DTrace Mentality 2

slide 233:

Workshops
DTrace Workshops
Now it's time for you to get your hands dirty solving a
variety of problems
I'll create problems on your systems at random
When you solve them, don't shout out the answer!
Many people find this this is the best way to learn –
you need to think on your own
Good Luck!