LinuxCon Europe 2014: Linux Performance Tools 2014
Talk by Brendan Gregg for LinuxCon Europe, 2014. Video: https://www.youtube.com/watch?v=SN7Z0eCn0VY
Description: "There are many performance tools nowadays for Linux, but how do they all fit together, and when do we use them? This talk summarizes the three types of performance tools: observability, benchmarking, and tuning, providing a tour of what exists and why they exist. Advanced tools including those based on tracepoints, kprobes, and uprobes are also included: perf_events, ktap, SystemTap, LTTng, and sysdig. You'll gain a good understanding of the performance tools landscape, knowing what to reach for to get the most out of your systems."
PDF: LinuxConEU2014_LinuxPerfTools.pdf
Keywords (from pdftotext):
slide 1:
Oct 2014
Linux Performance Tools
Brendan Gregg, Senior Performance Architect, Performance Engineering Team
bgregg@netflix.com  @brendangregg
slide 2:
A quick tour of many tools…
slide 3:
• Massive AWS EC2 Linux cloud
  – Tens of thousands of instances
  – Autoscale by ~3k each day
  – CentOS and Ubuntu
• FreeBSD for content delivery
  – Approx 33% of US Internet traffic at night
• Performance is critical
  – Customer satisfaction: >50M subscribers
  – $$$ price/performance
  – Develop tools for cloud-wide analysis; use server tools as needed
• Just launched in Europe!
slide 4:
Brendan Gregg
• Senior Performance Architect, Netflix
  – Linux and FreeBSD performance
  – Performance Engineering team (@coburnw)
• Recent work:
  – Linux perf-tools, using ftrace & perf_events
  – Systems Performance, Prentice Hall
• Previous work includes:
  – USE Method, flame graphs, utilization & latency heat maps, DTrace tools, ZFS L2ARC
• Twitter @brendangregg (these slides)
slide 5:
Agenda
• Methodologies & Tools
• Tool Types:
  – Observability: Basic, Intermediate, Advanced
  – Benchmarking
  – Tuning
  – Static
• Tracing
Aim: to show what can be done. Knowing that something can be done is more important than knowing how to do it.
slide 6:
Methodologies & Tools
slide 7:
Methodologies & Tools
• There are dozens of performance tools for Linux
  – Packages: sysstat, procps, coreutils, …
  – Commercial products
• Methodologies can provide guidance for choosing and using tools effectively
slide 8:
Anti-Methodologies
• The lack of a deliberate methodology…
• Street Light Anti-Method:
  – 1. Pick observability tools that are familiar, found on the Internet, or chosen at random
  – 2. Run tools
  – 3. Look for obvious issues
• Drunk Man Anti-Method:
  – Tune things at random until the problem goes away
slide 9:
Methodologies
• For example, the USE Method:
  – For every resource, check: Utilization, Saturation, Errors
• 5 Whys:
  – Ask “why?” 5 times
• Other methods include:
  – Workload characterization, drill-down analysis, event tracing, baseline stats, static performance tuning, …
• Start with the questions, then find the tools (see the sketch below)
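As an illustration of the USE Method in practice (my example, not from the slides), here is one possible mapping for the CPU resource onto tools covered in this talk:

$ mpstat -P ALL 1      # Utilization: per-CPU %usr, %sys
$ vmstat 1             # Saturation: "r" (runnable tasks) exceeding the CPU count
$ dmesg | grep -i mce  # Errors: eg, machine check exceptions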
slide 10:
Command Line Tools
• Useful to study even if you never use them: GUIs and commercial products often use the same interfaces (kernel /proc, /sys, …; see below)
$ vmstat 1
procs -----------memory---------- ---swap-- […]
 r  b    swpd     free   buff    cache  […]
 9  0       0 29549320  29252  9299060  […]
 2  0       0 29547876  29252  9299332  […]
 4  0       0 29548124  29252  9299460  […]
 5  0       0 29548840  29252  9299592  […]
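For instance, the counters behind free(1) and uptime(1) can be read straight from /proc, a quick illustration of that shared interface (mine, not from the slides):

$ head -4 /proc/meminfo    # memory counters, as used by free(1)
$ cat /proc/loadavg        # load averages, as used by uptime(1)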
slide 11:
Tool Types
  Type           Characteristic
  Observability  Watch activity. Safe, usually, depending on resource overhead.
  Benchmarking   Load test. Caution: production tests can cause issues due to contention.
  Tuning         Change. Danger: changes could hurt performance, now or later with load.
  Static         Check configuration. Should be safe.
slide 12:
Observability Tools
slide 13:
How do you measure these?
slide 14:
Observability Tools: Basic
• uptime
• top (or htop)
• ps
• vmstat
• iostat
• mpstat
• free
slide 15:
uptime
• One way to print load averages:
$ uptime
 07:42:06 up 8:16, 1 user, load average: 2.27, 2.84, 2.91
• A measure of resource demand: CPUs + disks
  – Other OSes only show CPUs: easier to interpret
• Exponentially-damped moving averages with time constants of 1, 5, and 15 minutes
  – Historic trend without the line graph
• Load > # of CPUs may mean CPU saturation (quick check below)
  – Don’t spend more than 5 seconds studying these
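A quick way to make that comparison (my illustration, not from the slides):

$ uptime; nproc    # compare the 1-minute load average with the CPU count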
slide 16:
top (or htop)
• System and per-process interval summary:
$ top - 18:50:26 up 7:43, 1 user, load average: 4.11, 4.91, 5.22
Tasks: 209 total, 1 running, 206 sleeping, 0 stopped, 2 zombie
Cpu(s): 47.1%us, 4.0%sy, 0.0%ni, 48.4%id, 0.0%wa, 0.0%hi, 0.3%si, 0.2%st
Mem: 70197156k total, 44831072k used, 25366084k free, 36360k buffers
Swap: 0k total, 0k used, 0k free, 11873356k cached
  PID USER     PR NI   VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
 5738 apiprod  [...]  62.6g  29g 352m S  417 44.2  2144:15 java
 1386 apiprod  [...]  17452 1388  964 R    0  0.0  0:00.02 top
    1 root     [...]  24340 2272 1340 S    0  0.0  0:01.51 init
    2 root     [...]      0    0    0 S    0  0.0  0:00.00 kthreadd
[…]
• %CPU is summed across all CPUs
• Can miss short-lived processes (atop won’t)
• Can consume noticeable CPU to read /proc
slide 17:
htop
slide 18:
ps
• Process status listing (eg, “ASCII art forest”):
$ ps -ef f
UID    PID  PPID [...] STIME TTY      TIME CMD
root  4546     1 [...] 11:08 ?        0:00 /usr/sbin/sshd -D
root 28261  4546 [...] 17:24 ?        0:00  \_ sshd: prod [priv]
prod 28287 28261 [...] 17:24 ?        0:00      \_ sshd: prod@pts/0
prod 28288 28287 [...] 17:24 pts/0    0:00          \_ -bash
prod  3156 28288 [...] 19:15 pts/0    0:00              \_ ps -ef f
root  4965     1 [...] 11:08 ?        0:00 /bin/sh /usr/bin/svscanboot
root  4969  4965 [...] 11:08 ?        0:00  \_ svscan /etc/service
[…]
• Custom fields:
$ ps -eo user,sz,rss,minflt,majflt,pcpu,args
USER    SZ   RSS MINFLT MAJFLT %CPU COMMAND
root  6085  2272  11928     24  0.0 /sbin/init
[…]
slide 19:
vmstat
• Virtual memory statistics and more:
$ vmstat -Sm 1
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b  swpd  free  buff  cache  si  so  bi  bo  in  cs us sy id wa
 8  0  [...]
 7  0  [...]
 8  0  [...]
 8  0  [...]
[…]
• USAGE: vmstat [interval [count]]
• First output line has some summary-since-boot values (should be all; partial is confusing)
• High-level CPU summary. “r” is runnable tasks.
slide 20:
iostat
• Block I/O (disk) stats. 1st output is since boot.
$ iostat -xmdz 1
Linux 3.13.0-29 (db001-eb883efa)  08/18/2014  _x86_64_  (16 CPU)
Device:  rrqm/s   wrqm/s    r/s  w/s  rMB/s  wMB/s [...] avgqu-sz  await r_await w_await  svctm  %util
xvda       0.00     0.00  [...]                                                                   0.00
xvdb       0.00 15299.00  [...]                                                                  86.40
xvdc       0.00 15271.00  [...]                                                                  86.00
md0        0.00 31082.00  [...]                                                                   0.00
(The left columns describe the workload; the right columns, such as await and %util, show the resulting performance.)
• Very useful set of stats
slide 21:
mpstat
• Multi-processor statistics, per-CPU:
$ mpstat -P ALL 1
[…]
08:06:43 PM  CPU   %usr %nice  %sys %iowait  %irq %soft %steal %guest  %idle
08:06:44 PM  all  53.45 [...]                                          42.26
08:06:44 PM    0  49.49 [...]                                          45.45
08:06:44 PM    1  51.61 [...]                                          41.94
08:06:44 PM    2  58.16 [...]                                          33.67
08:06:44 PM    3  54.55 [...]                                          40.40
08:06:44 PM    4  47.42 [...]                                          49.48
08:06:44 PM    5  65.66 [...]                                          31.31
08:06:44 PM    6  50.00 [...]                                          47.92
[…]
• Look for unbalanced workloads, hot CPUs.
slide 22:
free
• Main memory usage:
$ free -m
             total   used   free  shared  buffers  cached
Mem:         [...]
-/+ buffers/cache:   [...]
Swap:        [...]
• buffers: block device I/O cache
• cached: virtual page cache
slide 23:
Observability Tools: Basic
slide 24:
Observability Tools: Intermediate
• strace
• tcpdump
• netstat
• nicstat
• pidstat
• swapon
• lsof
• sar (and collectl, dstat, etc.)
slide 25:
strace
• System call tracer:
$ strace -tttT -p 313
1408393285.779746 getgroups(0, NULL)               = 1 <…>
1408393285.779873 getgroups(1, [0])                = 1 <…>
1408393285.780797 close(3)                         = 0 <…>
1408393285.781338 write(1, "LinuxCon 2014!\n", 15) = 15 <…>
• Eg, -ttt: time (us) since epoch; -T: syscall time (s)
• Translates syscall args
  – Very helpful for solving system usage issues
• Currently has massive overhead (ptrace based)
  – Can slow the target by > 100x. Use extreme caution.
slide 26:
tcpdump
• Sniff network packets for post analysis:
$ tcpdump -i eth0 -w /tmp/out.tcpdump
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
^C7985 packets captured
8996 packets received by filter
1010 packets dropped by kernel
# tcpdump -nr /tmp/out.tcpdump | head
reading from file /tmp/out.tcpdump, link-type EN10MB (Ethernet)
20:41:05.038437 IP 10.44.107.151.22 > 10.53.237.72.46425: Flags [P.], seq 18...
20:41:05.038533 IP 10.44.107.151.22 > 10.53.237.72.46425: Flags [P.], seq 48...
20:41:05.038584 IP 10.44.107.151.22 > 10.53.237.72.46425: Flags [P.], seq 96...
[…]
• Study packet sequences with timestamps (us)
• CPU overhead optimized (socket ring buffers), but can still be significant. Use caution.
slide 27:
netstat
• Various network protocol statistics using -s:
$ netstat -s
[…]
Tcp:
    736455 active connections openings
    176887 passive connection openings
    33 failed connection attempts
    1466 connection resets received
    3311 connections established
    91975192 segments received
    180415763 segments send out
    223685 segments retransmited
    2 bad segments received.
    39481 resets sent
[…]
TcpExt:
    12377 invalid SYN cookies received
    2982 delayed acks sent
[…]
• A multi-tool:
  – -i: interface stats
  – -r: route table
  – default: list conns
• netstat -p: shows process details!
• Per-second interval with -c
slide 28:
nicstat
• Network interface stats, iostat-like output:
$ ./nicstat 1
    Time      Int   rKB/s   wKB/s   rPk/s   wPk/s    rAvs    wAvs  %Util    Sat
21:21:43       lo   [...]                                                  0.00
21:21:43     eth0   [...]                                                  0.00
21:21:44       lo   [...]                                                  0.00
21:21:44     eth0   [...]                                                  0.00
21:21:45       lo   [...]                                                  0.00
21:21:45     eth0   [...]                                                  0.00
[…]
• Check network throughput and interface %util
• I wrote this years ago; Tim Cook ported it to Linux
slide 29:
pidstat
• Very useful process stats. eg, by-thread, disk I/O:
$ pidstat -t 1
Linux 3.2.0-54 (db002-91befe03)  08/18/2014  _x86_64_  (8 CPU)
08:57:52 PM  TGID   TID    %usr  %system  %guest    %CPU  CPU  Command
08:57:54 PM [...]     -  484.75    [...]    0.00  524.58  [...] java
08:57:54 PM     -  [...]  [...]                           [...] |__java
08:57:54 PM     -  [...]  [...]                           [...] |__java
08:57:54 PM     -  [...]  [...]                           [...] |__java
08:57:54 PM     -  [...]  [...]                           [...] |__java
[…]
$ pidstat -d 1
[…]
08:58:27 PM   PID  kB_rd/s  kB_wr/s  kB_ccwr/s  Command
08:58:28 PM [...]    [...]    [...]       0.00  java
[…]
• I usually prefer this over top(1)
slide 30:
swapon
• Show swap device usage:
$ swapon -s
Filename   Type       Size   Used  Priority
/dev/sda3  partition  [...]  [...]       -1
• If you have swap enabled…
slide 31:
lsof
• More a debug tool, lsof(8) shows file descriptor usage, which for some apps equals current active network connections:
# lsof -iTCP -sTCP:ESTABLISHED
COMMAND    PID  USER  FD   TYPE    DEVICE  SIZE/OFF  NODE  NAME
sshd       755  root  3r   IPv4  13576887       0t0   TCP  bgregg-test-i-f106:ssh->prod100.netflix.com:15241 (ESTABLISHED)
platforms 2614  app1  8u   IPv4     [...]       0t0   TCP  localhost:33868->localhost:5433 (ESTABLISHED)
postgres  2648  app1  7u   IPv4     [...]       0t0   TCP  localhost:5433->localhost:33868 (ESTABLISHED)
epic_plug 2857  app1  7u   IPv4     [...]       0t0   TCP  localhost:33885->localhost:5433 (ESTABLISHED)
postgres  2892  app1  7u   IPv4     [...]       0t0   TCP  localhost:5433->localhost:33885 (ESTABLISHED)
[…]
• I’d prefer to: ls /proc/PID/fd | wc -l
slide 32:
sar
• System Activity Reporter. Many stats, eg:
$ sar -n TCP,ETCP,DEV 1
Linux 3.2.55 (test-e4f1a80b)  08/18/2014  _x86_64_  (8 CPU)
09:10:43 PM  IFACE  rxpck/s  txpck/s   rxkB/s    txkB/s  rxcmp/s  txcmp/s  rxmcst/s
09:10:44 PM     lo    [...]                                                    0.00
09:10:44 PM   eth0  4114.00  4186.00  4537.46  28513.24    [...]               0.00
09:10:43 PM  active/s  passive/s   iseg/s    oseg/s
09:10:44 PM     [...]      [...]  4107.00  22511.00
09:10:43 PM  atmptf/s  estres/s  retrans/s  isegerr/s  orsts/s
09:10:44 PM     [...]                                     1.00
[…]
• Archive or live mode: (interval [count])
• Well designed. Header naming convention, logical groups: TCP, ETCP, DEV, EDEV, …
slide 33:
Observability: sar
slide 34:
Other Tools
• You may also use collectl, atop, dstat, or another measure-all tool
• The tool isn’t important
• It’s important to have a way to measure everything you want
• In cloud environments, you are probably using a monitoring product, developed in-house or commercial. Same method applies…
slide 35:
How does your monitoring tool measure these?
slide 36:
Observability Tools: Intermediate
slide 37:
Advanced Observability Tools
• Misc:
  – ltrace, ss, iptraf, ethtool, snmpget, lldptool, iotop, blktrace, slabtop, /proc, pcstat
• CPU Performance Counters:
  – perf_events, tiptop, rdmsr
• Advanced Tracers:
  – perf_events, ftrace, eBPF, SystemTap, ktap, LTTng, dtrace4linux, sysdig
• Some selected demos…
slide 38:
ss
• More socket statistics:
$ ss -mop
State      Recv-Q Send-Q  Local Address:Port   Peer Address:Port
CLOSE-WAIT 1      0       127.0.0.1:42295      127.0.0.1:28527    users:(("apacheLogParser",2702,3))
     mem:(r1280,w0,f2816,t0)
ESTAB      0      0       127.0.0.1:5433       127.0.0.1:41312    timer:(keepalive,36min,0) users:(("postgres",2333,7))
     mem:(r0,w0,f0,t0)
[…]
$ ss -i
State      Recv-Q Send-Q  Local Address:Port   Peer Address:Port
CLOSE-WAIT 1      0       127.0.0.1:42295      127.0.0.1:28527
     cubic wscale:6,6 rto:208 rtt:9/6 ato:40 cwnd:10 send 145.6Mbps rcv_space:32792
ESTAB      0      0       10.144.107.101:ssh   10.53.237.72:4532
     cubic wscale:4,6 rto:268 rtt:71.5/3 ato:40 cwnd:10 send 1.5Mbps rcv_rtt:72 rcv_space:14480
[…]
slide 39:
iptraf
slide 40:
iotop
• Block device I/O (disk) by process:
$ iotop
Total DISK READ: 50.47 M/s | Total DISK WRITE: 59.21 M/s
  TID  PRIO  USER  DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND
  959  be/4  root   0.00 B/s    0.00 B/s  0.00 %  99.99 %  [flush-202:1]
 6641  be/4  root  50.47 M/s   82.60 M/s  0.00 %  32.51 %  java -Dnop -X
    1  be/4  root   0.00 B/s    0.00 B/s  0.00 %   0.00 %  init
    2  be/4  root   0.00 B/s    0.00 B/s  0.00 %   0.00 %  [kthreadd]
    3  be/4  root   0.00 B/s    0.00 B/s  0.00 %   0.00 %  [ksoftirqd/0]
    4  be/4  root   0.00 B/s    0.00 B/s  0.00 %   0.00 %  [kworker/0:0]
    5  be/4  root   0.00 B/s    0.00 B/s  0.00 %   0.00 %  [kworker/u:0]
    6  rt/4  root   0.00 B/s    0.00 B/s  0.00 %   0.00 %  [migration/0]
[…]
• Needs kernel support enabled
  – CONFIG_TASK_IO_ACCOUNTING
slide 41:
slabtop
• Kernel slab allocator memory usage:
$ slabtop
 Active / Total Objects (% used)    : 4692768 / 4751161 (98.8%)
 Active / Total Slabs (% used)      : 129083 / 129083 (100.0%)
 Active / Total Caches (% used)     : 71 / 109 (65.1%)
 Active / Total Size (% used)       : 729966.22K / 738277.47K (98.9%)
 Minimum / Average / Maximum Object : 0.01K / 0.16K / 8.00K
    OBJS   ACTIVE  USE  OBJ SIZE  SLABS  OBJ/SLAB  CACHE SIZE  NAME
 3565575  3565575 100%     0.10K  91425     [...]     365700K  buffer_head
  314916   314066  99%     0.19K  14996     [...]      59984K  dentry
  184192   183751  99%     0.06K  [...]                11512K  kmalloc-64
  138618   138618 100%     0.94K  [...]               130464K  xfs_inode
  138602   138602 100%     0.21K  [...]                29968K  xfs_ili
  102116    99012  96%     0.55K  [...]                58352K  radix_tree_node
   97482    49093  50%     0.09K  [...]                 9284K  kmalloc-96
   22695    20777  91%     0.05K  [...]                 1068K  shared_policy_node
   21312    21312 100%     0.86K  [...]                18432K  ext4_inode_cache
   16288    14601  89%     0.25K  [...]                 4072K  kmalloc-256
[…]
slide 42:
pcstat
• Show page cache residency by file:
# ./pcstat data0*
|----------+----------------+------------+-----------+---------|
| Name     | Size           | Pages      | Cached    | Percent |
|----------+----------------+------------+-----------+---------|
| data00   | 104857600      | 25600      | 25600     | 100.000 |
| data01   | 104857600      | 25600      | 25600     | 100.000 |
| data02   | 104857600      | 25600      | 4080      | 015.938 |
| data03   | 104857600      | 25600      | 25600     | 100.000 |
| data04   | 104857600      | 25600      | 16010     | 062.539 |
| data05   | 104857600      | 25600      | 0         | 000.000 |
|----------+----------------+------------+-----------+---------|
• Uses the mincore(2) syscall. Useful for database performance analysis.
slide 43:
perf_events (counters)
• Performance Monitoring Counters (PMCs):
$ perf list | grep -i hardware
  cpu-cycles OR cycles                               [Hardware event]
  stalled-cycles-frontend OR idle-cycles-frontend    [Hardware event]
  stalled-cycles-backend OR idle-cycles-backend      [Hardware event]
  instructions                                       [Hardware event]
[…]
  branch-misses                                      [Hardware event]
  bus-cycles                                         [Hardware event]
  L1-dcache-loads                                    [Hardware cache event]
  L1-dcache-load-misses                              [Hardware cache event]
[…]
  rNNN (see 'perf list --help' on how to encode it)  [Raw hardware event …]
  mem:<addr>[:access]                                [Hardware breakpoint]
• Identify CPU cycle breakdowns, esp. stall types (counting example below)
  – PMCs not enabled by-default in clouds (yet)
  – Can be time-consuming to use (CPU manuals)
• Use flame graphs to visualize sampled stack traces
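To count these events, perf can run system-wide or against one command; a minimal illustration (mine, not from the slides; "gzip file1" stands in for any command under test), using event names from the listing above:

$ perf stat -a sleep 10      # system-wide counters for 10 seconds
$ perf stat -e cycles,instructions,stalled-cycles-frontend gzip file1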
slide 44:
perf_events CPU Flame Graph
(Figure: a CPU flame graph, with annotations pointing out the kernel TCP/IP stacks, broken Java stacks due to a missing frame pointer, GC, locks, time, and the idle thread in epoll.)
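For reference, such a flame graph is built from perf_events sampled stacks using the FlameGraph tools (https://github.com/brendangregg/FlameGraph); the commands below are from that project, not from this slide:

$ perf record -F 99 -a -g -- sleep 30
$ perf script | ./stackcollapse-perf.pl | ./flamegraph.pl > out.svg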
slide 45:
tiptop
• IPC by process, %MISS, %BUS
• Needs some love. perfmon2 library integration?
• Still can’t use it in clouds yet (needs PMCs enabled)
slide 46:
rdmsr
• Model Specific Registers (MSRs), unlike PMCs, can be read by default in Xen guests
  – Timestamp clock, temp, power, …
  – Use rdmsr(1) from the msr-tools package to read them (example below)
  – Uses include (https://github.com/brendangregg/msr-cloud-tools):
ec2-guest# ./showboost
[...]
TIME      C0_MCYC  C0_ACYC  UTIL  RATIO   MHz
06:11:35    [...]    [...]   51%   116%  2900
06:11:40    [...]    [...]   50%   115%  2899
06:11:45    [...]    [...]   49%   115%  2899
[...]
(showboost: real CPU MHz)
ec2-guest# ./cputemp 1
CPU1 CPU2 CPU3 CPU4
  61   61   60   59
  60   61   60   60
[...]
(cputemp: CPU temperature)
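Reading a single MSR directly (my illustration, not from the slides; register 0x10 is the IA32 time-stamp counter, and MSR addresses vary by CPU):

# modprobe msr        # load the MSR driver if needed
# rdmsr -p 0 0x10     # read MSR 0x10 on processor 0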
slide 47:
More Advanced Tools…
• Some others worth mentioning:
  Tool       Description
  ltrace     Library call tracer
  ethtool    Mostly interface tuning; some stats
  snmpget    SNMP network host statistics
  lldptool   Can get LLDP broadcast stats
  blktrace   Block I/O event tracer
  /proc      Many raw kernel counters
  pmu-tools  On- and off-core CPU counter tools
slide 48:
Advanced Tracers
• Many options on Linux:
  – perf_events, ftrace, eBPF, SystemTap, ktap, LTTng, dtrace4linux, sysdig
• Most can do static and dynamic tracing
  – Static: pre-defined events (tracepoints)
  – Dynamic: instrument any software (kprobes, uprobes). Custom metrics on-demand. Catch all.
• Many are in development.
  – I’ll summarize their state later…
slide 49:
Linux Observability Tools
slide 50:
Linux Observability Tools
slide 51:
Benchmarking Tools
slide 52:
Benchmarking Tools
• Multi:
  – UnixBench, lmbench, sysbench, perf bench (example below)
• FS/disk:
  – dd, hdparm, fio
• App/lib:
  – ab, wrk, jmeter, openssl
• Networking:
  – ping, hping3, iperf, ttcp, traceroute, mtr, pchar
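perf bench ships with the perf command and is easy to try; one of its micro-benchmarks as an illustration (mine, not from the slides):

$ perf bench sched pipe    # scheduler/IPC: two tasks passing data over a pipe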
slide 53:
Active Benchmarking
• Most benchmarks are misleading or wrong
  – You benchmark A, but actually measure B, and conclude that you measured C
• Active Benchmarking:
  – 1. Run the benchmark for hours
  – 2. While running, analyze and confirm the performance limiter using observability tools
• We just covered those tools – use them! (A sketch follows below.)
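A sketch of active benchmarking (my illustration, assuming fio from the FS/disk list on the previous slide): drive a known load, then confirm what actually limits it.

$ fio --name=seqwrite --rw=write --bs=128k --size=4g --time_based --runtime=3600 &
$ iostat -xmdz 1     # is the disk really the limiter (~100 %util)?
$ mpstat -P ALL 1    # or is a hot CPU limiting the benchmark itself?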
slide 54:
lmbench
• CPU, memory, and kernel micro-benchmarks
• Eg, memory latency by stride size:
$ lat_mem_rd 100m 128 > out.latencies
some R processing…
(Figure: memory latency vs. working set size, with steps annotated L1 cache, L2 cache, L3 cache, and Main Memory.)
slide 55:
fio
• FS or disk I/O micro-benchmarks:
$ fio --name=seqwrite --rw=write --bs=128k --size=122374m
[…]
seqwrite: (groupid=0, jobs=1): err= 0: pid=22321
  write: io=122374MB, bw=840951KB/s, iops=6569 , runt=149011msec
    clat (usec): min=41 , max=133186 , avg=148.26, stdev=1287.17
     lat (usec): min=44 , max=133188 , avg=151.11, stdev=1287.21
    bw (KB/s) : min=10746, max=1983488, per=100.18%, avg=842503.94, stdev=262774.35
  cpu : usr=2.67%, sys=43.46%, ctx=14284, majf=1, minf=24
  IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued r/w/d: total=0/978992/0, short=0/0/0
     lat (usec): 50=0.02%, 100=98.30%, 250=1.06%, 500=0.01%, 750=0.01%
     lat (usec): 1000=0.01%
     lat (msec): 2=0.01%, 4=0.01%, 10=0.25%, 20=0.29%, 50=0.06%
     lat (msec): 100=0.01%, 250=0.01%
• Results include basic latency distribution
slide 56:
pchar
• Traceroute with bandwidth per hop!
$ pchar 10.71.83.1
[…]
 4: 10.110.80.1 (10.110.80.1)
    Partial loss:     0 / 5 (0%)
    Partial char:     rtt = 9.351109 ms, (b = 0.004961 ms/B), r2 = 0.184105
                      stddev rtt = 4.967992, stddev b = 0.006029
    Partial queueing: avg = 0.000000 ms (0 bytes)
    Hop char:         rtt = --.--- ms, bw = 1268.975773 Kbps
    Hop queueing:     avg = 0.000000 ms (0 bytes)
 5: 10.193.43.181 (10.193.43.181)
    Partial loss:     0 / 5 (0%)
    Partial char:     rtt = 25.461597 ms, (b = 0.011934 ms/B), r2 = 0.228707
                      stddev rtt = 10.426112, stddev b = 0.012653
    Partial queueing: avg = 0.000000 ms (0 bytes)
    Hop char:         rtt = 16.110487 ms, bw = 1147.210397 Kbps
    Hop queueing:     avg = 0.000000 ms (0 bytes)
[…]
• Needs love. Based on pathchar (Linux 2.0.30).
slide 57:
Benchmarking Tools
slide 58:
Tuning Tools
slide 59:
Tuning Tools
• Generic interfaces (examples below):
  – sysctl, /sys
• Many areas have custom tuning tools:
  – Applications: their own config
  – CPU/scheduler: nice, renice, taskset, ulimit, chcpu
  – Storage I/O: tune2fs, ionice, hdparm, blockdev, …
  – Network: ethtool, tc, ip, route
  – Dynamic patching: stap, kpatch
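The generic interfaces in action (my illustration; the tunable and the sda device name are examples):

$ sysctl net.ipv4.tcp_congestion_control             # read a kernel tunable
# sysctl -w net.ipv4.tcp_congestion_control=cubic    # change it (with care)
$ cat /sys/block/sda/queue/scheduler                 # the same idea via /sys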
slide 60:
Tuning Methods
• Scientific Method: Question → Hypothesis → Prediction → Test → Analysis
• Any observational or benchmarking tests you can try before tuning?
• Consider risks, and see previous tools
slide 61:
Tuning Tools
slide 62:
Static Tools
slide 63:
Static Tools
• Static Performance Tuning: check the static state and configuration of the system (examples below)
  – CPU types
  – Storage devices
  – File system capacity
  – File system and volume configuration
  – Route table
  – State of hardware
• What can be checked on a system without load
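A few ways to check that static state (my illustrative mapping, not from the slides; eth0 is an example device):

$ lscpu           # CPU types
$ df -h           # file system capacity
$ ip route        # route table
$ ethtool eth0    # interface settings and negotiated speed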
slide 64:
Static Tools
slide 65:
Tracing
slide 66:
Tracing Frameworks: Tracepoints
• Statically placed at logical places in the kernel
• Provides key event details as a “format” string (see the listing example below)
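Tracepoints can be listed and inspected with built-in interfaces; for example (my illustration, using a real block I/O tracepoint):

$ perf list 'block:*'    # tracepoints for block device I/O
$ cat /sys/kernel/debug/tracing/events/block/block_rq_issue/format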
slide 67:
Tracing Frameworks: + probes
• kprobes: dynamic kernel tracing
  – function calls, returns, line numbers
• uprobes: dynamic user-level tracing
(see the perf probe example below)
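One way to use kprobes from perf_events (my illustration; do_sys_open is a real kernel function commonly used in perf-probe examples):

# perf probe --add do_sys_open                 # create a kprobe
# perf record -e probe:do_sys_open -a sleep 5  # trace it system-wide
# perf probe --del do_sys_open                 # clean up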
slide 68:
Tracing Tools
• Options:
  – ftrace
  – perf_events
  – eBPF
  – SystemTap
  – ktap
  – LTTng
  – dtrace4linux
  – Oracle Linux DTrace
  – sysdig
• Too many choices, and many still in development
slide 69:
Imagine Linux with Tracing
• With a programmable tracer, high level tools can be written, such as:
  – iosnoop
  – iolatency
  – opensnoop
  – …
slide 70:
iosnoop
• Block I/O (disk) events with latency:
# ./iosnoop -ts
Tracing block I/O. Ctrl-C to end.
STARTs          ENDs            COMM       PID    TYPE  DEV    BLOCK  BYTES  LATms
5982800.302061  5982800.302679  supervise  [...]        202,1  [...]          0.62
5982800.302423  5982800.302842  supervise  [...]        202,1  [...]          0.42
5982800.304962  5982800.305446  supervise  [...]        202,1  [...]          0.48
5982800.305250  5982800.305676  supervise  [...]        202,1  [...]          0.43
[…]
# ./iosnoop -h
USAGE: iosnoop [-hQst] [-d device] [-i iotype] [-p PID] [-n name] [duration]
   -d device    # device string (eg, "202,1)
   -i iotype    # match type (eg, '*R*' for all reads)
   -n name      # process name to match on I/O issue
   -p PID       # PID to match on I/O issue
   -Q           # include queueing time in LATms
   -s           # include start time of I/O (s)
   -t           # include completion time of I/O (s)
   -h           # this usage message
   duration     # duration seconds, and use buffers
[…]
slide 71:
iolatency
• Block I/O (disk) latency distributions:
# ./iolatency
Tracing block I/O. Output every 1 seconds. Ctrl-C to end.
  >=(ms) .. <(ms)   : I/O      |Distribution                          |
       0 -> 1       : 2104     |######################################|
       1 -> 2       : 280      |######                                |
       2 -> 4       : 2        |#                                     |
       4 -> 8       : 0        |                                      |
       8 -> 16      : 202      |####                                  |

  >=(ms) .. <(ms)   : I/O      |Distribution                          |
       0 -> 1       : 1144     |######################################|
       1 -> 2       : 267      |#########                             |
       2 -> 4       : 10       |#                                     |
       4 -> 8       : 5        |#                                     |
       8 -> 16      : 248      |#########                             |
      16 -> 32      : 601      |####################                  |
      32 -> 64      : 117      |####                                  |
[…]
slide 72:
opensnoop
• Trace open() syscalls showing filenames:
# ./opensnoop -t
Tracing open()s. Ctrl-C to end.
TIMEs     COMM      PID    FD   FILE
[...]     postgres  [...]  0x8  /proc/self/oom_adj
[...]     postgres  [...]  0x5  global/pg_filenode.map
[...]     postgres  [...]  0x5  global/pg_internal.init
[...]     postgres  [...]  0x5  base/16384/PG_VERSION
[...]     postgres  [...]  0x5  base/16384/pg_filenode.map
[...]     postgres  [...]  0x5  base/16384/pg_internal.init
[...]     postgres  [...]  0x5  base/16384/11725
[...]     svstat    [...]  0x4  supervise/ok
[...]     svstat    [...]  0x4  supervise/status
[...]     stat      [...]  0x3  /etc/ld.so.cache
[...]     stat      [...]  0x3  /lib/x86_64-linux-gnu/libselinux…
[...]     stat      [...]  0x3  /lib/x86_64-linux-gnu/libc.so.6
[...]     stat      [...]  0x3  /lib/x86_64-linux-gnu/libdl.so.2
[...]     stat      [...]  0x3  /proc/filesystems
[...]     stat      [...]  0x3  /etc/nsswitch.conf
[…]
slide 73:
funcgraph
• Trace a graph of kernel code flow:
# ./funcgraph -Htp 5363 vfs_read
Tracing "vfs_read" for PID 5363... Ctrl-C to end.
# tracer: function_graph
#    TIME         CPU   DURATION            FUNCTION CALLS
4346366.073832 | [...]             |  vfs_read() {
4346366.073834 |                   |    rw_verify_area() {
4346366.073834 |                   |      security_file_permission() {
4346366.073834 |                   |        apparmor_file_permission() {
4346366.073835 |       0.153 us    |          common_file_perm();
4346366.073836 |       0.947 us    |        }
4346366.073836 |       0.066 us    |        __fsnotify_parent();
4346366.073836 |       0.080 us    |        fsnotify();
4346366.073837 |       2.174 us    |      }
4346366.073837 |       2.656 us    |    }
4346366.073837 |                   |    tty_read() {
4346366.073837 |       0.060 us    |      tty_paranoia_check();
[…]
slide 74:
kprobe
• Dynamically trace a kernel function call or return, with variables, and in-kernel filtering:
# ./kprobe 'p:open do_sys_open filename=+0(%si):string' 'filename ~ "*stat"'
Tracing kprobe myopen. Ctrl-C to end.
postgres-1172 [000] d... 6594028.787166: open: (do_sys_open+0x0/0x220) filename="pg_stat_tmp/pgstat.stat"
postgres-1172 [001] d... 6594028.797410: open: (do_sys_open+0x0/0x220) filename="pg_stat_tmp/pgstat.stat"
postgres-1172 [001] d... 6594028.797467: open: (do_sys_open+0x0/0x220) filename="pg_stat_tmp/pgstat.stat"
^C
Ending tracing...
• Add -s for stack traces; -p for PID filter in-kernel.
• Quickly confirm kernel behavior; eg: did a tunable take effect?
slide 75:
Imagine Linux with Tracing
• These tools aren’t using dtrace4linux, SystemTap, ktap, or any other add-on tracer
• These tools use existing Linux capabilities
  – No extra kernel bits, not even kernel debuginfo
  – Just Linux’s built-in ftrace profiler
  – Demoed on Linux 3.2
• Solving real issues now
slide 76:
ftrace
• Added by Steven Rostedt and others since 2.6.27
• Already enabled on our servers (3.2+)
  – CONFIG_FTRACE, CONFIG_FUNCTION_PROFILER, …
  – Use directly via /sys/kernel/debug/tracing (raw example below)
• My front-end tools to aid usage
  – https://github.com/brendangregg/perf-tools
  – Unsupported hacks: see WARNINGs
  – Also see the trace-cmd front-end, as well as perf
• lwn.net: “Ftrace: The Hidden Light Switch”
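Using the raw ftrace interface directly (my illustration of what the perf-tools front-ends script for you):

# cd /sys/kernel/debug/tracing
# echo function_graph > current_tracer    # pick a tracer
# echo vfs_read > set_graph_function      # filter to one kernel function
# cat trace_pipe                          # stream events; Ctrl-C to stop
# echo nop > current_tracer               # clean up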
slide 77:
My perf-tools (so far…)
slide 78:
Tracing Summary
• ftrace
• perf_events
• eBPF
• SystemTap
• ktap
• LTTng
• dtrace4linux
• sysdig
slide 79:
perf_events
• aka the “perf” command
• In Linux. Add from linux-tools-common, …
• Powerful multi-tool and profiler
  – interval sampling, CPU performance counter events
  – user and kernel dynamic tracing
  – kernel line tracing and local variables (debuginfo)
  – kernel filtering, and in-kernel counts (perf stat)
• Not very programmable, yet
  – limited kernel summaries. May improve with eBPF.
slide 80:
perf_events Example
# perf record -e skb:consume_skb -ag
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.065 MB perf.data (~2851 samples) ]
# perf report
[...]
74.42%  swapper  [kernel.kallsyms]  [k] consume_skb
        |
        --- consume_skb
            arp_process
            arp_rcv
            __netif_receive_skb_core
            __netif_receive_skb
            netif_receive_skb
            virtnet_poll
            net_rx_action
            __do_softirq
            irq_exit
            do_IRQ
            ret_from_intr
            default_idle
            cpu_idle
            start_secondary
[…]
• Summarizing stack traces for a tracepoint
• perf_events can do many things – hard to pick just one example
slide 81:
eBPF
• Extended BPF: programs on tracepoints
  – High performance filtering: JIT
  – In-kernel summaries: maps
• Linux in 3.18? Enhance perf_events/ftrace/…?
# ./bitesize 1
writing bpf-5 -> /sys/kernel/debug/tracing/events/block/block_rq_complete/filter
I/O sizes:
    Kbytes        : Count
    4 -> 7        : 131
    8 -> 15       : 32
    16 -> 31      : 1
    32 -> 63      : 46
    64 -> 127     : 0
    128 -> 255    : 15
[…]
(the histogram above is an in-kernel summary)
slide 82:
SystemTap
• Fully programmable, fully featured
• Compiles tracing programs into kernel modules
  – Needs a compiler, and takes time
• “Works great on Red Hat”
  – I keep trying on other distros and have hit trouble in the past; make sure you are on the latest version.
  – I’m liking it a bit more after finding ways to use it without kernel debuginfo (a difficult requirement in our environment). Work in progress.
• Ever be mainline?
slide 83:
ktap
• Sampling, static & dynamic tracing
• Lightweight, simple. Uses bytecode.
• Suited for embedded devices
• Development appears suspended after suggestions to integrate with eBPF (which itself is in development)
• ktap + eBPF would be awesome: easy, lightweight, fast. Likely?
slide 84:
sysdig
• sysdig: innovative new tracer. Simple expressions:
sysdig fd.type=file and evt.failed=true
sysdig evt.type=open and fd.name contains /etc
sysdig -p"%proc.name %fd.name" "evt.type=accept and proc.name!=httpd"
• Replacement for strace? (or “perf trace” will)
• Programmable “chisels”. Eg, one of mine:
# sysdig -c fileslower 1
TIME                     PROCESS  TYPE  LAT(ms)  FILE
2014-04-13 20:40:43.973  cksum    read        2  /mnt/partial.0.0
2014-04-13 20:40:44.187  cksum    read        1  /mnt/partial.0.0
2014-04-13 20:40:44.689  cksum    read        2  /mnt/partial.0.0
[…]
• Currently syscalls and user-level processing only. It is optimized, but I’m not sure it can be enough for kernel tracing
slide 85:
Present & Future
• Present:
  – ftrace can serve many needs today
  – perf_events some more, esp. with debuginfo
  – ad hoc SystemTap, ktap, … as needed
• Future:
  – ftrace/perf_events/ktap with eBPF, for a fully featured and mainline tracer?
  – One of the other tracers going mainline?
slide 86:
The Tracing Landscape, Oct 2014 (my opinion)
(Figure: a plot of Ease of use, from “brutal” to “less brutal”, against Stage of Development, from alpha to mature, with Scope & Capability as a third dimension; plotted are sysdig, perf, stap, ftrace, dtrace4L., ktap, and eBPF.)
slide 87:
In Summary
slide 88:
In Summary…
• Plus diagrams for benchmarking, tuning, tracing
• Try to start with the questions (methodology), to help guide your use of the tools
• I hopefully turned some unknown unknowns into known unknowns
slide 89:
References & Links
• Systems Performance: Enterprise and the Cloud, Prentice Hall, 2014
• http://www.brendangregg.com/linuxperf.html
• http://www.brendangregg.com/perf.html#FlameGraphs
• nicstat: http://sourceforge.net/projects/nicstat/
• tiptop: http://tiptop.gforge.inria.fr/
  – Tiptop: Hardware Performance Counters for the Masses, Erven Rohou, Inria Research Report 7789, Nov 2011.
• ftrace & perf-tools:
  – https://github.com/brendangregg/perf-tools
  – http://lwn.net/Articles/608497/
• MSR tools: https://github.com/brendangregg/msr-cloud-tools
• pcstat: https://github.com/tobert/pcstat
• eBPF: http://lwn.net/Articles/603983/
• ktap: http://www.ktap.org/
• SystemTap: https://sourceware.org/systemtap/
• sysdig: http://www.sysdig.org/
• http://www.slideshare.net/brendangregg/linux-performance-analysis-and-tools
• Tux by Larry Ewing; Linux® is the registered trademark of Linus Torvalds in the U.S. and other countries.
slide 90:
Thanks
• Questions?
• http://slideshare.net/brendangregg
• http://www.brendangregg.com
• bgregg@netflix.com
• @brendangregg