Tracing Summit Europe 2014: From DTrace to Linux

Video: https://www.youtube.com/watch?v=TMXZcgnhXvg&list=UU3S1jlPpZUyx8ND3KKPhPvw

Talk at the Tracing Summit 2014, Düsseldorf, by Brendan Gregg.

Description: "What can Linux learn from DTrace: what went well, and what didn't go well, on its path to success? This talk will discuss not just the DTrace software, but lessons from the marketing and adoption of a system tracer, and an inside look at how DTrace was really deployed and used in production environments. It will also cover ongoing problems with DTrace, and how Linux may surpass them and continue to advance the field of system tracing. A world expert and core contributor to DTrace, Brendan now works at Netflix on Linux performance with the various Linux tracers (ftrace, perf_events, eBPF, SystemTap, ktap, sysdig, LTTng, and the DTrace Linux ports), and will summarize his experiences and suggestions for improvements. He has also been contributing to various tracers: recently promoting ftrace and perf_events adoption through articles and front-end scripts, and testing eBPF."


PDF: TracingSummit2014_FromDTraceToLinux.pdf

Keywords (from pdftotext):

slide 1:
    TRACING SUMMIT EUROPE
    Oct, 2014
    From DTrace To Linux:
    What can Linux learn from DTrace?
    Brendan Gregg
    Senior Performance Architect
    bgregg@netflix.com
    @brendangregg
    
slide 2:
    Brendan Gregg
    • DTrace contributions include:
      – Primary author of the DTrace book
      – DTraceToolkit
      – dtrace-cloud-tools
      – DTrace network providers
    • I now work on Linux at Netflix
      – using: ftrace, perf_events, SystemTap, ktap, eBPF, …
      – created: perf-tools, msr-cloud-tools
    • Opinions in this talk are my own
    
slide 3:
    Agenda
    1. DTrace
      – What is DTrace, really?
      – Who is DTrace for, really?
      – Why doesn’t Linux have DTrace?
      – What worked well?
      – What didn’t work well?
    2. Linux Tracers
      – ftrace, perf_events, eBPF, …
    Topics include adoption, marketing, technical challenges, and our usage at Netflix.
    
slide 4:
    	
      
    What is DTrace, really?
    
slide 5:
    Technology + Marketing
    (Like many other company products)
    
slide 6:
    Prior Technology
    Kerninst: kernel dynamic tracing, Solaris 2.5.1, 1999
    
slide 7:
    Prior Technology
    Early dynamic tracers weren’t safe
    
slide 8:
    Prior Technology
    • Also:
      – Sun’s TNF
      – DProbes: user + kernel dynamic tracing
      – Linux Trace Toolkit (LTT)
      – Others, including offline binary instrumentation
    • DProbes and LTT were combined in Nov 2000, but not integrated into the Linux kernel [1]
    • Sun set forth to produce a production-safe tool
    [1] http://lkml.iu.edu/hypermail/linux/kernel/0011.3/0183.html
    
slide 9:
slide 10:
    Technology
    • DTrace:
      – Safe for production use
        • You might step on your foot (overhead), but you won’t shoot it off
      – Dynamic tracing, static tracing, and profiling
      – User- and kernel-level, unified
      – Programmatic: filters and summaries
      – Solved countless issues in dev and prod
    • That’s what DTrace is for me
      – An awesome technology, often needed to root cause kernel & app issues
    • But for most people….
    
slide 11:
    A Typical Conversation…
    “Does Linux have DTrace yet?”
    “No.”
    “That’s a pity”
    “Why?”
    “DTrace is awesome!”
    “Why, specifically?”
    “I’m not sure”
    “Have you used it?”
    “No.”
    
slide 12:
    Marketing
    
slide 13:
    Early Marketing
    
slide 14:
    Early Marketing
    • DTrace had awesome marketing
      – People still want it but don’t really know why
    • Early marketing: traditional, $$$
      – Great marketing product managers
        • 10 Moves Ahead campaign: airports, stations, etc.
      – Sun sales staff pitched DTrace directly
      – Sun technology evangelists
    • Benefits
      – Not another Sun tech no one knew about
      – Compelled people to learn more, try it out
    
slide 15:
    Marketing Evolved
    • Sun marketing became innovative
      – Engineering blogs, BigAdmin
      – Marketing staff who used and understood DTrace
        • Who could better articulate its value
      – Marketing more directly from the engineers
    
slide 16:
    Later Marketing
    
slide 17:
    Later Marketing
    • Many initiatives by Deirdré Straughan:
      – Social media, blogs, events, the ponycorn mascot, ...
      – Video and share everything: all meetups, talks
    • Blogs:
      – including http://dtrace.org; my own > 1M views
    • Books:
      – my own > 30k sold
    • Videos:
      – me shouting while DTracing disks, ~1M views
    • Language support exposed new communities to DTrace
    
slide 18:
slide 19:
slide 20:
    ???	
      
    
slide 21:
    	
      
    Who is DTrace for, really?
    
slide 22:
    DTrace end-users: Current
    Estimated
    DTrace guide users: ~100
    Script end-users: ~5,000
    Note: 91.247% of statistics are made up
    
slide 23:
    DTrace end-users: Current
    • DTrace guide users: ~100
      – Understand the ~400 page Dynamic Tracing Guide
      – Develop their own scripts from scratch
      – Understand overhead intuitively
    • Script end-users: ~5000
      – DTraceToolkit, Google
      – Run scripts. Some tweaks/customizations.
    
slide 24:
    DTrace end-users: Future
    eg, Oracle ZFS Storage Appliance Analytics
    
slide 25:
    DTrace end-users: Future
    Possible Future
    DTrace guide users: ~100
    Script end-users: ~5,000
    GUI end-users: >50,000
    
slide 26:
    Company Usage
    
slide 27:
    Company Usage
    • Practical usage for most companies:
      – A) A performance team (or person)
        • Acquires useful scripts
        • Develops custom scripts
      – B) The rest of the company asks (A) for script/help
        • They need to know what’s possible, to know to ask
      – Or, you buy/develop a GUI that everyone can use
    • There are some exceptions
    
slide 28:
    	
      
    Why doesn’t Linux have DTrace?
    
slide 29:
    	
      
    Why doesn’t Linux have a DTrace equivalent?
    4 Answers…
    
slide 30:
    1. It does (sort of)
    ftrace
    perf_events
    
slide 31:
    1. It does (sort of)
    • Linux has changed
      – In 2005, numerous Linux issues were difficult or impossible to solve. Linux needed a DTrace equivalent.
      – By 2014, many of these are now solvable, especially using ftrace, perf_events, kprobes, uprobes: all part of the Linux kernel
    
slide 32:
    2. Technical
    semantic error: missing x86_64
    kernel/module debuginfo
    
slide 33:
    2. Technical
    • Linux is a more difficult environment
      – Solaris always has symbols, via CTF, which DTrace uses for dynamic tracing
      – Linux doesn’t always have symbols/debuginfo
    
slide 34:
    3. Linux isn’t a Company
    “All the wood behind one arrow”
    – Scott McNealy, CEO, Sun Microsystems
    
slide 35:
    3. Linux isn’t a Company
    • Linus can refuse patches, but can’t stop projects
      – The tracing wood is split between many arrows
        • ftrace, perf_events, eBPF, SystemTap, ktap, LTTng, …
      – And we are a small community: there’s not much wood to go around!
    
slide 36:
    4. No Trace Race
    
slide 37:
    4. No Trace Race
    • Post 2001, Solaris was losing ground to Linux. Sun desperately needed differentiators to survive
      – Three top Sun engineers spent years on DTrace
      – Sun marketing gave it their best shot…
    • This circumstance will never exist again
      – For Linux today, it would be like having Linus, Ingo, and Steven do tracing full-time for three years, followed by a major marketing campaign
    • There may never be another trace race. Unless…
    
slide 38:
    	
      
    Why doesn’t Linux have DTrace itself?
    2 Answers…
    
slide 39:
    1. The CDDL
    From: Claire Giordano
    To: license-discuss@opensource.org [Open Source Initiative]
    Subject: For Approval: Common Development and Distribution License (CDDL)
    Date: Wed, 01 Dec 2004 19:47:39 -0800
    […]
    Like the MPL, the CDDL is not expected to be compatible with the GPL,
    since it contains requirements that are not in the GPL (for example,
    the "patent peace" provision in section 6). Thus, it is likely that
    files released under the CDDL will not be able to be combined with
    files released under the GPL to create a larger program.
    […]
    CDDL Team, Sun Microsystems
    Source: http://lwn.net/Articles/114840/
    
slide 40:
    1. The CDDL
    • Linux traditionally includes the tracer/profiler in the (GPL) kernel, but the DTrace license is CDDL
      – Out-of-tree projects have maintenance difficulties
      – Oracle (who own the DTrace copyrights) could relicense it as GPL, but haven’t, and may never do this
    • Note that ZFS on Linux is doing well, despite being CDDL, and out of tree
    
slide 41:
    2. DTrace ports
    
slide 42:
    2. DTrace ports
    • There are two ports, but both currently incomplete
    • A) https://github.com/dtrace4linux/linux:
      – Mostly one UK developer, Paul Fox, as a hobby since 2008 (when he isn’t developing on the Raspberry Pi)
    • B) Oracle Linux DTrace:
      – Open source kernel, closed source user-level ($)
        • We pay for monitoring tools; why not this too?
      – Experienced engineers, test suite focused
      – Had been good progress, but no updates for months
    
slide 43:
    	
      
    What with DTrace worked well?
    5 Key items…
    
slide 44:
    1. Production Safety
    
slide 45:
    1. Production Safety
    • DTrace architecture
      – Restricted probe context: no kernel facility calls, restricted instructions, no backwards branches, restricted loads/stores
      – Heartbeat: aborted due to systemic unresponsiveness
    • DTrace Test Suite
      – Hundreds of tests
    • Linux is learning this:
      – Oracle Linux DTrace is taking the test suite seriously
      – ftracetest
    
slide 46:
    2. All the wood behind one arrow
    DTrace
    
slide 47:
    2. All the wood behind one arrow
    • Can Linux learn this?
      – Can we vote some off the Linux tracing island?
        • At least, no new tracers in 2015, please!
    
slide 48:
    3. In-Kernel Aggregations
               value  ------------- Distribution ------------- count
                4096 |
                8192 |@@@@@@@@@@@@@@@@@@@@@@@@@@               1085
               16384 |@@@@@@@@@@@                              443
               32768 |@@                                       98
               65536 |
              131072 |
              262144 |
              524288 |
             1048576 |                                         11
             2097152 |
    
slide 49:
    3. In-Kernel Aggregations
    • Changed how investigations are conducted
      – rapid, live analysis
    • Low overhead:
      – per-CPU storage, asynchronous kernel->user transfers
    • Key uses:
      – summary statistics: count, avg, min, max
      – histograms: quantize, lquantize
      – by-key: execname, kernel stacks, user stacks
    • Linux can learn:
      – Need aggregations (eBPF maps, SystemTap, ktap?)
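
To make the aggregation idea above concrete, here is a minimal user-space sketch in Python of a power-of-two histogram in the style of quantize, keyed by process name. It only illustrates the bucketing and the by-key summary; it does not model the in-kernel, per-CPU storage the slide describes. The function names (bucket_floor, aggregate, render) and the sample values are hypothetical.

```python
from collections import Counter, defaultdict


def bucket_floor(value):
    """Return the largest power of two <= value (0 for values below 1)."""
    if value < 1:
        return 0
    return 1 << (int(value).bit_length() - 1)


def aggregate(events):
    """Summarize (key, value) events into per-key power-of-two histograms.

    Only bucket counts are kept, not individual events, so the summary
    stays small however many events arrive.
    """
    hists = defaultdict(Counter)
    for key, value in events:
        hists[key][bucket_floor(value)] += 1
    return hists


def render(hist, width=40):
    """Format one histogram as text rows: bucket, bar, count."""
    total = sum(hist.values())
    rows = []
    for bucket in sorted(hist):
        count = hist[bucket]
        bar = "@" * (count * width // total)
        rows.append(f"{bucket:>10} |{bar:<{width}} {count}")
    return "\n".join(rows)


# Hypothetical sample: (process name, I/O size in bytes)
sample = [("dd", 8192), ("dd", 9000), ("dd", 16384), ("tar", 4096), ("dd", 40000)]
hists = aggregate(sample)
print(render(hists["dd"]))
```

Because only counts per bucket are stored, the amount of data handed to the reader depends on the number of distinct keys and buckets, not on the event rate.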
      
    
slide 50:
    4. Many Example Scripts
    
slide 51:
    4. Many Example Scripts
    
slide 52:
    4. Many Example Scripts
    • Scripts serve many needs:
      – tools: ready to run
      – examples: learn by-example
      – marketing: each is a use case
    • DTrace book scripts
      – 150+ short examples
    • DTraceToolkit scripts
      – 230 more scripts
      – all have man pages, example files, and are tested
      – An essential factor in DTrace’s adoption
    
slide 53:
    4. Many Example Scripts
    • Linux can learn:
      – Many users will just run scripts, not write them
      – People want good short examples
      – If they aren’t tested, they don’t work
        • It’s easy to generate metrics that kind-of work; it’s hard to make them reliable for different workloads.
      – Maintenance of dynamic tracing scripts is painful
        • The instrumented code can change
        • Need more static tracepoints
    
slide 54:
    5. Marketing
    
slide 55:
    5. Marketing
    • DTrace was effectively marketed in many ways
      – Traditional, social, blogs, scripts, ponycorn, …
    • Linux has virtually no marketing for its tracers
      – ftrace is great, if you ever discover it; etc.
      – Marketing spend is on commercial products instead
    • Linux can learn to market what it has
      – Tracers may also benefit from “a great name and a cute logo” [1]
      – “eBPF” is not catchy, and doesn’t convey meaning
    [1] http://thenewstack.io/why-did-docker-catch-on-quickly-and-why-is-it-so-interesting/
    
slide 56:
    Cute Tracing Logos
    ftrace
    ktap
    perf_events
    LTTng
    SystemTap
    dtrace4linux
    Ponies by Deirdré Straughan, using: http://generalzoi.deviantart.com pony creator
    
slide 57:
    Other Things
    • Programmable/scriptable
    • Built-in stability semantics
    
slide 58:
    	
      
    What with DTrace didn’t work well?
    
slide 59:
    	
      
    What with DTrace didn’t work well?
    5 Key Issues…
    
slide 60:
    1. Adoption
    
slide 61:
    1. Adoption
    • Few customers ever wrote DTrace scripts
      – DTrace should have been used more than it was
      – Sun’s “killer” tool just wasn’t
      – Better pickup rate with developers, not sysadmins
    • Many customers just ran my scripts
      – Not ideal, but better than nothing
      – This wasn’t what many at Sun dreamed
    • Internal adoption was slow, limited
      – Sun could have done much more, but didn’t
    • The problem was knowing what to do with it
      – The syntax was the easy part
    
slide 62:
    1. Adoption
    • Linux can learn:
      – Adoption is about more than just the technology
        • Documentation, marketing, training, community
      – Teaching what it does is more important than how
        • Everyone needs to know when to ask for it, not necessarily how to use it
      – Needs an adoption curve (not a step function)
        • Tools, one-liners, short scripts, …
    
slide 63:
    2. Training
    This is to certify that
    Brendan Gregg
    Has Completed the Sun Educational course
    DTrace is a Solaris differentiator
    
slide 64:
    2. Training
    • Early training was not very effective
      – Sun began including the DTraceToolkit in courses, with better success
    • It gradually improved
      – The last courses I developed and taught (after Sun) used simulated problems for the students to solve on their own with DTrace
    • Linux can learn:
      – Lab-based training is most effective. Online tutorials?
    
slide 65:
    3. GUIs
    
slide 66:
    3. GUIs
    • Dozens of performance monitoring products, but almost no meaningful DTrace support
    • A couple of exceptions:
      – Oracle ZFS Storage Appliance Analytics
        • Formerly the Sun Storage 7000 Analytics
        • Should be generalized. Oracle Solaris 11.3?
      – Joyent Cloud Analytics
    
slide 67:
    3. GUIs
    • Linux can learn:
      – Real adoption possible through scripts & GUIs
      – Use the GUI to add value to the data
        • Heat maps: latency, utilization, offset
        • Flame graphs
        • Time series thread visualizations (Trace Compass)
        • ie, not just line graphs!
      – Commercial GUI products have marketing budget
        • Application perf monitoring was $2.4B in 2013 [1]
    [1] https://www.gartner.com/doc/2752217/market-share-analysis-application-performance
    
slide 68:
    3. GUIs
    • Heat maps are an example must-have use case for trace data
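
As a rough illustration of how trace data could feed a latency heat map, the Python sketch below counts events per (time column, latency row) cell; a GUI would then map each count to a color intensity. This is an assumption about one simple way to do the bucketing, not a description of any particular product. The function name heatmap_cells, the units, and the sample trace are hypothetical.

```python
from collections import Counter


def heatmap_cells(events, time_step_ms, latency_step_us):
    """Count events per (time column, latency row) cell.

    events: iterable of (timestamp_ms, latency_us) integer pairs.
    Integer units are used so that bucket boundaries are exact.
    """
    cells = Counter()
    for timestamp_ms, latency_us in events:
        col = timestamp_ms // time_step_ms
        row = latency_us // latency_step_us
        cells[(col, row)] += 1
    return cells


# Hypothetical trace: (completion time in ms, I/O latency in us)
trace = [(100, 400), (200, 600), (900, 4100), (1300, 500), (1700, 500)]
cells = heatmap_cells(trace, time_step_ms=1000, latency_step_us=1000)
for (col, row), count in sorted(cells.items()):
    print(f"second {col}, latency {row}-{row + 1} ms: {count} events")
```

Unlike a line graph of average latency, the cell counts keep outliers visible: the single 4.1 ms event lands in its own row instead of being averaged away.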
      
    
slide 69:
    4. Overheads
    While the DTrace technology is awesome, it does have some minor technical challenges as well
    
slide 70:
    4. Overheads
    • While optimized, for many targets the DTrace CPU overheads can still be too high
      – Scheduler tracing, memory allocation tracing
      – User-level dynamic tracing (fast trap)
      – VM probes (eg, Java disables some probes by default)
      – 10 GbE Network I/O, etc…
    • In some cases it doesn’t matter
      – Desperation: system already melting down
      – Troubleshooting in dev: speed not a concern
    • Linux can learn:
      – Speed can matter, faster makes more possible
    
slide 71:
    5. Syscall Provider
    
slide 72:
    5. Syscall Provider
    • Solaris DTrace instrumented the trap table, and called it the syscall provider
      – Which is actually an unstable interface
      – Breaks between Solaris versions
        • And really broke in Oracle Solaris 11
      – Other weird caveats
    • Linux can learn:
      – syscalls are the #1 target for users learning system tracers. The API should be easy and stable.
    
slide 73:
    Other Issues
    • The lack of:
      – Bounded loops (like SystemTap)
      – Kernel instruction tracing (like perf_events)
      – Easy PMC interface (like perf stat)
      – Aggregation key/value access (stap, ktap, eBPF)
      – Kernel source (issue for Oracle Solaris only)
    • 4+ second startup times
      – Several Linux tracers start instantly
    
slide 74:
    	
      
    	
      
    	
      
    From DTrace to Linux Tracers (2014)
    
slide 75:
    • Massive AWS EC2 Linux cloud, with FreeBSD appliances for content delivery
    • Performance is critical: >50M subscribers
    • Just launched in Europe!
    
slide 76:
    System Tracing at Netflix
    • Present:
      – ftrace can serve many needs
      – perf_events some more, esp. with debuginfo
      – SystemTap as needed, esp. for Java
      – ad hoc other tools
    • Future:
      – ftrace/perf_events/ktap with eBPF, for a fully featured and mainline tracer?
      – One of the other tracers going mainline?
    • Summarizing 4 tracers…
    
slide 77:
    1. ftrace
    
slide 78:
    1. ftrace
    • Tracing and profiling: /sys/kernel/debug/tracing
      – added by Steven Rostedt and others since 2.6.27, and already enabled on our servers (3.2+)
    • Experiences:
      – very useful capabilities: tracing, counting
      – surprising features: graphing (latencies), filters
    • Front-end tools to ease use
      – https://github.com/brendangregg/perf-tools
      – WARNING: these are unsupported hacks
      – There’s also the trace-cmd front-end by Steven
    • 4 examples…
    
slide 79:
    perf-tools: iosnoop
    • Block I/O (disk) events with latency:
    # ./iosnoop -ts
    Tracing block I/O. Ctrl-C to end.
    STARTs          ENDs            COMM       PID  TYPE  DEV    BLOCK  BYTES  LATms
    5982800.302061  5982800.302679  supervise             202,1                 0.62
    5982800.302423  5982800.302842  supervise             202,1                 0.42
    5982800.304962  5982800.305446  supervise             202,1                 0.48
    5982800.305250  5982800.305676  supervise             202,1                 0.43
    […]
    # ./iosnoop -h
    USAGE: iosnoop [-hQst] [-d device] [-i iotype] [-p PID] [-n name] [duration]
       -d device    # device string (eg, "202,1")
       -i iotype    # match type (eg, '*R*' for all reads)
       -n name      # process name to match on I/O issue
       -p PID       # PID to match on I/O issue
       -Q           # include queueing time in LATms
       -s           # include start time of I/O (s)
       -t           # include completion time of I/O (s)
       -h           # this usage message
       duration     # duration seconds, and use buffers
    […]
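The LATms column in the iosnoop output above looks consistent with ENDs minus STARTs, converted to milliseconds. The following is a minimal Python sketch of that arithmetic, not part of perf-tools; the function name `latency_ms` and the `SAMPLES` list are illustrative, and the subtraction itself is an assumption inferred from the four rows shown.

```python
from decimal import Decimal

# (STARTs, ENDs) pairs copied from the iosnoop -ts output above, kept as
# strings so Decimal preserves every microsecond digit.
SAMPLES = [
    ("5982800.302061", "5982800.302679"),
    ("5982800.302423", "5982800.302842"),
    ("5982800.304962", "5982800.305446"),
    ("5982800.305250", "5982800.305676"),
]


def latency_ms(start_s: str, end_s: str) -> Decimal:
    """Return (end - start) in milliseconds, rounded to two decimals."""
    delta_s = Decimal(end_s) - Decimal(start_s)
    return (delta_s * 1000).quantize(Decimal("0.01"))


for start, end in SAMPLES:
    # Prints each pair followed by its computed latency in ms.
    print(start, end, latency_ms(start, end))
```

For these four rows the computed values agree with the LATms column (0.62, 0.42, 0.48, 0.43). Decimal is used rather than float so that timestamps this large do not lose microsecond precision.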
    
slide 80:
    perf-tools: iolatency
    • Block I/O (disk) latency distributions:
    # ./iolatency
    Tracing block I/O. Output every 1 seconds. Ctrl-C to end.
      >=(ms) .. <(ms)   : I/O      |Distribution                          |
           0 -> 1       : 1144     |######################################|
           1 -> 2       : 267      |#########
           2 -> 4       : 10
           4 -> 8       : 5
           8 -> 16      : 248      |#########
          16 -> 32      : 601      |####################
          32 -> 64      : 117      |####
    […]
    • User-level processing sometimes can’t keep up
      – Over 50k IOPS. Could buffer more as a workaround, but would prefer in-kernel aggregations
    
slide 81:
    perf-tools: opensnoop
    • Trace open() syscalls showing filenames:
    # ./opensnoop -t
    Tracing open()s. Ctrl-C to end.
    TIMEs  COMM      PID  FD   FILE
           postgres       0x8  /proc/self/oom_adj
           postgres       0x5  global/pg_filenode.map
           postgres       0x5  global/pg_internal.init
           postgres       0x5  base/16384/PG_VERSION
           postgres       0x5  base/16384/pg_filenode.map
           postgres       0x5  base/16384/pg_internal.init
           postgres       0x5  base/16384/11725
           svstat         0x4  supervise/ok
           svstat         0x4  supervise/status
           stat           0x3  /etc/ld.so.cache
           stat           0x3  /lib/x86_64-linux-gnu/libselinux…
           stat           0x3  /lib/x86_64-linux-gnu/libc.so.6
           stat           0x3  /lib/x86_64-linux-gnu/libdl.so.2
           stat           0x3  /proc/filesystems
           stat           0x3  /etc/nsswitch.conf
    […]
    
slide 82:
    perf-tools: kprobe
    • Just wrapping capabilities eases use. Eg, kprobes:
    # ./kprobe 'p:open do_sys_open filename=+0(%si):string' 'filename ~ "*stat"'
    Tracing kprobe myopen. Ctrl-C to end.
    postgres-1172 [000] d... 6594028.787166: open: (do_sys_open+0x0/0x220) filename="pg_stat_tmp/pgstat.stat"
    postgres-1172 [001] d... 6594028.797410: open: (do_sys_open+0x0/0x220) filename="pg_stat_tmp/pgstat.stat"
    postgres-1172 [001] d... 6594028.797467: open: (do_sys_open+0x0/0x220) filename="pg_stat_tmp/pgstat.stat"
    ^C
    Ending tracing...
    • By some definition of “ease”. Would like easier symbol usage, instead of +0(%si).
    
slide 83:
    1. ftrace
    • Suggestions:
      – I’m blogging and so can you!
      – Function profiler:
        • Can these in-kernel counts be used for other vars? Eg, associative array or histogram of %dx
      – Function grapher:
        • Can the timing be exposed by some vars? Picture histogram of latency
      – Multi-user access possible?
    
slide 84:
    2. perf_events
    
slide 85:
    2. perf_events
    • In-kernel, tools/perf, multi-tool, “perf” command
    • Experiences:
      – Stable, powerful, reliable
      – The sub options can feel inconsistent (perf bench?)
      – Amazing with kernel debuginfo, when we have it
      – We use it for CPU stack profiles all the time
        • And turn them into flame graphs, which have solved numerous issues so far…
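Turning stack profiles into a flame graph starts by collapsing identical stacks into counts. The Python sketch below shows that folding step on made-up samples; the frame names are hypothetical, and the "frame;frame;frame count" line format is the folded form that the FlameGraph scripts work with, reproduced here from memory rather than from the slides.

```python
from collections import Counter


def fold_stacks(samples):
    """Collapse sampled stacks (lists of frames, root first) into counts.

    Returns a Counter mapping 'root;...;leaf' to the number of samples.
    """
    return Counter(";".join(stack) for stack in samples)


def folded_lines(counts):
    """Render one 'stack count' line per unique stack, sorted by stack."""
    return [f"{stack} {n}" for stack, n in sorted(counts.items())]


# Hypothetical samples: each list is one sampled stack, root frame first.
SAMPLES = [
    ["java", "read", "tcp_recvmsg"],
    ["java", "read", "tcp_recvmsg"],
    ["java", "gc_thread"],
    ["swapper", "cpu_idle"],
]

for line in folded_lines(fold_stacks(SAMPLES)):
    print(line)
```

Each unique stack becomes one tower in the graph, and its count sets the tower's width, so the widest towers show where CPU time is going.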
      
    
slide 86:
    perf CPU Flame Graph
    [Figure: CPU flame graph with annotated regions: Kernel, TCP/IP, Broken Java stacks (missing frame pointer), GC, Locks, Time, Idle thread, epoll]
    
slide 87:
    2. perf_events
    • Suggestions:
      – Support for function argument symbols without a full debuginfo
      – Rework scripting framework (eg, try porting iosnoop)
        • eg, “perf record” may need a tunable timeout to trigger data writes, for efficient interactive scripts
      – Break up the multi-tool a bit (separate perf bench)
      – eBPF integration for custom aggregations?
    
slide 88:
    3. SystemTap
    
slide 89:
    3. SystemTap
    • The most powerful of the tracers
    • Used for the deepest custom tracing
      – Especially Java hotspot probes
    • Experiences:
      – Undergoing a reset. Switching to the latest SystemTap version, and a newer kernel. So far, so good.
      – Trying out nd_syscall for debuginfo-less tracing
    • Suggestions:
      – More non-debuginfo tapset functionality
    
slide 90:
    4. eBPF
    
slide 91:
    4. eBPF
    • Extended BPF: programs on tracepoints
      – High performance filtering: JIT
      – In-kernel summaries: maps
        • eg, in-kernel latency heat map (showing bimodal):
    [Figure: latency heat map over Time, annotated "Low latency cache hits" and "High latency device I/O"]
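A latency heat map is a two-dimensional histogram: time on one axis, latency on the other, with a count in each cell. The Python sketch below illustrates the summary that a map would hold; it is plain user-space Python, not eBPF, and the function name `heat_map`, the step sizes, and the sample events are all made up for illustration.

```python
from collections import Counter


def heat_map(events, time_step_s=1.0, latency_step_ms=1.0):
    """Bucket (timestamp_s, latency_ms) events into heat map cells.

    Returns a Counter keyed by (time_column, latency_row).
    """
    cells = Counter()
    for ts, lat in events:
        cells[(int(ts // time_step_s), int(lat // latency_step_ms))] += 1
    return cells


# Made-up events: a fast group (cache hits) and a slow group (device I/O).
EVENTS = [(0.1, 0.2), (0.5, 0.3), (0.9, 8.4), (1.2, 0.1), (1.7, 9.9)]

for (col, row), count in sorted(heat_map(EVENTS).items()):
    print(f"time column {col}, latency row {row}: {count}")
```

A bimodal distribution appears as two separate bands of populated rows. Keeping only these cell counters in the kernel means user space reads a small table per interval rather than every event.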
      
    
slide 92:
    4. eBPF
    • Experiences:
      – Can have lower CPU overhead than DTrace
      – Very powerful: really custom maps
      – Assembly version very hard to use; C is better, but still not easy
    • Suggestions:
      – Integrate: custom in-kernel aggregations are the missing piece
    
slide 93:
    Other Tracers
    • Experiences and suggestions:
      – ktap
      – LTTng
      – Oracle Linux DTrace
      – dtrace4linux
      – sysdig
    
slide 94:
    The Tracing Landscape, Oct 2014 (my opinion)
    [Figure: tracers plotted by Ease of use (brutal to less brutal), Stage of Development (alpha to mature), and Scope & Capability. Tracers shown: sysdig, perf, stap, ftrace, dtrace4L., ktap, eBPF]
    
slide 95:
    Summary
    • DTrace is an awesome technology
      – Which has also had awesome marketing
        • Traditional, social, sales, blogs, …
      – Most people won’t use it directly, and that’s ok
        • Drive usage via GUIs and scripts
    • Linux Tracers are catching up, and may surpass
      – It’s not 2005 anymore
        • Now we have ftrace, perf_events, kprobes, uprobes, …
      – Speed and aggregations matter
        • If DTrace is Kitty Hawk, eBPF is a jet engine
    
slide 96:
    Acks
    • dtrace.conf X-ray pony art by substack
    • http://www.raspberrypi.org/ Raspberry Pi image
    • http://en.wikipedia.org/wiki/Crash_test_dummy photo by Brady Holt
    • https://findery.com/johnfox/notes/all-the-wood-behind-one-arrow
    • http://en.wikipedia.org/wiki/Early_flying_machines hang glider image
    • http://www.beginningwithi.com/2010/09/12/how-the-dtrace-book-got-done/
    • http://www.cafepress.com/joyentsmartos.724465338
    • http://generalzoi.deviantart.com/art/Pony-Creator-v3-397808116
    • Tux by Larry Ewing; Linux® is the registered trademark of Linus Torvalds in the U.S. and other countries.
    • Thanks Dominic Kay and Deirdré Straughan for feedback
    
slide 97:
    Links
    https://www.usenix.org/legacy/event/usenix04/tech/general/full_papers/cantrill/cantrill.pdf
    ftrace & perf-tools
    • https://github.com/brendangregg/perf-tools
    • http://lwn.net/Articles/608497/
    eBPF: http://lwn.net/Articles/603983/
    ktap: http://www.ktap.org/
    SystemTap: https://sourceware.org/systemtap/
    sysdig: http://www.sysdig.org/
    http://lwn.net/Articles/114840/ CDDL
    http://dtrace.org/blogs/ahl/2011/10/05/dtrace-for-linux-2/
    ftp://ftp.cs.wisc.edu/paradyn/papers/Tamches99Using.pdf
    http://www.brendangregg.com/heatmaps.html
    http://lkml.iu.edu/hypermail/linux/kernel/0011.3/0183.html LTT + DProbes
    
slide 98:
    Thanks
    • Questions?
    • http://slideshare.net/brendangregg
    • http://www.brendangregg.com
    • bgregg@netflix.com
    • @brendangregg