This is a selection of my most useful and popular material. See my homepage for the full list.
- perf Examples: Linux perf_events one-liners, examples, and visualizations.
- The USE Method: a performance methodology for identifying resource bottlenecks.
- USE Method: Rosetta Stone: performance checklists for different OSes.
- Active Benchmarking: a methodology for performing accurate benchmarks.
- Off-CPU Analysis: a methodology for analyzing blocked thread time, complementary to CPU analysis.
- CPU Flame Graphs: a visualization for sampled stack traces.
- Latency Heat Maps: a visualization for latency distributions over time.
- Frequency Trails: a visualization for multiple distributions.
- CPU Utilization is Wrong: a post explaining the growing problem of memory stall cycles dominating the %CPU metric (2017).
- gdb Debugging Full Example (Tutorial): a post sharing an entire debugging session, including output and explanations (2016).
- The Flame Graph: an article for ACMQ that defines and explains flame graphs, and discusses possible future developments (2016).
- Linux Performance Analysis in 60,000 Milliseconds (PDF): for the Netflix Tech Blog, by myself and the perf team (2015).
- Java in Flames (PDF): for the Netflix Tech Blog, introducing mixed-mode Java flame graphs (2015).
- eBPF One Small Step: introducing Linux eBPF and explaining the capabilities this feature brings (2015).
- Ftrace: The Hidden Light Switch: an lwn.net article about Linux ftrace (2014).
- The Benchmark Paradox: a short blog post explaining a seeming paradox in benchmark evaluations (2014).
- strace Wow Much Syscall: my warning blog post about strace(1), along with many bad strace-related jokes (2014).
- The Case of the Clumsy Kernel (PDF): a kernel performance analysis article for USENIX ;login: (2013).
- The Greatest Tool that Never Worked: har: about the value of ideas in software screenshots (2013).
- Top 10 DTrace Scripts for Mac OS X: includes an intro to command-line DTrace usage (2011).
- Visualizing System Latency: an article for CACM about latency heat maps (2010).
- perf-tools: perf analysis tools based on Linux perf_events and ftrace.
- FlameGraph: a visualization for sampled stack traces, used for performance analysis.
- HeatMap: a program for generating interactive SVG heat maps from trace data.
- Specials: "special" tools for system administrators.
Linux 4.x Performance: Using BPF Superpowers, Facebook's Performance @Scale, 2016
BPF: Tracing and More, LCA, 2017
System Methodology, ACM Applicative, New York, 2016
Performance Checklists for SREs, SREcon Santa Clara, 2016
Linux Performance Tools, O'Reilly Velocity, 2015
Performance Tuning EC2 Instances, AWS re:Invent, 2014
From Clouds to Roots, Surge, 2014
Blazing Performance with Flame Graphs, USENIX/LISA, 2013
Stop the Guessing: Performance Methodologies for Production Systems, Velocity, 2013
- Give me 15 minutes and I'll change your view of Linux tracing, USENIX/LISA, 2016: youtube (18 mins).
- Broken Performance Tools for QConSF, 2015: slideshare, infoq (slides, video) (50 mins).
- Netflix Instance Analysis Requirements for Monitorama, 2015: blog (slides, video) (34 mins).
- What Linux Can Learn from Solaris Performance, and Vice-Versa, SCaLE, 2015: youtube, slideshare (60 mins).
- Flame Graphs on FreeBSD, FreeBSD Developer and Vendor Summit, 2014: blog (slides, video) (53 mins).
- Performance Analysis of BSD, MeetBSD CA, 2014: blog (slides, video) (53 mins).
- Analyzing OS X Systems Performance with the USE Method, MacIT, 2014: slideshare (no video).
- Benchmarking Gone Wrong, Surge 2013 lightning talk: youtube (5 mins).
- Open Source Systems Performance, OSCON, 2013: slideshare, youtube (32 mins).
- Performance Analysis Methodology, USENIX/LISA, 2012: slideshare, youtube (90 mins).
- ZFS: Performance Analysis and Tools, zfsday, 2012: slideshare, youtube (43 mins).
- Performance Visualizations, USENIX/LISA, 2010: slideshare, youtube (80 mins).
More listed on my homepage.
Systems Performance: Enterprise and the Cloud, 2013
This book covers new developments in systems performance: in particular, dynamic tracing and cloud computing. It also introduces many new methodologies to help a wide audience get started. It leads with Linux examples from Ubuntu, Fedora, and CentOS, and also covers Solaris-based distributions. Covering two different kernels provides additional perspective that enhances the reader's understanding of each. The book is 635 pages plus appendices.
DTrace: Dynamic Tracing in Oracle Solaris, Mac OS X and FreeBSD, 2011
This book shows how to use DTrace by example for performance analysis and troubleshooting. Solaris was used as the primary OS, with additional examples from Mac OS X and FreeBSD. The most difficult challenge in using a dynamic tracing tool (DTrace, SystemTap, etc.) is knowing what to do with it. This book provides over one hundred use cases (scripts), which will remain invaluable even after the example code becomes out of date. 1152 pages.
Solaris Performance and Tools: DTrace and MDB Techniques for Solaris 10 and OpenSolaris, 2006
Richard McDougall, Jim Mauro, Brendan Gregg. ASIN 0131568191. Prentice Hall, 2006.
A practical guide to performance analysis on Solaris. This summarizes background for context, and shows how to use the various tools available. This book was written at an interesting time: DTrace was new, filling in many observability gaps, and this book covers the best of the old and new ways of analysis. It was written as a companion volume to Solaris Internals 2nd Edition, which it references.