This is a short selection of my most useful and popular material. See my homepage for the full list.
- perf Examples: Linux perf_events one-liners, examples, and visualizations.
- eBPF Tracing Tools: Linux enhanced BPF tools for performance analysis.
- The USE Method: a performance methodology for identifying resource bottlenecks.
- USE Method: Rosetta Stone: performance checklists for different OSes.
- Off-CPU Analysis: a methodology for analyzing blocked time, complementary to CPU analysis.
- TSA Method: the methodology of thread state analysis.
- Active Benchmarking: a methodology for performing accurate benchmarks.
- Working Set Size Estimation: showing techniques for understanding main memory usage.
- CPU Flame Graphs: a visualization for sampled stack traces.
- Off-CPU Flame Graphs: different techniques for analyzing blocking events.
- Memory Flame Graphs: techniques for efficiently analyzing leaks and growth.
- Latency Heat Maps: a visualization for latency distributions over time.
- Utilization Heat Maps: different visualizations for resource utilization.
- Frequency Trails: a visualization for multiple distributions.
- What is Observability: defines this made-up computer word (2021).
- FlameScope Pattern Recognition: shows how to interpret subsecond-offset heatmap views of profiled data (2018).
- KPTI/KAISER Meltdown Initial Performance Regressions: analyzing the Linux kernel regression we'll all see (2018).
- Linux Load Averages: Solving the Mystery, where I explained the inclusion of the uninterruptible sleep state (2017).
- CPU Utilization is Wrong: a post explaining the growing problem of memory stall cycles dominating the %CPU metric (2017).
- gdb Debugging Full Example (Tutorial): a post sharing an entire debugging session, including output and explanations (2016).
- The Flame Graph article for ACMQ and CACM that defines and explains flame graphs, and discusses future developments (2016).
- Linux Performance Analysis in 60,000 Milliseconds (PDF): for the Netflix Tech Blog, by myself and the perf team (2015).
- Java in Flames (PDF): for the Netflix Tech Blog, introducing mixed-mode Java flame graphs (2015).
- eBPF One Small Step: introducing Linux eBPF and explaining the capabilities this feature brings (2015).
- Ftrace: The Hidden Light Switch: an lwn.net article about Linux ftrace (2014).
- The Benchmark Paradox: a short blog post explaining a seeming paradox in benchmark evaluations (2014).
- strace Wow Much Syscall: my warning blog post about strace(1), along with many bad strace-related jokes (2014).
- The Case of the Clumsy Kernel (PDF): a kernel performance analysis article for USENIX ;login: (2013).
- The Greatest Tool that Never Worked: har: about the value of ideas in software screenshots (2013).
- Top 10 DTrace Scripts for Mac OS X: included an intro to command line DTrace usage (2011).
- Visualizing System Latency: an article for ACMQ and CACM about latency heat maps (2010).
- perf-tools: perf analysis tools based on Linux perf_events and ftrace.
- FlameGraph: a visualization for sampled stack traces, used for performance analysis.
- HeatMap: a program for generating interactive SVG heat maps from trace data.
- Specials: "special" tools for system administrators.
Cloud Performance Root Cause Analysis at Netflix, YOW! Conf Australia, 2018
Performance Tuning EC2 Instances, AWS re:Invent, 2017
Linux 4.x Performance: Using BPF Superpowers, Facebook's Performance @Scale, 2016
Visualizing Performance with Flame Graphs, USENIX ATC, Santa Clara, 2017
System Methodology, ACM Applicative, New York, 2016
Performance Checklists for SREs, SREcon Santa Clara, 2016
Linux Performance Tools, O'Reilly Velocity, Santa Clara, 2015
- Give me 15 minutes and I'll change your view of Linux tracing, USENIX/LISA, 2016: youtube (18 mins).
- Broken Performance Tools for QConSF, 2015: slideshare, infoq (slides, video) (50 mins).
- Netflix Instance Analysis Requirements for Monitorama, 2015: blog (slides, video) (34 mins).
- What Linux Can Learn from Solaris Performance, and Vice-Versa, SCaLE, 2015: youtube, slideshare (60 mins).
- Flame Graphs on FreeBSD, FreeBSD Developer and Vendor Summit, 2014: blog (slides, video) (53 mins).
- Performance Analysis of BSD, MeetBSD CA, 2014: blog (slides, video) (53 mins).
- Analyzing OS X Systems Performance with the USE Method, MacIT, 2014: slideshare (no video).
- Benchmarking Gone Wrong, Surge 2013 lightning talk: youtube (5 mins).
- Stop the Guessing, Velocity 2013: youtube, slideshare (46 mins).
- Open Source Systems Performance, OSCON, 2013: slideshare, youtube (32 mins).
- Blazing Performance with Flame Graphs, USENIX LISA, 2013: youtube, slideshare (90 mins).
- Performance Analysis Methodology, USENIX/LISA, 2012: slideshare, youtube (90 mins).
- ZFS: Performance Analysis and Tools, zfsday, 2012: slideshare, youtube (43 mins).
- Performance Visualizations, USENIX/LISA, 2010: slideshare, youtube (80 mins).
More listed on my homepage.
Systems Performance: Enterprise and the Cloud 2nd Edition, 2020
Brendan Gregg. ISBN 978-0-13-682015-4. Addison-Wesley.
Systems performance is the study of application, operating system, kernel, and hardware performance: Everything in the data path. The second edition of this best-selling book adds content on BPF, BCC, bpftrace, perf, and Ftrace, mostly removes Solaris, makes numerous updates to Linux and cloud computing, and includes general improvements and additions.
BPF Performance Tools: Linux System and Application Observability, 2019
Brendan Gregg. ISBN 0-13-655482-2. Addison-Wesley.
BPF originally stood for Berkeley Packet Filter, but has been extended to be an in-kernel execution environment in Linux, allowing a new type of software to be developed. This includes a new era of observability tools.
The book includes over 150 BPF observability tools that you can run to find performance wins and troubleshoot software, and also shows you how to write your own. Over one hundred of these BPF tools were newly developed for this book.
Systems Performance: Enterprise and the Cloud, 2013
This book covers new developments in systems performance: in particular, dynamic tracing and cloud computing. It also introduces many new methodologies to help a wide audience get started. It leads with Linux examples from Ubuntu, Fedora, and CentOS, and also covers Solaris-based distributions. Covering two different kernels provides additional perspective that enhances the reader's understanding of each. The book is 635 pages plus appendices.
DTrace: Dynamic Tracing in Oracle Solaris, Mac OS X and FreeBSD, 2011
This book shows how to use DTrace by example for performance analysis and troubleshooting. Solaris was used as the primary OS, with additional examples from Mac OS X and FreeBSD. The most difficult challenge in using a dynamic tracing tool (DTrace, SystemTap, etc.) is knowing what to do with it. This book provides over one hundred use cases (scripts), which will remain valuable even after the example code becomes out of date. 1152 pages.
Solaris Performance and Tools: DTrace and MDB Techniques for Solaris 10 and OpenSolaris, 2006
Richard McDougall, Jim Mauro, Brendan Gregg. ASIN 0131568191. Prentice Hall.
A practical guide to performance analysis on Solaris. This summarizes background for context, and shows how to use the various tools available. This book was written at an interesting time: DTrace was new, filling in many observability gaps, and this book covers the best of the old and new ways of analysis. It was written as a companion volume to Solaris Internals 2nd Edition, which it references.
If you purchase my books through an Amazon or InformIT link, the book's technical editor earns a commission.