I originally posted this at http://dtrace.org/blogs/brendan/2012/12/13/usenix-lisa-2012-performance-analysis-methodology.
At USENIX LISA 2012, I gave a talk titled Performance Analysis Methodology. This covered ten performance analysis anti-methodologies and methodologies, including the USE Method. I wrote about these in the ACMQ article Thinking Methodically about Performance, which is worth reading for more detail. I've also posted USE Method-derived checklists for Solaris- and Linux-based systems.
I've summarized the methodologies in the talk below.
Blame-Someone-Else Anti-Method:
1. Find a system or environment component you are not responsible for
2. Hypothesize that the issue is with that component
3. Redirect the issue to the responsible team
4. When proven wrong, go to 1
Streetlight Anti-Method:
1. Pick observability tools that are
   - familiar
   - found on the Internet
   - found at random
2. Run tools
3. Look for obvious issues
Ad Hoc Checklist Method:
- Steps 1..N: run A; if B, do C
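As a sketch, an ad hoc checklist can be thought of as a list of (check, condition, action) steps. The metrics and thresholds below are invented for illustration, not a real checklist:

```python
# A minimal sketch of an ad hoc checklist: each step runs a check (A),
# tests a condition (B), and suggests an action (C). The metrics and
# thresholds are illustrative assumptions only.

def run_checklist(metrics, checklist):
    """Run each step; collect the actions whose condition fired."""
    actions = []
    for name, condition, action in checklist:
        if condition(metrics):
            actions.append((name, action))
    return actions

# Hypothetical snapshot (e.g., summarized from iostat/vmstat output).
metrics = {"disk_busy_pct": 92, "run_queue": 1, "swap_ins": 0}

checklist = [
    ("disk saturation", lambda m: m["disk_busy_pct"] > 80,
     "check which process is issuing the I/O"),
    ("CPU saturation", lambda m: m["run_queue"] > 4,
     "profile on-CPU time"),
    ("memory pressure", lambda m: m["swap_ins"] > 0,
     "check for a memory leak"),
]

for name, action in run_checklist(metrics, checklist):
    print(f"{name}: {action}")
```

The value of the checklist form is that it is fast and prescriptive; its weakness, as above, is that it only finds the problems it was written to find.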
Problem Statement Method:
- What makes you think there is a performance problem?
- Has this system ever performed well?
- What has changed recently? (Software? Hardware? Load?)
- Can the performance degradation be expressed in terms of latency or run time?
- Does the problem affect other people or applications (or is it just you)?
- What is the environment? What software and hardware is used? Versions? Configuration?
Workload Characterization Method:
- Who is causing the load? PID, UID, IP addr, ...
- Why is the load called? code path
- What is the load? IOPS, throughput, type
- How is the load changing over time?
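The who/what/when questions above can be answered by summarizing load events along each dimension. This is a sketch over made-up event records; in practice the events would come from an access log or a tracer:

```python
from collections import Counter

# Sketch: characterize a workload by who (client), what (operation
# type), and how it changes over time. The events are invented samples.

events = [
    {"t": 0, "ip": "10.0.0.1", "op": "read"},
    {"t": 0, "ip": "10.0.0.2", "op": "read"},
    {"t": 1, "ip": "10.0.0.1", "op": "write"},
    {"t": 1, "ip": "10.0.0.1", "op": "read"},
]

def characterize(events):
    return {
        "who": Counter(e["ip"] for e in events),   # who causes the load
        "what": Counter(e["op"] for e in events),  # type of load
        "when": Counter(e["t"] for e in events),   # load over time
    }

summary = characterize(events)
print(summary["who"].most_common(1))  # heaviest client
```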
Drill-Down Analysis Method:
1. Start at highest level
2. Examine next-level details
3. Pick most interesting breakdown
4. If problem unsolved, go to 2
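The drill-down loop can be sketched over a breakdown tree, where "most interesting" is taken here (as a simplifying assumption) to mean the largest time component. The breakdown values are invented:

```python
# Sketch of drill-down analysis: start at the top, repeatedly descend
# into the most interesting (here: largest) child until there is
# nothing left to break down. The latency breakdown is hypothetical.

breakdown = {
    "name": "request", "ms": 100, "children": [
        {"name": "app", "ms": 20, "children": []},
        {"name": "database", "ms": 80, "children": [
            {"name": "query", "ms": 10, "children": []},
            {"name": "disk I/O", "ms": 70, "children": []},
        ]},
    ],
}

def drill_down(node):
    """Return the path of largest components from root to a leaf."""
    path = [node["name"]]
    while node["children"]:
        node = max(node["children"], key=lambda c: c["ms"])
        path.append(node["name"])
    return path

print(drill_down(breakdown))  # ['request', 'database', 'disk I/O']
```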
Latency Analysis Method:
- Measure operation time (latency)
- Divide into logical synchronous components
- Continue division until latency origin is identified
- Quantify: estimate speedup if problem fixed
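The quantify step can be sketched numerically: once latency is divided into synchronous components, the estimated speedup from eliminating one component is total / (total - component). The component names and times below are invented:

```python
# Sketch of latency analysis: divide a measured operation time into
# synchronous components, find the dominant one, and estimate the
# speedup if it were fixed. Numbers are illustrative assumptions.

def dominant(components):
    """Return (name, ms) of the largest latency component."""
    return max(components.items(), key=lambda kv: kv[1])

def speedup_if_fixed(components, name):
    """Estimated speedup if one component's latency went to zero."""
    total = sum(components.values())
    return total / (total - components[name])

components = {"dns": 5, "connect": 10, "server think time": 85}
name, ms = dominant(components)
print(name, round(speedup_if_fixed(components, name), 2))
```

Expressing the result as an estimated speedup makes it easy to compare competing fixes before doing any of them.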
USE Method:
For every resource, check:
- Utilization
- Saturation
- Errors
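The USE Method is essentially a table walk: resources down one axis, the three metric types across the other. A sketch, where the resource snapshot and the thresholds are illustrative assumptions:

```python
# Sketch of the USE Method: for every resource, check utilization,
# saturation, and errors. Metrics and thresholds are invented; real
# values would come from tools like mpstat, iostat, or netstat.

THRESHOLDS = {"utilization": 70.0, "saturation": 0.0, "errors": 0.0}

resources = {
    "CPU":  {"utilization": 95.0, "saturation": 3.0, "errors": 0.0},
    "disk": {"utilization": 40.0, "saturation": 0.0, "errors": 2.0},
}

def use_check(resources, thresholds=THRESHOLDS):
    """Return (resource, metric) pairs exceeding their threshold."""
    findings = []
    for res, metrics in resources.items():
        for metric, limit in thresholds.items():
            if metrics[metric] > limit:
                findings.append((res, metric))
    return findings

for res, metric in use_check(resources):
    print(f"{res}: {metric} flagged")
```

Because the iteration is exhaustive, the method's coverage comes from the resource list, not from guessing where the problem might be.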
Stack Profile Method:
- Profile thread stack traces (on- and off-CPU)
- Study stacks bottom-up
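Sampled stacks are usually coalesced before study, so identical paths collapse into counts; this is the folded frame;frame;frame form used by flame graphs. The sample stacks below are invented:

```python
from collections import Counter

# Sketch of the stack profile method: coalesce sampled thread stacks
# and read them bottom-up, so shared ancestry near the stack base
# stands out. Sample stacks are illustrative assumptions.

samples = [
    "main;handle_request;read_file;sys_read",
    "main;handle_request;read_file;sys_read",
    "main;handle_request;parse",
    "main;idle",
]

def coalesce(samples):
    """Count identical stacks (the folded flame-graph format)."""
    return Counter(samples)

def bottom_frame_counts(samples, depth=2):
    """Aggregate by the bottom `depth` frames: a bottom-up view."""
    return Counter(";".join(s.split(";")[:depth]) for s in samples)

print(coalesce(samples).most_common(1))
print(bottom_frame_counts(samples))
```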