Performance Instrumentation Counters: short talk

15 May 2010

I originally posted this at http://blogs.sun.com/brendan/entry/performance_instrumentation_counters_short_talk.

Performance Instrumentation Counters (PICs, also known as Performance Monitoring Counters, or PMCs) allow CPU internals to be observed, and are especially useful for identifying exactly why CPUs are busy - not just that they are. I've blogged about them before, as part of analyzing HyperTransport utilization and CPI (Cycles-per-Instruction).

There are a number of performance analysis questions that can only be answered via PICs, whether by using the command line cpustat/cputrack tools, developer suites such as Oracle Sun Studio, or DTrace. They include observing:

  • CPI: cycles per instruction
  • Memory bus utilization
  • I/O bus utilization (between the CPUs and I/O controllers)
  • CPU interconnect bus utilization
  • Level 1 cache (I$/D$) hit/miss rate
  • Level 2 cache (E$) hit/miss rate
  • Level 3 cache (if present) hit/miss rate
  • MMU events: TLB/TSB hit/miss rate
  • CPU stall cycles for other reasons (thermal?)
  • ... and more
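Most items in this list reduce to a ratio of two counters sampled over the same interval. As a minimal sketch of that arithmetic (the function names and the counter values below are made up for illustration, and are not taken from any real system):

```python
def cpi(cycles: int, instructions: int) -> float:
    """Cycles per instruction over one sampling interval."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    return cycles / instructions


def miss_rate(misses: int, accesses: int) -> float:
    """Fraction of accesses that missed, from 0.0 to 1.0."""
    if accesses <= 0:
        raise ValueError("accesses must be positive")
    return misses / accesses


# Hypothetical counter deltas for a one-second interval (made-up numbers).
sample = {
    "cycles": 2_400_000_000,
    "instructions": 800_000_000,
    "l2_accesses": 50_000_000,
    "l2_misses": 5_000_000,
}

print(f"CPI: {cpi(sample['cycles'], sample['instructions']):.2f}")
print(f"L2 miss rate: {miss_rate(sample['l2_misses'], sample['l2_accesses']):.1%}")
```

A high CPI together with a high cache miss rate suggests that the CPUs are busy waiting on memory rather than retiring instructions, which is the "why" that utilization alone doesn't show.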

This information is useful not just for developers writing code (who are typically more familiar with PICs from using Oracle Sun Studio), but also for system administrators doing performance analysis and capacity planning.

I've recently been doing more performance analysis with PICs and taking advantage of PAPI (Performance Application Programming Interface), which provides generic counters that are both easy to identify and work across different platforms. Over the years I've maintained a collection of cpustat-based scripts to answer questions from the above list. These scripts were written for specific architectures and became out of date when new processor types were introduced. PAPI solves this - I'm now writing a suite of cpustat scripts that use the PAPI counters (out of necessity - performance analysis is my job), and that will work across different and future processor types. If I can, I'll post them here.
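To illustrate the kind of script this makes possible, here is a rough sketch (not one of the scripts mentioned above) that computes CPI per CPU from cpustat-style output. It assumes cpustat was asked for two generic events, cycles in pic0 and instructions in pic1 (for example, something like `cpustat -c PAPI_tot_cyc,PAPI_tot_ins 1`, which is an assumption about the invocation), and that each sample line has the columns time, cpu, event, pic0, pic1. The sample text and the name `cpi_by_cpu` are hypothetical.

```python
from collections import defaultdict

# Made-up sample in the assumed cpustat column layout:
# time, cpu, event, pic0 (cycles), pic1 (instructions).
SAMPLE_OUTPUT = """\
   time cpu event      pic0      pic1
  1.005   0  tick 900000000 300000000
  1.005   1  tick 500000000 500000000
  2.005   0  tick 800000000 400000000
  2.005   1  tick 600000000 300000000
"""


def cpi_by_cpu(text: str) -> dict[int, float]:
    """Sum cycles and instructions per CPU over all tick lines, then divide."""
    cycles = defaultdict(int)
    instructions = defaultdict(int)
    for line in text.splitlines():
        fields = line.split()
        # Skip the header and any line that is not a per-interval "tick" sample.
        if len(fields) != 5 or fields[2] != "tick":
            continue
        cpu = int(fields[1])
        cycles[cpu] += int(fields[3])
        instructions[cpu] += int(fields[4])
    # CPUs that retired no instructions are omitted to avoid dividing by zero.
    return {cpu: cycles[cpu] / instructions[cpu]
            for cpu in sorted(cycles) if instructions[cpu] > 0}


if __name__ == "__main__":
    for cpu, value in cpi_by_cpu(SAMPLE_OUTPUT).items():
        print(f"CPU {cpu}: CPI {value:.2f}")
```

Because the parsing depends only on column positions and not on processor-specific event names, the same script would not need rewriting for a new processor type, provided the generic events are available there.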

And now the reason for this post: Roch Bourbonnais, Jim Mauro and I were recently in the same place at the same time, and used the opportunity to have a few informal talks about performance topics recorded on video. These talks weren't prepared beforehand; we just chatted about what we knew at the time, including advice and tips. This talk is on PICs:

I'm a fan of informal video talks, and I hope to do more - they are an efficient way to disseminate information. And for busy people like me, it can be the difference between never documenting a topic and providing something - albeit informal - to help others out. Based on my own experience, the time it has taken to produce different formats of content has been:

  • Informal talk: 0.5 - 1 hour
  • Blog post: 1 - 10 hours
  • Formal presentation: 3 - 10 hours
  • Published article: 3 - 30+ hours
  • Whitepaper: 5 - 50+ hours
  • Book: (months)

In fact, it's taken twice as long to write this blog post about the videos as it took to plan and film them.

Documentation is another passion of mine, and we are doing some very cool things in the Fishworks product to create documentation deliverables in a smart and efficient way, which can be the topic of another blog post...