FROSUG 2009: Little Shop of Performance Horrors
Video: http://www.youtube.com/watch?v=cklPFJysUYM
Meetup talk for the Front Range OpenSolaris User Group (FROSUG) in Colorado, 2009, by Brendan Gregg.
This talk covers the worst performance issues I had seen, and how to learn from these mistakes.
PDF: FROSUG2009_Performance_Horrors.pdf
Keywords (from pdftotext):
slide 1:
USE IMPROVE EVANGELIZE
Little Shop of Performance Horrors
Brendan Gregg, Staff Engineer
Sun Microsystems, Fishworks
FROSUG 2009
slide 2:
Performance Horrors
● I usually give talks on:
  – how to perform perf analysis!
  – cool performance technologies!!
  – awesome benchmark results!!!
  in other words, things going right.
● This talk is about things going wrong:
  – performance horrors
  – learning from mistakes
slide 3:
Horrific Topics
● The worst perf issues I've ever seen!
● Common misconfigurations
● The encyclopedia of poor assumptions
● Unbelievably bad perf analysis
● Death by complexity
● Bad benchmarking
● Misleading analysis tools
● Insane performance tuning
● The curse of the unexpected
slide 4:
The worst perf issues I've ever seen!
slide 5:
The worst perf issues I've ever seen!
● SMC
  – Administration GUI for Solaris
  – Could take 30 mins to load on first boot
slide 6:
The worst perf issues I've ever seen!
● SMC
  – Administration GUI for Solaris
  – Could take 30 mins to load on first boot
● Problems:
  – 12 million mostly 1-byte sequential read()s of /var/sadm/smc/properties/registry.ser, a 72 KB file
  – 7742 processes executed
  – 9504 disk events, 2228 of them writes to the 72 KB registry.ser file
● Happy ending: performance was improved in an update
slide 7:
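To put the SMC read() numbers from slide 6 in perspective, here is a minimal Python sketch, not the SMC code itself. It reads a scratch file one byte per system call and then in a single large read, counting the calls each way. The 72 KB size is assumed to mean exactly 72 * 1024 bytes, and the helper names `count_reads` and `demo` are hypothetical.

```python
import os
import tempfile

FILE_SIZE = 72 * 1024      # assumption: "72 KB" taken as 72 * 1024 bytes
TOTAL_READS = 12_000_000   # read() count quoted on the slide


def count_reads(path, chunk_size):
    """Read the whole file with os.read(); return (bytes_read, read_calls)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        total, calls = 0, 0
        while True:
            buf = os.read(fd, chunk_size)
            calls += 1          # counts the final read that reports end of file
            if not buf:
                break
            total += len(buf)
        return total, calls
    finally:
        os.close(fd)


def demo():
    """Create a scratch file and read it both ways."""
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"x" * FILE_SIZE)
        path = f.name
    try:
        return count_reads(path, 1), count_reads(path, FILE_SIZE)
    finally:
        os.unlink(path)


if __name__ == "__main__":
    (n1, calls1), (n2, calls2) = demo()
    print(f"1-byte reads:   {calls1} calls for {n1} bytes")
    print(f"one large read: {calls2} calls for {n2} bytes")
    print(f"12M 1-byte reads ~ {TOTAL_READS / FILE_SIZE:.0f} full passes over the file")
```

Under these assumptions, 12 million one-byte reads amount to well over a hundred complete passes over a file that a single read() could return.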
The worst perf issues I've ever seen!
● SMC (cont.)
● Analysis using DTrace:
  – syscall frequency counts
  – syscall args
● This is “low hanging fruit” for DTrace
● Lesson: examine high level events.
● Happy ending: performance was improved in an update
slide 8:
The worst perf issues I've ever seen!
● nxge
  – 10 GbE network driver
  – tested during product development
slide 9:
The worst perf issues I've ever seen!
● nxge (cont.)
  – 10 GbE network driver
  – tested during product development
● Problems:
  – kstats were wrong (rbytes, obytes)
    ▪ this made perf tuning very difficult until I realized what was wrong!
  – CR: 6687884 nxge rbytes and obytes kstat are wrong
● Lessons:
  – don't trust statistics you haven't double checked
slide 10:
The worst perf issues I've ever seen!
● nxge (cont.)
  – 10 GbE network driver
  – tested during product development
● Problems (#2):
  – memory leak starving the ZFS ARC
  – The kernel grew to 122 Gbytes in 2 hours.
  – 6844118 memory leak in nxge with LSO enabled
  – Original CR title: “17 MB/s kernel memory leak...”
● Lessons:
  – Bad memory leaks can happen in the kernel too
slide 11:
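The two leak figures on slide 10 (17 MB/s, and 122 Gbytes in 2 hours) can be cross-checked with a line of arithmetic. This is a small sketch that assumes decimal units (1 Gbyte = 1000 Mbytes) and a constant leak rate. The function name `leaked_gbytes` is hypothetical.

```python
def leaked_gbytes(rate_mbytes_per_s, hours):
    """Total leaked memory in Gbytes, assuming a constant rate and decimal units."""
    seconds = hours * 3600
    return rate_mbytes_per_s * seconds / 1000


if __name__ == "__main__":
    print(f"{leaked_gbytes(17, 2):.1f} Gbytes leaked after 2 hours at 17 MB/s")
```

Under those assumptions the rate in the original CR title and the observed kernel growth agree with each other. This is the kind of double check the nxge kstat lesson recommends.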
The worst perf issues I've ever seen!
● nxge (cont.)
  – 10 GbE network driver
  – tested during product development
● Problems (#3):
  – LSO (large send offload) destroyed performance:
      Priority changed from [3-Medium] to [1-Very High]
      This is a 1000x performance regression.
      brendan.gregg@sun.com 2008-05-01 23:25:58 GMT
  – 6696705 enabling soft-lso with fix for 6663925 causes nxge to perform very very poorly
● Lessons:
  – All configurable options must be tested and retested during development for regressions (such as LSO)
slide 12:
Common Misconfigurations
slide 13:
Common misconfigurations
● ZFS RAID-Z2 with half a JBOD
  – half a JBOD may mean 12 disks. A RAID-Z2 stripe may be 12 disks in width, therefore this configuration acts like a single disk:
    ▪ perf is that of the slowest disk in the stripe
    ▪ with so few stripes (1), a multi-threaded workload is much less likely to scale
● Max throughput config without:
  – jumbo frames
  – 10 GbE ports (they do work!)
● sync write workloads without ZFS SLOG devices
slide 14:
Common misconfigurations
● Not running the latest software bits
  – perf issues are fixed often; always try to be on the latest software versions
● 4 x 1 GbE trunks, and
slide 15:
The Encyclopedia of Poor Assumptions
slide 16:
The Encyclopedia of Poor Assumptions
● More CPUs == more performance
  – not if the threads don't scale
● Faster CPUs == more performance
  – not if your workload is memory I/O bound
● More IOPS capability == more performance
  – slower IOPS? Imagine a server with thousands of slow disks
● Network throughput/IOPS measured on the client reflects that of the server
  – client caching?
slide 17:
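One way to see why "More CPUs == more performance" fails when threads don't scale is Amdahl's law. The talk does not mention it; it is brought in here only as an illustrative assumption. The sketch below computes the best-case speedup for a workload where only a fraction of the work can run in parallel. The function name and the 50% figure are hypothetical.

```python
def amdahl_speedup(parallel_fraction, cpus):
    """Best-case speedup on `cpus` CPUs when only `parallel_fraction` of the work scales."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cpus)


if __name__ == "__main__":
    # Hypothetical workload: half of the work is serialized (e.g. behind one lock).
    for cpus in (1, 2, 8, 64):
        print(f"{cpus:3d} CPUs -> at most {amdahl_speedup(0.5, cpus):.2f}x")
```

With half of the work serialized, the speedup stays below 2x however many CPUs are added.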
The Encyclopedia of Poor Assumptions
● System busses are fast
  – The AMD HyperTransport was the #1 bottleneck for the Sun Storage products
● 10 GbE can be driven by 1 client
  – may be true in the future, but difficult to do now
  – may assume that this can be done with 1 thread!
● Performance observability tools are designed to be the best possible
● Performance observability statistics (or benchmark tools) are correct
  – bugs happen!
slide 18:
The Encyclopedia of Poor Assumptions
● A network switch can drive all its ports to top speed at the same time
  – this may not hold, especially for 10 GbE switches
● PCI-E slots are equal
  – test, don't assume; depends on bus architecture
● Add flash memory SSDs to improve performance!
  – Probably, but really depends on the workload
  – This is assuming that HDDs are slow; they usually are, however their streaming performance can be competitive (~100 Mbytes/sec)
slide 19:
Unbelievably Bad Performance Analysis
slide 20:
Unbelievably bad perf analysis
● The Magic 1 GbE NIC!
● How fast can a 1 GbE NIC run in one direction?
slide 21:
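For the 1 GbE question, a rough sanity check can be sketched from the link speed alone. The Python below assumes decimal units and 8 bits per byte, and tests each figure reported on slide 21 against the raw one-direction line rate. The function names are hypothetical. The bound is generous: Ethernet framing and protocol headers push the achievable payload rate somewhat lower, so passing this check is necessary but not sufficient.

```python
LINK_GBIT_PER_S = 1.0  # a 1 GbE NIC, one direction

# Figures from slide 21, converted to Mbytes/sec (1.15 Gbytes/sec -> 1150).
REPORTED_MBYTES_PER_S = [120, 200, 350, 800, 1150]


def line_rate_mbytes_per_s(gbit_per_s):
    """Raw line rate in Mbytes/sec, assuming decimal units and 8 bits per byte."""
    return gbit_per_s * 1000 / 8


def is_plausible(reported_mbytes_per_s, gbit_per_s=LINK_GBIT_PER_S):
    """True if the reported throughput does not exceed the raw line rate."""
    return reported_mbytes_per_s <= line_rate_mbytes_per_s(gbit_per_s)


if __name__ == "__main__":
    limit = line_rate_mbytes_per_s(LINK_GBIT_PER_S)
    for value in REPORTED_MBYTES_PER_S:
        verdict = "plausible" if is_plausible(value) else "impossible"
        print(f"{value:5d} Mbytes/sec vs {limit:.0f} Mbytes/sec line rate: {verdict}")
```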
Unbelievably bad perf analysis
● The Magic 1 GbE NIC!
● How fast can a 1 GbE NIC run in one direction?
● Results sent to me include:
  – 120 Mbytes/sec
  – 200 Mbytes/sec
  – 350 Mbytes/sec
  – 800 Mbytes/sec
  – 1.15 Gbytes/sec
● Lesson: perform sanity checks
slide 22:
Death by Complexity!
slide 23:
Death by complexity!
● Performance isn't that hard, however it often isn't that easy either...
● TCP/IP stack performance analysis
  – heavy use of function pointers
● ZFS performance analysis
  – I/O processed asynchronously by the ZIO pipeline
slide 24:
Bad Benchmarking
slide 25:
Bad benchmarking
● SPEC-SFS http://blogs.sun.com/bmc/entry/eulogy_for_a_benchmark
  – Copying a file from a local filesystem to an NFS share, to performance test that NFS share
● various open source benchmark tools that don't reflect your intended workload
● Lesson: don't run benchmark tools blindly; learn everything you can about what they do, and how closely they match your environment
slide 26:
Misleading Analysis Tools
slide 27:
Misleading analysis tools
● top
    load averages: 0.03, 0.03, 17:05:29
    236 processes: 233 sleeping, 2 stopped, 1 on cpu
    CPU states: 97.7% idle, 0.8% user, 1.6% kernel, 0.0% iowait, 0.0% swap
    Memory: 8191M real, 479M free, 1232M swap in use, 10G swap free
    PID USERNAME LWP PRI NICE SIZE RES STATE TIME CPU COMMAND
    101092 brendan 93M 25M sleep 187:42 0.28% realplay.bin
    100297 root 26 100 -20 182M 177M sleep 58:13 0.14% akd
    399362 brendan 95M 28M sleep 53:56 0.12% realplay.bin
    115306 root 0K sleep 21:30 0.06% dtrace
    100876 brendan 0K sleep 103:52 0.05% Xorg
  – What does %CPU mean? Are they all CPU consumers?
  – What does RSS mean?
slide 28:
Misleading analysis tools
● vmstat
    # vmstat 1
    kthr memory page disk faults cpu
    r b w swap free mf pi po fr de sr s0 s1 s2 s3 cs us sy id
    0 0 0 10830436 501464 54 91 2 5 18 18 1 1835 4807 2067 3 94
    0 0 0 10849048 490460 9 245 0 0 16 16 0 1824 3466 1664 4 96
    0 0 0 10849048 490488 0 0 1470 3294 1227 1 99
    0 0 0 10849048 490488 0 0 1440 3315 1226 1 99
    0 0 0 10849048 490488 0 0 1447 3278 1236 1 98
  – What does swap/free mean?
  – Why do we care about de, sr?
slide 29:
Insane Performance Tuning
slide 30:
Insane performance tuning
● disabling CPUs
  – turning off half the available CPUs can improve performance (relieving scalability issues)
● binding network ports to fewer cores
  – improves L1/L2 CPU cache hit rate
  – reduces cache coherency traffic
● reducing CPU clock rate
  – if the workload is memory bound, this may have little effect, but save heat, fan, vibration issues...
slide 31:
Insane performance tuning
● less memory
  – systems with 256+ Gbytes of DRAM
  – codepaths that walk DRAM
● warming up the kmem caches
  – before benchmarking, a freshly booted server won't have its kmem caches populated. Warming them up with any data can improve performance by 15% or so.
slide 32:
The Curse of the Unexpected
slide 33:
The Curse of the Unexpected
● A switch has 2 x 10 GbE ports, and 40 x 1 GbE ports. How fast can it drive Ethernet?
  – Unexpected: some cap at 11 Gbit/sec total!
● Latency
  – Heat map discoveries
  – DEMO (http://blogs.sun.com/brendan)
slide 34:
Thank you!
Brendan Gregg, Staff Engineer
brendan@sun.com
http://blogs.sun.com/brendan
“open” artwork and icons by chandan: http://blogs.sun.com/chandan