Recent posts:
Blog index
About
RSS

Brendan Gregg's Blog

home

A quarter million NFS IOPS

02 Dec 2008

I originally posted this at http://blogs.sun.com/brendan/entry/a_quarter_million_nfs_iops.

Following the launch of the Sun Storage 7000 series, various performance results have been published. It's important when reading these numbers to understand their context, and how that may apply to your workload. Here I'll introduce some numbers regarding NFS read ops/sec, and explain what they mean.

A key feature of the Sun Storage 7410 is DRAM scalability, which can currently reach 128 Gbytes per head node. This can span a significant working set size, and so serve most (or even all) requests from the DRAM filesystem cache. If you aren't familiar with the term working set size, it refers to the amount of data which is frequently accessed. For example, your website could be multiple Tbytes in size, but only tens of Gbytes are frequently accessed hour after hour, which is your working set.

Considering that serving most or all of your working set from DRAM may be a real possibility, it's worth exploring this space. I'll start by finding the upper bound: what's the most NFS read ops/sec I can drive. Here are screenshots from Analytics that shows sustained NFS read ops/sec from DRAM. Starting with NFSv3:

And now NFSv4:

Both beating 250,000 NFS random read ops/sec from a single head node – great to see!

Questions when considering performance numbers

To understand these numbers, you must understand the context. These are the sort of questions you can ask yourself, along with the answers for those results above:

The above list covers many subtle issues to help you avoid them (don't learn them the hard way).

Traps to watch out for regarding IOPS

For IOPS results, there are some specific additional questions to consider:

Being more realistic: 8 Kbyte I/O with latency

The aim of the above was to discuss context, and to show how to understand a great result – such as 250,000+ NFS IOPS – by knowing what questions to ask. The two key criticisms for this result would be that it was for 1 byte I/Os, and that latency wasn't shown at all. Here I'll redo this with 8 Kbyte I/Os, and show how Analytics can display the NFS I/O latency. I'll also wind back to 10 clients, only use 1 of the 10 GigE ports on the 7410, and I'll gradually add threads to the clients until each is running 20:

The steps in the NFSv3 ops/sec staircase are where I'm adding more client threads.

I've reached over 145,000 NFSv3 read ops/sec, and this is not the maximum the 7410 can do (I'll need to use a second 10 GigE port to take this further). The latency does increase as more threads queue up, here it is plotted as a heat map with latency on the y-axis (the darker the pixel, the more I/Os were at that latency for that second). At our peak (which has been selected by the vertical line), most of the I/Os were faster than 55 us (0.055 milliseconds), which can be seen in the numbers in the list on the left.

Note that this is the NFSv3 read ops/sec delivered to the 7410 after the client NFS driver has processed the 8 Kbyte I/Os, which decided to split some of the 8 Kbyte reads into 2 x 4 Kbyte NFS reads (pagesize). This means the workload became a mixed 4k and 8k read workload, for which 145,000 IOPS is still a good value. (I'm tempted to redo this for just 4 Kbyte I/Os to keep things simpler, but perhaps this is another useful lesson in the perils of benchmarking – the system doesn't always do what it is asked.)

Reaching 145,000 4+ Kbyte NFS cached read ops/sec without blowing out latency is a great result, and it's the latency that really matters (and from latency comes IOPS). On the topic of latency and IOPS, I do need to post a follow up for the next level after DRAM: no, not disks, it's the L2ARC using SSDs in the Hybrid Storage Pool.