Recent posts:
Blog index
About
RSS

Brendan Gregg's Blog

home

NFS Analytics example

17 Nov 2009

I originally posted this at http://blogs.sun.com/brendan/entry/nfs_analytics_example.

I just tracked down a minor performance issue on my desktop in a couple of minutes, and thought it may make a good example to follow as it demonstrates using Analytics along with dtrace, truss, snoop and kstats – really the entire suite of perf tools I regularly use on Solaris.

I was downloading a file and happened to run the kstat based nicstat tool to watch network traffic. After the download finished, traffic continued:

It's not much, but my desktop should be idle, and yet there are regular network reads and writes. I'd like to know what that is.

Running snoop should give me a quick idea of the traffic over the wire:

That's not what I was expecting. I'm doing NFS traffic? The only NFS share I have mounted is my home directory, and I'm not doing any sustained I/O right now.

Fortunately my home directory server is a Sun Storage system, so I can examine the NFS traffic with Analytics from the server side:

I'll begin by drilling down by client, as I know my desktop is "deimos":

This shows I'm doing more NFS IOPS to the home directory server than anyone else.

Now to drill down further, by type of operation and by file:

".elinks"? elinks is a text based browser that I use, a modern version of lynx.

Here's a screenshot of elinks viewing osnews:

So why is elinks causing so much NFS traffic?

Right now I could aim DTrace at elinks to get a detailed view of what it is up to, but as I'm not worried about slowing the target in this case, I'll take a quick look with truss to start with (runs non-root by default, fewer keystrokes and inbuilt syscall translation):

Ahh, elinks is repeatedly trying to read a non-existant file from my home directory, which is causing the NFS traffic.

To get an accurate read on the rate of these opens, I'll use dtrace to frequency count the filenames that were called with open() by elinks, printing a summary every second:

It's 63 opens per second, all of the ".elinks/formhist" file.

DTrace can dig deeper, shedding light as to why elinks is reading that file. I can just add "ustack()" as a second key to the DTrace aggregation used to frequency count filenames, to include the entire user stack trace:

I'm not familiar with the elinks source, so I'll read from the bottom of the stack trace up, looking for high level functions. It looks generic until I get to redraw_leds(). leds? LEDs? There is a pseudo-LED text display in the bottom right of the screen. It's calling redraw_terminal(), which is calling refresh_view(), and then into load_formhist_from_file() – causing the file open.

Doing file opens with the frequent update of the status LEDs sounds like a bug, and perhaps a known one. A minute with google found that this was known and fixed in the next version of elinks.

While I was watching, the home directory server for fishworks was serving most of its NFS IOPS for a bug in elinks on my desktop. Which I can eliminate by upgrading my elinks version. This is an example of the best type of performance win: eliminating unnecessary work.