How much CPU does a down interface chew?

I originally posted this at http://bdgregg.blogspot.com/2006/08/how-much-cpu-does-down-interface-chew.html.

While teaching a DTrace class in Sydney this week, I ran an ad-lib demo that took a mysterious and surprising turn. I think you'll find this quite interesting.

One of my favourite DTrace demos begins by asking how much CPU a busy network interface will consume. About 2 in 10 people will know the answer: a lot! But how did you measure this on older Solaris? After all, if it really is "a lot", then we really want a value! 20% of a CPU? 50%?

I was demonstrating on an Ultra 10, with a 333 MHz CPU and a 100 Mbit/sec interface (hme0). Even with the numerous TCP/IP performance enhancements in Solaris 10, driving 100 Mbit/sec on an Ultra 10 will still hurt.

From another host, I sent a large file over the network to drive the interface at capacity. I thought I'd start by showing how busy the network interface was by using my sysperfstat tool from my K9Toolkit (I usually use nicstat, I thought this would be a nice change). sysperfstat ran for a few seconds, then I began the bulk network transfer:

$ /brendan/K9Toolkit/sysperfstat 1
    Time    %CPU   %Mem  %Disk   %Net     CPU    Mem   Disk    Net
19:24:27    5.24  33.52   0.52   0.07    0.05   0.00   0.01   0.00
19:24:28    7.35  33.53   1.29   0.00    0.00   0.00   0.00   0.00
19:24:29    7.48  33.53   0.00   0.00    0.00   0.00   0.00   0.00
19:24:30   39.65  33.79   1.27  51.27    3.69   0.00   0.00   0.00
19:24:32   65.67  33.79   0.00 173.14    1.28   0.00   0.00   0.00
19:24:34   68.35  33.79   0.00 173.03    0.00   0.00   0.00   0.00
19:24:37   80.12  33.79   0.00 176.33    0.82   0.00   0.00   0.00
19:24:39   76.97  33.79   0.00 178.12    0.43   0.00   0.00   0.00
19:24:41   76.08  33.79   1.61 164.82    1.37   0.00   0.00   0.00
19:24:44   80.18  33.79   0.00 177.68    0.38   0.00   0.00   0.00

Oh dear! I was expecting %Net to reach around 90%, not over 170%!

Both the Net and Disk percentages are total percentages, summed across all network interfaces or disks. From experience, the total for these resource types has been more useful than average percentages (which CPU and Mem use). 170% could indicate that I have two network interfaces running at around 85%. But why? I had only initiated one network transfer to the hme0 address.

The system had become so heavily loaded that sysperfstat's interval of one second became two, and sometimes three seconds. This can corrupt the metrics reported by many tools, if they assume that they only sleep for the intended interval (say, by making a sleep(1) call), when instead they may have slept for longer. Fortunately, I'd seen this break performance tools before, and coded sysperfstat to deal with it: I use high-resolution delta times from kstat to measure the true interval, and divide the incremented metrics accordingly. So the output isn't broken for that reason.

Demonstrations are much more exciting when even you don't know of the outcome. So why did we hit 170% utilisation? Maybe another network interface was doing traffic that I didn't know about:

# ifconfig -a
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000
hme0: flags=1000863<UP,BROADCAST,NOTRAILERS,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
        inet 192.168.1.13 netmask ffffff00 broadcast 192.168.1.255
        ether 8:0:20:b0:7a:18
qfe0: flags=1000862<BROADCAST,NOTRAILERS,RUNNING,MULTICAST,IPv4> mtu 1500 index 3
        inet 10.1.2.3 netmask ff000000 broadcast 10.255.255.255
        ether 8:0:20:b0:7a:18
#

No answers there. hme0 is "UP", qfe0 is both down and has an address class that we aren't using anywhere at this site. So qfe0 can't be doing anything, right?

$ /brendan/K9Toolkit/nicstat 1
    Time   Int   rKb/s   wKb/s   rPk/s   wPk/s    rAvs    wAvs   %Util     Sat
19:25:34  qfe0    3.77    0.00    2.98    0.00  1292.8   42.00    0.03    0.00
19:25:34  hme0    4.69    0.42    3.77    5.92  1272.9   72.52    0.04    0.00
    Time   Int   rKb/s   wKb/s   rPk/s   wPk/s    rAvs    wAvs   %Util     Sat
19:25:36  qfe0 10435.6    0.00  8642.5    0.00  1236.4    0.00   85.49    0.00
19:25:36  hme0 10394.1   818.6  8601.4 15524.7  1237.4   54.00   91.86    0.00
    Time   Int   rKb/s   wKb/s   rPk/s   wPk/s    rAvs    wAvs   %Util     Sat
19:25:39  qfe0 10469.2    0.00  8744.1    0.00  1226.0    0.00   85.76    0.00
19:25:39  hme0 10451.4   828.3  8732.5 15708.5  1225.5   54.00   92.40    0.00

Woah, this "down" qfe0 with a dummy address is 85% busy. Something really strange is going on!

I had began this demo to show %CPU for busy interfaces, leading us to run the DTrace based "intrstat" tool, which is new to Solaris 10. It shows us how much CPU our network interface interrupts are consuming, and now I had a renewed interest in what it had to say:

# intrstat 1

      device |      cpu0 %tim
-------------+---------------
       hme#0 |      1232 32.8
       m64#0 |         0  0.0
       qfe#0 |      1140 37.8
        su#0 |         0  0.0
        su#1 |         0  0.0
      uata#0 |         0  0.0

      device |      cpu0 %tim
-------------+---------------
       hme#0 |      1233 32.9
       m64#0 |         0  0.0
       qfe#0 |      1138 37.8
        su#0 |         0  0.0
        su#1 |         0  0.0
      uata#0 |         1  0.0
...

hme0 is consuming 33% of CPU0, quite a substantial amount considering that this is just a 100 Mbit/sec interface. That's where the planned demo would have ended.

However, qfe0 is consuming 37.8% of CPU0. This is really frightening! Remember, I am sending traffic to hme0's address, and qfe0 is not marked "UP". This is a dead interface with a dummy address, and it is consuming a LOT of CPU! (Perhaps it is talking to other dead interfaces? ;-)

I began thinking outside the box, and Scott was around to help. We found that the room was connected to a hub, both hme0 and qfe0 were connected to this same hub, and they both had the same ethernet address. So while qfe0 eventually rejected the packets that it could see, it had to walk through much kernel code to get to that point, consuming CPU cycles and incrementing kstats. A clue from the nicstat output was that qfe0 was receiving but not sending, so it wasn't taking part in a connection.

I unplugged qfe0 and the problem vanished. Easy. However, making the mental leap that CPU load might be caused by a down interface is far from obvious! It's counter intuitive.

...

As an aside, I also ask how we measure how utilised the network interfaces are. Very few people have been able to answer this, and it is a topic I hope chapter 7 of this book will increase awareness for. (You can use nicstat or sysperfstat from the K9Toolkit).

Brendan Gregg's Blog

How much CPU does a down interface chew?