Systems Performance 2nd Ed.



BPF Performance Tools book

Recent posts:
Blog index
About
RSS

DTracing Off-CPU Time

29 Jul 2007

I originally posted this at http://blogs.sun.com/brendan/entry/dtracing_off_cpu_time.

In this entry I'll demonstrate DTracing off-CPU time.

On my way back from OSCON 2007, I found myself in Portland airport with a laptop, no Internet and half an hour to kill. In fact, the same laptop that I used during the SVOSUG talk in April. It was a chance to finish the demo that I started back then.

For those that were there or listened in, the talk had an unfortunate start with over an hour of audiovisual issues, including problems with the laptop video driver, laptop boot process, the data projector, lights and microphone. Alan DuBoff did a good job of fixing things while I began plan B (presenting from a SunRay), and he fixed the video driver such that it works better with data projectors than any other Acer Ferrari I've tried.

One issue we didn't fix that night was a 20 second wait when starting up certain applications, such as gnome-terminal. It was to make a great live DTrace demo to finish with, but I ran out of time to do it properly (we finished at around 11:30pm). Here it is as a blog entry instead.

The problem

When running gnome-terminal, there is a 20 second wait before the terminal appears on the screen.

Initial check

During the 20 second wait:

  • vmstat 1 showed idle CPUs
  • prstat -m showed that gnome-terminal was not on-CPU

DTrace investigation

The gnome-terminal application is off-CPU for some reason, and the CPUs are otherwise idle. There are numerous different ways to begin the investigation with DTrace, including:

  • using the sched provider to trace off-CPU to on-CPU event time
  • using the syscall provider to trace elapsed time for syscalls
  • using the pid provider to trace elapsed time for application binary and library calls

I'll start with the sched provider (the syscall provider would also make a good starting point). I'll keep restarting gnome-terminal, so that I can DTrace all the events (this is harder if you are trying to DTrace something that is already off-CPU).

The following one-liner runs gnome-terminal and measures the time from that process leaving the CPU to when it returns, and only prints user stack traces if that time was over 1 second:

# dtrace -n 'sched:::off-cpu /pid == $target/ { self->start = timestamp; }
  sched:::on-cpu /self->start && ((timestamp - self->start) > 1000000000)/
  { printf("waited: %d ms\\n", (timestamp - self->start) / 1000000); ustack(); }
' -c gnome-terminal
dtrace: description 'sched:::off-cpu ' matched 6 probes
CPU     ID                    FUNCTION:NAME
  0  48718                    resume:on-cpu waited: 20034 ms

              libc.so.1`door_call+0x1a
              libc.so.1`_nsc_trydoorcall+0x213
              libnsl.so.1`_door_getipnodebyname_r+0x8f
              libnsl.so.1`_get_hostserv_inetnetdir_byname+0xb62
              libnsl.so.1`getipnodebyname+0xdf
              libsocket.so.1`get_addr+0x126
              libsocket.so.1`_getaddrinfo+0x414
              libsocket.so.1`getaddrinfo+0x19
              libORBit-2.so.0.1.0`get_netid+0x91
              libORBit-2.so.0.1.0`link_get_local_hostname+0x35
              libORBit-2.so.0.1.0`link_server_setup+0x56
              libORBit-2.so.0.1.0`giop_server_new+0x5e
              libORBit-2.so.0.1.0`ORBit_ORB_start_servers+0x1d8
              libORBit-2.so.0.1.0`IOP_generate_profiles+0x67
              libORBit-2.so.0.1.0`ORBit_marshal_object+0x8e
              libORBit-2.so.0.1.0`ORBit_marshal_value+0x391
              libORBit-2.so.0.1.0`orbit_small_marshal+0xf7
              libORBit-2.so.0.1.0`ORBit_small_invoke_stub+0x11c
              libORBit-2.so.0.1.0`ORBit_small_invoke_stub_n+0x43
              libORBit-2.so.0.1.0`ORBit_c_stub_invoke+0x132

\^C

Hmm, looks like that user stack trace was truncated. Time to boost the ustackframes value:

# dtrace -x ustackframes=64 -n '
  sched:::off-cpu /pid == $target/ { self->start = timestamp; }
  sched:::on-cpu /self->start && ((timestamp - self->start) > 1000000000)/
  { printf("waited: %d ms\\n", (timestamp - self->start) / 1000000); ustack(); }
' -c gnome-terminal
dtrace: description 'sched:::off-cpu ' matched 6 probes
dtrace: pid 101415 has exited
CPU     ID                    FUNCTION:NAME
  0  48720                    resume:on-cpu waited: 20038 ms

              libc.so.1`door_call+0x1a
              libc.so.1`_nsc_trydoorcall+0x213
              libnsl.so.1`_door_getipnodebyname_r+0x8f
              libnsl.so.1`_get_hostserv_inetnetdir_byname+0xb62
              libnsl.so.1`getipnodebyname+0xdf
              libsocket.so.1`get_addr+0x126
              libsocket.so.1`_getaddrinfo+0x414
              libsocket.so.1`getaddrinfo+0x19
              libORBit-2.so.0.1.0`get_netid+0x91
              libORBit-2.so.0.1.0`link_get_local_hostname+0x35
              libORBit-2.so.0.1.0`link_server_setup+0x56
              libORBit-2.so.0.1.0`giop_server_new+0x5e
              libORBit-2.so.0.1.0`ORBit_ORB_start_servers+0x1d8
              libORBit-2.so.0.1.0`IOP_generate_profiles+0x67
              libORBit-2.so.0.1.0`ORBit_marshal_object+0x8e
              libORBit-2.so.0.1.0`ORBit_marshal_value+0x391
              libORBit-2.so.0.1.0`orbit_small_marshal+0xf7
              libORBit-2.so.0.1.0`ORBit_small_invoke_stub+0x11c
              libORBit-2.so.0.1.0`ORBit_small_invoke_stub_n+0x43
              libORBit-2.so.0.1.0`ORBit_c_stub_invoke+0x132
              libgconf-2.so.4.1.0`ConfigServer_add_client+0x4f
              libgconf-2.so.4.1.0`gconf_get_config_server+0xb9
              libgconf-2.so.4.1.0`gconf_engine_connect+0x24f
              libgconf-2.so.4.1.0`gconf_engine_get_default+0x4c
              libgconf-2.so.4.1.0`gconf_client_get_default+0x2a
              libgnomeui-2.so.0.1401.0`libgnomeui_post_args_parse+0x187
              libgnome-2.so.0.1401.0`gnome_program_postinit+0x61
              libgnome-2.so.0.1401.0`gnome_program_init_common+0x37b
              libgnome-2.so.0.1401.0`gnome_program_initv+0x4d
              libgnome-2.so.0.1401.0`gnome_program_init+0x56
              gnome-terminal`main+0x2d5
              gnome-terminal`_start+0x7a

Good. This stack trace is likely to have led to the process leaving the CPU for the measured 20038 ms (the system was otherwise idle, so it is unlikely to have been kicked off due to scheduling). Reading through the lines, it looks like it is resolving a hostname: a common source of latencies when DNS is misconfigured.

There are a number of lines we could begin studying to confirm what is happening (and arrive at the same answer); I'll start with libnsl.so.1`getipnodebyname+0xdf, which should have a man page entry:

# man getipnodebyname
[...]
     struct hostent \*getipnodebyname(const char  \*name,  int  af,
     int flags, int \*error_num);
[...]

Now to check what hostname is being resolved:

# dtrace -n 'pid$target:libnsl:getipnodebyname:entry { trace(copyinstr(arg0)); }'
  -c gnome-terminal
dtrace: description 'pid$target:libnsl:getipnodebyname:entry ' matched 1 probe
CPU     ID                    FUNCTION:NAME
  0  50488            getipnodebyname:entry   marlin
\^C
dtrace: pid 101758 terminated by SIGINT

Oh, "marlin" is the hostname of this laptop, which should resolve just fine. Checking related configuration files:

# grep hosts /etc/nsswitch.conf
hosts:      files dns
#
# cat -n /etc/hosts
     1  #
     2  # Copyright 2006 Sun Microsystems, Inc.  All rights reserved.
     3  # Use is subject to license terms.
     4  #
     5  #ident  "@(#)hosts      1.1     06/08/04 SMI"
     6  #
     7  127.0.0.1       localhost
     8  192.168.1.166   marlin marlin.sf.fw.jpn.com

Hmm, the name service switch file should cause /etc/hosts (/etc/inet/hosts) to be read first, which contains a valid entry for "marlin".

Lets take a closer look at the getipnodebyname() call, and trace the address family and flags arguments:

# dtrace -n 'pid$target:libnsl:getipnodebyname:entry { printf("%s af=%d flags=%d",
  copyinstr(arg0), arg1, arg2); }' -c gnome-terminal
dtrace: description 'pid$target:libnsl:getipnodebyname:entry ' matched 1 probe
CPU     ID                    FUNCTION:NAME
  0  50488            getipnodebyname:entry marlin af=26 flags=19
\^C
dtrace: pid 101821 terminated by SIGINT

Ok, address family 26 is:

# grep 26 /usr/include/sys/socket.h
#define AF_INET6        26              /\* Internet Protocol, Version 6 \*/

IPv6! Should have checked earlier:

# grep ipnodes /etc/nsswitch.conf
ipnodes:    files dns
#
# cat -n /etc/inet/ipnodes
     1  #
     2  # Copyright 2006 Sun Microsystems, Inc.  All rights reserved.
     3  # Use is subject to license terms.
     4  #
     5  #ident  "@(#)ipnodes    1.1     06/08/04 SMI"
     6  #
     7  ::1             localhost
     8  127.0.0.1       localhost

Adding an entry to the ipnodes file for "marlin" fixed the issue.

Take 2

Was using the sched provider and then pid to trace getipnodebyname() the best path to take? It's hard to say – there are many different ways DTrace can help you solve problems. Lets try some other paths.

Here I'll trace elapsed times from the syscall provider, since if we have an off-CPU issue on an idle system, it will almost certainly be visible at the syscall layer.

The procsystime tool from the DTraceToolkit performs different types of syscall time analysis. I'm using it as it saves a minute or so of typing. Here it runs the gnome-terminal command and produces a report of elapsed times by syscall:

# /opt/DTT/procsystime -e gnome-terminal

Elapsed Times for command gnome-terminal,

         SYSCALL          TIME (ns)
           gtime               1882
      sigpending               1921
     priocntlsys               4200
          sysi86               4354
          getgid               5152
       sysconfig               5606
     getsockname               5904
      systeminfo               7040
      setcontext               7342
          fxstat               7943
     getpeername               9393
       getrlimit              11717
          getuid              13927
           uname              14661
       sigaction              14698
          getpid              19241
      setsockopt              23743
          getcwd              33784
            fsat              41700
          stat64              43024
           readv              45937
          listen              49874
          llseek              52341
           mkdir              52633
            pipe              64612
           chmod              65684
           fcntl              70864
           utime              76677
         fstat64             109420
           ioctl             133301
          access             157803
          unlink             165703
          open64             221120
          accept             276863
            bind             281026
          writev             343976
             brk             344410
         memcntl             347475
           write             358764
      getdents64             422206
          munmap             486137
         connect             588166
     resolvepath             686768
       so_socket             748039
           close             846782
            open            1127761
            read            1191102
            mmap            1592314
           xstat            8521257
         pollsys            9817661
          doorfs        20035299808

The doorfs() call takes the door descriptor as the first argument:

# dtrace -n 'syscall::doorfs:entry /pid == $target/ { trace(arg0); }' -c gnome-terminal
dtrace: description 'syscall::doorfs:entry ' matched 1 probe
CPU     ID                    FUNCTION:NAME
  0    572                     doorfs:entry                 3
  0    572                     doorfs:entry                 3
  0    572                     doorfs:entry                 3
  0    572                     doorfs:entry                 3
\^C
dtrace: pid 101833 terminated by SIGINT
#
# dtrace -n 'syscall::doorfs:entry /pid == $target/ { trace(fds[arg0].fi_pathname); }'
  -c gnome-terminal
dtrace: description 'syscall::doorfs:entry ' matched 1 probe
CPU     ID                    FUNCTION:NAME
  0    572                     doorfs:entry   /var/run/name_service_door
  0    572                     doorfs:entry   /var/run/name_service_door
  0    572                     doorfs:entry   /var/run/name_service_door
  0    572                     doorfs:entry   /var/run/name_service_door
\^C
dtrace: pid 101835 terminated by SIGINT

The door calls were to /var/run/name_service_door, pointing to name resolution and the nscd process (which can be confirmed with more DTrace).

Running procsystime on nscd produces:

# /opt/DTT/procsystime -e -n nscd
Hit Ctrl-C to stop sampling...
\^C

Elapsed Times for processes nscd,

         SYSCALL          TIME (ns)
           ioctl               2452
          llseek               2615
         fstat64               3996
       sysconfig              13933
            open              15416
            read              26336
           gtime              52980
       so_socket             323066
           close             349885
           xstat             430465
            send             454944
         connect             795246
          doorfs          360127347
        lwp_park        10009952745
         pollsys        20032365315
       nanosleep       153088010532

nscd is multi-threaded, with many threads sleeping as they wait for work, meaning that large off-CPU elapsed times may have nothing to do with gnome-terminal. The time for pollsys() is interesting, as at 20 seconds it matches the measured time in gnome-terminal.

Analysing pollsys() further (the poll() manpage is similar):

# man poll
[...]
     int poll(struct pollfd fds[], nfds_t nfds, int timeout);
[...]

The first argument is an array with length specified by the second argument. Looping over an array is difficult from DTrace (due to lack of loops), but this doesn't really present a difficulty when troubleshooting. Here I'll dump the struct in hex, and the other values as ints:

# dtrace -n 'syscall::pollsys:entry /execname == "nscd"/ { printf("nfds=%d", arg1);
  tracemem(copyin(arg0, 16), 16); }' -c gnome-terminal
dtrace: description 'syscall::pollsys:entry ' matched 1 probe
CPU     ID                    FUNCTION:NAME
  0    542                    pollsys:entry nfds=1
             0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f  0123456789abcdef
         0: 05 00 00 00 40 00 00 00 00 00 00 00 00 00 00 00  ....@...........

  0    542                    pollsys:entry nfds=1
             0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f  0123456789abcdef
         0: 05 00 00 00 40 00 00 00 00 00 00 00 00 00 00 00  ....@...........

  0    542                    pollsys:entry nfds=1
             0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f  0123456789abcdef
         0: 05 00 00 00 40 00 42 fd 0b 7b ab 46 f0 d7 c0 0d  ....@.B..{.F....

  0    542                    pollsys:entry nfds=1
             0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f  0123456789abcdef
         0: 05 00 00 00 40 00 00 00 0b 7b ab 46 f0 d7 c0 0d  ....@....{.F....

dtrace: pid 102041 has exited

Don't worry, DTrace lets you cast variables as structs if you like. I'm just dumping the data in hex as this is a short one-liner.

The nfds value shows that there is only ever one file descriptor in the array; and the hex dump (on this little endian architecture) shows that the file descriptor number is "5" (the first member of struct pollfd).

Now DTrace is used to print the pathname for that file descriptor:

# dtrace -n 'syscall::pollsys:entry /execname == "nscd"/ 
  { trace(fds[*(uint32_t *)copyin(arg0, 4)].fi_pathname); }' -c gnome-terminal
dtrace: description 'syscall::pollsys:entry ' matched 1 probe
CPU     ID                    FUNCTION:NAME
  0    542                    pollsys:entry   <unknown>
  0    542                    pollsys:entry   <unknown>
\^C
dtrace: pid 102063 terminated by SIGINT

Well, that didn't work. Not all file descriptors have pathnames, of course. It will save some time to borrow pfiles for the next step:

# pfiles `pgrep -x nscd`
101917: /usr/sbin/nscd
  Current rlimit: 256 file descriptors
   0: S_IFCHR mode:0666 dev:270,0 ino:6815752 uid:0 gid:3 rdev:13,2
      O_RDWR
      /devices/pseudo/mm@0:null
   1: S_IFCHR mode:0666 dev:270,0 ino:6815752 uid:0 gid:3 rdev:13,2
      O_RDWR
      /devices/pseudo/mm@0:null
   2: S_IFCHR mode:0666 dev:270,0 ino:6815752 uid:0 gid:3 rdev:13,2
      O_RDWR
      /devices/pseudo/mm@0:null
   3: S_IFDOOR mode:0777 dev:279,0 ino:0 uid:0 gid:0 size:0
      O_RDWR FD_CLOEXEC  door to nscd[101917]
   4: S_IFSOCK mode:0666 dev:277,0 ino:17874 uid:0 gid:0 size:0
      O_RDWR
        SOCK_RAW
        SO_SNDBUF(8192),SO_RCVBUF(8192)
        sockname: AF_ROUTE
        peername: AF_ROUTE
   5: S_IFSOCK mode:0666 dev:277,0 ino:53988 uid:0 gid:0 size:0
      O_RDWR
        SOCK_DGRAM
        SO_DGRAM_ERRIND,SO_SNDBUF(57344),SO_RCVBUF(57344)
        sockname: AF_INET 192.168.1.78  port: 51697
        peername: AF_INET 192.168.1.5  port: 53

File descriptor 5 was a socket to the remote host 192.168.1.5 port 53 (DNS). This shows that nscd is waiting for 20 seconds on a DNS socket, and we know that gnome-terminal has a 20 second wait on the /var/run/name_service_door file. They are probably related.

Proving that they are related, if needed, is simply more DTrace. This time I'll borrow a script from /usr/demo/dtrace, which shows who is waiting for nscd and how long for:

# dtrace -s /usr/demo/dtrace/nscd.d -c gnome-terminal
dtrace: script '/usr/demo/dtrace/nscd.d' matched 27 probes
dtrace: pid 101594 has exited

  nscd                                                gnome-session
           value  ------------- Distribution ------------- count
           16384 |                                         0
           32768 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1
           65536 |                                         0

  nscd                                                gnome-terminal
           value  ------------- Distribution ------------- count
           16384 |                                         0
           32768 |@@@@@@@@@@                               1
           65536 |@@@@@@@@@@@@@@@@@@@@                     2
          131072 |                                         0
          262144 |                                         0
          524288 |                                         0
         1048576 |                                         0
         2097152 |                                         0
         4194304 |                                         0
         8388608 |                                         0
        16777216 |                                         0
        33554432 |                                         0
        67108864 |                                         0
       134217728 |                                         0
       268435456 |                                         0
       536870912 |                                         0
      1073741824 |                                         0
      2147483648 |                                         0
      4294967296 |                                         0
      8589934592 |                                         0
     17179869184 |@@@@@@@@@@                               1
     34359738368 |                                         0

In the 17.1 to 34.3 second bucket is gnome-terminal waiting for nscd – our 20 second wait.

The /usr/demo/dtrace/nscd.d script works by tracing who is nscd is waking up via the sched:::wakeup probe, after having traced the time when threads sleep (see the "sched" chapter in the Dynamic Tracing Guide on docs.sun.com).

The wrap

DTrace provides many ways to solve performance issues or for troubleshooting. Above were only a few techniques for analysing off-CPU time, but there are more available.

Some techniques can get very complex, and require much systems knowledge. Don't worry: if you only figure out one way to solve your problem, you've still solved your problem. (Systems knowledge should help you solve system problems faster).