DTrace pid Provider Arguments

I originally posted this at http://dtrace.org/blogs/brendan/2011/02/11/dtrace-pid-provider-arguments.

In my previous post, I introduced the DTrace pid provider and discussed an important topic - the stability of its function "entry" probes. Here I'll continue discussing the pid provider, introducing function arguments and the stability of these as well. The aim of these posts is to share my experience using this provider, and of maintaing pid provider-based scripts. For more information on the pid provider, see the pid provider chapter of the DTrace Guide and the examples in the forthcoming DTrace book.

Function Entry Arguments

In my previous post I picked a simple and well-known function to begin tracing: libc's strlen(). This has a string pointer (char *) as the first argument. For the pid provider function entry probe, the first argument is available as the built-in variable "arg0", of type int64. Tracing that:

# dtrace -n 'pid$target:libc:strlen:entry { trace(arg0); }' -p 809
dtrace: description 'pid$target:libc:strlen:entry ' matched 1 probe
CPU     ID                    FUNCTION:NAME
  1  63860                     strlen:entry         134508268
  1  63860                     strlen:entry         135408008
  1  63860                     strlen:entry         135408008

DTrace can examine function arguments.

However, it's not so interesting just as a pointer address - we can't see the string it is pointing to. Showing the string:

# dtrace -n 'pid$target:libc:strlen:entry { trace(stringof(arg0)); }' -p 809
dtrace: description 'pid$target:libc:strlen:entry ' matched 1 probe
dtrace: error on enabled probe ID 1 (ID 63860: pid809:libc.so.1:strlen:entry): invalid address
(0x8046eec) in action #1
dtrace: error on enabled probe ID 1 (ID 63860: pid809:libc.so.1:strlen:entry): invalid address
(0x8122988) in action #1
dtrace: error on enabled probe ID 1 (ID 63860: pid809:libc.so.1:strlen:entry): invalid address
(0x8122988) in action #1

The DTrace built-in stringof() takes a pointer and treats it as a pointer to a NULL-terminated string. This sounds like the right tool here, since we have a pointer to data and we need to inform DTrace that this data is a string. But it didn't work.

This is a common mistake. The pointer is to an address in user-land, whereas DTrace runs in kernel-land - so it can't be dereferenced directly. This is like calling a phone number without the area code when you are in the wrong area code - you may get no one, or a random person, but you definitely won't reach the person you were looking for. This is fixable with the DTrace copyinstr() function ("copyin" is a term from kernel code to refer to copying in data from user-land to the kernel):

# dtrace -n 'pid$target:libc:strlen:entry { trace(copyinstr(arg0)); }' -p 809
dtrace: description 'pid$target:libc:strlen:entry ' matched 1 probe
  0  63860                     strlen:entry   ls -F -l                         
  0  63860                     strlen:entry   /usr/bin/ls                      
  0  63860                     strlen:entry   /usr/bin/ls                      
  0  63860                     strlen:entry   ls -F -l                         
  0  63860                     strlen:entry   -l                               
  0  63860                     strlen:entry   /export/home/brendan

Key points:

DTrace runs in kernel-land.
You can examine user-land (process) memory using copyinstr() (and copyin()).
If you forget this, you'll see "invalid address" (if you're lucky).

And if you are unlucky, the address will coincide with something valid in the kernel address space, and you could see plausible output and not realize that it is in fact garbage. (eg, 64-bit process, 64-bit kernel, same address range in use).

Simple Data Types

I'll demonstrate simple data type arguments by tracing strncmp(), whose function prototype is:

     int strncmp(const char *s1, const char *s2, size_t n);

This demonstrates multiple arguments, strings and integers. I'll trace it for a bash shell:

# dtrace -qn 'pid$target::strncmp:entry { printf("%30.30s %30.30s %5d\n", copyinstr(arg0),
copyinstr(arg1), arg2); }' -p 670
                     _=/bin/ls __CF_USER_TEXT_ENCODING=0x1F7:     2
                     _=/bin/ls                      _=/bin/ls     2
                /Users/brendan                 /Users/brendan    14
                            bg                             ba     2
                          bind                             ba     2
                         break                             ba     2
                       builtin                             ba     2
                            ba                         banner     2
                        banner                             ba     2
                            ba                       basename     2
                      basename                             ba     2
[...]

While tracing I ran "ls -l", and then "ba" followed by a couple of tabs to list out command completion for "ba". You can see in the listing how bash was processing that - comparing available commands with the letters "ba" and a length of 2. This data was printed and formatted using printf().

Argument Summaries

Before we move on, I'd like to include an example of taking this simple argument data and presenting it as a summary, which I use about as often as listing all the data using trace() or printf().

Tracing malloc() (memory allocate) can be incredibly useful to see how applications are requesting memory. malloc() takes a size_t (integer) argument, which we could print:

# dtrace -n 'pid$target::malloc:entry { trace(arg0); }' -p 210
dtrace: description 'pid$target::malloc:entry ' matched 2 probes
^C
CPU     ID                    FUNCTION:NAME
  1  97364                     malloc:entry                 4
  1  97364                     malloc:entry                 4
  1  97364                     malloc:entry                16
  1  97364                     malloc:entry                16
  1  97364                     malloc:entry               252
  1  97364                     malloc:entry                60
[...]

However, malloc() often happens so frequently (thousands of times per second) that the overhead of printing out a line of data for each malloc() can slow the system and application down. Summaries are very useful here, such as DTrace's quantize() aggregation, used here to summarize Firefox malloc() during several seconds of browsing:

# dtrace -n 'pid$target::malloc:entry { @ = quantize(arg0); }' -p 210
dtrace: description 'pid$target::malloc:entry ' matched 2 probes
^C

           value  ------------- Distribution ------------- count    
              -1 |                                         0        
               0 |                                         1        
               1 |                                         16       
               2 |                                         540      
               4 |@@@@                                     10493    
               8 |@@@@@@                                   13944    
              16 |@@@@@@@                                  17669    
              32 |@@@@@@@@@@@@@@                           33733    
              64 |@@                                       4440     
             128 |@@                                       3670     
             256 |@@                                       5626     
             512 |@                                        3037     
            1024 |@                                        2546     
            2048 |                                         150      
            4096 |                                         76       
            8192 |                                         250      
           16384 |                                         17       
           32768 |                                         7        
           65536 |                                         181      
          131072 |                                         4        
          262144 |                                         0

Most of the requested sizes were in the 32 - 63 bucket, with one request in the 8 to 16 Mbyte range. We also caught one request for zero bytes!

When tracing malloc:entry, remember that this is requested, not returned bytes.
To see all allocations, trace other variants too: realloc(), calloc(), ...

The pid provider "return" can be used to examine the returned bytes (and errors) for these malloc() requests.

Argument Testing

In the previous example I just found something a little interesting, a zero byte malloc(). I can investigate this further as a demonstration of argument testing.

The argument values arg0..argN can be tested in DTrace predicates (/.../). Here I'll match the malloc() requests for zero bytes, and print the user-level stack back trace (this is still Firefox):

# dtrace -n 'pid$target::malloc:entry /arg0 == 0/ { ustack(); }' -p 210
dtrace: description 'pid$target::malloc:entry ' matched 2 probes
CPU     ID                    FUNCTION:NAME
  1  97364                     malloc:entry
              libSystem.B.dylib`malloc
              XUL`NS_Alloc_P+0x26
              XUL`JNIEnv_::CallStaticObjectMethod(_jclass*, _jmethodID*, ...)+0x41827
              XUL`DumpJSStack+0x12d397
              XUL`DumpJSStack+0x12ce63
              XUL`void std::__adjust_heap<__gnu_cxx::__normal_iterator<nsRefPtr
              XUL`NS_InvokeByIndex_P+0x58
              XUL`DumpJSStack+0x2b58e
[...]

Arguments can be tested in predicates

The above one-liner identified the code path calling a zero byte malloc(). This can be used by the developer to see why this happened.

While somewhat interesting, I doubt that these cause issues. I imagine they are a consequence of a code path allocating memory based on a dynamic variable - which is sometimes zero. The Chrome web browser does it too:

# dtrace -n 'pid$target::malloc:entry /arg0 == 0/ { @["zero byte malloc()s:"] = count(); }'
-p 335
dtrace: description 'pid$target::malloc:entry ' matched 2 probes
^C
  zero byte malloc()s:                                           1139

For my quick tests, anyway.

For another example of testing, I'll trace BSD socket's getaddrinfo() function, which has the prototype:

int
getaddrinfo(const char *hostname, const char *servname, const struct addrinfo *hints,
struct addrinfo **res);

The first two arguments can be fetched using copyinstr(), for example:

# dtrace -n 'pid$target::getaddrinfo:entry { printf("%s %s",
    copyinstr(arg1), copyinstr(arg1)); }' -p 210
dtrace: description 'pid$target::getaddrinfo:entry ' matched 1 probe
dtrace: error on enabled probe ID 1 (ID 98069: pid210:libSystem.B.dylib:getaddrinfo:entry):
invalid address (0x0) in action #1 at DIF offset 24
dtrace: error on enabled probe ID 1 (ID 98069: pid210:libSystem.B.dylib:getaddrinfo:entry):
invalid address (0x0) in action #1 at DIF offset 24

Oops. Didn't I do that right?

The problem this time is that I'm assuming that I can dereference the string. Not all pointers are valid, in this case I've tried to dereference a NULL pointer. This can be fixed by testing it first - here I've used ternary operators (A ? B : C):

# dtrace -n 'pid$target::getaddrinfo:entry { printf("%s %s",
    arg0 != NULL ? copyinstr(arg0) : "<NULL>",
    arg1 != NULL ? copyinstr(arg1) : "<NULL>"); }' -p 210
dtrace: description 'pid$target::getaddrinfo:entry ' matched 1 probe
CPU     ID                    FUNCTION:NAME
  1  98069                getaddrinfo:entry mxr.mozilla.org <NULL>
  1  98069                getaddrinfo:entry www.mozilla.org <NULL>
  0  98069                getaddrinfo:entry www.joyent.com <NULL>
  1  98069                getaddrinfo:entry 993-rjq-253.mktoresp.com <NULL>
  0  98069                getaddrinfo:entry dtrace.org <NULL>

As with C programming, you may want to test pointers before dereferencing.

Advanced Data Types

Examining arguments has been straightforward so far, apart from the copyinstr(). And with the DTrace summarizing features, processing them has been a breeze.

Now things get a lot harder.

Let's look at the third argument to getaddrinfo(), which is a "struct addrinfo". On Mac OS X this is:

struct addrinfo {
        int ai_flags;           /* input flags */
        int ai_family;          /* protocol family for socket */
        int ai_socktype;        /* socket type */
        int ai_protocol;        /* protocol for socket */
        socklen_t ai_addrlen;   /* length of socket-address */
        struct sockaddr *ai_addr; /* socket-address for socket */
        char *ai_canonname;     /* canonical name for service location */
        struct addrinfo *ai_next; /* pointer to next in list */
};

I'll demonstrate tracing the first four members of this struct:

# cat -n getaddrinfo.d
     1  #!/usr/sbin/dtrace -s
     2  
     3  pid$target::getaddrinfo:entry
     4  {
     5      this->hints = (struct addrinfo *)copyin(arg2, sizeof (struct addrinfo));
     6      printf("flags=%x family=%d socktype=%d protocol=%d",
     7          this->hints->ai_flags, this->hints->ai_family,
     8          this->hints->ai_socktype, this->hints->ai_protocol);
     9  }
# ./getaddrinfo.d -p 210
dtrace: failed to compile script ./getaddrinfo.d: line 6: operator -> cannot be
applied to a forward declaration: no struct addrinfo definition is available

Since the argument to getaddrinfo() is a pointer to a struct that is in user-land (process) memory, it must first be copied into kernel memory for DTrace to inspect, which is done on line 5. After that, its direct members can be dereferenced as is done on lines 7 and 8.

But it didn't work. I wanted to demonstrate this error since you'll see many similar scripts which trace the kernel via the fbt provider, and work fine. In the kernel, struct definitions are usually available (via CTF) so that DTrace can navigate them. This isn't the case for user-land, like in this example where the addrinfo struct definition was unavailable, causing DTrace to error.

The -C option to DTrace will invoke the preprocessor (if it is available), allowing #include to be used so that the structure definitions can be sourced from header files. The man page for getaddrinfo() suggests to include the following, as I have on lines 3-5:

# cat -n getaddrinfo.d
     1  #!/usr/sbin/dtrace -Cs
     2  
     3  #include <sys/types.h>
     4  #include <sys/socket.h>
     5  #include <netdb.h>
     6  
     7  pid$target::getaddrinfo:entry
     8  {
     9      this->hints = (struct addrinfo *)copyin(arg2, sizeof (struct addrinfo));
    10      printf("flags=%x family=%d socktype=%d protocol=%d",
    11          this->hints->ai_flags, this->hints->ai_family,
    12          this->hints->ai_socktype, this->hints->ai_protocol);
    13  }
# ./getaddrinfo.d -p 210
cc1: warning: /dev/fd/5 is shorter than expected
dtrace: failed to compile script ./getaddrinfo.d: "/usr/include/libkern/_OSByteOrder.h",
line 98: specified storage class not appropriate in D

This didn't work either, due to something from a header file that is a bit too tricky for DTrace to parse. This sort of issue isn't uncommon when trying to drag in header files and their dependencies. If you can #include something, great; if not, try including the struct definitions in your D script:

# cat -n getaddrinfo.d
     1  #!/usr/sbin/dtrace -s
     2  
     3  struct addrinfo {
     4          int ai_flags;           /* input flags */
     5          int ai_family;          /* protocol family for socket */
     6          int ai_socktype;        /* socket type */
     7          int ai_protocol;        /* protocol for socket */
     8          socklen_t ai_addrlen;   /* length of socket-address */
     9          struct sockaddr *ai_addr; /* socket-address for socket */
    10          char *ai_canonname;     /* canonical name for service location */
    11          struct addrinfo *ai_next; /* pointer to next in list */
    12  };
    13  
    14  pid$target::getaddrinfo:entry
    15  {
    16      this->hints = (struct addrinfo *)copyin(arg2, sizeof (struct addrinfo));
    17      printf("flags=%x family=%d socktype=%d protocol=%d",
    18          this->hints->ai_flags, this->hints->ai_family,
    19          this->hints->ai_socktype, this->hints->ai_protocol);
    20  }
# ./getaddrinfo.d -p 210
dtrace: script './getaddrinfo.d' matched 1 probe
CPU     ID                    FUNCTION:NAME
  0  98069                getaddrinfo:entry flags=0 family=0 socktype=1 protocol=0
  1  98069                getaddrinfo:entry flags=0 family=0 socktype=1 protocol=0
  0  98069                getaddrinfo:entry flags=0 family=0 socktype=1 protocol=0
^C

Success. I can read socket.h to see what these numbers are for (socktype=1 is SOCK_STREAM, for example). I could also enhance my D script to print out text versions of these for readability.

You can declare structs in D scripts and then cast pointers as those structs, so that members can be dereferenced

What if I'd like to examine ai_addr and ai_next? They are pointers to other structs in user-land memory, so they would need to be copyin()d first. This will mean your script has multiple copyin()s. (And for getaddrinfo() we'd also need to do this on the "return" probe, after they have been populated.)

Argument Stability

The stability for arguments should follow the stability of the probe names, as discussed in my previous post. In summary: since this is the pid provider and is exposing application functions, nothing is guaranteed to be stable. This means that while your script may work fine on one version of an application, it may not work at all on the next since the arguments to functions may have changed. And, they could change in subtle ways (silent failure).

Retesting pid provider scripts on new software versions is necessary to ensure you are looking at what you think you are. Also, keep an eye out for the addition of USDT providers to the application - which should be designed to present stable arguments.

Revalidate pid provider scripts on new software versions

Picking library functions or public API functions should be the safest, if you must use pid provider. Even so, you could run into trouble. The above script that traced getaddrinfo() sounds pretty stable. However, take a look at how the addrinfo struct is defined on Solaris:

struct addrinfo {
       int              ai_flags;      /* AI_PASSIVE, AI_CANONNAME,
                                         AI_NUMERICHOST, AI_NUMERICSERV
                                         AI_V4MAPPED, AI_ALL,
                                         AI_ADDRCONFIG */
       int              ai_family;     /* PF_xxx */
       int              ai_socktype;   /* SOCK_xxx */
       int              ai_protocol;   /* 0 or IPPROTO_xxx for IPv4 & IPv6 */
       socklen_t        ai_addrlen;    /* length of ai_addr */
       char             *ai_canonname; /* canonical name for nodename */
       struct sockaddr  *ai_addr;      /* binary address */
       struct addrinfo  *ai_next;      /* next structure in linked list */
};

The last three arguments are in a different order. Since the D script declares this struct, then you are informing DTrace on Solaris of the wrong structure, and dereferencing ai_addr or ai_canonname will result in the wrong data (they are switched).

Shedding New Light

As is the case with other tools, new observability leads to new discoveries. With the pid provider I'm tripping over discoveries all the time. Earlier when I was looking for simple functions to start with (and returned to using strlen()), I also tried printf(), as called by the bash history built-in. i.e., this:

$ history
  591  hash
  592  type ls
  593  ls

DTrace shows the format string printf() is using:

# dtrace -n 'pid$target:libc:printf:entry { trace(copyinstr(arg0)); }' -p 809
dtrace: description 'pid$target:libc:printf:entry ' matched 1 probe
  1  65354                     printf:entry   %5d%c %s%s
  1  65354                     printf:entry   %5d%c %s%s
  1  65354                     printf:entry   %5d%c %s%s

Oh. What's the "%c" for? I didn't realize there was another column in there.

While this is an interesting aside, I've had many similar accidental discoveries that were very relevant for solving application issues.

Summary

The pid provider can examine arguments to user-land functions, which can be used to understand exactly what an application is processing. Since DTrace runs in kernel context, pointers to user-land such as strings and structs must be copyinstr()d and copyin()d first. If needed, structs can be defined in D scripts, or the C preprocessor may allow header files to be #included. Another useful DTrace feature is the aggregation functions, which can be used to summarize this data in practical ways.

As with the probe names, these pid provider arguments are not considered a stable interface. Scripts written for one software version may not work on another. While the observability is great, if you are going to use pid provider scripts frequently, you'll need to revalidate and update them for new versions of the target software.

Brendan Gregg's Blog