Solaris 11 DTrace syscall Provider Changes

09 Nov 2011

I originally posted this at http://dtrace.org/blogs/brendan/2011/11/09/solaris-11-dtrace-syscall-provider-changes.

Oracle Solaris 11 dropped many commonly used probes from the DTrace syscall provider, a disappointing side-effect of some code refactoring in the system call trap table (PSARC 2010/441 "delete obsolete system call traps"). This breaks a lot of scripts and one liners, including many that are used to teach beginners DTrace. Functionality is still (I think) possible, albeit by learning trap table mappings and tracing those. Given how commonly used and taught the syscall provider is, this is not a minor bug or nit (as other providers have had), rather it's the biggest regression in DTrace's history.

In this post I'll explain the changes by showing what happened to the syscall::open:entry probe. For a summary of the affected probes and necessary changes, see the New System Calls and Deleted System Calls lists. This only affects Oracle Solaris 11; all other related operating systems (including Solaris 10, Illumos and SmartOS) remain as before.

Solaris 10

This one-liner traces open() syscalls in Solaris 10, showing process and file names:

# dtrace -n 'syscall::open:entry { printf("%s %s", execname, copyinstr(arg0)); }'

It follows the open() syscall as defined by the POSIX standard and the open(2) man page:

     int open(const char *path, int oflag, /* mode_t mode */);

Mapping this to the one-liner is straightforward, easy and intuitive. It's commonly introduced to beginners learning DTrace.

Here's another example. This is the openat() syscall, which was standardized in POSIX.1-2008:

# dtrace -n 'syscall::openat:entry { printf("%s %s", execname, copyinstr(arg1)); }'

It follows the synopsis:

     int openat(int fildes, const char *path, int oflag, /* mode_t mode */);

This time arg1, not arg0, refers to the pathname.

While this is still straightforward, some syscall provider probes were not, and these exceptions were documented in the DTrace guide. The syscall provider isn't actually a stable interface, and its probe names didn't exactly match the POSIX syscall names. While this was a minor nuisance at times, the syscall provider generally did work as expected: tracing syscalls.

Oracle Solaris 11

Tracing only open() in Oracle Solaris 11 may not be possible. You are supposed to use this instead:

# dtrace -n 'syscall::openat:entry { printf("%s %s", execname, copyinstr(arg1)); }'

On Oracle Solaris 11, this traces both the open() and openat() syscalls. On Solaris 10, this traces just openat(). And if you try to trace open() on Oracle Solaris 11, you get an error:

# dtrace -n 'syscall::open:entry { printf("%s %s",execname,copyinstr(arg0)); }'
dtrace: invalid probe specifier syscall::open:entry { printf("%s %s",execname,copyinstr(arg0)); }:
probe description syscall::open:entry does not match any probes

While open() is still a supported syscall in Oracle Solaris 11 (it needs to be for POSIX), it's no longer present in the DTrace syscall provider, making the provider not work as one may expect.

The syscall provider isn't showing the POSIX defined syscall interface, it's exposing the function names in the syscall trap table, as defined in uts/common/os/sysent.c. In fact, it always has.

The syscall trap table was an attractive location to instrument, as all syscalls could be caught from one place and with similar context. The downside was that the trap table names didn't match the POSIX syscall names exactly. Oracle Solaris 11 makes the difference much wider and much more noticeable. What was a minor interface bug is now an eyesore, and what was once the basis for learning DTrace becomes a pitfall into Solaris internals.

It's not all bad news. The advantage for the DTrace user is that the above one-liner is more powerful: you won't miss an openat() if you just trace open(). However, you may still miss the 64-bit file offset transitional interface calls: open64() and openat64(), which are now both traced using the single syscall::openat64:entry probe (syscall::open64:entry is gone too).

Making the change

The Oracle Solaris 11 syscall provider changes are listed here in the New and Deleted System Call sections. Other syscalls affected include: chmod(), creat(), mkdir(), readlink(), rename(), rmdir(), stat(), symlink() and unlink(). To see what syscall probes are present, either run "cat /etc/name_to_sysnum" or "dtrace -ln 'syscall:::entry'".

Fortunately, it should only take minutes to update scripts and one-liners to match Solaris 11. The DTraceToolkit has already been updated (by Oracle), and will be shipped in /usr/dtrace/DTT (thanks!). Jim and I are working on the errata for the DTrace book. I assume all other sources of DTrace documentation will be noting the Oracle Solaris 11 changes as well.

Who was born on 3/04/1965?

I mentioned earlier that it may not be possible to achieve the exact same functionality as Solaris 10, namely, using the syscall provider to trace just the open() syscall and not both open() and openat(). Here's my attempt:

# dtrace -n 'syscall::openat:entry /(int)arg0 == -3041965/ { printf("%s %s", execname,
    copyinstr(arg1)); }'

This traces both open() and openat() as before, but filters (hopefully) just the open() calls by matching the first argument (arg0) to be the value for AT_FDCWD, which is defined to be a so-called "magic" number: 0xffd19553, or -3041965 when read as a signed 32-bit integer. I'm assuming the latter, as other OSes implement AT_FDCWD in the invalid negative range for file descriptors (Linux uses -100, for example). -3041965 appears, to me, to be a date (3/04/1965), possibly the birthday of the engineer who wrote the code. Such easter eggs date back to the Unix File System (if not FFS or earlier) where the on-disk magic number was the birthday of one of the engineers - either Marshall Kirk McKusick or Bill Joy (incidentally, I wrote a program years ago to search for this on disks: findbill.pl).
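The arithmetic behind the two spellings of the magic number can be checked with a few lines of C. This is only a sketch: as_signed32(), split_as_date() and struct date_guess are hypothetical names invented for the illustration, not anything from the Solaris source, and the digit split makes no claim about which field is the day and which is the month.

```c
#include <stdint.h>

/* Interpret a 32-bit pattern as a two's complement signed value. */
int32_t as_signed32(uint32_t bits)
{
    if (bits & 0x80000000u) {
        /* High bit set: the value is negative, so subtract 2^32. */
        return (int32_t)((int64_t)bits - ((int64_t)1 << 32));
    }
    return (int32_t)bits;
}

/* Hypothetical holder for the digits of the magnitude, read as x/xx/xxxx. */
struct date_guess {
    int first;   /* leading digit(s) */
    int second;  /* next two digits */
    int year;    /* last four digits */
};

/* Split the absolute value of `value` into date-like digit groups. */
struct date_guess split_as_date(int32_t value)
{
    int64_t magnitude = value < 0 ? -(int64_t)value : (int64_t)value;
    struct date_guess d;

    d.year = (int)(magnitude % 10000);
    d.second = (int)((magnitude / 10000) % 100);
    d.first = (int)(magnitude / 1000000);
    return d;
}
```

Passing 0xffd19553 to as_signed32() gives -3041965, and splitting that value yields the groups 3, 04 and 1965.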

I expect open() to map to openat(AT_FDCWD, ...), which will be matched by this one-liner, and in effect trace open() but not openat() (with a valid file descriptor). But what happens if openat() is explicitly called with AT_FDCWD? The one-liner can't tell that call apart from open(), so this approach probably won't work in that case, and it may not be possible to do with the syscall provider alone. Fortunately, it may not really matter: the syscall provider can identify that a type of open() syscall happened, which would be sufficient in most cases, and when it isn't, you can use the DTrace pid provider to see what actual syscall was used.

Instead of hardcoding -3041965 in the one-liner, it would be better to type "AT_FDCWD", but this constant is unknown to DTrace.

So, G'Day -3041965, whoever you are!

Why are we even here?

This change wasn't implemented for the possible DTrace advantage mentioned above: that was a side effect. This was housekeeping to eliminate some duplicate code, as stated in the case title: PSARC 2010/441 "delete obsolete system call traps".

Since open() and openat() have duplicated functionality, you may assume that this eliminates hundreds of duplicate lines. This is not the case. From uts/common/syscall/open.c:

int
open(char *path, int fmode, int cmode)
{
        return (openat(AT_FDCWD, path, fmode, cmode));
}

Any duplication here was already eliminated long ago, with open() a wrapper to openat().
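The same relationship can be observed from userland on any system that provides openat(). The following is a minimal sketch, not the kernel code: open_via_openat() and opens_same_file() are hypothetical helper names, and the comparison by device and inode number is just one simple way to confirm that both calls reached the same file.

```c
#define _POSIX_C_SOURCE 200809L  /* expose openat() on strict compilers */
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Userland stand-in for the wrapper above: open() expressed via openat(). */
int open_via_openat(const char *path, int oflag, mode_t mode)
{
    return openat(AT_FDCWD, path, oflag, mode);
}

/*
 * Open `path` read-only both ways and report whether the two descriptors
 * refer to the same file (same device and inode). Returns 1 if they match,
 * 0 if they differ, and -1 if either open() or fstat() fails.
 */
int opens_same_file(const char *path)
{
    struct stat a, b;
    int result = -1;
    int fd1 = open(path, O_RDONLY);
    int fd2 = open_via_openat(path, O_RDONLY, 0);

    if (fd1 >= 0 && fd2 >= 0 && fstat(fd1, &a) == 0 && fstat(fd2, &b) == 0)
        result = (a.st_dev == b.st_dev && a.st_ino == b.st_ino);

    if (fd1 >= 0)
        close(fd1);
    if (fd2 >= 0)
        close(fd2);
    return result;
}
```

Since the only difference between the two paths is the AT_FDCWD argument, that argument is also the only thing a probe on openat can use to guess which call the application made.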

I think this is just about duplication in the syscall trap table. (I can't check myself, the Oracle Solaris 11 code has not yet been released.) This is how the code looked before (uts/common/os/sysent.c):

struct sysent sysent32[NSYSCALL] =
[...]
        /*  5 */ SYSENT_CI("open",              open32,         3),
[...]
        /* 68 */ SYSENT_CI("openat",            openat32,       4),

By reducing duplicates, the "open" and "openat" lines can become just one line, "openat".
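To see why removing a trap table line also removes a probe, here is a toy model in C. It is a sketch under assumptions: struct mini_sysent, table_before, table_after and probe_exists() are hypothetical, the "after" layout is a guess (the Oracle Solaris 11 source isn't available), and the real provider does considerably more than a name lookup. The only point it illustrates is that probe names come from the table's name strings.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical miniature of a trap table entry: a name and an argument count. */
struct mini_sysent {
    const char *name;
    int nargs;
};

/* Before: "open" and "openat" each have their own entry, as in the excerpt. */
const struct mini_sysent table_before[] = {
    { "open", 3 },
    { "openat", 4 },
};

/* After (assumed layout): the "open" entry is gone, only "openat" remains. */
const struct mini_sysent table_after[] = {
    { "openat", 4 },
};

/* Return 1 if a probe named `name` could be created from this table, else 0. */
int probe_exists(const struct mini_sysent *table, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(table[i].name, name) == 0)
            return 1;
    }
    return 0;
}
```

In this model, a lookup for "open" succeeds against table_before and fails against table_after, which is the same shape as the "does not match any probes" error shown earlier.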

Given the desire to do this, and the risk to the syscall provider, there were at least four options:

    A) Do nothing. The pain for DTrace not justifying the gain of reducing trap table duplication.

    B) Reduce duplication partially, in a way that doesn't hurt DTrace but actually helps it. Eg, can the "transitional" lf64(5) extensions, including open64(), be folded into the POSIX syscalls? This would reduce duplication while improving how well the syscall provider matches POSIX.

    C) Reduce duplication more, in a way that hurts DTrace.

    D) Reduce duplication more, and rework the syscall provider so that it either doesn't become worse or becomes better (which may include moving probes out of the trap table).

Oracle picked (C). It wasn't presented for discussion with the DTrace community (dtrace-discuss), and the PSARC case was private (Oracle has kept these closed since 2010/329). There could be additional reasons for this choice that have yet to be made public.

truss -ftopen

While DTrace's syscall::open has become syscall::openat, the truss(1) utility has been changed to stick to the POSIX interface despite the Solaris 11 changes. It will report openat(AT_FDCWD, ...) as open(), for example. To see what's really happening, you can use the -x option; from the man page:

Displays the arguments to the specified system calls (if traced by -t) in raw form, usually hexadecimal, rather than symbolically. This is for unredeemed hackers who must see the raw bits to be happy. Default is -x!all.

Also, "truss -ftopen command" still works (a commonly used one-liner for debugging file opens).

This puts truss(1) in a better position for the Solaris 11 changes than DTrace. Indeed, if truss can be modified to match the changes, could DTrace be too?

Summary

DTrace scripts and one-liners that use the syscall provider may need updating for Oracle Solaris 11. Many syscall probes were deleted, and grouped into others of a similar type. In this post I discussed what happened to the syscall::open:entry probe, which in Oracle Solaris 11 is now part of syscall::openat:entry.

While this change breaks many existing DTrace scripts, documentation and tutorials (on Oracle Solaris 11 only), it doesn't break DTrace itself. It's just one provider out of many (albeit the first you usually use), and the rest of DTrace still provides enormous value.

There is some very good news for DTrace on Oracle Solaris 11: the ip, tcp and udp providers (which I created and originally developed) have been integrated (thanks Alan Maguire and all who helped!). The iscsi, cpc and kerberos providers are there too.