Systems Performance 2nd Ed.



BPF Performance Tools book

Recent posts:
Blog index
About
RSS

iSCSI DTrace Provider

31 Jul 2007

I originally posted this at http://blogs.sun.com/brendan/entry/iscsi_dtrace_provider.

I recently tried out the new iSCSI Target Server DTrace provider which Adam wrote (which should be included in Solaris sometime soon). It provides observability of iSCSI traffic from the the server's perspective. This is great news if you have an iSCSI Solaris server, or are planning to have one.

In this entry I'll post a few screenshots of the iSCSI provider in action, both one-liners and scripts.

One-liners

I'll start by listing the probes – which may convey a sense of what is possible to trace. Look down the "NAME" column:

# dtrace -ln 'iscsi*:::'
   ID    PROVIDER            MODULE                          FUNCTION NAME
11696 iscsi100653         iscsitgtd                 send_async_logout async-send
11697 iscsi100653         iscsitgtd                   send_async_scsi async-send
11698 iscsi100653         iscsitgtd                  handle_scsi_data data-receive
11699 iscsi100653         iscsitgtd              iscsi_conn_data_rqst data-request
11700 iscsi100653         iscsitgtd                   send_datain_pdu data-send
11701 iscsi100653         iscsitgtd            iscsi_handle_login_pkt login-command
11702 iscsi100653         iscsitgtd                 send_login_reject login-response
11703 iscsi100653         iscsitgtd            iscsi_handle_login_pkt login-response
11704 iscsi100653         iscsitgtd                 handle_logout_msg logout-command
11705 iscsi100653         iscsitgtd                        conn_state logout-response
11706 iscsi100653         iscsitgtd                   handle_noop_cmd nop-receive
11707 iscsi100653         iscsitgtd                    iscsi_conn_pkt nop-send
11708 iscsi100653         iscsitgtd                   handle_scsi_cmd scsi-command
11709 iscsi100653         iscsitgtd                     send_scsi_rsp scsi-response
11710 iscsi100653         iscsitgtd                   handle_task_mgt task-command
11711 iscsi100653         iscsitgtd                    iscsi_conn_pkt task-response
11712 iscsi100653         iscsitgtd                   handle_text_msg text-command
11713 iscsi100653         iscsitgtd                   handle_text_msg text-response

The easiest one-liner would be to frequency count iSCSI operations:

# dtrace -n 'iscsi*::: { @[probename] = count(); }'
dtrace: description 'iscsi*::: ' matched 18 probes
^C

  nop-receive                                                       3
  nop-send                                                          3
  data-receive                                                     55
  data-request                                                     55
  scsi-response                                                   103
  scsi-command                                                    342
  data-send                                                      2865

The above output shows that the iSCSI server is doing more send operations (2865 data-sends), than receive (55 data-receives).

Now to see who is using this iSCSI server, by frequency counting remote IP addresses:

# dtrace -n 'iscsi*::: { @[args[0]->ci_remote] = count(); }'
dtrace: description 'iscsi*::: ' matched 18 probes
^C

  192.168.1.109                                                  1338

The above shows that 192.168.1.108 triggered 1338 iSCSI operations, which may have been sends, receives, nops, etc. Breaking down further is just a little more DTrace:

# dtrace -qn 'iscsi*::: { @[args[0]->ci_remote, probename] = count(); } 
  END { printa("%32s %18s %@16d\\n", @); }'
^C
                   192.168.1.109        nop-receive                2
                   192.168.1.109           nop-send                2
                   192.168.1.109       data-receive               16
                   192.168.1.109       data-request               16
                   192.168.1.109      scsi-response               43
                   192.168.1.109       scsi-command              253
                   192.168.1.109          data-send             2619

Great, while a simple one-liner, it quickly identifies who is connecting and what operations they are doing. Some performance wins are about locating and eliminating unnecessary work, which one-liners such as this help identify.

Now for printing payload bytes by iSCSI operation:

# dtrace -n 'iscsi*::: { @[probename] = sum(args[1]->ii_datalen); }'
dtrace: description 'iscsi*::: ' matched 18 probes
^C

  data-request                                                      0
  scsi-response                                                     0
  data-receive                                                  99328
  scsi-command                                                 140800
  data-send                                                  20672512

The above one-liner prints total bytes by operation, which does lose some data. What was the count by operation, average, min, max? DTrace can answer all of these seperately, or provide a summary as a distribution plot. Here I've used quantize for a power of 2 plot:

# dtrace -n 'iscsi*::: { @[probename] = quantize(args[2]->ii_datalen); }'
dtrace: description 'iscsi*::: ' matched 18 probes
^C

  data-request                                      
           value  ------------- Distribution ------------- count    
              -1 |                                         0        
               0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 12       
               1 |                                         0        

  scsi-response                                     
           value  ------------- Distribution ------------- count    
              -1 |                                         0        
               0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 55       
               1 |                                         0        

  data-receive                                      
           value  ------------- Distribution ------------- count    
             512 |                                         0        
            1024 |@@@@@@@                                  2        
            2048 |                                         0        
            4096 |@@@@@@@@@@@@@                            4        
            8192 |@@@@@@@@@@@@@@@@@@@@                     6        
           16384 |                                         0        

  scsi-command                                      
           value  ------------- Distribution ------------- count    
              -1 |                                         0        
               0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@         209      
               1 |                                         0        
               2 |                                         0        
               4 |                                         0        
               8 |                                         0        
              16 |                                         0        
              32 |                                         0        
              64 |                                         0        
             128 |                                         0        
             256 |                                         0        
             512 |@@@                                      18       
            1024 |@@                                       10       
            2048 |@@                                       11       
            4096 |@                                        4        
            8192 |@@                                       12       
           16384 |                                         0        

  data-send                                         
           value  ------------- Distribution ------------- count    
             256 |                                         0        
             512 |                                         1        
            1024 |                                         16       
            2048 |                                         21       
            4096 |                                         28       
            8192 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  2559     
           16384 |                                         0        

The plots above show that for the data-send operation, the payload size was usually in the 8 to 16 Kbyte bucket.

Scripts

Apart from one-lines, DTrace allows powerful scripts to be contstructed. I've put a few on the SolarisInternals DTrace Topics: iSCSI page.

Here I'll post my favourite so far, a script which measures read/write latency (or "response time") in nanoseconds:

# ./iscsirwlat.d 
Tracing... Hit Ctrl-C to end.
^C
Read Latency (ns),

  192.168.1.109                                     
           value  ------------- Distribution ------------- count    
           32768 |                                         0        
           65536 |@@@                                      27       
          131072 |@@                                       17       
          262144 |                                         4        
          524288 |@@@@                                     39       
         1048576 |@@@                                      32       
         2097152 |@@@@@@@@@@@@@@@@@                        164      
         4194304 |@                                        11       
         8388608 |@                                        5        
        16777216 |@                                        10       
        33554432 |@                                        7        
        67108864 |@                                        5        
       134217728 |@@@@@                                    46       
       268435456 |@                                        14       
       536870912 |                                         0        

Write Latency (ns),

  192.168.1.109                                     
           value  ------------- Distribution ------------- count    
           32768 |                                         0        
           65536 |@@                                       14       
          131072 |@@                                       16       
          262144 |@@@                                      20       
          524288 |@@@                                      20       
         1048576 |@@@@@                                    33       
         2097152 |@                                        5        
         4194304 |                                         0        
         8388608 |                                         2        
        16777216 |@                                        10       
        33554432 |@@@                                      18       
        67108864 |@@@@@                                    33       
       134217728 |@@@@@@@@@@@@@                            93       
       268435456 |@@                                       12       
       536870912 |                                         0        

The script is on the DTrace Topics page. It is a glimpse of things to come; how many of those were synchronous requests (which may hurt application performance) versus asynchronous flushing or read ahead (which may not hurt performance)? More DTrace is needed, which will probably include running DTrace on the client side as well...