aboutsummaryrefslogtreecommitdiff
Demonstrations of nfsslower, the Linux eBPF/bcc version.

nfsslower show NFS reads, writes, opens and getattrs, slower than a
threshold. For example:

./nfsslower.py
Tracing NFS operations that are slower than 10 ms
TIME     COMM           PID    T BYTES   OFF_KB   LAT(ms) FILENAME
11:25:16 dd             21295  W 1048576 15360      14.84 1.test
11:25:16 dd             21295  W 1048576 16384      12.73 1.test
11:25:16 dd             21295  W 1048576 17408      24.27 1.test
11:25:16 dd             21295  W 1048576 18432      22.93 1.test
11:25:16 dd             21295  W 1048576 19456      14.65 1.test
11:25:16 dd             21295  W 1048576 20480      12.58 1.test
11:25:16 dd             21297  W 1048576 6144       10.50 1.test.w
11:25:16 dd             21297  W 1048576 7168       16.65 1.test.w
11:25:16 dd             21297  W 1048576 8192       13.01 1.test.w
11:25:16 dd             21297  W 1048576 9216       14.06 1.test.w

This shows NFS writes from dd each 1MB in size to 2 different files. The
writes all had latency higher than 10ms.

This "latency" is measured from when the operation was issued from the VFS
interface to the file system, to when it completed. This spans everything:
RPC latency, network latency, file system CPU cycles, file system locks, run
queue latency, etc. This is a better measure of the latency suffered by
applications reading from a NFS share and can better expose problems
experienced by NFS clients.

Note that this only traces the common NFS operations (read,write,open and
getattr). I chose to include getattr as a significant percentage of NFS
traffic end up being getattr calls and are a good indicator of problems
with an NFS server.

The threshold can be provided as an argument. E.g. I/O slower than 1 ms:

./nfsslower.py 1
Tracing NFS operations that are slower than 1 ms
TIME     COMM           PID    T BYTES   OFF_KB   LAT(ms) FILENAME
11:40:16 cp             21583  R 131072  0           4.35 1.test
11:40:16 cp             21583  R 131072  256         1.87 1.test
11:40:16 cp             21583  R 131072  384         2.99 1.test
11:40:16 cp             21583  R 131072  512         4.19 1.test
11:40:16 cp             21583  R 131072  640         4.25 1.test
11:40:16 cp             21583  R 131072  768         4.65 1.test
11:40:16 cp             21583  R 131072  1280        1.08 1.test
11:40:16 cp             21583  R 131072  1408        3.29 1.test
11:40:16 cp             21583  R 131072  1792        3.12 1.test
11:40:16 cp             21583  R 131072  3712        3.55 1.test
11:40:16 cp             21583  R 131072  3840        1.12 1.test
11:40:16 cp             21583  R 131072  4096        3.23 1.test
11:40:16 cp             21583  R 131072  4224        2.73 1.test
11:40:16 cp             21583  R 131072  4352        2.73 1.test
11:40:16 cp             21583  R 131072  4480        6.09 1.test
11:40:16 cp             21583  R 131072  5120        4.40 1.test
[...]

This shows all NFS_READS that were more than 1ms. Depending on your
latency to your fileserver, you might need to tweak this value to
remove 

A threshold of 0 will trace all operations. Warning: the output will be
verbose, as it will include all file system cache hits.

./nfsslower.py 0
Tracing NFS operations
11:56:50 dd             21852  W 1048576 0           0.42 1.test
11:56:50 dd             21852  W 1048576 1024        0.46 1.test
11:56:50 dd             21852  W 1048576 2048        0.36 1.test
11:56:50 cp             21854  G 0       0           0.35 1.test
11:56:50 cp             21854  O 0       0           0.33 1.test
11:56:50 cp             21854  G 0       0           0.00 1.test
11:56:50 cp             21854  R 131072  0           0.07 1.test
11:56:50 cp             21854  R 131072  128         0.02 1.test
11:56:50 cp             21854  R 131072  256         0.02 1.test
11:56:50 cp             21854  R 131072  384         0.02 1.test
11:56:50 cp             21854  R 131072  512         0.02 1.test
11:56:50 cp             21854  R 131072  640         0.02 1.test
11:56:50 cp             21854  R 131072  768         0.02 1.test
11:56:50 cp             21854  R 131072  896         0.02 1.test
11:56:50 cp             21854  R 131072  1024        0.02 1.test
11:56:50 cp             21854  R 131072  1152        0.02 1.test
11:56:50 cp             21854  R 131072  1280        0.02 1.test
11:56:50 cp             21854  R 131072  1408        0.02 1.test
11:56:50 cp             21854  R 131072  1536        0.02 1.test
11:56:50 cp             21854  R 131072  1664        0.02 1.test
11:56:50 cp             21854  R 131072  1792        0.02 1.test
11:56:50 cp             21854  R 131072  1920        0.02 1.test
11:56:50 cp             21854  R 131072  2048        0.02 1.test
11:56:50 cp             21854  R 131072  2176        0.04 1.test
11:56:50 cp             21854  R 131072  2304        0.02 1.test
11:56:50 cp             21854  R 131072  2432        0.03 1.test
11:56:50 cp             21854  R 131072  2560        0.03 1.test
11:56:50 cp             21854  R 131072  2688        0.02 1.test
11:56:50 cp             21854  R 131072  2816        0.03 1.test
11:56:50 cp             21854  R 131072  2944        0.02 1.test
11:56:50 cp             21854  R 0       3072        0.00 1.test
11:56:50 ls             21855  G 0       0           0.00 1.test
11:56:50 ls             21856  G 0       0           0.36 music
11:56:50 ls             21856  G 0       0           0.00 music
11:56:50 ls             21856  G 0       0           0.00 test
11:56:50 ls             21856  G 0       0           0.00 ff
11:56:50 ls             21856  G 0       0           0.00 34.log
11:56:50 ls             21856  G 0       0           0.00 vmlinuz-linux
11:56:50 ls             21856  G 0       0           0.00 2.test
11:56:50 ls             21856  G 0       0           0.00 rt.log
11:56:50 ls             21856  G 0       0           0.00 1.lod
11:56:50 ls             21856  G 0       0           0.00 COPYRIGHT.txt
11:56:50 ls             21856  G 0       0           0.00 gg
11:56:50 ls             21856  G 0       0           0.00 qw.log
11:56:50 ls             21856  G 0       0           0.00 README.md
11:56:50 ls             21856  G 0       0           0.00 1.log

The output now includes open operations ("O"), and reads ("R") wand getattrs ("G").
A cp operation


A -j option will print just the fields (parsable output, csv):

./nfsslower.py -j 0
ENDTIME_us,TASK,PID,TYPE,BYTES,OFFSET_b,LATENCY_us,FILE
87054476520,dd,22754,W,1048576,0,425,1.test
87054482916,dd,22754,W,1048576,1048576,320,1.test
87054488179,dd,22754,W,1048576,2097152,389,1.test
87054511340,cp,22756,G,0,0,371,1.test
87054511685,cp,22756,O,0,0,306,1.test
87054511700,cp,22756,G,0,0,2,1.test
87054512325,cp,22756,R,131072,0,56,1.test
87054512432,cp,22756,R,131072,131072,22,1.test
87054512520,cp,22756,R,131072,262144,32,1.test
87054512600,cp,22756,R,131072,393216,21,1.test
87054512678,cp,22756,R,131072,524288,21,1.test
87054512803,cp,22756,R,131072,655360,56,1.test

This may be useful for visualizing with another tool, for example, for
producing a scatter plot of ENDTIME vs LATENCY, to look for time-based
patterns.

USAGE message:

usage: nfsslower.py [-h] [-j] [-p PID] [min_ms]

Trace READ, WRITE, OPEN and GETATTR NFS calls slower than a threshold,supports NFSv{3,4}

positional arguments:
  min_ms             Minimum IO duration to trace in ms (default=10ms)

optional arguments:
  -h, --help         show this help message and exit
  -j, --csv          just print fields: comma-separated values
  -p PID, --pid PID  Trace this pid only

 ./nfsslower         # trace operations slower than 10ms
 ./nfsslower 1       # trace operations slower than 1ms
 ./nfsslower -j 1    # ... 1 ms, parsable output (csv)
 ./nfsslower 0       # trace all nfs operations
 ./nfsslower -p 121  # trace pid 121 only