sanitizers4hpc User Guide

LLVM Sanitizers is a collection of compile time instrumentation tools that detect things like out of bounds reads and writes, memory leaks, and race conditions.

sanitizers4hpc is a tool for running HPC code instrumented with LLVM Sanitizers. It uses a single interface to collect the sanitizer results of multiple HPC ranks at once.

A Quick Example

Let’s say that you have a multi-rank MPI application that has some ranks somehow corrupting their stacks and crashing. Detecting stack corruption is something that LLVM AddressSanitizer can do, so you instrument your application with AddressSanitizer to try to find the source of the problem.

$ cc -fsanitize=address crashing_app.c -o crashing_app
$ srun -n4 ./crashing_app
=================================================================
=================================================================
==124755==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7f9acd400078 at pc 0x0000002cdb59 bp 0x7ffc263083b0 sp 0x7ffc26307b70
==124756==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7f293f000078 at pc 0x0000002cdb59 bp 0x7ffc2c6af0b0 sp 0x7ffc2c6ae870
WRITE of size 43 at 0x7f293f000078 thread T0
WRITE of size 43 at 0x7f9acd400078 thread T0
    #0 0x2cdb58 in __interceptor_strcpy.part.0 /home/jenkins/compiler-rt/lib/asan/asan_interceptors.cpp:440:5
    #1 0x388dff in main (/tmp/crashing_app+0x388dff)
    #2 0x7f9ad515e2bc in __libc_start_main (/lib64/libc.so.6+0x352bc) (BuildId: 28910b266cdd8f0c54c7830b758e4a1339f255c1)
    #3 0x2523c9 in _start /home/abuild/rpmbuild/BUILD/glibc-2.31/csu/../sysdeps/x86_64/start.S:120

    #0 0x2cdb58 in __interceptor_strcpy.part.0 /home/jenkins/compiler-rt/lib/asan/asan_interceptors.cpp:440:5
    #1 0x388dff in main (/tmp/crashing_app+0x388dff)
    #2 0x7f2946cc92bc in __libc_start_main (/lib64/libc.so.6+0x352bc) (BuildId: 28910b266cdd8f0c54c7830b758e4a1339f255c1)
    #3 0x2523c9 in _start /home/abuild/rpmbuild/BUILD/glibc-2.31/csu/../sysdeps/x86_64/start.S:120

Address 0x7f293f000078 is located in stack of thread T0 at offset 120 in frame
Address 0x7f9acd400078 is located in stack of thread T0 at offset 120 in frame
    #0 0x388b4f in main (/tmp/crashing_app+0x388b4f)

    #0 0x388b4f in main (/tmp/crashing_app+0x388b4f)

  This frame has 5 object(s):
  This frame has 5 object(s):
    [32, 36) 'argc.addr'
    [32, 36) 'argc.addr'
    [48, 56) 'argv.addr'
    [48, 56) 'argv.addr'
    [80, 84) 'myRank' (line 19)
    [80, 84) 'myRank' (line 19)
    [96, 100) 'numProcs' (line 19)
    [96, 100) 'numProcs' (line 19)
    [112, 120) 'stack_smasher' (line 27) <== Memory access at offset 120 overflows this variable
    [112, 120) 'stack_smasher' (line 27) <== Memory access at offset 120 overflows this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
      (longjmp and C++ exceptions *are* supported)
      (longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-buffer-overflow /home/jenkins/compiler-rt/lib/asan/asan_interceptors.cpp:440:5 in __interceptor_strcpy.part.0
SUMMARY: AddressSanitizer: stack-buffer-overflow /home/jenkins/compiler-rt/lib/asan/asan_interceptors.cpp:440:5 in __interceptor_strcpy.part.0
Shadow bytes around the buggy address:
  0x7f9acd3ffd80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x7f9acd3ffe00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x7f9acd3ffe80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x7f9acd3fff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x7f9acd3fff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x7f9acd400000: f1 f1 f1 f1 04 f2 00 f2 f2 f2 04 f2 04 f2 00[f3]
  0x7f9acd400080: f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00
  0x7f9acd400100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x7f9acd400180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x7f9acd400200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x7f9acd400280: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
Shadow bytes around the buggy address:
  0x7f293efffd80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x7f293efffe00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x7f293efffe80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x7f293effff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x7f293effff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x7f293f000000: f1 f1 f1 f1 04 f2 00 f2 f2 f2 04 f2 04 f2 00[f3]
  0x7f293f000080: f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00
  0x7f293f000100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x7f293f000180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x7f293f000200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x7f293f000280: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==124756==ABORTING
==124755==ABORTING

You get some AddressSanitizer output that points to a location in the code. This is helpful, but it’s not the whole story in an HPC context. You need to comb through the output to realize that there are two crashing ranks, and the output gives no indication to which ranks are actually crashing.

AddressSanitizer output wasn’t designed to be printed for multiple applications at the same time. Plus, the fact that the output from the two ranks are interleaved doesn’t help the readability of the report.

sanitizers4hpc solves these problems by collecting the output of LLVM sanitizer instrumented applications run at scale, merging identical error items, and marking the rank locations of the error items.

$ sanitizers4hpc --launcher-args="-n4" ./crashing_app
RANKS: <0-1>
AddressSanitizer: stack-buffer-overflow on address at pc bp sp
WRITE of size 55 thread T0
    #0 0x2cdb58 in __interceptor_strcpy.part.0 /home/jenkins/compiler-rt/lib/asan/asan_interceptors.cpp:440:5
    #1 0x388dff in main (/tmp/crashing_app+0x388dff)
    #2  __libc_start_main (/lib64/libc.so.6+0x352bc) (BuildId: 28910b266cdd8f0c54c7830b758e4a1339f255c1)
    #3 0x2523c9 in _start /home/abuild/rpmbuild/BUILD/glibc-2.31/csu/../sysdeps/x86_64/start.S:120

Address is located in stack of thread T0 at offset 120 in frame
    #0 0x388b4f in main (/tmp/crashing_app+0x388b4f)

  This frame has 5 object(s):
    [32, 36) 'argc.addr'
    [48, 56) 'argv.addr'
    [80, 84) 'myRank' (line 19)
    [96, 100) 'numProcs' (line 19)
    [112, 120) 'stack_smasher' (line 27) <== Memory access at offset 120 overflows this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
      (longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-buffer-overflow /home/jenkins/compiler-rt/lib/asan/asan_interceptors.cpp:440:5 in __interceptor_strcpy.part.0

The multiple buffer overflow errors are condensed into one error and they are labelled with their source ranks.

Building Applications With Sanitizers

Before using sanitizers4hpc, the application under investigation must be built with LLVM Sanitizers support. sanitizers4hpc supports three sanitizers included with Cray CCE and GNU CCC.

  • AddressSanitizer - Detects memory access errors (and more). Compile with -fsanitize=address.

  • LeakSanitizer - Detects memory leaks. Compile with -fsanitize=leak.

  • ThreadSanitizer - Detects thread programming errors, like race conditions. Compile with -fsanitize=thread.

Adding debug into with -g will make the output of all sanitizers more informative.

Using the Cray CCE compiler to compile a program with AddressSanitizer:

$ cc -g -fsanitize=address crashing_app.c -o crashing_app

Instrumentation Detection

sanitizers4hpc will detect if your binary was compiled with sanitizer support or not:

$ sanitizers4hpc ./crashing_app
The binary at ./crashing_app was detected not to have Clang Santizers support compiled in.
Rebuild the target application with Address or Leak Sanitizer support (refer to your
compiler documentation for more details) If your application was built with Clang
Sanitizers support, rerun sanitizers4hpc with the `-f` flag to bypass this check

Use the -f or --force-clang-san flag to force sanitizers4hpc to run the application, even if it thinks the application is not instrumented.

$ sanitizers4hpc -f ./crashing_app
srun: error: node0006: task 0: Segmentation fault

More Complex Job Launches and Options

--launcher-args: Pass Arguments to the WLM

sanitizers4hpc uses the system job launcher to start jobs. One can use the -l or --launcher-args option to pass arguments to the system job launcher. This is how to launch a job with multiple ranks or a custom layout. The format of the string passed to -l is WLM specific. For example, on Slurm systems, -l can be populated with anything that srun accepts.

For example, on a Slurm system, launching a two-rank crashing application across two nodes on the partition called “allnodes”:

$ sanitizers4hpc --launcher-args="-n2 -N2 -p allnodes" ./crashing_app
RANKS: <0-1>
AddressSanitizer: stack-buffer-overflow on address at pc bp sp
WRITE of size 55 thread T0
    #0 0x2cdb58 in __interceptor_strcpy.part.0 /home/jenkins/compiler-rt/lib/asan/asan_interceptors.cpp:440:5
    #1 0x388dff in main (/tmp/crashing_app+0x388dff)
    #2  __libc_start_main (/lib64/libc.so.6+0x352bc) (BuildId: 28910b266cdd8f0c54c7830b758e4a1339f255c1)
    #3 0x2523c9 in _start /home/abuild/rpmbuild/BUILD/glibc-2.31/csu/../sysdeps/x86_64/start.S:120

Address is located in stack of thread T0 at offset 120 in frame
    #0 0x388b4f in main (/tmp/crashing_app+0x388b4f)

  This frame has 5 object(s):
    [32, 36) 'argc.addr'
    [48, 56) 'argv.addr'
    [80, 84) 'myRank' (line 19)
    [96, 100) 'numProcs' (line 19)
    [112, 120) 'stack_smasher' (line 27) <== Memory access at offset 120 overflows this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
      (longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-buffer-overflow /home/jenkins/compiler-rt/lib/asan/asan_interceptors.cpp:440:5 in __interceptor_strcpy.part.0

squeue confirms the intended layout was used:

$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
           1158576  allnodes s4hStart     user CG       0:05      2 node[0004-0005]

--asan-options, --lsan-options, --tsan-options: Pass Environment Options to Sanitizers

Each sanitizer tool is configurable by setting environment variables. For example, AddressSanitizer is configured by setting ASAN_OPTIONS.

sanitizers4hpc respects options set in the environment, and also offers ways to set options for a specific job launch.

Use -a or --asan-options to set ASAN_OPTIONS for a job, -o or --lsan-options to set LSAN_OPTIONS, and -t or --tsan-options to set TSAN_OPTIONS.

See the documentation for the specific sanitizer tool for settings.

Setting ASAN_OPTIONS:

$ sanitizers4hpc -l"-n2" --asan-options="print_scariness=1" ./crashing_app
RANKS: <0-1>
AddressSanitizer: stack-buffer-overflow on address at pc bp sp
WRITE of size 55 thread T0
SCARINESS: 60 (multi-byte-write-stack-buffer-overflow)

Filtering

The -r or --errors-from option can be used to restrict report results. The supplied regular expression will be matched against source file names, function names, binary names, and library names. To use multiple patterns, use the -r option multiple times. When using multiple patterns, results matching at least one pattern will be shown.

Two errors, one in function “foo” and one in function “bar”:

$ sanitizers4hpc -l"-n2" ./crashing_app
RANKS: <1>
AddressSanitizer: stack-buffer-overflow on address 0x7f592c000068 at pc 0x0000002cdbc9 bp 0x7ffcd22b1610 sp 0x7ffcd22b0dd0
WRITE of size 52 at 0x7f592c000068 thread T0
    #0 0x2cdbc8 in __interceptor_strcpy.part.0 /home/jenkins/compiler-rt/lib/asan/asan_interceptors.cpp:440:5
    #1 0x388dd7 in bar /tmp/crashing_app.c:28:5
    #2 0x38911d in main /tmp/crashing_app.c:41:3
    #3 0x7f5933ca62bc in __libc_start_main (/lib64/libc.so.6+0x352bc) (BuildId: 28910b266cdd8f0c54c7830b758e4a1339f255c1)
    #4 0x252439 in _start /home/abuild/rpmbuild/BUILD/glibc-2.31/csu/../sysdeps/x86_64/start.S:120

Address 0x7f592c000068 is located in stack of thread T0 at offset 40 in frame
    #0 0x388cff in bar /tmp/crashing_app.c:25

  This frame has 1 object(s):
    [32, 40) 'stack_smasher' (line 27) <== Memory access at offset 40 overflows this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
      (longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-buffer-overflow /home/jenkins/compiler-rt/lib/asan/asan_interceptors.cpp:440:5 in __interceptor_strcpy.part.0

RANKS: <0>
AddressSanitizer: stack-buffer-overflow on address 0x7f215c100028 at pc 0x0000002cdbc9 bp 0x7ffe6c30f430 sp 0x7ffe6c30ebf0
WRITE of size 52 at 0x7f215c100028 thread T0
    #0 0x2cdbc8 in __interceptor_strcpy.part.0 /home/jenkins/compiler-rt/lib/asan/asan_interceptors.cpp:440:5
    #1 0x388c97 in foo /tmp/crashing_app.c:21:5
    #2 0x389085 in main /tmp/crashing_app.c:40:3
    #3 0x7f2163e212bc in __libc_start_main (/lib64/libc.so.6+0x352bc) (BuildId: 28910b266cdd8f0c54c7830b758e4a1339f255c1)
    #4 0x252439 in _start /home/abuild/rpmbuild/BUILD/glibc-2.31/csu/../sysdeps/x86_64/start.S:120

Address 0x7f215c100028 is located in stack of thread T0 at offset 40 in frame
    #0 0x388bbf in foo /tmp/crashing_app.c:18

  This frame has 1 object(s):
    [32, 40) 'stack_smasher' (line 20) <== Memory access at offset 40 overflows this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
      (longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-buffer-overflow /home/jenkins/compiler-rt/lib/asan/asan_interceptors.cpp:440:5 in __interceptor_strcpy.part.0

Filtering results to only show results in “bar”:

$ sanitizers4hpc -l"-n2" --errors-from="bar" ./crashing_app
RANKS: <1>
AddressSanitizer: WRITE of size 52 at 0x7f303f300068 thread T0
    #0 0x2cdbc8 in __interceptor_strcpy.part.0 /home/jenkins/compiler-rt/lib/asan/asan_interceptors.cpp:440:5
    #1 0x388dd7 in bar /tmp/crashing_app.c:28:5
    #2 0x38911d in main /tmp/crashing_app.c:41:3
    #3 0x7f30470352bc in __libc_start_main (/lib64/libc.so.6+0x352bc) (BuildId: 28910b266cdd8f0c54c7830b758e4a1339f255c1)
    #4 0x252439 in _start /home/abuild/rpmbuild/BUILD/glibc-2.31/csu/../sysdeps/x86_64/start.S:120

Address 0x7f303f300068 is located in stack of thread T0 at offset 40 in frame
    #0 0x388cff in bar /tmp/crashing_app.c:25


RANKS: <0>
AddressSanitizer:

AMD GPU and Nvidia GPU Sanitizers

sanitizers4hpc supports AMD’s AddressSanitizer-like sanitizer and Nvidia’s Compute Sanitizer.

With a properly compiled CUDA application, use -m or --cuda-sanitizer to point sanitizerz4hpc to compute-sanitizer. compute-sanitizer is NVidia’s sanitizer. It comes with every CUDA installation.

$ sanitizers4hpc --cuda-sanitizer=$(which compute-sanitizer) ./nvidia_oob_write
Max error: 2.000000
RANKS: <0>
AddressSanitizer: Invalid __global__ write of size 4 bytes
at 0xe0 in saxpy(int, float, float *, float *)
by thread (0,0,0) in block (1,0,0)
Address 0x7f9193a00210 is out of bounds
and is 1 bytes after the nearest allocation at 0x7f9193a00200 of size 16 bytes
Saved host backtrace up to driver entry point at kernel launch time
    #0 0x2ea4a1 in /usr/lib64/libcuda.so.1
    #1 0x8c2b in __cudart808 /tmp/src/./nvidia_oob_write
    #2 0x643c8 in cudaLaunchKernel /tmp/src/./nvidia_oob_write
    #3 0x44e4 in /opt/nvidia/hpc_sdk/Linux_x86_64/22.7/cuda/11.7/include/cuda_runtime.h:211:cudaError cudaLaunchKernel<char>(char const*, dim3, dim3, void**, unsigned long, CUstream_st*) /tmp/src/./nvidia_oob_write
    #4 0x4383 in /tmp/tmpxft_0001ef9d_00000000-6_nvidia_oob_write.compute_86.cudafe1.stub.c:13:__device_stub__Z5saxpyifPfS_(int, float, float*, float*) /tmp/src/./nvidia_oob_write
    #5 0x43c8 in /tmp/src/nvidia_oob_write.cu:8:saxpy(int, float, float*, float*) /tmp/src/./nvidia_oob_write
    #6 0x4128 in /tmp/src/nvidia_oob_write.cu:31:main /tmp/src/./nvidia_oob_write
    #7 0x352bd in __libc_start_main /lib64/libc.so.6
    #8 0x3e9a in ../sysdeps/x86_64/start.S:122:_start /tmp/src/./nvidia_oob_write

...rest omitted...

AMD’s sanitizer is used in a more similar fashion to the CPU sanitizers via a fsanitize=address compile flag with the AMD ROCm compiler.

See the sanitizers4hpc man page for more details.