valgrind4hpc User Guide

valgrind is a tool for detecting memory, threading, and other errors in programs at runtime.

valgrind4hpc is a tool for running HPC jobs under valgrind. Using a single interface, it launches each rank of an HPC job under the supervision of valgrind. It then collects the results of each valgrind instance and merges them into a single report.

A Quick Example

Let’s check a multi-rank MPI app for memory leaks. One basic way to do this, without valgrind4hpc, is to launch the job with the job launcher as usual, but with each rank running under the control of valgrind.

$ srun -n2 valgrind --leak-check=full ./mpi_mem_leak
==37455== Memcheck, a memory error detector
==37456== Memcheck, a memory error detector
==37456== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==37456== Using Valgrind-3.20.0 and LibVEX; rerun with -h for copyright info
==37456== Command: ./mpi_mem_leak
==37456==
==37455== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==37455== Using Valgrind-3.20.0 and LibVEX; rerun with -h for copyright info
==37455== Command: ./mpi_mem_leak
==37455==
==37455==
==37456==
==37456== HEAP SUMMARY:
==37456==     in use at exit: 126,889 bytes in 597 blocks
==37456==   total heap usage: 846 allocs, 249 frees, 2,039,125 bytes allocated
==37456==
==37455== HEAP SUMMARY:
==37455==     in use at exit: 124,841 bytes in 596 blocks
==37455==   total heap usage: 859 allocs, 263 frees, 2,043,501 bytes allocated
==37455==
==37456== 2,048 bytes in 1 blocks are definitely lost in loss record 595 of 597
==37455== LEAK SUMMARY:
==37455==    definitely lost: 0 bytes in 0 blocks
==37455==    indirectly lost: 0 bytes in 0 blocks
==37455==      possibly lost: 0 bytes in 0 blocks
==37455==    still reachable: 124,841 bytes in 596 blocks
==37455==         suppressed: 0 bytes in 0 blocks
==37455== Reachable blocks (those to which a pointer was found) are not shown.
==37455== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==37455==
==37455== For lists of detected and suppressed errors, rerun with: -s
==37455== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
==37456==    at 0x403468B: malloc (vg_replace_malloc.c:393)
==37456==    by 0x201BBA: main (mpi_mem_leak.c:41)
==37456==
==37456== LEAK SUMMARY:
==37456==    definitely lost: 2,048 bytes in 1 blocks
==37456==    indirectly lost: 0 bytes in 0 blocks
==37456==      possibly lost: 0 bytes in 0 blocks
==37456==    still reachable: 124,841 bytes in 596 blocks
==37456==         suppressed: 0 bytes in 0 blocks
==37456== Reachable blocks (those to which a pointer was found) are not shown.
==37456== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==37456==
==37456== For lists of detected and suppressed errors, rerun with: -s
==37456== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

With only two ranks, the interleaved valgrind output is already unwieldy. One can imagine how much worse it would get with 20, 200, or 2000 ranks.

Taking some time to digest the valgrind output, one can see that it indicates a memory leak. After sifting through it further, one can notice that the leak occurs on only one rank, but the output gives no indication of which one.

These are the problems that valgrind4hpc solves. Even with 64 ranks, the output is short and indicates exactly where the memory leak is, including the offending rank. The reports of the other non-leaky ranks are all grouped together into one output section.

$ valgrind4hpc -n 64 --valgrind-args="--leak-check=full" ./mpi_mem_leak

RANKS: <1>

2,048 bytes in 1 blocks are definitely lost
  at malloc (in vg_replace_malloc.c:393)
  by main (in mpi_mem_leak.c:41)


RANKS: <0,2..63>

HEAP SUMMARY:
  in use at exit: 0 bytes in 0 blocks

All heap blocks were freed -- no leaks are possible


RANKS: <1>

HEAP SUMMARY:
  in use at exit: 2048 bytes in 1 blocks

LEAK SUMMARY:
   definitely lost: 2048 bytes in 1 blocks
   indirectly lost: 0 bytes in 0 blocks
     possibly lost: 0 bytes in 0 blocks
   still reachable: 0 bytes in 0 blocks

ERROR SUMMARY: 0 errors from 0 contexts (suppressed 595)

Now we can clearly see that the offending rank is number 1 and the rest of the ranks are fine.

More Complex Job Launches and Options

valgrind4hpc has options that offer more control over the job launch and the resulting output.

--: Pass Arguments to the Application

To pass arguments to the underlying application, put -- after the application name and list the application's command line arguments after it.

For example, an application that launches a thread for each argument on the command line:

$ valgrind4hpc ./helgrind_example -- my_first_thread my_second_thread
Thread 1: top of stack near 0x19698ea0; argv_string=my_first_thread
Joined with thread 1; returned value was MY_FIRST_THREAD
Thread 2: top of stack near 0x2bb91ea0; argv_string=my_second_thread
Joined with thread 2; returned value was MY_SECOND_THREAD

RANKS: <0>

HEAP SUMMARY:
  in use at exit: 0 bytes in 0 blocks

All heap blocks were freed -- no leaks are possible

ERROR SUMMARY: 0 errors from 0 contexts (suppressed 0)

--launcher-args: Pass Arguments to the WLM

To pass WLM-specific arguments to the underlying job launcher, use --launcher-args.

For example, to launch 4 ranks across 2 nodes on a Slurm system:

$ valgrind4hpc -n4 --launcher-args="-N2" ./mpi_mem_leak

squeue shows that it did indeed use 2 nodes:

$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
           1154796  allnodes valgrind     user R        0:06      2 node[0004-0005]

--from-ranks: Only Show Results from Certain Ranks

Use --from-ranks when only a subset of a job’s ranks is of interest. The option supports range syntax for specifying multiple blocks of contiguous ranks.

For example, launching 10 ranks but only displaying results from ranks 0, 1, 2, and 4:

$ valgrind4hpc -n10 --from-ranks=0-2,4 ./mpi_mem_leak

RANKS: <1>

2,048 bytes in 1 blocks are definitely lost
  at malloc (in vg_replace_malloc.c:393)
  by main (in mpi_mem_leak.c:41)


RANKS: <0,2,4>

HEAP SUMMARY:
  in use at exit: 0 bytes in 0 blocks

All heap blocks were freed -- no leaks are possible


RANKS: <1>

HEAP SUMMARY:
  in use at exit: 2048 bytes in 1 blocks

LEAK SUMMARY:
   definitely lost: 2048 bytes in 1 blocks
   indirectly lost: 0 bytes in 0 blocks
     possibly lost: 0 bytes in 0 blocks
   still reachable: 0 bytes in 0 blocks

ERROR SUMMARY: 0 errors from 0 contexts (suppressed 595)

Using valgrind Tools Other than memcheck

While valgrind is most commonly used for detecting memory leaks, valgrind is actually the name of the generic framework upon which the memory leak tool, called memcheck, is built. valgrind includes other tools for detecting other kinds of errors, and valgrind4hpc supports the usage of these extra tools.

valgrind4hpc supports valgrind tools other than memcheck with the --tool command line option. For example, one can use helgrind, which detects thread programming errors.

The example program has a race condition in which two threads write to the global variable global:

$ valgrind4hpc -n2 --tool=helgrind ./helgrind_example -- one two

RANKS: <0,1>

Possible data race during write of size 4 at 0x205144 by thread #3
  at thread_start (in helgrind_example.c:39)
  by mythread_wrapper (in hg_intercepts.c:406)
  by start_thread (in ./helgrind_example)
  by clone (in ./helgrind_example)
  by thread_start (in helgrind_example.c:39)
  by mythread_wrapper (in hg_intercepts.c:406)
  by start_thread (in ./helgrind_example)
  by clone (in ./helgrind_example)
Address is 0 bytes inside data symbol "global"

RANKS: <0,1>

HEAP SUMMARY:
  in use at exit: 0 bytes in 0 blocks

All heap blocks were freed -- no leaks are possible

ERROR SUMMARY: 1 errors from 1 contexts (suppressed 71)

Suppressions

Sometimes valgrind detects errors in code that is not directly related to the program being debugged, such as in a shared library. To clean up the output, valgrind supports suppressions, which hide errors originating from such locations.

Some previous examples ended their valgrind reports with a (suppressed *) count. This is because valgrind4hpc has built-in suppressions that automatically filter out memory errors in standard HPC libraries like MPI and SHMEM.

valgrind4hpc also supports custom suppression files. The format is identical to valgrind suppression files.

If valgrind4hpc outputs an error in an external library or in a location that one would otherwise want to suppress, one can use valgrind4hpc’s --gen-suppressions option to generate suppressions in the valgrind4hpc output. The generated suppressions should be saved into a file. Alternatively, --gen-suppressions-file=<file> can be used to create a suppression file in a single step.

To use suppressions, pass the --suppressions=<file> argument. For multiple suppression files, the option can be given more than once.

For example:

$ valgrind4hpc -n2 --gen-suppressions=yes ./mpi_mem_leak

RANKS: <1>

2,048 bytes in 1 blocks are definitely lost
  at malloc (in vg_replace_malloc.c:393)
  by main (in mpi_mem_leak.c:41)

... normal valgrind4hpc output omitted ...

GENERATED SUPPRESSIONS:

{
   <insert_a_suppression_name_here>
   Memcheck:Leak
   match-leak-kinds: definite
   fun:malloc
   fun:main
}

$ valgrind4hpc -n2 --gen-suppressions-file=leak_suppressions.txt ./mpi_mem_leak

RANKS: <1>

2,048 bytes in 1 blocks are definitely lost
  at malloc (in vg_replace_malloc.c:393)
  by main (in mpi_mem_leak.c:41)

... normal valgrind4hpc output omitted ...

$ cat ./leak_suppressions.txt

{
   <insert_a_suppression_name_here>
   Memcheck:Leak
   match-leak-kinds: definite
   fun:malloc
   fun:main
}

$ valgrind4hpc -n2 --suppressions=./leak_suppressions.txt ./mpi_mem_leak

RANKS: <0,1>

HEAP SUMMARY:
  in use at exit: 0 bytes in 0 blocks

All heap blocks were freed -- no leaks are possible

ERROR SUMMARY: 0 errors from 0 contexts (suppressed 1191)

vgdb

vgdb is valgrind’s mechanism for stopping after a number of errors so that gdb can attach to the program running under valgrind. valgrind4hpc offers a similar workflow in which the application is stopped and can be debugged with gdb4hpc.

Use valgrind4hpc’s --vgdb-error option to stop after the given number of errors and print instructions for attaching to the running app with gdb4hpc.

$ valgrind4hpc -n2 --vgdb-error=1 ./mpi_mem_leak

RANKS: <0,1>

Entered VGDB mode. To attach to with GDB4hpc: start GDB4hpc and run
  attach $a --vgdb=/tmp/cti_daemonjuWtME/vgdb-commands 1155209.0

In gdb4hpc:

$ gdb4hpc -s
dbg all> attach $a --vgdb=/tmp/cti_daemonjuWtME/vgdb-commands 1155209.0
0/2 ranks connected... (timeout in 299 seconds)
2/2 ranks connected.
Created network...
Connected to application...
Current rank location:
a{0}: #0  0x00000000082d2c8a in clock_nanosleep@GLIBC_2.2.5
a{0}: #1  0x00000000082d89e3 in nanosleep
a{0}: #2  0x00000000082d88fa in sleep
a{0}: #3  0x0000000000201c92 in main at mpi_mem_leak.c:60

a{1}: #0  0x0000000007172419 in MPIDI_SHMI_progress
a{1}: #1  0x0000000005b59a59 in MPIR_Wait_impl.part.0
a{1}: #2  0x00000000069531f6 in MPIC_Wait
a{1}: #3  0x0000000006966bbc in MPIC_Sendrecv
a{1}: #4  0x00000000068700b2 in MPIR_Barrier_intra_dissemination
a{1}: #5  0x0000000004ecabf1 in MPIR_Barrier_intra_auto
a{1}: #6  0x0000000004ecad18 in MPIR_Barrier_impl
a{1}: #7  0x0000000006b5ee0b in MPIR_CRAY_Barrier
a{1}: #8  0x0000000004ecade0 in MPIR_Barrier
a{1}: #9  0x0000000006b55362 in MPIDI_Cray_shared_mem_coll_opt_cleanup
a{1}: #10 0x0000000006998dc5 in MPIDI_Cray_coll_finalize
a{1}: #11 0x0000000006ceffbf in MPID_Finalize
a{1}: #12 0x000000000537778e in PMPI_Finalize
a{1}: #13 0x0000000000201c97 in main at mpi_mem_leak.c:63