valgrind4hpc Man Page
Valgrind4hpc
SYNOPSIS
valgrind4hpc [VALGRIND4HPC OPTIONS] [-n number of ranks] executable [-- [EXECUTABLE ARGUMENTS]]
DESCRIPTION
Valgrind4hpc is a Valgrind-based debugging tool to aid in the detection of memory leaks and errors in parallel applications. Valgrind4hpc aggregates any duplicate messages across ranks to help provide an understandable picture of program behavior. Valgrind4hpc manages starting and redirecting output from many copies of Valgrind, as well as deduplicating and filtering Valgrind messages. If your program can be debugged with Valgrind, it can be debugged with Valgrind4hpc.
OPTIONS
-n, --num-ranks=RANKS
Run with this number of ranks (default 1)
-t, --tool=TOOL
Run this Valgrind tool on the backend: memcheck, helgrind,
exp-sgcheck, or drd (default is memcheck)
-l, --launcher-args="arguments"
Arguments to pass to the application launcher (WLM-specific)
-i, --inputfile=FILE
Use FILE as input to all ranks
-o, --outputfile=FILE
Specify an output file
-s, --suppressions=FILE
Specify a suppression file (can specify this argument multiple times)
--gen-suppressions=[yes|no]
Generate Valgrind suppressions for this job run. By default, print
output to standard error at the end of the job. To specify a file,
set --gen-suppressions-file.
--gen-suppressions-file=FILE
Write generated suppressions to this file. If --gen-suppressions is
not set, this option will enable it.
-v, --valgrind-args="arguments"
Pass Valgrind arguments that Valgrind4hpc does not handle itself,
such as --show-leak-kinds or --leak-check. Note that certain
arguments may interfere with the functionality of Valgrind4hpc.
-r, --from-ranks=<ranks>
Only show Valgrind output from certain ranks. Format: "a-c,i-k" to
show output from ranks "a" through "c" and "i" through "k".
-g, --vgdb-error=<count>
Start VGDB mode upon encountering this number of errors. Valgrind4hpc
will print connection instructions to run on the node for the target
rank. Direct SSH access to nodes is required for interactive GDB
debugging.
--cray-pmi=OPTION
ALPS only: disable automatic MPI check and manually specify whether
the target application is a parallel MPI program. Note: this solves
an ALPS-specific problem and is not necessary on SLURM systems.
Possible values: {yes, no}
-h, --help
Display help text and exit
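To illustrate the rank-range format accepted by --from-ranks, the sketch below expands a specification such as "0-3,8-11" into the individual ranks it names. The function expand_ranks is a hypothetical helper written for this illustration; it is not part of Valgrind4hpc and does not claim to match its internal parsing. Treating a bare number as a single rank is an assumption, since the option description only shows ranges.

```shell
# expand_ranks SPEC: print every rank named by SPEC, one per line.
# Hypothetical helper for illustration only; not shipped with Valgrind4hpc.
expand_ranks() {
    # Split the specification on commas into range parts.
    for part in $(printf '%s\n' "$1" | tr ',' ' '); do
        first=${part%-*}    # text before the dash (whole part if no dash)
        last=${part#*-}     # text after the dash (whole part if no dash)
        i=$first
        while [ "$i" -le "$last" ]; do
            printf '%s\n' "$i"
            i=$((i + 1))
        done
    done
}

# Ranks selected by the specification "0-3,8-11".
expand_ranks "0-3,8-11"
```

Passing the same string to --from-ranks would, by the description above, limit the displayed Valgrind output to ranks 0 through 3 and 8 through 11.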
EXAMPLES
To run the program ./a.out and debug it across 32 ranks on 16 nodes with full Valgrind leak-checking, use the command:
valgrind4hpc -n32 --launcher-args="-N16 -j2" --valgrind-args="--track-origins=yes --leak-check=full" ./a.out -- arg1 arg2
Note that valgrind4hpc arguments and target program arguments must be separated by two dashes (--).
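As a further sketch that uses only options listed under OPTIONS, the following runs helgrind instead of memcheck on 8 ranks, shows output from ranks 0 through 3 only, and names an output file. The rank count, the rank range, and the file name helgrind.log are placeholder values chosen for illustration, and the guard simply skips the run on systems where valgrind4hpc is not installed.

```shell
# Placeholder values for illustration; adjust to the job at hand.
NRANKS=8
TOOL=helgrind
SHOW_RANKS="0-3"
REPORT=helgrind.log

if command -v valgrind4hpc >/dev/null 2>&1; then
    # Everything after the two dashes is passed to ./a.out itself.
    valgrind4hpc -n"$NRANKS" --tool="$TOOL" --from-ranks="$SHOW_RANKS" \
        --outputfile="$REPORT" ./a.out -- arg1 arg2
else
    echo "valgrind4hpc not found in PATH; skipping run" >&2
fi
```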
FALSE NEGATIVES
Valgrind needs to know which memory-allocating functions to wrap. To this end, Valgrind4hpc tells Valgrind which shared library symbols allocate memory. To function correctly, target executables must be dynamically linked and contain debug symbols. When calling the Cray compiler, set the environment variable
CRAYPE_LINK_TYPE=dynamic
and compile with the debug symbol flag -g.
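The build step described above might look like the following sketch. It assumes the Cray compiler is invoked through the cc compiler wrapper and that the program lives in a single source file named app.c; both names are assumptions made for illustration. The check with the file utility is only a rough confirmation that the result is dynamically linked.

```shell
# Request dynamic linking from the Cray compiler wrappers.
export CRAYPE_LINK_TYPE=dynamic

# app.c is a hypothetical source file; cc is assumed to be the Cray wrapper.
if command -v cc >/dev/null 2>&1 && [ -f app.c ]; then
    cc -g -o a.out app.c    # -g keeps debug symbols in the executable

    # Rough check: file(1) reports "dynamically linked" for dynamic ELF binaries.
    if file ./a.out | grep -q "dynamically linked"; then
        echo "a.out is dynamically linked"
    else
        echo "a.out does not appear to be dynamically linked" >&2
    fi
fi
```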
VALGRIND SUPPRESSIONS
For a full guide, see the Valgrind manual section 2.5: “Suppressing errors” at <http://valgrind.org/docs/manual/valgrind_manual.pdf>.
The Valgrind option --gen-suppressions=yes will automatically generate a suppression file for the currently-running program. If you would like to use this option with Valgrind4hpc, remember to pass it as a custom Valgrind option with --valgrind-args="--gen-suppressions=yes".
When you have created your custom suppression file, you can pass it to Valgrind4hpc with the --suppressions=filename argument.
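One possible two-step workflow is sketched below. It relies on the --gen-suppressions-file option listed under OPTIONS, which enables suppression generation and writes the result to a file, and then passes that file back with --suppressions. The file name my_app.supp and the rank count are placeholders. Generated suppressions are usually worth reviewing by hand before reuse, so that real errors are not hidden.

```shell
# Placeholder names for illustration.
SUPP=my_app.supp
NRANKS=4

if command -v valgrind4hpc >/dev/null 2>&1; then
    # Step 1: run once and write the generated suppressions to $SUPP.
    valgrind4hpc -n"$NRANKS" --gen-suppressions-file="$SUPP" ./a.out

    # Step 2: after reviewing $SUPP, rerun with those suppressions applied.
    valgrind4hpc -n"$NRANKS" --suppressions="$SUPP" ./a.out
else
    echo "valgrind4hpc not found in PATH; skipping run" >&2
fi
```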