Creating a sampling experiment with perftools-lite
- Author:
Hewlett Packard Enterprise Development LP.
- Copyright:
Copyright 2024-2025 Hewlett Packard Enterprise Development LP.
Setting up RajaPerf (optional)
For reference, at the time of writing, these steps set up raja-perf to build with CCE and cray-mpich:
> mkdir build_mpi_cce
> cd build_mpi_cce
> cmake -DENABLE_MPI=On -DMPI_CXX_HEADER_DIR=$MPICH_DIR/include \
-DMPI_CXX_COMPILER=$MPICH_DIR/bin/CC \
-DMPI_libmi.so_LIBRARY=$MPICH_DIR/lib/libmpi.so \
-DCMAKE_CXX_COMPILER=CC -DMPI_CXX_LIB_NAMES=libmi.so ..
... about 75 lines of cmake output ...
Building and running with perftools-lite
Load the perftools-lite module before building the application:
> module load perftools-lite
Build the application:
> make -j
[ 1%] Building CXX object blt/thirdparty_builtin/googletest-master-2020-01-07/googletest/CMakeFiles/gtest.dir/src/gtest-all.cc.o
[ 1%] Building CXX object blt/tests/smoke/CMakeFiles/blt_mpi_smoke.dir/blt_mpi_smoke.cpp.o
[ 2%] Building CXX object tpl/RAJA/tpl/camp/CMakeFiles/camp.dir/src/errors.cpp.o
...
[100%] Built target test-raja-perf-suite.exe
Then run it. Note that everything before `./bin/raja-perf.exe` will be specific to your environment:
> srun -p bardpeak -n 2 --exclusive ./bin/raja-perf.exe -pftol 0.05 -k Apps_HALOEXCHANGE_FUSED
srun: job 9135947 queued and waiting for resources
srun: job 9135947 has been allocated resources
CrayPat/X: Version 24.11.0 Revision 31a512b4d sles15.5_x86_64
10/02/24 19:11:06
... output from the test program ...
#################################################################
# #
# CrayPat-lite Performance Statistics #
# #
#################################################################
CrayPat/X: Version 24.11.0 Revision 31a512b4d sles15.5_x86_64 10/02/24 19:11:06
Experiment: lite lite-samples
Number of PEs (MPI ranks): 2
Numbers of PEs per Node: 2
... the rest of the perftools default report ...
The default report
The printed default report contains a set of tables that give a good basic view of what is going on in your program. Its contents are described on the getting started page.
Going further
The perftools-lite module is one of a family of “lite” modules that enable profiling in different contexts:
perftools-lite — Default profile
perftools-lite-events — Event profile
perftools-lite-gpu — GPU kernel and data movement along with event profile
perftools-lite-loops — Loop estimates (Cray CCE compiler only)
perftools-lite-hbm — High bandwidth memory data (for X86-64 systems only)
The workflow for each of these is the same as for perftools-lite; they just generate more specialized reports. See the perftools-lite man page for more details.
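As a rough sketch of that workflow, the helper below swaps the default module for another variant and rebuilds. The function name `use_lite_variant` and the `DRY_RUN` variable are hypothetical, introduced only for illustration. It assumes that perftools-lite is the variant currently loaded, that the build uses `make` as in the example above, and that a full rebuild is wanted after changing modules. With `DRY_RUN=1` it only prints the commands, so it can be tried on a machine without the modules installed.

```shell
#!/bin/sh
# Hypothetical helper: switch from perftools-lite to another lite variant
# and rebuild. With DRY_RUN=1 the commands are printed instead of executed.
use_lite_variant() {
    variant="$1"    # for example: perftools-lite-loops
    for cmd in "module unload perftools-lite" \
               "module load $variant" \
               "make clean" \
               "make -j"; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            printf '%s\n' "$cmd"
        else
            # Assumption: stop at the first step that fails.
            eval "$cmd" || return 1
        fi
    done
}

# Show the steps for the loops variant without running them.
DRY_RUN=1 use_lite_variant perftools-lite-loops
```

After the rebuild, the application is run in the same way as before.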
perftools-lite saves the experiment results in the current directory. They are filed under the name of the executable, with decoration to distinguish different runs; in this example, `raja-perf.exe+349876-26340827s`. You can use `pat_report` or Apprentice to look at these results.
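Because each run adds its own decorated entry, it can be handy to pick out the most recent one before opening it. The following is a minimal sketch, not part of perftools: `latest_experiment` is a hypothetical helper that assumes the results appear in the current directory as entries named `<executable>+<decoration>`, and it relies on the `-nt` file test, which bash, dash, and ksh provide. The result is passed to `pat_report` only if that command is on the `PATH`.

```shell
#!/bin/sh
# Hypothetical helper: print the most recently modified experiment entry
# for the given executable name, looking in the current directory.
latest_experiment() {
    newest=""
    for d in "$1"+*; do
        [ -e "$d" ] || continue    # the pattern matched nothing
        if [ -z "$newest" ] || [ "$d" -nt "$newest" ]; then
            newest="$d"
        fi
    done
    printf '%s\n' "$newest"
}

exp_dir=$(latest_experiment raja-perf.exe)
if [ -n "$exp_dir" ] && command -v pat_report >/dev/null 2>&1; then
    # Regenerate the text report from the saved experiment data.
    pat_report "$exp_dir"
fi
```

If no matching entry exists, the helper prints an empty line and nothing is passed to `pat_report`.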