Running a sampling experiment with pat_run
- Author: Hewlett Packard Enterprise Development LP.
- Copyright: Copyright 2024-2025 Hewlett Packard Enterprise Development LP.
Setting up RajaPerf (optional)
For reference, at the time of writing, these steps set up raja-perf:
> mkdir build_mpi
> cd build_mpi
> cmake -DENABLE_MPI=On ..
... about 40 lines of cmake output ...
> make -j
[ 1%] Building CXX object blt/thirdparty_builtin/googletest-master-2020-01-07/googletest/CMakeFiles/gtest.dir/src/gtest-all.cc.o
[ 1%] Building CXX object blt/tests/smoke/CMakeFiles/blt_mpi_smoke.dir/blt_mpi_smoke.cpp.o
...
[100%] Built target test-raja-perf-suite.exe
Expect a number of compiler warnings during the build.
Collecting the samples
`pat_run` is simple to use: insert `pat_run` into your launch line. Here we add the `-r` option to generate a report:
> srun -p bardpeak -n 2 --exclusive pat_run -r ./bin/raja-perf.exe -pftol 0.05 -k Apps_HALOEXCHANGE_FUSED
WARNING: For optimal performance analysis load the 'perftools-preload' module before linking the executable file.
CrayPat/X: Version 24.11.0 Revision 31a512b4d sles15.5_x86_64 10/02/24 19:11:06
... output from running the application ...
Processing step 1 of 12
Suggested trace options file: /lus/cflus02/jvogt/raja/RAJAPerf/RAJAPerf-v2022.10.0/build_mpi/raja-perf.exe+1332189-23322133s/build-options.apa
Processing step 12 of 12
CrayPat/X: Version 24.11.0 Revision 31a512b4d sles15.5_x86_64 10/02/24 19:11:06
Number of PEs (MPI ranks): 2
Numbers of PEs per Node: 2
... The rest of the default report ...
Troubleshooting
The warning about perftools-preload can usually be ignored. Loading the module can improve the results in some cases, but the improvement is usually not enough to justify the rebuild, especially if your build isn’t set up to relink without recompiling.
More tips for dealing with programs that do not execute
The default report
The printed default report contains a set of tables that give a good basic view of what is going on in your program. Its contents are described on the getting started page: The default report
Going further
This walkthrough created the default report. `pat_run` supports more specialized data collection via the `-m mode-name` argument, which accepts these mode names:
lite-events — Event profile
lite-gpu — GPU kernel and data movement along with event profile
lite-hbm — High bandwidth memory data (for X86-64 systems only)
lite-loops — Loop estimates (CCE compiler only)
The modes are described in detail here: perftools-lite
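As a rough sketch of how a mode is selected, the shell fragment below assembles a launch line that passes one of the mode names listed above to `-m`. The helper name `build_pat_run_cmd` is hypothetical, its check only knows the four names from this page, and combining `-r` with `-m` is an assumption to confirm against the man page. The application arguments are reused from the earlier example.

```shell
#!/bin/sh
# Hypothetical helper: print a pat_run launch line for a given mode.
# It only knows the four mode names listed on this page.
build_pat_run_cmd() {
    mode="$1"
    shift
    case "$mode" in
        lite-events|lite-gpu|lite-hbm|lite-loops) ;;
        *) echo "unknown mode: $mode" >&2; return 1 ;;
    esac
    # Assumption: -r (print a report) can be combined with -m.
    echo "pat_run -r -m $mode $*"
}

# Application and arguments taken from the walkthrough above.
cmd=$(build_pat_run_cmd lite-events ./bin/raja-perf.exe -pftol 0.05 -k Apps_HALOEXCHANGE_FUSED)
echo "$cmd"
```

The fragment only prints the command. To run it, prefix the result with your own launcher, for example the `srun` options used earlier.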
`pat_run` saves the experiment result in a directory under the current directory. The directory name carries a unique suffix to distinguish different runs, in this example `raja-perf.exe+349876-26340827s`. You can use `pat_report` or Apprentice 3 to look at these results.
You can also run without the “-r” option to skip the printed report.
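If you skipped the report at collection time, something like the following could be used to pick up the newest experiment directory and hand it to `pat_report`. This is a minimal sketch: the helper `latest_experiment_dir` is hypothetical, and it assumes experiment directories are named after the executable followed by `+` and a suffix, as in the example above.

```shell
#!/bin/sh
# Hypothetical helper: print the most recently modified experiment
# directory for an executable. Assumes names of the form "<exe>+<suffix>".
latest_experiment_dir() {
    exe="$1"
    search_dir="${2:-.}"
    # ls -dt lists newest first; errors are silenced when nothing matches.
    ls -dt "$search_dir/$exe"+*/ 2>/dev/null | head -n 1
}

exp_dir=$(latest_experiment_dir raja-perf.exe)
if [ -n "$exp_dir" ] && command -v pat_report >/dev/null 2>&1; then
    # Generate the report from the saved experiment data.
    pat_report "$exp_dir"
else
    echo "no experiment directory found, or pat_report is not on PATH"
fi
```

Picking the directory by modification time is a convenience for interactive use. In a batch script it is safer to record the directory name that `pat_run` prints.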
Of course, much more information is available in the man page: pat_run