Sample default report
- Author: Hewlett Packard Enterprise Development LP.
- Copyright: Copyright 2024-2025 Hewlett Packard Enterprise Development LP.
The following report was generated with the perftools-lite approach; reports produced by other versions will look very similar:
#################################################################
#                                                               #
#              CrayPat-lite Performance Statistics              #
#                                                               #
#################################################################
CrayPat/X: Version 24.11.0 Revision 31a512b4d sles15.5_x86_64 10/02/24 19:11:06
Experiment: lite lite-samples
Number of PEs (MPI ranks): 2
Numbers of PEs per Node: 2
Numbers of Threads per PE: 1
Number of Cores per Socket: 64
Accelerator Model: AMD MI200 Series; Memory: 64.00 GB; Frequency: 1.70 GHz
Execution start time: Thu Nov 14 12:58:50 2024
System name and speed: pinoak0003 2.028 GHz (nominal)
AMD Trento CPU Family: 25 Model: 48 Stepping: 1
Core Performance Boost: All 2 PEs have CPB capability
Avg Process Time: 9.47 secs
High Memory: 475.8 MiBytes 237.9 MiBytes per PE
I/O Read Rate: -- MiBytes/sec
I/O Write Rate: 4.244889 MiBytes/sec
Notes for table 1:
This table shows functions that have significant exclusive sample
hits, averaged across ranks.
For further explanation, use: pat_report -v -O samp_profile ...
Table 1: Sample Profile by Function
Samp% | Samp | Imb. | Imb. | Group
| | Samp | Samp% | Function=[MAX10]
| | | | PE=HIDE
100.0% | 939.5 | -- | -- | Total
|-----------------------------------------------------------------------------
| 86.8% | 815.5 | -- | -- | USER
||----------------------------------------------------------------------------
|| 20.8% | 195.0 | 0.0 | 0.0% | RAJA::detail::intro_sort_depth<>
|| 18.8% | 176.5 | 9.5 | 10.2% | rajaperf::calcChecksum
|| 17.0% | 159.5 | 0.5 | 0.6% | std::__introsort_loop<>
|| 15.4% | 145.0 | 0.0 | 0.0% | rajaperf::basic::INDEXLIST_3LOOP::runSeqVariant
|| 6.1% | 57.5 | 0.5 | 1.7% | rajaperf::basic::DAXPY::runSeqVariant
|| 3.0% | 28.5 | 1.5 | 10.0% | rajaperf::allocAndInitDataRandValue
|| 2.6% | 24.5 | 0.5 | 4.0% | rajaperf::algorithm::SORT::runSeqVariant
|| 1.1% | 10.0 | 0.0 | 0.0% | rajaperf::apps::HALOEXCHANGE_FUSED::runSeqVariant
||============================================================================
| 11.1% | 104.5 | 9.5 | 16.7% | ETC: libm-2.31.so
||----------------------------------------------------------------------------
|| 11.1% | 104.5 | 9.5 | 16.7% | __sin_fma
||============================================================================
| 1.9% | 18.0 | 2.0 | 20.0% | MATH
||----------------------------------------------------------------------------
|| 1.9% | 18.0 | 2.0 | 20.0% | rand
|=============================================================================
Notes for table 2:
This table shows functions, and line numbers within functions, that
have significant exclusive sample hits, averaged across ranks.
For further explanation, use: pat_report -v -O samp_profile+src ...
Table 2: Sample Profile by Group, Function, and Line
Samp% | Samp | Imb. | Imb. | Group
| | Samp | Samp% | Function=[MAX10]
| | | | Source
| | | | Line
| | | | PE=HIDE
100.0% | 939.5 | -- | -- | Total
|-----------------------------------------------------------------------------
| 86.8% | 815.5 | -- | -- | USER
||----------------------------------------------------------------------------
|| 20.8% | 195.0 | -- | -- | RAJA::detail::intro_sort_depth<>
|||---------------------------------------------------------------------------
3|| 9.2% | 86.5 | -- | -- | c++/13/bits/move.h
||||--------------------------------------------------------------------------
4||| 3.0% | 28.0 | 4.0 | 25.0% | line.197
4||| 5.4% | 50.5 | 5.5 | 19.6% | line.198
||||==========================================================================
3|| 7.2% | 68.0 | -- | -- | include/RAJA/util/sort.hpp
4|| 6.2% | 58.5 | 1.5 | 5.0% | line.87
3|| 4.3% | 40.5 | 3.5 | 15.9% | include/RAJA/util/Operators.hpp
4|| | | | | line.513
|||===========================================================================
|| 18.8% | 176.5 | -- | -- | rajaperf::calcChecksum
3| | | | | RAJAPerf-v2022.10.0/src/common/DataUtils.cpp
||||--------------------------------------------------------------------------
4||| 1.9% | 18.0 | 3.0 | 28.6% | line.364
4||| 3.2% | 30.5 | 5.5 | 30.6% | line.365
4||| 1.4% | 13.5 | 1.5 | 20.0% | line.366
4||| 5.2% | 49.0 | 1.0 | 4.0% | line.367
4||| 4.2% | 39.0 | 6.0 | 26.7% | line.368
4||| 1.5% | 14.0 | 1.0 | 13.3% | line.369
||||==========================================================================
|| 17.0% | 159.5 | -- | -- | std::__introsort_loop<>
|||---------------------------------------------------------------------------
3|| 11.0% | 103.0 | -- | -- | c++/13/bits/stl_algo.h
||||--------------------------------------------------------------------------
4||| 1.2% | 11.5 | 0.5 | 8.3% | line.1877
4||| 2.6% | 24.0 | 0.0 | 0.0% | line.1878
4||| 3.9% | 37.0 | 3.0 | 15.0% | line.1880
4||| 3.0% | 28.5 | 5.5 | 32.4% | line.1882
||||==========================================================================
3|| 5.3% | 49.5 | 0.5 | 2.0% | c++/13/bits/predefined_ops.h
4|| | | | | line.45
|||===========================================================================
|| 15.4% | 145.0 | -- | -- | rajaperf::basic::INDEXLIST_3LOOP::runSeqVariant
|||---------------------------------------------------------------------------
3|| 13.2% | 124.0 | -- | -- | RAJAPerf-v2022.10.0/src/basic/INDEXLIST_3LOOP-Seq.cpp
||||--------------------------------------------------------------------------
4||| 3.6% | 33.5 | 0.5 | 2.9% | line.58
4||| 3.7% | 35.0 | 1.0 | 5.6% | line.81
4||| 1.8% | 16.5 | 0.5 | 5.9% | line.134
4||| 1.2% | 11.5 | 2.5 | 35.7% | line.135
||||==========================================================================
3|| 1.3% | 12.0 | 1.0 | 15.4% | include/RAJA/util/Operators.hpp
4|| | | | | line.334
|||===========================================================================
|| 6.1% | 57.5 | -- | -- | rajaperf::basic::DAXPY::runSeqVariant
3| 6.0% | 56.5 | -- | -- | RAJAPerf-v2022.10.0/src/basic/DAXPY-Seq.cpp
||||--------------------------------------------------------------------------
4||| 4.0% | 37.5 | 1.5 | 7.7% | line.30
4||| 2.0% | 19.0 | 0.0 | 0.0% | line.41
||||==========================================================================
|| 3.0% | 28.5 | 1.5 | 10.0% | rajaperf::allocAndInitDataRandValue
3| | | | | RAJAPerf-v2022.10.0/src/common/DataUtils.cpp
4| | | | | line.285
|| 2.6% | 24.5 | -- | -- | rajaperf::algorithm::SORT::runSeqVariant
3| 2.1% | 20.0 | -- | -- | c++/13/bits/stl_algo.h
|| 1.1% | 10.0 | -- | -- | rajaperf::apps::HALOEXCHANGE_FUSED::runSeqVariant
3| | | | | RAJAPerf-v2022.10.0/src/apps/HALOEXCHANGE_FUSED-Seq.cpp
||============================================================================
| 11.1% | 104.5 | 9.5 | 16.7% | ETC: libm-2.31.so
||----------------------------------------------------------------------------
|| 11.1% | 104.5 | 9.5 | 16.7% | __sin_fma
||============================================================================
| 1.9% | 18.0 | 2.0 | 20.0% | MATH
||----------------------------------------------------------------------------
|| 1.9% | 18.0 | 2.0 | 20.0% | rand
|=============================================================================
Observation: MPI utilization
No suggestions were made because all ranks are on one node.
Samp% Function
PE=HIDE
20.8% RAJA::detail::intro_sort_depth<>
18.8% rajaperf::calcChecksum
17.0% std::__introsort_loop<>
15.4% rajaperf::basic::INDEXLIST_3LOOP::runSeqVariant
11.1% __sin_fma
6.1% rajaperf::basic::DAXPY::runSeqVariant
3.0% rajaperf::allocAndInitDataRandValue
2.6% rajaperf::algorithm::SORT::runSeqVariant
1.9% rand
1.1% rajaperf::apps::HALOEXCHANGE_FUSED::runSeqVariant
Notes for table 3:
This table shows memory traffic for numa nodes, taking for each numa
node the maximum value across nodes. It also shows the balance in
memory traffic by showing the top 3 and bottom 3 node values.
For further explanation, use: pat_report -v -O mem_bw ...
Table 3: Memory Bandwidth by Numanode
    Thread | Numanode
      Time |  PE=HIDE
|---------------------
| 9.445552 | numanode.0
|=====================
Notes for table 4:
This table shows energy and power usage for the nodes with the
maximum, mean, and minimum usage, as well as the sum of usage over
all nodes.
Energy and power for accelerators is also shown, if available.
For further explanation, use: pat_report -v -O program_energy ...
Table 4: Program Energy and Power Usage from Cray PM
PE=HIDE
=========================================================
Total
---------------------------------------------------------
PM Energy Node      574 W     5,434 J
PM Energy Cpu        43 W       408 J
PM Energy Memory     69 W       653 J
PM Energy Acc0       91 W       861 J
PM Energy Acc1       89 W       843 J
PM Energy Acc2       90 W       852 J
PM Energy Acc3       89 W       841 J
Process Time             9.474561 secs
=========================================================
Notes for table 5:
This table show the average time and number of bytes written to each
output file, taking the average over the number of ranks that
wrote to the file. It also shows the number of write operations,
and average rates.
For further explanation, use: pat_report -v -O write_stats ...
Table 5: File Output Stats by Filename
       Avg |      Avg |  Write Rate | Number |    Avg | Bytes/ | File Name=!x/^/(proc|sys)/
     Write |    Write | MiBytes/sec |     of | Writes |   Call |  PE=HIDE
  Time per |  MiBytes |             | Writer |    per |        |
    Writer |      per |             |  Ranks | Writer |        |
      Rank |   Writer |             |        |   Rank |        |
           |     Rank |             |        |        |        |
|-----------------------------------------------------------------------------
| 0.000196 | 0.000207 |    1.055628 |      1 |    4.0 |  54.25 | ./RAJAPerf-timing-Average.csv
| 0.000185 | 0.001038 |    5.612246 |      1 |   15.0 |  72.53 | ./RAJAPerf-checksum.txt
| 0.000165 | 0.000233 |    1.413520 |      1 |    4.0 |  61.00 | ./RAJAPerf-speedup-Average.csv
| 0.000164 | 0.000961 |    5.876299 |      1 |    9.0 | 112.00 | ./RAJAPerf-fom.csv
| 0.000143 | 0.000251 |    1.749410 |      1 |    3.0 |  87.67 | ./RAJAPerf-kernels.csv
| 0.000029 | 0.001053 |   36.211744 |      1 |  248.0 |   4.45 | stdout
|=============================================================================
Table 6: Lustre File Information
File Path                      |    Stripe | Stripe | Stripe | OST list
                               |      size | offset |  count |
------------------------------------------------------------------------
./RAJAPerf-timing-Average.csv  | 1,048,576 |      0 |      1 | 1
./RAJAPerf-speedup-Average.csv | 1,048,576 |      0 |      1 | 0
./RAJAPerf-checksum.txt        | 1,048,576 |      0 |      1 | 1
./RAJAPerf-fom.csv             | 1,048,576 |      0 |      1 | 0
./RAJAPerf-kernels.csv         | 1,048,576 |      0 |      1 | 1
========================================================================
Program invocation:
/lus/cflus02/jvogt/raja/RAJAPerf/RAJAPerf-v2022.10.0/build_mpi_cce/./bin/raja-perf.exe -pftol 0.05 -k Apps_HALOEXCHANGE_FUSED
For a complete report with expanded tables and notes, run:
pat_report /lus/cflus02/jvogt/raja/RAJAPerf/RAJAPerf-v2022.10.0/build_mpi_cce/raja-perf.exe+592077-23709654s
For help identifying callers of particular functions:
pat_report -O callers+src /lus/cflus02/jvogt/raja/RAJAPerf/RAJAPerf-v2022.10.0/build_mpi_cce/raja-perf.exe+592077-23709654s
To see the entire call tree:
pat_report -O calltree+src /lus/cflus02/jvogt/raja/RAJAPerf/RAJAPerf-v2022.10.0/build_mpi_cce/raja-perf.exe+592077-23709654s
For interactive, graphical performance analysis, run:
app2 /lus/cflus02/jvogt/raja/RAJAPerf/RAJAPerf-v2022.10.0/build_mpi_cce/raja-perf.exe+592077-23709654s
================ End of CrayPat-lite output ==========================
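The Imb. Samp and Imb. Samp% columns in Tables 1 and 2 can be hard to interpret from the report alone. The sketch below shows one way to reproduce them, assuming that Imb. Samp is the maximum per-PE sample count minus the average, and that Imb. Samp% normalizes that difference by the maximum and scales it by N/(N-1) for N PEs. This assumption is consistent with the rows shown above, but it is inferred from them rather than stated in the report. The function name `imbalance` is hypothetical, and the per-PE counts are back-calculated illustrations, since the report hides per-PE values (PE=HIDE).

```python
def imbalance(per_pe_samples):
    """Return (avg, imb, imb_pct) for a list of per-PE sample counts.

    Assumed definitions (inferred from the sample report, not confirmed):
      imb     = max - avg
      imb_pct = 100 * (imb / max) * N / (N - 1)
    """
    n = len(per_pe_samples)
    avg = sum(per_pe_samples) / n
    peak = max(per_pe_samples)
    imb = peak - avg
    # With a single PE or no samples, imbalance is not meaningful.
    if n < 2 or peak == 0:
        return avg, imb, 0.0
    imb_pct = 100.0 * (imb / peak) * (n / (n - 1))
    return avg, imb, imb_pct


# Hypothetical per-PE counts, chosen so that the averages and
# imbalances match the corresponding rows of Table 1 (2 PEs).
examples = {
    "rajaperf::calcChecksum": [186, 167],
    "__sin_fma": [114, 95],
    "rand": [20, 16],
}

for name, counts in examples.items():
    avg, imb, pct = imbalance(counts)
    print(f"{name}: Samp={avg:.1f} Imb.Samp={imb:.1f} Imb.Samp%={pct:.1f}%")
```

Under this assumption, a value of 0.0% means every PE recorded the same number of samples for that function, and larger values indicate that one PE spent noticeably more time there than the average.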