Sample default report

Author:

Hewlett Packard Enterprise Development LP.

Copyright:

Copyright 2024-2025 Hewlett Packard Enterprise Development LP.

This sample report was generated with perftools-lite (CrayPat 24.11.0, as shown in the report header). Reports produced by other versions are very similar:

#################################################################
#                                                               #
#            CrayPat-lite Performance Statistics                #
#                                                               #
#################################################################

CrayPat/X:  Version 24.11.0 Revision 31a512b4d sles15.5_x86_64  10/02/24 19:11:06
Experiment:                  lite  lite-samples
Number of PEs (MPI ranks):      2
Numbers of PEs per Node:        2
Numbers of Threads per PE:      1
Number of Cores per Socket:    64
Accelerator Model: AMD MI200 Series; Memory: 64.00 GB; Frequency: 1.70 GHz


Execution start time:  Thu Nov 14 12:58:50 2024
System name and speed:  pinoak0003  2.028 GHz (nominal)
AMD   Trento               CPU  Family: 25  Model: 48  Stepping:  1
Core Performance Boost:  All 2 PEs have CPB capability


Avg Process Time:     9.47 secs
High Memory:         475.8 MiBytes     237.9 MiBytes per PE
I/O Read Rate:          -- MiBytes/sec
I/O Write Rate:   4.244889 MiBytes/sec

Notes for table 1:

  This table shows functions that have significant exclusive sample
    hits, averaged across ranks.
  For further explanation, use:  pat_report -v -O samp_profile ...

Table 1:  Sample Profile by Function

  Samp% |  Samp | Imb. |  Imb. | Group
        |       | Samp | Samp% |  Function=[MAX10]
        |       |      |       |   PE=HIDE

 100.0% | 939.5 |   -- |    -- | Total
|-----------------------------------------------------------------------------
|  86.8% | 815.5 |   -- |    -- | USER
||----------------------------------------------------------------------------
||  20.8% | 195.0 |  0.0 |  0.0% | RAJA::detail::intro_sort_depth<>
||  18.8% | 176.5 |  9.5 | 10.2% | rajaperf::calcChecksum
||  17.0% | 159.5 |  0.5 |  0.6% | std::__introsort_loop<>
||  15.4% | 145.0 |  0.0 |  0.0% | rajaperf::basic::INDEXLIST_3LOOP::runSeqVariant
||   6.1% |  57.5 |  0.5 |  1.7% | rajaperf::basic::DAXPY::runSeqVariant
||   3.0% |  28.5 |  1.5 | 10.0% | rajaperf::allocAndInitDataRandValue
||   2.6% |  24.5 |  0.5 |  4.0% | rajaperf::algorithm::SORT::runSeqVariant
||   1.1% |  10.0 |  0.0 |  0.0% | rajaperf::apps::HALOEXCHANGE_FUSED::runSeqVariant
||============================================================================
|  11.1% | 104.5 |  9.5 | 16.7% | ETC: libm-2.31.so
||----------------------------------------------------------------------------
||  11.1% | 104.5 |  9.5 | 16.7% | __sin_fma
||============================================================================
|   1.9% |  18.0 |  2.0 | 20.0% | MATH
||----------------------------------------------------------------------------
||   1.9% |  18.0 |  2.0 | 20.0% | rand
|=============================================================================

Notes for table 2:

  This table shows functions, and line numbers within functions, that
    have significant exclusive sample hits, averaged across ranks.
  For further explanation, use:  pat_report -v -O samp_profile+src ...

Table 2:  Sample Profile by Group, Function, and Line

  Samp% |  Samp | Imb. |  Imb. | Group
        |       | Samp | Samp% |  Function=[MAX10]
        |       |      |       |   Source
        |       |      |       |    Line
        |       |      |       |     PE=HIDE

 100.0% | 939.5 |   -- |    -- | Total
|-----------------------------------------------------------------------------
|  86.8% | 815.5 |   -- |    -- | USER
||----------------------------------------------------------------------------
||  20.8% | 195.0 |   -- |    -- | RAJA::detail::intro_sort_depth<>
|||---------------------------------------------------------------------------
3||   9.2% |  86.5 |   -- |    -- | c++/13/bits/move.h
||||--------------------------------------------------------------------------
4|||   3.0% |  28.0 |  4.0 | 25.0% | line.197
4|||   5.4% |  50.5 |  5.5 | 19.6% | line.198
||||==========================================================================
3||   7.2% |  68.0 |   -- |    -- | include/RAJA/util/sort.hpp
4||   6.2% |  58.5 |  1.5 |  5.0% |  line.87
3||   4.3% |  40.5 |  3.5 | 15.9% | include/RAJA/util/Operators.hpp
4||        |       |      |       |  line.513
|||===========================================================================
||  18.8% | 176.5 |   -- |    -- | rajaperf::calcChecksum
3|        |       |      |       |  RAJAPerf-v2022.10.0/src/common/DataUtils.cpp
||||--------------------------------------------------------------------------
4|||   1.9% |  18.0 |  3.0 | 28.6% | line.364
4|||   3.2% |  30.5 |  5.5 | 30.6% | line.365
4|||   1.4% |  13.5 |  1.5 | 20.0% | line.366
4|||   5.2% |  49.0 |  1.0 |  4.0% | line.367
4|||   4.2% |  39.0 |  6.0 | 26.7% | line.368
4|||   1.5% |  14.0 |  1.0 | 13.3% | line.369
||||==========================================================================
||  17.0% | 159.5 |   -- |    -- | std::__introsort_loop<>
|||---------------------------------------------------------------------------
3||  11.0% | 103.0 |   -- |    -- | c++/13/bits/stl_algo.h
||||--------------------------------------------------------------------------
4|||   1.2% |  11.5 |  0.5 |  8.3% | line.1877
4|||   2.6% |  24.0 |  0.0 |  0.0% | line.1878
4|||   3.9% |  37.0 |  3.0 | 15.0% | line.1880
4|||   3.0% |  28.5 |  5.5 | 32.4% | line.1882
||||==========================================================================
3||   5.3% |  49.5 |  0.5 |  2.0% | c++/13/bits/predefined_ops.h
4||        |       |      |       |  line.45
|||===========================================================================
||  15.4% | 145.0 |   -- |    -- | rajaperf::basic::INDEXLIST_3LOOP::runSeqVariant
|||---------------------------------------------------------------------------
3||  13.2% | 124.0 |   -- |    -- | RAJAPerf-v2022.10.0/src/basic/INDEXLIST_3LOOP-Seq.cpp
||||--------------------------------------------------------------------------
4|||   3.6% |  33.5 |  0.5 |  2.9% | line.58
4|||   3.7% |  35.0 |  1.0 |  5.6% | line.81
4|||   1.8% |  16.5 |  0.5 |  5.9% | line.134
4|||   1.2% |  11.5 |  2.5 | 35.7% | line.135
||||==========================================================================
3||   1.3% |  12.0 |  1.0 | 15.4% | include/RAJA/util/Operators.hpp
4||        |       |      |       |  line.334
|||===========================================================================
||   6.1% |  57.5 |   -- |    -- | rajaperf::basic::DAXPY::runSeqVariant
3|   6.0% |  56.5 |   -- |    -- |  RAJAPerf-v2022.10.0/src/basic/DAXPY-Seq.cpp
||||--------------------------------------------------------------------------
4|||   4.0% |  37.5 |  1.5 |  7.7% | line.30
4|||   2.0% |  19.0 |  0.0 |  0.0% | line.41
||||==========================================================================
||   3.0% |  28.5 |  1.5 | 10.0% | rajaperf::allocAndInitDataRandValue
3|        |       |      |       |  RAJAPerf-v2022.10.0/src/common/DataUtils.cpp
4|        |       |      |       |   line.285
||   2.6% |  24.5 |   -- |    -- | rajaperf::algorithm::SORT::runSeqVariant
3|   2.1% |  20.0 |   -- |    -- |  c++/13/bits/stl_algo.h
||   1.1% |  10.0 |   -- |    -- | rajaperf::apps::HALOEXCHANGE_FUSED::runSeqVariant
3|        |       |      |       |  RAJAPerf-v2022.10.0/src/apps/HALOEXCHANGE_FUSED-Seq.cpp
||============================================================================
|  11.1% | 104.5 |  9.5 | 16.7% | ETC: libm-2.31.so
||----------------------------------------------------------------------------
||  11.1% | 104.5 |  9.5 | 16.7% | __sin_fma
||============================================================================
|   1.9% |  18.0 |  2.0 | 20.0% | MATH
||----------------------------------------------------------------------------
||   1.9% |  18.0 |  2.0 | 20.0% | rand
|=============================================================================

Observation:  MPI utilization

    No suggestions were made because all ranks are on one node.

     Samp%   Function
              PE=HIDE

      20.8%   RAJA::detail::intro_sort_depth<>
      18.8%   rajaperf::calcChecksum
      17.0%   std::__introsort_loop<>
      15.4%   rajaperf::basic::INDEXLIST_3LOOP::runSeqVariant
      11.1%   __sin_fma
       6.1%   rajaperf::basic::DAXPY::runSeqVariant
       3.0%   rajaperf::allocAndInitDataRandValue
       2.6%   rajaperf::algorithm::SORT::runSeqVariant
       1.9%   rand
       1.1%   rajaperf::apps::HALOEXCHANGE_FUSED::runSeqVariant


Notes for table 3:

  This table shows memory traffic for numa nodes, taking for each numa
    node the maximum value across nodes. It also shows the balance in
    memory traffic by showing the top 3 and bottom 3 node values.
  For further explanation, use:  pat_report -v -O mem_bw ...

Table 3:  Memory Bandwidth by Numanode

   Thread | Numanode
     Time |  PE=HIDE
|---------------------
| 9.445552 | numanode.0
|=====================

Notes for table 4:

  This table shows energy and power usage for the nodes with the
    maximum, mean, and minimum usage, as well as the sum of usage over
    all nodes.
    Energy and power for accelerators is also shown, if available.
  For further explanation, use:  pat_report -v -O program_energy ...

Table 4:  Program Energy and Power Usage from Cray PM

PE=HIDE


=========================================================
  Total
---------------------------------------------------------
  PM Energy Node   574 W    5,434 J
  PM Energy Cpu     43 W      408 J
  PM Energy Memory  69 W      653 J
  PM Energy Acc0    91 W      861 J
  PM Energy Acc1    89 W      843 J
  PM Energy Acc2    90 W      852 J
  PM Energy Acc3    89 W      841 J
  Process Time           9.474561 secs
=========================================================

Notes for table 5:

  This table shows the average time and number of bytes written to each
    output file, taking the average over the number of ranks that
    wrote to the file.  It also shows the number of write operations,
    and average rates.
  For further explanation, use:  pat_report -v -O write_stats ...

Table 5:  File Output Stats by Filename

      Avg |      Avg |  Write Rate | Number |    Avg | Bytes/ | File Name=!x/^/(proc|sys)/
    Write |    Write | MiBytes/sec |     of | Writes |   Call |  PE=HIDE
 Time per |  MiBytes |             | Writer |    per |        |
   Writer |      per |             |  Ranks | Writer |        |
     Rank |   Writer |             |        |   Rank |        |
          |     Rank |             |        |        |        |
|-----------------------------------------------------------------------------
| 0.000196 | 0.000207 |    1.055628 |      1 |    4.0 |  54.25 | ./RAJAPerf-timing-Average.csv
| 0.000185 | 0.001038 |    5.612246 |      1 |   15.0 |  72.53 | ./RAJAPerf-checksum.txt
| 0.000165 | 0.000233 |    1.413520 |      1 |    4.0 |  61.00 | ./RAJAPerf-speedup-Average.csv
| 0.000164 | 0.000961 |    5.876299 |      1 |    9.0 | 112.00 | ./RAJAPerf-fom.csv
| 0.000143 | 0.000251 |    1.749410 |      1 |    3.0 |  87.67 | ./RAJAPerf-kernels.csv
| 0.000029 | 0.001053 |   36.211744 |      1 |  248.0 |   4.45 | stdout
|=============================================================================

Table 6:  Lustre File Information

                      File Path |    Stripe | Stripe | Stripe | OST list
                                |      size | offset |  count |
------------------------------------------------------------------------
  ./RAJAPerf-timing-Average.csv | 1,048,576 |      0 |      1 | 1
 ./RAJAPerf-speedup-Average.csv | 1,048,576 |      0 |      1 | 0
        ./RAJAPerf-checksum.txt | 1,048,576 |      0 |      1 | 1
             ./RAJAPerf-fom.csv | 1,048,576 |      0 |      1 | 0
         ./RAJAPerf-kernels.csv | 1,048,576 |      0 |      1 | 1
========================================================================

Program invocation:
  /lus/cflus02/jvogt/raja/RAJAPerf/RAJAPerf-v2022.10.0/build_mpi_cce/./bin/raja-perf.exe -pftol 0.05 -k Apps_HALOEXCHANGE_FUSED

For a complete report with expanded tables and notes, run:
  pat_report /lus/cflus02/jvogt/raja/RAJAPerf/RAJAPerf-v2022.10.0/build_mpi_cce/raja-perf.exe+592077-23709654s

For help identifying callers of particular functions:
  pat_report -O callers+src /lus/cflus02/jvogt/raja/RAJAPerf/RAJAPerf-v2022.10.0/build_mpi_cce/raja-perf.exe+592077-23709654s
To see the entire call tree:
  pat_report -O calltree+src /lus/cflus02/jvogt/raja/RAJAPerf/RAJAPerf-v2022.10.0/build_mpi_cce/raja-perf.exe+592077-23709654s

For interactive, graphical performance analysis, run:
  app2 /lus/cflus02/jvogt/raja/RAJAPerf/RAJAPerf-v2022.10.0/build_mpi_cce/raja-perf.exe+592077-23709654s

================  End of CrayPat-lite output  ==========================
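
Reading the imbalance columns:

The Imb. Samp and Imb. Samp% columns in Table 1 can be cross-checked by hand. The following Python sketch is illustrative only and is not part of the tool. It assumes that Imb. Samp is the maximum per-PE sample count minus the average, and that Imb. Samp% is (max - avg) / max * N / (N - 1) * 100 for N PEs. The printed values in this report are consistent with that relationship, but run pat_report -v -O samp_profile for the authoritative definition. The names imbalance_percent and parse_profile_row are hypothetical helpers written for this example.

```python
def imbalance_percent(avg, max_value, n_pes):
    """Assumed relationship: (max - avg) / max * N / (N - 1) * 100."""
    if n_pes < 2 or max_value <= 0:
        return 0.0
    return (max_value - avg) / max_value * n_pes / (n_pes - 1) * 100.0


def parse_profile_row(line):
    """Parse one data row of Table 1 into a dict; '--' cells become None."""
    fields = [f.strip() for f in line.lstrip("|").split("|", 4)]
    if len(fields) != 5:
        raise ValueError(f"not a Table 1 data row: {line!r}")

    def number(text):
        return None if text == "--" else float(text.rstrip("%"))

    samp_pct, samp, imb_samp, imb_pct = (number(f) for f in fields[:4])
    return {
        "samp_pct": samp_pct,
        "samp": samp,
        "imb_samp": imb_samp,
        "imb_pct": imb_pct,
        "name": fields[4],
    }


# Number of PEs (MPI ranks) from the report header.
N_PES = 2

# Rows copied from Table 1 of the sample report above.
ROWS = [
    "||  18.8% | 176.5 |  9.5 | 10.2% | rajaperf::calcChecksum",
    "||   6.1% |  57.5 |  0.5 |  1.7% | rajaperf::basic::DAXPY::runSeqVariant",
    "||  11.1% | 104.5 |  9.5 | 16.7% | __sin_fma",
    "||   1.9% |  18.0 |  2.0 | 20.0% | rand",
]

for line in ROWS:
    row = parse_profile_row(line)
    # Assumption: the busiest PE recorded avg + Imb. Samp samples.
    peak = row["samp"] + row["imb_samp"]
    computed = imbalance_percent(row["samp"], peak, N_PES)
    print(f"{row['name']}: reported {row['imb_pct']:.1f}%, computed {computed:.1f}%")
```

Because the report prints sample counts to one decimal place, the recomputed percentage can only be expected to agree with the reported one to the printed precision.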