accpc
predefined accelerator performance counter groups
- Author:
Hewlett Packard Enterprise Development LP.
- Copyright:
Copyright 2019,2021-2023-2024 Hewlett Packard Enterprise Development LP.
- Manual section:
5
DESCRIPTION
CrayPat supports experiments that make use of the accelerator performance counters (ACCPCs). These counters are accessed through use of runtime environment variables and their usage is described in the intro_craypat(1) man page. Counters can be enabled by using either the counter group numbers or names listed in pat_help.
For more detailed information about the individual counters that make up the groups, run:
$ srun papi_native_avail -i cuda
$ srun papi_native_avail -i rocm_7001
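As a minimal sketch of selecting a counter group at run time, the helper below sets the counter variable for a single command only. It assumes that PAT_RT_ACCPC is the runtime environment variable described in intro_craypat(1); run_with_accpc is a hypothetical helper name, not part of CrayPat.

```shell
# Run a command with an accelerator counter group selected.
# Assumption: PAT_RT_ACCPC is the runtime variable documented in
# intro_craypat(1); confirm the name there before relying on it.
# run_with_accpc is a hypothetical helper name.
run_with_accpc() {
    accpc_group="$1"   # group number or name, as listed in pat_help
    shift
    # env limits the setting to this one command.
    env PAT_RT_ACCPC="$accpc_group" "$@"
}
```

A typical call would be run_with_accpc GROUP srun ./a.out+pat, where GROUP is a group number or name listed in pat_help and a.out+pat is a placeholder for an executable instrumented for tracing.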
NOTES
PAPI components suffixed with _#### represent a version of the component locked at a specific Cray-PAPI release version. For example, the rocm_7001 component represents the rocm component as of release 7.0.0.1.
When collecting ACCPCs for AMD GPUs, accelerator kernels are serialized, meaning that each kernel launch blocks until the kernel completes. This behavior is controlled via the AMD_SERIALIZE_KERNEL option, which may be set to 0 if kernel serialization is not desired. In addition, ACCPCs for AMD GPUs are collected only at kernel launch entry and return; that is, accelerator counter events are not collected for HIP API calls other than hipLaunchKernel and hipModuleLaunchKernel.
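The following sketch shows one way to turn kernel serialization off for a single run. It assumes that AMD_SERIALIZE_KERNEL is read from the environment of the launched program; run_unserialized is a hypothetical helper name.

```shell
# Launch a command with AMD kernel serialization disabled.
# Assumption: AMD_SERIALIZE_KERNEL is picked up from the environment
# of the launched program. run_unserialized is a hypothetical helper.
run_unserialized() {
    # 0 disables serialization, as described in the note above.
    env AMD_SERIALIZE_KERNEL=0 "$@"
}
```

For example, run_unserialized srun ./a.out+pat, where a.out+pat is a placeholder for the instrumented executable. Scoping the setting with env leaves the default serialized behavior in place for other runs in the same shell.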
The accelerator model and device driver version can be determined by running the following srun commands:
$ srun cat /proc/driver/nvidia/version | grep Module
$ srun cat /proc/driver/nvidia/gpus/0/information | grep Model
The accelerator model and device driver version can also be found in the pat_report output from an instrumented executable that ran on an accelerator node:
$ pat_report data_file.ap2 2>&1 | grep -e "Accelerator Model" -e "Accelerator Driver"
In this release, hardware performance counters (HWPC) and accelerator performance counters (ACCPC) cannot be enabled simultaneously.
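A job script can guard against this combination before launching. The sketch below assumes that HWPC collection is requested through PAT_RT_PERFCTR and ACCPC collection through PAT_RT_ACCPC, as described in intro_craypat(1); check_counter_env is a hypothetical helper name.

```shell
# Refuse to continue if both counter types are requested.
# Assumption: a non-empty PAT_RT_PERFCTR requests HWPC and a non-empty
# PAT_RT_ACCPC requests ACCPC; check intro_craypat(1) for the exact names.
check_counter_env() {
    if [ -n "${PAT_RT_PERFCTR:-}" ] && [ -n "${PAT_RT_ACCPC:-}" ]; then
        echo "error: HWPC and ACCPC cannot be enabled simultaneously" >&2
        return 1
    fi
    return 0
}
```

Calling check_counter_env || exit 1 near the top of a job script stops the job early, before an experiment is run with a configuration that this release does not support.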
Cray accelerator hardware counters are not compatible with the CUDA profiler.
Accelerated applications cannot be compiled with -h profile_generate; therefore, accelerator performance statistics and loop profile information cannot be collected from the same executable.
These counters should be used with programs instrumented for tracing experiments only. While samples will show up in CUDA libraries, sampling experiments provide no useful accelerator performance statistics.
Enabling accelerator hardware counters changes the behavior of the application. Because the accelerator executes asynchronously with the host, the host must synchronize with the accelerator at each event in order to record the counters accurately. This change in behavior can be seen in the accelerator table in pat_report. When ACCPC is not enabled, the time spent waiting for the accelerator is shown on the lines labeled ACC_SYNC_WAIT, which is the sync created by the compiler. When ACCPC is enabled, the performance tools sync with the accelerator at each event, so the waiting occurs within the event’s tracepoint rather than during the sync created by the compiler, and the time is shown as exclusive Host Time for the containing region.
Specific to Nvidia GPUs, collection of accelerator hardware counters is supported when CUDA Multi-Process Service (MPS) is enabled. However, collected counts may be higher than expected, and monitoring select counters may result in launch or device failures. To disable CUDA MPS, the proxy server must be manually disabled using the Nvidia nvidia-cuda-mps-control utility. See Nvidia’s CUDA MPS “Starting and Stopping MPS on LINUX” for additional details.
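As a rough sketch, MPS can be stopped by sending the quit command to the control utility. The stop_mps helper name is hypothetical, and the sketch assumes it is run on the node where the MPS control daemon is active and by the user who started it; consult the Nvidia documentation for site-specific details.

```shell
# Ask the MPS control daemon to shut down, if the utility is installed.
# stop_mps is a hypothetical helper name.
stop_mps() {
    if ! command -v nvidia-cuda-mps-control >/dev/null 2>&1; then
        echo "nvidia-cuda-mps-control not found in PATH" >&2
        return 1
    fi
    # "quit" is the control command that stops the MPS daemon.
    echo quit | nvidia-cuda-mps-control
}
```

On a multi-node job the helper would need to run on each node that hosts an MPS daemon, since the control utility acts only on the local daemon.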
When the accelerator hardware counters overflow, the counter value will stick at 18446744073709551616. This will generate incorrect results for the native and derived counters in pat_report.
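When post-processing counter values by hand, a reported value can be compared against the overflow value quoted above. The sketch below is illustrative only; is_overflowed and ACCPC_OVERFLOW_VALUE are hypothetical names, and the comparison is done on strings because the value is too large for shell integer arithmetic.

```shell
# Overflow value quoted in the note above.
ACCPC_OVERFLOW_VALUE=18446744073709551616

# Succeed if the given counter value equals the overflow value.
# A string comparison is used on purpose: the number does not fit
# in the integer range that shell arithmetic supports.
is_overflowed() {
    [ "$1" = "$ACCPC_OVERFLOW_VALUE" ]
}
```

For example, is_overflowed "$value" && echo "counter overflowed" flags a value that should be excluded before any derived metric is computed from it.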
OpenCL is not supported at this time.
SEE ALSO
app2(1), intro_craypat(1), pat_build(1), pat_help(1), pat_report(1), pat_run(1)
accpc(5), cray_pm(5), cray_rapl(5), hwpc(5), cray_cassini(5), uncore(5), papi_counters(5)