uncore

Uncore performance counters

Author:

Hewlett Packard Enterprise Development LP.

Copyright:

Copyright 2019,2021,2023-2024 Hewlett Packard Enterprise Development LP.

Manual section:

5

DESCRIPTION

Intel Xeon processors have shared resources on the socket that are not exclusive to any specific core. CrayPat supports experiments that use these “uncore” counters through model-specific registers (MSRs), which are chip-specific registers that are selected with the environment variable PAT_RT_PERFCTR and related runtime environment variables. Uncore counters are organized into logical units called “boxes”. The boxes and registers available vary widely depending on the generation and model of the processor.

In general, events on the following boxes are supported. Additional boxes or logical units may be supported on specific processors.

B-Box: coherency and memory ordering

C-Box: last level cache

HA-Box: memory coherencing protocols

M-Box: integrated memory controllers

P-Box: physical QPI interconnect

QPI-Box: packetizing link layer

R-Box: crossbar router box

R2PCIe-Box: interface between the Ring and IIO traffic to/from PCIe

R3QPI-Box: interface between the QPI and the Ring

S-Box: manages the interface between the two Rings

U-Box: system config controller

W-Box: power controller

IMC: memory controller

PCU: power control

To determine which Uncore events are available, use the job launcher (aprun or srun) to run papi_component_avail on a compute node. For example:

$ aprun papi_component_avail

If the user is permitted access, the following command lists the Uncore events:

$ aprun papi_native_avail -i _unc_

For complete lists of the Uncore events currently supported by a processor family, execute pat_help, select counters, select your processor type, and then select native.

Uncore event names must be of the form PMU::event_specification, where PMU specifies one of the component names that is listed in the output from papi_component_avail and event_specification is an event that resides on that PMU.
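As an illustration of the naming rule above, the following shell sketch splits a name at the first "::" into its PMU and event parts. The function name split_uncore_event and the event name in the usage line are hypothetical placeholders, not part of CrayPat or PAPI. Real PMU and event names come from the papi_component_avail and papi_native_avail output.

```shell
# Split an Uncore event name of the form PMU::event_specification.
# Prints the PMU and the event specification on separate lines.
# Returns 1 if the name has no "::" or if either part is empty.
split_uncore_event() {
    name=$1
    case $name in
        *::*) ;;
        *) echo "not of the form PMU::event_specification: $name" >&2
           return 1 ;;
    esac
    pmu=${name%%::*}      # text before the first "::"
    event=${name#*::}     # text after the first "::"
    if [ -z "$pmu" ] || [ -z "$event" ]; then
        echo "empty PMU or event in: $name" >&2
        return 1
    fi
    printf '%s\n%s\n' "$pmu" "$event"
}

# Hypothetical name, for illustration only:
split_uncore_event "example_pmu::EXAMPLE_EVENT"
```

Splitting at the first "::" only means that any further colons inside the event specification are left untouched.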

Another way to determine which logical units are supported on a compute node is to list the pseudo-files in the /sys/devices directory. For example:

$ aprun ls /sys/devices

Note that some logical units are not accessible even though their respective pseudo-file entry may exist. If the user does not have access to the Uncore counters, there will be a message to that effect.
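The directory listing above can be narrowed to the uncore entries. The sketch below assumes that uncore units appear as entries whose names begin with "uncore_", which is how the Linux perf uncore drivers typically expose them; the function name list_uncore_units is hypothetical. As noted above, an entry being present does not guarantee that the unit is accessible.

```shell
# List uncore logical units visible under a sysfs devices directory.
# Assumption: uncore units appear as entries named "uncore_*".
# The directory defaults to /sys/devices and can be overridden.
list_uncore_units() {
    dir=${1:-/sys/devices}
    for entry in "$dir"/uncore_*; do
        # An unmatched glob stays literal, so check that the entry exists.
        [ -e "$entry" ] || continue
        basename "$entry"
    done
}

# Run on a compute node (through the job launcher) to see that node's units.
list_uncore_units
```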

WARNINGS

When executing a program instrumented with pat_build and with the environment variable PAT_RT_PERFCTR set to select one or more Uncore events, every compute node on which the instrumented executable is scheduled to execute must allow access to the Uncore events; otherwise, program execution will fail. Access to Uncore events is determined by the /proc/sys/kernel/perf_event_paranoid file. If the content of this file is 0 or -1, access is permitted. If the value in this file is 1 or greater, access is not permitted, and users should contact the site administrator if they wish to have Uncore performance counter and event access enabled.
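The access rule above can be checked before launching a job. The following is a minimal sketch, not a CrayPat tool: the function name uncore_access_permitted is hypothetical, and it treats 0 or -1 as permitting access and any other value as denying it. To be meaningful, it must run on each compute node that will execute the instrumented program.

```shell
# Report whether the perf_event_paranoid setting permits Uncore access.
# Rule applied: a value of 0 or -1 permits access; anything else does not.
# Returns 0 if permitted, 1 if not permitted, 2 if the file is unreadable.
# The file path can be overridden (first argument), for example in tests.
uncore_access_permitted() {
    file=${1:-/proc/sys/kernel/perf_event_paranoid}
    [ -r "$file" ] || { echo "cannot read $file" >&2; return 2; }
    level=$(cat "$file")
    case $level in
        0|-1) return 0 ;;
        *)    return 1 ;;
    esac
}

if uncore_access_permitted; then
    echo "Uncore access permitted"
else
    echo "Uncore access not permitted or could not be determined"
fi
```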

Uncore counters are collected by one core per socket, specifically processor zero on each socket upon which the application is scheduled and executing. If the application is not scheduled to run on processor zero of a socket, no Uncore counters are collected to represent the application on that socket.

Do not mix compute nodes that contain different processor models. If the /proc/sys/kernel/perf_event_paranoid file has been set up to grant access to the Uncore performance counter events, all requested Uncore events must exist on all compute nodes scheduled to execute the instrumented executable, or else the execution will fail.

The application must be launched such that MPI ranks are bound to specific sockets. If using the PBS workload manager, launch the application using the aprun -cc cpu option/keyword combination. If using the SLURM workload manager, launch the application using the srun --exclusive --cpu_bind=rank options. For more information, see the aprun(1) or srun(1) man pages.
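The two launch forms can be sketched as a small helper that prints the command line for review rather than running it. The function name bound_launch_cmd is hypothetical, and ./a.out+pat is a placeholder for an instrumented executable. Real jobs usually need further launcher options, such as the number of ranks, which are passed through unchanged.

```shell
# Print a launch command that uses the binding options named above.
# First argument: launcher ("aprun" or "srun"); remaining arguments are
# appended unchanged (extra launcher options, executable, program arguments).
bound_launch_cmd() {
    launcher=$1
    shift
    case $launcher in
        aprun) echo "aprun -cc cpu $*" ;;
        srun)  echo "srun --exclusive --cpu_bind=rank $*" ;;
        *)     echo "unknown launcher: $launcher" >&2
               return 1 ;;
    esac
}

# ./a.out+pat is a placeholder name for an instrumented executable.
bound_launch_cmd srun ./a.out+pat
```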

CrayPat may allow more events to be specified for a logical unit (box) than that unit has physical counters. In that case, a failure will occur when CrayPat attempts to read those events.
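One way to guard against requesting too many events on one box is to count the requested events per PMU and compare each count with the number of physical counters documented for that box on the target processor. The sketch below assumes the events are given as a comma-separated list of PMU::event names. The function name count_events_per_pmu and the PMU and event names in the usage line are hypothetical. The counter limits themselves are not known to the script and must be taken from the processor documentation.

```shell
# Count how many requested events fall on each PMU.
# Input: a comma-separated list of PMU::event names (assumed format).
# Output: one "PMU count" line per PMU, sorted by PMU name.
# Entries without "::" are ignored.
count_events_per_pmu() {
    printf '%s\n' "$1" | tr ',' '\n' |
        awk -F'::' 'NF >= 2 { n[$1]++ } END { for (p in n) print p, n[p] }' |
        sort
}

# Hypothetical PMU and event names, for illustration only:
count_events_per_pmu "pmu_a::EV1,pmu_a::EV2,pmu_b::EV1"
```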

SEE ALSO

app2(1), intro_craypat(1), pat_build(1), pat_help(1), pat_report(1), pat_run(1)

accpc(5), cray_pm(5), cray_rapl(5), hwpc(5), cray_cassini(5), uncore(5), papi_counters(5)

Uncore Performance Monitoring Reference Manuals: https://software.intel.com/en-us/articles/intel-sdm#uncore

Intel 64 and IA-32 Architectures Software Developer’s Manual Section 18.7.2, “Performance Monitoring Facility in the Uncore” (Order Number: 325462-055US)