pat_api
CrayPat Application Program Interface (API)
- Author:
Hewlett Packard Enterprise Development LP.
- Copyright:
Copyright 2019,2021-2024 Hewlett Packard Enterprise Development LP.
- Manual section:
1
DESCRIPTION
CrayPat API calls are functions that can be inserted into source code to write special tracing records into the experiment data file at runtime.
When one of the perftools instrumentation modules is loaded, it defines a compiler macro called CRAYPAT. This macro can be used to make any of the following function calls or include statements conditional:
#if defined(CRAYPAT)
<function call>
#endif
To disable all CrayPat API calls while using perftools, set PAT_RT_TRACE_API to 0.
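One way to avoid repeating the #if defined(CRAYPAT) guard at every call site is to hide it behind a pair of macros. The following is a minimal sketch only: the names PROFILE_BEGIN, PROFILE_END, and sum_to are hypothetical and are not part of the CrayPat API. When CRAYPAT is not defined, the macros expand to nothing, so the same source builds with or without perftools.

```c
#if defined(CRAYPAT)
#include <pat_api.h>
/* Hypothetical wrapper macros (not part of the CrayPat API). */
#define PROFILE_BEGIN(id, label) ((void)PAT_region_begin((id), (label)))
#define PROFILE_END(id)          ((void)PAT_region_end((id)))
#else
/* Without perftools loaded, the wrappers compile to no-ops. */
#define PROFILE_BEGIN(id, label) ((void)0)
#define PROFILE_END(id)          ((void)0)
#endif

/* Sum the integers 0..n-1 inside an optionally instrumented region. */
long sum_to(int n)
{
    long total = 0;

    PROFILE_BEGIN(1, "sum_loop");
    for (int i = 0; i < n; i++)
        total += i;
    PROFILE_END(1);

    return total;
}
```

The region has a single entry at the top and a single exit at the bottom, so the return statement is kept outside it.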
The following sections describe the available API functions in C. Fortran functions are similar, except that, as with MPI, an extra argument must be added: typically istat or ierr. To expose the CrayPat API in Python code, import pat_api. Function signatures are identical to C. For example, in C, PAT_region_begin is called like this:
#ifdef CRAYPAT
PAT_region_begin ( 1, "loop" );
#endif
In Fortran, the call is:
#ifdef CRAYPAT
call PAT_region_begin ( 1, "loop", istat );
#endif
In Python, the call is:
if 'PAT_RT_EXPERIMENT' in os.environ:
    pat_api.PAT_region_begin(1, "loop")
For further examples of using CrayPat API calls in source code, see pat_help API. For details on language-specific support, see the Language Support subsection of this man page.
int PAT_region_begin (int id, const char *label); int PAT_region_end (int id)
Defines the boundaries of a region. A region must consist of a sequence of executable statements within a single function, and must have a single entry at the top and a single exit at the bottom. Regions must be either separate or nested: if two regions are not disjoint, then one must entirely contain the other. A region may contain function calls. (These restrictions are similar to the restrictions on an OpenMP structured block.)
For each region, a summary of activity including time and hardware performance counters (if selected) is produced. The argument id assigns a numerical value to the region and must be greater than zero. Each id must be unique across the entire program.
The argument label assigns a character string to the region, allowing for easier identification of the region in the report.
These functions return PAT_API_OK if the region request was valid and PAT_API_FAIL if the request was not valid.
Two runtime environment variables affect region processing: PAT_RT_REGION_CALLSTACK and PAT_RT_REGION_MAX. See the intro_craypat(1) man page for more information.
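The nesting and return-value rules above can be sketched as follows. This is an illustration under stated assumptions: the region ids, labels, and the function run_nested_regions are hypothetical, and the #else branch holds stand-in definitions (with placeholder values, not the values in pat_api.h) so that the sketch also builds when perftools is not loaded.

```c
#if defined(CRAYPAT)
#include <pat_api.h>
#else
/* Stand-ins for builds without perftools. The numeric values are
 * placeholders and are NOT taken from pat_api.h. */
#define PAT_API_OK   1
#define PAT_API_FAIL 0
static int PAT_region_begin(int id, const char *label)
{
    (void)label;
    return id > 0 ? PAT_API_OK : PAT_API_FAIL;
}
static int PAT_region_end(int id)
{
    return id > 0 ? PAT_API_OK : PAT_API_FAIL;
}
#endif

/* Hypothetical region ids: each is greater than zero and unique. */
enum { REGION_SOLVE = 1, REGION_INNER = 2 };

/* Runs an outer region that entirely contains an inner region.
 * Returns the number of region calls that were rejected. */
int run_nested_regions(double *result)
{
    int failures = 0;
    double acc = 0.0;

    if (PAT_region_begin(REGION_SOLVE, "solve") != PAT_API_OK)
        failures++;
    for (int step = 0; step < 4; step++) {
        if (PAT_region_begin(REGION_INNER, "solve_inner") != PAT_API_OK)
            failures++;
        for (int i = 1; i <= 100; i++)
            acc += 1.0 / i;
        if (PAT_region_end(REGION_INNER) != PAT_API_OK)
            failures++;
    }
    if (PAT_region_end(REGION_SOLVE) != PAT_API_OK)
        failures++;

    *result = acc;
    return failures;
}
```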
int PAT_region_push (const char *label); int PAT_region_pop (const char *label)
When enabled and executed, these functions define the beginning and end of a region, which is identified by the label.
The calls from an associated pair are not required to appear within the same function, and the same label may be used in more than one pair of calls.
If an execution of one region overlaps in time with an execution of another region or a traced function, then the time for one must entirely contain the time for the other.
For each region, a summary of activity including time and hardware performance counters (if selected) is produced.
These functions return PAT_API_OK if the region request was valid and PAT_API_FAIL if the request was not valid.
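Because the push and pop calls of a pair need not sit in the same function, they can be placed in separate helper routines. The sketch below assumes hypothetical helpers phase_start and phase_stop and hypothetical labels; the #else branch provides stand-ins with placeholder values so the code builds without perftools.

```c
#if defined(CRAYPAT)
#include <pat_api.h>
#else
/* Stand-ins for builds without perftools; values are placeholders. */
#define PAT_API_OK   1
#define PAT_API_FAIL 0
static int PAT_region_push(const char *label)
{
    return label ? PAT_API_OK : PAT_API_FAIL;
}
static int PAT_region_pop(const char *label)
{
    return label ? PAT_API_OK : PAT_API_FAIL;
}
#endif

/* Hypothetical helpers: the two halves of a pair live in different functions. */
int phase_start(const char *label) { return PAT_region_push(label); }
int phase_stop(const char *label)  { return PAT_region_pop(label); }

/* Returns 1 if every region request was accepted, 0 otherwise. */
int run_phases(void)
{
    int ok = 1;

    /* "compute" executes entirely within "timestep". */
    ok = ok && (phase_start("timestep") == PAT_API_OK);
    ok = ok && (phase_start("compute") == PAT_API_OK);
    ok = ok && (phase_stop("compute") == PAT_API_OK);
    ok = ok && (phase_stop("timestep") == PAT_API_OK);

    /* The same label may be reused by a later pair of calls. */
    ok = ok && (phase_start("compute") == PAT_API_OK);
    ok = ok && (phase_stop("compute") == PAT_API_OK);

    return ok;
}
```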
int PAT_record (int state)
If called from the main thread, PAT_record controls the state for all threads on the executing PE. Otherwise, it controls the state for the calling thread on the executing PE.
The PAT_record function sets the recording state to one of the following values and returns the previous state before the call was made.
Note: Calling PAT_record with PAT_STATE_ON or PAT_STATE_OFF in the middle of a traced function does not affect the resulting time for that function. These calls affect only subsequent traced functions and any other information those traced functions collect.
- PAT_STATE_ON
If called from the main thread, switches recording on for all threads on the executing PE. Otherwise, switches recording on for just the calling child thread.
- PAT_STATE_OFF
If called from the main thread, switches recording off for all threads on the executing PE. Otherwise, switches recording off for just the calling child thread.
- PAT_STATE_QUERY
If called from the main thread, returns the state of the main thread on the executing PE. Otherwise, returns the state of the calling child thread.
All other values have no effect on the state.
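A common use of PAT_record is to exclude a setup phase from the recorded data and then restore whatever state was active before. The following sketch assumes hypothetical functions build_table and build_table_unrecorded; the #else branch is a stand-in with placeholder state values (not the values in pat_api.h) so the code builds without perftools.

```c
#if defined(CRAYPAT)
#include <pat_api.h>
#else
/* Stand-ins for builds without perftools; values are placeholders. */
enum { PAT_STATE_OFF = 0, PAT_STATE_ON = 1, PAT_STATE_QUERY = 2 };
static int stub_state = PAT_STATE_ON;
static int PAT_record(int state)
{
    int previous = stub_state;
    if (state == PAT_STATE_ON || state == PAT_STATE_OFF)
        stub_state = state;
    return previous;   /* mirrors "returns the previous state" */
}
#endif

static long table[256];

/* Hypothetical setup work that should not appear in the report. */
static void build_table(void)
{
    for (int i = 0; i < 256; i++)
        table[i] = (long)i * i;
}

/* Switches recording off around build_table(), restores the earlier
 * state, and returns the state in effect afterwards. */
int build_table_unrecorded(void)
{
    int previous = PAT_record(PAT_STATE_OFF);

    build_table();

    PAT_record(previous);                /* restore, do not force ON */
    return PAT_record(PAT_STATE_QUERY);
}
```

Restoring the saved value rather than unconditionally passing PAT_STATE_ON keeps the helper safe to call from code that already runs with recording switched off.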
int PAT_flush_buffer (unsigned long *nbytes)
Writes all the recorded contents in the data buffer to the experiment data file for the calling PE and calling thread. The number of bytes written to the experiment data file is returned in the variable pointed to by nbytes. Returns PAT_API_OK if all buffered data was written to the data file successfully; otherwise, returns PAT_API_FAIL. After writing the contents, the data buffer is empty and begins to refill. See intro_craypat(1) to control the size of the write buffer.
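For example, a program might flush the buffer at a checkpoint so that data recorded so far reaches the experiment data file. This is a minimal sketch: flush_at_checkpoint is a hypothetical helper, and the #else branch is a stand-in (placeholder values, always reporting zero bytes) for builds without perftools.

```c
#include <stdio.h>

#if defined(CRAYPAT)
#include <pat_api.h>
#else
/* Stand-ins for builds without perftools; values are placeholders. */
#define PAT_API_OK   1
#define PAT_API_FAIL 0
static int PAT_flush_buffer(unsigned long *nbytes)
{
    *nbytes = 0;
    return PAT_API_OK;
}
#endif

/* Flushes the data buffer for the calling PE and thread.
 * Returns the number of bytes written, or -1 if the flush failed. */
long flush_at_checkpoint(void)
{
    unsigned long nbytes = 0;

    if (PAT_flush_buffer(&nbytes) != PAT_API_OK) {
        fprintf(stderr, "CrayPat buffer flush failed\n");
        return -1;
    }
    return (long)nbytes;
}
```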
int PAT_counters (int category, const char *names[], unsigned long values[], int *nevents)
PAT_counters returns the names and current count values of any counter events that have been set to count in the given hardware category. The counts are returned for the thread from which the function is called. The names of these events are returned in the names array of strings, the values are returned in the values array of integers, and the number of events is returned in the location pointed to by nevents. If both names and values are set to zero, the location pointed to by nevents is set to the number of events. The function returns PAT_API_OK if all the event names were returned successfully and PAT_API_FAIL if they were not.
The values for category are:
- PAT_CTRS_CPU
Performance counters that reside on the CPU.
- PAT_CTRS_UNCORE
Performance counters that reside on the uncore subsystem of the CPU.
- PAT_CTRS_NETWORK
Performance counters that reside on the network interconnect.
- PAT_CTRS_ACCEL
Performance counters that reside on any GPU accelerator.
- PAT_CTRS_RAPL
Counters that measure the Running Average Power Limit (raw counts) on a CPU socket.
- PAT_CTRS_PM
Counters that measure the Cray Power and Energy Management on a compute node.
To get just the number of events returned, set names or values to zero.
The event names to be returned are selected at runtime using the environment variable PAT_RT_PERFCTR. If no event names are specified, the value of nevents is zero.
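A typical calling pattern is to ask for the number of events first and then allocate arrays of that size. The sketch below makes several assumptions that this page does not confirm: that the returned name strings are owned by the runtime and must not be freed, and that nevents is used only as an output. The function print_cpu_counters is hypothetical, and the #else branch is a stand-in with placeholder values and made-up event names for builds without perftools.

```c
#include <stdio.h>
#include <stdlib.h>

#if defined(CRAYPAT)
#include <pat_api.h>
#else
/* Stand-ins for builds without perftools. Values and event names are
 * placeholders, not real CrayPat constants or counters. */
#define PAT_API_OK   1
#define PAT_API_FAIL 0
#define PAT_CTRS_CPU 0
static int PAT_counters(int category, const char *names[],
                        unsigned long values[], int *nevents)
{
    static const char *stub_names[] = { "STUB_EVENT_A", "STUB_EVENT_B" };
    (void)category;
    *nevents = 2;
    for (int i = 0; i < 2; i++) {
        if (names)  names[i]  = stub_names[i];
        if (values) values[i] = 100UL * (unsigned long)(i + 1);
    }
    return PAT_API_OK;
}
#endif

/* Prints the CPU counters for the calling thread.
 * Returns the number of events printed, or -1 on failure. */
int print_cpu_counters(void)
{
    int nevents = 0;

    /* First call: names and values both zero, so only the count is set. */
    if (PAT_counters(PAT_CTRS_CPU, NULL, NULL, &nevents) != PAT_API_OK)
        return -1;
    if (nevents == 0)
        return 0;   /* no events selected with PAT_RT_PERFCTR */

    const char **names = malloc((size_t)nevents * sizeof *names);
    unsigned long *values = malloc((size_t)nevents * sizeof *values);
    if (names == NULL || values == NULL) {
        free(names);
        free(values);
        return -1;
    }

    /* Second call: fetch the names and current values. */
    int rc = PAT_counters(PAT_CTRS_CPU, names, values, &nevents);
    if (rc == PAT_API_OK) {
        for (int i = 0; i < nevents; i++)
            printf("%s = %lu\n", names[i], values[i]);
    }

    /* Assumption: the strings belong to the runtime; free only the arrays. */
    free(names);
    free(values);
    return rc == PAT_API_OK ? nevents : -1;
}
```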
PAT_trace (callable)
For Python tracing, a decorator to register a callable for tracing. Similar to calling PAT_region_push and PAT_region_pop before and after the decorated callable.
OpenMP
In some circumstances, CrayPat can automatically measure the overhead incurred by entering and leaving parallel regions and work-sharing constructs within parallel regions, show per-thread timings and other data, and calculate the load balance across threads for such constructs. See the pat_build(1) man page OpenMP section for details.
In circumstances where CrayPat cannot collect OpenMP information, the user may insert API calls using the CrayPat OpenMP API.
The following C functions are used to instrument OpenMP constructs for compilers that do not support automatic instrumentation. Fortran subroutines with the same names are also available.
void PAT_omp_barrier_enter (void);
void PAT_omp_barrier_exit (void);
void PAT_omp_loop_enter (void);
void PAT_omp_loop_exit (void);
void PAT_omp_master_enter (void);
void PAT_omp_master_exit (void);
void PAT_omp_parallel_begin (void);
void PAT_omp_parallel_end (void);
void PAT_omp_parallel_enter (void);
void PAT_omp_parallel_exit (void);
void PAT_omp_section_begin (void);
void PAT_omp_section_end (void);
void PAT_omp_sections_enter (void);
void PAT_omp_sections_exit (void);
void PAT_omp_single_enter (void);
void PAT_omp_single_exit (void);
void PAT_omp_task_begin (void);
void PAT_omp_task_end (void);
void PAT_omp_task_enter (void);
void PAT_omp_task_exit (void);
void PAT_omp_workshare_enter (void);
void PAT_omp_workshare_exit (void);
Note that the CrayPat OpenMP API does not support combined parallel work-sharing constructs. To instrument such a construct, it must be split into a parallel construct containing a work-sharing construct.
Use of the CrayPat OpenMP API functions must satisfy the following requirements.
If one member of an _enter/_exit or _begin/_end pair is called, the other must also be called.
Calls to _enter or _begin functions must immediately precede the relevant construct. Calls to _end or _exit functions must immediately follow the relevant construct.
For a given parallel region, all or none of the four functions with prefix PAT_omp_parallel must be called.
For a given “sections” construct, all or none of the four functions with prefix PAT_omp_section must be called.
A “single” construct should be treated as if it were a “sections” construct consisting of one section.
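The requirements above can be illustrated with a combined "parallel for" that has been split into a parallel construct containing a work-sharing loop. This is a sketch under assumptions: the placement of the _enter/_exit calls outside the parallel construct and the _begin/_end calls as the first and last statements inside it is one reading of the requirements and should be checked against the examples in pat_help API. The function instrumented_sum is hypothetical, and the #else branch supplies no-op stand-ins for builds without perftools.

```c
#if defined(CRAYPAT)
#include <pat_api.h>
#else
/* No-op stand-ins for builds without perftools. */
static void PAT_omp_parallel_enter(void) {}
static void PAT_omp_parallel_begin(void) {}
static void PAT_omp_parallel_end(void)   {}
static void PAT_omp_parallel_exit(void)  {}
static void PAT_omp_loop_enter(void)     {}
static void PAT_omp_loop_exit(void)      {}
#endif

/* Sums x[0..n-1] with an instrumented parallel region and loop.
 * The combined "parallel for" is split because the CrayPat OpenMP API
 * does not support combined parallel work-sharing constructs. */
double instrumented_sum(const double *x, int n)
{
    double total = 0.0;

    PAT_omp_parallel_enter();            /* assumed: before the construct */
    #pragma omp parallel
    {
        PAT_omp_parallel_begin();        /* assumed: first statement inside */

        PAT_omp_loop_enter();
        #pragma omp for reduction(+:total)
        for (int i = 0; i < n; i++)
            total += x[i];
        PAT_omp_loop_exit();

        PAT_omp_parallel_end();          /* assumed: last statement inside */
    }
    PAT_omp_parallel_exit();             /* assumed: after the construct */

    return total;
}
```

All four PAT_omp_parallel functions are called for the region, and each _enter or _begin call is matched by its _exit or _end counterpart.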
Language Support
Full CrayPat API support is provided for C and Fortran code. Partial CrayPat API support is provided for Python code.
For C, C++, and Fortran code, include files are located in $CRAYPAT_ROOT/include/pat_api* after the perftools-base module is loaded. If working in C, the header file pat_api.h must be included in the C source code. If working in Fortran 90, the header file pat_apif.h must be included in the Fortran source code. In all other versions of Fortran, the header file pat_apif77.h must be included in the Fortran source code.
Note: The Fortran header file pat_apif.h can be used only with compilers that accept Fortran 90 constructs such as new-style declarations and interface blocks. The other Fortran header file, pat_apif77.h, is for use with compilers that do not accept such constructs.
For Python code, import pat_api to expose the CrayPat API. To see which API functions are supported, load the perftools-base module and see $CRAYPAT_ROOT/libexec64/pat_api/__init__.py.
SEE ALSO
intro_craypat(1), pat_build(1), pat_help(1), pat_report(1), pat_run(1),
perftools-base(4), perftools-preload(4)
accpc(5), cray_pm(5), cray_rapl(5), hwpc(5), cray_cassini(5), uncore(5), papi_counters(5)