pat_build

instrument a program for performance analysis

Author:

Hewlett Packard Enterprise Development LP.

Copyright:

Copyright 2019-2024 Hewlett Packard Enterprise Development LP.

Manual section:

1

SYNOPSIS

pat_build [-d directive-file] [-D directive-name] [-f] [-g trace-group] [-n] [-O ofile] [-o instr-program] [-r] [-s] [-S] [-t trace-file] [-T trace-func] [-u] [-V] [-v] [-w] [-z] program [instr-program]

DESCRIPTION

The pat_build utility executes the instrumenting portion of the CrayPat performance analysis tool as a stand-alone program.

CrayPat supports two categories of performance analysis experiments: tracing experiments, which count some event such as the number of times a specific system call is executed, and asynchronous (sampling) experiments, which capture values at specified time intervals or when a specified counter overflows. If neither a tracing experiment nor sampling is specified when instrumenting program, pat_build defaults to instrumenting program for Automatic Profiling Analysis (-O apa), as described elsewhere in this man page.

Typically, after using pat_build to instrument a program, users set environment variables to control runtime data collection and run the instrumented executable, and then use either pat_report or Cray Apprentice2 to view the resulting report.

OPTIONS

pat_build supports the following options and arguments:

-d directive-file

Process all the directives contained in the file directive-file. See the BUILD DIRECTIVES section for more information.

-D directive-name

Process the individual directive directive-name. See the BUILD DIRECTIVES section for more information.
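The following is a minimal sketch of both options. It assumes that a directive file holds one directive-name=directive-value entry per line and that -D accepts a directive in the same name=value form; neither detail is stated above. The file name my.directives and the programs a.out and inst.out are placeholders.

```shell
# Write a small directive file (hypothetical name: my.directives).
# Assumption: one directive-name=directive-value entry per line.
cat > my.directives <<'EOF'
trace-max=200
report=y
EOF

# Apply the file with -d and add one more directive with -D.
# Guarded so the sketch is harmless where CrayPat is not installed.
if command -v pat_build >/dev/null 2>&1; then
    pat_build -d my.directives -D link-map=y a.out inst.out
fi
```

Keeping directives in a file makes an instrumentation setup easy to repeat; -D is convenient for one-off changes.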

-f

If the output file instr-program already exists, overwrite it.

-g trace-group

Instrument program to trace all function references belonging to the trace function group trace-group. Only those functions actually executed by program at runtime are traced. trace-group is case-insensitive and can be one or more of the values listed below.

If the trace-group name is preceded by an exclamation point (!) character, the functions within the specified trace-group are not traced.

Not all trace function groups are available on all Cray platforms. If a trace-group is not available on your system, the request is silently ignored. If a trace-group contains functions that are not supported on your system, those functions are silently ignored and are not traced.

A trace group supports only a single version of the library associated with the group of functions. To determine which version of the library is supported by the trace group, execute the following shell command:

strings instr-program | grep CRAYPAT_LIBRARY

If the version of the library linked into the instr-program does not match the version supported by the trace group associated with the library, some functions in the library may not be traced, or the instrumented program may not execute as expected.

The valid trace-group values are:

adios2

Adaptable Input Output System Version 2

aio

Functions that perform asynchronous I/O.

blacs

Basic Linear Algebra communication subprograms

blas

Basic Linear Algebra subprograms

caf

Co-Array Fortran (CCE compiler only)

comex

Communications Runtime for Extreme Scale

cuda

NVIDIA Compute Unified Device Architecture runtime and driver API

cuda_math

NVIDIA Compute Unified Device Architecture math library API

curl

Multi-protocol file transfer API

dl

functions that manage dynamic linking

dmapp

Distributed Memory Application API

dsmml

Distributed Shared Symmetric Memory Management API

fabric

Open network communication services API

ffio

functions that perform Flexible File I/O (CCE compiler only)

fftw

Fast Fourier Transform library (32- and 64-bit only)

ga

Global Arrays API

gmp

GNU MultiPrecision Arithmetic Library

hdf5

Hierarchical Data Format library

heap

dynamic heap

hip

AMD Heterogeneous-compute Interface for Portability runtime API

hip_math

AMD Heterogeneous-compute Interface for Portability math library API

hsa

AMD Heterogeneous System Architecture API

huge

Linux huge pages

io

functions and system calls that perform I/O

lapack

Linear Algebra Package

lustre

Lustre User API

math

POSIX.1 math functions

memory

memory management operations

mpfr

GNU MultiPrecision Floating-Point Library

mpi

Message Passing Interface library

nccl

NVIDIA NCCL communication library

netcdf

Network Common Data Form

numa

Non-uniform Memory Access API (see numa(3))

oacc

OpenACC (Open Accelerators) API

omp

OpenMP API

opencl

Open Computing Language API

pblas

Parallel Basic Linear Algebra Subroutines

petsc

Portable Extensible Toolkit for Scientific Computation. Supported for “real” computations only.

pgas

Parallel Global Address Space

pnetcdf

Parallel Network Common Data Form

pthreads

POSIX threads

pthreads_mutex

POSIX threads concurrent process control

pthreads_spin

POSIX threads low-level synchronization control

rccl

AMD RCCL communication library

realtime

POSIX realtime extensions

rocm_math

AMD Radeon Open Compute platform math library API

scalapack

Scalable LAPACK

shmem

One-sided Remote Direct Memory Access Parallel-Processing Interface library

signal

POSIX signal handling and control

spawn

POSIX realtime process creation

stdio

all library functions that accept or return the FILE* construct

string

String operations

syscall

system calls

sysfs

system calls that perform miscellaneous file management

sysio

system calls that perform I/O

umpire

Heterogeneous Memory Resources Management Library

upc

Unified Parallel C (CCE compiler only)

xpmem

cross-process memory mapping

zmq

High-performance asynchronous messaging API

-n

Reproduce the original program as instr-program rather than creating a new program instrumented for performance analysis. This can be used as a validation test.

-O ofile

Specifies an ASCII file (ofile) that contains pat_build command line options. The options in the file are interpreted as if they appear in the same relative position as the -O ofile argument appears in the original pat_build command line.

Additionally, pat_build command line options can be specified using the environment variable PAT_BUILD_OPTIONS.

Note: Use the special argument -O apa to instrument a program for automatic profiling analysis. Executing the resulting instrumented executable will generate a .apa file, which contains recommended parameters for re-instrumenting program and refining the performance analysis. For more information about using automatic profiling analysis, see the EXAMPLES section of this man page.
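As a hedged illustration of -O, the sketch below stores options in a file and then requests automatic profiling analysis. It assumes the options file holds options exactly as they would be typed on the command line; the file name pat.opts and the programs a.out and inst.out are placeholders.

```shell
# Collect frequently used options in a file (hypothetical name: pat.opts).
# Assumption: options are written as they would appear on the command line.
printf '%s\n' '-g mpi' '-w' > pat.opts

if command -v pat_build >/dev/null 2>&1; then
    # Options from pat.opts are read at the position of -O on the command line.
    pat_build -O pat.opts a.out inst.out

    # Automatic profiling analysis: with no output name given,
    # the instrumented executable is written to a.out+pat.
    pat_build -O apa a.out
fi
```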

-o instr-program

The resulting instrumented executable. If the -o option is not specified, and the instrumented executable name is not otherwise specified as the final argument, the resulting instrumented executable is written to the file program+pat.

-r

Report on the status of tracing-related issues.

-s

Terminates pat_build immediately after acquiring all the link information from program. This can be used with the -v option to validate certain input parameters without performing any instrumentation.

-S

Instruments program asynchronously. Note that the use of -S with any of the tracing options (-t, -T, -u, -w, or -g) will result in an error as the options require that program be instrumented synchronously.

-t trace-file

Instrument program to trace all user-defined function references listed in the trace-file.

Note that the ! and / characters can be used in the trace-file with the same effect described for -T because each line in the file is interpreted as if it had been specified with the -T option. Because the file is not processed by a shell, there is no need to quote or escape special characters.
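A trace file might look like the sketch below. The function names compute_forces, tiny_helper, and the solver_ prefix are hypothetical, as is the file name trace.list. The -w option is included on the assumption that all listed functions are user-defined.

```shell
# One entry per line, each interpreted as if given to -T.
# The quoted heredoc keeps the shell from touching ! and / characters.
cat > trace.list <<'EOF'
compute_forces
!tiny_helper
/^solver_
EOF

if command -v pat_build >/dev/null 2>&1; then
    # -w creates trace intercept routines for user-defined functions.
    pat_build -w -t trace.list a.out inst.out
fi
```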

-T trace-func

Instrument program to trace the user-defined function references to trace-func.

Use the nm(1) or readelf(1) command to determine function names to specify for tracing. The name of the function is the name used when the program is linked. For Fortran 90 and C++ programs, this is the mangled form of the name.

If trace-func is preceded by an exclamation point (!) character, references to trace-func are not traced.

If trace-func is preceded by a slash (/) character, the string is interpreted as a basic regular expression. If more than one regular expression is specified, the union of all regular expressions is taken. All functions that match at least one of the regular expressions are added to the list of functions to trace. The match is case sensitive. For more information about UNIX regular expressions, see the regexec(3) man page.

The functions identified as a result of regular expressions are those defined in source code owned by the user. To apply the regular expressions to all functions in program (e.g., including those in system header files), set the directive trace-user-only to 0. For more information about the directive trace-user-only, see the BUILD DIRECTIVES section of this man page.

One or more regular expression qualifiers can precede the slash (/) character. These are:

!

Reverse the results of the match.

i

Ignore case when matching.

x

Use extended regular expressions.

If the list of functions to be traced using regular expressions includes any user-defined functions, the -w option must also be specified to generate trace intercept routines.
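Before passing a regular expression to -T, it can help to preview what it would match. The sketch below uses grep on a list of hypothetical symbol names (standing in for nm output) to approximate the selection. It assumes that the i and x qualifiers can be combined as ix/ and that grep's matching is close enough to pat_build's for a preview; neither is guaranteed by this page.

```shell
# Hypothetical symbol names, standing in for the output of nm on a.out.
symbols='solve_linear
Solve_dense
update_grid
main'

# Basic, case-sensitive expression: roughly what -T '/^solve_' would select.
printf '%s\n' "$symbols" | grep '^solve_'

# Extended, case-insensitive expression: roughly what -T 'ix/^(solve|update)_' would select.
count=$(printf '%s\n' "$symbols" | grep -c -E -i '^(solve|update)_')
echo "extended, case-insensitive pattern matches $count names"

if command -v pat_build >/dev/null 2>&1; then
    # Single quotes keep the shell from interpreting !, (, |, and ).
    pat_build -w -T 'ix/^(solve|update)_' -T '!main' a.out inst.out
fi
```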

-u

Create new trace intercept routines for those functions that are defined in the respective source file owned by the user. Use the trace-text-size directive to further refine selecting functions to trace. To trace the user-defined function, use the -T option with the function name. To prevent a user-defined function trace-func from being traced, use the -T option preceding the function name with an exclamation point (!) character as shown in this example:

-T !trace-func

-V

Write the CrayPat version number to standard error.

-v

Verbose. Write progress messages related to the instrumentation process to standard error. Each additional -v option increases the level of detail.

-w

Make tracing the default experiment and create new trace intercept routines for user-defined functions. If -t, -T, or the directive trace are not specified, only those functions necessary to support the CrayPat runtime library are traced.

-z

Do not process the contents of the startup directives file in $CRAYPAT_ROOT/share/config/BuildDirectives.

BUILD DIRECTIVES

Directives further affect how pat_build evaluates and produces instrumented executables. The format of each of the following directives is directive-name=directive-value.

addsym-archive=y | n

If set to y, archive files writable by the user are eligible to have their functions traced when the -u option is specified. This is the default behavior. If a member object filename appears multiple times in the archive, a global function whose references and definition are in the same object file will not have those references traced.

addsym-cmd=file

Specifies the executable file of the addsym utility. The default is $CRAYPAT_ROOT/sbin/addsym. If set to 0, the addsym utility is disabled and a global function whose references and definition are in the same object file will not have those references traced.

addsym-weak=y | n

If set to y, functions defined with WEAK binding in files writable by the user are eligible to be traced when the -u option is specified. This is the default behavior.

debug-pubnames=y | n, debug-names=y | n

Indicates whether the .debug_pubnames section is used to identify all global functions for generating trace intercept routines. If set to n, or if no .debug_pubnames section exists, all global functions are collected by processing each .debug_info section individually. The default is y.

force-instr=y | n

By default, the pat_build utility does not permit a program to be instrumented if it already has been instrumented by another method. If this directive is set to y, the pat_build utility ignores the check for prior instrumentation and attempts to force instrumentation of program. The other methods of instrumenting a program include:

  • the PERFCTR, PFM, or PAPI libraries

  • the IOBUF or FPMPI libraries

  • GNU profiling or GNU coverage analysis

  • MPI profiling functions

  • previous use of the pat_build utility

Caution:

Using this directive to force instrumentation of a previously instrumented executable may result in an executable that produces incorrect results, exhibits unexpected behavior, or generates invalid CrayPat performance analysis data.

group=trace-group[, trace-group…]

Specifies one or more trace groups in the original program to trace. If trace-group is preceded by an exclamation point (!) character, none of the functions in the group are traced.

invalid=entry-point[, entry-point…]

Specifies one or more functions in the original program that inhibit any instrumentation.

link-fatal=operand[, operand…]

Specifies one or more operands that, if present in the original link, will prevent the instrumented link from occurring.

link-ignore=operand[, operand…]

Specifies one or more operands that, if present in the original link, will not be passed down to the instrumented link.

link-ignore-libs=lib[, lib…]

Specifies one or more object or archive files that, if present in the original link, will not be passed down to the instrumented link.

link-instr=operand[, operand…]

Specifies one or more operands to include in the instrumented link.

link-map=y | n

Generates a link map about the link that created the instrumented executable. The link map is written to file program+map.

link-minus-u=entry-point[, entry-point…]

Adds the ld -u option for each entry-point to the relink command, forcing the entry-point to be loaded into the instrumented executable.

link-objs=ofile[, ofile…]

Specifies one or more object files to include in the instrumented link.

link-rpath=y | n

Add the ld -rpath=$CRAYPAT_ROOT/lib64 option to the relink command. The default is y.

link-symbol=entry-point[, entry-point…]

Adds the ld -y option for each entry-point to the relink command, showing where the entry-point is being referenced and from where it is resolved.

report=y | n

Generates a text report upon completion of the instrumented executable’s successful execution. The report is written to stdout. The default is n.

rtenv=name=value[;name=value;…]

Embeds the runtime environment variable name in the instrumented executable and sets it to value. If a runtime environment variable is set both by this directive and in the execution environment, the value set in the execution environment takes precedence and the embedded value is ignored.

For more information about runtime environment variables, see the intro_craypat(1) man page.

trace=trace-func[, trace-func…]

Specifies one or more functions in the original program to trace. If trace-func is preceded by an exclamation point (!) character, trace-func is not traced.

trace-args=y | n

Collect and record at runtime the values of formal parameters for generated trace intercept routines. The default is n.

trace-complex=y | n

If set to y, generate a trace intercept routine for functions that return a complex value. The default is n.

trace-debug=strng[,strng2,…]

Add verbose print statements to generated trace intercept routines. The string strng identifies all or part of the function name. The print statements are activated at runtime when the environment variable PAT_RT_MSG_VERBOSE is set to one or more PE numbers. This may be helpful if a traced function is suspected of causing a runtime error.

trace-file=strng[,strng2,…]

Activate or deactivate tracing of functions in a file. The string strng identifies all or part of the file name to activate or deactivate. If strng is preceded by an exclamation point (!) character, functions in the matched file(s) are not traced.

trace-force-deref=y | n

If set to y, when a trace intercept routine is generated for a Fortran subprogram, define each formal parameter to the subprogram as being dereferenced. The default is determined by examining the DWARF Debugging Information Entry (DIE) associated with the formal parameter.

trace-fortran-char-string=y | n

If set to n, do not generate a trace intercept routine for functions that have Fortran character string types as a formal parameter. The default is y.

trace-gpu-pure=y | n

If set to y and the -g cuda or -g hip option is specified, only enable the cuda or hip trace intercept routines if the OpenMP programming model is not present. The default is n.

trace-max=n

The maximum number of functions in the original program that can be traced based on the size of the function. By default, the 1024 largest functions by size are traced. If n is less than zero, the |n| smallest functions are traced. This directive can be combined with the trace-text-size directive to trace a limited number of functions within a specified range. Tracing a large number of functions results in degraded performance of the instrumented executable at runtime.

trace-return-size=min,max

Specifies the minimum and maximum size in bytes of the return value of a user function to trace. User functions that return fewer than min bytes or greater than max bytes are not traced.

Defaults: 0,16

trace-skip=strng[,strng2,…]

Silently ignore functions when processing them for tracing. The string strng identifies all or part of the function name.

trace-text-size=min,max

Specifies the minimum and maximum size in bytes of text sections in functions selected using the -u option to trace. The default is to trace user-defined functions whose text sections are greater than or equal to 1200 bytes. This does not apply to functions defined in the trace function groups.
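The trace-max and trace-text-size directives can be combined to bound both how many functions are traced and how large they must be. The sketch below is illustrative only: the limits (2000 to 50000 bytes, at most 50 functions) are arbitrary, the file name size.directives is hypothetical, and one directive per line is an assumption.

```shell
# Restrict -u tracing to mid-sized functions, and to at most 50 of them.
cat > size.directives <<'EOF'
trace-text-size=2000,50000
trace-max=50
EOF

if command -v pat_build >/dev/null 2>&1; then
    pat_build -u -d size.directives a.out inst.out
fi
```

Narrowing the selection this way is one approach to limiting the runtime overhead that comes with tracing many functions.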

trace-user-only=y | n

By default, pat_build generates trace intercept routines to trace functions defined in source files solely owned by the user. This includes header files. If set to n, this directive generates trace intercept routines to trace functions that are defined in source files not solely owned by the user (e.g., system header files). This directive may be most useful with applications written in C++.

varargs=y | n

If set to y, functions that accept variable arguments can be traced. The default is n.

OpenMP

For programs that use the OpenMP programming model, CrayPat can measure the overhead incurred by entering and leaving parallel regions and work-sharing constructs within parallel regions, show per-thread timings and other data, and calculate the load balance across threads for such constructs.

For programs that use both MPI and OpenMP, profiles by default show the load balance over PEs of the average time in the threads for each PE, but profiles also show load balances for each programming model separately. For more information about reporting load balance by programming model, see the pat_report(1) man page.

If the executable is built with the CCE C/C++ compiler’s -finstrument-openmp or the CCE Fortran compiler’s -h omp_trace option, calls to tracepoints in the CrayPat runtime library will be inserted and used to support CrayPat measurements. These options are on by default. This approach is recommended when using the Cray programming environment.

CrayPat OMPT, an OpenMP Tools (OMPT) implementation, collects measurements for programs whose OpenMP implementation supports OMPT. For host callbacks, this includes CCE, Intel, and AMD compilers. For target device callbacks and tracing, this includes AMD with ROCM 5.2.3 or greater. To enable CrayPat OMPT, set CRAYPAT_OMP_TOOL to ‘enabled’ prior to program compilation. CrayPat OMPT may not be used with CCE compiler tracepoints.

If CRAYPAT_OMP_TOOL is unset and pat_build is called using an Intel or AMD programming environment and OpenMP use is detected, CrayPat OMPT will be implicitly enabled.

In all other cases, the user is responsible for inserting API calls using the CrayPat OpenMP API. See the pat_api(1) man page’s OpenMP section for details.

If the CCE Fortran compiler -h profile_generate or CCE C/C++ compiler -finstrument-loops option is specified, all OpenMP constructs and optimizations are disabled.

WARNINGS

The perftools-base and an instrumentation module must be loaded before compiling and linking the original program. The perftools-base module does not affect program behavior and can be left loaded when not collecting performance data.

The pat_build utility recommends that compiling and linking be done as separate and distinct steps when creating the original program.
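The two points above can be sketched as follows. Several names here are assumptions rather than facts from this page: perftools as the name of the instrumentation module, cc as the compiler driver, and main.c as a placeholder source file.

```shell
# Placeholder source file for the sketch.
cat > main.c <<'EOF'
int main(void) { return 0; }
EOF

# Load the modules before compiling and linking (module names are assumptions).
if command -v module >/dev/null 2>&1; then
    module load perftools-base perftools || echo "module load failed" >&2
fi

# Compile and link as separate steps so the object files are kept.
if command -v cc >/dev/null 2>&1; then
    cc -c main.c -o main.o
    cc main.o -o a.out
fi

# With no output name given, the result is written to a.out+pat.
if command -v pat_build >/dev/null 2>&1; then
    pat_build a.out
fi
```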

Instrumenting a program that uses the MPI profiling mechanism may not succeed, and if it does appear to succeed, the resulting instrumented executable may not execute correctly or may produce invalid results.

Tracing frequently called user-defined functions, such as those called from within a loop, adds excessive overhead and can cause large runtime dilation. Use the -T ! syntax to prevent these functions from being traced.

Tracing some pthreads functions may cause programs that use POSIX threads to experience segmentation faults. Tracing pthread_mutex and pthread_spin functions may greatly increase the runtime overhead.

Fortran character strings used as formal parameters may not be properly represented in the DWARF Debugging Information Entry (DIE) associated with the parameter. Tracing Fortran functions that have such parameters can cause segmentation faults or otherwise unexpected runtime behavior of the instrumented program. See the directive trace-fortran-char-string for more information. The -T ! syntax may also be used to prevent suspect user-defined functions from being traced.

Tracing heap functions on RedHat systems can cause the instrumented program to hang or otherwise not complete executing when the program is scheduled for execution across multiple compute nodes.

The -r option indicates that messages related to trace-related issues are written to standard error. To request additional details regarding the status of tracing a function, specify at least one -v option or set PAT_BUILD_VERBOSE to at least 1.

An instrumented executable is not compatible with the DARSHAN I/O characterization tool. Disable the DARSHAN environment if any perftools instrumentation module is loaded. It should remain disabled during the execution of the instrumented executable.

Processing the DWARF debugging records for some large programs may cause pat_build to exceed the stack size limit. See the man page for the respective shell for the proper command to execute in order to increase the size of the stack.
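For Bourne-style shells such as bash, a minimal sketch of raising the stack limit looks like the following. Other shells use different commands (for example, limit in csh-style shells), so treat this as one possibility rather than the required procedure.

```shell
# Show the current soft stack limit (in KiB, or "unlimited").
ulimit -S -s

# Try to raise the soft limit to the hard limit for this shell and its children.
ulimit -S -s "$(ulimit -H -s)" || echo "could not raise the stack limit" >&2

# Record the limit now in effect before running pat_build.
stack_limit=$(ulimit -S -s)
echo "stack limit now: $stack_limit"
```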

If object files are not created during the build process or one or more of the object files that make up the program reside in a temporary location such as /tmp, pat_build copies the object file from the temporary directory into the directory $HOME/.craypat/program-name/PID-of-link. These copies are made at the time the original program is created. The pat_build utility uses the copies in this location in place of the versions in the original temporary location to create the instrumented executable. Users are responsible for managing the contents of this location in order to ensure that enough free space is available to save copies of object files, and to ensure that the object files are retained long enough to perform repeated pat_build commands on the executable program for which the copies are required. Use the environment variable PAT_LD_OBJECT_TMPDIR to change the location of this directory.

Note: To disable copying of object files, set the environment variable PAT_LD_OBJECT_TMPDIR to 0 (zero).

If the directory in which the link command to create program took place was a temporary location, pat_build will fail if that location no longer exists and program will not be instrumented.

If a symbol selected for tracing is defined in the text segment of the program’s executable file, any references to the symbol from a shared-object file cannot be traced.

For functions that do not have a predefined trace intercept routine, a valid DWARF Debug Information Entry for the function must be present in the original program.

If a trace intercept routine is created for a function, trace records are produced only if the function is executed during runtime.

Because of the way the x86 ABI specifies how aggregate data structures can be passed as formal parameters, such structures that are 16 bytes or less in size may cause the traced function using such parameters to fail when the function is executed. If this occurs, re-instrument the program, using the -T ! syntax with those functions to prevent the functions from being traced.

Tracing of functions that return a complex type is supported only when the Cray programming environment (PrgEnv-cray) is loaded and the CCE compilers are being used. This restriction applies either when the functions are traced using a function group trace intercept routine (the -g options) or when defined by the user and specified for tracing using the -u, -t, or -T options. If using another programming environment/compiler, use the -T ! syntax to prevent those functions that return a complex type from being traced.

The Fortran language “alternate return” subroutine construct is not supported for traced functions. If a function supporting this construct is selected for tracing, unexpected runtime behavior including incorrect execution of the user’s code will result. Use the -T ! syntax to prevent such functions from being traced.

All relocatable files and libraries used to create the original program must be available in the same directory that they were in at the time the original program was created. See the environment variable PAT_BUILD_LINK_DIR description for more information.

There is no support in pat_build for instrumenting programs written in Python.

There is no support in pat_build for instrumenting programs written in the Chapel parallel programming language.

Only non-static functions at global scope and written in C, C++, Charm++, or Fortran are eligible for tracing. In C and C++, this is under the programmer’s control, through use of the static keyword. In other languages, a compiler may choose to use static functions in some cases. For example, Fortran internal procedures are currently implemented as static functions by at least one compiler, and so can be traced only by using compiler-inserted function tracing hooks.

Programs that directly or indirectly reference semaphore operations (semop(2)) or signals (signal(2), sigaction(2)) at runtime may interfere with sampling-type experiments. This may cause invalid data collection or unexpected runtime behavior.

See the intro_craypat(1) man page for runtime conditions and exceptions.

ENVIRONMENT VARIABLES

CRAYPAT_OMP_TOOL

Enable CrayPat’s OpenMP Tool implementation. See pat_help OpenMP help for more details.

PAT_BUILD_CLEANUP

By default or if set to a nonzero value, the intermediate directory is removed. Set to zero to retain the directory after pat_build completes.

PAT_BUILD_EMBED_RTENV

Specifies one or more comma-separated CrayPat runtime environment variables to embed in the instrumented executable. A substring that matches any part of a runtime environment variable is allowed.

The CrayPat runtime environment variables must be set at the time the instrumented executable is created. By default, all CrayPat runtime environment variables that are set at the time the instrumented executable is created are embedded in the instrumented executable. If PAT_BUILD_EMBED_RTENV is set to zero, no CrayPat runtime environment variables are embedded. For more information, see the description of the rtenv build directive, elsewhere in this man page.
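As a sketch of selective embedding, the following sets one runtime variable and then limits embedding to variables whose names contain the substring PERFCTR. The counter name is taken from the EXAMPLES section; a.out and inst.out are placeholders.

```shell
# The runtime variable must already be set when pat_build runs.
export PAT_RT_PERFCTR=PAPI_TOT_CYC

# Embed only runtime variables whose names contain "PERFCTR".
export PAT_BUILD_EMBED_RTENV=PERFCTR

if command -v pat_build >/dev/null 2>&1; then
    pat_build -g mpi a.out inst.out
fi
```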

PAT_BUILD_LINK_DIR

Specify an alternate directory in which the original program was linked. All relocatable objects and libraries are located relative to the link directory.

PAT_BUILD_LIBUNWIND_DSO

Specifies an alternate libunwind shared object to use. Effective when PAT_RT_CALLSTACK_MODE is unset or set to unwind.

PAT_BUILD_OPTIONS

Specifies the pat_build options that are evaluated before any options on the command line.

PAT_BUILD_REPLACE_INVALID_CHARS

Specifies whether characters that appear in a user-defined function name are considered invalid and replaced with an underscore character. If not set or set to 0, the characters !#%&.@*/+-=% are not replaced and the selected function cannot be traced. If set to 1, the characters are replaced and the function is traced. Otherwise, the value specifies the list of characters that are considered invalid and replaced with an underscore character if any appear in the function name.

Default: Characters considered invalid are not replaced.

PAT_BUILD_USER_OK

By default, functions selected for tracing that do not exist in one of the trace function groups are candidates for interception if the function is in a source file owned by the user executing the pat_build utility. To allow other functions to be candidates for tracing, specify a comma-separated list of strings which represent all or part of the directory or source path of the file(s) in which these functions are defined.

PAT_BUILD_VERBOSE

Specifies the detail level of the progress messages related to the instrumentation process. This value corresponds to the number of -v options specified. The messages are written to stderr or to the optionally specified file.

When the perftools-base module is loaded, the following environment variables affect linking when program is created.

PAT_LD_AS

Specifies the full file path to an assembler overriding the default.

PAT_LD_BLACKLIST

Specifies a comma-separated list of strings that represent all or part of the names of executable files that are not prepared for instrumentation. The resulting executable files cannot be instrumented with pat_build.

PAT_LD_LD

Specifies the full file path to a linker overriding the default.

PAT_LD_OBJECT_TMPDIR

Allows the user to change the location of the directory where CrayPat copies object files that are in a /tmp directory. When set, pat_build writes copies of object files into the $PAT_LD_OBJECT_TMPDIR/.craypat/program-name/PID-of-link directory. The default value for PAT_LD_OBJECT_TMPDIR is $HOME.

To disable copying of object files, set this environment variable to 0 (zero).
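A hedged sketch of redirecting the copies: because the copies are made when the original program is created, the variable is set before compiling and linking. The directory name craypat-objs is hypothetical; choose a location with enough free space that persists for as long as the program may be re-instrumented.

```shell
# Hypothetical directory for retained object-file copies.
objdir="$PWD/craypat-objs"
mkdir -p "$objdir"

# Set this before linking the original program.
export PAT_LD_OBJECT_TMPDIR="$objdir"

# After linking, any retained copies appear under .craypat; check their size.
if [ -d "$PAT_LD_OBJECT_TMPDIR/.craypat" ]; then
    du -sh "$PAT_LD_OBJECT_TMPDIR/.craypat"
fi
```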

PAT_LD_VERBOSE

If set, messages indicating linking progress are written to standard error.

PAT_LD_WHITELIST

Specifies a comma-separated list of strings that represent all or part of a specific executable file name whose preparation for instrumentation is done. If the executable file name does not match any strings in the list, the resulting executable file cannot be instrumented with pat_build. This environment variable takes precedence over the environment variables CRAYPAT_LITE_BLACKLIST and PAT_LD_BLACKLIST.

When a perftools-lite module is loaded, the following environment variables affect the instrumented executable created.

CRAYPAT_LITE_BLACKLIST

Specifies a comma-separated list of strings that represent all or part of an executable file name against which instrumentation is not applied. This environment variable applies only when the perftools-lite module is loaded. Use this environment variable in situations where the perftools-lite module is loaded and executable files created during and executed as part of an invocation of make should be excluded from instrumentation.

CRAYPAT_LITE_WHITELIST

Specifies a comma-separated list of strings that represent all or part of a specific executable file name for which instrumentation is done. This environment variable applies only when the perftools-lite module is loaded. Use this environment variable in situations where the perftools-lite module is loaded and only specific executable files created during and executed as part of an invocation of make or configure should be instrumented. This environment variable takes precedence over the environment variables CRAYPAT_LITE_BLACKLIST and PAT_LD_BLACKLIST.

EXAMPLES

The following subsections discuss how to instrument and execute a program.

Instrumenting a Program

The following examples show how to use the pat_build component to instrument the program a.out and put the resulting instrumented executable in the file inst.out.

To instrument a.out to trace all MPI function calls, enter this command line:

$ pat_build -g mpi a.out inst.out

Note: If pat_build returns an error message similar to the following example, use the -w option to create new trace intercept routines for those functions for which no trace intercept routine already exists:

The function 'fn_name' cannot be traced because it does
not have a predefined trace intercept routine.

Running a Program

Runtime environment variables enable the user to control various parameters that affect the execution of the instrumented executable. (For a list of runtime environment variables, see the intro_craypat(1) man page.)

For example, if inst.out is instrumented as a synchronous experiment, the following commands select three hardware performance counters to monitor and execute the inst.out file:

$ export PAT_RT_PERFCTR=PAPI_TOT_CYC,PAPI_TOT_INS,PAPI_BR_UCN
$ srun inst.out

Note: The command used to execute a program varies depending on the Workload Manager available on the system.

In the above example, the PAT_RT_PERFCTR runtime environment variable configures the program to monitor the following three counter events:

  • Total cycles

  • Instructions completed

  • Unconditional branches

For more information about the CrayPat runtime environment variables, see the intro_craypat(1) man page.
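Because the launch command depends on the workload manager, a run script may want to choose it at runtime. The following is a sketch only: the candidate list (srun, aprun, mpiexec) is an assumption about what a site might provide, and pick_launcher and run_with_counters are hypothetical helper names.

```shell
# pick_launcher
# Prints the first job launcher found on PATH; fails if none is found.
# The candidate list is an assumption; adjust it for your site.
pick_launcher() {
    for _pl_cand in srun aprun mpiexec; do
        if command -v "$_pl_cand" >/dev/null 2>&1; then
            printf '%s\n' "$_pl_cand"
            return 0
        fi
    done
    return 1
}

# run_with_counters PROGRAM [ARGS...]
# Runs PROGRAM through the detected launcher with three PAPI preset
# counters selected for this run only.
run_with_counters() {
    _rc_launcher=$(pick_launcher) || {
        echo "run_with_counters: no job launcher found" >&2
        return 1
    }
    env PAT_RT_PERFCTR=PAPI_TOT_CYC,PAPI_TOT_INS,PAPI_BR_UCN \
        "$_rc_launcher" "$@"
}
```

A call such as run_with_counters ./inst.out sets the variable only in the environment of the launched command, so later runs in the same shell are unaffected.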

Instrumenting Programs Using Compiler Options

As an alternative to using the -u option to trace functions through the pat_build utility’s generated trace intercept mechanism, most compilers support an option that inserts function tracing hooks at compile time. Note that code instrumented in this way can be used for tracing experiments only, not for sampling experiments. Such code can also incur a large runtime overhead if an instrumented function is called a substantial number of times, which may give the impression that the program is hung or executing more slowly than expected. Use caution when applying this technique to an entire source file.

The CCE Fortran compiler supports the -h func_trace option that performs compiler instrumentation:

$ module load PrgEnv-cray
$ ftn -h func_trace -c pgm.f90
$ ftn -o pgm pgm.o

Note: Versions 12 and 13 of the CCE Fortran compiler do not support tracing of functions with the -T, -t, and -u options, nor with the trace directive. To trace user-defined functions successfully, specify the CCE Fortran compiler option -h func_trace to enable tracing for functions defined in Fortran source files. Use the pat_build -w option when instrumenting to activate the compiler-inserted trace intercepts.

The CCE C/C++, GNU, Intel, Allinea, and AMD compilers support the use of the -finstrument-functions compile-time option to perform compiler instrumentation of functions. The GNU command looks like the following example.

$ module load PrgEnv-gnu
$ cc -finstrument-functions -c pgm.c
$ cc -o pgm pgm.o

The Nvidia compilers support the use of the -Minstrument compile-time option to support compiler instrumentation.

In each case, the perftools-base module and a perftools instrumentation module (and compute-node target module, if used) must be loaded prior to the link step, and it is recommended that they be loaded prior to compilation. These options cause the compilers to insert calls to hooks at the entry and return points of each function defined in pgm.c. The CrayPat runtime library resolves these hooks and records trace data. Call stack and hardware performance counter (if chosen) information is recorded, but formal function argument and return values are not supported by this form of instrumentation. Data is recorded for calls that were inlined.

To instrument a program that contains compiler-generated function tracing hooks, follow this example:

$ pat_build -w pgm pgm+pat

Here the -w serves only to enable the hook instrumentation, and it is not possible to specify that trace intercept routines be generated for user-defined functions using the -u, -t, or -T options.

To trace MPI functions in addition to all functions that contain compiler hooks, follow this example:

$ pat_build -g mpi pgm pgm+pat
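The compile, link, and instrument steps described in this section can be collected into one helper. The sketch below assumes a C source file in the current directory and that the programming environment, perftools-base, and a perftools instrumentation module are already loaded. The names run and build_with_hooks, and the DRY_RUN variable, are hypothetical conveniences, not part of CrayPat.

```shell
# run CMD [ARGS...]
# Prints the command; executes it unless DRY_RUN is set to a non-empty value.
run() {
    printf '+ %s\n' "$*"
    [ -n "${DRY_RUN:-}" ] || "$@"
}

# build_with_hooks SRC.c
# Compiles SRC.c with compiler-inserted tracing hooks, links it, and
# instruments the result with pat_build -w. Stops at the first failure.
build_with_hooks() {
    _bh_src=$1
    _bh_base=${_bh_src%.c}
    run cc -finstrument-functions -c "$_bh_src" &&
    run cc -o "$_bh_base" "$_bh_base.o" &&
    run pat_build -w "$_bh_base" "$_bh_base+pat"
}
```

Setting DRY_RUN=1 before calling build_with_hooks pgm.c prints the three commands without running them, which is a convenient way to check the sequence before using it on a real system.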

The runtime environment variable PAT_RT_TRACE_HOOKS controls whether data is recorded for those functions containing compiler hooks. If PAT_RT_TRACE_HOOKS is unset, or is set to 1, the data is recorded. If PAT_RT_TRACE_HOOKS is set to zero, the data is not recorded.
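To switch off recording for a single run without changing the shell environment, the variable can be set for one command only. This is a minimal sketch; without_hook_data is a hypothetical helper name.

```shell
# without_hook_data CMD [ARGS...]
# Runs CMD with PAT_RT_TRACE_HOOKS=0, so that data is not recorded for
# functions containing compiler hooks during this run only.
without_hook_data() {
    env PAT_RT_TRACE_HOOKS=0 "$@"
}
```

For example, without_hook_data srun pgm+pat runs the instrumented program once with hook data recording disabled.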

Profile-Guided Optimization

Statistics for loops in an application can be gathered by using the CCE Fortran compiler option -h profile_generate when compiling the source, or the CCE C/C++ compiler option -finstrument-loops when compiling the source and linking the program, and then using pat_build to instrument it for tracing. If either one of these options is specified, all OpenMP constructs and optimizations are disabled.

Using Automatic Profiling Analysis

For programs that run for only a few seconds, there is no problem with using pat_build with the -u and -g mpi options to trace all user functions. However, with a large, long-running program, such a trace injects considerable overhead. It is better to limit tracing to those functions that consume the most time. You can use a preliminary sampling experiment to identify and instrument those functions with the following steps, referred to as automatic profiling analysis.

Procedure 1. Using Automatic Profiling Analysis

Step 1. Instrument the original program.

$ pat_build -O apa my_program

This produces the instrumented executable my_program+pat.

Step 2. Run the instrumented executable.

$ srun my_program+pat

This produces an experiment data directory named my_program+pat+PID-node-s, which contains basic asynchronously derived program profiling data.

Step 3. Use pat_report to process the experiment data directory.

$ pat_report my_program+pat+PID-node-s

This produces three results:

  • a sampling-based text report to stdout

  • a subdirectory named ap2-files, containing an .ap2 file (my_program+pat+PID-node-s.ap2), which contains both the report data and the associated mapping from addresses to functions and source line numbers

  • in the experiment data directory, an .apa file (build-options.apa), which contains the pat_build arguments recommended for further performance analysis

Note: If an .apa file is not produced the text report will state a reason, such as:

No .apa file with pat_build option suggestions was generated
because no samples appear to have been taken in USER functions.

This can happen, for example, when trying out these steps with a small program that runs to completion before the first sample is taken.

Step 4. Re-instrument the program, this time using the .apa file.

$ pat_build -O experiment_data_directory/build-options.apa

It is not necessary to specify the program name, as this is specified in the .apa file. Invoking this command produces the new executable, my_program+apa, this time instrumented for enhanced tracing analysis.

Note: Remember, when re-instrumenting a program, the -g !trace-group argument will suppress the tracing of any group that might be part of the APA-generated experiment.

Step 5. Run the new instrumented executable.

$ srun my_program+apa

This produces a new experiment data directory, my_program+apa+PID2-node-t, which contains expanded information tracing the most significant functions in the program.

Step 6. Use pat_report to process the new experiment data directory.

$ pat_report new_experiment_data_directory

This produces two results:

  • a tracing report to stdout

  • a new ap2-files subdirectory, containing an .ap2 file (my_program+apa+PID2-node-t.ap2), which contains both the report data and the associated mapping from addresses to functions and source line numbers
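The six steps above can be chained in a small script. The following is a sketch under stated assumptions: the program is in the current directory, srun is the launcher (as in the steps above), and the experiment data directories are created in the current directory. The helper names latest_expdir and apa_workflow, and the report file names, are hypothetical.

```shell
# latest_expdir PREFIX
# Prints the most recently modified directory whose name starts with
# "PREFIX+" (for example, PREFIX = my_program+pat). Fails if none exists.
latest_expdir() {
    _le_newest=
    for _le_dir in "$1"+*/; do
        [ -d "$_le_dir" ] || continue
        if [ -z "$_le_newest" ] || [ "$_le_dir" -nt "$_le_newest" ]; then
            _le_newest=$_le_dir
        fi
    done
    [ -n "$_le_newest" ] && printf '%s\n' "${_le_newest%/}"
}

# apa_workflow PROGRAM
# Runs Steps 1 through 6, stopping at the first failure.
apa_workflow() {
    _aw_prog=$1
    pat_build -O apa "$_aw_prog" || return 1                    # Step 1
    srun "./$_aw_prog+pat" || return 1                          # Step 2
    _aw_dir=$(latest_expdir "$_aw_prog+pat") || return 1
    pat_report "$_aw_dir" > "$_aw_prog.sampling.txt" || return 1 # Step 3
    if [ ! -f "$_aw_dir/build-options.apa" ]; then
        echo "apa_workflow: no .apa file; see $_aw_prog.sampling.txt" >&2
        return 1
    fi
    pat_build -O "$_aw_dir/build-options.apa" || return 1       # Step 4
    srun "./$_aw_prog+apa" || return 1                          # Step 5
    _aw_dir=$(latest_expdir "$_aw_prog+apa") || return 1
    pat_report "$_aw_dir" > "$_aw_prog.tracing.txt"             # Step 6
}
```

Choosing the newest matching directory avoids having to know the PID and node ID in advance, but it can select the wrong directory if several experiments run concurrently in the same location.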

Using CrayPat with Charm++

Charm++ version 6.7.1 and later is supported. Additional changes may be required beyond this version to optimally build the Charm++ components using the CCE, Intel, and GNU compilers.

Charm++ may be built using either the Intel compiler, version 16.0.3.210 or later, or GNU, version 6.1.0 or later. The following builds are supported:

Intel

./build charm++ gni-crayxc smp -optimize
./build charm++ mpi-crayxc smp -optimize

GNU

./build charm++ mpi-crayxc smp -optimize

Do not build Charm++ with the perftools module loaded, and do not use compiler options that produce debug information, such as -g.

When building Charm++, use the compiler option that preserves the frame pointer. When using CCE, Intel, or GNU, this is the -fno-omit-frame-pointer option. This option ensures that the instrumented executable can trace the call stack in its entirety at runtime.

The Charm++ compiler charmc option -save must be specified so that pat_build can properly instrument the resulting Charm++ executable file.

Because of extensive use of small aggregate data structures in the Charm++ and Converse header files, the pat_build -u option will issue a large number of messages about being unable to generate trace intercept routines for some user functions.

Use the pat_report -s aggr_th=avg option to ensure performance data from all threads is displayed.

For more information see https://charm.cs.illinois.edu/help

FILES

$HOME/.craypatrc

Contains CrayPat runtime environment variables and provides configuration for all instrumented executables executed by the user.

$CRAYPAT_ROOT/share

The directory containing subdirectories of reference files, including predefined trace groups, performance counters, versioning, and other information.

a.out+pat+PID-nodes|t

Depending on the nature of the program and the environmental conditions in effect at the time of program execution, the instrumented executable, when executed, generates an experiment data directory with this name, where:

a.out

is the name of the original program

PID

is the process ID assigned to the instrumented executable at runtime

node

is the physical node ID upon which the rank zero process was executed

s|t

is a one-letter code indicating the type of experiment performed, either s for sampling or t for tracing

By default, the experiment data directory is created under the current working directory, but this location can be changed by setting the environment variable PAT_RT_EXPDIR_BASE.
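The naming scheme above can be decoded in a script, for example to sort result directories by experiment type. This is a sketch that assumes the directory name ends in the one-letter type code and that the original program name is followed by exactly two "+"-separated fields. The function names exp_type and exp_program are hypothetical.

```shell
# exp_type DIR
# Prints "sampling" or "tracing" according to the final letter of DIR.
exp_type() {
    case ${1%/} in
        *s) echo sampling ;;
        *t) echo tracing ;;
        *)  echo unknown; return 1 ;;
    esac
}

# exp_program DIR
# Prints the original program name by removing the last two
# "+"-separated fields from the directory name.
exp_program() {
    _ep_base=${1%/}
    _ep_base=${_ep_base##*/}   # drop any leading path
    _ep_base=${_ep_base%+*}    # drop the PID, node, and type field
    _ep_base=${_ep_base%+*}    # drop the instrumentation suffix
    printf '%s\n' "$_ep_base"
}
```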

Performance data files associated with this executable run are stored in this experiment data directory and include:

xf-files

A subdirectory containing one or more .xf files generated during the run. To save disk space, this subdirectory may be deleted once the ap2-files directory has been generated.

ap2-files

A subdirectory containing one or more .ap2 files automatically generated during the first invocation of pat_report on the experiment data directory. The .ap2 files in the directory contain all the information from the original .xf files, but in the more portable Cray Apprentice2 format. This subdirectory can also be created without using pat_report, by running an executable instrumented for a Perftools-lite experiment.

Note: The most significant difference between .xf and .ap2 format is that .xf files require the original instrumented executable and dynamic libraries to be available to provide mapping from addresses to function names and source line numbers, while .ap2 files incorporate this data mapping and are self-contained. Therefore the .ap2 format is recommended to preserve the data for future reference. By default, this subdirectory is created in the experiment data directory, but the location can be changed by using the pat_report -o option.
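Since the xf-files subdirectory may be deleted once the ap2-files directory has been generated, a cleanup step might guard the deletion as follows. This is a sketch that assumes ap2-files is in its default location inside the experiment data directory; prune_xf is a hypothetical helper name.

```shell
# prune_xf EXPDIR
# Removes EXPDIR/xf-files only if EXPDIR/ap2-files holds at least one
# .ap2 file; otherwise leaves the raw data in place and fails.
prune_xf() {
    _px_dir=$1
    for _px_file in "$_px_dir"/ap2-files/*.ap2; do
        if [ -e "$_px_file" ]; then
            rm -rf -- "$_px_dir/xf-files"
            return 0
        fi
    done
    echo "prune_xf: no .ap2 files in $_px_dir/ap2-files; keeping xf-files" >&2
    return 1
}
```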

rpt-files

A subdirectory containing one or more text report files generated during the run or by pat_report.

html-files

A subdirectory containing one or more reports in HTML format, which are produced by using pat_report with the -f html option. These files can be opened with any web browser, or opened from the command line on Macintosh or Linux systems by using the open filename.html or xdg-open filename.html commands, respectively. By default, this subdirectory is created in the experiment data directory, but the location can be changed by using the pat_report -o option.

plot-files

A subdirectory containing one or more reports in gnuplot format. These files are created by running an instrumented executable with PAT_RT_SUMMARY set to 0 and PAT_RT_SAMPLING_DATA set to a supported value (e.g., cray_pm or cray_rapl), and then using pat_report with the -f plot option. The resulting files can be viewed either by invoking pat_report on the experiment data directory or using gnuplot. By default, this subdirectory is created in the experiment data directory, but the location can be changed by using the pat_report -o option.

index.ap2

An index data file created as a map to the data within the ap2-files directory.

build-options.apa

File containing recommended parameters for re-instrumenting the program for more detailed performance analysis. This is generated by running an executable instrumented for Automatic Profiling Analysis (pat_build -O apa) and then running pat_report on the resulting experiment data directory.

MPICH_RANK_ORDER*

One or more files containing options for rerunning MPI applications with optimized rank orders. These files are generated either manually, using the grid_order utility, or automatically, by running a performance analysis experiment using Perftools-lite.

SEE ALSO

intro_craypat(1), pat_build(1), pat_opts(1), pat_help(1), pat_report(1), pat_run(1), grid_order(1), reveal(1)

nm(1), readelf(1)

intro_mpi(3), regexec(3)

perftools-base(4), perftools-lite(4), perftools-preload(4)

accpc(5), cray_pm(5), cray_rapl(5), hwpc(5), cray_cassini(5), uncore(5), papi_counters(5)