pat_run

Author:

Hewlett Packard Enterprise Development LP.

Copyright:

Copyright 2019-2024 Hewlett Packard Enterprise Development LP.

Manual section:

1

NAME

pat_run - launch a program instrumented for performance analysis

SYNOPSIS

[ wlm-launcher [args-for-launcher]] pat_run [-c] [-e events-file] [-E events] [-f] [-g group-names] [-m mode-name] [-l | -n] [-p dso-file] [-P] [-r] [-s sampling-specs] [-S] [-t] [-v] [-V] [-w] [-z params] [] program [args-for-program]

DESCRIPTION

The pat_run utility combines many of the instrumentation features of the pat_build utility with the environment variable LD_PRELOAD to execute the program. The program collects the performance data and produces the same experiment data directory structure as a program independently instrumented with pat_build.

The wlm-launcher is the Workload Manager job and application scheduler. This utility is used to initiate an application launch onto the compute nodes. See aprun(1), srun(1), and mpiexec(1).

A program instrumented with pat_build offers more capability and greater flexibility in data collection than the same program run under pat_run, although pat_run supports many of the instrumentation features of pat_build. In fact, some programs that pat_build cannot instrument may be instrumented with pat_run. The table below summarizes the major differences between the two approaches.

Presently a workload manager launching command is required to use pat_run. If the launching command is missing or invalid, pat_run will fail or the program will not execute correctly. Later releases of pat_run will lift this restriction.

To maximize access to the runtime performance data collection and recording, load the perftools-preload module before compiling and linking the program. Programs that have not been linked with perftools or perftools-lite modules can also be executed using the pat_run utility, though these programs will not have full access to the instrumentation features. The -z option exists to pass user-collectible parameters which provide pat_run with information to maximize its use of instrumentation features.

A program created without the perftools modules loaded can use pat_run. However, only a subset of instrumentation features will be available to the program. See the section Programs Created Without perftools Modules Loaded below.

The perftools-preload module is not compatible with perftools or perftools-lite modules. Programs linked with the perftools-lite module loaded cannot use pat_run. A program that is the result of instrumentation by pat_build cannot use pat_run.

---------------------------------------------------------------
  pat_build                      pat_run
---------------------------------------------------------------
- functions defined in         - use the appropriate compiler
  user-owned files can be        option to instrument functions
  individually selected for      in user-owned source files
  tracing

- functions belonging to a     - all functions in a trace group
  trace group can be             are traced: selecting an
  individually selected for      individual one selects all
  tracing

- some lite modes trace        - functions in user-owned
  functions in user-owned        source files are only traced
  source files                   in lite modes if instrumented
                                 using compiler options

- python experiments are not   - support for python experiments
  supported                      is in beta
---------------------------------------------------------------

Some of the advantages of using pat_run compared to instrumenting with pat_build and executing the instrumented executable include:

  • no additional executable file is created

  • different performance analysis data may be collected for the same executable file

  • no dependency on perftools modules loaded when executable file linked

  • performance analysis on executable files stripped of their global symbol tables

  • a reduction in steps to acquire performance data

  • accepting ISV codes (where prebuilt binaries exist instead of source)

  • avoiding long compiling time of source code

  • no recompilation or relink required to obtain performance data

  • beta support for python experiments

Some of the advantages of using pat_build and executing the instrumented executable compared to using pat_run include:

  • a more thorough analysis of programming models represented in the executable file

  • more control over the criteria in selecting functions to trace including by size and by name

  • tracing individual functions that reside in trace groups

  • creating a stand-alone executable file uniquely instrumented

  • less startup overhead when launched by the Workload Manager

Options

pat_run supports the following options and arguments:

-c

Ignore all CrayPat runtime environment variables set in the execution environment before executing the program.

-e events-file

File containing list of performance counter specifications (see PAT_RT_PERFCTR_FILE in intro_craypat(1)).

-E events

Performance counter specification (see PAT_RT_PERFCTR in intro_craypat(1)).

-f

Force the instrumentation and execution of the program even if underlying conditions prohibit it.

-g group-names

Trace all functions that belong to group-names (see -g option in pat_build(1)).

-l

Populate LD_PRELOAD but, instead of launching the program, evaluate it with ldd.

-m mode-name

Instrumentation mode (see -O apa option in pat_build(1) and perftools-lite(4)). mode-name can be apa, lite-events, lite-gpu, lite-hbm, lite-loops, or lite-samples (default: apa).

-n

Populate LD_PRELOAD but do not launch the program.

-p dso-file

Populate LD_PRELOAD with the list of shared objects contained in the ASCII file dso-file. The shared object file names are listed one-per-line. The shared object files are assumed to have read and execute permission for the user and be accessible from the execution directory.

-P

Use the value of LD_PRELOAD set in the execution environment instead of pat_run populating it.

-r

Generate a report upon successful execution (see PAT_RT_REPORT_CMD in intro_craypat(1)).

-s sampling-specs

A sampling experiment is performed on the program with the given specifications. One or more sampling-specs separated with a comma (,) must be listed:

raw | bubble

Capture the raw or bubble address

pc | cs

Capture the program address or the entire callstack

time | ovfl

Trigger a sampling interrupt using a timer interval or a performance counter overflow

See the description for PAT_RT_EXPERIMENT in intro_craypat(1) for details. The default is raw,pc,time, which is a samp_pc_time experiment.

-S

A sample-program-counter-by-time experiment is performed on the program.

-t

Collect performance data in full-trace mode (see PAT_RT_SUMMARY in intro_craypat(1)).

-v

Print progress messages to stderr.

-V

Print version number to stderr.

-w

A tracing experiment is performed on the program. This option is required to enable compiler-inserted trace points.

-z params

Parameters that extend the feature base of pat_run. Valid keywords include:

bind=y|n

Resolve all symbols at the program’s startup instead of deferring resolution to the point when the symbols are first referenced.

default: n

debug=categories

Set the environment variable LD_DEBUG for runtime linking information about the program from the dynamic linker. See ld.so(8) for more details.

env=file

Write the value of LD_PRELOAD to the file.

excl=y|n

Only instrument the program and exclude instrumenting any executable programs directly or indirectly launched by program.

default: y

hooks=y|n

Enable all compiler-inserted trace points for data collection.

default: y

mpi=y|n

Load the MPI shared object.

default: y

ovhd=y|n

Load the runtime overhead measurement shared object.

default: y for tracing experiments

pm=programming-models

Define the programming models used by the program. The value is a hexadecimal number obtained by executing $CRAYPAT_ROOT/sbin/getpm on the program. By default, the programming models are determined at runtime; setting this parameter is recommended if runtime behavior does not match expected behavior.

samp-depth=depth

Specify the depth of the call stack trace for a sampling experiment when the -s bubble option is specified. See the description for PAT_RT_SAMPLING_MODE in intro_craypat(1) for details.

default: 2

sigs=y|n

Enable signal handling in the CrayPat runtime library.

default: y

program

Dynamically-linked program launched for performance analysis.

Instrumentation Modes

To take full advantage of performance analysis for other instrumentation modes, additional compile-time and link-time options are required when creating the program. The perftools-base and perftools-preload modules must be loaded to properly prepare the program for pat_run.

lite-events

Add the option that instruments function entry and return to each compilation step

lite-loops

(CCE Fortran only)

Add the following arguments to each compilation step:

-h profile_generate

(CCE C/C++ only)

Add the following arguments to each compilation step and the link step:

-finstrument-loops

See the -O apa option in pat_build(1) and perftools-lite(4) for more information.

If the -e or -E options are specified along with the -m option, the performance counters indicated by the -e or -E options take precedence over any associated with the -m option. If the -g option is specified along with the -m option, the trace groups indicated by the -g option take precedence over any associated with the -m option.

Programs Created Without perftools Modules Loaded

If the program was created without the perftools-base and perftools-preload modules loaded, minimal symbol table information is present in the ELF symbol sections of the program. Sampled addresses, function entry points and callstack tracebacks are compromised and cannot be assigned any entry point identification. Labels such as LO_MEMORY may appear in report tables.

Minimal symbol table information is present if the program was processed by the strip utility, or if any of the ld strip options were used when the program was created.

To verify that the minimum symbol table information that provides address-to-symbol mapping is present, execute the following commands:

nm -g --defined-only program

nm -gD --defined-only program

This identifies those function entry points (marked with the letter T) for which address-to-symbol mapping is supported.
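As a rough aid for reading the nm output, the sketch below counts the entries marked T. It assumes the default nm output layout (address, type letter, name), and count_text_symbols is a hypothetical helper name that is not part of CrayPat.

```shell
# Count the global defined text symbols (type letter T) in nm-style
# output read from standard input.
# Assumption: each line has the form "address type name".
count_text_symbols() {
    awk '$2 == "T" { n++ } END { print n + 0 }'
}

# Possible usage (program is a placeholder for the executable to check):
#   nm -g --defined-only program | count_text_symbols
#   nm -gD --defined-only program | count_text_symbols
```

A count of zero from both commands suggests that address-to-symbol mapping will not be available for the program.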

Programs Do Not Execute Correctly

By default, the CrayPat runtime determines what programming models are present. The result may not accurately or optimally reflect what programming models are present, and may cause the program to not execute correctly or as expected. Perform the following procedure to more accurately describe the characteristics of the program:

First, load the perftools-base module, then execute the getpm utility to acquire the programming models present in the program:

$CRAYPAT_ROOT/sbin/getpm program

Use the hexadecimal value indicated in the -z option pm parameter when pat_run executes the program:

launcher pat_run -z pm=value program

Details on the programming models and the compiler-inserted trace points used by the program are printed to standard error if the -Wl,-v option is specified at the time the program is created.
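The procedure above can be sketched as a small helper that checks the value and prints the resulting command line without running it. The helper name pm_command, the value 0x1f, and the program name ./pgm are hypothetical; the exact output format of getpm is not described here, so the value is passed in by hand.

```shell
# Print (do not run) a pat_run command line that passes a
# programming-model value through the -z pm parameter.
#   $1 = launcher command, $2 = hexadecimal value from getpm, $3 = program
# Assumption: the value may or may not carry a 0x prefix.
pm_command() {
    launcher=$1
    value=$2
    program=$3
    case ${value#0x} in
        ''|*[!0-9a-fA-F]*)
            echo "not a hexadecimal value: $value" >&2
            return 1
            ;;
    esac
    printf '%s pat_run -z pm=%s %s\n' "$launcher" "$value" "$program"
}

# Illustrative values only.
pm_command srun 0x1f ./pgm
# prints: srun pat_run -z pm=0x1f ./pgm
```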

Python Experiments (BETA)

pat_run supports Python tracing and sampling experiments. See pat_python(1) for details.

WARNINGS

When used with multiple-program multiple-data invocations, pat_run must be executed with identical options. Set the environment variable PAT_RUN_OPTIONS and execute pat_run with no options on its command line.
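A minimal sketch of this arrangement follows. It assumes that PAT_RUN_OPTIONS holds the options as they would be typed on the command line, and it uses the colon-separated mpiexec form for the multiple-program launch; the program names and the helper name mpmd_command are hypothetical. The command is printed, not executed.

```shell
# Assumption: the variable holds the options as typed on a command line.
PAT_RUN_OPTIONS='-g mpi,io'
export PAT_RUN_OPTIONS

# Build (and only print) an MPMD command line in which every program
# is launched through pat_run with no options of its own.
mpmd_command() {
    cmd='mpiexec'
    sep=''
    for prog in "$@"; do
        cmd="$cmd$sep -n 1 pat_run $prog"
        sep=' :'
    done
    printf '%s\n' "$cmd"
}

# Illustrative program names only.
mpmd_command ./pgm_a ./pgm_b
# prints: mpiexec -n 1 pat_run ./pgm_a : -n 1 pat_run ./pgm_b
```

Because each pat_run invocation takes its options from the environment, all programs in the launch see identical options.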

The pat_run utility uses the directory $HOME/.craypat to manage some persistent state. Users are responsible for managing the contents of this location in order to ensure that enough free space is available.

The pat_run utility provides no facility to intercept or trace user-defined function entry points. Source code may be recompiled using the compiler option that instruments function entry points. The pat_run utility will recognize these compiler-inserted trace points and resolve them resulting in those functions being traced at runtime. See pat_build(1), section Instrumenting Programs Using Compiler Options.

If the program launched by the pat_run utility was not built with the perftools-preload module loaded, it may produce incomplete caller/calltree information.

Selecting -g heap for programs that use the UPC or CAF programming model with OpenFabrics Interfaces will result in unexpected behavior.

Selecting -g heap for programs on Red Hat systems can cause the program to hang or otherwise not complete executing when the program is scheduled for execution across multiple compute nodes.

An instrumented executable is not compatible with the DARSHAN I/O characterization tool. Disable the DARSHAN environment if any perftools instrumentation module is loaded. It should remain disabled during the execution of the instrumented executable.

Performance data is not collected for any child processes created during the execution of the program.

The perftools-preload module must be loaded if the program was created using any compiler options that generate instrumentation. Examples include the CCE Fortran compiler’s -h profile_generate, -h omp_trace, and -h func_trace options; the CCE C/C++ compiler’s -finstrument-loops, -finstrument-functions, and -finstrument-openmp options; the PGI and Nvidia -Minstrument option; and the Allinea, AMD, GNU, and Intel -finstrument-functions option. The module must also be loaded if any CrayPat API function appears in the source code.

All compiler-inserted trace points are enabled when the -w option is specified. To control the volume of recorded data, set the -z hooks parameter to n and use the environment variable PAT_RT_TRACE_HOOKS to enable or disable selected trace point types.

Programs created using the perftools-preload module cannot be instrumented with the pat_build utility.

In general the LD_LIBRARY_PATH should be set to ensure CrayPat dynamic shared objects can be found when pat_run launches the program or when using the ldd utility on the program:

LD_LIBRARY_PATH=$CRAYPAT_ROOT/lib64:$LD_LIBRARY_PATH
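The assignment above leaves a trailing colon when LD_LIBRARY_PATH is initially empty, and ld.so(8) treats an empty entry as the current directory. The following sketch is one way to avoid that and to skip duplicate entries; prepend_lib_dir is a hypothetical helper name, not a CrayPat facility.

```shell
# Prepend a directory to LD_LIBRARY_PATH without leaving an empty
# entry and without adding the same directory twice.
prepend_lib_dir() {
    dir=$1
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$dir:"*)
            ;;  # already present, nothing to do
        *)
            LD_LIBRARY_PATH="$dir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
            ;;
    esac
    export LD_LIBRARY_PATH
}

# Only act when CRAYPAT_ROOT is defined (normally set by a perftools module).
if [ -n "${CRAYPAT_ROOT:-}" ]; then
    prepend_lib_dir "$CRAYPAT_ROOT/lib64"
fi
```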

To trace functions in user-defined archive (.a) files or shared object (.so) files, generate the object files using the appropriate compiler option that instruments function entry points.

ENVIRONMENT VARIABLES

CRAYPAT_OMP_TOOL

Enable CrayPat’s OpenMP Tool implementation. See pat_help OpenMP help for more details.

PAT_RUN_INSTR_MODE

Specifies the default instrumentation mode (default: apa, see the -m option).

PAT_RUN_LIBUNWIND_DSO

Specifies an alternate libunwind shared object to use. Only effective when PAT_RT_CALLSTACK_MODE is set to unwind.

PAT_RUN_LOCK_DIR

Specifies the directory used for process lock management (default: $HOME/.craypat).

PAT_RUN_OPTIONS

Specifies the pat_run options that are evaluated before any options on the command line.

PAT_RUN_PYTHONPATH

By default, pat_run exposes CrayPat API calls by appending $CRAYPAT_ROOT/libexec64 to PYTHONPATH. If set to 0, PYTHONPATH is not modified. See pat_api(1) for more details.

PAT_RUN_RT_DSOS

A comma-separated list of DSO file names to add to the environment variable LD_PRELOAD constructed by pat_run. If the name does not start with the slash character, the file is found via library search directories. See ld.so(8) for more details.

PAT_RUN_VERBOSE

Print progress messages to stderr.

In addition, the CrayPat runtime environment variables are recognized, but those that must conform to the pat_run invocation will have their values reset. Specify the pat_run -v option to see the state of the environment variables when the program is launched by pat_run.

EXAMPLES

The following examples show how to execute a program to collect performance data and produce an experiment data directory.

To execute program pgm using the PBS Workload Manager and collect Automatic Profiling Analysis performance data, enter this command line:

$ aprun aprun-options pat_run pgm

If pgm was the result of using the CCE Fortran compiler with the option -h func_trace, data reflecting user-defined functions will also be collected.

To execute the MPI program pgm using the SLURM Workload Manager and collect performance counter information for the performance counter group default, while also collecting data on MPI and IO activity, enter this command line:

$ srun srun-options -n 8 pat_run -E default -g mpi,io pgm

User-defined functions are traced if pgm was compiled with options that facilitate the collection of user-defined functions.

To execute program pgm using the PBS Workload Manager and collect callstack data sampled by time, enter this command line:

$ aprun aprun-options pat_run -s cs pgm

To execute program pgm that was compiled with the CCE Fortran compiler -h func_trace option using the SLURM Workload Manager and collect data representing the whole program, enter this command line:

$ srun srun-options pat_run -w pgm
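The -s cs example above names only one of the three sampling choices. The sketch below shows one reading of how a partial specification relates to the documented default of raw,pc,time. It assumes that each omitted choice falls back to its default, which is inferred from that example and not stated outright; expand_sampling_spec is a hypothetical helper name.

```shell
# Expand a partial, comma-separated sampling specification into all
# three choices, starting from the documented default raw,pc,time.
# Assumption: an omitted choice keeps its default value.
expand_sampling_spec() {
    addr=raw
    unit=pc
    trigger=time
    old_ifs=$IFS
    IFS=,
    for word in $1; do
        case $word in
            raw|bubble) addr=$word ;;
            pc|cs)      unit=$word ;;
            time|ovfl)  trigger=$word ;;
            *)
                IFS=$old_ifs
                echo "unknown sampling spec: $word" >&2
                return 1
                ;;
        esac
    done
    IFS=$old_ifs
    printf '%s,%s,%s\n' "$addr" "$unit" "$trigger"
}

expand_sampling_spec cs
# prints: raw,cs,time
```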

FILES

$HOME/.craypatrc

Contains CrayPat runtime environment variables and provides configuration for all instrumented executables executed by the user.

$CRAYPAT_ROOT/share

The directory containing subdirectories of reference files, including predefined trace groups, performance counters, versioning, and other information.

a.out+pat+PID-nodes|t

Depending on the nature of the program and the environmental conditions in effect at the time of program execution, the instrumented executable, when executed, generates an experiment_data_directory with this name, where:

a.out

is the name of the original program

PID

is the process ID assigned to the instrumented executable at runtime

node

is the physical node ID upon which the rank zero process was executed

s|t

is a one-letter code indicating the type of experiment performed, either s for sampling or t for tracing

By default, the experiment data directory is created under the current working directory, but this location can be changed by setting the environment variable PAT_RT_EXPDIR_BASE.
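To illustrate the naming scheme, the sketch below splits a directory name into the components listed above. It assumes the name follows the a.out+pat+PID-node pattern with a trailing s or t exactly as described; the helper name describe_expdir and the sample name are hypothetical.

```shell
# Split an experiment data directory name of the form
#   program+pat+PID-node followed by s or t
# into its components and print them.
describe_expdir() {
    name=${1%/}            # drop one trailing slash, if any
    name=${name##*/}       # drop leading directories
    case $name in
        *+pat+*-*[st]) ;;
        *)
            echo "not an experiment data directory name: $name" >&2
            return 1
            ;;
    esac
    program=${name%%+pat+*}
    rest=${name#*+pat+}    # PID-node plus the one-letter code
    pid=${rest%%-*}
    node_kind=${rest#*-}
    node=${node_kind%?}
    case $node_kind in
        *s) kind=sampling ;;
        *t) kind=tracing ;;
    esac
    printf 'program=%s pid=%s node=%s experiment=%s\n' \
        "$program" "$pid" "$node" "$kind"
}

# Illustrative name only.
describe_expdir ./pgm+pat+12345-67t
# prints: program=pgm pid=12345 node=67 experiment=tracing
```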

Performance data files associated with this executable run are stored in this experiment data directory and include:

xf-files

A subdirectory containing one or more .xf files generated during the run. To save disk space, this subdirectory may be deleted once the ap2-files directory has been generated.

ap2-files

A subdirectory containing one or more .ap2 files, which contain all the information from the original .xf files, but in the more portable Cray Apprentice2 format. This subdirectory is created automatically by an executable instrumented for a Perftools-lite experiment, or otherwise by the first invocation of pat_report on the experiment data directory.

Note: The most significant difference between .xf and .ap2 format is that .xf files require the executable file and dynamic libraries to be available to provide mapping from addresses to function names and source line numbers, while .ap2 files incorporate this data mapping and are self-contained. Therefore the .ap2 format is recommended if you wish to preserve the data for future reference.

By default, this subdirectory is created in the experiment data directory, but the location can be changed by using the pat_report -o option.

rpt-files

A subdirectory containing one or more text report files generated during the run or by pat_report.

html-files

A subdirectory containing one or more reports in HTML format, which are produced by using pat_report with the -f html option. These files can be opened with any web browser, or opened from the command line on Macintosh or Linux systems by using the open filename.html or xdg-open filename.html commands, respectively.

By default, this subdirectory is created in the experiment data directory, but the location can be changed by using the pat_report -o option.

plot-files

A subdirectory containing one or more reports in gnuplot format. These files are created by running an instrumented executable with PAT_RT_SUMMARY set to 0 and PAT_RT_SAMPLING_DATA set to a supported value (e.g., cray_pm or cray_rapl), and then using pat_report with the -f plot option. The resulting files can be viewed either by invoking pat_report on the experiment data directory or using gnuplot.

By default, this subdirectory is created in the experiment data directory, but the location can be changed by using the pat_report -o option.

index.ap2

An index data file created as a map to the data within the ap2-files directory.

build-options.apa

File containing recommended parameters for re-instrumenting the program for more detailed performance analysis. This is generated by running an executable instrumented for Automatic Profiling Analysis (pat_build -O apa) and then running pat_report on the resulting experiment data directory.

MPICH_RANK_ORDER*

One or more files containing options for rerunning MPI applications with optimized rank orders. These files are generated either manually, using the grid_order utility, or automatically, by running a performance analysis experiment using Perftools-lite.

SEE ALSO

intro_craypat(1), pat_build(1), pat_opts(1), pat_help(1), pat_python(1), pat_report(1), pat_run(1), grid_order(1), reveal(1)

aprun(1), ld(1), ldd(1), mpiexec(1), nm(1), srun(1), strip(1)

perftools-base(4), perftools-lite(4), perftools-preload(4)

accpc(5), cray_pm(5), cray_rapl(5), hwpc(5), cray_cassini(5), uncore(5), papi_counters(5)

ld.so(8)