pat_report
Output performance reports from experiments created using pat_build or pat_run
- Author:
Hewlett Packard Enterprise Development LP.
- Copyright:
Copyright 2019-2024 Hewlett Packard Enterprise Development LP.
- Manual section:
1
SYNOPSIS
pat_report [-V] [-i instrprog] [-o output-file] [-v] [-O option-file] [-Q number-of-files] [-C ‘table-caption’] [-d d-opts] [-b b-opts] [-s key=value] [-G group=value] [-g path-to-gnuplot] [-H] [-P] [-T] [-z] data-directory.plot | experiment-data-directory | data-file.ap2
pat_report [-i instrprog] [-o output-file] -f ap2 experiment-data-directory | data-file.xf
pat_report [-i instrprog] [-o output-file] -f otf2 experiment-data-directory
pat_report [-V] [-i instrprog] [-o output-file] [-r] [-v] -f rpt|html|plot experiment-data-directory | data-file.ap2
pat_report [-O [option-file]|-b|-d|-s] -h
DESCRIPTION
The pat_report utility executes the text reporting portion of the CrayPat performance analysis tool.
After using pat_build to instrument a program, you may set environment variables and run the instrumented executable. You may alternately run the program using pat_run. Depending on the experiment chosen and the environment variables set, this creates an experiment-data-directory containing one or more data files that contain the data that was captured during program execution. The experiment-data-directory has a generated name based on the instrumented executable name. For a detailed description of generated names, see the FILES section of this man page.
Typically pat_report is used to generate one or more text reports from an entire data directory, but other choices are discussed below.
If an experiment data directory does not already contain an ap2-files subdirectory, one will typically be created the first time you invoke pat_report on the data directory. The ap2-files subdirectory contains one or more .ap2 files, each containing data from one or more .xf files. Data in .ap2 format is required for producing text reports using pat_report or for viewing using GUI tools in Cray Apprentice2. Once the ap2-files subdirectory has been successfully created, the xf-files subdirectory and its contents are no longer needed.
The most significant difference between .xf and .ap2 format is that .xf files require the original instrumented executable and dynamic libraries to be available to provide mapping from addresses to function names and source line numbers, while .ap2 files incorporate this data mapping and are self-contained. Therefore the .ap2 format is recommended if you wish to preserve the data for future reference.
The user has some choice in determining how many .ap2 files will be generated from the existing .xf files. The number of .ap2 files is determined by the environment variable PAT_AP2_FILE_MAX. This environment variable has a range of 1 - 256, with the default being 256. When this environment variable is greater than 1, the .xf to .ap2 conversion can be done in parallel. Depending on the number of .ap2 and .xf files, generating them in parallel can greatly reduce the conversion processing time.
The user also has some choice in determining how many .ap2 files will be used to generate a report. By default, all available .ap2 files are used, but a smaller number can be specified with the -Q n option. Typically this option is needed only when using all the files causes processing to run out of memory, or to take an inconveniently long time. If n is zero, then .xf to .ap2 conversion is done, if necessary, but no report is generated. If n is positive, then at most n .ap2 files are used, chosen to be spaced evenly in the lexicographically ordered list of all .ap2 files.
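As a rough illustration of the -Q selection rule, the following Python sketch picks at most n names spaced evenly through a lexicographically sorted list. The helper name select_ap2_files and the index formula are assumptions made for illustration only; the exact spacing that pat_report applies is not specified on this page.

```python
def select_ap2_files(names, n):
    """Pick at most n names, evenly spaced in lexicographic order.

    Hypothetical helper: one plausible reading of "spaced evenly",
    not pat_report's actual algorithm.
    """
    ordered = sorted(names)
    if n <= 0:
        return []          # -Q 0: conversion only, nothing feeds a report
    if n >= len(ordered):
        return ordered     # fewer files than requested: use them all
    # Index floor(i * len / n) spreads the n picks across the whole list.
    return [ordered[i * len(ordered) // n] for i in range(n)]


# Hypothetical file names in the style of the example path on this page.
files = [f"{i:06d}.ap2" for i in range(8)]
print(select_ap2_files(files, 3))
```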
Automatic Profiling Analysis: If an executable was instrumented using the pat_build -O apa option (the default when no pat_build options are specified), or if the executable was run with the pat_run -m apa option, running pat_report also produces an .apa file, which contains the recommended parameters for re-instrumenting the program for more detailed performance analysis. For more information about Automatic Profiling Analysis, see the pat_build man page.
Processing Selected Data Files: If one or more .xf files are corrupted, or there are enough data files that generating a report takes too long, use pat_report to process a subset of the files. To produce a report from the data in a single .ap2 file in the ap2-files directory, invoke pat_report on the path to that file:
$ pat_report exp_data_dir/ap2-files/000000.ap2
To process sets of selected files, use the following method:
1. Create a new directory, e.g. data_subset, in the directory containing the original experiment data directory.
2. Create a subdirectory, either data_subset/xf-files or data_subset/ap2-files, depending on whether the .ap2 files of interest have already been created under the original experiment data directory.
3. Make symbolic links in the subdirectory created in step 2 to the .xf or .ap2 files of interest under the original experiment data directory.
4. Invoke pat_report on data_subset in the same way as on the original experiment data directory.
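The directory preparation in the steps above could be scripted as in the following minimal Python sketch. The function name make_subset and its parameters are hypothetical, and the directory and file names are placeholders in the style of the earlier example; the final step of running pat_report on the new directory is left to the user.

```python
import os
from pathlib import Path


def make_subset(exp_dir, subset_dir, file_names, kind="ap2-files"):
    """Link selected data files from exp_dir into a new subset directory.

    kind is "ap2-files" or "xf-files", matching the subdirectory that
    holds the files of interest in the original experiment directory.
    """
    src_dir = Path(exp_dir).resolve() / kind
    dst_dir = Path(subset_dir) / kind
    dst_dir.mkdir(parents=True, exist_ok=True)   # new directory and subdirectory
    for name in file_names:                      # one symbolic link per file
        link = dst_dir / name
        if not link.is_symlink():
            os.symlink(src_dir / name, link)
    return dst_dir
```

For example, make_subset("exp_data_dir", "data_subset", ["000000.ap2"]) would leave a data_subset directory on which pat_report can then be invoked.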
OPTIONS
The pat_report utility supports the following options and arguments:
-C ‘table-caption’
Use table-caption as a caption (title) for a table specified by an adjacent pair of -b and -d options.
-f rpt | ap2 | otf2 | html | plot
Format the data from one or more input data files and map the contained addresses to functions and line numbers.
The -f rpt option produces a text report, which is the default behavior if no -f option is used.
The ap2 option produces a compressed-format .ap2 data file that can be used as input to either pat_report or Cray Apprentice2.
Note that when the -f ap2 option is used, pat_report functions as a data export tool. The entire data file is converted to .ap2 format and placed in an ap2-files directory within the experiment data directory. The pat_report filtering and formatting options are ignored.
The otf2 option provides another way to export data when runtime summarization was disabled (by setting PAT_RT_SUMMARY=0 or using the -t option of pat_run). It produces an Open Trace Format 2 archive under the experiment data directory with an anchor file named otf2-files.otf2. (Note that only the files under the xf-files directory are used as input, so the ap2-files directory and its contents are neither required nor created.) This option should be used on the same system where the data was collected. The information in the archive can be printed as text with the utility $CRAYPAT_ROOT/sbin/otf2-print.
The html option creates a directory named html-files within the experiment data directory, which contains all of the generated data files. The html-files directory can be redirected to a different directory by using the -o option. All intermediate directories in the path specified by the -o option must exist. If any one of them does not, the operation will fail and a message to that effect will be printed. HTML files generated by pat_report can be opened using any web browser, or opened from the command line on Macintosh or Linux systems by using the open filename.html or xdg-open filename.html commands, respectively.
The plot option can be used with a file that contains data collected by sampling with PAT_RT_SUMMARY set to zero and PAT_RT_SAMPLING_DATA set to one of its supported values: cray_pm, cray_rapl, etc. This option creates a directory named plot-files within the experiment data directory, which contains gnuplot command and data files. The plots can be viewed by invoking pat_report on the directory, or by invoking it on one or more of the .gp data files in the directory.
-g path-to-gnuplot
Specify a path to the gnuplot command and data files.
-i instrprog
Use this option to specify the directory path or the full path to the instrumented executable file. If this option is omitted, the path recorded in the data file is used.
-o output-file
Use this option to specify the path and name of the output file. If output-file is set to -, the output is directed to stdout. If the -f option is used, the default output file name has the same root as the first input file name and the suffix specified by the -f option.
-O option-file
This option is replaced by the options contained in the file with the specified name or path. The lines in the file are not evaluated by the shell, and each line should be empty or specify a single option, after trimming the character # and following characters, and then trimming any leading or trailing white space. A comma-separated list of option file names may be specified. The options from a file typically specify one or more report tables or observations. The names of the available predefined option files are:
accelerator
Show calltree of accelerator performance data sorted by host time.
accpc
Show accelerator performance counters.
acc_fu
Show accelerator performance data sorted by host time.
acc_time_fu
Show accelerator performance data sorted by accelerator time.
acc_time
Show calltree of accelerator performance data sorted by accelerator time.
acc_show_by_ct
(Deferred implementation) Show accelerator performance data sorted alphabetically.
affinity
Show the affinity bitmask for each node. Use -s pe=ALL and -s th=ALL to see the affinity for each process and thread, and use -s filter_input=expression to limit the number of PEs shown.
profile
Show data by function name only
callers (or ca)
Show function callers (bottom-up view)
calltree (or ct)
Show calltree (top-down view)
ca+src
Show line numbers in callers
ct+src
Show line numbers in calltree
hbm_ct
Show memory bandwidth data by object, sorted by sample count.
hbm_details
Show hbm data collection statistics, including counts of sampled addresses that could not be mapped to a registered object.
hbm_frees
Show program locations at which objects are freed by explicit calls to free or delete.
hbm_wt
Show memory bandwidth data by object, sorted by aggregate sample weight. The weight estimates the benefit of allocating the object in high bandwidth memory.
heap
Implies heap_program, heap_hiwater, and heap_leaks. Instrumented executables must be built using the pat_build -g heap option or executed with the pat_run -g heap option in order to show heap_hiwater and heap_leaks information.
heap_program
Compare heap usage at the start and end of the program, showing heap space used and free at the start, and unfreed space and fragmentation at the end.
heap_hiwater
If the pat_build -g heap option was used to instrument the program or the program was executed with the pat_run -g heap option, this report option shows the heap usage “high water” mark, the total number of allocations and frees, and the number and total size of objects allocated but not freed between the start and end of the program.
heap_leaks
If the pat_build -g heap option was used to instrument the program or the program was executed with the pat_run -g heap option, this report option shows the largest unfreed objects by call site of allocation and PE number.
himem
Memory high water mark by NUMA node. For nodes with multiple sockets, the default report should also have a table showing high water usage by NUMA node. That table is not shown if all memory was mapped to NUMA node 0, but can be explicitly requested with pat_report -O himem.
acc_kern_stats
Show kernel-level statistics including average kernel grid size, average block size, and average amount of shared memory dynamically allocated for the kernel.
load_balance
Implies load_balance_program, load_balance_group, and load_balance_function. Show PEs with maximum, minimum, and median times.
load_balance_program, load_balance_group, load_balance_function
For the whole program, groups, or functions, respectively, show the imb_time (difference between maximum and average time across PEs) in seconds and the imb_time% (imb_time/max_time * NumPEs/(NumPEs - 1)). For example, an imbalance of 100% for a function means that only one PE spent time in that function.
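The imbalance formulas in the load_balance description can be checked with a short Python sketch. The function name imbalance and the handling of the degenerate cases (a single PE, or zero maximum time) are assumptions made for illustration; only the two formulas come from the text above.

```python
def imbalance(times):
    """Return (imb_time, imb_time_pct) for a list of per-PE times."""
    num_pes = len(times)
    max_time = max(times)
    avg_time = sum(times) / num_pes
    imb_time = max_time - avg_time            # maximum minus average
    if num_pes < 2 or max_time == 0:
        return imb_time, 0.0                  # assumption: 0% when undefined
    # imb_time% = imb_time/max_time * NumPEs/(NumPEs - 1), as a percentage
    pct = 100.0 * imb_time * num_pes / (max_time * (num_pes - 1))
    return imb_time, pct


print(imbalance([10.0, 0.0, 0.0, 0.0]))   # only one PE spent time
print(imbalance([5.0, 5.0, 5.0, 5.0]))    # perfectly balanced
```

The first call reproduces the case mentioned in the text: when only one PE spends time in a function, the imbalance percentage is 100%.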
load_balance_cm
If the pat_build -g mpi option was used to instrument the program or the program was executed with the pat_run -g mpi option, this report option shows the load balance by group with collective-message statistics.
load_balance_sm
If the pat_build -g mpi option was used to instrument the program or the program was executed with the pat_run -g mpi option, this report option shows the load balance by group with sent-message statistics.
load_imbalance_thread
Shows the active time (average over PEs) for each thread number.
loop_times
Inclusive and Exclusive Time in Loops. If the CCE Fortran compiler -h profile_generate or CCE C/C++ compiler -finstrument-loops option was used, then this table will be included in a default report and the following additional loop reporting options are also available.
loop_callers
Loop Stats by Function and Caller. Available only if the CCE Fortran compiler -h profile_generate or CCE C/C++ compiler -finstrument-loops option was used.
loop_callers+src
Loop Stats by Function and Callsites. Available only if the CCE Fortran compiler -h profile_generate or CCE C/C++ compiler -finstrument-loops option was used.
loop_calltree
Function and Loop Calltree View. Available only if the CCE Fortran compiler -h profile_generate or CCE C/C++ compiler -finstrument-loops option was used.
loop_calltree+src
Function and Loop Calltree with Line Numbers. Available only if the CCE Fortran compiler -h profile_generate or CCE C/C++ compiler -finstrument-loops option was used.
profile_loops
Profile by Group and Function with Loops. Available only if the CCE Fortran compiler -h profile_generate or CCE C/C++ compiler -finstrument-loops option was used.
mesh_xyz
Show the coordinates in the network mesh.
mpi_callers
Show MPI sent- and collective-message statistics
mpi_sm_callers
Show MPI sent-message statistics
mpi_coll_callers
Show MPI collective-message statistics
mpi_dest_bytes
Show MPI bin statistics as total bytes
mpi_dest_counts
Show MPI bin statistics as counts of messages
mpi_sm_rank_order
Calculate a suggested rank order based on MPI grid detection and MPI point-to-point message optimization. Uses sent-message data from tracing MPI functions to generate suggested MPI rank order information. Requires the program to be instrumented using the pat_build -g mpi option or executed with the pat_run -g mpi option.
mpi_rank_order
Calculate a rank order to balance a shared resource such as USER time over all nodes. Uses time in user functions, or alternatively, any other metric specified by using the -s mro_metric option, to generate suggested MPI rank order information.
mpi_hy_rank_order
Calculate a rank order based on a hybrid combination of mpi_sm_rank_order and mpi_rank_order.
nids
Show PE to NID mapping.
nwpc
Program network performance counter activity.
profile_nwpc
Network performance counter data by Function Group and Function. Table shown by default if NWPCs are present in the .ap2 file.
profile_pe.th
Show the imbalance over the set of all threads in the program.
profile_pe_th
Show the imbalance over PEs of maximum thread times.
profile_th_pe
For each thread, show the imbalance over PEs.
program_time
Shows which PEs took the maximum, median, and minimum time for the whole program.
read_stats, write_stats
If the pat_build -g io option was used to instrument the program or the program was executed with the pat_run -g io option, these options show the I/O statistics by filename and by PE, with maximum, median, and minimum I/O times. The -O io option is a shortcut for both read_stats and write_stats.
samp_profile+src
Show sampled data by line number with each function.
thread_times
For each thread number, show the average of all PE times and the PEs with the minimum, maximum, and median times.
Notes:
In a command line, after options have been interpolated from the files specified by -O options, one table (or observation) is defined by each triplet of options -d, -b, and -C. These may appear in any order, but each triplet should be complete before the next begins. (If -C is omitted, -C Custom will be used.) Other options may be interspersed.
By default, some report tables may hide values for individual PEs, or show only the PEs having the maximum, median, and minimum values. The suffix _all can be appended to any of the above options to show the data for all PEs. For example, the option load_balance_all shows the load balance statistics for all PEs involved in program execution. Use this option with caution, as it can yield very large reports.
The content of an option file can be displayed with the -h option (see below), and a file can be copied and modified to define additional report tables. The modified copy should have a name that is not a prefix of one of the above option file names, and the file name or path can be specified as an option to -O. A name is searched for first in $CRAYPAT_ROOT/share/config/ReportPrune, then in . (the current working directory), and then in $HOME.
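The line handling described for -O option files (drop # and everything after it, trim white space, ignore empty lines) can be sketched in Python as follows. The function name parse_option_file and the sample file content are hypothetical; this only mirrors the rules stated above and is not the parser pat_report itself uses.

```python
def parse_option_file(text):
    """Return the options from option-file text, one per non-empty line."""
    options = []
    for line in text.splitlines():
        line = line.split("#", 1)[0]   # drop '#' and everything after it
        line = line.strip()            # trim leading/trailing white space
        if line:                       # empty lines carry no option
            options.append(line)
    return options


# Hypothetical option file content, for illustration only.
sample = """
# a custom table
-d time,traces     # data items
   -b pe

"""
print(parse_option_file(sample))
```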
-O [option-file]|-b|-d|-s -h
Use the -h option to view help content specific to the pat_report -O, -b, -d, or -s options. For example, to view the available option files for the -O option, enter this command:
$ pat_report -O -h
To view the contents of a particular option file, such as program_time, enter this command:
$ pat_report -O program_time -h
-G group=value
Specifies a comma-separated list of group and value pairs that are used to define function groupings displayed in a report.
The default function groupings are by predefined trace-type names such as USER, MPI, etc. Custom group names may be defined as one or more predefined group names, function names, or function regular expression patterns. For example:
-G ALL_MPI=MPI,MPI_SYNC,MPI_COLLECTIVE
-G RDWR=read,write
-G 3DFUNCS=/3d_.*
The regular expressions are prefixed with / and may need to be enclosed in single quotes to protect special characters from the command line shell.
To limit the custom groupings to only one table in a report, follow a table specification option with a -G option, where the group name is prepended with table. For example:
-O profile -G table.RDWR=read,write
-H
By default, if hardware performance counter information was collected, it is displayed. To suppress the display of this information and improve the clarity of the calltree and callers table, use the -H option.
-P
By default, the “uninteresting” callers in any report that shows callers or the call-tree are not displayed. To suppress this automatic pruning of callers, use the -P option. This will expose the CrayPat function wrappers used for tracing, as well as internal library functions.
-r
When used with the -f plot option, specifies that the .plot data files contain “raw” data.
-v
Specifies more verbose notes for tables in a text report.
-s key=value
A comma-separated list of key and value pairs that define details of the report appearance. When more than one alternative is shown (separated by the | character), the first alternative is the default. The valid keys for the -s option are as follows:
aggr_bb=how | aggr_bb_dd=how
Use this option to specify how the data item dd specified in the -d option is aggregated for the by item bb specified in the -b option. The first form specifies how for all data items. The supported values for how are sum, max, avg, and (when bb is th) select0. The default is sum, with the following exceptions: aggr_th has the default value select0, which shows only data from the main thread for multi-threaded programs, and aggr_pe_time and aggr_pe_cycles have the default value avg, to give derived metrics such as mflops for the whole program rather than per PE.
at_bottom=”=” | ” “
Defines character(s) filling line at bottom of each subsection containing more than one line of data. The first alternative is the default. If -s grid=”no” then the second alternative is used. You are not restricted to = and a blank space; any string may be specified.
at_left=”|” | ” “
Defines character(s) filling indentation of a subsection. The first alternative is the default. If -s grid=”no” then the second alternative is used. Any string may be specified.
at_outdent=”” | ” “
Defines character(s) filling line separating a line from a subsequent line with a less-indented label. The first alternative is the default. If -s grid=”no” then the second alternative is used. Any string may be specified.
at_top=”-” | “”
Defines character(s) filling line at top of each subsection. If empty, no line appears. The first alternative is the default. If -s grid=”no” then the second alternative is used. Any string may be specified.
block_pad=’ ‘
Padding between columns when data is shown in rows.
callers=”hierarchy” | “list”
Determines how callers are shown.
callers_sep=”,”
Defines character separating callers in list form.
calltree=”hierarchy” | “list”
Determines how callees are shown.
calltree_start_depth=0
Hide nodes in calltree with depth less than the specified value. The root has depth zero, so the default does not hide any nodes. A negative value counts up from a leaf node, with -1 being the nominal depth of any leaf node, -2 its immediate parent, and so on.
calltree_stop_depth=-1
Hide nodes in calltree with depth greater than the specified value. A leaf has nominal depth -1, so the default does not hide any nodes.
caption=hide | show
Hide or show the table caption lines of the form: Table 1: …
colhdr.key=value | table.colhdr.key=value
Use the table column header value for the metric whose key is key.
column_pad=” |” |x
The character(s) separating two columns, where x indicates the number of blank spaces between columns. The first alternative is the default. If -s grid=”no” then the second alternative is used, with x=2. Any string may be specified.
content=header,tables,details
The default content of a report consists of three sections.
A header shows the version of CrayPat used to collect the data, information about the system on which the job was run, the number of nodes used, the number of ranks placed on each node, and the path to the directory containing the data.
Each of the tables provides a view of data that was collected, either summarized in an observation, or in a tabular format with explanatory notes.
The details provide information about how the program was built, instrumented for data collection, and run.
Any one or two items from the list may be specified, and compact_header and compact_details provide briefer versions.
csv_label_sep=,
The character that separates labels (function group, function name, PE, etc.) in the csv format. Use TAB for tabs.
csv_sep=,
The character that separates values in the csv format. Use TAB for tabs.
data_size=8 | 4
Scale cache and TLB utilization metrics by data size, cache line size, and page size. The default size is 8 bytes (64-bit, double-precision). The alternative value is 4 bytes (32-bit, single-precision).
demangle=”no” | “yes”
Valid only for C++ names.
demangle_params=”hide” | “show”
Hide demangled function parameters when displaying function names. Demangled function parameters can be seen by setting demangle_params=show.
derived_defs=path_to_file
Optional file with user-defined derived metrics (modeled on definitions in the deriv category found in the files Metrics_deriv.xml or Counters.*.xml in $CRAYPAT_ROOT/share/counters). A user-defined metric’s column header is defined from its -s colhdr.key value, name field, or key field, taking the first value specified.
ditto=” ” | “”
Defines the character(s) shown if a value is the same as the value immediately above it. If empty, the value is shown. The first alternative is the default. If -s grid=”no” then the second value is used. Any string may be specified.
filter_input=condition
Use this option to filter the data. The condition should be an expression involving pe such as pe<npes/2 or pe%2==0, where npes refers to the number of PEs in the program. This option is useful when the size of the full data file makes a report incorporating data from all PEs take too long or exceed the available memory.
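To see which PEs the two example conditions would keep, the following Python sketch evaluates equivalent predicates over a small job. The helper select_pes is hypothetical, PEs are assumed to be numbered from 0 to npes-1, and how pat_report itself evaluates the expression (for example, integer versus real division) is not stated here.

```python
def select_pes(npes, condition):
    """Return the PE ranks in 0..npes-1 for which condition(pe, npes) holds."""
    return [pe for pe in range(npes) if condition(pe, npes)]


# Python equivalents of the two example conditions in the text.
lower_half = select_pes(8, lambda pe, npes: pe < npes / 2)   # pe<npes/2
even_pes = select_pes(8, lambda pe, npes: pe % 2 == 0)       # pe%2==0
print(lower_half, even_pes)
```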
fmt_av=”%5.2f”
Defines format in which averages are printed.
fmt_mf=”%5.2f”
Defines format in which mflops are printed.
fmt_pct=”%4.1f%%”
Defines format in which percentages are printed.
fmt_rate=”%4.3f”
Defines format in which rates (/sec) are printed.
fmt_ratio=”%5.2f”
Defines format in which ratios are printed.
fmt_ti=”%9.6f”
Defines format in which time in seconds is printed.
fmt_vl=”%5.2f”
Defines format in which average vector length is printed.
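The fmt_* defaults look like printf-style conversion specifications. Assuming they are interpreted that way, Python's % operator, which follows the same conventions for these conversions, can be used to preview how a value would be rendered. The render helper and the dictionary below are hypothetical; only the format strings are taken from the keys above.

```python
# Default format strings copied from the fmt_* keys above.
defaults = {
    "fmt_pct": "%4.1f%%",
    "fmt_rate": "%4.3f",
    "fmt_ti": "%9.6f",
}


def render(key, value, formats=defaults):
    """Format value with the printf-style string registered for key."""
    return formats[key] % value


print(render("fmt_pct", 12.34))
print(render("fmt_rate", 0.5))
print(render("fmt_ti", 1.5))
```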
geometry=wxh+x+y
When used with .plot input, specifies the X11 window geometry and position. For example, geometry=800x600+0+0 specifies an 800 x 600 window positioned in the upper left corner of the X11 window.
grid=”yes” | “no”
Defines the style of separators between columns and between subsections in a hierarchical report. The default is yes. If -s grid=”no” then at_top, at_bottom, at_left, at_outdent, column_pad, and ditto behave as described above.
kern_cols=”yes” | “no”
Shorten lines in long-form reports by trimming trailing white spaces and fitting columns into the smallest possible combined width. The default is yes. Enter no to disable.
labels_first=yes | no
Place labels like function name before or after the data values in each row of a table.
list_sep=”:”
Defines the separator for lists other than callers.
loop_mark=”.LOOP”
String inserted into the name generated for a loop instrumented by the CCE Fortran compiler -h profile_generate or CCE C/C++ compiler -finstrument-loops option.
loop_reg_mark=””
Set to a non-empty string to segregate time in an instrumented region created by the CCE Fortran compiler -h profile_generate or CCE C/C++ compiler -finstrument-loops option from other time in the function containing the region. This typically is not useful, but .LOOPS was the default value in releases prior to 6.1.4.
loop_threshold=
If using profile-guided optimization (PGO), set the minimum amount of time that the program must spend in a defined region in order for the PGO information to appear on the report. This value can be entered as either a percentage or a decimal fraction. The default value is 0.5%.
mpi_p2p_map_bin_pcts=”5,50,75,90,95,99”
Specifies the size boundaries of the MPI P2P map. The specification is a comma-separated list of values between 0 and 100, each expressed as a percentage of total MPI ranks in the job. Endpoints 0 and 100 are implied.
mpi_p2p_map_bin_pes
Like mpi_p2p_map_bin_pcts, but with values expressed as MPI PE ranks.
mro_sm_metric=Dm|Dc
Used with the -O mpi_sm_rank_order option. If set to Dm, the metric is the sum of P2P message bytes sent and received. If set to Dc, the metric is the sum of P2P message counts sent and received.
Default: Dm
mro_metric=ti|…
Used with the -O mpi_rank_order option. Any metric can be specified, but memory traffic hardware performance counter events are recommended.
Default: ti
mro_mpi_pct=value
Specify the minimum percentage of total time that MPI routines must consume before pat_report will suggest an alternative rank order.
Default: 10 (percent)
mro_group=USER|MPI|…
Used with the -O mpi_rank_order option. If specified, the metric is computed only for functions in the specified group.
Default: USER functions
names=”linkage” | “source”
Determines whether function names are printed as shown by nm or in source.
no_data=”--”
Placeholder used wherever there is no data.
no_label=”(N/A)”
Placeholder used wherever label is unknown.
notes=hide | show
Hide or show the table notes following the table caption.
numeric_prefixes=yes | no
By default, each instance of a label that is an ordinal number is shown with a prefix to indicate its meaning, for example pe., thread., acc., line., … Specify no to omit the prefixes (typically for exporting).
omp_barr=‘.BARRIER@li.’
Label for a generic OpenMP barrier, as in foo .BARRIER@li.18.
omp_barr_imp=‘.BARRIER_IMPLICIT@li.’
Label for an OpenMP implicit barrier, as in foo .BARRIER_IMPLICIT@li.81.
omp_barr_exp=‘.BARRIER_EXPLICIT@li.’
Label for an OpenMP explicit barrier, as in foo .BARRIER_EXPLICIT@li.82.
omp_dist=‘.DISTRIBUTE@li.’
Label for an OpenMP distribute construct, as in foo .DISTRIBUTE@li.124.
omp_league=‘.LEAGUE@li.’
Label for an OpenMP teams construct, as in foo .LEAGUE@li.231.
omp_loop=‘.LOOP@li.’
Label for an OpenMP worksharing-loop construct, as in foo .LOOP@li.19.
omp_mstr=‘.MASTER@li.’
Label for an OpenMP master construct, as in foo .MASTER@li.19.
omp_reduce=‘.REDUCTION@li.’
Label for an OpenMP reduction clause, as in foo .REDUCTION@li.323.
omp_reg=‘.REGION@li.’
Label for a generic OpenMP parallel region, as in foo .REGION@li.13.
omp_secs=‘.SECS@li.’
Label for an OpenMP sections construct, as in foo .SECS@li.15.
omp_sect=‘.SECT@li.’
Label for an OpenMP section directive, as in foo .SECT@li.17.
omp_sngl=‘.SINGLE@li.’
Label for an OpenMP single construct, as in foo .SINGLE@li.17.
omp_task=‘.TASK@li.’
Label for an OpenMP task construct, as in foo .TASK@li.187.
omp_team=‘.TEAM@li.’
Label for an OpenMP parallel construct, as in foo .TEAM@li.123.
omp_wksh=‘.WKSH@li.’
Label for a generic OpenMP workshare construct, as in foo .WKSH@li.21.
orphan_limit=x
Sets the maximum number of messages about missing returns to x.
overhead=include
The overhead values per traced call are shown in the report and subtracted automatically. Use this option to suppress the subtraction.
ovhd=show
Use this option to see the overhead measurements themselves.
pe=…
Show, select, or hide data for individual PEs in a table. The value can be ALL, HIDE, or any value that can be used for pe selection in the -b option such as [mmm] or ‘[max3,min3]’. This option overrides the way that per-PE data is shown in default tables and in tables specified using the -O option.
peak_flops=core
Show peak flops as a percentage of 1/2 of the calculated module peak. By default this doubles the percent value shown, which can be a more accurate metric on systems with multi-core CPUs that share the floating-point module between cores.
percent=”absolute” | “relative”
Determines whether absolute or relative percentages are used in grand totals and subtotals. An “absolute” percentage is a fraction of the total for the whole program. A “relative” percentage is a fraction of the total in the next level up in the hierarchy of subreports; for example, a data value for a line in a function relative to the total for that function.
pp=…
Show, select, or hide the PEs that are partners in MPI point to point communications, if that information was collected and is shown in a table. See pe= for the possible option values.
prune_name=string
Specifies function names to be pruned (removed) from a report. This option takes precedence over the environment variables PAT_REPORT_PRUNE_NAME and PAT_REPORT_PRUNE_NAME_FILE. Its value is processed in the same way as the value of PAT_REPORT_PRUNE_NAME.
rank_cell_dim=m1xm2x…
Specify a set of cell dimensions to use for rank-order calculations. For example, -s rank_cell_dim=2x3.
rank_grid_dim=m1xm2x…
Specify a set of grid dimensions to use for rank-order calculations. For example, -s rank_grid_dim=8x5x3.
ranks_per_node=N
Specify the number of ranks per node to use for rank-order calculations. Typically used to specify a value that is different from the placement used in the run that collected the MPI message data. For example, -s ranks_per_node=20.
regions=”show” | “hide” | “noacc” | “noapi” | “noloop” | “noomp” | “nocheck_acc” | “nocheck_api” | “nocheck_loop” | “nocheck_omp”
Show or hide per-region construct data. If “hide” is selected, the data is merged into the data for the containing function. Regions of one or more specific types can be hidden with a comma-separated list such as noapi,noomp. Use the form nocheck_api to suppress the address range checks that can cause some regions to be ignored.
show_callers=”fu”
Defines what is shown for a caller. Default is function name. Must be a sublist of fu, so, li.
show_data=cols | rows | csv | csv,leaf
By default, 7 or fewer data items are shown in columns, otherwise each appears in a separate row with rates and related percentages. The cols and rows options are used to force one display mode or the other. The csv or csv,leaf option formats data as comma-separated values, with or without totals and subtotals, for importing into a spreadsheet, or other processing.
show_level=2 | no
For a table showing data in columns (see -s show_data=), show numeric level numbers at left in each line of a hierarchical table, when the level is greater than or equal to the specified number. For a table in CSV format, omit all level numbers if no is specified, else show all level numbers.
sort_by_xx="yes" | "no"
xx can be either address, line, pe, thread, or a 2-letter prefix. The default sort is by descending values in the leftmost data column. If values of xx label a contiguous block of lines in a table, then the lines in that block are sorted by ascending value of the label.
source_limit="4"
Limit source path length.
th=…
Show, select, or hide the threads for which per-thread data is shown in a table. See pe= for the possible option values.
threshold_all=no | yes
Specify yes to apply thresholds to all levels in a hierarchical table. Otherwise, thresholds are not applied to levels like pe and th. This affects whether a selection option like -s pe=[mmm] will ignore a pe with a data value less than or equal to the threshold.
threshold_groups=yes
Specify no to show all function groups for which data was collected, even if the total is smaller than the report threshold.
tsep=yes | no
Do or do not show thousands separators in numeric data values.
-d d-opts
The d-opts is a comma-separated list of data items that determine the content of the summary. The items appear as columns in the report, in the order specified. The valid d-opts values are:
counters  hardware performance counter values
traces    number of entries to traced function
time      from rtc timestamps, if tracing
flops     floating-point operations, from counters
Hc        HBM object sample count
Hw        HBM object sample weight
Ha        HBM object max active
Hs        HBM object max size (MBytes)
HK        HBM locked samples
HL        HBM lost samples
HP        HBM max pending samples
HS        HBM max size
HO        HBM num objects
HC        HBM total samples
HW        HBM total weight
HT        HBM tracking samples
HU        HBM unknown samples
HG        HBM unknown weight
HD        HBM weight scale
HV        HBM version
mflops    floating-point operations, millions per second
input     number of bytes read, if tracing I/O
output    number of bytes written, if tracing I/O
io        equivalent to input/output
Mcs       MPI point-to-point PE source to destination map
P         CPU hwpc counters plus derived metrics
N         network hwpc counters plus derived metrics
pt        total time per process
tl        exclusive time in loops
tt        total time per thread
Tc        average time per call
The following heap data is always collected at the beginning and end of the main program.
FM  Final Max Free Object MB
HF  Heap Free Delta MB
IF  Init Heap Free MB
IU  Init Heap Used MB
NF  Heap Not Freed MB
If the main program is instrumented using the pat_build -g heap option or the program was executed with the pat_run -g heap option, the following heap data is collected.
ab  Tracked MBytes Not Freed
ac  Tracked Objects Not Freed
am  Tracked Heap HiWater MBytes
lb  Tracked MBytes Not Freed
ta  Total Allocs
tb  Total Alloc Bytes
tf  Total Frees
ua  Allocs Not Tracked
ub  MBytes Not tracked
uf  Frees Not Tracked
If the main program is instrumented using the pat_build -g io option or the program was executed with the pat_run -g io option, the following I/O data is collected.
rt  Read Time
wt  Write Time
rb  Read MB
wb  Write MB
rR  Read Rate in MB/sec
wR  Write Rate in MB/sec
rd  Reads (Number of Calls)
wr  Writes (Number of Calls)
rC  Read Bytes/Call
wC  Write Bytes/Call
If the main program is instrumented using the pat_build -g mpi option or the program was executed with the pat_run -g mpi option, the following message data is collected.
sc  Sent Msg Count
sm  Sent Msg Total Bytes
sz  Sent Msg Avg Size
cc  Collective Msg Count
cm  Collective Msg Total Bytes
cz  Collective Msg Avg Size
In addition, the following options report the MPI message sent and collective counts sorted into bins by message size. For collectives that have a root rank, either the sizes of the messages sent or the sizes of the messages received are reported, depending on whether the non-roots send or receive messages.
mb1  Sent messages smaller than 16 bytes.
mb2  16B <= sent message size < 256B
mb3  256B <= sent message size < 4KB
mb4  4KB <= sent message size < 64KB
mb5  64KB <= sent message size < 1MB
mb6  1MB <= sent message size < 16MB
mb7  Sent messages equal to or greater than 16MB.
cb1  Collective messages smaller than 16 bytes.
cb2  16B <= collective message size < 256B
cb3  256B <= collective message size < 4KB
cb4  4KB <= collective message size < 64KB
cb5  64KB <= collective message size < 1MB
cb6  1MB <= collective message size < 16MB
cb7  Collective messages equal to or greater than 16MB.
The following data can be collected on systems with hardware accelerators.
ht  Host time (amount of time executed on the host)
at  Accelerator time (amount of time executed on the accelerator)
TA  Amount of data copied to accelerator in MBytes
FA  Amount of data copied from accelerator in MBytes
aT  Calls
Ac  Accelerator performance counter
If the CCE Fortran compiler -h profile_generate or CCE C/C++ compiler -finstrument-loops option was used, then tl is time adjusted to show exclusive time in loops (and correspondingly less exclusive time in the containing functions), and the following loop-related values are available.
LI  Loop Incl Time Max
LM  Loop Trips Max
LT  Loop Incl Time Total
LU  Loop Incl Time / Total Time (as %)
La  Loop Trips Avg
Lf  Loop Flags
Li  Loop Incl Time Min
Lm  Loop Trips Min
Ln  Loop Notes
Lt  Loop Trips Total
Lx  Loop Exec
Rt  Loop Region Time
Each d-opts item X that is not a rate or average can have one or more of the following derived items:
X% (percent of aggregate value)
max_X (maximum value of X for all PEs)
max_X% (maximum value of X expressed as a percentage of total value for all PEs)
min_X (minimum value of X for all PEs)
min_X% (minimum value of X expressed as a percentage)
avg_X (mean of the X values for all PEs)
avg_X% (mean expressed as a percentage)
cum_X (cumulative value)
cum_X% (cumulative percent of value)
sd_X (standard deviation of the X values for all PEs)
sd_X% (standard deviation expressed as a percentage)
cv_X (coefficient of variation for X, defined as sd_X/avg_X)
imb_X (imbalance, defined as max_X - avg_X)
imb_X% (imb_X/max_X * npes/(npes - 1) * 100%)
This definition provides a value of 100% in the case of maximum imbalance, when only one PE has a non-zero value for X. This value is defined only when the number of PEs (npes) is greater than 1.
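As a rough illustration of the imb_X and imb_X% definitions, the following Python sketch computes them from a list of per-PE values. The function name and the sample values are hypothetical, and returning 0 when every value is zero is an assumption; this shows only the arithmetic described above, not pat_report's own code.

```python
def imbalance(values):
    """Return (max_X, avg_X, imb_X, imb_X%) for a list of per-PE values of X."""
    npes = len(values)
    if npes < 2:
        raise ValueError("imb_X% is defined only when npes > 1")
    max_x = max(values)
    avg_x = sum(values) / npes
    imb_x = max_x - avg_x
    # imb_X% = imb_X / max_X * npes / (npes - 1) * 100
    # Assumption: report 0 when all values are zero, to avoid dividing by zero.
    imb_pct = imb_x / max_x * npes / (npes - 1) * 100.0 if max_x else 0.0
    return max_x, avg_x, imb_x, imb_pct


# Only one of four PEs has a non-zero value: the maximum-imbalance case.
print(imbalance([8.0, 0.0, 0.0, 0.0]))
# All PEs hold the same value: no imbalance.
print(imbalance([5.0, 5.0, 5.0, 5.0]))
```

The npes/(npes - 1) factor is what scales the single-non-zero-PE case up to exactly 100%.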
For example, to display only data from the PE having the maximum number of megaflops in each function, enter -d max_mflops -b fu,pe=HIDE.
By default percentages are "relative", meaning each value is shown as a percentage of the total for the line at the next higher level in the report. The -s percent= option can be used to select "absolute" or "relative" percentages explicitly. Relative percentages can be useful for hierarchical reports; for example, for showing data by line within each function.
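The difference between the two kinds of percentage can be sketched with a small two-level example (function, then line). The sample data and function name below are hypothetical. The sketch follows the definitions given for -s percent=: an absolute percentage is a fraction of the whole-program total, and a relative percentage is a fraction of the total at the next level up.

```python
# Hypothetical sample data: time attributed to each (function, line).
samples = {
    ("solve", 10): 30.0,
    ("solve", 12): 10.0,
    ("setup", 5): 10.0,
}


def percentages(samples):
    """Return {(function, line): (absolute %, relative %)}."""
    grand_total = sum(samples.values())
    per_function = {}
    for (func, _line), value in samples.items():
        per_function[func] = per_function.get(func, 0.0) + value
    result = {}
    for (func, line), value in samples.items():
        absolute = 100.0 * value / grand_total         # share of the whole program
        relative = 100.0 * value / per_function[func]  # share of the enclosing function
        result[(func, line)] = (absolute, relative)
    return result


for key, (abs_pct, rel_pct) in percentages(samples).items():
    print(key, f"absolute={abs_pct:.1f}%", f"relative={rel_pct:.1f}%")
```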
-b b-opts
The b-opts is a comma-separated list of items that determine how the data is aggregated and labeled in the summary. The valid b-opts values are:
ad (address)
ai (accid - enumerates accelerator devices)
ca (callers - only if tracing or sampling callstacks)
cn (cpu number)
cr (core)
ct (calltree - only if tracing or sampling callstacks)
ds (dso - .so that defines a function, else a.out)
fi (filename - I/O source/destination)
fu (function)
gr (groups of functions like USER, MPI, IO, etc.)
li (line in the relevant source file)
ni (node id)
nn (numanode)
ol (object allocation or free location, for HBM data)
pe (enumerates the processes in a program)
pe.th (unique ID formed from pe and thread numbers for entire set of program threads; if there is only one thread per process, this behaves the same as pe)
pp (the pe of a P2P communication partner)
ri (router id)
Sd (send distance for MPI P2P communication)
sk (socket)
so (source file names)
th (enumerates threads in a multithreaded program)
to (totals for entire program)
Each item in the comma-separated list gives a level in the table, labeled with names of a category designated by the item. For example, -b function,source,line produces a table with a line for each function, showing data totals labeled by the name of the function. Below the line for a function is a set of lines labeled by the source files (typically one) containing code for that function. Below the line for each source file is a set of indented lines showing data totals for the lines in that function, labeled by line number.
Note that the order of items in the list for the -b option determines the way in which the data is grouped and aggregated in the resulting report. Any unambiguous prefix can be used in place of a full identifier.
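To make the effect of item order concrete, here is a minimal Python sketch of hierarchical aggregation. The records, field names and function are hypothetical, and pat_report's internal processing is not documented here. The sketch shows only that each item in the list adds one level of totals nested under the previous one.

```python
from collections import defaultdict

# Hypothetical records: (function, source file, line, time).
records = [
    ("solve", "solve.c", 10, 3.0),
    ("solve", "solve.c", 12, 1.0),
    ("setup", "init.c", 5, 2.0),
]
FIELDS = {"function": 0, "source": 1, "line": 2}


def aggregate(records, b_opts):
    """Sum the data value at every level named in b_opts, in the given order."""
    keys = [FIELDS[name] for name in b_opts.split(",")]
    totals = defaultdict(float)
    for rec in records:
        label = tuple(rec[k] for k in keys)
        # Add the value to each prefix of the label: one total per table level.
        for depth in range(1, len(label) + 1):
            totals[label[:depth]] += rec[3]
    return dict(totals)


# Print the totals as an indented hierarchy, one level per -b item.
for label, total in sorted(aggregate(records, "function,source,line").items()):
    print("  " * (len(label) - 1) + str(label[-1]), total)
```

Calling aggregate(records, "source,function") on the same records would group by source file first, which is the sense in which the order of the list determines the grouping.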
Any identifier that implies a list, like counters or callers, can be suffixed with a number or a range to select elements in the list. For example:
$ pat_report -d counter1,counter3 -b function,caller1..2 …
will show data for the first and third counters recorded, aggregated by function, caller and caller’s caller.
Row selection for reports.
Many default reports make use of selection to avoid showing data for every pe. In particular, ‘mmm’ is short for ‘max,med,min’, and shows the lines containing the maximum, median and minimum values of the data sort key (typically shown in the leftmost column). Note that the default selection can be overridden by specifying -s pe=ALL, -s pe=HIDE, or other variants shown below. Variants on the same line are equivalent:
pe='[max]'           pe='[m]'
pe='[max,min]'       pe='[mm]'
pe='[max,med,min]'   pe='[mmm]'
pe='[max3,min3]'
pe='[max3,med,min3]'
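The idea behind the 'mmm' selection can be sketched in Python as follows. The function name and sample values are hypothetical, and the choice of median for an even number of PEs is an assumption; the text does not specify it.

```python
def select_mmm(values_by_pe):
    """Return the PEs holding the maximum, median and minimum sort-key values."""
    # Rank PEs by descending value of the sort key.
    ranked = sorted(values_by_pe, key=lambda pe: values_by_pe[pe], reverse=True)
    # Assumption: for an even count, use the higher of the two middle values.
    median_pe = ranked[(len(ranked) - 1) // 2]
    picked = [ranked[0], median_pe, ranked[-1]]
    # Drop duplicates while keeping order (for example, when there is one PE).
    return list(dict.fromkeys(picked))


# Hypothetical per-PE values of the sort key.
print(select_mmm({0: 5.0, 1: 9.0, 2: 1.0, 3: 7.0, 4: 3.0}))
```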
More general kinds of selection can be used for items in a -b option:
Syntax                Examples
-b item=name          -b fu='foo',pe      -b fu,pe=0
-b item=/pattern      -b fu='/^MPI'       -b fu='/add.*dp$'
-b item='[indices]'   -b fu,pe='[mmm]'
When the key is pe or thread, then key values for which no values were recorded are also shown, with a data value 0 for the sort data key and the no-data symbol for other columns.
By default, arguments are shown as a list, and callers and calltree as a hierarchy. The alternative behavior can be specified using the -s option, as in -s callers=list.
-T
Set to zero all thresholds specified for columns in a table. The effect is that only values that are truly zero are suppressed. This option can be helpful when you want to see the parts of a table that are hidden by the default thresholds.
-V
Write the CrayPat version number to standard error.
-z
Ignore options specified through environment variable PAT_REPORT_OPTIONS.
ENVIRONMENT VARIABLES
PAT_AP2_FILE_MAX
Changes the default limit of 256 on the number of .ap2 files created in lite mode.
When the limit is less than the number of .xf files, then one or more .ap2 files will contain data from more than one .xf file. The base name of an .ap2 file will be the base name of the first of those .xf files. PAT_AP2_FILE_MAX can be set to zero or a negative value to disable the limit so that each .ap2 file contains data from only one .xf file.
PAT_AP2_KEEP_ADDRS
Set to 1 to disable compression of sampled addresses.
The default behavior, when processing data from .xf files to .ap2 files, is to map all addresses that share the same source file number to a single representative address. This can significantly reduce the size of the .ap2 files, thus reducing the time required to generate reports.
PAT_AP2_PRAGMA
Set to a semicolon-separated list of sqlite pragmas to be supplied to the sqlite library before reading or writing .ap2 files. The default list is:
journal_mode=OFF
synchronous=OFF
locking_mode=EXCLUSIVE
cache_size=4000
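For readers unfamiliar with sqlite pragmas, the following Python sketch applies a semicolon-separated list like the default above to a database using the standard sqlite3 module. It illustrates only what the pragmas are. The in-memory database stands in for an .ap2 file, and how pat_report passes the list to the sqlite library is not shown here.

```python
import sqlite3

# The default list from the text, written with semicolon separators.
pragma_list = "journal_mode=OFF;synchronous=OFF;locking_mode=EXCLUSIVE;cache_size=4000"

# Assumption: an in-memory database is used purely for illustration.
conn = sqlite3.connect(":memory:")
for pragma in filter(None, (p.strip() for p in pragma_list.split(";"))):
    conn.execute(f"PRAGMA {pragma}")

# Read two of the settings back to confirm they were applied.
cache_size = conn.execute("PRAGMA cache_size").fetchone()[0]
synchronous = conn.execute("PRAGMA synchronous").fetchone()[0]
print(cache_size, synchronous)
conn.close()
```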
PAT_AP2_SQLITE_VFS
Set to unix-none to inhibit file locking. This is now the default for xf_ap2 on Macintosh and Linux systems. Set to DEFAULT to use the sqlite3 library default. Other choices are documented at www.sqlite.org.
PAT_REPORT_HELPER_START_FUNCTIONS
Add to, or redefine, the list of start functions used by some programming models for helper threads that provide support for the model, but do not directly execute code from an application.
The value of this variable should be a comma-separated list of function names. If it begins with a comma, then functions are added to the default list. Otherwise, they replace the default list. The default list is:
__kmp_launch_monitor
cudbgGetAPIVersion
cuptiActivityDisable
_dmappi_error_handler
_dmappi_queue_handler
_dmappi_sr_handler
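The leading-comma rule can be sketched in Python as follows. The function name and the example helper name my_helper_main are hypothetical, and using the default list when the variable is unset is an assumption.

```python
DEFAULT_START_FUNCTIONS = [
    "__kmp_launch_monitor", "cudbgGetAPIVersion", "cuptiActivityDisable",
    "_dmappi_error_handler", "_dmappi_queue_handler", "_dmappi_sr_handler",
]


def effective_start_functions(value):
    """Apply the leading-comma rule to a PAT_REPORT_HELPER_START_FUNCTIONS value."""
    if value is None:            # assumption: unset means the default list applies
        return list(DEFAULT_START_FUNCTIONS)
    names = [name for name in value.split(",") if name]
    if value.startswith(","):    # leading comma: add to the default list
        return DEFAULT_START_FUNCTIONS + names
    return names                 # otherwise: replace the default list


print(effective_start_functions(",my_helper_main"))  # default list plus one name
print(effective_start_functions("my_helper_main"))   # only the given name
```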
PAT_REPORT_IGNORE_VERSION, PAT_REPORT_IGNORE_CHECKSUM
If set, turns off checking that the version of CrayPat being used to generate the report is the same version, or has the same library checksum, as the version that was used to build the instrumented executable.
PAT_REPORT_OPTIONS, PAT_REPORT_POST_OPTIONS
If the -z option is specified on the command line, these environment variables are ignored. Otherwise:
If set, the options in these environment variables are evaluated before, or after, any options on the command line.
If not set, the values of these variables recorded in the experiment data file are used, if present.
Note that the first variable provides a convenient means to control the processing and reporting of data during runtime, by means of the pat_report -Q option.
PAT_REPORT_PRUNE_NAME
Prune (remove) functions from a report, based on the name of the function, the path of its source file, or the path of the .so file that provides it. Note that a pruned function is treated as if it had been inlined; its time is attributed to its caller, and if it has callees, they are shown as callees of its caller.
If set to an empty string, no pruning is done.
If not set, the behavior is as if it were set to @default.
If set to a string starting with a comma (,) then @default is prepended.
A non-empty string is processed as a comma-separated list of items according to the following rules:
If an item begins with @, the subsequent characters are interpreted as the name (or path) of a file. If not a path, the filename is sought in $CRAYPAT_ROOT/share/config/ReportPrune, then in the current working directory, then in $HOME. The file is read, and each non-empty line that does not begin with # is processed as a single item.
Otherwise, if an item contains no forward slash (/), it is processed as if it began with P/.
If an item contains an S preceding its first /, the “target string” for a function is the path to the source file that provides its definition.
If an item contains a D preceding its first /, the “target string” for a function is the path to the .so file that provides its definition.
If neither S nor D precede the first /, the “target string” is the name of the function.
If an item contains a P preceding its first /, the string following the / is matched as a prefix of the target string; otherwise it is interpreted as a POSIX regular expression, compiled by the regcomp(3c) function, and matched against the target string using regexec(3c).
Any function having a target string that is matched by the prefix or regular expression specified by an item is pruned from the report, and its time and other data are attributed to its caller.
A regular expression match is case-sensitive by default, but one or more regular expression qualifiers can precede the first /. These are:
! : Reverse the results of the match.
i : Ignore case when matching.
x : Interpret as an extended, rather than basic, regular expression.
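The item rules above can be approximated with a short Python sketch. It is illustrative only: the function names are hypothetical, Python's re module stands in for regcomp/regexec (so the basic versus extended distinction of the x qualifier is not modelled), items beginning with @ are not handled, and applying ! to prefix matches as well as to regular expressions is an assumption.

```python
import re


def parse_prune_item(item):
    """Split one prune item into (target, mode, qualifiers, pattern)."""
    if "/" not in item:
        item = "P/" + item  # no slash: processed as if it began with P/
    qualifiers, pattern = item.split("/", 1)
    if "S" in qualifiers:
        target = "source"    # path of the defining source file
    elif "D" in qualifiers:
        target = "dso"       # path of the defining .so file
    else:
        target = "function"  # name of the function
    mode = "prefix" if "P" in qualifiers else "regex"
    return target, mode, qualifiers, pattern


def item_matches(item, function, source="", dso=""):
    """Return True if the item would select this function for pruning."""
    target, mode, qualifiers, pattern = parse_prune_item(item)
    text = {"function": function, "source": source, "dso": dso}[target]
    if mode == "prefix":
        hit = text.startswith(pattern)
    else:
        flags = re.IGNORECASE if "i" in qualifiers else 0
        hit = re.search(pattern, text, flags) is not None
    return not hit if "!" in qualifiers else hit


print(item_matches("MPI_", "MPI_Send"))     # prefix match on the function name
print(item_matches("i/^mpi_", "MPI_Send"))  # case-insensitive regular expression
```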
PAT_REPORT_PRUNE_NAME_FILE
Specifies a filename with which to prune (remove) functions by name from a report. If prepended with a comma (,) the default file ($CRAYPAT_ROOT/share/config/ReportPrune/default) is also used. A file is processed according to the rules in PAT_REPORT_PRUNE_NAME.
PAT_REPORT_PRUNE_SRC
If not set, the behavior is the same as if set to /lib.
If set to the empty string, all callers are shown.
If set to a non-empty string or to a comma-separated list of strings, a sequence of callers with source paths containing a string from the list are pruned to leave only the top caller.
Effective only when .ap2 files are created.
PAT_REPORT_PRUNE_NON_USER
If set to 0 (zero), disables the default behavior of pruning based on ownership (by user invoking pat_report) of the compilation directory for the definition of a function. Effective only when .ap2 files are created.
PAT_REPORT_PYTHONHOME
Specifies the pathname of CPython libraries for Python Experiments. If unset, the pathname is inferred in the following order:
$CRAY_PYTHON_PREFIX if set (for example, by the cray-python module).
$PYTHONHOME if set.
Otherwise, the pathname is assumed to be /usr/lib/libpython or /usr/lib64/libpython.
An improper value leads to Python information missing from reports and Python interpreter frames being exposed. For details on Python Experiments, see the pat_run man page.
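The lookup order described above can be sketched in Python as follows. The function name is hypothetical, treating an empty value as unset is an assumption, and the sketch returns both fallback locations because the text does not say which one is tried first.

```python
def resolve_python_home(env):
    """Return candidate CPython library locations for the given environment mapping."""
    for name in ("PAT_REPORT_PYTHONHOME", "CRAY_PYTHON_PREFIX", "PYTHONHOME"):
        if env.get(name):  # assumption: an empty value counts as unset
            return [env[name]]
    # Fallback locations named in the text.
    return ["/usr/lib/libpython", "/usr/lib64/libpython"]


# Hypothetical environments.
print(resolve_python_home({"PYTHONHOME": "/opt/python"}))
print(resolve_python_home({}))
```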
PAT_REPORT_VERBOSE
If set, produces more feedback about the parsing of the .xf file and includes in the report the values of all environment variables that were set at the time of program execution.
NOTES
The pat_report utility has two different uses. With the -f option, it reformats the data files, and in this case any options other than -V, -i, and -o are ignored. Without the -f option, it produces summary reports from the data records.
The items for the -b and -d options can be specified using the first two (or three) significant letters of the full names shown above (except for the fmt_* options).
Traced functions that are of global scope and defined in header files not owned by the user will be classified as USER functions in pat_report tables. This is especially significant for C++ programs.
For multi-threaded programs, most report tables focus by default on the data from only the main thread(s). Exceptions are tables from option files with _th in their names (ahead of _pe if both appear). You can use the option -s th=ALL to see individual thread times.
FILES
a.out+pat+PID-nodes|t
Depending on the nature of the program and the environmental conditions in effect at the time of program execution, the instrumented executable, when executed, generates an experiment-data-directory, where:
a.out
is the name of the original program
PID
is the process ID assigned to the instrumented executable at runtime
node
is the physical node ID upon which the rank zero process was executed
s|t
is a one-letter code indicating the type of experiment performed, either s for sampling or t for tracing
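As an illustration of the naming scheme, the following Python sketch splits a directory name into the parts listed above. The function name and the sample name are hypothetical, and the pattern assumes that the PID and node ID are decimal numbers, which the text does not state.

```python
import re

# Assumption: PID and node ID are decimal numbers.
NAME_RE = re.compile(
    r"^(?P<program>.+)\+pat\+(?P<pid>\d+)-(?P<node>\d+)(?P<kind>[st])$"
)


def parse_experiment_dir(name):
    """Split an experiment data directory name into its documented parts."""
    match = NAME_RE.match(name)
    if match is None:
        return None
    parts = match.groupdict()
    parts["kind"] = {"s": "sampling", "t": "tracing"}[parts["kind"]]
    return parts


# Hypothetical directory name for a tracing experiment.
print(parse_experiment_dir("a.out+pat+12345-678t"))
```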
By default, the experiment data directory is created under the current working directory, but this location can be changed by setting the environment variable PAT_RT_EXPDIR_BASE.
Performance data files associated with this executable run are stored in this experiment data directory and include:
xf-files
A subdirectory containing one or more .xf files generated during the run. To save disk space, this subdirectory may be deleted once the ap2-files directory has been generated.
ap2-files
A subdirectory containing one or more .ap2 files, which contain all the information from the original .xf files, but in the more portable Cray Apprentice2 format. This subdirectory is created automatically by an executable instrumented for a Perftools-lite experiment, or otherwise by the first invocation of pat_report on the experiment data directory.
Note: The most significant difference between .xf and .ap2 format is that .xf files require the original instrumented executable and dynamic libraries to be available to provide mapping from addresses to function names and source line numbers, while .ap2 files incorporate this data mapping and are self-contained. Therefore the .ap2 format is recommended if you wish to preserve the data for future reference.
By default, this subdirectory is created in the experiment data directory, but the location can be changed by using the pat_report -o option.
rpt-files
A subdirectory containing one or more text report files generated during the run or by pat_report.
html-files
A subdirectory containing one or more reports in HTML format, which are produced by using pat_report with the -f html option. These files can be opened with any web browser, or opened from the command line on Macintosh or Linux systems by using the open filename.html or xdg-open filename.html commands, respectively.
By default, this subdirectory is created in the experiment data directory, but the location can be changed by using the pat_report -o option.
plot-files
A subdirectory containing one or more reports in gnuplot format. These files are created by running an instrumented executable with PAT_RT_SUMMARY set to 0 and PAT_RT_SAMPLING_DATA set to a supported value (e.g., cray_pm or cray_rapl), and then using pat_report with the -f plot option. The resulting files can be viewed either by invoking pat_report on the experiment data directory or using gnuplot.
By default, this subdirectory is created in the experiment data directory, but the location can be changed by using the pat_report -o option.
index.ap2
An index data file created as a map to the data within the ap2-files directory.
build-options.apa
File containing recommended parameters for re-instrumenting the program for more detailed performance analysis. This is generated by running an executable instrumented for Automatic Profiling Analysis (pat_build -O apa or pat_run -m apa) and then running pat_report on the resulting experiment data directory.
MPICH_RANK_ORDER*
One or more files containing options for rerunning MPI applications with optimized rank orders. These files are generated either manually, using the grid_order utility, or automatically, by running a performance analysis experiment using Perftools-lite.
SEE ALSO
intro_craypat(1), pat_build(1), pat_opts(1), pat_help(1), pat_report(1), pat_run(1), grid_order(1)
intro_mpi(3)
perftools-base(4), perftools-lite(4), perftools-preload(4)
accpc(5), cray_pm(5), cray_rapl(5), hwpc(5), cray_cassini(5), uncore(5), papi_counters(5)