Cray Compiler Fortran Reference
Fortran Compiler Introduction
The HPE Cray Compiling Environment (CCE) Fortran compiler supports HPE Cray systems and the Fortran 2018 standard (ISO/IEC 1539:2018), with some exceptions and deferred features as noted elsewhere. The HPE Cray Fortran compiler is also documented in man pages, beginning with the crayftn(1) man page. Where the information in this manual differs from the man page, the information in the man page is presumed to be more current.
The HPE Cray Fortran Programming Environment
The HPE Cray Fortran Programming Environment consists of the tools and libraries used to develop Fortran applications. These are:
The ftn command, which invokes the HPE Cray Fortran compiler. The ftn command is properly termed a compiler driver, as it is used both to compile source code into object code and to link object code files and libraries to create executable files. This compiling and linking can be performed either as separate processes or as one contiguous process, which has significant implications for file handling. These implications are described later in this section. See the crayftn(1) man page for more information.
HPE Cray Scientific and Math Libraries (CSML), a set of high performance libraries that provide portability for scientific applications by implementing APIs for arrays (NetCDF), sparse and dense linear algebra (BLAS, LAPACK, ScaLAPACK), and fast Fourier transforms (FFTW).
The ftnlx command, which generates listings and checks for possible errors in Fortran programs. See the ftnlx(1) man page for more information.
The ftnsplit command, which splits named Fortran files into separate files with one program unit per file. See the ftnsplit(1) man page for more information.
The ftnmgen command, which invokes the Fortran makefile generator. See the ftnmgen(1) man page for more information.
HPE Cray Fortran Compiler Messages
The HPE Cray Fortran compiler can produce many messages during compilation and linking. To expand on these messages, use the explain command. For more information, see the explain(1) man page.
Document-specific Conventions
Cray pointer: The term Cray pointer refers to the Cray pointer data type extension.
Fortran Standard Compatibility
In the Fortran standard, the term processor means the combination of a Fortran compiler and the computing system that executes the code. A processor conforms to the standard if it compiles and executes programs that conform to the standard, provided that the Fortran program is not too large or complex for the computer system in question.
The compiler can be directed to flag and generate messages when nonstandard usage of Fortran is encountered. For more information about this command line option (ftn -en), see the crayftn(1) man page. When the option is in effect, the compiler prints messages for extensions to the standard that are used in the program. As required by the standard, the compiler also flags the following items and provides the reason that the item is being flagged:
Obsolescent features
Deleted features
Kind type parameters not supported
Violations of any syntax rules and the accompanying constraints
Characters not permitted by the processor
Illegal source form
Violations of the scope rules for names, labels, operators, and assignment symbols
The HPE Cray Fortran compiler includes extensions to the Fortran standard. Because the compiler processes standard-conforming programs according to the standard, it is considered to be a standard-conforming processor. When the option to note deviations from the Fortran standard is in effect (-en), extensions to the standard are flagged with ANSI messages when detected at compile time.
Fortran 2018 Compatibility
This release of HPE CCE fully supports the Fortran 2018 standard, including coarray TEAMS, with one limitation: this release does not yet support the REDUCE intrinsic with CHARACTER arguments. Support for this feature is expected in a future release.
Fortran Extensions
The HPE Cray Fortran Compiler supports extended features beyond those specified by the current standard. For more information, see HPE Cray Fortran Language Extensions.
Invoke the HPE Cray Fortran Compiler
The ftn(1) command invokes the HPE Cray Fortran compiler when the HPE Cray Compiling Environment is loaded. Typically, the ftn command processes the input files specified on the command line to generate a binary object file, and then links the binary object file to generate the executable file a.out.
HPE Cray Fortran Command Syntax
The ftn command is a driver that invokes the HPE Cray Fortran Compiler when the HPE Cray Compiling Environment is loaded, and links in the libraries required to produce code that can be executed on HPE compute nodes. Valid ftn options include those of the ftn(1) driver, as well as those specific to the HPE Cray Fortran Compiler:
ftn
[-A module_name[,module_name]...]
[-b bin_obj_file]
[-c]
[-d disable]
[-D identifier[=value]]
[-e enable]
[-f source_form]
[-fbackslash]
[-fopenmp]
[-F]
[-g]
[-G debug_lvl]
[-h arg]
[-I incldir]
[-J dir_name]
[-K trap=opt[,opt]...]
[-l libname]
[-L ldir]
[-m msg_lvl]
[-M msgs]
[-N col]
[-o out_file]
[-O opt[,opt]...]
[-p module_site[,module_site...]]
[-Q path]
[-r list_opt]
[-R runchk]
[-s size]
[-S]
[-T]
[-U identifier[,identifier]...]
[-v]
[-V]
[--version]
[-W phase,"opt...",]
[-x dirlist]
[-Y phase,dirname]
[--]
sourcefile [sourcefile ...]
sourcefile Suffix
The sourcefile.suffix names the file or files to be processed. The file suffixes indicate the content of each file and determine whether the preprocessor, compiler, assembler, or linker will be invoked. At least one source file must be specified, unless the -V option is specified.
Suffix | Description |
---|---|
.f, .for | Fixed-format source, compile |
.F, .FOR | Fixed-format source, preprocess, compile |
.f90, .f95, .f03, .f08, .f18, .ftn | Free-format source, compile |
.F90, .F95, .F03, .F08, .F18, .FTN | Free-format source, preprocess, compile |
.o | Object file, link |
.s | Assembler source, assemble |
The source form specified on the -f source_form option overrides the source form implied by the file suffixes.
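The suffix rules above amount to a lookup table plus one override. The following Python sketch is an illustrative model only, not the ftn driver's implementation. The names SUFFIX_TABLE and classify are hypothetical. The sketch assumes that -f changes only the source form and does not affect whether preprocessing runs.

```python
import os

# Illustrative model of the suffix table; not the ftn driver's implementation.
# Each entry maps a suffix to (source form or None, preprocess?, action).
SUFFIX_TABLE = {
    ".f": ("fixed", False, "compile"),
    ".for": ("fixed", False, "compile"),
    ".F": ("fixed", True, "compile"),
    ".FOR": ("fixed", True, "compile"),
    ".o": (None, False, "link"),
    ".s": (None, False, "assemble"),
}
# Free-form suffixes: lower case compiles directly, upper case preprocesses first.
for ext in ("f90", "f95", "f03", "f08", "f18", "ftn"):
    SUFFIX_TABLE["." + ext] = ("free", False, "compile")
    SUFFIX_TABLE["." + ext.upper()] = ("free", True, "compile")


def classify(path, source_form=None):
    """Return (form, preprocess, action) for path.

    source_form models '-f fixed' or '-f free', which overrides the form
    implied by the suffix. Assumption: it does not change preprocessing.
    """
    suffix = os.path.splitext(path)[1]  # case-sensitive, e.g. ".F90"
    if suffix not in SUFFIX_TABLE:
        raise ValueError(f"unrecognized suffix: {suffix!r}")
    form, preprocess, action = SUFFIX_TABLE[suffix]
    if form is not None and source_form is not None:
        form = source_form
    return form, preprocess, action


print(classify("solver.F90"))                    # ('free', True, 'compile')
print(classify("legacy.f", source_form="free"))  # ('free', False, 'compile')
```

Because the table is keyed on the exact suffix, the model treats .F and .f as distinct, which mirrors the case sensitivity described above.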
If only one source file is specified on the command line, the .o file is created and then deleted after linking. To retain the .o file, use the -c option to disable the linker. Object files produced by the HPE Cray Fortran, C, or C++ compilers, or by the assembler, can be specified. Object files are passed to the linker in the order in which they appear on the ftn command line. If the linker is disabled by the -b or -c option, no files are passed to the linker.
File Types Used or Created by the Compiler
The compiler uses and creates several types of files during processing:
a.out
Default name of the executable output file. Use the compiler driver command line option -o to specify an executable name other than a.out.
.i
Files containing output from the source preprocessor
.o
Relocatable object code. During compilation, these relocatable object files are saved in the current directory automatically. If CrayPat is used to conduct performance analysis experiments, the object files created during compilation must be kept in order to preserve source-to-executable function mapping. To do so, use the -h keepfiles option.
.a
Library files containing external references
.s
Assembly language files. Files with .s extensions are assembled and written to the corresponding .o file.
.mod
By default, the compiler writes a MODULENAME.mod file for each module; MODULENAME is created by taking the name of the module and, if necessary, converting it to upper case. This file contains module information, including any contained procedures. If the -ef option is specified, the compiler writes modulename.mod for each module, rather than MODULENAME.mod.
Module information files
The compiler creates modules from MODULE program units. A module is referenced with the USE statement. The compiler creates a module information file, with the suffix .mod, for each module. By default, all .mod files are named MODULENAME.mod, where MODULENAME is the name of the module (in uppercase) specified in the MODULE or USE statement.
The options that change this are -e/dm, -e/df, -J, and -Q.
-em is the default.
-dm causes the information to be written to the binary .o file.
-ef modifies -em to write the information to the modulename.mod file rather than MODULENAME.mod. -ef is not allowed with -dm.
-J and -Q specify a directory in which all .mod output is created and searched for. -J is only allowed with -em and -ef, and only affects the location of module information. -Q is also allowed with -dm, and affects all non-temporary files.
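As an illustrative model of the naming rules above, the following Python sketch computes where the module information for a module would be written. The helper mod_file_path is hypothetical and does not model -Q. It also assumes that -ef produces an all-lower-case file name.

```python
import os


def mod_file_path(module_name, ef=False, dm=False, j_dir=None):
    """Hypothetical helper: where module information for module_name is written.

    Returns None under -dm, because the information goes into the .o file.
    -Q is not modeled.
    """
    if dm:
        if ef:
            raise ValueError("-ef is not allowed with -dm")
        if j_dir is not None:
            raise ValueError("-J is only allowed with -em and -ef")
        return None
    # -em (default): MODULENAME.mod; -ef: modulename.mod (assumed all lower case).
    base = module_name.lower() if ef else module_name.upper()
    file_name = base + ".mod"
    return os.path.join(j_dir, file_name) if j_dir else file_name


print(mod_file_path("Grid_Utils"))                         # GRID_UTILS.mod
print(mod_file_path("Grid_Utils", ef=True, j_dir="mods"))  # mods/grid_utils.mod on Linux
```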
The search order for satisfying module references in USE statements is as follows:
The current compilation.
The -J dir_name directory, if specified.
Any directories or files specified with the -p and -I options, in order of specification.
Any directories or files specified with the FORTRAN_MODULE_PATH environment variable.
The current working directory or the -Q directory, if specified.
By default, when searching within a directory, the compiler first searches the .mod files, then the .o files, then the .a files, and then the directories, in the order specified.
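The search order can be sketched as an ordered list of locations. This Python model is hypothetical: module_search_path is not a real API, and the "<current compilation>" entry is only a placeholder. The sketch assumes that FORTRAN_MODULE_PATH is colon-separated, like PATH. It does not model the per-directory .mod, .o, .a sequence.

```python
import os


def module_search_path(j_dir=None, p_and_i=(), env=None, q_dir=None):
    """Hypothetical model of the order in which USE statements are resolved."""
    env = os.environ if env is None else env
    order = ["<current compilation>"]         # 1. the current compilation
    if j_dir:
        order.append(j_dir)                   # 2. -J dir_name
    order.extend(p_and_i)                     # 3. -p and -I, in command-line order
    # 4. FORTRAN_MODULE_PATH (assumed colon-separated, like PATH)
    module_path = env.get("FORTRAN_MODULE_PATH", "")
    order.extend(entry for entry in module_path.split(":") if entry)
    order.append(q_dir if q_dir else ".")     # 5. -Q directory, else the current directory
    return order


print(module_search_path(j_dir="build/mods", p_and_i=["libA"],
                         env={"FORTRAN_MODULE_PATH": "/x:/y"}))
```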
For module compatibility purposes, the HPE Cray Fortran compiler supports the current release and two previous releases.
Fortran Command-line Options
The ftn command invokes the HPE Cray Fortran compiler and accepts the following options and arguments.
-A
module_name[,module_name]...
Directs the compiler to behave as if you entered a USE module_name statement for each module_name in your Fortran source code. The USE statements are entered in every program unit and interface body in the source file being compiled.
-b
bin_obj_file
Disables the link step and saves the binary object file of your program in bin_obj_file. Only one input file is allowed when the -b bin_obj_file option is specified. If you have more than one input file, use the -c option instead. If only one input file is being processed and neither the -b nor -c option is specified, the binary object file of your program is not saved after the link step is completed. If both -b bin_obj_file and -c are specified, the link step is disabled and the binary object file is written to bin_obj_file.
Default: not set
-c
Disables the link step and saves the binary object file version of your program in file.o, where file is the name of the source file. If there is more than one source file, a file.o is created for each input file specified.
Default: not set
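To make the -b and -c interactions concrete, here is a hypothetical Python model of whether the link step runs and which object files are kept. The function object_outputs is illustrative only. It assumes that file.o is named from the source file's base name. This section does not describe the case of several sources with neither option, so the sketch returns None for it.

```python
import os


def object_outputs(sources, b_file=None, c_flag=False):
    """Hypothetical model of the -b/-c rules.

    Returns (link_step_runs, kept_objects). kept_objects is None for the
    case this section does not describe.
    """
    if b_file is not None:
        if len(sources) != 1:
            raise ValueError("only one input file is allowed with -b; use -c instead")
        # -b alone, or -b together with -c: no link, object kept in bin_obj_file.
        return False, [b_file]
    if c_flag:
        # One file.o per source, named after the source's base name (assumption).
        kept = [os.path.splitext(os.path.basename(s))[0] + ".o" for s in sources]
        return False, kept
    if len(sources) == 1:
        return True, []  # object is created, linked, then deleted
    return True, None    # not covered by this section


print(object_outputs(["a.f90", "src/b.F90"], c_flag=True))  # (False, ['a.o', 'b.o'])
```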
-d
disable, -e enable
Disable or enable compiling options. To specify more than one option, enter the options without separators between them; for example, -e aj. The disable and enable arguments can be one or more of the following options.
|
Action |
---|---|
|
Initialize all undefined local stack, static, and heap variables to 0 (zero). If a user variable is of type character, it is initialized to |
Default: disabled |
|
|
Abort compilation after encountering the first error. |
Default: disabled |
|
|
Treat all module variables as |
Default: enabled |
|
|
If enabled, issue a warning message rather than an error message when the compiler detects a call to a procedure with one or more dummy arguments having the |
Default: disabled |
|
|
Generate binary output. If disabled, inhibit all optimization and allow only syntactic and semantic checking. |
Default: enabled |
|
|
Interface checking: use HPE’s system modules to check library calls in a compilation. If a user procedure has the same name as one in the library, this will produce errors, as the compiler does not skip user-specified procedures when performing checks. |
Default: disabled |
|
|
Enable/disable some types of standard call site checking. The current Fortran standard requires that the number and types of arguments must agree between the caller and callee. These constraints are enforced in cases where the compiler can detect them; however, specifying |
Note: If error-checking is disabled, unexpected compile-time or runtime errors may occur. |
|
In addition, the compiler by default attempts to detect situations in which an interface block should be specified but is not. Specifying |
|
Default: enabled |
|
|
Control a column-oriented debugging feature when using fixed source form. When enabled, the compiler replaces a |
Default: disabled |
|
|
Enable all debugging options. This option is equivalent to specifying the |
Default: disabled |
|
|
Enable/disable masking expression support for non-integer type operands. This allows masking expressions to be evaluated without type conversion. For example, if |
Default: |
|
|
Allow existing declarations to duplicate the declarations contained in a used module. Only existing declarations that declare the function name or generic name in an |
Existing declarations of a procedure must match the interface definitions in the module; otherwise an error is generated. |
|
Default: disabled |
|
|
This option is a modifier to the |
Default: disabled |
|
|
Control preprocessor expansion of macros in Fortran source lines. |
Default: enabled whenever preprocessing is enabled. |
|
|
Allow branching into the code block for a |
Default: disabled |
|
|
Enable support for 8-bit and 16-bit |
Note: Vectorization of 8- and 16-bit objects is deferred. |
|
Default: enabled |
|
|
Initialize all undefined local stack, static, and heap variables of type |
Default: disabled |
|
|
Treat all variables as if an |
Default: disabled |
|
|
Execute |
Default: disabled |
|
|
Allow character literal continuation in free source form without the leading |
Default: enabled |
|
|
When this option is enabled, the compiler creates |
The |
|
Default: enabled |
|
|
Generate messages to note nonstandard Fortran usage. |
|
|
If |
|
If multiple |
|
Default: |
|
|
Display to |
Default: disabled |
|
|
Enable or disable double-precision arithmetic. This option can be used only when the default data size is 64 bits ( |
When the |
|
Similarly, when the |
|
Default: disabled |
|
|
Perform source preprocessing on Fortran source files but do not compile. When specified, source code is included by |
Default: disabled |
|
|
Abort compilation if 100 or more errors are generated. |
Default: enabled |
|
|
Control whether or not the compiler accepts variable names that begin with a leading underscore ( |
Default: disabled |
|
|
Compile all functions and subroutines as if they contained a |
Default: enabled |
|
|
Scale the values of the |
Default: enabled |
|
|
Generate assembly language output and save it in |
Default: disabled |
|
|
Control preprocessing of Fortran source files. When enabled, source preprocessing is performed. Macro expansion within Fortran source lines is enabled but can be controlled by the |
Default: When not specified, the default is to honor the case of the file name suffix, and other preprocessing options such as |
|
|
Allocate variables to static storage. These variables are treated as if they had appeared in a |
The following types of variables are not allocated to static storage: automatic variables (explicitly or implicitly stated), variables declared with the |
|
Default: disabled |
|
|
Enable support for automatic memory allocation for allocatable variables and arrays that are on the left-hand side of intrinsic assignment statements. |
Using this option may degrade runtime performance, even when automatic memory allocation is not needed. It can affect optimizations for a code region containing an assignment to allocatable variables or arrays; for example, by preventing loop fusion for multiple array syntax assignment statements with the same shape. |
|
Default: enabled |
|
|
If a module variable has initializers, implicit or explicit, and the variable has more than 10,000 elements to be initialized, optionally create a new module procedure to do the initialization at runtime before MAIN is called. Enabling this option may significantly reduce compile time and reduce the size of the executable for some code, while increasing execution time. If performance is the only issue, disable this option. |
Default: enabled |
|
|
Cause the |
Default: disabled |
|
|
Initialize all memory allocated by Fortran |
Default: disabled |
|
|
Perform source preprocessing and compilation on Fortran source files. When specified, source code is included by both |
-D
identifier[=value]
Defines variables used for source preprocessing as if they had been defined by a #define source preprocessing directive. If a value is specified, there can be no spaces on either side of the equal sign. If no value is specified, the default value is 1. Compare to the -U identifier option.
-E
Performs source preprocessing on Fortran source files, but does not compile. When specified, source code is included by #include directives but not by Fortran INCLUDE lines. The preprocessed source code is sent to stdout. This option overrides the other preprocessing control options, -e/dP and -e/dZ.
-f
source_form
Specifies whether the Fortran source file is written in fixed source form or free source form. For source_form, enter free or fixed. The default is fixed for source files that have .f, .F, .for, or .FOR extensions. The default is free for source files that have .f90, .F90, .f95, .F95, .f03, .F03, .f08, .F08, .f18, .F18, .ftn, or .FTN extensions. The upper-case file extensions, .F, .FOR, .F90, .F95, .F03, .F08, .F18, and .FTN, enable source preprocessing by default.
-f
backslash
Change the interpretation of backslashes in character literals from a single backslash character to C-style escape characters. The combinations \a, \b, \f, \n, \r, \t, \v, \\, and \0 are expanded to the ASCII characters alert, backspace, form feed, newline, carriage return, horizontal tab, vertical tab, backslash, and NUL, respectively.
-f
cray-program-library-path=program_library
Create and use a persistent repository of compiler information specified by program_library. When used with -h wp, this option provides application-wide, cross-file, automatic inlining. This option is an alias to the -h pl=program_library compiler option.
-f
[no-]openmp
Enable or disable compiler recognition of OpenMP directives. Using -f no-openmp is similar to the -h thread0 option, in that it disables OpenMP, but unlike -h thread0, -f no-openmp does not affect autothreading. This option is an alias to -h [no]omp. The HPE Cray Programming Environment will link in the serial version of LibSci when -f no-openmp is used.
Default: -f no-openmp (-h noomp)
-f
[no-]openmp-simd
Enable or disable compiler recognition of OpenMP SIMD directives. This option may be enabled (-h omp_simd or -fopenmp-simd) when general OpenMP is disabled (-h noomp or -fno-openmp), allowing the compiler to take advantage of omp simd constructs for CPU vectorization without enabling CPU threading for omp parallel constructs. This option may not be disabled (-h noomp_simd or -fno-openmp-simd) when general OpenMP is enabled (-h omp or -fopenmp). Specifying -O0 with OpenMP disabled (-h noomp or -fno-openmp) will disable OpenMP SIMD recognition (-h noomp_simd or -fno-openmp-simd). This option is an alias to -h [no]omp_simd.
Default: -f openmp-simd
-f
denormal-fp-math=ieee
When applied on the link step, begin execution with Gradual Underflow for denormals.
-f
denormal-fp-math=preserve-sign
When applied on the link step, begin execution with Abrupt Underflow (also known as flush-to-zero) for denormals. This is the default behavior for Fortran on CPUs.
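The difference between the two -fdenormal-fp-math settings can be modeled on individual values. In this Python sketch, the hypothetical helper flush_preserve_sign maps any subnormal double to a zero of the same sign. That is what abrupt underflow with preserve-sign means for a result. Python itself uses gradual underflow, so the unflushed value stays subnormal. The sketch does not model how the hardware floating-point control flags are actually set.

```python
import math
import sys

SMALLEST_NORMAL = sys.float_info.min  # smallest positive normal double, 2**-1022


def flush_preserve_sign(x):
    """Model abrupt underflow: a subnormal value becomes a zero of the same sign."""
    if x != 0.0 and abs(x) < SMALLEST_NORMAL:
        return math.copysign(0.0, x)
    return x


tiny = SMALLEST_NORMAL / 4  # 2**-1024: subnormal, kept by gradual underflow (ieee)
print(tiny > 0.0)                  # True
print(flush_preserve_sign(tiny))   # 0.0
print(flush_preserve_sign(-tiny))  # -0.0
```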
-f
pic, -f PIC
Generates position-independent code (PIC), which allows a virtual address change from one process to another, as is necessary in the case of shared, dynamically linked objects. The virtual addresses of the instructions and data in PIC code are not known until dynamic link time. These are aliases to -h pic and -h PIC.
-f
sanitize=check
Turns on runtime checks for various forms of undefined or suspicious behavior. This option is disabled by default and is currently an experimental feature. If a check fails, a diagnostic message is produced at runtime explaining the problem.
The following checks are currently supported:
-fsanitize=address: Enables AddressSanitizer, a memory error detector.
Note: COMMON variables are not currently supported.
-fsanitize=thread: Enables ThreadSanitizer, a data race detector.
Note:
AddressSanitizer and ThreadSanitizer cannot be used simultaneously.
Lower optimization levels are more likely to produce accurate sanitizer reports.
-F
Macro expansion in Fortran source lines is now enabled by default whenever preprocessing is enabled. See the -d|e F option. The -F option is obsolete and supported for compatibility with legacy makefiles.
-g
When specified with no optimization options or with -O0, provides debugging support identical to specifying the -G0 option. If any optimization option is specified, -g is ignored.
Default: off
-G
debug_lvl
Controls the tradeoffs between ease of debugging and compiler optimizations. The compiler produces some level of internal debugger information (DWARF) at all times. This DWARF data provides function and source line information to debuggers for tracebacks and breakpoints, as well as type and location information about data variables.
Note that the -g or -G options can be specified on a per-file basis, so that only part of an application pays the price for improved debugging.
The -G debug_lvl arguments are as follows:
debug_lvl | Support |
---|---|
0 | Full DWARF information is available for debugging, but at the cost of a slower and larger executable. Breakpoints can be set at each line. Most optimizations are disabled, including floating point optimizations. This level of debugging implies |
1 | Most DWARF information is available with partial optimization. Some optimizations make tracebacks and limited breakpoints available in the debugger. Some scalar optimizations and all loop nest restructuring are disabled, but the source code will be visible and most symbols will be available. This allows block-by-block debugging, with the exception of innermost loops. The executable will be faster than with |
2 | Partial DWARF information. Most optimizations, tracebacks and very limited breakpoints are available in the debugger. The source code will be visible and some symbols will be available. This level allows post-mortem debugging, but local information such as the value of a loop index variable is not necessarily reliable at this level because such information often is carried in registers in optimized code. The executable will be faster and smaller than with |
-h
arg
The -h arg options enable you to access various compiler functions. Some of these options duplicate -O arg options.
-h
[no]acc
Enables or disables the compiler recognition of OpenACC accelerator directives. See the intro_openacc(7) man page.
Default: noacc
-h
acc_model=option[option]...
Explicitly control the execution and memory model used by the accelerator support system. The option arguments identify the type of behavior desired. There are three option sets. Only one member of a set may be used at a time; however, all three sets may be used together.
Default: auto_async_kernel:fast_addr:no_deep_copy
The option Set 1 is as follows:
option | Description |
---|---|
auto_async_none | Execute kernels and updates synchronously, unless there is an async clause present on the kernels or update directive. |
auto_async_kernel | Default. Execute all kernels asynchronously, ensuring program order is maintained. |
auto_async_all | Execute all kernels and data transfers asynchronously, ensuring program order is maintained. |
The option Set 2 is as follows:
option | Description |
---|---|
no_fast_addr | Use default types for addressing. |
fast_addr | Default. Attempt to use 32-bit integers in all addressing to improve performance. Base addresses remain 64-bit. Performance may improve because offset calculations can use fewer registers and faster arithmetic. This optimization may result in incorrect behavior for codes that use any of the following within accelerator regions: very large arrays (offsets would require more than 32 bits); very large array lower bounds (maximum offset plus lower bound exceeds 32 bits); bitfields or other bit operations. |
The option Set 3 is as follows:
option | Description |
---|---|
no_deep_copy | Default. Do not look inside of an object type to transfer sub-objects. Allocatable members of derived type objects will not be allocated on the device. |
deep_copy | (Fortran only) Look inside of derived type objects and recreate the derived type on the accelerator recursively. A derived type object that contains an allocatable member will have memory allocated on the device for the member. |
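The one-member-per-set rule for -h acc_model can be expressed as a small validator. This Python sketch is hypothetical. It assumes that options are colon-separated, as in the documented default. It also assumes that any set not mentioned keeps its default member.

```python
# Option sets and defaults as listed above; the resolver itself is hypothetical.
ACC_MODEL_SETS = (
    ("auto_async_none", "auto_async_kernel", "auto_async_all"),  # Set 1
    ("no_fast_addr", "fast_addr"),                               # Set 2
    ("no_deep_copy", "deep_copy"),                               # Set 3
)
DEFAULT = ("auto_async_kernel", "fast_addr", "no_deep_copy")


def resolve_acc_model(spec=""):
    """Return the effective member of each set for an acc_model string."""
    chosen = list(DEFAULT)
    seen = set()
    for opt in filter(None, spec.split(":")):
        for index, members in enumerate(ACC_MODEL_SETS):
            if opt in members:
                if index in seen:
                    raise ValueError(f"only one option from set {index + 1} may be used: {opt!r}")
                seen.add(index)
                chosen[index] = opt
                break
        else:
            raise ValueError(f"unknown acc_model option: {opt!r}")
    return tuple(chosen)


print(resolve_acc_model("deep_copy"))  # ('auto_async_kernel', 'fast_addr', 'deep_copy')
```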
-h
[no]add_paren
Automatically add parentheses to select associative operations (+, -, *) to encourage left-to-right evaluation of floating point and complex expressions. Left-to-right evaluation is not required by the language standards, but some applications may expect it.
Default: noadd_paren
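Left-to-right evaluation matters because floating-point addition is not associative, so regrouping a + b + c can change the result. This short Python example uses IEEE 754 double precision to show two groupings of the same three values giving different answers.

```python
a, b, c = 1e16, -1e16, 1.0

left_to_right = (a + b) + c  # grouping that -h add_paren encourages: 0.0 + 1.0
# b + c is exactly -9999999999999999, which is not representable: doubles near
# 1e16 are spaced 2 apart, and the tie rounds to even, giving -1e16.
regrouped = a + (b + c)

print(left_to_right)  # 1.0
print(regrouped)      # 0.0
```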
-h
[no]aggress
Cause the compiler to treat a subroutine, function, or main program as a single optimization region. Doing so can improve the optimization of large program units but also increases compile time and size.
Default:
noaggress
-h
[no]align_arrays
Enable or disable padding of arrays in static data. Some statically allocated arrays are aligned and padded for better cache behavior. Common block data is not affected.
Default:
align_arrays
-h
[no]autoprefetch
Enable or disable automatic prefetch optimization. Does not affect the loop_info [no]prefetch directive.
Default: autoprefetch
-h
[no]autothread
Enable or disable autothreading.
Default:
noautothread
-h
[no]bounds
Enable or disable checking of array bounds. Bounds checking is not performed on arrays dimensioned as (1). Enables -h overindex. Equivalent to the -Rb option.
-h
byteswapio
Force byte-swapping of all input and output files for direct and sequential unformatted I/O. Byteswapio is implemented during the linker phase so that it can be uniformly applied across the entire executable. This is a link-time option.
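Byte-swapping reverses the byte order of each data item. The following Python sketch uses the standard struct module to show what happens when a 4-byte integer written in big-endian order is read on a little-endian system, first without swapping and then with it. It illustrates the concept only. It does not model the record markers of Fortran unformatted files or how -h byteswapio is implemented.

```python
import struct

big_endian = struct.pack(">i", 1)                   # bytes 00 00 00 01, as written big-endian
as_is = struct.unpack("<i", big_endian)[0]          # read little-endian without swapping
swapped = struct.unpack("<i", big_endian[::-1])[0]  # reverse the 4 bytes first

print(as_is)    # 16777216 (0x01000000)
print(swapped)  # 1

# The same applies to each 8-byte real value: its bytes are reversed as a unit.
x = struct.unpack("<d", struct.pack(">d", 2.5)[::-1])[0]
print(x)  # 2.5
```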
-h
cachen
Specify the level of automatic cache management to be performed, where n is a value from 0 to 3, with 0 being no cache management and 3 being the most aggressive. Note that cache blocking is controlled by the cblock level, a separate option.
Default: cache0
-h
cblockn
Specify cache blocking policy, where n is a level from 0 to 3:
0: No cache blocking is performed. Directives are ignored.
1: Only block according to directives.
2: Honor directives and block for largest private cache.
3: Honor directives and block for largest cache (prorated for core sharing).
Default: cblock0
-h
[no]caf
Enable the compiler to recognize coarray syntax. The macro _CRAY_COARRAY will be defined as 1 if -hcaf is specified on the command line. -hnocaf is required for Fortran code that will be linked with C++ code. PGAS behavior is determined by the number of physical cores on the node. For more information, see the intro_pgas(7) man page.
Default: caf
-h
[no]concurrent
Equivalent to adding a concurrent directive before every loop in the file, including loops created from array syntax. This option may provide significant performance improvements for some codes. The user must ensure that all loops are clearly parallel; private arrays, ambiguous reductions, and other special forms may not yield valid parallel code in this mode. -hnoconcurrent honors existing concurrent directives. The default is -hnoconcurrent.
-h
[no]contiguous
Declare that every assumed-shape array and array pointer target is contiguous, whether or not it has the CONTIGUOUS attribute, potentially increasing the range of permitted compiler optimizations. By default, the compiler does not assume that all array pointers are associated with contiguous targets or that all assumed-shape arrays are contiguous, and there is no way to verify this at compile time.
Use this option with caution. This additional level of compiler optimization is safe when the memory objects occupy contiguous blocks of memory. If there is potential for hidden dependencies between the memory locations to which the pointers are referring, do not use this option.
Default: nocontiguous
-h
[no]contiguous_assumed_shape
If contiguous_assumed_shape is specified, all assumed-shape dummy arguments are implicitly marked with the CONTIGUOUS attribute.
Default: nocontiguous_assumed_shape
-h
cpu=target_system
Specify the target system on which the absolute binary file is to be executed, where target_system can be:
ivybridge
sandybridge
haswell
broadwell
mic-knl
x86-skylake
x86-cascadelake
x86-naples
arm-thunderx2
If target_system is set during compilation of any source file, it must be set to the same target during linking and loading.
Rather than setting this option directly, users should load one of the targeting modules (for example, craype-sandybridge). The targeting modules set CRAY_CPU_TARGET and define paths to the corresponding libraries. The compiler driver script translates CRAY_CPU_TARGET to the corresponding -h cpu=target_system option when calling the compiler.
To override the target_system value set by the module environment (via the CRAY_CPU_TARGET definition), specify -hcpu=target_system on the compiler command line.
-h
craylibs_arch_override
Forces the Cray math library to honor the processor architecture specified by the -h cpu option. Processor architecture is typically specified by loading one of the targeting modules, e.g., craype-sandybridge, but can be overridden at link time by using the -h cpu option.
If the CRAYLIBS_ARCH_OVERRIDE environment variable is defined, it takes precedence over this option.
-h
develop
Reduce compile time at the expense of optimization, by omitting or scaling back optimizations that are known to increase compile time. This option is intended to be used when a program is under development and being recompiled frequently, and is different from and independent of the -O options. Consider using this option when compiling with the -O0 or -O1 options still results in long compile times, or when code compiled with the -O0 or -O1 options runs so slowly as to negate whatever time savings were gained by faster compilation.
Default: off
-h
dir_check
Enable a run time check for the !dir$ collapse directive and check the validity of the loop_info count information. Equivalent to the -Rd option.
-h
display_opt
Display the compiler optimization settings currently in force. This option is identical to the -eo option.
-h
dynamic
Directs the compiler driver to create dynamically linked executable files and link dynamic libraries at runtime. Note that the preferred invocation is to call the generic ftn command with the -dynamic option, rather than using this compiler-specific option. Compare to the -h shared and -h static options and the CRAYPE_LINK_TYPE environment variable, and see the ftn(1) man page for more information.
-h
error_on_warning
If set, change the message level of all warning messages to error.
Default: not set
-h
find_dirs
Issue warning messages for all unsupported INTEL (!DIR$), Fujitsu (!OCL), PGI (!PGI$), GCC (!GCC$), and DEC (!DEC$) directives.
Default: not set
-h flex_mp=level
Control the aggressiveness of optimizations that may affect floating point and complex repeatability, for applications that require identical results when the number of ranks or threads is varied. The valid values for level are:
intolerant: Has the highest probability of repeatable results, but also the highest performance penalty.
rigorous: Maintains the bit-reproducibility of intolerant but provides most of the performance benefits of strict.
strict: Uses some safe optimizations and yields higher performance than intolerant, with a high probability of repeatable results.
conservative: Uses more aggressive optimization and yields higher performance than strict, but results may not be sufficiently repeatable for some applications.
default: Uses more aggressive optimization and yields higher performance than conservative, but results may not be sufficiently repeatable for some applications.
tolerant: Uses the most aggressive optimization and yields the highest performance, but results may not be sufficiently repeatable for some applications.
Default: default
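Repeatability matters here because floating point addition is not associative: splitting a sum differently among ranks or threads changes the rounding. The following minimal sketch (plain serial Fortran; the module and function names are hypothetical, not part of CCE) sums the same values once strictly left to right and once as two partial sums, the way a reduction over two threads or ranks might group them.

```fortran
! Hypothetical illustration module; the names are not part of CCE.
module reduction_order
  implicit none
contains
  ! Sum the elements strictly left to right, as a single thread would.
  function sequential_sum(x) result(s)
    real, intent(in) :: x(:)
    real :: s
    integer :: i
    s = 0.0
    do i = 1, size(x)
      s = s + x(i)
    end do
  end function sequential_sum

  ! Sum each half separately, then combine the two partial sums,
  ! mimicking a reduction split across two threads or ranks.
  function two_part_sum(x) result(s)
    real, intent(in) :: x(:)
    real :: s
    integer :: mid
    mid = size(x) / 2
    s = sequential_sum(x(1:mid)) + sequential_sum(x(mid+1:))
  end function two_part_sum
end module reduction_order
```

With x = [1.0e8, 1.0, -1.0e8, 1.0] in IEEE single precision, sequential_sum returns 1.0 and two_part_sum returns 0.0, because 1.0e8 + 1.0 and -1.0e8 + 1.0 each round back to a value of magnitude 1.0e8. Stricter flex_mp levels accept a performance penalty in exchange for a higher probability of repeatable results.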
-h [no]fma
Enable or disable the generation of fused multiply add (FMA) instructions, if supported on the target hardware. FMA instructions are enabled by default at -hfp levels of 1 or higher, but disabled by default at -hfp0. This option can be used for debugging a numerically sensitive application. The use of FMAs is generally better for performance, but introduces different, although not necessarily incorrect, rounding. This option affects only compiler-generated FMA opportunities; it does not affect pre-built libraries.
Default: fma (enabled, except at -hfp0)
-h [no]fortran_ptr_alias
The nofortran_ptr_alias option indicates that storage accessed through a Fortran POINTER is accessible only through that POINTER. The compiler is free to assume no overlap with other POINTER-based storage or with variables that have the TARGET attribute. This is a very strong assertion; when it holds, it permits very aggressive optimization.
Default: -h fortran_ptr_alias
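The following minimal sketch (module and routine names are hypothetical) shows the kind of code for which the default, -h fortran_ptr_alias, matters: two POINTER dummy arguments that the caller associates with overlapping sections of the same array. Fortran semantics require each iteration to see the value stored by the previous one, which here produces a running sum; a compiler told by -h nofortran_ptr_alias that the pointers cannot overlap would be free to reorder or vectorize the loop, so such code must not be compiled with that option.

```fortran
! Hypothetical illustration module; the names are not part of CCE.
module ptr_alias_demo
  implicit none
contains
  ! Adds b into a element by element. Both dummies are POINTERs, so under
  ! the default -h fortran_ptr_alias the compiler must allow for a and b
  ! referring to overlapping storage and keep the sequential order.
  subroutine accumulate(a, b)
    real, pointer, intent(in) :: a(:), b(:)
    integer :: i
    do i = 1, min(size(a), size(b))
      a(i) = a(i) + b(i)
    end do
  end subroutine accumulate
end module ptr_alias_demo
```

Calling accumulate with a => x(2:5) and b => x(1:4) on x = [1.0, 1.0, 1.0, 1.0, 1.0] leaves x = [1.0, 2.0, 3.0, 4.0, 5.0] under standard semantics.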
-h [no]fortran_ptr_overlap
The nofortran_ptr_overlap option indicates that storage accessed through one Fortran POINTER does not overlap with storage accessed through any other Fortran POINTER, while overlap with non-POINTER variables that have the TARGET attribute is still allowed.
-h fpn[=[no]approx]
Controls the level of floating point optimizations, where n is a value between 0 and 4, with 0 giving the compiler minimum freedom to optimize floating point operations and 4 giving it maximum freedom. The higher the level, the less closely floating point values conform to the IEEE standard. Use -h fp4 only if your application uses algorithms that are tolerant of reduced precision. Do not use -h fp4 for codes that use Boost I/O or for any codes that do “roll your own” I/O. -h fp0 may provide well defined values from some intrinsic operations where the Fortran language standard does not specify behavior; for examples, see CEILING and FLOOR.
Default: fp2
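Because FMA generation depends on the -h fp level and on -h [no]fma, the same source can round differently under different settings. The minimal sketch below (module and function names are hypothetical) computes a*a - p, where p holds a*a already rounded to single precision. Without FMA the product is rounded before the subtraction and the result is 0.0; if the compiler contracts the expression into an FMA, the exact rounding error of the product survives instead. The double-precision reference is exact because the product of two single-precision values fits in double precision.

```fortran
! Hypothetical illustration module; the names are not part of CCE.
module fma_demo
  implicit none
  integer, parameter :: sp = kind(1.0), dp = kind(1.0d0)
contains
  ! a*a - p in single precision. Without FMA this is 0.0 when p == a*a;
  ! with FMA contraction the unrounded product is used instead.
  function single_residual(a, p) result(r)
    real(sp), intent(in) :: a, p
    real(sp) :: r
    r = a*a - p
  end function single_residual

  ! Exact residual, computed in double precision for reference.
  function exact_residual(a, p) result(r)
    real(sp), intent(in) :: a, p
    real(dp) :: r
    r = real(a, dp)*real(a, dp) - real(p, dp)
  end function exact_residual
end module fma_demo
```

For a = 1.0 + 2.0**(-12), p rounds to 1.0 + 2.0**(-11) and the exact residual is 2.0**(-24); single_residual returns either that value or 0.0, depending on whether FMA contraction took place.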
-h [no]fp_trap
Control whether the compiler generates code compatible with floating point traps being enabled.
Default: fp_trap if traps are enabled using the -K trap option or if -Ofp[0,1] is in effect; otherwise, nofp_trap.
-h [no]func_trace
For use only with CrayPat (an HPE performance analysis tool). If this option is specified, the compiler inserts CrayPat entry points into each function in the compiled source file. The names of the entry points are __pat_tp_func_entry and __pat_tp_func_return. These are resolved by CrayPat when the program is instrumented using the pat_build command. When the instrumented program is executed and encounters either of these entry points, CrayPat captures the address of the current function and its return address.
Default: nofunc_trace
-h fusionn
Control loop fusion globally and change the assertiveness of the FUSION directive. Loop fusion can improve the performance of loops, although in some rare cases it may degrade overall performance.
The n argument enables you to turn loop fusion on or off and determine where fusion should occur. It also affects the assertiveness of the FUSION directive. The valid values for n are:
0: No fusion (ignore all FUSION directives and do not attempt to fuse other loops).
1: Attempt to fuse loops that are marked by the FUSION directive.
2: Attempt to fuse all loops (includes array syntax implied loops), except those marked with the NOFUSION directive.
Default: fusion2
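As an illustration of what fusion does, the minimal sketch below (hypothetical names) writes two adjacent loops over the same index range, where the second loop reads only the element the first loop just wrote. Fusing them into one loop, as -h fusion2 may do automatically, preserves the result while traversing the arrays once instead of twice. The second routine is a hand-fused equivalent shown only for comparison; the sketch does not use the FUSION or NOFUSION directives themselves.

```fortran
! Hypothetical illustration module; the names are not part of CCE.
module fusion_demo
  implicit none
contains
  ! Two separate loops over the same range: a candidate for fusion.
  subroutine two_loops(a, b, c)
    real, intent(in) :: a(:)
    real, intent(out) :: b(:), c(:)
    integer :: i
    do i = 1, size(a)
      b(i) = 2.0 * a(i)
    end do
    do i = 1, size(a)
      c(i) = b(i) + 1.0
    end do
  end subroutine two_loops

  ! Hand-fused equivalent: one pass computes both results.
  subroutine fused_loop(a, b, c)
    real, intent(in) :: a(:)
    real, intent(out) :: b(:), c(:)
    integer :: i
    do i = 1, size(a)
      b(i) = 2.0 * a(i)
      c(i) = b(i) + 1.0
    end do
  end subroutine fused_loop
end module fusion_demo
```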
-h gasp[=opt[:opt]]
Request GASP (Global Address Space Performance Analysis) instrumentation. With no options specified, remote data accesses are profiled. When opt is specified, the compiler provides additional instrumentation as follows:
local: Enables instrumentation of events generated by shared local accesses. Instrumenting these events can add runtime overhead to the application.
functions: Enables function instrumentation. Sets -hipa0.
-h [no]heap_allocate
-h heap_allocate forces all variable-size local arrays and temporary arrays to be allocated on the heap; !dir$ heap_allocate directives are ignored. -h noheap_allocate places variable-size local arrays and temporary arrays on the stack, except where the !dir$ heap_allocate directive applies.
Default: noheap_allocate
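The minimal sketch below (hypothetical names) shows both kinds of storage this option governs: an automatic array whose size depends on a dummy argument, and an overlapping array assignment for which the compiler may create a temporary. Under the default these are placed on the stack, so very large sizes can exhaust the stack limit; -h heap_allocate moves them to the heap. Whether the second routine actually needs a temporary is the compiler's decision, not something this sketch controls.

```fortran
! Hypothetical illustration module; the names are not part of CCE.
module heap_demo
  implicit none
contains
  ! Reverse x using an automatic (variable-size local) work array.
  subroutine reverse_with_work(x)
    real, intent(inout) :: x(:)
    real :: work(size(x))   ! size known only at run time
    work = x(size(x):1:-1)
    x = work
  end subroutine reverse_with_work

  ! Same result without an explicit work array; because the right-hand
  ! side overlaps the left, the compiler may create a temporary array.
  subroutine reverse_in_place(x)
    real, intent(inout) :: x(:)
    x = x(size(x):1:-1)
  end subroutine reverse_in_place
end module heap_demo
```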
-h ignore_unknown_dirs
Suppress generation of warning messages when the compiler encounters an unknown directive.
Default: not set
-h ipalevel
Specifies the level of interprocedural optimization (IPA). level may be one of the following values:
0: All inlining and cloning is disabled. All inlining and cloning compiler directives are ignored.
1: Inlining and cloning are attempted for call sites and routines that are under the control of a compiler directive.
2: Includes level 1. Inline a call site to an arbitrary depth as long as the expansion does not exceed some compiler-determined threshold. The call site must flatten for any expansion to occur. A call site is said to “flatten” when there are no calls present in the expanded code. The call site must reside within the body of a loop, and the entire loop body must flatten. A loop body is said to “flatten” when all call sites within the body of the loop are flattened.
3: (Default) Includes levels 1 and 2. Inline call sites that contain constant actual argument(s). Additionally, any call site (regardless of location) that is below some small compiler-determined threshold will inline, provided that the call site flattens. If a routine does not inline, the compiler may clone it if there is a performance benefit.
4: Includes levels 1, 2, and 3. Additionally, a call site does not have to reside in a loop body to inline, nor does the call site necessarily have to flatten.
5: Includes levels 1, 2, 3, and 4. Thresholds are raised and may allow for additional inlining/cloning that was not achieved at level 4.
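To make the term "flatten" concrete, the minimal sketch below (hypothetical names) has a call site inside a loop whose callee makes no further calls, so the expanded loop body would contain no calls; this is the shape levels 2 and above look for. The constant actual argument 2.0 is also the kind of call site level 3 singles out. Whether CCE actually inlines it depends on its internal thresholds.

```fortran
! Hypothetical illustration module; the names are not part of CCE.
module ipa_demo
  implicit none
contains
  ! Leaf routine: it contains no calls, so inlining it leaves none behind.
  pure function scaled(x, factor) result(y)
    real, intent(in) :: x, factor
    real :: y
    y = factor * x
  end function scaled

  ! Once scaled is expanded here, the loop body contains no calls,
  ! so the call site (and the loop body) flattens.
  subroutine double_all(x)
    real, intent(inout) :: x(:)
    integer :: i
    do i = 1, size(x)
      x(i) = scaled(x(i), 2.0)   ! constant actual argument
    end do
  end subroutine double_all
end module ipa_demo
```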
-h nointerchange
Inhibits the compiler’s attempts to interchange loops. Interchanging loops by having the compiler replace an inner loop with an outer loop can increase performance. The compiler performs this optimization by default.
Specifying the -h nointerchange option is equivalent to specifying a NOINTERCHANGE directive prior to every loop. To disable loop interchange on individual loops, use the NOINTERCHANGE directive.
-h keepfiles
The -h keepfiles option prevents the removal of the object (.o) and temporary assembly (.s) files after an executable is created. Normally the compiler automatically removes these files after linking them to create an executable. Because the original object files are required in order to instrument a program for performance analysis, use this option to preserve the object files if you plan to use CrayPat to conduct performance analysis experiments.
Default: not set
-h keep_frame_pointer
Retain call stack information back to the main entry point for CrayPat performance sampling.
Default: not set
-h list=a|c|d|e|E|i|l|m|o|s|T|x
Produce a listing file. The valid arguments are:
a: Include all reports in the listing (source, cross references, lint, loopmarks, COMMON block report, and options used during compilation).
c: Listing includes a COMMON block report (lists all common blocks and the members of each block).
d: Decompiles (translates) the intermediate representation of the compiler into listings that resemble the format of the source code. You can use these files to examine the restructuring and optimization changes made by the compiler, which can lead to insights about changes you can make to your Fortran source to improve its performance. The compiler produces two decompilation listing files per source file specified on the command line, with the extensions .opt and .cg.
e: Expands included files in the source listing. This option is off by default.
E: Same as -h list=e for Fortran.
i: Used with the -h list=m option to intersperse loop optimization messages within the loopmark listing. By default, the messages are placed at the bottom of the program unit.
l: Lists source code and includes lint-style checking. The listing includes the COMMON block report (see the -h list=c option for more information about the COMMON block report).
m: Produces a source listing with loopmark information. To provide a more complete report, this option automatically enables the -h negmsgs option to show why loops were not optimized. If you do not require this information, use the -h nonegmsgs option on the same command line. Loopmark information will not be displayed if the -d B option has been specified.
o: Show all options used by the compiler during compilation.
s: Lists source code.
T: Retains file.T after processing rather than deleting it. The file.T can be used to call ftnlx directly. For more information, see the ftnlx(1) man page.
x: Produces a cross-reference listing.
-h loop_trips=[tiny|small|medium|large|huge]
Specifies runtime loop trip counts for all loops in a compiled source file. This information is used to better tune optimizations to the runtime characteristics of the application.
-h map_long_double_to_real16
Maps the C_LONG_DOUBLE kind from the ISO_C_BINDING module to a 128-bit REAL. This allows interoperability with what would otherwise be an incompatible 80-bit extended precision C format. The corresponding C files must be compiled with the -mlong-double-128 option.
-h [no]modinline
Directs the compiler to create templates for module procedures and store them in file.o, MODULENAME.mod, or modulename.mod. These templates are used by IPA to inline or clone a routine. A USE statement makes the templates available to IPA.
Default: modinline
-h [no]msgs
Controls whether messages describing optimizations performed are written to stderr. Similar information in a more readable format can be obtained by using the -rm option instead.
Default: nomsgs
-h [no]negmsgs
Controls whether messages explaining why optimizations such as vectorization or inlining did not occur are written to stderr. The -h negmsgs option enables the -h msgs option. The -rm option enables the -h negmsgs option.
Default: nonegmsgs
-h network=nic
Specifies the target machine’s interconnection attribute. The supported values are gemini and aries.
-h [no]omp
Enable or disable compiler recognition of OpenMP directives. Using -h noomp is similar to the -h thread0 option in that it disables OpenMP, but unlike -h thread0, -h noomp does not affect autothreading. CrayPE will link in the serial version of LibSci when -hnoomp is used. -fopenmp is a synonym for -h omp.
Default: noomp
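Because -h noomp makes the compiler ignore OpenMP directives, code written with them should still compute the correct result serially. The minimal sketch below (hypothetical names) uses a standard OpenMP reduction together with the !$ conditional-compilation sentinel, so the omp_get_max_threads call is compiled only when OpenMP is recognized (-h omp or -fopenmp), and the routine reports one thread otherwise. Integer data is used so the result does not depend on how the reduction is grouped.

```fortran
! Hypothetical illustration module; the names are not part of CCE.
module omp_demo
  !$ use omp_lib, only: omp_get_max_threads
  implicit none
contains
  ! Parallel sum of squares; with OpenMP disabled the directives are
  ! treated as comments and the loop runs serially.
  function sum_of_squares(n) result(s)
    integer, intent(in) :: n
    integer :: s, i
    s = 0
    !$omp parallel do reduction(+:s)
    do i = 1, n
      s = s + i*i
    end do
    !$omp end parallel do
  end function sum_of_squares

  ! Threads OpenMP would use, or 1 when OpenMP is not enabled.
  function max_threads() result(nt)
    integer :: nt
    nt = 1
    !$ nt = omp_get_max_threads()
  end function max_threads
end module omp_demo
```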
-h [no]omp_simd
Enable or disable compiler recognition of OpenMP SIMD directives. This option may be enabled (-h omp_simd or -fopenmp-simd) when general OpenMP is disabled (-h noomp or -fno-openmp), allowing the compiler to take advantage of omp simd constructs for CPU vectorization without enabling CPU threading for OMP parallel constructs. This option may not be disabled (-h noomp_simd or -fno-openmp-simd) when general OpenMP is enabled (-h omp or -fopenmp). Specifying -O0 with OpenMP disabled (-h noomp or -fno-openmp) disables OpenMP SIMD recognition (-h noomp_simd or -fno-openmp-simd). This option is an alias for -f[no-]openmp-simd.
Default: omp_simd
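With the default -h omp_simd, a loop such as the one in this minimal sketch (hypothetical names) can be vectorized according to its !$omp simd directive even when the rest of OpenMP is disabled with -h noomp; no threads are created. The reduction clause permits partial sums to be combined in any order, which for general floating point data can change the last bits of the result; the small integer-valued inputs in the check below are summed exactly in any order.

```fortran
! Hypothetical illustration module; the names are not part of CCE.
module simd_demo
  implicit none
contains
  ! Dot product with an OpenMP SIMD reduction. Compilers that do not
  ! recognize the directive treat it as a comment and run the loop serially.
  function dot(x, y) result(s)
    real, intent(in) :: x(:), y(:)
    real :: s
    integer :: i
    s = 0.0
    !$omp simd reduction(+:s)
    do i = 1, min(size(x), size(y))
      s = s + x(i) * y(i)
    end do
  end function dot
end module simd_demo
```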
-h [no]omp_trace
Enable or disable the insertion of CrayPat OpenMP tracing calls.
Default: noomp_trace
-h [no]overindex
Declares that the program contains an array subscript, applied to a multidimensional array, that exceeds the declared bounds of its dimension. Such a subscript must still result in an access within the same multidimensional array object.
Default: nooverindex
-h page_align_allocate
The -h page_align_allocate option directs the compiler to force allocations of arrays larger than the memory page size to be aligned on a page boundary. This option affects only the ALLOCATE statements of the current source file; therefore, it must be specified for each source file where this behavior is desired. Using this option can improve DIRECTIO performance.
-h [no]pattern
Enables pattern matching for library substitution. The pattern matching feature searches your code for specific code patterns and replaces them with calls to highly optimized routines.
The -h pattern option is enabled only for optimization levels -O2, -h vector2, or higher; there is no way to force pattern matching at lower levels. Specifying -h nopattern disables pattern matching and causes the compiler to ignore the PATTERN and NOPATTERN directives.
Default: pattern
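The minimal sketch below (hypothetical names) is a hand-written matrix multiply loop nest, the kind of code pattern matching is intended to recognize. This section does not list which patterns are recognized, so treat a library substitution for this loop nest as possible rather than guaranteed; the check compares the result against the MATMUL intrinsic.

```fortran
! Hypothetical illustration module; the names are not part of CCE.
module pattern_demo
  implicit none
contains
  ! Naive triple loop computing the matrix product c = a b.
  ! The innermost loop runs down columns (Fortran column-major order).
  subroutine naive_matmul(a, b, c)
    real, intent(in) :: a(:,:), b(:,:)
    real, intent(out) :: c(:,:)
    integer :: i, j, k
    c = 0.0
    do j = 1, size(b, 2)
      do k = 1, size(a, 2)
        do i = 1, size(a, 1)
          c(i, j) = c(i, j) + a(i, k) * b(k, j)
        end do
      end do
    end do
  end subroutine naive_matmul
end module pattern_demo
```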
-h [no]pgas_runtime
The -h pgas_runtime option directs the compiler driver to link with the runtime libraries required when linking programs that use UPC or coarrays. In general, a resource manager job launcher such as aprun or srun must be used to launch the resulting executable.
The -h nopgas_runtime option prevents this runtime library environment from being added to the link line. Use the -hnopgas_runtime option when you have a program that does not use UPC or coarrays and you want to execute it outside of the aprun/srun job launch context. For example, you may want to test a serial program that does not contain any UPC or coarray code on a login or service node, or fork/exec an executable on a compute node. Also, compile non-coarray Fortran using the -hnocaf option.
PGAS behavior is determined by the number of physical cores on the node. For more information, see the intro_pgas(7) man page.
Default: pgas_runtime
-h pic|PIC
Generates position independent code (PIC), which allows a virtual address change from one process to another, as is necessary in the case of shared, dynamically linked objects. The virtual addresses of the instructions and data in PIC code are not known until dynamic link time.
-h pl=program_library
Create and use a persistent repository of compiler information specified by program_library. When used with -h wp, this option provides application-wide, cross-file, automatic inlining. The -f cray-program-library-path=program_library option is provided as an alias for -h pl=program_library to match the HPE CCE C/C++ compiler option.
The program_library repository is implemented as a directory, and the information contained in program_library is built up with each compiler invocation. Any compilation that does not have the -h pl option will not add information to this repository.
Because of the persistence of program_library, it is the user’s responsibility to manage it. For example, rm -r program_library might be added to the clean target (make clean) in an application makefile. Because program_library is a directory, use rm -r to remove it.
If an application makefile works by creating files in multiple directories during a single build, program_library should be an absolute path; otherwise, multiple and incomplete program library repositories will be created. For example, avoid -hpl=./PL.1 and use -hpl=/fullpath/builddir/PL.1 instead.
-h [no]preferred_vector_width[=64|128|256|512]
Specify the preferred vector width to use when vectorizing loops. This option does not guarantee that the specified vector width will be used, only that it is preferred. The optimizer is still free to choose a smaller width if it is expected to perform better. As the set of acceptable widths is target-sensitive and fairly complicated, the optimizer diagnoses any illegal values.
A value is not required when specifying nopreferred_vector_width.
-h profile_generate
Enables instrumenting of source code for CrayPat profile-guided optimization. For more information, see the intro_craypat(1) and pat_build(1) man pages.
-h scalarn
Specifies the level of scalar optimization, where n can be one of the following levels:
0: Disables scalar optimization.
1: Specifies conservative scalar optimization.
2: Specifies moderate scalar optimization.
3: Specifies aggressive scalar optimization.
Default: scalar2
-h shared
Create a library which may be dynamically linked at runtime. Note that the preferred invocation is to call the generic ftn command with the -shared option rather than using this compiler-specific option. See the ftn(1) man page for more information.
-h shortcircuitn
Specifies the level of short circuit evaluation, an optimization in which the compiler evaluates only as much of a logical expression as is needed to determine its value. When enabled, the compiler attempts short circuit evaluation of the scalar logical expressions used in IF statements. This evaluation is performed on the .AND. and .OR. operators. n can be one of the following levels:
0: Disables short circuiting of IF and ELSEIF statement logical conditions.
1: Specifies short circuiting of IF and ELSEIF logical conditions only when a PRESENT, ALLOCATED, or ASSOCIATED intrinsic procedure is in the condition.
2: Specifies short circuiting of IF and ELSEIF logical conditions, performed left to right. This is the default for architectures without predicated vector support.
3: Specifies short circuiting of IF and ELSEIF logical conditions, in an attempt to avoid making function calls. If either the left or right operand of the .AND. or .OR. operator contains function calls, short circuit evaluation is performed. This is the default for architectures with predicated vector support.
-h static
Directs the linker to use the static version of the libraries, not the dynamic version, to create an executable file. Note that the preferred invocation is to call the generic ftn command with the -static option. See the ftn(1) man page for more information.
-h [no]safe_addr
Provides assurance that most conditionally executed memory references are thread safe, which in turn supports a more aggressive use of speculative writes, thereby improving application performance. If -h nosafe_addr is specified, the optimizer performs speculative stores only when it can prove absolute thread safety using the information available within the application code.
Default: safe_addr
-h [no]summary
-h nosummary stops the log summary from printing out when any Warnings, Comments, Notes, or Optimization messages are issued. If Errors are issued, a log summary will always print out. If both -V[V[V]] and -h nosummary are specified, the last one specified wins.
Default: summary
-h threadn
Control the compilation and optimization of OpenMP directives, where n is a value from 0 to 3, with 0 being off and 3 specifying the most aggressive optimization. This option is identical to the -O threadn option. If -h thread1 is specified, it is equivalent to specifying -h nosafe_addr.
Default: thread2
-h [no]thread_do_concurrent
The -h thread_do_concurrent option permits DO CONCURRENT nests to be threaded unless prohibited by the loop_info prefer_nothread directive.
The -h nothread_do_concurrent option prevents DO CONCURRENT nests from being threaded unless threading is forced by the loop_info prefer_thread directive.
Default: nothread_do_concurrent
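The minimal sketch below (hypothetical names) is a DO CONCURRENT nest of the kind these options govern: its iterations are independent, so -h thread_do_concurrent permits the compiler to run it across threads, while under the default it is not threaded. The computed result is the same either way.

```fortran
! Hypothetical illustration module; the names are not part of CCE.
module dc_demo
  implicit none
contains
  ! Scale every element of a matrix. Each (i, j) iteration is independent,
  ! which is what DO CONCURRENT requires of the programmer.
  subroutine scale_matrix(a, factor)
    real, intent(inout) :: a(:,:)
    real, intent(in) :: factor
    integer :: i, j
    do concurrent (j = 1:size(a, 2), i = 1:size(a, 1))
      a(i, j) = factor * a(i, j)
    end do
  end subroutine scale_matrix
end module dc_demo
```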
-h [no]offload_do_concurrent
The -h offload_do_concurrent option permits DO CONCURRENT nests to be offloaded to a GPU unless prohibited by the loop_info prefer_nothread_do_concurrent directive.
Default: nooffload_do_concurrent
-h unrolln
The -h unrolln option globally controls loop unrolling and changes the assertiveness of the UNROLL directive. By default, the compiler attempts to unroll all loops, unless the NOUNROLL directive is specified for a loop. Generally, unrolling loops increases single processor performance at the cost of increased compile time and code size.
The n argument enables you to turn loop unrolling on or off and determine where unrolling should occur. It also affects the assertiveness of the UNROLL directive. Use one of these values for n:
0: No unrolling. (Ignore all UNROLL directives and do not attempt to unroll other loops.)
1: Attempt to unroll loops that are marked by the UNROLL directive.
2: Attempt to unroll all loops (includes array syntax implied loops), except those marked with the NOUNROLL directive.
Default: unroll2
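For readers unfamiliar with the transformation, the minimal sketch below (hypothetical names) shows by hand what unrolling a summation loop by a factor of 4 looks like: the body is replicated so that fewer iterations and loop tests are executed, and a remainder loop handles trip counts that are not a multiple of 4. The additions stay in their original order, so the result matches the rolled loop. The compiler performs this transformation itself; you would not normally write it by hand.

```fortran
! Hypothetical illustration module; the names are not part of CCE.
module unroll_demo
  implicit none
contains
  ! Rolled loop: one element per iteration.
  function sum_rolled(x) result(s)
    integer, intent(in) :: x(:)
    integer :: s, i
    s = 0
    do i = 1, size(x)
      s = s + x(i)
    end do
  end function sum_rolled

  ! The same loop unrolled by 4, followed by a remainder loop for the
  ! last mod(size(x), 4) elements. Additions keep their original order.
  function sum_unrolled4(x) result(s)
    integer, intent(in) :: x(:)
    integer :: s, i, n4
    s = 0
    n4 = size(x) - mod(size(x), 4)
    do i = 1, n4, 4
      s = s + x(i)
      s = s + x(i+1)
      s = s + x(i+2)
      s = s + x(i+3)
    end do
    do i = n4 + 1, size(x)
      s = s + x(i)
    end do
  end function sum_unrolled4
end module unroll_demo
```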
-h vectorn
Specifies the level of automatic vectorizing to be performed. Vectorization can result in dramatic performance improvements with a small increase in object code size. Vectorization directives are unaffected by this option.
The valid values for n are:
0: Minimal automatic vectorization. Characteristics include low compile time and small code size. This option is compatible with all scalar optimization levels. The compiler will still vectorize array syntax in order to allow full source level debugging with reasonable performance. When this option is specified in conjunction with -hfp0 or -hfp1, array syntax containing associative floating point or complex operations will not be vectorized.
1: Conservative vectorization. The -h vector1 option is compatible with -h scalar1, -h scalar2, and -h scalar3.
2: Moderate vectorization. Loop nests are restructured. The -h vector2 option is compatible with -h scalar2 or -h scalar3.
3: Aggressive vectorization.
Default: vector2
-h vector_classic
Prior to CCE 9.0, the Fortran NOVECTOR directive applied to the rest of the program unit, unless subsequently superseded by a VECTOR directive. Beginning with CCE 9.0, the VECTOR and NOVECTOR directives apply only to the next loop.
The -h vector_classic option, if specified, provides the pre-CCE 9.0 behavior and causes the VECTOR and NOVECTOR directives to behave as toggle switches, controlling vectorization for the remainder of the program unit unless superseded by the countervailing directive.
Default: not set
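The difference is easiest to see with two consecutive loops, as in this minimal sketch (hypothetical names; the directive is written as !dir$ novector, assuming the same !dir$ spelling this section uses for other directives). Under the default behavior, NOVECTOR affects only the first loop and the second loop remains eligible for vectorization; with -h vector_classic, both loops are affected until a VECTOR directive appears. Compilers that do not recognize the directive treat it as a comment, and the computed results are the same in every case.

```fortran
! Hypothetical illustration module; the names are not part of CCE.
module novector_demo
  implicit none
contains
  subroutine two_updates(x, y)
    real, intent(inout) :: x(:), y(:)
    integer :: i
    ! CCE 9.0 and later: the directive below applies only to the next loop.
    !dir$ novector
    do i = 1, size(x)
      x(i) = x(i) + 1.0
    end do
    ! With -h vector_classic, NOVECTOR is still in force for this loop,
    ! because it acts as a toggle for the rest of the program unit.
    do i = 1, size(y)
      y(i) = 2.0 * y(i)
    end do
  end subroutine two_updates
end module novector_demo
```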
-h wp
Enables whole program mode. This option causes the compiler backend (IPA, optimizer, code generator) to be invoked at application link time, enabling whole program automatic inlining/cloning and future whole program interprocedural analysis (IPA) optimizations. Requires that -h pl=program_library is also specified. The options -h pl=program_library and -h wp should be specified on all compiler invocations and on the compiler link invocation.
Since the -h wp option provides automatic application-wide inlining, the -Oipafrom option is no longer needed for cross-file inlining, and using these two options together is not permitted.
Since -h wp delays the compiler optimization step until link time, -c compiles will take less time and the link step will take longer. Normally, this is just a time shift from one build phase to another with roughly the same overall compile time. In some cases increased inlining may cause an increase in overall compile time. Using -h wp allows the compiler backend to be invoked in parallel during a build. Setting the environment variable NPROC controls the number of concurrent compiler backend invocations, and this parallelism may reduce overall compile time.
-h ipan guides the heuristics of inlining/cloning expansion, while the specification of -h pl=program_library and -h wp guides the location and availability of the candidates for expansion.
Default: not set
-h zero
Initializes all undefined local stack variables to 0 (zero). If a user variable is of type character, it is initialized to NUL. The variables are initialized upon each execution of the procedure. This option is identical to the -e0 option.
Default: not set
-h [no]zeroinc
Causes the compiler to assume that a constant increment variable (CIV) can be incremented by zero. A CIV is a variable that is incremented only by a loop invariant value. For example, in a loop containing the statement J = J + K, where K is loop invariant and can be equal to zero, J is a CIV. -h zeroinc can cause less strength reduction to occur in loops that have variable increments.
Default: nozeroinc
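The minimal sketch below (hypothetical names) contains exactly this situation: J advances by the loop-invariant K, which the caller may set to zero. The default, nozeroinc, lets the compiler assume such an increment is nonzero; when calls like the second one in the usage note below are possible, -h zeroinc removes that assumption at the cost of some strength reduction.

```fortran
! Hypothetical illustration module; the names are not part of CCE.
module civ_demo
  implicit none
contains
  ! Sums n elements of x starting at index j0 with stride k.
  ! j is a constant increment variable (CIV): it changes only by the
  ! loop-invariant amount k on each iteration, and k may be zero.
  function strided_sum(x, j0, k, n) result(s)
    real, intent(in) :: x(:)
    integer, intent(in) :: j0, k, n
    real :: s
    integer :: i, j
    s = 0.0
    j = j0
    do i = 1, n
      s = s + x(j)
      j = j + k
    end do
  end function strided_sum
end module civ_demo
```

For x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0], strided_sum(x, 1, 2, 3) is 9.0 (elements 1, 3, and 5) and strided_sum(x, 2, 0, 4) is 8.0 (element 2 four times).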
-I incldir
Specifies a directory to be searched for files named in INCLUDE lines and #include directives. You must specify an -I option for each directory you want searched. Directories can be specified in incldir as full pathnames or as pathnames relative to the working directory. If no -I option is specified, only the working directory and system directories are searched.
-J dir_name
Specifies an alternate directory for the module information files. The compiler puts the .mod files in this directory and searches for .mod files in this directory. The compiler automatically searches for modules stored in the directories specified using the -J dir_name option for the current compilation; it is not necessary to use the -p option explicitly to make the compiler do this. -J is not allowed with -dm.
By default, the files are written to the current working directory.
-K trap=opt[,opt]
Enable traps for the specified exceptions. By default, no exceptions are trapped. Enabling traps using this option also has the effect of setting -h fp_trap.
If the specified options contradict each other, the last option has priority. For example, -Ktrap=none,fp is equivalent to -Ktrap=fp.
This option does not affect compile time optimizations; it detects runtime exceptions. This option is processed only at link time and affects the entire program; it is not processed when compiling subprograms. Therefore, traps may be set using this command line option at the beginning of execution of the main program only. The program may subsequently change these settings by calling intrinsic or library procedures. Use of this option may require the specification of -hfp_trap when compiling other files of the application.
The valid values for opt are:
denorm: Trap on denormalized operands.
divz: Trap on divide-by-zero.
fp: Trap on divz, inv, or ovf exceptions.
inexact: Trap on inexact result (i.e., a rounded result). Enabling traps for inexact results is not recommended.
inv: Trap on invalid operation.
none: Disables all traps (default).
ovf: Trap on overflow (i.e., the result of an operation is too large to be represented).
unf: Trap on underflow (i.e., the result of an operation is too small to be represented).
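As noted above, a program may change its trap settings after startup by calling intrinsic procedures. The minimal sketch below (hypothetical module and function names) uses the standard IEEE_EXCEPTIONS intrinsic module to turn off halting on divide-by-zero temporarily, so that a division by zero yields an infinity even in a program linked with -K trap=divz, and then restores the previous mode. Whether halting can be controlled is processor dependent, hence the IEEE_SUPPORT_HALTING checks.

```fortran
! Hypothetical illustration module; the names are not part of CCE.
module trap_control
  use, intrinsic :: ieee_exceptions
  implicit none
contains
  ! Divide x by y with halting on divide-by-zero temporarily disabled,
  ! so that y == 0.0 produces an IEEE infinity instead of a trap.
  function nonstop_divide(x, y) result(q)
    real, intent(in) :: x, y
    real :: q
    logical :: was_halting
    call ieee_get_halting_mode(ieee_divide_by_zero, was_halting)
    if (ieee_support_halting(ieee_divide_by_zero)) then
      call ieee_set_halting_mode(ieee_divide_by_zero, .false.)
    end if
    q = x / y
    ! Restore the halting mode that was in effect on entry.
    if (ieee_support_halting(ieee_divide_by_zero)) then
      call ieee_set_halting_mode(ieee_divide_by_zero, was_halting)
    end if
  end function nonstop_divide
end module trap_control
```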
-l libname
Directs the compiler driver to search for the specified object library file when linking an executable. To request more than one library file, specify multiple -l options.
When statically linking, the compiler driver searches for libraries by prepending ldir/lib to libname and appending .a to it, for each ldir that has been specified by using the -L option. It uses the first file it finds.
When dynamically linking, the library search process is similar to the static case, with a few differences. The compiler driver searches for libraries by prepending ldir/lib to libname and appending .so to it, for each ldir that has been specified by using the -L option. If a matching .so is not found, the compiler driver replaces .so with .a and repeats the process from the beginning. It uses the first file it finds.
There is no search order dependency for libraries.
If you specify personal libraries by using the -l command line option, those libraries are added before the default HPE CCE library list. For example, when the following command line is issued, the linker looks for a library named libmylib.a (following the naming convention) and adds it to the top of the list of default libraries:
% ftn -l mylib target.f
-L ldir
Changes the -l option search algorithm to look for library files in directory ldir. To request more than one library directory, specify multiple -L options.
Note that multiple -L options are treated cumulatively, as if all ldir arguments appeared on one -L option preceding all -l options. Therefore, do not attempt to link functions of the same name from different libraries through the use of alternating -L and -l options.
The compiler driver searches for library files in directory ldir before searching the default directories /opt/ctl/libs and /lib. For example, when statically linking, if -L ../mylib, -L /loclib, and -l m are specified, the compiler driver searches for the following files and uses the first one found:
../mylib/libm.a
/loclib/libm.a
/opt/ctl/libs/libm.a
/lib/libm.a
-m msg_lvl
Specifies the minimum compiler message levels to enable. The following list shows the integers to specify in order to generate each type of message and which messages are generated by default. Use the explain(1) command to view a message explanation.
The -m message types are as follows:
0: Error, Warning, Caution, Note, and Comment
1: Error, Warning, Caution, and Note
2: Error, Warning, and Caution
3: Error and Warning (default)
4: Error
-M msgs
The -M msgs option suppresses messages at the Warning, Caution, Note, and Comment levels and can change the default message severity to an Error or a Warning level. You cannot suppress or alter the severity of Error-level messages with this option.
To suppress messages, specify one or more integer numbers that correspond to the HPE Cray Fortran Compiler messages you want to suppress. To specify more than one message number, specify a comma (but no spaces) between the message numbers. For example, -M 110,300 suppresses messages 110 and 300.
To change a message’s severity to an Error level or a Warning level, specify an E (for Error) or a W (for Warning) followed by the number of the message. For example, consider the following option: -M 300,E600,W400. This specification results in the following:
Message 300 is disabled and is not issued, provided that it is not an Error-level message by default. Error-level messages cannot be suppressed and cannot have their severity downgraded.
Message 600 is issued as an Error-level message, regardless of its default severity.
Message 400 is issued as a Warning-level message, provided that it is not an Error-level message by default.
-N col
Specifies the line width for fixed and free-format source lines. The value used for col specifies the maximum number of columns per line.
For fixed form sources, col can be set to 72, 80, 132, 255, or 1023.
For free form sources, col can be set to 132, 255, or 1023.
Characters in columns beyond the col specification are ignored.
By default, lines are 72 characters wide for fixed-format sources and have no line size limit for free-form source files.
-O opt[,opt]...
Specifies optimization features. The opt values 0, 1, 2, and 3 (fast) enable you to specify increasing general levels of optimization. The other opt values enable you to select specific optimization features. All -O options with the exception of 0, 1, 2, and 3 have corresponding -h options available.
The -O0, -O1, -O2, and -O3 (-Ofast) specifications do not directly correspond to the numeric optimization levels for scalar optimization and vectorization. For example, specifying -O3 does not necessarily enable scalar3 and vector3. HPE reserves the right to alter the specific optimizations performed at these levels from release to release. You can use the -eo option or the ftnlx command to display the optimization options used during compilation.
The valid opt values are:
-O0
Disables all optimizations, including floating point optimizations and OpenACC acceleration (equivalent to specifying -hfp0 and -hnoacc). To disable optimizations but leave acceleration enabled, specify -O0 -hacc. Some informational messages may not be issued.
-O1, -O2, -O3
-Ofast is a synonym for -O3.
Default: O2
[no]aggress
Causes the compiler to treat a subroutine, function, or main program as a single optimization region. Doing so can improve the optimization of large program units but also increases compile time and size.
Default: noaggress
[no]autoprefetch
Controls automatic prefetch optimization. Does not affect the loop_info [no]prefetch directive.
Default: autoprefetch
[no]autothread
Enables or disables autothreading.
Default: noautothread
cachen
Specify the level of automatic cache management, where n can be one of the following values:
0: Specifies no automatic cache management; all memory references are allocated to cache. Both automatic cache blocking and manual cache blocking (by use of the BLOCKABLE directive) are shut off. Characteristics include low compile time. This option is compatible with all optimization levels.
1: Specifies conservative automatic cache management. Characteristics include moderate compile time. Symbols are placed in the cache when the possibility of cache reuse exists and the predicted cache footprint of the symbol in isolation is small enough to experience reuse.
2: Specifies moderately aggressive automatic cache management. Characteristics include moderate compile time. Symbols are placed in the cache when the possibility of cache reuse exists and the predicted state of the cache model is such that the symbol will be reused.
3: Specifies aggressive automatic cache management. Characteristics include potentially high compile time. Symbols are placed in the cache when the possibility of cache reuse exists and the allocation of the symbol to the cache is predicted to increase the number of cache hits.
fast
Synonym for -O3.
fpn
Controls the level of floating point optimizations, where n is a value between 0 and 3, with 0 giving the compiler minimum freedom to optimize floating point operations and 3 giving it maximum freedom. The higher the level, the less the floating point values conform to the IEEE standard.
When -hfp0 or -hfp1 is specified, it also has the effect of setting -hfp_trap.
Default: fp2
fusionn
Controls loop fusion globally and changes the assertiveness of the FUSION directive.
Loop fusion can improve the performance of loops, although in some rare cases it may degrade overall performance.
The n argument enables you to turn loop fusion on or off and determine where fusion should occur. It also affects the assertiveness of the FUSION directive. n can be one of the following values:
0: No fusion (ignore all FUSION directives and do not attempt to fuse other loops).
1: Attempt to fuse loops that are marked by the FUSION directive.
2: Attempt to fuse all loops (includes array syntax implied loops), except those marked with the NOFUSION directive.
Default: fusion2
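As a rough illustration of what loop fusion does, the following Fortran sketch shows two adjacent loops over the same iteration space and a hand-fused version that performs both loop bodies in a single pass. The module and routine names are hypothetical; at -O fusion2 the compiler performs this kind of transformation internally, and the hand-fused routine is written out only to make the effect visible.

```fortran
module fusion_demo
  implicit none
contains
  ! Two separate loops over the same range: b is written by the first
  ! loop and read again by the second, so the arrays are swept twice.
  subroutine unfused(n, a, b, c)
    integer, intent(in) :: n
    real, intent(in)    :: a(n)
    real, intent(out)   :: b(n), c(n)
    integer :: i
    do i = 1, n
      b(i) = 2.0 * a(i)
    end do
    do i = 1, n
      c(i) = b(i) + a(i)
    end do
  end subroutine unfused

  ! The same computation with both loop bodies fused by hand into one
  ! loop: b(i) is used immediately after it is computed.
  subroutine fused(n, a, b, c)
    integer, intent(in) :: n
    real, intent(in)    :: a(n)
    real, intent(out)   :: b(n), c(n)
    integer :: i
    do i = 1, n
      b(i) = 2.0 * a(i)
      c(i) = b(i) + a(i)
    end do
  end subroutine fused
end module fusion_demo
```

Both routines produce identical results; the fused form can be faster because b(i) and a(i) are reused while they are still in registers or cache.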
ipalevel
Controls the level of interprocedural analysis (IPA), which also controls the level of automatic inlining and cloning.
-O ipalevel guides the heuristics of inlining and cloning expansion, while -O ipafrom=source, or pl=program_library together with -hwp, guides the location and availability of the candidates for expansion.
Inlining is the process of replacing a user procedure call with the procedure definition itself. This saves subprogram call overhead and may allow better optimization of the inlined code. If all calls within a loop are inlined, the loop becomes a candidate for parallelization.
Cloning is a process in which a procedure is duplicated with modifications such that it will run more efficiently. For example, the compiler will clone a procedure for a specific call site when there are constant actual arguments present in that call site. When the clone is made, the dummy arguments are replaced with the constant actual arguments, and the original call to the procedure is replaced with a call to the duplicate copy.
When -O ipalevel is used alone, the candidates for expansion are all those functions that are present in the input file to the compile step. If -O ipalevel is used in conjunction with -O ipafrom=source, or in conjunction with pl=program_library and -hwp, the candidates for expansion are those functions present in source or program_library, respectively. The valid values for level are:
0: Disable interprocedural analysis and optimizations. All inlining and cloning compiler directives are ignored.
1: Directive IPA. Inlining and cloning are attempted for call sites and routines that are under the control of a compiler directive.
2: Inlining. Inline a call site to an arbitrary depth as long as the expansion does not exceed some compiler-determined threshold. The call site must flatten for any expansion to occur. A call site is said to “flatten” when there are no calls present in the expanded code. The call site must reside within the body of a loop, and the entire loop body must flatten. A loop body is said to “flatten” when all call sites within the body of the loop are flattened. Includes level 1.
3: (Default) Constant actual argument inlining and tiny routine inlining. This includes levels 1 and 2, plus any call site that contains a constant actual argument. Additionally, any call nest (regardless of location) that is below some small compiler-determined threshold will be inlined, provided that the call nest flattens completely. Cloning directives are recognized.
4: Aggressive inlining. This includes levels 1, 2, and 3. Additionally, a call site does not have to reside in a loop body to inline, nor does the call site necessarily have to flatten.
5: Aggressive inlining and aggressive cloning. Includes levels 1, 2, 3, and 4, plus routine cloning is attempted if inlining fails at a given call site.
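The cloning transformation described above can be pictured with the following hypothetical sketch. scale_add is a general routine; scale_add_by_2 is written by hand to show what a clone for the call site call scale_add(n, 2.0, x, y) amounts to, with the dummy argument s replaced by the constant actual argument 2.0. The compiler creates and calls such clones itself; these names are illustrative assumptions, not compiler output.

```fortran
module cloning_demo
  implicit none
contains
  ! General routine: y = y + s * x
  subroutine scale_add(n, s, x, y)
    integer, intent(in) :: n
    real, intent(in)    :: s, x(n)
    real, intent(inout) :: y(n)
    integer :: i
    do i = 1, n
      y(i) = y(i) + s * x(i)
    end do
  end subroutine scale_add

  ! Hand-written equivalent of a clone for "call scale_add(n, 2.0, x, y)":
  ! the dummy argument s has been replaced by the constant 2.0, which
  ! lets the optimizer treat the multiplier as a known value.
  subroutine scale_add_by_2(n, x, y)
    integer, intent(in) :: n
    real, intent(in)    :: x(n)
    real, intent(inout) :: y(n)
    integer :: i
    do i = 1, n
      y(i) = y(i) + 2.0 * x(i)
    end do
  end subroutine scale_add_by_2
end module cloning_demo
```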
ipafrom=source[:source]...
Explicitly indicate the procedures to consider for inlining/cloning.
The source arguments identify each file or directory that contains the functions to consider for inlining/cloning. Whenever a call is encountered in the input program that matches a function in source, inlining/cloning is attempted for that call site.
Note that blank spaces are not allowed on either side of the equal sign.
All inlining directives are recognized with explicit inlining.
Note that routines in source are not actually linked with the final program. They are simply templates for the inliner. To have a routine contained in source linked with the program, you must include it in an input file to the compilation.
The valid source arguments are:
Fortran source files: The routines in Fortran source files are candidates for inline expansion and must contain error-free code. Source files that are acceptable for inlining are files that have one of the following extensions: .f, .F, .for, .FOR, .f90, .F90, .f95, .F95, .f03, .F03, .f08, .F08, .f18, .F18, .ftn, or .FTN.
Module files: MODULENAME.mod and modulename.mod files that contain precompiled inlining templates can be specified. However, this is unnecessary, as the compiler will find these files when resolving the USE statement during compilation.
Directories: A directory containing any of the Fortran source or module files described above.
loop_trips=[tiny|small|medium|large|huge]
Specifies runtime loop trip counts for all loops in a compiled source file. This information is used to tune optimizations to the runtime characteristics of the application.
Default: none
[no]msgs
Causes the compiler to write optimization messages to stderr.
Similar information in a more readable format can be obtained by using the -h list=m (-rm) option instead. Specifying the -h list=m option enables -h msgs.
Default: nomsgs
[no]negmsgs
Causes the compiler to generate messages to stderr to indicate why optimizations such as vectorization or inlining did not occur in a given instance.
The -O negmsgs option enables the -O msgs option. The -rm option enables the -O negmsgs option.
Default: nonegmsgs
nointerchange
Inhibits the compiler’s attempts to interchange loops. Loop interchange, in which the compiler exchanges an inner loop with an outer loop, can increase performance. The compiler performs this optimization by default. Specifying the -O nointerchange option is equivalent to specifying a NOINTERCHANGE directive prior to every loop. To disable loop interchange on individual loops, use the NOINTERCHANGE directive.
[no]omp
Enables or disables compiler recognition of OpenMP directives. Using -O noomp is similar to the -O thread0 option, in that it disables OpenMP, but unlike -O thread0, noomp does not affect autothreading. The -O [no]omp option is identical to the -h [no]omp option.
Default: noomp
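Because OpenMP directives (!$omp) and conditional compilation lines (!$) are written as Fortran comments, a compiler that does not process them simply skips them. The hypothetical module below is written so that it gives the same answer either way: with OpenMP recognized, the loop is divided among threads and the partial sums are combined by a REDUCTION clause; without it, the loop runs serially. Whether -O noomp also suppresses !$ conditional compilation lines is not stated in this section; the -x conditional_omp option disables them explicitly.

```fortran
module omp_demo
  implicit none
contains
  ! Sum of 1..n. When OpenMP directives are recognized, the iterations are
  ! divided among threads and REDUCTION combines the partial sums;
  ! otherwise the directive lines are comments and the loop is serial.
  integer function sum_to(n)
    integer, intent(in) :: n
    integer :: i, total
    total = 0
    !$omp parallel do reduction(+:total)
    do i = 1, n
      total = total + i
    end do
    !$omp end parallel do
    sum_to = total
  end function sum_to

  ! The "!$" lines are compiled only when OpenMP is enabled, so this
  ! function returns 1 in a build without OpenMP.
  integer function max_threads()
    !$ use omp_lib, only: omp_get_max_threads
    max_threads = 1
    !$ max_threads = omp_get_max_threads()
  end function max_threads
end module omp_demo
```

Integer reductions like this one give the same result regardless of how the iterations are split among threads.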
[no]overindex
Declares that the program contains an array subscript, applied to a multidimensional array, that exceeds the declared bounds of its dimension. Such a subscript must still result in an access to the same multidimensional array object.
Default: nooverindex
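The following hypothetical sketch shows the kind of subscript that -O overindex is meant to cover. The routine sums every element of a two-dimensional array using only the first subscript, which runs past the declared bound n of the first dimension but still stays inside the array object, because Fortran stores arrays in column-major order. Code like this does not conform to the Fortran standard and would likely be reported by -R b bounds checking; it is shown only to illustrate the pattern, not as recommended practice.

```fortran
module overindex_demo
  implicit none
contains
  ! Sums all n*m elements of a(n,m) by running the FIRST subscript from
  ! 1 to n*m. For i > n the subscript exceeds the declared bound of the
  ! first dimension, but a(i,1) still addresses an element of a, because
  ! column-major storage places a(1,2) immediately after a(n,1).
  ! Non-conforming: shown only to illustrate what -O overindex declares.
  real function overindexed_sum(n, m, a)
    integer, intent(in) :: n, m
    real, intent(in)    :: a(n, m)
    integer :: i
    overindexed_sum = 0.0
    do i = 1, n * m
      overindexed_sum = overindexed_sum + a(i, 1)
    end do
  end function overindexed_sum
end module overindex_demo
```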
[no]pattern
Enables pattern matching for library substitution. The pattern matching feature searches your code for specific code patterns and replaces them with calls to highly optimized routines.
The -O pattern option is enabled only for optimization levels -O2, -O vector2, or higher; there is no way to force pattern matching for lower levels.
Specifying -O nopattern disables pattern matching and causes the compiler to ignore the PATTERN and NOPATTERN directives.
Default: pattern
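As an illustration only, a hand-written matrix-multiply loop nest such as the one in the following hypothetical routine is a typical example of a code pattern for which a highly optimized library routine exists. Which patterns the compiler actually recognizes is not listed in this section, so treat this as an assumption; the standard MATMUL intrinsic expresses the same computation directly.

```fortran
module pattern_demo
  implicit none
contains
  ! Textbook triple loop computing c = a * b for square matrices.
  ! The j, k, i loop order walks a and c down their columns, which
  ! matches Fortran's column-major storage.
  subroutine naive_matmul(n, a, b, c)
    integer, intent(in) :: n
    real, intent(in)    :: a(n, n), b(n, n)
    real, intent(out)   :: c(n, n)
    integer :: i, j, k
    c = 0.0
    do j = 1, n
      do k = 1, n
        do i = 1, n
          c(i, j) = c(i, j) + a(i, k) * b(k, j)
        end do
      end do
    end do
  end subroutine naive_matmul
end module pattern_demo
```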
scalarn
Specifies the level of scalar optimization, where n can be one of the following levels:
0: Disables scalar optimization.
1: Specifies conservative scalar optimization.
2: (Default) Specifies moderate scalar optimization.
3: Specifies aggressive scalar optimization.
shortcircuitn
Specifies various levels of short circuit evaluation, an optimization in which the compiler evaluates all or only part of a logical expression, depending on the result of evaluating part of it first. When enabled, the compiler attempts short circuit evaluation of logical expressions that are used in IF statement scalar logical expressions. This evaluation is performed on the .AND. and .OR. operators. The valid values for n are:
0: Disables short circuiting of IF and ELSEIF statement logical conditions.
1: Specifies short circuiting of IF and ELSEIF logical conditions only when a PRESENT, ALLOCATED, or ASSOCIATED intrinsic procedure is in the condition.
2: Specifies short circuiting of IF and ELSEIF logical conditions, performed left to right. This is the default for x86-64 systems.
3: Specifies short circuiting of IF and ELSEIF logical conditions in an attempt to avoid making function calls. If either the left or right operand of an .AND. or .OR. operator contains a function call, short circuit evaluation is performed. This is the default for target CPUs other than x86-64.
[no]safe_addr
Provides assurance that most conditionally executed memory references are thread safe, which in turn supports a more aggressive use of speculative writes, thereby improving application performance. If -O nosafe_addr is specified, the optimizer performs speculative stores only when it can prove absolute thread safety using the information available within the application code.
Default: -O safe_addr
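A speculative store can be pictured with the following hypothetical sketch. In the first routine, b(i) is written only when the condition holds. An optimizer allowed to store speculatively may instead write b(i) on every iteration, keeping the old value where the condition is false, which is what the second routine does by hand. The two are equivalent in a single thread, but the second writes memory that the original code never touches, which is safe only if no other thread writes the same elements concurrently; that is the assurance -O safe_addr provides.

```fortran
module safe_addr_demo
  implicit none
contains
  ! Conditional store: b(i) is written only where a(i) > 0.
  subroutine conditional_store(n, a, b)
    integer, intent(in) :: n
    real, intent(in)    :: a(n)
    real, intent(inout) :: b(n)
    integer :: i
    do i = 1, n
      if (a(i) > 0.0) b(i) = a(i)
    end do
  end subroutine conditional_store

  ! Hand-written equivalent of a speculative store: every b(i) is
  ! written, with its old value kept where the condition is false.
  ! Same result in one thread, but it stores to every element.
  subroutine speculative_store(n, a, b)
    integer, intent(in) :: n
    real, intent(in)    :: a(n)
    real, intent(inout) :: b(n)
    integer :: i
    do i = 1, n
      b(i) = merge(a(i), b(i), a(i) > 0.0)
    end do
  end subroutine speculative_store
end module safe_addr_demo
```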
threadn
Controls the compilation and optimization of OpenMP directives, where n is a value from 0 to 3, with 0 being off and 3 specifying the most aggressive optimization.
Default: -O thread2
The valid values for n are:
0: No autothreading or OpenMP threading. The -O thread0 option is similar to -O noomp, but -O noomp disables OpenMP only and does not affect autothreading.
1: Specifies strict compliance with the OpenMP standard for directive compilation. Strict compliance is defined as no extra optimizations in or around OpenMP constructs. In other words, the compiler performs only the requested optimizations. Specifying -O thread1 is equivalent to also specifying -O nosafe_addr.
2: OpenMP parallel regions are subjected to some optimizations; that is, some parallel region expansion. Parallel region expansion is an optimization that merges two adjacent parallel regions in a compilation unit into a single parallel region.
3: Full optimization: loop restructuring, including modifying the iteration space for static schedules (breaking standard compliance). Reduction results may not be repeatable.
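Reduction results can differ between runs because floating-point addition is not associative, and a parallel reduction combines partial sums in an order that depends on how iterations are divided among threads. The following hypothetical sketch mimics two threads by summing each half of an array separately before combining the halves.

```fortran
module reduction_order_demo
  use, intrinsic :: iso_fortran_env, only: real64
  implicit none
contains
  ! Left-to-right sum, as a serial loop computes it.
  function sum_serial(x) result(s)
    real(real64), intent(in) :: x(:)
    real(real64) :: s
    integer :: i
    s = 0.0_real64
    do i = 1, size(x)
      s = s + x(i)
    end do
  end function sum_serial

  ! Two partial sums (as two threads might compute them), then combined.
  function sum_two_halves(x) result(s)
    real(real64), intent(in) :: x(:)
    real(real64) :: s
    integer :: mid
    mid = size(x) / 2
    s = sum_serial(x(1:mid)) + sum_serial(x(mid+1:))
  end function sum_two_halves
end module reduction_order_demo
```

For x = [1.0e16, 0.5, -1.0e16, 0.5] in double precision, sum_serial returns 0.5 while sum_two_halves returns 0.0: adding 0.5 to a value of magnitude 1.0e16 rounds back to that value, so the two 0.5 terms are lost when each is first added to a large partial sum.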
unrolln
The -O unrolln option globally controls loop unrolling and changes the assertiveness of the UNROLL directive. By default, the compiler attempts to unroll all loops, unless the NOUNROLL directive is specified for a loop. Generally, unrolling loops increases single processor performance at the cost of increased compile time and code size.
The n argument enables you to turn loop unrolling on or off and determine where unrolling should occur. It also affects the assertiveness of the UNROLL directive.
Default: -O unroll2
The valid values for n are:
0: No unrolling. Ignore all UNROLL directives and do not attempt to unroll other loops.
1: Attempt to unroll loops that are marked by the UNROLL directive.
2: Attempt to unroll all loops (includes array syntax implied loops), except those marked with the NOUNROLL directive.
vectorn
Specifies the level of automatic vectorization to be performed. Vectorization can result in significant performance improvements with a small increase in object code size. Vectorization directives are unaffected by this option.
Default: -O vector2
The valid values for n are:
0: Minimal automatic vectorization. Characteristics include low compile time and small compile size. This option is compatible with all scalar optimization levels. The compiler will still vectorize array syntax in order to allow full source level debugging with reasonable performance. When this option is specified in conjunction with -Ofp0 or -Ofp1, array syntax containing associative floating point or complex operations will not be vectorized.
1: Conservative vectorization. The -O vector1 option is compatible with -O scalar1, -O scalar2, and -O scalar3.
2: Moderate vectorization. Loop nests are restructured. The -O vector2 option is compatible with -O scalar2 or -O scalar3.
3: Aggressive vectorization.
[no]zeroinc
Causes the compiler to assume that a constant increment variable (CIV) can be incremented by zero. A CIV is a variable that is incremented only by a loop invariant value. For example, in a loop containing the statement J = J + K, where K is loop invariant, J is a CIV; if K can be equal to zero, J is a CIV that can be incremented by zero. -O zeroinc can cause less strength reduction to occur in loops that have variable increments.
Default: nozeroinc
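The following hypothetical routine contains a constant increment variable j whose increment k is loop invariant but may be zero. When k is zero, every iteration updates the same element, so successive iterations depend on one another; when k is nonzero, each iteration touches a different element. Code in which the increment really can be zero is presumably the case -O zeroinc is meant for, since under the default (nozeroinc) the compiler does not assume a zero increment is possible.

```fortran
module zeroinc_demo
  implicit none
contains
  ! j is a constant increment variable (CIV): it changes only by the
  ! loop-invariant amount k on each iteration. If k == 0, every
  ! iteration updates the same element a(j0).
  subroutine strided_increment(n, j0, k, a)
    integer, intent(in)    :: n, j0, k
    integer, intent(inout) :: a(:)
    integer :: i, j
    j = j0
    do i = 1, n
      a(j) = a(j) + 1
      j = j + k
    end do
  end subroutine strided_increment
end module zeroinc_demo
```

With k = 0 and n = 4, a(j0) ends up incremented four times; with k = 3 and j0 = 1, elements 1, 4, 7, and 10 are each incremented once.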
-o out_file
Overrides the default executable file name, a.out, with the name specified in the out_file argument.
If both the -o out_file and -c options are specified, the link step is disabled and the binary file is written to out_file.
If both the -o out_file and -eP (preprocess only) options are specified, the preprocessed source is written to out_file.
-p module_site
Specifies where to look for Fortran modules to satisfy USE statements. The module_site argument specifies the name of a file or directory to search for modules. The module_site specified can be a .mod file, .o (object) file, .a (archive) file, or a directory.
-Q path
Specifies the directory to contain all saved nontemporary files from this compilation (for example, all .o and .mod files). Specific file types (such as .o files) are saved to a different directory if the -b, -J, -o, or -eS options are used.
By default, this option is disabled and the compiler puts all nontemporary files in the current working directory.
-r list_opt
Produces a listing file. Note that the -rd argument does not invoke the ftnlx(1) command; all other arguments do.
The -r list_opt option is equivalent to -h list=list_opt.
The valid list_opt arguments are:
a
Includes all reports in the listing (including source, cross references, options, lint, loopmarks, common block, and options used during compilation).
c
Listing includes a COMMON block report (lists all common blocks and members of each block).
d
Decompiles (translates) the intermediate representation of the compiler into listings that resemble the format of the source code. You can use these files to examine the restructuring and optimization changes made by the compiler, which can lead to insights about changes you can make to your Fortran source to improve its performance.
The compiler produces two decompilation listing files per source file specified on the command line, with the extensions .opt and .cg.
e
Expands included files in the source listing. This option is off by default.
E
Same as -re.
i
Used with the -rm option to intersperse loop optimization messages within the loopmark listing. By default, the messages are placed at the bottom of the program unit.
l
Lists source code and includes lint style checking. The listing includes the COMMON block report (see the -rc option for more information about the COMMON block report).
m
Produces a source listing with loopmark information. To provide a more complete report, this option automatically enables the -O negmsgs option to show why loops were not optimized. If you do not require this information, use the -O nonegmsgs option on the same command line. Loopmark information will not be displayed if the -dB option has been specified.
o
Shows all options used by the compiler during compilation.
s
Lists source code.
T
Retains file.T after processing rather than deleting it. The file.T file can be used to call ftnlx directly.
x
Produces a cross-reference listing.
-R runchk
Specifies any of a group of runtime checks for your program. To specify more than one type of checking, specify consecutive runchk arguments, for example, -R bs. By default, no runtime checks are performed.
The valid runchk arguments are:
b
Enables checking of array bounds. Bounds checking is not performed on arrays dimensioned as (1). Enables -O overindex.
c
Enables conformance checking of array operands in array expressions.
d
Enables a run time check for the !dir$ collapse directive and checks the validity of the loop_info count information.
p
Generates run time code to check the association or allocation status of referenced POINTER variables, ALLOCATABLE arrays, or assumed-shape arrays.
s
Enables checking of character substring bounds.
-s size
The -s size option allows you to modify the sizes of variables, literal constants, and intrinsic function results declared as type REAL, INTEGER, LOGICAL, COMPLEX, DOUBLE COMPLEX, or DOUBLE PRECISION. The valid values for size are:
byte_pointer
Applies a byte scaling factor to integers used in pointer arithmetic involving Cray pointers. That is, Cray pointers are moved on byte instead of word boundaries.
default32
Adjusts the data size of default types as follows:
32 bits: REAL, INTEGER, LOGICAL
64 bits: COMPLEX, DOUBLE PRECISION
128 bits: DOUBLE COMPLEX
The data sizes of integers and logicals that use explicit kind and star values are not affected by this option. However, they are affected by the -eh option.
default64
Adjusts the data size of default types as follows:
64 bits: REAL, INTEGER, LOGICAL
64 bits: DOUBLE PRECISION (implied -dp)
128 bits: COMPLEX
128 bits: DOUBLE COMPLEX (implied -dp)
If you use -s default64 at compile time, you must also specify this option when invoking the ftn command for linking.
integer32
Adjusts the default data size of default integers and logicals to 32 bits.
real32
Adjusts the data size of default real types as follows:
32 bits: REAL
64 bits: DOUBLE PRECISION
64 bits: COMPLEX
128 bits: DOUBLE COMPLEX
The data sizes of integers and logicals that use explicit kind and star values are not affected by this option. However, they are affected by the -eh option.
real64
Adjusts the data size of default real types as follows:
64 bits: REAL
64 bits: DOUBLE PRECISION (implied -dp)
128 bits: COMPLEX
128 bits: DOUBLE COMPLEX (implied -dp)
If you use -s default64 at compile time, you must also specify this option when invoking the ftn command for linking.
word_pointer
Applies a word scaling factor to integers used in pointer arithmetic involving Cray pointers. That is, Cray pointers are moved on word instead of byte boundaries.
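One way to confirm which default data sizes a build is using is to query them from the program itself. This hypothetical sketch uses the standard STORAGE_SIZE intrinsic (Fortran 2008), which returns sizes in bits. In a build without any -s option on a typical x86-64 system, the expected result is 32, 32, 64, 64, and 32; with -s default64, the first four values should change to 64, 64, 64, and 128 according to the table above, while the explicitly sized REAL(real32) variable should stay at 32 bits.

```fortran
module data_size_demo
  use, intrinsic :: iso_fortran_env, only: real32
  implicit none
contains
  ! Returns the storage sizes, in bits, of (1) default INTEGER,
  ! (2) default REAL, (3) DOUBLE PRECISION, (4) default COMPLEX,
  ! and (5) REAL(real32), whose kind is stated explicitly.
  function data_sizes() result(bits)
    integer          :: bits(5)
    integer          :: i
    real             :: r
    double precision :: d
    complex          :: c
    real(real32)     :: r4   ! explicit kind: not changed by default size options
    bits = [storage_size(i), storage_size(r), storage_size(d), &
            storage_size(c), storage_size(r4)]
  end function data_sizes
end module data_size_demo
```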
The default data size options (for example, -s default64) do not affect the size of data whose size is explicitly declared (for example, REAL(KIND=4) X).
REAL(KIND=16) and COMPLEX(KIND=16) support 128-bit floating point and 256-bit complex types, sometimes referred to as quad precision.
-S
Generates assembly language output and saves it in file.s. Has the same effect as -eS. By default, this option is off.
-T
Disables the compiler but displays all options currently in effect. By default, this option is off.
-U identifier[,identifier]...
This option undefines variables used for source preprocessing. This option removes the initial definition of a predefined macro or sets a user predefined macro to an undefined state.
The -D identifier[=value] option defines variables used for source preprocessing. If both -D and -U are used for the same identifier, in any order, the identifier is undefined.
This option is ignored unless one of the following conditions is true:
The Fortran input source file is specified as file.extension, where extension is one of the following: .F, .FOR, .F90, .F95, .F03, .F08, .F18, or .FTN.
The -eP or -eZ options have been specified.
-v
Prints information about each compilation phase to the standard error file (stderr). The information shows what the compiler, lister, and linker are doing and what they are calling. By default, this option is off.
-V
Directs each compilation phase to send a message containing version information to the standard error file (stderr). You can specify this option without specifying an input file name; that is, specifying ftn -V is valid. By default, this option is off.
--version
Directs each compilation phase to send a message containing version information to stdout. You can specify this option without specifying an input file name; that is, specifying ftn --version is valid. Note that -version is incorrect; it must be --version. By default, this option is off.
-W phase,"opt..."
Passes arguments directly to a phase of the compiling system.
The valid values for phase are:
0 (zero): Compiler (ftn)
a: Assembler (as)
c: CUDA linker, for OpenACC applications
l: Linker
r: Lister (ftnlx)
x: PTX assembler, for OpenACC applications
Arguments to be passed to system phases can be entered in either of two styles. If spaces appear within a string to be passed, the string is enclosed in double quotes. When double quotes are not used, spaces cannot appear in the string. Commas can appear wherever spaces normally appear; an option and its argument can be either separated by a comma or not separated. If a comma is part of an argument, it must be preceded by the \ character. For example, any of the following command lines would send -e name to the linker:
% ftn -Wl,"-e name" file.F08
% ftn -Wl,-e,name file.F08
% ftn -Wl,"-ename" file.F08
The -Wa,"assembler_opt" option passes assembler_opt directly to the as command. This option is meaningful to the system only when file.s is specified as an input file on the command line. For more information about assembler options, see the as(1) man page.
The -Wr,"lister_opt" option passes lister_opt directly to the ftnlx command. For example, specifying -Wr,"-o cfile.o" passes the argument cfile.o directly to the -o option of the ftnlx command; this directs ftnlx to override the default output listing and put the output file in cfile.o. If you specify the -Wr,"lister_opt" option, also specify a listing option (-r list_opt or the equivalent -h list=list_opt). For more information about lister options, see the ftnlx(1) man page.
The -Wl,-rpath,ldir option changes the run time library search algorithm to look for files in directory ldir. To request more than one library directory, specify multiple -rpath options. Note that a library may be found at link time with an -L option but may not be found at run time if a corresponding -rpath option was not supplied on the link line. Also note that the compiler driver does not pass the -rpath option to the linker; you must explicitly specify -Wl when using this option.
At link time, all ldir arguments are added to the executable. The dynamic linker will search these paths first for shared dynamic libraries at run time, with one exception: the Linux environment variable LD_LIBRARY_PATH precedes all other search paths for shared dynamically linked libraries. The use of LD_LIBRARY_PATH is discouraged. Caution should be used when setting LD_LIBRARY_PATH, as doing so will change the shared dynamically linked library search paths for all executable files in your environment.
The -Wx,"arg" option passes command line arguments to the PTX assembler for OpenACC applications.
The -Wc,"arg" option passes command line arguments to the CUDA linker for OpenACC applications.
-x dirlist
Disables specified directives or specified classes of directives. If specifying a multiword directive, either enclose the directive name in quotation marks or remove the spaces between the words in the directive’s name. By default, no directives or classes of directives are disabled.
dirlist can be one of the following options:
acc
All OpenACC API directives.
all
All compiler and OpenMP Fortran API directives.
dec
All !DEC$ directives.
dir
All !DIR$ directives.
directive
One or more compiler directives. If specifying more than one, separate them with commas, as follows:
-x INLINEALWAYS,"NO SIDE EFFECTS",BOUNDS
gcc
All gcc directives.
intel
All Intel directives.
ocl
All Fujitsu directives.
pgi
All PGI directives.
omp
All OpenMP Fortran API directives.
conditional_omp
All C$ and !$ conditional compilation lines.
-Y phase,dirname
Specifies a new directory (dirname) from which the designated phase should be executed. phase can be one or more of the following values:
0: Compiler (ftn)
a: Assembler (as)
--
Signifies the end of options. After this symbol, specify the files to be processed.
sourcefile [sourcefile…]
Fortran source files to be processed. Possible suffixes of sourcefile indicate the following:
.f, .for
Fixed-format source; compile.
.F, .FOR
Fixed-format source; preprocess and compile.
.f90, .f95, .f03, .f08, .f18, .ftn
Free-format source; compile.
.o
Object file; link.
.s
Assembler source; assemble.
The source form specified using the -f source_form option overrides the source form implied by the file suffixes.
If only one source file is specified on the command line, the .o file is created and deleted. To retain the .o file, use the -c option to disable the linker. You can specify object files produced by the HPE Cray Fortran, C, or C++ compilers or by the assembler. Object files are passed to the linker in the order in which they appear on the ftn command line. If the linker is disabled by the -b or -c option, no files are passed to the linker.
The source filename and path lengths are limited depending on the system. On Linux, the filename must be shorter than 250 characters. The path length can be up to 4096 characters. If the source file is a symlink, the symlinks must not exceed 40 levels.
Set Environment Variables for the HPE Cray Fortran Compiler
Environment variables are predefined shell variables, taken from the execution environment, that determine some of the shell characteristics. Several environment variables pertain to the HPE Cray Fortran compiler. The HPE Cray Fortran compiler recognizes general and multiprocessing environment variables.
The multiprocessing variables in the following sections affect the way the program will perform on multiple processors. Use environment variables to tune the system for parallel processing without rebuilding libraries or other system software.
The variables allow for controlling parallel processing at compile time and at run time. Compile time environment variables apply to all compilations in a session.
The following examples show how to set an environment variable:
With the standard shell, enter:
$ CRAY_FTN_OPTIONS=options
$ export CRAY_FTN_OPTIONS
With the C shell, enter:
% setenv CRAY_FTN_OPTIONS options
The following sections describe the environment variables recognized by the HPE Cray Fortran compiler.
Many of the environment variables described in this chapter refer to the default system locations of Programming Environment components. If the HPE Cray Fortran Compiler Programming Environment has been installed in a non-default location, see the system support staff for path information.
CRAY_FTN_OPTIONS
The CRAY_FTN_OPTIONS environment variable specifies additional options to attach to the command line. These options follow the options specified directly on the command line; file names cannot appear. The options are inserted at the rightmost portion of the command line, before the input files and binary files are listed. This allows the environment variable to be set once and have the specified set of options used in all compilations, which is especially useful for adding options to compilations done with build tools.
For example, assume that this environment variable was set as follows:
% setenv CRAY_FTN_OPTIONS -G0
With the variable set, the following two command line specifications are equivalent:
% ftn -c t.f
% ftn -c -G0 t.f
FORTRAN_MODULE_PATH
As with the HPE Cray Fortran compiler -p module_site command line option, this environment variable allows for the specification of the files or the directory to search for the modules to use. The files specified can be a .mod file, .o (object) file, .a (archive) file, or a directory. The compiler appends the contents specified by the FORTRAN_MODULE_PATH environment variable to anything specified with the -p module_site command line option.
Since the FORTRAN_MODULE_PATH environment variable can specify multiple files and directories, a colon separates each path, as shown in the following C shell example:
% setenv FORTRAN_MODULE_PATH path1:path2:path3
LISTIO_PRECISION
The LISTIO_PRECISION environment variable controls the number of digits of precision printed by list-directed output. The LISTIO_PRECISION environment variable can be set to FULL or PRECISION.
FULL prints full precision (default).
PRECISION prints x or x+1 decimal digits, where x is the value of the PRECISION intrinsic function for a given real value. This is a smaller number of digits, which usually ensures that the last decimal digit is accurate to within 1 unit. This number of digits is usually insufficient to assure that subsequent input will restore a bit-identical floating-point value.
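The round-trip caveat can be checked directly. This hypothetical sketch writes a double precision value with a chosen number of significant digits, reads it back, and reports whether the original value was recovered. For IEEE double precision, the PRECISION intrinsic returns 15, and the sum 0.1 + 0.2 does not survive a round trip at 15 or 16 significant digits; 17 significant digits are enough to restore any IEEE double precision value exactly.

```fortran
module listio_roundtrip_demo
  use, intrinsic :: iso_fortran_env, only: real64
  implicit none
contains
  ! Writes x with sig significant digits using an ES edit descriptor,
  ! reads the text back with list-directed input, and reports whether
  ! the value read back is identical to x.
  logical function round_trips(x, sig)
    real(real64), intent(in) :: x
    integer, intent(in)      :: sig
    character(len=40) :: buf, fmt
    real(real64) :: y
    ! ESw.dE3 prints one digit before the point and d = sig - 1 after it.
    write(fmt, '(a,i0,a,i0,a)') '(es', sig + 8, '.', sig - 1, 'e3)'
    write(buf, fmt) x
    read(buf, *) y
    round_trips = (y == x)
  end function round_trips
end module listio_roundtrip_demo
```

For example, with x = 0.1 + 0.2 in double precision, round_trips(x, precision(x)) is .false., while round_trips(x, 17) is .true..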
NLSPATH
The NLSPATH environment variable specifies the message system library catalog path. This environment variable affects compiler interactions with the message system. For more information about this environment variable, see the catopen(3) man page.
NPROC
The NPROC environment variable specifies the maximum number of processes to be run. Setting NPROC to a number other than 1 can speed up a compilation if machine resources permit.
The effect of NPROC is seen at compilation time, not at execution time. NPROC requests a number of compilations to be done in parallel. It affects all the compilers and also the make command.
For example, assume that NPROC is set as follows:
setenv NPROC 2
The following command is entered:
ftn -o t main.f sub.f
In this example, the compilations from .f files to .o files for main.f and sub.f happen in parallel, and when both are done, the link step is performed. If NPROC is unset, or set to 1, main.f is compiled to main.o; sub.f is compiled to sub.o, and then the link step is performed.
NPROC can be set to any value, but large values can overload the system. For debugging purposes, NPROC should be set to 1. By default, NPROC is 1.
ZERO_WIDTH_PRECISION
The ZERO_WIDTH_PRECISION environment variable controls the field width on output when the field width w of an Fw.d edit descriptor is zero. The ZERO_WIDTH_PRECISION environment variable can be set to PRECISION or HALF.
PRECISION specifies that full precision will be written. This is the default.
HALF specifies that half of the full precision will be written.
Run Time Environment Variables
Run time environment variables allow for adjusting the following elements of the run time environment:
CRAY_ACC_DEBUG
Write accelerator-related activity to stdout for debugging purposes. Valid output levels range from 0, which indicates no output, through 3, which indicates verbose.
Default: 0
CRAY_AUTO_APRUN_OPTIONS
Default options for automatic aprun. See the aprun(1) man page.
CRAY_RANK_THREAD_PREFIX
Prepends a string identifying the MPI rank and OpenMP thread ID to each line written to stdout and stderr.
CRAY_MALLOPT_OFF (Only relevant if -hsystem_alloc is specified)
If set, then the system default mallopt parameters are used, instead of the compiler default parameters. For most programs, run time performance is improved by using the compiler defaults, but more memory may be used.
FORMAT_TYPE_CHECKING
The FORMAT_TYPE_CHECKING environment variable specifies various levels of conformance between the data type of each I/O list item and the formatted data edit descriptor.
When set to RELAXED, the run time I/O library enforces limited conformance between the data type of each I/O list item and the formatted data edit descriptor.
When set to STRICT77, the run time I/O library enforces strict FORTRAN 77 conformance between the data type of each I/O list item and the formatted data edit descriptor.
When set to STRICT90 or STRICT95, the run time I/O library enforces strict Fortran 90/95 conformance between the data type of each I/O list item and the formatted data edit descriptor.
See the following tables:
MALLOC_MMAP_MAX_ (Only relevant if -hsystem_alloc is specified)
Specifies the maximum number of memory chunks to allocate with mmap. The compiler default value is 0. For most programs, run time performance is improved by using the compiler default, but more memory may be used.
MALLOC_TRIM_THRESHOLD_ (Only relevant if -hsystem_alloc is specified)
Specifies the minimum size of the unused memory region at the top of the heap before the region is returned to the operating system. The compiler default value is 536870912 bytes. For most programs, run time performance is improved by using the compiler default, but more memory may be used.
NO_STOP_MESSAGE
If set, and if a STOP statement in the Fortran code does not specify the optional stop_code, then no STOP message is produced when that statement is executed.
PGAS_ERROR_FILE
Specifies the location to which libpgas (the library which provides an interface to the internal system network) error messages are written. The default is stderr. If stdout is specified, errors will be written to standard output.
TMPDIR
Compiler temporary files and user scratch files are placed in the directory specified by the TMPDIR environment variable.
CRAYLIBS_ARCH_OVERRIDE
Override the default HPE Cray math library run time selection and specify the library to use by CPU architecture. The valid options are: ivybridge, sandybridge, haswell, broadwell, mic-knl, x86-skylake, x86-cascadelake, x86-naples, or arm-thunderx2.
This variable can be used to specify that a lowest-common-denominator math library be used instead of the default selection, ensuring that identical computations produce identical results regardless of the type of compute node CPU actually used. The trade-off is that specifying an older library may affect performance on a newer CPU. For example, if ivybridge is specified, the code will run and produce identical results on a haswell compute node, but performance may be reduced.
Default: If not set, the library specific to the type of CPU selected at run time is used.
aprun Resource Limits
The aprun command always forwards its own core and CPU resource limits (RLIMIT_CPU and RLIMIT_CORE) to the compute nodes, where those limits are set for the application. If a -m value is specified, RLIMIT_RSS is also forwarded.
If the APRUN_XFER_LIMITS run time environment variable is set to a non-zero value, the following resource limits are also forwarded:
RLIMIT_FSIZE
RLIMIT_DATA
RLIMIT_STACK
RLIMIT_RSS
RLIMIT_NPROC
RLIMIT_NOFILE
RLIMIT_MEMLOCK
RLIMIT_AS
RLIMIT_LOCKS
RLIMIT_SIGPENDING
RLIMIT_MSGQUEUE
RLIMIT_NICE
RLIMIT_RTPRIO
This forwarding is disabled by default.
This forwarding of user resource limits can cause problems on systems where the login node’s limits are more restrictive than the default compute node limits.
HPE Cray Fortran Directives
Directives are instructions that may be inserted into source code in order to specify certain kinds of special processing to be performed by the compiler during compilation.
Directives are not Fortran language statements. Directives are often compiler-specific, and if the HPE CCE compiler encounters a directive that is not supported by HPE CCE, the compiler will generate a message, ignore the directive, and continue with the compilation.
HPE Cray Fortran Directive Use
A directive line begins with the characters CDIR$ or !DIR$. How to specify a directive depends on the source form being used.
If using fixed source form, indicate a directive line by placing CDIR$ or !DIR$ in columns 1 through 5. If the compiler encounters a non-blank character in column 6, the line is assumed to be a directive continuation line. Columns 7 and beyond can contain one or more directives. Characters entered in columns beyond the default column width are ignored.
If using free source form, indicate a directive by placing !DIR$ followed by a space and then one or more directives. If the position following the !DIR$ contains a character other than a blank space, tab, or newline character, the line is assumed to be a continuation line. For example, the asterisk (*) character in column 6 on the second line in the following example indicates that it is a continuation of the first line:
!DIR$ NOSIDEEFFECTS
!DIR$*ab
Note the following:
The !DIR$ need not start in column 1, but it must be the first text on the line.
The FIXED and FREE directives must appear alone on a directive line and cannot be continued.
Do not use source preprocessor (#) directives within multiline compiler directives.
To specify more than one directive on a line, separate the directives with commas. Some directives require that one or more arguments be specified; when specifying a directive of this type, no other directive can appear on the line.
Spaces can precede, follow, or be embedded within a directive, regardless of the source form.
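A minimal sketch of the free-form rules above, assuming a hypothetical subroutine SCALE_ADD: two toggle directives that take no arguments share one directive line, separated by a comma, and are switched back on before a second loop. Because each directive line starts with !, compilers other than HPE Cray Fortran should read these lines as comments.

```fortran
! Hypothetical example: comma-separated, argument-free toggle directives.
subroutine scale_add(n, a, x, y)
  implicit none
  integer, intent(in) :: n
  real, intent(in) :: a, x(n)
  real, intent(inout) :: y(n)
  integer :: i
  ! Turn off vectorization and pattern matching for the following code.
!DIR$ NOVECTOR, NOPATTERN
  do i = 1, n
    y(i) = y(i) + a * x(i)
  end do
  ! Turn both features back on for the rest of this program unit.
!DIR$ VECTOR, PATTERN
  do i = 1, n
    y(i) = 2.0 * y(i)
  end do
end subroutine scale_add
```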
Code portability is maintained by using the !DIR$ form of the directive. In the following example, the ! character in column 1 causes other compilers to treat the HPE Cray Fortran compiler directive as a comment:
      A=10
!DIR$ NOVECTOR
      DO 10,I=1,10...
Range and Placement of Directives
FIXED and FREE directives can appear anywhere in the source code. All other directives must appear within the program unit where they are to be applied.
The following directives must be placed in the declarative portion of a program unit and apply only to that program unit:
CACHE
CACHE_NT
COPY_ASSUMED_SHAPE
IGNORE_TKR
MEMORY
NAME
NOSIDEEFFECTS
SAME_TBS
STACK
WEAK
The following directives toggle a compiler feature on or off at the point at which the directive appears in the code. These directives remain in effect until the opposite directive appears, the directive is reset, or until the end of the program unit, at which time the command line settings become the default for the remainder of the compilation:
[NO]BOUNDS
[NO]CLONE
[NO]COLLAPSE
[NO]FUSION
[NO]INLINE
[NO]PATTERN
[NO]PIPELINE
[NO]UNROLL
[NO]VECTOR
RESETCLONE and RESETINLINE apply at the point at which they appear in the code and reset cloning or inlining back to the defaults.
The SUPPRESS directive applies at the point at which it appears.
The following directives apply only to the next loop or block of code encountered lexically:
BLOCKABLE
BLOCKINGSIZE|NOBLOCKING
CONCURRENT
HAND_TUNED
[NO]INTERCHANGE
IVDEP
NEXTSCALAR
NOFISSION
PERMUTATION
PREFERVECTOR
PROBABILITY
SAFE_ADDRESS
SAFE_CONDITIONAL
LOOP_INFO
The following directives alter the status of entities in ways that affect compilation. They do not apply to particular ranges of code.
IGNORE_TKR
INLINEALWAYS|INLINENEVER
CLONEALWAYS|CLONENEVER
NAME
NOSIDEEFFECTS
Interaction with the Command Line
Note the following interactions between directives and the ftn command line options.
-x
The -x option accepts one or more directives as arguments. Directives specified with the -x option are ignored during compilation. To ignore all compiler directives, specify -x all.
-O0
The -O0 option disables all compiler optimizations. All scalar optimization, vector optimization, and tasking directives are ignored.
-O ipan
The -O ipa0 option disables all inlining and cloning optimizations. All inlining and cloning directives are ignored.
-O scalarn
The -O scalar0 option disables all scalar optimizations. All scalar optimization directives are ignored.
-O vectorn
The -O vector0 option disables all vector optimizations. All vector optimization directives are ignored.
BLOCKABLE
!DIR$ BLOCKABLE (do_variable,do_variable[,do_variable]…)
The BLOCKABLE directive specifies that it is legal and desirable to cache block the subsequent loop nest, even when the compiler has not made such a determination. To be legally blockable, the nest must be perfect (no code between constituent loops), rectangular (trip counts of member loops are fixed over the lifetime of the nest), and fully permutable (loop interchange and unrolling are legal at all levels). This directive both permits and requests blocking of the indicated loop nest.
The directive arguments are a comma-delimited list of two or more loop control variables, do_variable.
If a BLOCKINGSIZE directive is also provided for the indicated loop, the following rules apply.
If BLOCKINGSIZE is at least 2, the indicated BLOCKINGSIZE is used.
If BLOCKINGSIZE is 0, the loop itself is not blocked and it is treated as an inner loop (as part of the nest that traverses the cache block tile).
If BLOCKINGSIZE is 1, the loop itself is not blocked and it is treated as an outer loop (as a loop in the nest that moves from tile to tile).
When no BLOCKINGSIZE directive is supplied, the compiler chooses the BLOCKINGSIZE according to its own heuristics.
Example 1: BLOCKINGSIZE 1 followed by its equivalent
subroutine EX1(A, B, n)
real A(n,n), B(n,n)
!dir$ blockable(i,j)
!dir$ blockingsize(512)
do j = 1, n
!dir$ blockingsize(1)
do i = 1, n-1
A(j,i) = B(j,i) + B(j,i+1)
enddo
enddo
end subroutine EX1
subroutine EX1m(A, B, n)
real A(n,n), B(n,n)
do js = 1, n, 512
do i = 1, n-1
do j = js, min( n, js+511 )
A(j,i) = B(j,i) + B(j,i+1)
enddo
enddo
enddo
end subroutine EX1m
Notice that blockingsize(1) is applied to an inner loop, while blockingsize(0) typically is used for outer loops.
Example 2: BLOCKINGSIZE > 1 at both levels
subroutine EX2(A, B, n)
real A(n,n), B(n,n)
!dir$ blockable(i,j)
!dir$ blockingsize(32)
do j = 1, n-1
!dir$ blockingsize(128)
do i = 1, n-1
A(i,j) = B(i,j) + B(i+1,j) + B(i,j+1)
enddo
enddo
end subroutine EX2
subroutine EX2m(A, B, n)
real A(n,n), B(n,n)
do js = 1, n-1, 32
do is = 1, n-1, 128
do j = js, min( n-1, js+31 )
do i = is, min( n-1, is+127 )
A(i,j) = B(i,j) + B(i+1,j) + B(i,j+1)
enddo
enddo
enddo
enddo
end subroutine EX2m
BLOCKINGSIZE, NOBLOCKING
!DIR$ BLOCKINGSIZE (n1[,n2])
!DIR$ NOBLOCKING
The BLOCKINGSIZE directive asserts that the loop following the directive is involved in a cache blocking situation for the primary or secondary cache.
The NOBLOCKING directive prevents the compiler from involving the subsequent loop in a cache blocking situation.
The BLOCKINGSIZE directive supports the following arguments:
n1, n2
Integers that indicate the block size. If the loop is involved in a blocking situation, it will have a block size of n1 for the primary cache and n2 for the secondary cache. The compiler attempts to include this loop within such a block but cannot guarantee this inclusion.
For n1, specify a value such that n1 .GE. 0. For n2, specify a value such that n2 .LE. 2**30. If n1 or n2 is 0, the loop is not blocked, but the entire loop is inside the block.
Example: Using !DIR$ BLOCKINGSIZE
In this example, the compiler makes 20 x 20 blocks when blocking, but it could block the loop nest such that loop K is not included in the tile.
SUBROUTINE AMAT(X,Y,Z,N,M,MM)
REAL(KIND=8) X(100,100), Y(100,100), Z(100,100)
DO K = 1, N
!DIR$ BLOCKABLE(J,I)
!DIR$ BLOCKING SIZE (20)
DO J = 1, M
!DIR$ BLOCKING SIZE (20)
DO I = 1, MM
Z(I,K) = Z(I,K) + X(I,J)*Y(J,K)
END DO
END DO
END DO
END
To keep loop K inside the tile, add a BLOCKINGSIZE(0) directive just before loop K; the compiler should then generate a loop nest such as the following:
SUBROUTINE AMAT(X,Y,Z,N,M,MM)
REAL(KIND=8) X(100,100), Y(100,100), Z(100,100)
DO JJ = 1, M, 20
DO II = 1, MM, 20
DO K = 1, N
DO J = JJ, MIN(M, JJ+19)
DO I = II, MIN(MM, II+19)
Z(I,K) = Z(I,K) + X(I,J)*Y(J,K)
END DO
END DO
END DO
END DO
END DO
END
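The text above describes adding BLOCKINGSIZE(0) before loop K but does not show the annotated source. A minimal sketch of that source, based on the AMAT example (AMAT_K0 is a hypothetical name, and the directive placement is inferred from the description; whether the compiler actually produces the tiled nest shown is left to its heuristics):

```fortran
! AMAT with BLOCKINGSIZE(0) added before loop K, so that K is kept
! inside the 20 x 20 cache-block tile rather than excluded from it.
subroutine amat_k0(x, y, z, n, m, mm)
  implicit none
  integer, intent(in) :: n, m, mm
  real(kind=8), intent(in) :: x(100,100), y(100,100)
  real(kind=8), intent(inout) :: z(100,100)
  integer :: i, j, k
!DIR$ BLOCKINGSIZE(0)
  do k = 1, n
!DIR$ BLOCKABLE(J,I)
!DIR$ BLOCKINGSIZE(20)
    do j = 1, m
!DIR$ BLOCKINGSIZE(20)
      do i = 1, mm
        z(i,k) = z(i,k) + x(i,j)*y(j,k)
      end do
    end do
  end do
end subroutine amat_k0
```

For each Z(I,K), the tiled form still accumulates the J terms in increasing order, so blocking should not change the result.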
Note that an INTERCHANGE directive can be applied to the same loop nest as a BLOCKINGSIZE directive. The BLOCKINGSIZE directive applies to the loop it directly precedes; it moves with that loop when an interchange is applied.
BOUNDS, NOBOUNDS
!DIR$ BOUNDS [array[,array]…]
!DIR$ NOBOUNDS [array[,array]…]
The BOUNDS directive specifies that pointer and array references are to be checked. The NOBOUNDS directive specifies that this checking is to be disabled.
The BOUNDS and NOBOUNDS directives support this optional argument:
array
The name of an array. The name cannot be a subobject of a derived type. When no array name is specified, the directive applies to all arrays.
Array bounds checking provides a check of most array references at both compile time and run time to ensure that each subscript is within the array’s declared size. Bounds checking behavior differs with the optimization level. Bounds checking is not performed on arrays dimensioned as (1). Bounds checking also enables -O overindex. Complete checking is guaranteed only when optimization is turned off by specifying -O0 on the ftn command line.
The -h [no]bounds (-Rb) command line option controls bounds checking for a whole compilation. The BOUNDS and NOBOUNDS directives toggle the feature on and off within a program unit. Either directive can specify particular arrays or can apply to all arrays.
BOUNDS remains in effect for a given array until the appearance of a NOBOUNDS directive that applies to that array, or until the end of the program unit. Bounds checking can be enabled and disabled many times in a single program unit.
To be effective, these directives must follow the declarations for all affected arrays. It is suggested that they be placed at the end of a program unit’s specification statements unless they are meant to control particular ranges of code.
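A minimal sketch of toggling bounds checking for one array inside a program unit, assuming a hypothetical subroutine FILL_AND_DOUBLE. The directives follow all declarations, as required above; checking for A is on in the first loop and off in the second.

```fortran
! Hypothetical example: bounds checking for A only in the first loop.
subroutine fill_and_double(a, b, n)
  implicit none
  integer, intent(in) :: n
  real, intent(out) :: a(n), b(n)
  integer :: i
  ! Directives come after the declarations of every affected array.
!DIR$ BOUNDS A
  do i = 1, n
    a(i) = real(i)
  end do
!DIR$ NOBOUNDS A
  do i = 1, n
    b(i) = 2.0 * a(i)
  end do
end subroutine fill_and_double
```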
The bounds checking feature detects any reference to an array element whose subscript exceeds the array’s declared size.
      REAL A(10)
C     DETECTED AT COMPILE TIME:
      A(11) = X
C     DETECTED AT RUN TIME IF IFUN(M) EXCEEDS 10:
      A(IFUN(M)) = W
The compiler generates an error message if it detects that an array element section reference with an out-of-bounds subscript attempts to reference memory. If the compiler cannot detect the out-of-bounds subscript (for example, if the subscript includes a function reference), a message is issued for out-of-bounds subscripts when the program runs, but the program is allowed to complete execution.
Bounds checking does not inhibit vectorization but typically increases program run time. If an array’s last dimension declarator is *, checking is not performed on the last dimension’s upper bound. Arrays in formatted WRITE and READ statements are not checked.
Array bounds checking does not prevent operand range errors that result when operand prefetching attempts to access an invalid address outside an array. Bounds checking is needed when very large values are used to calculate addresses for memory references.
If bounds checking detects an out-of-bounds array reference, a message is issued for only the first out-of-bounds array reference in the loop.
      DIMENSION A(10)
      MAX = 20
      A(MAX) = 2
      DO 10 I = 1, MAX
        A(I) = I
   10 CONTINUE
      CALL TWO(MAX,A)
      END
      SUBROUTINE TWO(MAX,A)
      REAL A(*) ! NO UPPER BOUNDS CHECKING DONE
      END
The following messages are issued for the preceding program:
lib-1961 a.out: WARNING
Subscript 20 is out of range for dimension 1 for array
'A' at line 3 in file 't.f' with bounds 1:10.
lib-1962 a.out: WARNING
Subscript 1:20:1 is out of range for dimension 1 for array
'A' at line 5 in file 't.f' with bounds 1:10.
CACHE
!DIR$ CACHE base_name[,base_name …]
Scope: Declaration
To use the CACHE directive, place it only in the specification part, before any executable statement.
The CACHE directive asserts that all memory operations with the specified symbols as the base are to be allocated in cache. This is an advisory directive. The CACHE directive is meaningful for stores in that it allows the user to override a decision made by the automatic cache management.
base_name
The base name of the object that should be placed into the cache. This can be the base name of any object such as an array, scalar structure, and so on, without member references like C[10]. If a pointer is specified in the list, only the references, not the pointer itself, are cached.
This directive overrides automatic cache management decisions (-h cachen, -O cachen) made on the compiler command line. The CACHE directive may be locally overridden by the use of the LOOP_INFO directive.
CACHE_NT
!DIR$ CACHE_NT base_name[,base_name …]
Scope: Declaration
To use the CACHE_NT directive, place it only in the specification part, before any executable statement.
Use the CACHE_NT directive to identify objects that should not be placed in cache. This is an advisory directive that specifies objects that should use non-temporal reads and writes.
base_name
The base name of the object that should use non-temporal reads and writes. This can be the base name of any object such as an array, scalar structure, and so on, without member references like C[10]. If a pointer is specified in the list, only the references, not the pointer itself, will have the cache non-temporal property.
Advisory directives are directives the compiler will honor if conditions permit it to. When this directive is honored, the performance of code may be improved because the cache is not occupied by objects that have a lower cache reuse rate. In theory, this makes room for objects that have a higher cache reuse rate.
This directive may be locally overridden by use of a LOOP_INFO directive. This directive overrides automatic cache management decisions (-O cachen) made on the compiler command line.
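A sketch combining CACHE and CACHE_NT in one specification part, assuming a hypothetical gather routine LOOKUP_COPY in which the small table TBL is reused on every iteration while the result DST is written once and not reread. Both directives are advisory, so the compiler may ignore either one.

```fortran
! Hypothetical example: keep the reused table in cache, stream the output.
subroutine lookup_copy(dst, idx, tbl, n, ntbl)
  implicit none
  integer, intent(in) :: n, ntbl
  integer, intent(in) :: idx(n)
  real, intent(in) :: tbl(ntbl)
  real, intent(out) :: dst(n)
  integer :: i
  ! Both directives sit in the specification part, before any executable statement.
!DIR$ CACHE tbl
!DIR$ CACHE_NT dst
  do i = 1, n
    dst(i) = tbl(idx(i))
  end do
end subroutine lookup_copy
```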
CLONE, NOCLONE, RESETCLONE, CLONEALWAYS, CLONENEVER
!DIR$ CLONE
!DIR$ NOCLONE
!DIR$ RESETCLONE
!DIR$ CLONEALWAYS [name [, name] … ]
!DIR$ CLONENEVER [name [, name] … ]
Cloning is the attempt to duplicate a procedure under certain conditions and replace dummy arguments with associated constant actual arguments throughout the cloned procedure. The compiler attempts to clone a procedure when a call site contains actual arguments that are scalar integer and/or scalar logical constants. When the constants are exposed to the optimizer, it can generate more efficient code. The cloning directives control whether cloning is attempted over a range of code.
The following directives remain in effect until a different cloning directive is encountered or until the end of the program unit.
CLONE forces cloning to be attempted at all call sites if the conditions exist for cloning to be done.
NOCLONE disables all cloning.
RESETCLONE returns cloning to the state specified on the compiler command line.
CLONEALWAYS instructs the compiler to attempt to clone one or more specific procedures, as specified by a comma-delimited list of name values.
CLONENEVER prevents cloning of a comma-delimited list of procedures as specified by name.
In the cases of CLONEALWAYS and CLONENEVER, if the directive is placed in the definition of the function, cloning is always or never attempted at every call site to name. If the directive is placed in a function other than the definition, cloning is always or never attempted at every call to name within the specific function containing the directive. An error message is issued if both CLONEALWAYS and CLONENEVER are specified for the same procedure within the same program unit.
Use the compiler -h negmsgs option to see messages that highlight where cloning did occur and conditions that may have inhibited cloning.
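A minimal sketch of the call-site conditions for cloning, assuming hypothetical routines SCALE_VEC and DOUBLE_ALL. The call passes a scalar integer constant and a scalar logical constant, and because CLONEALWAYS appears in DOUBLE_ALL rather than in the definition of SCALE_VEC, it affects only calls made from DOUBLE_ALL.

```fortran
! Hypothetical procedure whose behavior depends on a logical argument.
subroutine scale_vec(v, n, doubling)
  implicit none
  integer, intent(in) :: n
  logical, intent(in) :: doubling
  real, intent(inout) :: v(n)
  if (doubling) then
    v = 2.0 * v
  else
    v = v + 1.0
  end if
end subroutine scale_vec

subroutine double_all(v)
  implicit none
  real, intent(inout) :: v(8)
  ! Not in the definition of SCALE_VEC, so this applies only to calls
  ! to SCALE_VEC made from within DOUBLE_ALL.
!DIR$ CLONEALWAYS scale_vec
  ! Constant actual arguments 8 and .TRUE. make this call a cloning candidate.
  call scale_vec(v, 8, .true.)
end subroutine double_all
```

Compiling with -h negmsgs, as described above, is the way to confirm whether a clone was actually created.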
COLLAPSE, NOCOLLAPSE
!DIR$ COLLAPSE [(do_var1,do_var2[,do_var3 …])]
!DIR$ NOCOLLAPSE
The COLLAPSE directive controls collapse of the immediately following loop nest. The directive enables the compiler to assume appropriate conformity between trip counts. The compiler diagnoses misuse at compile time (when able) or at run time if the -Rd option is specified during compilation.
The COLLAPSE directive supports one option:
do_var
The names of the DO variables of the participating loops. When the COLLAPSE directive is applied to a loop nest, the do_var variables must be listed in order of increasing access stride. When the COLLAPSE directive is applied to an array assignment statement, the (do_var1, do_var2 [,do_var3 … ]) syntax is omitted.
The NOCOLLAPSE directive disqualifies the next immediate loop from collapsing with any other loop. Collapse is almost always desirable, so use the NOCOLLAPSE directive sparingly. The NOCOLLAPSE directive immediately before an array assignment statement has no effect.
Loop collapse is a special form of loop coalesce. Any perfect loop nest may be coalesced into a single loop, with explicit rediscovery of the intermediate values of original loop control variables. The rediscovery cost, which generally involves integer division, is quite high; therefore coalesce is rarely suitable for vectorization. By definition, loop collapse occurs when loop coalesce may be done without the rediscovery overhead. To meet this requirement, all memory access must have uniform stride.
In Fortran arrays, uniform stride is achieved when a computation can flow from one column of a multidimensional array into the next, viewing the array as a flat sequence. Hence, array sections such as A(:,3:7) are generally suitable for collapse, while a section like A(1:n-1,:) lacks the needed uniformity. Care must be taken when applying the COLLAPSE directive to assumed shape dummy arguments and Fortran pointers because the underlying storage need not be contiguous.
Example 1: COLLAPSE directive
In this example, the COLLAPSE directive collapses loop I and loop J into a single loop. The directive enables the compiler to assume appropriate conformity between trip counts and array extents.
SUBROUTINE S(A, N, N1, N2)
REAL A(N, *)
!DIR$ COLLAPSE (I, J)
DO I = 1, N1
DO J = 1, N2
A(I,J) = A(I,J) + 42.0
ENDDO
ENDDO
END
This results in code that is equivalent to the following. However, the following code is only an example to show the resulting behavior, and should not be coded directly as-is because as program source, it violates the Fortran language standard.
SUBROUTINE S(A, N, N1, N2)
REAL A(N, *)
DO IJ = 1, N1*N2
A(IJ, 1) = A(IJ, 1) + 42.0
ENDDO
END
Example 2: COLLAPSE directive using array syntax
In this example, the directive enables the compiler to assume appropriate conformity between trip counts and array extents.
SUBROUTINE S( A, B )
REAL, DIMENSION(:,:) :: A, B
!DIR$ COLLAPSE
A = B ! USER PROMISES UNIFORM ACCESS STRIDE.
END
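A sketch of a loop nest and an array assignment for which the conformity assumed by COLLAPSE actually holds, assuming a hypothetical subroutine ADD_CONST. A is explicit-shape, so its storage is contiguous, and each loop covers the full extent of its dimension. The DO variables are listed by increasing access stride (I, then J), not by nesting order.

```fortran
! Hypothetical example: A is a contiguous N x M array.
subroutine add_const(a, n, m, c)
  implicit none
  integer, intent(in) :: n, m
  real, intent(inout) :: a(n, m)
  real, intent(in) :: c
  integer :: i, j
  ! I has stride 1 and J has stride N, so I is listed first.
!DIR$ COLLAPSE (I, J)
  do j = 1, m
    do i = 1, n
      a(i, j) = a(i, j) + c
    end do
  end do
  ! A full-column range such as A(:, 2:M) also has uniform stride;
  ! for an array assignment the DO variable list is omitted.
!DIR$ COLLAPSE
  a(:, 2:m) = 2.0 * a(:, 2:m)
end subroutine add_const
```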
CONCURRENT
!DIR$ CONCURRENT [SAFE_DISTANCE=n]
Scope: Local
The CONCURRENT directive indicates that no data dependence exists between array references in different iterations of the loop. This directive affects the loop that immediately follows it. This can be useful for vectorization optimizations.
The CONCURRENT directive supports one argument:
n
An integer that represents the number of additional consecutive loop iterations that can be executed in parallel without danger of data conflict. n must be an integer constant > 0. If SAFE_DISTANCE=n is not specified, the distance is assumed to be infinite and the compiler ignores all cross-iteration dependencies.
The CONCURRENT directive is ignored if the SAFE_DISTANCE argument is used and vectorization is requested on the command line.
Consider the following example:
!DIR$ CONCURRENT SAFE_DISTANCE=3
DO I = K+1, N
X(I) = A(I) + X(I-K)
ENDDO
The CONCURRENT directive in this example informs the optimizer that the relationship K > 3 is true. This allows the compiler to load all of the following array references safely during the Ith loop iteration:
X(I-K)
X(I-K+1)
X(I-K+2)
X(I-K+3)
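A sketch of the loop above packaged as a hypothetical subroutine LAGGED_ADD. Because SAFE_DISTANCE=3 is only valid when K > 3, the routine checks that precondition at run time before the loop; the check is a design choice for this sketch, not something the directive requires.

```fortran
! Hypothetical wrapper around the SAFE_DISTANCE=3 example loop.
subroutine lagged_add(x, a, n, k)
  implicit none
  integer, intent(in) :: n, k
  real, intent(in) :: a(n)
  real, intent(inout) :: x(n)
  integer :: i
  ! The assertion made by SAFE_DISTANCE=3 holds only when K > 3.
  if (k <= 3) error stop "lagged_add: SAFE_DISTANCE=3 requires K > 3"
!DIR$ CONCURRENT SAFE_DISTANCE=3
  do i = k+1, n
    x(i) = a(i) + x(i-k)
  end do
end subroutine lagged_add
```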
COPY_ASSUMED_SHAPE
!DIR$ COPY_ASSUMED_SHAPE [array [,array] …]
The COPY_ASSUMED_SHAPE directive copies assumed-shape dummy array arguments into contiguous local temporary storage upon entry to the procedure in which the directive appears. During execution, it is the temporary storage that is used when the assumed-shape dummy array argument is referenced or defined.
The COPY_ASSUMED_SHAPE directive applies only to the program unit in which it appears.
The COPY_ASSUMED_SHAPE directive supports one argument:
array
The name of an array to be copied to temporary storage. If no array names are specified, all assumed-shape dummy arrays are copied to temporary contiguous storage upon entry to the procedure. When the procedure is exited, the arrays in temporary storage are copied back to the dummy argument arrays. If one or more arrays are specified, only those arrays specified are copied. The arrays specified must not have the TARGET attribute.
All arrays specified, or all assumed-shape dummy arrays (if specified without array arguments), on a single COPY_ASSUMED_SHAPE directive must be shape conformant with each other. Incorrect code may be generated if the arrays are not. The -R c command line option can be used to verify whether the arrays are shape conformant.
Except when the dummy argument is declared with the CONTIGUOUS attribute, assumed-shape dummy arguments cannot be assumed to be stored in contiguous storage. In the case of multidimensional arrays, the elements cannot be assumed to be stored with uniform stride between each element of the array. These conditions can arise, for example, when an actual array argument associated with an assumed-shape dummy array is a non-unit strided array slice or section.
If the compiler cannot determine whether an assumed-shape dummy array is stored contiguously or with a uniform stride between each element, some optimizations are inhibited in order to ensure that correct code is generated. If an assumed-shape dummy array is passed to a procedure and becomes associated with an explicit-shape dummy array argument, additional copy-in and copy-out operations may occur at the call site. For multidimensional assumed-shape arrays, some classes of loop optimizations cannot be performed when an assumed-shape dummy array is referenced or defined in a loop or an array assignment statement. The lost optimizations and the additional copy operations performed can significantly reduce the performance of a procedure that uses assumed-shape dummy arrays when compared to an equivalent procedure that uses explicit-shape array dummy arguments.
The COPY_ASSUMED_SHAPE directive causes a single copy to occur upon entry and again on exit. The compiler generates a test at run time to determine whether the array is contiguous. If the array is contiguous, the array is not copied. This directive allows the compiler to perform all the optimizations it would otherwise perform if explicit-shape dummy arrays were used. If there is sufficient work in the procedure using assumed-shape dummy arrays, the performance improvements gained by the compiler outweigh the cost of the copy operations upon entry and exit of the procedure.
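A minimal sketch, assuming a hypothetical module COPY_DEMO_MOD with a routine AXPBY whose two assumed-shape dummies appear on one COPY_ASSUMED_SHAPE directive. Assumed-shape dummies need an explicit interface, hence the module. U and V must be shape conformant, and neither may have the TARGET attribute. A caller may pass a non-unit-stride section such as BIG(1:10:2, :) for U.

```fortran
! Hypothetical module used only to give AXPBY an explicit interface,
! which assumed-shape dummy arguments require.
module copy_demo_mod
  implicit none
contains
  ! V = ALPHA*U + V for two shape-conformant, possibly non-contiguous arrays.
  subroutine axpby(u, v, alpha)
    real, intent(in) :: u(:,:)
    real, intent(inout) :: v(:,:)
    real, intent(in) :: alpha
    integer :: i, j
    ! U and V are listed together, so they must be shape conformant;
    ! neither may have the TARGET attribute.
!DIR$ COPY_ASSUMED_SHAPE u, v
    do j = 1, size(u, 2)
      do i = 1, size(u, 1)
        v(i, j) = alpha * u(i, j) + v(i, j)
      end do
    end do
  end subroutine axpby
end module copy_demo_mod
```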
FREE, FIXED
!DIR$ FREE
!DIR$ FIXED
The FREE and FIXED directives specify whether the source code in the program unit is written in free source or fixed source form. These directives override the -f option, if specified on the ftn command line.
These directives apply to the source file in which they appear and allow for switching source forms within a source file.
Source form can be changed from within an INCLUDE file. After the INCLUDE file is processed, the source form reverts to the source form that was being used prior to processing the INCLUDE file.
The source preprocessor does not recognize the FREE or FIXED directives. These directives must not be specified in a file that is to be submitted to the source preprocessor. To specify source form with such files, use the -f fixed or -f free option on the ftn command line.
FUSION, NOFUSION
!DIR$ FUSION
!DIR$ NOFUSION
The FUSION and NOFUSION directives direct the compiler to attempt or not attempt loop fusion on the loop following the directive, thus permitting fine-tuning of the selection of which loops the compiler should attempt to fuse. The FUSION directive should be placed immediately before the DO statement of the loop that should be fused.
The FUSION directive instructs the compiler to attempt loop fusion on the following loop unless -h nofusion was specified on the compiler command line.
The NOFUSION directive instructs the compiler to not attempt loop fusion on the following loop even when the -h fusion option is specified on the compiler command line.
If only a few loops out of many should be fused, use the FUSION directive with the -O fusion1 option to confine loop fusion to these few loops. Conversely, if only a few loops out of many should not be fused, use the NOFUSION directive with the -O fusion2 option to specify no fusion for these loops.
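A sketch of two fusion candidates, assuming a hypothetical subroutine ADD_THEN_SCALE whose adjacent loops have identical bounds. The text above says only that FUSION goes immediately before the DO statement of a loop to be fused; marking both loops here is an assumption. Compiled with -O fusion1, only loops marked this way would be considered.

```fortran
! Hypothetical example: two adjacent loops with the same bounds.
subroutine add_then_scale(a, b, c, n)
  implicit none
  integer, intent(in) :: n
  real, intent(in) :: a(n)
  real, intent(out) :: b(n), c(n)
  integer :: i
!DIR$ FUSION
  do i = 1, n
    b(i) = a(i) + 1.0
  end do
  ! C(I) reads only B(I), so fusing the loops preserves the result.
!DIR$ FUSION
  do i = 1, n
    c(i) = 2.0 * b(i)
  end do
end subroutine add_then_scale
```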
HAND_TUNED
!DIR$ HAND_TUNED
Assert that the code in the next loop nest has been arranged by hand for maximum performance and the compiler should restrict some of the more aggressive automatic expression rewrites. The compiler should still fully optimize and vectorize the loop within the constraints of the directive. The HAND_TUNED directive applies to the next loop in the same manner as the CONCURRENT and SAFE_ADDRESS directives.
Use of this directive may severely impede performance. Use carefully and evaluate performance before and after employing this directive.
HEAP_ALLOCATE, NOHEAP_ALLOCATE
!DIR$ HEAP_ALLOCATE
!DIR$ NOHEAP_ALLOCATE
HEAP_ALLOCATE puts variable-size array temporaries on the heap until a NOHEAP_ALLOCATE directive is found or the current scope ends. Following scope exit, array placement returns to the policy in force prior to entry to the scope just exited.
NOHEAP_ALLOCATE puts variable-size array temporaries on the stack until a HEAP_ALLOCATE directive is found or the current scope ends. Following scope exit, array placement returns to the policy in force prior to entry to the scope just exited.
The OPTIMIZE directive recognizes the -h heap_allocate and -h noheap_allocate options. The latter is the default.
A -h heap_allocate argument on the command line overrides the -h noheap_allocate option in the OPTIMIZE directive, but a -h noheap_allocate argument on the command line does not override the OPTIMIZE directive.
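A minimal sketch, assuming a hypothetical subroutine REVERSE_IN_PLACE. The assignment A(1:N) = A(N:1:-1) overlaps its own right-hand side, so it typically needs a variable-size array temporary; HEAP_ALLOCATE requests that such a temporary go on the heap. No closing NOHEAP_ALLOCATE is needed here because the policy in force before the subroutine is restored when its scope ends.

```fortran
! Hypothetical example: an overlapping array assignment that usually
! requires a variable-size temporary of length N.
subroutine reverse_in_place(a, n)
  implicit none
  integer, intent(in) :: n
  real, intent(inout) :: a(n)
  ! Place variable-size temporaries on the heap for the rest of this scope.
!DIR$ HEAP_ALLOCATE
  a(1:n) = a(n:1:-1)
end subroutine reverse_in_place
```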
IGNORE_TKR
!DIR$ IGNORE_TKR [ [ (letter) dummy_arg] … ]
The IGNORE_TKR directive directs the compiler to ignore the Type, Kind, and/or Rank of specified dummy arguments in the procedure interface.
This directive supports the following arguments:
letter
This can be T, K, R, or any combination of these letters, for example TK or KR. The letter applies only to the dummy argument it precedes. If letter appears, dummy_arg must appear.
dummy_arg
If specified, it indicates the dummy arguments for which TKR rules should be ignored. If not specified, TKR rules are ignored for all dummy arguments in the procedure that contains the directive.
The directive causes the compiler to ignore the type, kind, and/or rank of the specified dummy arguments when resolving a generic call to a specific call. The compiler also ignores the type, kind, and/or rank on the specified dummy arguments when checking all the specifics in a generic call for ambiguities.
The following example instructs the compiler to ignore type, kind, and/or rank rules for the dummy arguments of the following subroutine fragment:
subroutine example(A,B,C,D)
!DIR$ IGNORE_TKR A, (R) B, (TK) C, (K) D
| Dummy Argument | What’s Ignored |
|---|---|
| A | Type, kind, and rank |
| B | Rank only |
| C | Type and kind |
| D | Kind only |
INLINE, NOINLINE, RESETINLINE, INLINEALWAYS, INLINENEVER
!DIR$ INLINE
!DIR$ NOINLINE
!DIR$ RESETINLINE
!DIR$ INLINEALWAYS [name [, name] … ]
!DIR$ INLINENEVER [name [, name] … ]
Inlining replaces calls to user-defined functions with the code that represents the function. This can improve performance by saving the expense of the function call overhead. It also increases the possibility of additional code optimization. Inlining may increase object code size.
The following directives remain in effect until a different inlining directive is encountered or until the end of the program unit.
INLINE
Instructs the compiler to attempt to inline functions at all call sites.
NOINLINE
Disables all inlining.
RESETINLINE
Returns inlining to the state specified on the compiler command line by the -O ipan option, or to the default state if no option was specified.
INLINEALWAYS
Instructs the compiler to attempt to inline one or more specific procedures, as specified by a comma-delimited list of name values.
INLINENEVER
Prevents inlining of a comma-delimited list of procedures, as specified by name.
In the cases of INLINEALWAYS and INLINENEVER, if the directive is placed in the definition of the function, inlining is always or never attempted at every call site to name. If the directive is placed in a function other than the definition, inlining is always or never attempted at every call to name within the function containing the directive. An error message is issued if both INLINEALWAYS and INLINENEVER are specified for the same procedure within the same program unit.
Example: INLINEALWAYS and INLINENEVER directives
SUBROUTINE S()
!DIR$ INLINEALWAYS S ! THIS SAYS ATTEMPT
! INLINING OF S AT ALL CALLS.
...
END SUBROUTINE
SUBROUTINE T
!DIR$ INLINENEVER S ! DO NOT INLINE ANY CALLS TO S
! IN SUBROUTINE T.
CALL S()
...
END SUBROUTINE
SUBROUTINE V
!DIR$ NOINLINE ! HAS HIGHER PRECEDENCE THAN INLINEALWAYS.
CALL S() ! DO NOT INLINE THIS CALL TO S.
!DIR$ INLINE
CALL S() ! ATTEMPT INLINING OF THIS CALL TO S.
...
END SUBROUTINE
SUBROUTINE W
CALL S() ! ATTEMPT INLINING OF THIS CALL TO S.
...
END SUBROUTINE
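The example above does not show RESETINLINE. The following minimal sketch, using a hypothetical module INLINE_DEMO and routine TWO_PHASES, disables inlining for one loop with NOINLINE and then uses RESETINLINE so that the second loop is compiled under the inlining state from the command line (or the default). Directive lines begin with !, so a compiler that does not recognize them generally treats them as comments; the computed results are the same either way.

```fortran
! Hypothetical module illustrating NOINLINE followed by RESETINLINE.
module inline_demo
  implicit none
contains
  ! Small function that is a typical inlining candidate.
  pure function sq(x) result(y)
    real, intent(in) :: x
    real :: y
    y = x * x
  end function sq

  ! Assumes b and c have at least size(a) elements.
  subroutine two_phases(a, b, c)
    real, intent(in) :: a(:)
    real, intent(out) :: b(:), c(:)
    integer :: i
!DIR$ NOINLINE
    ! Calls to SQ in this loop are not inlined.
    do i = 1, size(a)
      b(i) = sq(a(i))
    end do
!DIR$ RESETINLINE
    ! Inlining here follows the -O ipan setting (or the default).
    do i = 1, size(a)
      c(i) = sq(a(i)) + 1.0
    end do
  end subroutine two_phases
end module inline_demo
```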
INTERCHANGE, NOINTERCHANGE
!DIR$ INTERCHANGE (do_var1, do_var2 [, do_var3 … ])
!DIR$ NOINTERCHANGE
Scope: Local
The INTERCHANGE directive specifies that the order of the two or more loops immediately following the directive should be interchanged.
The NOINTERCHANGE directive inhibits loop interchange on the loop that immediately follows the directive.
The INTERCHANGE directive supports one option:
do_var
Specifies two or more DO variable names. The do_var names can be specified in any order, and the compiler will reorder the loops accordingly. The loops must be perfectly nested; if they are not, the results are unpredictable.
The loops affected by the INTERCHANGE directive are designated by their DO variable names. The compiler will reorder the loops such that the loop with do_var1 is outermost, then loop do_var2, then loop do_var3, and so on.
Example: INTERCHANGE directive
In this example, the interchange directive reorders the loops. The K loop becomes the outermost, followed by J, and the I loop becomes the innermost.
!DIR$ INTERCHANGE (K, J, I)
DO I = 1,NSIZE1
DO K = 1,NSIZE1
DO J = 1,NSIZE1
X(I,J) = X(I,J) + Y(I,K) * Z(K,J)
ENDDO
ENDDO
ENDDO
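To make the example above compilable and checkable, the sketch below wraps the same loop nest in a hypothetical module procedure MXM. One plausible motivation for the requested order: Fortran stores arrays in column-major order, so with I innermost the references X(I,J) and Y(I,K) are accessed with unit stride. Because the directive line begins with !, a compiler that does not recognize it generally treats it as a comment, so the result should match the MATMUL intrinsic either way.

```fortran
module interchange_demo
  implicit none
contains
  ! Same loop nest as the example above: X = X + Y*Z for square matrices.
  subroutine mxm(x, y, z, nsize1)
    integer, intent(in) :: nsize1
    real, intent(inout) :: x(nsize1, nsize1)
    real, intent(in) :: y(nsize1, nsize1), z(nsize1, nsize1)
    integer :: i, j, k
    ! Request K outermost, then J, with I innermost (unit stride in X and Y).
!DIR$ INTERCHANGE (K, J, I)
    do i = 1, nsize1
      do k = 1, nsize1
        do j = 1, nsize1
          x(i,j) = x(i,j) + y(i,k) * z(k,j)
        end do
      end do
    end do
  end subroutine mxm
end module interchange_demo
```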
IVDEP
!DIR$ IVDEP [ SAFEVL=vlen | INFINITEVL ]
Directs the compiler to ignore vector dependencies in the loop immediately following the directive. The IVDEP directive supports these arguments:
SAFEVL=vlen
Specifies a vector length within which no dependency will occur. vlen must be an integer between 1 and 1024 inclusive.
INFINITEVL
Specifies an infinite safe vector length; that is, no dependency will occur at any vector length. This is the default: if vlen is not specified, the safe vector length is infinite.
When the IVDEP directive appears before a loop, the compiler ignores vector dependencies, including explicit dependencies, in any attempt to vectorize the loop. IVDEP applies only to the first loop that follows the directive within the same program unit. An IVDEP directive before a DO CONCURRENT loop has no effect.
For array operations, Fortran requires that the complete right-hand side (RHS) expression be evaluated before the assignment to the array or array section on the left-hand side (LHS). If possible dependencies exist between the RHS expression and the LHS assignment target, the compiler creates temporary storage to hold the RHS expression result. If an IVDEP directive appears before an array syntax statement, the compiler ignores potential dependencies and suppresses the creation and use of array temporaries for that statement. Array syntax statements allow arrays to be referenced in a compact manner: either the array name, or the array name with a section subscript, specifies actions on all the elements of an array or array section without using DO loops.
Whether or not IVDEP is used, conditions other than vector dependencies can inhibit vectorization.
If a loop with an IVDEP directive is enclosed within another loop with an IVDEP directive, the IVDEP directive on the outer loop is ignored.
When the Cray compiler vectorizes a loop, it may reorder the statements in the source code to remove vector dependencies. When IVDEP is specified, the statements in the loop or array syntax statement are assumed to contain no dependencies as written, and the Cray compiler does not reorder loop statements.
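As a sketch of SAFEVL, the hypothetical routine SHIFT_SCALE below writes A(I+M) from A(I), so the dependence distance is M. Assuming the caller guarantees M >= 64, a vector of up to 64 consecutive iterations never reads an element that the same vector operation writes, which is what SAFEVL=64 asserts. If the guarantee were violated, the directive would permit incorrect results, so the contract is stated in a comment.

```fortran
module ivdep_demo
  implicit none
contains
  ! A(I+M) depends on A(I): the dependence distance is M.
  ! Caller's contract (assumption): M >= 64, so vector lengths up to 64 are safe.
  subroutine shift_scale(a, n, m, s)
    integer, intent(in) :: n, m
    real, intent(inout) :: a(n + m)
    real, intent(in) :: s
    integer :: i
!DIR$ IVDEP SAFEVL=64
    do i = 1, n
      a(i + m) = s * a(i)
    end do
  end subroutine shift_scale
end module ivdep_demo
```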
LOOP_INFO
!DIR$ LOOP_INFO [options]
The LOOP_INFO directive allows additional information to be specified about the behavior of a loop, including run-time trip counts and hints on cache allocation strategy. This information is provided to the optimizer and can produce faster code sequences. The LOOP_INFO directive supports a large number of optional arguments.
The following trip count arguments use the variable c to indicate an expression that evaluates to an integer constant at compilation time. Use these immediately before a DO loop to indicate minimum, maximum, and estimated trip counts. The compiler will diagnose misuse at compile time when able to, or when the -h dir_check option is specified.
MIN_TRIPS(c)
Specifies the guaranteed minimum number of trips.
EST_TRIPS(c)
Specifies the estimated or average number of trips.
MAX_TRIPS(c)
Specifies the guaranteed maximum number of trips.
The following cache allocation arguments use the variable symbol to indicate the base name of the object that should or should not be placed into cache. This can be the base name of any object, such as an array or scalar structure, without member references. If a pointer is specified in the list, only the references, not the pointer itself, are subject to the hint. Use the LOOP_INFO cache allocation hints to override default settings, CACHE or CACHE_NT directives, or automatic cache management decisions. The cache hints are local and apply only to the specified loop nest.
CACHE(symbol[, symbol] …)
Specifies that symbol, or a comma-delimited list of symbols, is to be allocated in cache. This is the default if no hint is specified and the CACHE_NT directive is not specified.
CACHE_NT(symbol[, symbol] …)
Specifies that symbol, or a comma-delimited list of symbols, is to use non-temporal reads and writes, and not be allocated in cache.
The following optional arguments take no parameters.
PREFETCH
Specifies a preference that prefetches be performed for the following loop.
NOPREFETCH
Specifies a preference that no prefetches be performed for the following loop.
PREFER_THREAD
The PREFER_THREAD and PREFER_NOTHREAD directives are special cases of the LOOP_INFO advisory directive. Use these directives to indicate a preference for turning threading on or off for the subsequent loop. Use !DIR$ LOOP_INFO PREFER_THREAD to indicate your preference that the loop following the directive be threaded.
PREFER_NOTHREAD
Use !DIR$ LOOP_INFO PREFER_NOTHREAD to indicate that the loop should not be threaded.
The PREFETCH directive instructs the compiler to preload scalar data into the first-level cache to increase the frequency of cache hits and to lower latency. Prefetch instructions are generated in situations where the compiler expects them to improve performance. Strategic use of prefetch instructions can hide latency for scalar loads that feed vector instructions or scalar loads in purely scalar loops. Prefetch instructions are generated at default and higher levels of optimization; thus, they are turned off at -O0 or -O1. Prefetch instructions can be turned off at the loop level by specifying the NOPREFETCH directive.
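A minimal LOOP_INFO sketch follows. The routine names and trip-count values are hypothetical, and combining several options on one LOOP_INFO line separated by spaces is an assumption; check the crayftn(1) man page for the exact accepted form. Because MIN_TRIPS and MAX_TRIPS are guarantees rather than hints, the caller contract 8 <= N <= 64 is stated in a comment. The second routine marks a write-once output array with CACHE_NT.

```fortran
module loop_info_demo
  implicit none
contains
  ! Caller's contract (assumption): 8 <= n <= 64, typically about 32.
  subroutine axpy_short(y, x, a, n)
    integer, intent(in) :: n
    real, intent(inout) :: y(n)
    real, intent(in) :: x(n), a
    integer :: i
!DIR$ LOOP_INFO MIN_TRIPS(8) MAX_TRIPS(64) EST_TRIPS(32)
    do i = 1, n
      y(i) = y(i) + a * x(i)
    end do
  end subroutine axpy_short

  ! DST is written once and not reread soon: hint non-temporal access.
  subroutine stream_copy(dst, src, n)
    integer, intent(in) :: n
    real, intent(out) :: dst(n)
    real, intent(in) :: src(n)
    integer :: i
!DIR$ LOOP_INFO CACHE_NT(DST)
    do i = 1, n
      dst(i) = src(i)
    end do
  end subroutine stream_copy
end module loop_info_demo
```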
MEMORY
UNSUPPORTED FEATURE:
The Cray !DIR$ MEMORY directive is no longer supported. Users are encouraged to transition to OpenMP 5.0 allocators, which provide similar capabilities through standard mechanisms. HPE CCE currently provides partial functional support for OpenMP 5.0 allocators, including support for the “pinned” allocator trait when targeting an NVIDIA or AMD GPU. Support for the “high bandwidth” predefined memory space is planned for a future HPE CCE release.
NAME
!DIR$ NAME (fortran_name="external_name" [, fortran_name="external_name" ] … )
Scope: Global
The NAME directive allows the specification of a case-sensitive external name or a name that contains characters outside of the Fortran character set. The NAME directive supports the following arguments:
fortran_name
The name used for the object throughout the Fortran program.
external_name
The external form of the name.
The rules for Fortran naming do not apply to the external_name string. Any character sequence is valid.
The NAME directive can be used, for example, to write calls to C routines. The Fortran standard BIND(C) feature provides some of the capability of the NAME directive.
Example: Calling a C routine from a Fortran program
PROGRAM MAIN
!DIR$ NAME (FOO="XyZ")
CALL FOO ! XyZ is really being called
END PROGRAM
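For comparison, the following sketch uses the standard BIND(C, NAME="...") binding label instead of the NAME directive: the Fortran name is FOO, while the external name is the case-sensitive XyZ. To keep the sketch self-contained, the routine with that external name is written in Fortran inside a hypothetical module NAME_DEMO; in practice XyZ would usually be a C function declared to Fortran through an interface block carrying the same BIND(C, NAME="XyZ") clause.

```fortran
module name_demo
  use, intrinsic :: iso_c_binding, only: c_int
  implicit none
contains
  ! Fortran name FOO, external (linker) name XyZ.
  subroutine foo(counter) bind(c, name="XyZ")
    integer(c_int), intent(inout) :: counter
    counter = counter + 1
  end subroutine foo
end module name_demo
```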
NEXTSCALAR
!DIR$ NEXTSCALAR
The NEXTSCALAR directive disables vectorization for the first DO or DO WHILE loop following the directive. The directive applies to one loop only: the first loop that appears after the directive within the same program unit. If the NEXTSCALAR directive appears before an array syntax statement, it disables vectorization for that array syntax statement.
NEXTSCALAR is ignored if vectorization has been disabled.
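For example, a programmer who knows that a loop always runs only a few iterations might choose to keep it scalar. In the hypothetical routine FILL below, NEXTSCALAR affects only the first DO loop; the second loop is still a vectorization candidate. The short-trip-count motivation is an illustrative assumption.

```fortran
module nextscalar_demo
  implicit none
contains
  subroutine fill(a, b, n)
    integer, intent(in) :: n
    real, intent(out) :: a(3), b(n)
    integer :: i
    ! Only three trips: keep this loop scalar.
!DIR$ NEXTSCALAR
    do i = 1, 3
      a(i) = real(i)
    end do
    ! NEXTSCALAR does not apply to this loop.
    do i = 1, n
      b(i) = 2.0 * real(i)
    end do
  end subroutine fill
end module nextscalar_demo
```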
NOFISSION
!DIR$ NOFISSION
The NOFISSION directive instructs the compiler not to split the loop immediately following the directive. This directive should be placed immediately before the DO statement of the loop that should not be split.
Fission is prevented only for the loop level specified. Loops nested within the indicated loop remain fission candidates unless likewise annotated.
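The sketch below shows the placement immediately before the DO statement. The routine TWO_UPDATES is hypothetical, and the motivation in its comment (keeping both uses of A(I) in a single loop) is an illustrative assumption rather than a statement about what the compiler would otherwise do.

```fortran
module nofission_demo
  implicit none
contains
  ! Both statements read A(I); keeping them in one loop lets A(I)
  ! be loaded once per iteration instead of streaming A twice.
  subroutine two_updates(a, b, c, n)
    integer, intent(in) :: n
    real, intent(in) :: a(n)
    real, intent(out) :: b(n), c(n)
    integer :: i
!DIR$ NOFISSION
    do i = 1, n
      b(i) = a(i) + 1.0
      c(i) = a(i) * 2.0
    end do
  end subroutine two_updates
end module nofission_demo
```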
NOSIDEEFFECTS
!DIR$ NOSIDEEFFECTS f[, f … ]
The NOSIDEEFFECTS directive allows the compiler to keep information in registers across a single call to a subprogram without reloading the information from memory after returning from the subprogram. This directive is not needed for intrinsic functions or VFUNCTIONS.
This directive supports one argument:
f
Symbolic name of a subprogram that the user is sure has no side effects. f must not be the name of a dummy procedure, module procedure, or internal procedure.
NOSIDEEFFECTS
declares that a called subprogram does not redefine any variables that meet the following conditions:
Local to the calling program
Passed as arguments to the subprogram
Accessible to the calling subprogram through host association
Declared in a common block or module
Accessible through USE association
A procedure declared NOSIDEEFFECTS should not define variables in a common block or module shared by a program unit in the calling chain. All arguments should have the INTENT(IN) attribute; that is, the procedure must not modify its arguments. If these conditions are not met, results are unpredictable.
The NOSIDEEFFECTS directive must appear in the specification part of a program unit, before the first executable statement.
The compiler may move invocations of a NOSIDEEFFECTS subprogram from the body of a DO loop to the loop preamble if the arguments to that subprogram are invariant in the loop. This may affect the results of the program, particularly if the NOSIDEEFFECTS subprogram calls functions such as the random number generator or the real-time clock.
The effects of the NOSIDEEFFECTS directive are similar to those that can be obtained by specifying the PURE prefix on a function or a subroutine declaration.
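A minimal sketch follows. Because f must not be a module procedure, the hypothetical function SCALE_FACTOR is written as an external procedure, and the directive is placed after the declarations in the specification part of APPLY_SCALE. The argument S is invariant in the loop, so, as described above, the compiler may hoist the call into the loop preamble. Declaring SCALE_FACTOR with the standard PURE prefix would be a portable way to convey similar information.

```fortran
! External function (not a module procedure), as NOSIDEEFFECTS requires.
! It defines no global data and does not modify its argument.
real function scale_factor(x)
  implicit none
  real, intent(in) :: x
  scale_factor = 2.0 * x + 1.0
end function scale_factor

subroutine apply_scale(a, n, s)
  implicit none
  integer, intent(in) :: n
  real, intent(inout) :: a(n)
  real, intent(in) :: s
  real, external :: scale_factor
  integer :: i
!DIR$ NOSIDEEFFECTS scale_factor
  do i = 1, n
    ! S is loop invariant, so this call may be moved to the loop preamble.
    a(i) = a(i) * scale_factor(s)
  end do
end subroutine apply_scale
```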
OPTIMIZE
!DIR$ OPTIMIZE [(option[ option])]
The OPTIMIZE directive enables optimization in the program unit in which it appears, overriding the optimization level set on the compiler command line. The OPTIMIZE directive with no option specified is equivalent to specifying OPTIMIZE with -O2.
The OPTIMIZE directive may appear only in the declarative section of a program unit. A program unit may be a program, subroutine, function, module, or submodule, but not a block data program unit. OPTIMIZE does not affect modules accessed by USE statements in the program unit that contains the directive. It does affect contained procedures that do not include an explicit OPTIMIZE directive.
The OPTIMIZE directive accepts the following subset of the command line options that control optimization. Refer to Fortran Command-line Options or the crayftn(1) man page for more detailed information.
In the following list, n and level stand for values appended to the option name (for example, -h fp2).
-Olevel
-h acc
-h acc_model=
-h add_paren
-h [no]aggress
-h align_arrays
-h [no]autothread
-h [no]autoprefetch
-h cachen
-h concurrent
-h contiguous
-h contiguous_assumed_shape
-h flex_mp=level
-h fpn
-h fp_trap
-h fusionn
-h [no]heap_allocate
-h infinitevl
-h loop_trips
-h msgs
-h negmsgs
-h nointerchange
-h omp
-h overindex
-h page_align_allocate
-h [no]pattern
-h preferred_vector_width=
-h scalarn
-h shortcircuitlevel
-h threadn
-h unrolln
-h vectorn
-h zero
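The following sketch places OPTIMIZE in the declarative section of a hypothetical module function DOT. It assumes the parenthesized option form shown in the syntax line; -O1 is used purely as an illustration, for example to isolate a suspected optimization problem in one routine while the rest of the file is compiled at the command-line level.

```fortran
module optimize_demo
  implicit none
contains
  ! Dot product of two equal-length vectors (assumed: size(y) >= size(x)).
  function dot(x, y) result(s)
    real, intent(in) :: x(:), y(:)
    real :: s
!DIR$ OPTIMIZE (-O1)
    integer :: i
    s = 0.0
    do i = 1, size(x)
      s = s + x(i) * y(i)
    end do
  end function dot
end module optimize_demo
```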
PATTERN, NOPATTERN
!DIR$ PATTERN
!DIR$ NOPATTERN
By default, the compiler detects coding patterns in source code sequences and replaces these sequences with calls to optimized library routines. In most cases, this replacement improves performance. There are cases, however, in which this substitution degrades performance. This can occur, for example, in loops with very low trip counts.
The NOPATTERN directive disables pattern matching and causes the compiler to generate inline code for the loop immediately following the directive. When the NOPATTERN directive is encountered, pattern matching is suspended for the remainder of the program unit or until a PATTERN directive is encountered.
The PATTERN directive is used to resume pattern matching within the program.
When the -O nopattern command line option is in effect, the NOPATTERN and PATTERN compiler directives are ignored.
In the following example, the compiler normally would detect that the loop is a matrix multiply and replace it with a call to a matrix multiply library routine. By preceding the loop with a NOPATTERN directive, however, pattern matching is inhibited and no replacement is done.
!DIR$ NOPATTERN
DO k= 1,n
DO i= 1,n
DO j= 1,m
A(i,j) = A(i,j) + B(i,k) * C(k,j)
END DO
END DO
END DO
PERMUTATION
!DIR$ PERMUTATION (symbol [, symbol] … )
symbol
Integer array that has no repeated values for the entire routine.
The PERMUTATION directive specifies that an integer array has no repeated values. This directive is useful when the integer array is used as a subscript for another array (vector-valued subscript). This directive may improve code performance.
The PERMUTATION directive is not a loop-based directive. It applies to the entire enclosing routine, regardless of where it is placed.
In a sequence of array accesses that read array element values from the specified symbols with no intervening accesses that modify the array element values, each of the accessed elements will have a distinct value.
When the left side of an assignment in a loop is an array element whose subscript is an element of an integer array, and that integer array's subscript depends on the loop index (for example, A(IPNT(I))), many-to-one assignment is possible. Many-to-one assignment occurs if any repeated elements exist in the subscripting array. If it is known that the integer array is used merely to permute the elements of the subscripted array, it can often be determined that many-to-one assignment does not exist with that array reference.
Sometimes a vector-valued subscript is used as a means of indirect addressing because the elements of interest in an array are sparsely distributed; in this case, an integer array is used to select only the desired elements, and no repeated elements exist in the integer array. In the following example, the PERMUTATION directive does not apply to the array A. Rather, it applies to the index array IPNT used to subscript it. By knowing that IPNT is a permutation, the compiler can safely generate an unordered scatter for the write to A.
!DIR$ PERMUTATION(IPNT) ! IPNT has no repeated values
...
DO I = 1, N
A(IPNT(I)) = B(I) + C(I)
END DO
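A runnable version of the scatter above, in a hypothetical module PERMUTATION_DEMO, is sketched below. The compiler takes PERMUTATION as a promise; if IPNT did repeat a value, an unordered scatter could leave different values in A than sequential execution would. The sketch therefore also includes a hypothetical helper, IS_PERMUTATION, that a test or debug build could use to check the stronger property that IPNT(1:N) is a permutation of 1..N.

```fortran
module permutation_demo
  implicit none
contains
  ! .true. if ipnt(1:n) contains each value 1..n exactly once.
  logical function is_permutation(ipnt, n)
    integer, intent(in) :: n
    integer, intent(in) :: ipnt(n)
    logical :: seen(n)
    integer :: i
    seen = .false.
    is_permutation = .false.
    do i = 1, n
      if (ipnt(i) < 1 .or. ipnt(i) > n) return   ! out of range
      if (seen(ipnt(i))) return                  ! repeated value
      seen(ipnt(i)) = .true.
    end do
    is_permutation = .true.
  end function is_permutation

  subroutine scatter_add(a, b, c, ipnt, n)
    integer, intent(in) :: n
    integer, intent(in) :: ipnt(n)
    real, intent(inout) :: a(n)
    real, intent(in) :: b(n), c(n)
    integer :: i
!DIR$ PERMUTATION(IPNT)   ! IPNT has no repeated values
    do i = 1, n
      a(ipnt(i)) = b(i) + c(i)
    end do
  end subroutine scatter_add
end module permutation_demo
```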
PGAS BUFFERED_ASYNC
!DIR$ PGAS BUFFERED_ASYNC
The PGAS BUFFERED_ASYNC directive batches PGAS operations into bulk data transfers: PGAS data references made by the single statement immediately following the directive are batched into bulk data transfers.
Before using this directive, the user should port the code to use the PGAS DEFER_SYNC directive.
No ordering or correctness guarantees are made between buffered async (BA) and non-BA references, and no ordering guarantees are made between BA references. Users should insert a fence or barrier if they require ordering guarantees. This directive allows the compiler to violate language ordering semantics.
Both fence and barrier imply global visibility for BA references. It is the user’s responsibility to ensure BA references do not target overlapping memory.
No automatic progress guarantees are made. The only way to guarantee progress is if both the source and target are actively making BA references or are inside a barrier/fence. User implemented spin-wait routines may encounter deadlock.
The purpose of the BUFFERED_ASYNC directive is to achieve higher performance by batching small references into bulk data transfers. It should be applied only to references targeting non-contiguous, irregular memory where the compiler is unable to pattern match to an optimized communication pattern.
Because achieving bulk data transfers adds overhead, take care to ensure that many thousands of BA operations take place before a fence. Using BA references may greatly increase the application’s memory footprint.
PGAS DEFER_SYNC
!DIR$ PGAS DEFER_SYNC
The PGAS DEFER_SYNC directive defers the synchronization of PGAS data. PGAS data references made by the single statement immediately following the PGAS DEFER_SYNC directive will not be synchronized until the next fence instruction.
The compiler normally synchronizes the references in a statement as late as possible without violating program semantics. The purpose of the DEFER_SYNC directive is to synchronize the references even later, beyond where the compiler can determine it is safe.
For example, if there is a remote-memory access (RMA) put near the end of a subroutine, the compiler must guard against the put value being read back immediately after the subroutine returns, so the put is synchronized just before returning. The programmer, however, may know that the value is not read back and can insert a PGAS DEFER_SYNC directive.
Example: Coarray Fortran
subroutine my_put( x, image, value )
integer :: x[*], image, value
!dir$ pgas defer_sync
x[image] = value
end subroutine
PIPELINE, NOPIPELINE
!DIR$ PIPELINE
!DIR$ NOPIPELINE
Software-based vector pipelining (software vector pipelining) provides additional optimization beyond the normal hardware-based vector pipelining. In software vector pipelining, the compiler analyzes all vector loops and automatically attempts to pipeline a loop if doing so can be expected to produce a significant performance gain. This optimization also performs any necessary loop unrolling.
In some cases the compiler either does not pipeline a loop that could be pipelined or pipelines a loop without producing performance gains. In these situations, use the PIPELINE or NOPIPELINE directive to advise the compiler to pipeline or not pipeline the loop immediately following the directive.
Software vector pipelining is valid only for the innermost loop of a loop nest. These directives are advisory only. While the NOPIPELINE directive can be used to inhibit automatic pipelining, and the PIPELINE directive can be used to attempt to override the compiler’s decision not to pipeline a loop, the compiler cannot be forced to pipeline a loop that cannot be pipelined.
Vector loops that have been pipelined generate compile-time messages to that effect if optimization messaging is enabled (-h msgs).
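In the hypothetical smoothing routine below, PIPELINE is placed before the innermost (here, the only) loop, the only level at which software vector pipelining applies. The expectation that pipelining helps this loop is an illustrative assumption; because the directive is advisory, the compiler may still decline, and NOPIPELINE could be used in the same position to discourage it.

```fortran
module pipeline_demo
  implicit none
contains
  ! Three-point smoothing; assumes n >= 2. End points are copied unchanged.
  subroutine smooth(a, b, n)
    integer, intent(in) :: n
    real, intent(in) :: a(n)
    real, intent(out) :: b(n)
    integer :: i
    b(1) = a(1)
    b(n) = a(n)
!DIR$ PIPELINE
    do i = 2, n - 1
      b(i) = 0.25 * a(i-1) + 0.5 * a(i) + 0.25 * a(i+1)
    end do
  end subroutine smooth
end module pipeline_demo
```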
PREFERVECTOR
!DIR$ PREFERVECTOR
Directs the compiler to vectorize the loop immediately following the directive if the loop nest contains more than one loop that can be vectorized. The directive states a vectorization preference and does not guarantee that the loop has no memory-dependence hazard.
In the following example, both loops can be vectorized, but the compiler generates vector code for the outer DO I loop:
!DIR$ PREFERVECTOR
DO I = 1, N
DO J = 1, M
A(I) = A(I) + B(J,I)
END DO
END DO
PREFETCH
!DIR$ PREFETCH [([lines(num)][, level(num)] [, write][, nt])] var[, var] …
PREFETCH is a general directive that instructs the compiler to generate explicit prefetch instructions to load data from memory into cache prior to read or write access.
The PREFETCH directive supports the following options:
lines(num)
Specifies the number of cache lines to be prefetched. num is an expression that evaluates to an integer constant at compilation time. By default, the number of cache lines prefetched is 1.
level(num)
Specifies the level of cache into which data is loaded. num is an expression that evaluates to an integer constant at compilation time. The cache level defaults to 1, the level closest to the processing unit.
write
Specifies that the prefetch is for data to be written. When data is to be written, a prefetch instruction can move a block into the cache so that the expected store will be to the cache. Prefetch for write generally brings the data into the cache in an exclusive or modified state. By default, the prefetch is for data to be read. If the target architecture does not support prefetch for write, the prefetch will automatically become a prefetch for read.
nt
Specifies that the prefetch is for non-temporal data. By default, the prefetch is for temporal data. Data with temporal locality (persistence) is expected to be accessed multiple times.
var
The memory location to be prefetched, which can be any valid variable, member, or array element reference.
The compiler issues the prefetch instruction when it encounters the PREFETCH directive. The directive allows the user to influence almost every aspect of prefetch behavior. The default behavior prefetches one cache line, into L1 cache, for read access, and assumes temporal locality.
The PREFETCH directive can be used inside and outside of loops, in a loop preamble, or before a function call to reduce cache-miss memory latency.
The compiler will attempt to avoid multiple prefetches to the same cache line, which can be created as a result of optimization.
All variables specified on the same PREFETCH directive line share the same behavior. If different behavior is needed for different variables, use multiple PREFETCH directive lines.
The general PREFETCH directive supersedes the effects of any relevant LOOP_INFO [NO]PREFETCH directives and the -h [no]autoprefetch command line option.
The Cray Fortran compiler command line option -x prefetch can be used to disable all general PREFETCH directives in Fortran source code.
Example: PREFETCH directive
real*8 a(m,n), b(n,p), c(m,p), arow(n)
...
do i = 1, m
  arow(:) = a(i,:)     ! assumed: arow holds row i of a in contiguous storage
  do j = 1, p
!dir$ prefetch (lines(3), nt) arow(1),b(1,j)
    do k = 1, n, 4     ! assumes n is a multiple of 4
!dir$ prefetch (nt) arow(k+24),b(k+24,j)
      c(i,j) = c(i,j) + arow(k) * b(k,j)
      c(i,j) = c(i,j) + arow(k+1) * b(k+1,j)
      c(i,j) = c(i,j) + arow(k+2) * b(k+2,j)
      c(i,j) = c(i,j) + arow(k+3) * b(k+3,j)
    enddo
  enddo
enddo
PREPROCESS
!DIR$ PREPROCESS [expand_macros]
The PREPROCESS directive allows an include file to be preprocessed when the compilation does not specify the preprocessing command line option. This directive does not cause preprocessing of included files, unless they too use the directive. If the preprocessing command line option is used, preprocessing occurs normally for all files.
The directive must be the first line in the include file and in each included file that needs to be preprocessed.
The PREPROCESS directive supports this option:
expand_macros
The optional expand_macros clause allows the compiler to expand all macros within the include files. Without this clause, macro expansion occurs only within preprocessing directives.
PROBABILITY, PROBABILITY_ALMOST_ALWAYS, PROBABILITY_ALMOST_NEVER
!DIR$ PROBABILITY const
!DIR$ PROBABILITY_ALMOST_ALWAYS
!DIR$ PROBABILITY_ALMOST_NEVER
The probability directives specify information used by interprocedural analysis (IPA) and the optimizer to produce faster code sequences. The specified probability is a hint, rather than a statement of fact. This information is used to guide inlining decisions, branch elimination optimizations, branch hint marking, and the choice of the optimal algorithmic approach to the vectorization of conditional code. These directives can appear anywhere executable code is legal. Each directive applies to the block of code where it appears. It is important to realize that the directive should not be applied to a conditional test directly; rather, it should be used to indicate the relative probability of a THEN or ELSE code block being executed.
The PROBABILITY directive supports one argument.
const
Expression that evaluates to a floating point constant at compilation time. (0.0 <= const <= 1.0.)
The effect of PROBABILITY_ALMOST_NEVER and PROBABILITY_ALMOST_ALWAYS can be specified by using the PROBABILITY const values 0.0 and 1.0, respectively.
This example states that the probability of entering the block of code with the assignment statement is 0.3, or 30%. This also means that A(I) is expected to be greater than B(I) 30% of the time. Note that the probability directive appears within the conditional block of code, rather than before it. This removes some of the ambiguity that has plagued other implementations that tie the directive directly to the conditional code.
IF ( A(I) > B(I) ) THEN
!DIR$ PROBABILITY 0.3
A(I) = B(I)
ENDIF
For vector IF code, a very low probability (< 0.1) or PROBABILITY_ALMOST_NEVER causes the compiler to use the vector gather/scatter methods used for sparse IF vector code instead of the vector merge methods used for denser IF code. For example:
DO I = 1,N
IF ( A(I) > 0.0 ) THEN
!DIR$ PROBABILITY_ALMOST_NEVER
B(I) = B(I)/A(I) + A(I)/B(I) ! EVALUATE USING
! SPARSE METHODS
ENDIF
ENDDO
Note that the PROBABILITY directive appears within the conditional, rather than before the condition. This removes some of the ambiguity of tying the directive directly to the conditional test.
SAFE_ADDRESS
!DIR$ SAFE_ADDRESS
Scope: Local
Specifies that it is safe to speculatively execute memory references within all conditional branches of a loop; these memory references can be safely executed in each iteration of the loop. Where it applies, this directive can improve performance significantly by allowing vector expressions to be preloaded. However, most loops do not require this directive for preloading to be performed. SAFE_ADDRESS is required only when the safety of the operation cannot be determined or index expressions are very complicated.
The SAFE_ADDRESS directive is an advisory directive. That is, the compiler may override the directive if it determines the directive is not beneficial. If the directive is not used on a loop and the compiler determines that the loop would benefit from it, the compiler issues a message saying so. The message is similar to this:
DO I = 1,N
FTN-6375 FTN_DRIVER.EXE: VECTOR X7, FILE = 10928.F, LINE = 110
A LOOP STARTING AT LINE 110 WOULD BENEFIT FROM "!DIR$ SAFE_ADDRESS"
If the directive is used on a loop and the compiler determines that the loop does not benefit from it, the compiler issues a message stating that the directive is superfluous and can be removed. To see the messages, use the -O msgs option.
Incorrect use of the directive can result in segmentation faults, bus errors, or excessive page faulting. However, it should not result in incorrect answers. Incorrect usage can result in very severe performance degradations or program aborts.
In this example, the compiler will not preload vector expressions, because the value of J is unknown. However, if it is known that references to B(I,J) are safe to evaluate for all iterations of the loop, regardless of the condition, the SAFE_ADDRESS directive can be used. With the directive, the compiler can load B(I,J) with a full vector mask, merge 0.0 where the condition is true, and store the resulting vector using a full mask.
SUBROUTINE X3( A, B, N, M, J )
REAL A(N), B(N,M)
!DIR$ SAFE_ADDRESS
DO I = 1,64 ! VECTORIZED LOOP
IF ( A(I).NE.0.0 ) THEN
B(I,J) = 0.0 ! VALUE OF 'J' IS UNKNOWN
ENDIF
ENDDO
END
SAFE_CONDITIONAL
!DIR$ SAFE_CONDITIONAL
The SAFE_CONDITIONAL directive specifies that it is safe to execute all memory references and arithmetic operations within all conditional branches of the subsequent scalar or vector loop nest. It can improve performance by allowing the hoisting of invariant expressions from conditional code and by allowing prefetching of memory references.
The SAFE_CONDITIONAL directive is an advisory directive. The compiler may override the directive if it determines the directive is not beneficial.
Incorrect use of the directive can result in segmentation faults, bus errors, excessive page faulting, or arithmetic aborts. However, it should not result in incorrect answers. Incorrect usage can result in severe performance degradations or program aborts.
In the following example, the compiler cannot precompute the invariant expression S1*S2 because these values are unknown and may cause an arithmetic trap if executed unconditionally. However, if the condition is known to be true at least once, then it is safe to use the SAFE_CONDITIONAL directive and execute S1*S2 speculatively. With the directive, the compiler evaluates S1*S2 outside of the loop, rather than under control of the conditional code. In addition, all control flow is removed from the body of the vector loop, as S1*S2 no longer poses a safety risk.
SUBROUTINE SAFE_COND( A, N, S1, S2 )
REAL A(N), S1, S2
!DIR$ SAFE_CONDITIONAL
DO I = 1,N
IF ( A(I) /= 0.0 ) THEN
A(I) = A(I) + S1*S2
ENDIF
ENDDO
END
SAME_TBS
!DIR$ SAME_TBS (array, array[, array])
The SAME_TBS directive informs the compiler that the specified assumed-shape arrays are of the same rank and type, and that they have identical low-bound, extent, and stride multiplier information for corresponding dimensions.
This information allows the compiler to generate more efficient code by reducing the number of potentially distinct intermediate values required for array element accesses. This may offer significant execution performance improvement when using assumed-shape dummy arrays of corresponding type, low-bound, extent, and stride.
The SAME_TBS directive supports this option:
array
Two or more array arguments are required. array is the name of an assumed-shape dummy array. The arrays specified must not have the TARGET attribute. All arrays specified on a single SAME_TBS directive must have the same element type, bounds, and strides. Use the -Rd command line option to verify that the arrays have the same element type, bounds, and strides.
The SAME_TBS directive applies only to the program unit in which it appears.
Ordinarily, for multidimensional assumed-shape arrays, some classes of loop optimizations cannot be performed when an assumed-shape dummy array is referenced or defined in a loop or an array assignment statement. The lost optimizations and the additional copy operations performed can significantly reduce the performance of a procedure that uses assumed-shape dummy arrays when compared to an equivalent procedure that uses explicit-shape array dummy arguments. This directive may provide significant performance improvement depending on certain factors such as greater numbers of assumed-shape arrays and smaller array sizes.
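A minimal SAME_TBS sketch follows, using a hypothetical routine ADD3 with three assumed-shape dummy arrays. The caller contract that X, Y, and Z always have identical type, bounds, and strides is exactly what the directive asserts, so it is stated in a comment. Placing the directive in the specification part is an assumption, since this section does not state where it must appear within the program unit.

```fortran
module same_tbs_demo
  implicit none
contains
  ! Caller's contract (assumption): X, Y, and Z have identical
  ! element type, bounds, extents, and strides.
  subroutine add3(x, y, z)
    real, intent(in) :: x(:,:), y(:,:)
    real, intent(out) :: z(:,:)
!DIR$ SAME_TBS (X, Y, Z)
    integer :: i, j
    do j = 1, size(z, 2)
      do i = 1, size(z, 1)
        z(i,j) = x(i,j) + y(i,j)
      end do
    end do
  end subroutine add3
end module same_tbs_demo
```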
STACK
!DIR$ STACK
The STACK directive causes storage to be allocated to the stack in the program unit that contains the directive. This directive overrides the -ev command line option in specific program units of a compilation unit.
Data specified in the specification part of a module or in a DATA statement is always allocated to static storage. This directive has no effect on static storage allocation.
All SAVE statements are honored in program units that also contain a STACK directive. This directive does not override the SAVE statement.
If the compiler finds a STACK directive and a SAVE statement without any objects specified in the same program unit, a warning message is issued.
The following rules apply when using this directive:
It must be specified within the scope of a program unit.
If it is specified in the specification part of a module, a message is issued. The STACK directive is allowed in the scope of a module procedure.
If it is specified within the scope of an interface body, a message is issued.
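A minimal sketch of the STACK and SAVE interaction described above, using a hypothetical counter function: the local scratch array is eligible for stack allocation under the directive, while the object given the SAVE attribute keeps static storage, so its value persists across calls. Naming the saved object explicitly, rather than writing a SAVE statement with no object list, also avoids the warning mentioned above. Placing the directive directly after the function statement is an assumption.

```fortran
module stack_demo
  implicit none
contains
  ! Hypothetical example: returns how many times it has been called.
  integer function call_count()
!DIR$ STACK
    integer, save :: calls = 0   ! SAVE is honored: stays in static storage
    integer :: scratch(64)       ! ordinary local: allocated on the stack

    scratch = 1
    calls = calls + scratch(1)
    call_count = calls
  end function call_count
end module stack_demo
```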
SUPPRESS
!DIR$ SUPPRESS [var [,var] … ]
Scope: Local and Global
The SUPPRESS directive suppresses scalar optimization for all variables, or only for those specified, at the point where the directive appears. This often prevents or adversely affects vectorization of any loop that contains SUPPRESS.
The SUPPRESS directive supports an optional comma-delimited list of variables.
var
Variable that is to be stored in memory. If more than one variable is specified, use a comma to separate the variables. If no variables are listed, all variables in the program unit are stored.
At the point at which !DIR$ SUPPRESS appears in the source code, variables in registers are stored to memory (to be read out at their next reference), and expressions containing any of the affected variables are recomputed at their next reference after !DIR$ SUPPRESS. The effect on optimization is equivalent to that of an external subroutine call with an argument list that includes the variables specified by !DIR$ SUPPRESS (or, if no variable list is included, all variables in the program unit).
Example: SUPPRESS directive
Below is an example of the SUPPRESS directive used with an IF statement. The directive takes effect only on execution paths that pass through it. Optimization proceeds normally if the directive is not executed because of a GOTO or IF. In this example, optimization replaces the reference to A in the PRINT statement with the constant 1.0, even though !DIR$ SUPPRESS appears between A=1.0 and the PRINT statement, because the IF statement can cause the execution path to bypass !DIR$ SUPPRESS. If SUPPRESS appears before the IF statement, A in PRINT * is not replaced by the constant 1.0.
SUBROUTINE SUB (L)
LOGICAL L
A = 1.0 ! A is local
IF (L) THEN
!DIR$ SUPPRESS ! Has no effect if L is false
CALL ROUTINE()
ELSE
PRINT *, A
END IF
END
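For contrast, the following sketch moves SUPPRESS ahead of the IF statement, so the directive lies on every execution path and A is reloaded at its later use rather than replaced by the constant 1.0. It is written as a hypothetical function that returns A instead of printing it, with an empty hypothetical stand-in for ROUTINE, so the result can be checked. The directive changes only optimization, not the computed value.

```fortran
module suppress_demo
  implicit none
contains
  ! Hypothetical stand-in for the external ROUTINE in the example above.
  subroutine routine()
  end subroutine routine

  ! Hypothetical variant of SUB: SUPPRESS now precedes the IF, so it is
  ! executed on both paths and A is stored to memory at that point.
  real function sub_value(l)
    logical, intent(in) :: l
    real :: a

    a = 1.0
!DIR$ SUPPRESS
    if (l) then
      call routine()
      sub_value = 0.0
    else
      sub_value = a   ! reloaded, not folded to the constant 1.0
    end if
  end function sub_value
end module suppress_demo
```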
UNROLL, NOUNROLL
!DIR$ UNROLL [n]
!DIR$ NOUNROLL
Scope: Local
The UNROLL directive allows the user to control unrolling for individual loops or to specify no unrolling of a loop. Loop unrolling can improve program performance by revealing cross-iteration memory optimization opportunities such as read-after-write and read-after-read.
The UNROLL directive supports one argument:
n
Specifies either no loop unrolling (n = 0 or 1) or the total number of loop body copies to be generated (2 <= n <= 63).
If a value for n is not specified, the compiler determines the number of copies to generate based on the number of statements in the loop nest.
The NOUNROLL directive disables loop unrolling for the next loop. This is equivalent to specifying UNROLL 0 or UNROLL 1.
The UNROLL directive can be used only on loops with iteration counts that can be calculated before entering the loop. If UNROLL is specified on a loop that is not the innermost loop in a loop nest, the inner loops must be nested perfectly. That is, each loop in the nest other than the innermost can contain only the next loop, and only the innermost loop can contain work. Note that the compiler cannot always safely unroll non-innermost loops due to data dependencies. In these cases, this directive is ignored.
The advantages of loop unrolling include:
Improved loop scheduling by increasing basic block size
Reduced loop overhead
Improved chances for cache hits
Example 1: Unroll outer loops
In the following example, assume that the outer loop of the nest is unrolled by 2:
!DIR$ UNROLL 2
DO I = 1, 10
DO J = 1,100
A(J,I) = B(J,I) + 1
END DO
END DO
With outer loop unrolling, the compiler produces the following nest, in which the two bodies of the inner loop are adjacent:
DO I = 1, 10, 2
DO J = 1,100
A(J,I) = B(J,I) + 1
END DO
DO J = 1,100
A(J,I+1) = B(J,I+1) + 1
END DO
END DO
The compiler jams, or fuses, the inner two loop bodies together, producing the following nest:
DO I = 1, 10, 2
DO J = 1,100
A(J,I) = B(J,I) + 1
A(J,I+1) = B(J,I+1) + 1
END DO
END DO
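Because unroll-and-jam must not change results, the original nest and the fused form above can be compared directly. The following sketch writes out both versions, using hypothetical module and procedure names and assuming fixed 100 x 10 shapes for A and B. A compiler that ignores the UNROLL directive still computes the same values.

```fortran
module unroll_jam_demo
  implicit none
contains
  ! Hypothetical example: the original nest with the UNROLL request.
  subroutine original(a, b)
    real, intent(out) :: a(100,10)
    real, intent(in)  :: b(100,10)
    integer :: i, j
!DIR$ UNROLL 2
    do i = 1, 10
      do j = 1, 100
        a(j,i) = b(j,i) + 1
      end do
    end do
  end subroutine original

  ! Hypothetical example: the unrolled-and-jammed nest written by hand.
  subroutine jammed(a, b)
    real, intent(out) :: a(100,10)
    real, intent(in)  :: b(100,10)
    integer :: i, j
    do i = 1, 10, 2
      do j = 1, 100
        a(j,i)   = b(j,i)   + 1
        a(j,i+1) = b(j,i+1) + 1
      end do
    end do
  end subroutine jammed
end module unroll_jam_demo
```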
Example 2: Illegal unrolling of outer loops
Outer loop unrolling is not always legal because the transformation can change the semantics of the original program. For example, unrolling the following loop nest on the outer loop would change the program semantics because of the dependency between A(...,I) and A(...,I+1).
!DIR$ UNROLL 2
DO I = 1, 10
DO J = 1,100
A(J,I) = A(J-1,I+1) + 1
END DO
END DO
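The following sketch makes the hazard concrete by coding the original nest next to the unroll-and-jam form that the Example 1 transformation would produce. The array is assumed to be dimensioned A(0:100,11) so that A(J-1,I+1) and A(J-1,I+2) stay in bounds, and the names are hypothetical. Starting from A = 0, the original order sets every element of columns 1 through 10 to 1, while the jammed form reads A(J-1,I+1) after it has already been updated and sets A(2,1) to 2.

```fortran
module illegal_unroll_demo
  implicit none
contains
  ! Hypothetical example: original order. Column i reads column i+1
  ! before iteration i+1 overwrites it.
  subroutine original(a)
    real, intent(inout) :: a(0:100, 11)
    integer :: i, j
    ! Per the text above, the compiler ignores this request when the
    ! dependency makes outer-loop unrolling unsafe.
!DIR$ UNROLL 2
    do i = 1, 10
      do j = 1, 100
        a(j,i) = a(j-1,i+1) + 1
      end do
    end do
  end subroutine original

  ! Hypothetical example: what an unroll-and-jam by 2 would compute.
  ! a(j-1,i+1) is now read after the second statement has updated it.
  subroutine jammed(a)
    real, intent(inout) :: a(0:100, 11)
    integer :: i, j
    do i = 1, 10, 2
      do j = 1, 100
        a(j,i)   = a(j-1,i+1) + 1
        a(j,i+1) = a(j-1,i+2) + 1
      end do
    end do
  end subroutine jammed
end module illegal_unroll_demo
```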
Example 3: Unroll nearest neighbor pattern
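The following example shows unrolling of a nearest-neighbor pattern. After the outer loop is unrolled by 2, the value of B(I,J+1) loaded for the first copy of the body is reused from a register by the second copy, so each pair of trips makes 3 loads (B(I,J), B(I,J+1), B(I,J+2)) instead of 4. This reduces memory references from 2 per trip to 1.5 per trip.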
!DIR$ UNROLL 2
DO J = 1,N
DO I = 1,N ! VECTORIZE
A(I,J) = B(I,J) + B(I,J+1)
ENDDO
ENDDO
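The preceding code fragment is converted to the following code (this simplified form assumes that N is even; an odd N also needs a cleanup iteration for the last column):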
DO J = 1,N,2 ! UNROLLED FOR REUSE OF B(I,J+1)
DO I = 1,N ! VECTORIZED
A(I,J) = B(I,J) + B(I,J+1)
A(I,J+1) = B(I,J+1) + B(I,J+2)
END DO
END DO
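The converted code above steps J by 2, so it covers every column only when N is even. The following hand-written sketch adds a cleanup pass for the last column when N is odd and can be checked against the original loop. The names are hypothetical, and B is assumed to have N+1 columns because the original loop reads B(I,J+1) up to J = N.

```fortran
module neighbor_demo
  implicit none
contains
  ! Hypothetical example: the original nearest-neighbor loop.
  subroutine plain(a, b, n)
    integer, intent(in) :: n
    real, intent(out)   :: a(n, n)
    real, intent(in)    :: b(n, n+1)
    integer :: i, j
    do j = 1, n
      do i = 1, n
        a(i,j) = b(i,j) + b(i,j+1)
      end do
    end do
  end subroutine plain

  ! Hypothetical example: outer loop unrolled by 2; b(i,j+1) feeds both
  ! statements, so it needs to be loaded only once per pair of columns.
  subroutine unrolled(a, b, n)
    integer, intent(in) :: n
    real, intent(out)   :: a(n, n)
    real, intent(in)    :: b(n, n+1)
    integer :: i, j
    do j = 1, n - 1, 2
      do i = 1, n
        a(i,j)   = b(i,j)   + b(i,j+1)
        a(i,j+1) = b(i,j+1) + b(i,j+2)
      end do
    end do
    if (mod(n, 2) == 1) then   ! cleanup column when n is odd
      do i = 1, n
        a(i,n) = b(i,n) + b(i,n+1)
      end do
    end if
  end subroutine unrolled
end module neighbor_demo
```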
VECTOR, NOVECTOR
!DIR$ VECTOR [clause [,clause] … ]
!DIR$ NOVECTOR
The VECTOR and NOVECTOR directives apply only to the next loop.
The NOVECTOR directive suppresses compiler attempts to vectorize loops and array syntax statements. It overrides any other vectorization-related directives, as well as the -O vectorn command line option. This directive is ignored if vectorization or scalar optimization has been disabled.
The VECTOR directive supports the following optional clauses:
ALWAYS
Vectorize the loop that immediately follows the directive. This clause states a vectorization preference and does not guarantee that the loop has no memory-dependence hazard. This clause has the same effect as the PREFERVECTOR directive.
ALIGNED
Directs the compiler to generate aligned data movement instructions for array references when vectorizing. For current Intel processors, data alignment is necessary for efficient vectorization. Use with care to improve performance. If some of the access patterns are actually unaligned, using the ALIGNED clause may generate incorrect code. This clause also directs the compiler to ignore explicit and implicit vector dependencies.
UNALIGNED
Directs the compiler to generate unaligned data movement instructions for all array references when vectorizing.
Differences between HPE CCE versions
Prior to CCE 9.0, the NOVECTOR directive applied to the rest of the program unit unless subsequently superseded by a VECTOR directive. The NOVECTOR and VECTOR directives behaved as toggle switches, controlling vectorization for the remainder of the program unit unless superseded by the countervailing directive.
Beginning with CCE 9.0, the VECTOR and NOVECTOR directives apply only to the next loop. The HPE Cray Fortran -h vector_classic command line option restores the pre-CCE 9.0 behavior.
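Under the CCE 9.0 and later behavior, each directive governs only the loop that follows it. The following sketch, with hypothetical names, marks the first loop with VECTOR ALWAYS, leaves the second loop to the compiler's own heuristics, and suppresses vectorization of the third loop with NOVECTOR. With -h vector_classic, each directive would instead act as a toggle for the rest of the program unit, as described above. The directives do not change the computed results.

```fortran
module vector_scope_demo
  implicit none
contains
  ! Hypothetical example: a = a*s, then b = (a + 1) - a.
  subroutine scale_then_diff(a, b, s)
    real, intent(inout) :: a(:)
    real, intent(out)   :: b(:)   ! assumed to be at least as long as a
    real, intent(in)    :: s
    integer :: i

!DIR$ VECTOR ALWAYS
    do i = 1, size(a)             ! governed by VECTOR ALWAYS
      a(i) = a(i) * s
    end do

    do i = 1, size(a)             ! no directive: compiler heuristics apply
      b(i) = a(i) + 1.0
    end do

!DIR$ NOVECTOR
    do i = 1, size(a)             ! vectorization suppressed for this loop only
      b(i) = b(i) - a(i)
    end do
  end subroutine scale_then_diff
end module vector_scope_demo
```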
WEAK
!DIR$ WEAK procedure_name [,procedure_name] …
!DIR$ WEAK procedure_name=stub_name [,procedure_name1=stub_name1] …
Scope: Global
The WEAK directive specifies an external identifier that may remain unresolved throughout the compilation. The WEAK directive supports the following arguments:
procedure_name
A weak object in the form of a variable or procedure.
stub_name
A stub procedure that exists in the code. The stub_name procedure is called if a strong reference does not exist for procedure_name. The stub_name procedure must have the same dummy argument list as procedure_name.
A weak external does not increase the total memory requirements of a program. The WEAK directive can prevent the compiler driver from adding the binary to a program, resulting in a smaller program and less use of memory.
The first form of the directive allows the declaration of one or more weak references on one line.
The second form allows the assigning of a strong reference to a weak reference.
Declaring an object as a weak external directs the linker to do one of these tasks:
Link the object if it is already linked. That is, if a strong reference already exists or is defined by the program, that reference will be used.
If a strong reference is specified in the weak directive (second form), and a reference is not defined by the program, then the strong reference is assigned to the weak reference.
If no strong reference exists, the object is left as an unsatisfied external. The linker does not display an unsatisfied external message for unresolved weak references.
Note that the linker treats weak externals as unsatisfied externals, so they remain silently unresolved if no strong reference occurs during compilation. Thus, it is the developer’s responsibility to ensure that run time references to weak external names do not occur unless the linker (using some “strong” reference elsewhere) has actually linked the entry point in question.
The attributes that weak externals must have depend on the form of the WEAK directive used:
With the first form, weak externals must be declared, but not defined or initialized, in the source file.
With the second form, weak externals may be declared, but not defined or initialized, in the source file.
With either form, weak externals cannot be declared with a static storage class.
Source Preprocessing
Source preprocessing helps port a program from one platform to another by allowing source text to be specified that is platform specific.
For a source file to be preprocessed automatically, it must have an uppercase extension, either .F or .FOR (for a file in fixed source form), or .F90, .F95, .F03, .F08, .F18, or .FTN (for a file in free source form). To specify preprocessing of source files with other extensions, including lowercase ones, use the -eP or -eZ options described in Command Line Options.
General Rules
Alter the source code through source preprocessing directives. These directives are fully explained below in Directives. The directives must be used according to the following rules:
Do not use source preprocessor (#) directives within multiline compiler directives (CDIR$, !DIR$, C$OMP, or !$OMP).
A source file that contains an #if directive cannot be included unless the balancing #endif directive is in the same file.
For this rule, #if also covers the #ifdef and #ifndef directives.
If a directive is too long for one source line, the backslash character (\) is used to continue the directive on successive lines. Successive lines of the directive can begin in any column.
The backslash character (\) can appear in any location within a directive in which white space can occur. A backslash character (\) in a comment is treated as a comment character. It is not recognized as signaling continuation.
Every directive begins with the pound character (#), and the pound character (#) must be in column 1.
Blank and tab (HT) characters can appear between the pound character (#) and the directive keyword.
Form feed (FF) or vertical tab (VT) characters cannot be used to separate tokens on a directive line. That is, a source preprocessing line that spans source lines must be continued by using a backslash character (\).
Blanks are significant, so the use of spaces within a source preprocessing directive is independent of the source form of the file. The fields of a source preprocessing directive must be separated by blank or tab (HT) characters.
Any user-specified identifier that is used in a directive must follow Fortran rules for identifier formation. The exceptions to this rule are as follows:
The first character in a source preprocessing name (a macro name) can be an underscore character (_).
Source preprocessing names are significant in their first 132 characters whereas a typical Fortran identifier is significant only in its first 63 characters.
Source preprocessing identifier names are case sensitive.
Numeric literal constants must be integer literal constants or real literal constants, as defined for Fortran.
Comments written in the style of the C language, beginning with /* and ending with */, can appear anywhere within a source preprocessing directive in which blanks or tabs can appear. The comment, however, must begin and end on a single source line.
Directive syntax allows an identifier to contain the ! character. Therefore, avoid placing a ! character to start a Fortran comment on the same line as a directive.
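The following sketch illustrates several of these rules in a file that is preprocessed (for example, an uppercase .F90 file, or any file compiled with -eZ). Each directive starts with # in column 1, a C-style comment begins and ends on a single directive line, a macro definition is continued with a trailing backslash, and no ! comment appears on a directive line. The macro, module, and function names are hypothetical.

```fortran
#define MAX_ITER 100   /* C-style comment on one directive line */
#define SCALE(x) \
        (2 * (x))

module pp_rules_demo
  implicit none
contains
  ! Hypothetical example: expands to (2 * (100)) after preprocessing.
  integer function scaled_limit()
    scaled_limit = SCALE(MAX_ITER)
  end function scaled_limit
end module pp_rules_demo
```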
Directives
The blanks shown in the syntax descriptions of the source preprocessing directives are significant. The tab character (HT) can be used in place of a blank. Multiple blanks can appear wherever a single blank appears in a syntax description.
#include Directive
The #include directive directs the system to use the content of a file. Just as with the INCLUDE line path processing defined by the Fortran standard, an #include directive effectively replaces that directive line by the content of filename. This directive has the following formats:
#include "filename"
#include <filename>
filename
A file or directory to be used. In the first form, if filename does not begin with a slash (/) character, the system searches for the named file, first in the directory of the file containing the `#include` directive, then in the sequence of directories specified by the `-I` option(s) on the `ftn` command line, and then the standard (default) sequence. If filename begins with a slash (/) character, it is used as is and is assumed to be the full path to the file. The second form directs the search to begin in the sequence of directories specified by the `-I` option(s) on the `ftn` command line and then search the standard (default) sequence.
The Fortran standard prohibits recursion in INCLUDE files, so recursion is also prohibited in the #include form.
The #include directives can be nested.
When the compiler is invoked to do only source preprocessing, not compilation, text is included by #include directives but not by Fortran INCLUDE lines.
#define Directive
The #define directive declares a variable name and assigns a value to the variable. It also allows the definition of a function-like macro. This directive has the following formats:
#define identifier value
#define identifier(dummy_arg_list) value
The first format defines an object-like macro (also called a source preprocessing variable), and the second defines a function-like macro. In the second format, the left parenthesis that begins the dummy_arg_list must immediately follow the identifier, with no intervening white space.
identifier
The name of the variable or macro being defined. Rules for Fortran variable names apply; that is, the name cannot have a leading underscore character (_). For example, `ORIG` is a valid name, but `_ORIG` is invalid.
dummy_arg_list
A list of dummy argument identifiers.
value
The value is a sequence of tokens. The value can be continued onto more than one line using backslash (\) characters.
If a preprocessor identifier appears in a subsequent #define directive without being the subject of an intervening #undef directive, and the value in the second #define directive is different from the value in the first #define directive, then the preprocessor issues a warning message about the redefinition. The second directive's value is used.
When an object-like macro's identifier is encountered as a token in the source file, it is replaced with the value specified in the macro's definition. This is referred to as an invocation of the macro.
The invocation of a function-like macro is more complicated. It consists of the macro's identifier, immediately followed by a left parenthesis with no intervening white space, then a list of actual arguments separated by commas, and finally a terminating right parenthesis. There must be the same number of actual arguments in the invocation as there are dummy arguments in the #define directive. Each actual argument must be balanced in terms of any internal parentheses. The invocation is replaced with the value given in the macro's definition, with each occurrence of any dummy argument in the definition replaced with the corresponding actual argument in the invocation.
For example, the following program prints `Hello, world.` when compiled and run:
PROGRAM P
#define GREETING 'Hello, world.'
PRINT *, GREETING
END PROGRAM P
The following program, which uses a function-like macro, also prints `Hello, world.` when compiled and run:
PROGRAM P
#define GREETING(str1, str2) str1, str2
PRINT *, GREETING('Hello, ', 'world.')
END PROGRAM P
#undef Directive
The #undef directive sets the definition state of identifier to an undefined value. If identifier is not currently defined, the #undef directive has no effect. This directive has the following format:
#undef identifier
identifier
The name of the variable or macro being undefined.
# (Null) Directive
The null directive simply consists of the pound character (#) in column 1 with no significant characters following it. That is, the remainder of the line is typically blank or is a source preprocessing comment. This directive is generally used for spacing out other directive lines.
Conditional Directives
Conditional directives cause lines of code to either be produced by the source preprocessor or to be skipped. The conditional directives within a source file form if-groups. An if-group begins with an #if, #ifdef, or #ifndef directive, followed by lines of source code that may or may not be skipped. Several similarities exist between the Fortran IF construct and if-groups:
The #elif directive corresponds to the ELSE IF statement.
The #else directive corresponds to the ELSE statement.
Just as an IF construct must be terminated with an END IF statement, an if-group must be terminated with an #endif directive.
Just as with an IF construct, any of the blocks of source statements in an if-group can be empty. For example, the following directives can be written:
#if MIN_VALUE == 1
#else
...
#endif
Determining which group of source lines (if any) to compile in an if-group is essentially the same as the Fortran determination of which block of an IF construct should be executed.
#if Directive
The #if directive has the following format:
#if expression
expression
An expression. The values in expression must be integer literal constants or previously defined preprocessor variables. The expression is an integer constant expression as defined by the C language standard. All the operators in the expression are C operators, not Fortran operators. The expression is evaluated according to C language rules, not Fortran expression evaluation rules. Note that unlike the Fortran `IF` construct and `IF` statement logical expressions, expression in an `#if` directive need not be enclosed in parentheses.
The #if expression can also contain the unary defined operator, which can be used in either of the following formats:
defined identifier
defined (identifier)
When the defined subexpression is evaluated, the value is 1 if identifier is currently defined, and 0 if it is not.
All currently defined source preprocessing variables in expression, except those that are operands of defined unary operators, are replaced with their values. During this evaluation, all source preprocessing variables that are undefined evaluate to 0.
Note that the following two directives are not equivalent:
#if X
#if defined(X)
In the first case, the condition is true if X has a nonzero value. In the second case, the condition is true only if X has been defined (has been given a value that could be 0).
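The following sketch, intended for a preprocessed source file, defines X as 0 and evaluates both directives: #if X selects the #else branch because X expands to 0, while #if defined(X) selects the first branch because X is defined. It also shows that an identifier that was never defined (UNSET_FLAG, a hypothetical name) evaluates to 0. The module and function names are hypothetical, and X is undefined again at the end so that it cannot affect later code.

```fortran
#define X 0

module if_vs_defined_demo
  implicit none
contains
  ! Hypothetical example: returns 1 only for a nonzero macro value.
  integer function value_test()
#if X
    value_test = 1
#else
    value_test = 0
#endif
  end function value_test

  ! Hypothetical example: returns 1 whenever the macro is defined.
  integer function defined_test()
#if defined(X)
    defined_test = 1
#else
    defined_test = 0
#endif
  end function defined_test

  ! Hypothetical example: an undefined name evaluates to 0.
  integer function unset_test()
#if UNSET_FLAG
    unset_test = 1
#else
    unset_test = 0
#endif
  end function unset_test
end module if_vs_defined_demo

#undef X
```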
#ifdef Directive
The #ifdef directive is used to determine if identifier is predefined by the source preprocessor, has been named in a #define directive, or has been named in an ftn -D command line option. This directive has the following format:
#ifdef identifier
The #ifdef directive is equivalent to either of the following two directives:
#if defined identifier
#if defined (identifier)
#ifndef Directive
The #ifndef directive tests for the presence of an identifier that is not defined. This directive has the following format:
#ifndef identifier
This directive is equivalent to either of the following two directives:
#if !defined identifier
#if !defined (identifier)
#elif Directive
The #elif directive serves the same purpose in an if-group as does the ELSE IF statement of a Fortran IF construct. This directive has the following format:
#elif expression
expression
The expression follows all the rules of the integer constant expression in an #if directive.
#else Directive
The #else directive serves the same purpose in an if-group as does the ELSE statement of a Fortran IF construct. This directive has the following format:
#else
#endif Directive
The #endif directive serves the same purpose in an if-group as does the END IF statement of a Fortran IF construct. This directive has the following format:
#endif
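Putting the conditional directives together, the following sketch for a preprocessed source file uses #ifndef to supply a default that a -D PRECISION_LEVEL=1 option on the ftn command line could override. It then selects one block with an #if/#elif/#else/#endif if-group and finally removes the macro with #undef. The macro, module, and function names are hypothetical.

```fortran
#ifndef PRECISION_LEVEL
#define PRECISION_LEVEL 2
#endif

module if_group_demo
  implicit none
contains
  ! Hypothetical example: decimal digits requested for each level.
  integer function requested_digits()
#if PRECISION_LEVEL == 1
    requested_digits = 6
#elif PRECISION_LEVEL == 2
    requested_digits = 15
#else
    requested_digits = 33
#endif
  end function requested_digits
end module if_group_demo

#undef PRECISION_LEVEL
```

Without a -D option, the default of 2 selects the #elif block, so the function returns 15.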
Predefined Macros
The HPE Cray Fortran compiler source preprocessing supports a number of predefined macros. They are divided into groups as follows:
Macros based on the host machine:
Macro | Description |
---|---|
unix, __unix, __unix__ | Always defined. (The leading characters in the second form consist of 2 consecutive underscores; the third form consists of 2 leading and 2 trailing underscores.) |
Macros based on CLE system targets:
Macro | Description |
---|---|
_ADDR64 | Defined for CLE systems as targets. The target system must have 64-bit address registers. |
_MAXVL_8, _MAXVL_16, _MAXVL_32, _MAXVL_64, _MAXVL_128 | MAXVL (Maximum Vector Length) is defined by dividing the size of the widest hardware vector register by the number of bits in the data type. For x86 targets supporting AVX512, the values are 64, 32, 16, 8, and 4. For x86 targets not supporting AVX512, the values are 32, 16, 8, 4, and 2. For ARM targets, the values vary according to the maximum hardware vector length supported by the system. |
Macros based on the HPE Cray Fortran compiler:
Macro | Description |
---|---|
_CRAYFTN | Defined as 1. |
_CRAY_COARRAY | Defined as 1 if -hcaf is specified on the command line. If -hnocaf is specified, this macro is undefined. |
_OPENMP | Defined as the publication date of the supported OpenMP standard, as a decimal value of the form yyyymm. |
_RELEASE_MAJOR | Defined as the major release level of the compiler. |
_RELEASE_MINOR | Defined as the minor release level of the compiler. |
_RELEASE_PATCHLEVEL | Defined as the patch level of the compiler (the third field in the version string). |
_RELEASE_STRING | Defined as a string that describes the version of the compiler. |
Macros based on the source file:
Macro | Description |
---|---|
__line__, __LINE__ | Defined to be the line number of the current source line in the source file. |
__file__, __FILE__ | Defined to be the name of the current source file. |
__date__, __DATE__ | Defined to be the current date in the form mm/dd/yy. |
__time__, __TIME__ | Defined to be the current time in the form hh:mm:ss. |
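A common use of these macros is guarding compiler-specific code. In the following sketch for a preprocessed source file, _CRAYFTN selects a branch only when CCE compiles the file, and __LINE__ records the source line on which it appears. Another compiler whose preprocessor defines __LINE__ but not _CRAYFTN takes the #else branch. The module and function names are hypothetical.

```fortran
module compiler_info_demo
  implicit none
contains
  ! Hypothetical example: short tag naming the compiler that built this file.
  function compiler_tag() result(tag)
    character(len=:), allocatable :: tag
#ifdef _CRAYFTN
    tag = 'CCE'
#else
    tag = 'other'
#endif
  end function compiler_tag

  ! Hypothetical example: source line number of the assignment below.
  integer function where_am_i()
    where_am_i = __LINE__
  end function where_am_i
end module compiler_info_demo
```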
Command Line Options
The following ftn command line options affect source preprocessing. See the crayftn(1) man page for more information about these options.
The -D identifier=value option defines variables used for source preprocessing.
The -dF option controls macro expansion in Fortran source statements.
The -eP option performs source preprocessing on file.f90, file.F90, file.F95, file.F03, file.F08, file.F18, file.ftn, or file.FTN but does not compile. The -eP option produces file.i.
The -eZ option performs source preprocessing and compilation on file.f90, file.F90, file.F95, file.F03, file.F08, file.F18, file.ftn, or file.FTN. The -eZ option produces file.i.
The -U identifier, identifier … option undefines variables used for source preprocessing. For more information about this option, see Fortran Command-line Options.
The -D identifier=value and -U identifier, identifier … options are ignored unless one of the following is true:
The Fortran input source file is specified as either file.f90, file.F90, file.F95, file.F03, file.F08, file.F18, file.ftn, or file.FTN.
The -eP or -eZ options have been specified.
OpenMP Overview
The OpenMP API provides a parallel programming model that is portable across shared memory architectures from HPE and other vendors. The OpenMP specification is accessible at https://www.openmp.org. OpenMP is disabled by default in HPE CCE and must be explicitly enabled using the -homp or -fopenmp option.
Supported Version
CCE supports full OpenMP 5.0 and partial OpenMP 5.1 and 5.2. The following OpenMP 5.1 features are supported:
masked construct without filter clause (Fortran)
metadirective dynamic user condition and target_device selectors (Fortran)
assume and assumes directives (Fortran)
nothing directive (Fortran)
The following OpenMP 5.2 features are supported:
otherwise clause for metadirective (Fortran)
Compiling
OpenMP is disabled by default and must be explicitly enabled. These HPE CCE options affect OpenMP applications:
-h [no]omp
-f openmp (synonym for -h omp)
-h threadn
Executing
For OpenMP applications, use both the OMP_NUM_THREADS environment variable to specify the number of threads and the aprun -d depth option to specify the number of CPUs hosting the threads. The number of threads specified by OMP_NUM_THREADS should not exceed the number of cores in the CPU. If neither the OMP_NUM_THREADS environment variable nor the omp_set_num_threads call is used to set the number of OpenMP threads, the system defaults to 1 thread.
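The following sketch reports how many threads a program will use. It relies on the standard OpenMP conditional-compilation sentinel !$, whose lines are compiled only when OpenMP is enabled (-homp or -fopenmp), so the same source also builds without OpenMP and then reports a single thread. With neither OMP_NUM_THREADS nor omp_set_num_threads setting the count, CCE would be expected to report 1 thread, as described above. The module and function names are hypothetical.

```fortran
module thread_count_demo
!$ use omp_lib
  implicit none
contains
  ! Hypothetical example: threads the next parallel region would request.
  integer function planned_threads()
    planned_threads = 1
!$  planned_threads = omp_get_max_threads()
  end function planned_threads

  ! Hypothetical example: size of the team actually created.
  integer function actual_team_size()
    actual_team_size = 1
!$omp parallel
!$omp single
!$  actual_team_size = omp_get_num_threads()
!$omp end single
!$omp end parallel
  end function actual_team_size
end module thread_count_demo
```

For example, setting OMP_NUM_THREADS=4 and launching with aprun -d 4 would request four threads hosted on four CPUs.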
Debugging
The -g option is compatible with the -homp option, and together the options provide debugging support for OpenMP directives. The -g option, when specified with no optimization options or with -O0, provides debugging support identical to specifying the -G0 option. If any optimization is specified, -g is ignored.
OpenMP Implementation Defined Behavior
The OpenMP Application Program Interface Specification presents a list of implementation-defined behaviors. The HPE implementation is described in the following sections.
Atomicity of memory access by multiple threads
When multiple threads access the same shared memory location and at least one of the accesses is a write, the threads should be ordered by explicit synchronization to avoid data races and the potential for nondeterministic results. Always use explicit synchronization for any access smaller than one byte.
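As a minimal illustration of explicit synchronization, the following sketch counts even values in parallel. Any thread may update the shared counter, so each update is protected with an atomic directive; a reduction(+:n) clause would be an equally valid alternative. The module and function names are hypothetical. The !$omp lines are ignored when OpenMP is not enabled, and the loop then runs serially with the same result.

```fortran
module atomic_count_demo
  implicit none
contains
  ! Hypothetical example: counts the even entries of v in parallel.
  integer function count_even(v)
    integer, intent(in) :: v(:)
    integer :: i, n

    n = 0
!$omp parallel do shared(n, v)
    do i = 1, size(v)
      if (mod(v(i), 2) == 0) then
!$omp atomic update
        n = n + 1          ! shared write, serialized by the atomic directive
      end if
    end do
!$omp end parallel do
    count_even = n
  end function count_even
end module atomic_count_demo
```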
Internal Control Variables (ICVs)
ICV | Initial Value | Note |
---|---|---|
nthreads-var | 1 | |
dyn-var | TRUE | Behaves according to Algorithm 2-1 of the specification. |
run-sched-var | static | |
stacksize-var | 128 MB | |
wait-policy-var | ACTIVE | |
thread-limit-var | 64 | Threads may be dynamically created up to an upper limit which is 4 times the number of cores/node. It is up to the programmer to try to limit oversubscription. |
max-active-levels-var | 4095 | |
def-sched-var | static | The chunksize is rounded up to improve alignment for vectorized loops. |
Dynamic Adjustment of Threads
The ICV dyn-var is enabled by default. Threads may be dynamically created up to an upper limit of 4 times the number of cores per node. It is up to the programmer to limit oversubscription.
If a parallel region is encountered while dynamic adjustment of the number of threads is disabled, and the number of threads specified for the parallel region exceeds the number that the runtime system can supply, the program terminates. The number of physical processors actually hosting the threads at any given time is fixed at program startup and is specified by the aprun -d depth option. The OMP_NESTED environment variable and the omp_set_nested() call control nested parallelism. To enable nesting, set OMP_NESTED to true or use the omp_set_nested() call. Nesting is disabled by default.
Directives and Clauses
atomic directive
When supported by the target architecture, atomic directives are lowered into hardware atomic instructions. Otherwise, atomicity is guaranteed with a lock. OpenMP atomic directives are compatible with C11 and C++11 atomic operations, as well as GNU atomic builtins.
for directive
For the schedule(guided,chunk) clause, the size of the initial chunk for the master thread and other team members is approximately equal to the trip count divided by the number of threads.
For the schedule(runtime) clause, the schedule type and, optionally, chunk size can be chosen at runtime by setting the OMP_SCHEDULE environment variable. If this environment variable is not set, the default behavior of the schedule(runtime) clause is as if the schedule(static) clause appeared instead.
In the absence of the schedule clause, the default schedule is static and the default chunk size is approximately the number of iterations divided by the number of threads.
The iteration count of a collapsed loop is computed with signed 64-bit integers, regardless of how the original induction variables and loop bounds are defined. If schedule(runtime) is specified and run-sched-var is auto, the HPE implementation generates a static schedule.
do and parallel do directives
For the schedule(guided,chunk) clause, the size of the initial chunk for the master thread and other team members is approximately equal to the trip count divided by the number of threads.
For the schedule(runtime) clause, the schedule type and, optionally, chunk size can be chosen at runtime by setting the OMP_SCHEDULE environment variable. If this environment variable is not set, the default behavior of the schedule(runtime) clause is as if the schedule(static) clause appeared instead.
In the absence of the schedule clause, the default schedule is static and the default chunk size is approximately the number of iterations divided by the number of threads.
The iteration count of a collapsed loop is computed with signed 64-bit integers, regardless of how the original induction variables and loop bounds are defined. If schedule(runtime) is specified and run-sched-var is auto, the HPE implementation generates a static schedule.
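The following sketch uses schedule(runtime), so the loop schedule comes from run-sched-var. The OMP_SCHEDULE environment variable (for example, OMP_SCHEDULE="dynamic,4") or the omp_set_schedule routine can change it without recompiling; if neither is used, the loop behaves as if schedule(static) had been specified, as described above. The module and subroutine names are hypothetical, and the computed values do not depend on the schedule chosen.

```fortran
module runtime_schedule_demo
  implicit none
contains
  ! Hypothetical example: squares every element of v; the loop schedule
  ! is selected at run time from run-sched-var.
  subroutine square_all(v)
    real, intent(inout) :: v(:)
    integer :: i
!$omp parallel do schedule(runtime)
    do i = 1, size(v)
      v(i) = v(i) * v(i)
    end do
!$omp end parallel do
  end subroutine square_all
end module runtime_schedule_demo
```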
parallel directive
If a parallel region is encountered while dynamic adjustment of the number of threads is disabled, and the number of threads specified for the parallel region exceeds the number that the runtime system can supply, the program terminates.
The number of physical processors actually hosting the threads at any given time is fixed at program startup and is specified by the aprun -d depth option.
The OMP_NESTED environment variable and the omp_set_nested() call control nested parallelism. To enable nesting, set OMP_NESTED to true or use the omp_set_nested() call. Nesting is disabled by default.
private clause
If a variable is declared as private, the variable is referenced in the definition of a statement function, and the statement function is used within the lexical extent of the directive construct, then the statement function references the private version of the variable.
sections construct
Multiple structured blocks within a single sections construct are scheduled in lexical order, and an individual block is assigned to the first thread that reaches it. A different thread may execute each section block, or a single thread may execute multiple section blocks. The order of execution of the structured blocks within a sections construct is not guaranteed.
single directive
A single block is assigned to the first thread in the team to reach the block; this thread may or may not be the master thread.
threadprivate directive
The threadprivate directive specifies that variables are replicated, with each thread having its own copy. If the dynamic threads mechanism is enabled, the definition and association status of a thread's copy of the variable is undefined, and the allocation status of an allocatable array is undefined.
thread_limit clause
The thread_limit clause places a limit on the number of threads that a teams construct may create. For non-host GPU accelerator targets, this clause controls the number of CUDA threads per thread block. Only constant integer expressions are supported. If HPE CCE does not support a thread_limit expression, it issues a warning message indicating the default value that will be used instead.
Library Routines
omp_get_max_active_levels()
The omp_get_max_active_levels() routine returns the maximum number of nested parallel levels currently allowed. There is a single max-active-levels-var internal control variable for the entire runtime system. Thus, a call to omp_get_max_active_levels() binds to all threads, regardless of which thread calls it.
omp_get_nested()
The deprecated omp_get_nested() routine returns whether nested parallelism is enabled or disabled, according to the value of the max-active-levels-var internal control variable. The default is false.
omp_set_dynamic()
The omp_set_dynamic() routine enables or disables dynamic adjustment of the number of threads available for the execution of subsequent parallel regions by setting the value of the dyn-var internal control variable. The default is on.
omp_set_max_active_levels()
Sets the max-active-levels-var internal control variable. The default is 64. If the argument is less than 1, max-active-levels-var is set to 1.
omp_set_nested()
The deprecated omp_set_nested() routine enables or disables nested parallelism by setting the max-active-levels-var internal control variable. The default is false.
omp_set_num_threads()
Sets the nthreads-var internal control variable to a positive integer. If the argument is less than 1, nthreads-var is set to 1.
omp_set_schedule()
Sets the schedule type as defined by the current specification. There are no implementation-defined schedule types.
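As a sketch of choosing the schedule used by schedule(runtime) from inside the program instead of through OMP_SCHEDULE, the function below sets a dynamic schedule and reads it back with omp_get_schedule(). The kind argument uses the omp_sched_kind constant and the omp_sched_dynamic value from omp_lib; the module and function names are hypothetical. The !$ sentinel keeps the OpenMP calls out of serial builds, where the function simply returns .true..

```fortran
! Hypothetical example: setting run-sched-var with omp_set_schedule().
module schedule_demo
  !$ use omp_lib
  implicit none
contains
  ! Sets a dynamic schedule with the given chunk size and reports
  ! whether omp_get_schedule() returns the same setting.
  function set_dynamic_schedule(chunk) result(ok)
    integer, intent(in) :: chunk
    logical :: ok
    !$ integer(kind=omp_sched_kind) :: kind_out
    !$ integer :: chunk_out
    ok = .true.
    !$ call omp_set_schedule(omp_sched_dynamic, chunk)
    !$ call omp_get_schedule(kind_out, chunk_out)
    !$ ok = (kind_out == omp_sched_dynamic) .and. (chunk_out == chunk)
  end function set_dynamic_schedule
end module schedule_demo
```

Any later loop marked schedule(runtime) in the same program then uses the dynamic schedule with that chunk size.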
Runtime Library Definitions
The OpenMP specification leaves it implementation-defined whether the include file omp_lib.h, the module omp_lib, or both are provided. It is also implementation-defined whether any of the OpenMP runtime library routines that take an argument are extended with a generic interface so that arguments of different KIND types can be accommodated. HPE Cray Fortran provides both omp_lib.h and the omp_lib module, and uses generic interfaces for routines. If an OpenMP runtime library routine is defined to be generic, use of arguments of a kind other than those specified by the OMP_*_KIND constants is undefined.
Environment Variables
CRAY_OMP_CHECK_AFFINITY
This environment variable is superseded by OMP_DISPLAY_AFFINITY. HPE recommends that users use OMP_DISPLAY_AFFINITY instead of this environment variable.
CRAY_OMP_CHECK_AFFINITY is a runtime environment variable. Set it to TRUE to display affinity binding for each OpenMP thread. The messages contain the hostname, process identifier, OS thread identifier, OpenMP thread identifier, and affinity binding.
OMP_DISPLAY_AFFINITY
This is a runtime environment variable. Set it to TRUE to display formatted affinity binding for each OpenMP thread. The default format includes the hostname, process identifier, OS thread identifier, OpenMP thread identifier, and affinity binding. The format can be changed using the OMP_AFFINITY_FORMAT environment variable, which is documented in the OpenMP 5.0 API Syntax Reference Guide.
OMP_DYNAMIC
The default value is true.
OMP_MAX_ACTIVE_LEVELS
The default value is 64.
OMP_NESTED
This environment variable is deprecated. Use OMP_MAX_ACTIVE_LEVELS
instead.
OMP_NUM_THREADS
If this environment variable is not set and you do not use the omp_set_num_threads() routine to set the number of OpenMP threads, the default is the maximum number of available CPUs on the system.
The maximum number of threads per compute node is 4 times the number of allocated processors. If the requested value of OMP_NUM_THREADS is more than the number of threads an implementation can support, the behavior of the program depends on the value of the OMP_DYNAMIC environment variable. If OMP_DYNAMIC is false, the program terminates. If OMP_DYNAMIC is true, the program uses up to 4 times the number of allocated processors.
OMP_PROC_BIND
When set to false, the OpenMP runtime does not attempt to set or change affinity binding for OpenMP threads. When not false, this environment variable controls the policy for binding threads to places. Valid values for this environment variable are true, false, auto, or a comma-separated list of spread, close, and master. A value of true is mapped to spread.
Care must be taken when combining OpenMP affinity binding with other binding mechanisms. For example, when launching an application with ALPS aprun, the -cc cpu affinity binding option (the default) should be used only with OMP_PROC_BIND=false or OMP_PROC_BIND=auto; otherwise, the ALPS/CLE binding will severely over-constrain OpenMP binding. When setting OMP_PROC_BIND to a value other than false or auto, launch applications with -cc depth or -cc none. Using -cc depth is particularly important when running multiple PEs per compute node, because it allows each PE to bind to CPUs in non-overlapping subsets of the node.
The default value for OMP_PROC_BIND is auto, an HPE-specific extension. The auto binding policy directs the OpenMP runtime library to select the affinity binding setting that it determines to be most appropriate for a given situation. If there is only a single place in the place-partition-var ICV, and that place corresponds to the initial affinity mask of the master thread, then the auto binding policy maps to false (that is, binding is disabled). Otherwise, the auto binding policy causes threads to bind in a manner that partitions the available places across OpenMP threads.
OMP_PLACES
This environment variable has no effect if OMP_PROC_BIND=false. When OMP_PROC_BIND is not false, OMP_PLACES defines a set of places, or CPU affinity masks, to which threads are bound. When using the threads, cores, and sockets keywords, places are constructed according to the CPU topology presented by Linux. However, the place list is always constrained by the initial affinity mask of the master thread. As a result, specific numeric CPU identifiers appearing in OMP_PLACES map onto CPUs in the initial CPU affinity mask. If an application is launched with -cc none, numeric CPU identifiers exactly match Linux CPU numbers. If instead it is launched with -cc depth, numeric CPU identifier 0 maps to the first CPU in the initial affinity mask for the master thread, identifier 1 maps to the second CPU in the initial mask, and so on. This allows the same OMP_PLACES setting to be used for all PEs, even when launching multiple PEs per node: the -cc depth setting ensures that each PE begins executing with a non-overlapping initial affinity mask, so each instance of the OpenMP runtime assigns thread affinity within those non-overlapping masks.
The default value of OMP_PLACES depends on the value of OMP_PROC_BIND. If OMP_PROC_BIND is auto, the default value for OMP_PLACES is cores. Otherwise, the default value of OMP_PLACES is threads.
OMP_SCHEDULE
The default value for this environment variable is static. For the schedule(runtime) clause, the schedule type and, optionally, the chunk size can be chosen at run time by setting the OMP_SCHEDULE environment variable.
OMP_STACKSIZE
The default value is 128 MB.
OMP_THREAD_LIMIT
Sets the maximum number of OpenMP threads to use for the entire OpenMP program by setting the thread-limit-var ICV. The HPE implementation defaults to 4 times the number of available processors.
OMP_WAIT_POLICY
Provides a hint to an OpenMP implementation about the desired behavior of waiting threads by setting the wait-policy-var ICV. Possible values are ACTIVE and PASSIVE, as defined by the OpenMP specification, and AUTO, an HPE-specific extension. The default value for this environment variable is AUTO, which directs the OpenMP runtime library to select the most appropriate wait policy for the situation. In general, the AUTO policy behaves like ACTIVE, unless the number of OpenMP threads or affinity binding results in oversubscription of the available hardware processors. If oversubscription is detected, the AUTO policy behaves like PASSIVE.
HPE-specific OpenMP API
This section describes OpenMP API specific to HPE.
cray_omp_set_wait_policy
Fortran:
subroutine cray_omp_set_wait_policy ( policy )
character(*), intent(in) :: policy
C:
void cray_omp_set_wait_policy( const char *policy );
This routine allows dynamic modification of the wait-policy-var ICV value, which corresponds to the OMP_WAIT_POLICY environment variable. The policy argument provides a hint to the OpenMP runtime library environment about the desired behavior of waiting threads; acceptable values are AUTO, ACTIVE, or PASSIVE (case insensitive). It is an error to call this routine in an active parallel region. The OpenMP runtime library supports a “wait policy” and a “contention policy,” both of which can be set with the following environment variables:
OMP_WAIT_POLICY=(AUTO|ACTIVE|PASSIVE)
CRAY_OMP_CONTENTION_POLICY=(Automatic|Standard|MonitorMwait)
These environment variables set the policies once, at program launch, for the entire execution. In some circumstances, however, it is useful for the programmer to change the policy explicitly at various points during a program's execution. This HPE-specific routine allows the programmer to change the wait policy (and potentially the contention policy) dynamically. A typical case is an application that needs OpenMP for the first part of its execution, after which there is a clear point where OpenMP is no longer used. The idle OpenMP threads still consume resources because they are waiting for more work, which degrades performance for the remainder of the application. A passive-waiting policy might eliminate that degradation once OpenMP is no longer needed, while the developer may still want an active-waiting policy for the OpenMP-intensive region of the application. This routine notifies all threads of the policy change at the same time, regardless of whether they are idle or active, to avoid deadlock from waiting and signaling threads using different policies.
omp_lib
If the omp_lib module is not used and the kind of the actual argument does not match the kind of the dummy argument, the behavior of the procedure is undefined.
omp_get_wtime, omp_get_wtick
These procedures return real(kind=8) values instead of double precision values.
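A sketch of timing a parallel loop with omp_get_wtime(), storing the result in real(kind=8) to match the return kind stated above. omp_get_wtick(), not used here, reports the timer resolution in the same kind. The module and function names are hypothetical; in a serial build the !$ lines are ignored and the function falls back to the standard system_clock intrinsic.

```fortran
! Hypothetical example: timing a loop with omp_get_wtime().
module timing_demo
  !$ use omp_lib
  implicit none
contains
  ! Fills a(:) in parallel and returns the elapsed wall-clock seconds.
  function time_fill(a) result(elapsed)
    real(kind=8), intent(out) :: a(:)
    real(kind=8) :: elapsed
    integer(kind=8) :: c0, c1, rate
    integer :: i
    !$ real(kind=8) :: t0
    call system_clock(c0, rate)
    !$ t0 = omp_get_wtime()
    !$omp parallel do
    do i = 1, size(a)
      a(i) = sqrt(real(i, kind=8))
    end do
    !$omp end parallel do
    call system_clock(c1)
    ! Serial fallback: convert clock ticks to seconds.
    elapsed = 0.0_8
    if (rate > 0) elapsed = real(c1 - c0, kind=8) / real(rate, kind=8)
    ! OpenMP build: overwrite with the omp_get_wtime() difference.
    !$ elapsed = omp_get_wtime() - t0
  end function time_fill
end module timing_demo
```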
Optimizations
A certain amount of overhead is associated with multiprocessing a loop. If the work occurring in the loop is small, the loop can actually run slower by multiprocessing than by single processing. To avoid this, make the amount of work inside the multiprocessed region as large as possible, as is shown in the following examples.
Consider the following code:
DO K = 1, N
   DO I = 1, N
      DO J = 1, N
         A(I,J) = A(I,J) + B(I,K) * C(K,J)
      END DO
   END DO
END DO
For the preceding code fragment, parallelize the J loop or the I loop. The K loop cannot be parallelized because different iterations of the K loop read and write the same values of A(I,J). Parallelize the outermost DO loop possible, because it encloses the most work; of the two parallelizable loops in this example, that is the I loop. Because neither parallelizable loop is the outermost one, use the technique called loop interchange: reorder the loops so that one of them becomes outermost.
Thus, loop interchange would produce the following code fragment:
!$OMP PARALLEL DO PRIVATE(I, J, K)
DO I = 1, N
   DO K = 1, N
      DO J = 1, N
         A(I,J) = A(I,J) + B(I,K) * C(K,J)
      END DO
   END DO
END DO
!$OMP END PARALLEL DO
Now the parallelizable loop encloses more work and shows better performance. In practice, relatively few loops can be reordered in this way. However, it does occasionally happen that several loops in a nest of loops are candidates for parallelization. In such a case, it is usually best to parallelize the outermost one.
Occasionally, the only loop available to be parallelized has a fairly small amount of work. It may be worthwhile to force certain loops to run without parallelism or to select between a parallel version and a serial version, on the basis of the length of the loop.
For example, the following loop is worth parallelizing only if N is sufficiently large. To overcome the parallel loop overhead, N needs to be around 1000, depending on the specific hardware and the context of the program. The optimized version uses an IF clause on the PARALLEL DO directive:
!$OMP PARALLEL DO IF (N .GE. 1000), PRIVATE(I)
DO I = 1, N
   A(I) = A(I) + X*B(I)
END DO
!$OMP END PARALLEL DO
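The two techniques above can be combined in one routine: the interchanged loop nest with I outermost, parallelized only when the matrices are large enough. This is a sketch; the module and subroutine names are hypothetical, and the threshold of 1000 is the rough figure quoted above rather than a tuned value.

```fortran
! Hypothetical example: loop interchange plus a conditional PARALLEL DO.
module interchange_demo
  implicit none
contains
  ! Computes a = a + matmul(b, c) for n-by-n matrices.
  subroutine add_product(a, b, c)
    real, intent(inout) :: a(:,:)
    real, intent(in) :: b(:,:), c(:,:)
    integer :: i, j, k, n
    n = size(a, 1)
    ! I is outermost, so each thread owns whole rows of a and no two
    ! threads update the same a(i,j). The IF clause keeps small
    ! problems serial to avoid parallel overhead.
    !$omp parallel do if (n >= 1000) private(j, k)
    do i = 1, n
      do k = 1, n
        do j = 1, n
          a(i,j) = a(i,j) + b(i,k) * c(k,j)
        end do
      end do
    end do
    !$omp end parallel do
  end subroutine add_product
end module interchange_demo
```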
OpenACC Use
HPE CCE supports full OpenACC 2.0 and partial OpenACC 2.6 for Fortran (OpenACC is not supported for C or C++). The following OpenACC 2.6 features are supported:
attach/detach behavior and clauses
default(present) clause
Implied present-or behavior for copy, copyin, copyout, and create data clauses
OpenACC directives are supported for offloading to NVIDIA GPUs, AMD GPUs, or the current CPU target. An appropriate accelerator target module must be loaded in order to use OpenACC directives.
OpenACC is a parallel programming model that facilitates the use of an accelerator device attached to a host CPU. The OpenACC API allows the programmer to supplement information available to the compilers in order to offload code from a host CPU to an attached accelerator device.
This release supports the OpenACC Application Programming Interface standard developed by PGI, Cray Inc., and NVIDIA, with support from CAPS enterprise. For further information, refer to http://www.openacc-standard.org.
For the most current information regarding the HPE implementation of OpenACC, see the intro_openacc(7)
man page. See the OpenACC.EXAMPLES(7)
man page for example OpenACC codes.
OpenACC Execution Model
The CPU host offloads compute-intensive regions to the accelerator device. The accelerator executes parallel regions, which contain work-sharing loops executed as kernels on the accelerator. The CPU host manages execution on the accelerator by allocating memory on the accelerator, initiating data transfer, sending code, passing arguments to the region, waiting for completion, transferring accelerator results back to the CPU host, and releasing memory.
The accelerator on the HPE system supports multiple levels of parallelism. The accelerator executes a kernel composed of parallel threads or vectors. Vectors (threads) are grouped into sets called workers. Threads in a set of workers are scheduled together and execute together. Workers are grouped into larger sets called gangs. One or more gangs may comprise a kernel. To summarize, a kernel is executed as a set of gangs of workers of vectors.
The compiler determines the number of gangs/workers/vectors based on the problem and then maps the vectors, workers, and gangs onto the accelerator architecture. Specifying the number of gangs, workers, or vectors is optional but may permit tuning to a particular target architecture. The way that the compiler maps a particular problem onto a constellation of gangs, workers, and vectors which are then mapped onto the accelerator architecture is implementation defined.
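As a sketch of requesting the gang and vector levels explicitly, the loop below uses a combined parallel loop construct. The vector_length(128) value is an illustrative tuning hint, not a recommendation; omitting it leaves the mapping to the compiler. The module and routine names are hypothetical. If OpenACC is disabled, the directive is a comment and the loop runs on the host.

```fortran
! Hypothetical example: an OpenACC loop with explicit gang and vector
! parallelism.
module acc_gang_demo
  implicit none
contains
  ! Computes y = a*x + y.
  subroutine saxpy(a, x, y)
    real, intent(in) :: a, x(:)
    real, intent(inout) :: y(:)
    integer :: i
    ! gang vector: spread iterations over gangs, then over the vector
    ! lanes within each gang. copyin(x) moves x to the device only;
    ! copy(y) moves y in at entry and back to the host at exit.
    !$acc parallel loop gang vector vector_length(128) copyin(x) copy(y)
    do i = 1, size(y)
      y(i) = a * x(i) + y(i)
    end do
  end subroutine saxpy
end module acc_gang_demo
```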
OpenACC terminology is situated in the context of the PGAS programming model. In the PGAS model, there may be one or more Processing Elements (PEs) per node. Each PE is multi-threaded and each thread can execute vector instructions. The PGAS thread concept is not the same as the OpenACC thread concept.
OpenACC Memory Model
The memory on the accelerator is separate from host memory. Accelerator device memory is not mapped onto the host’s virtual memory space. All data movement between host and accelerator memory is initiated by the host through the library functions that move data. Also, it is not assumed that the accelerator can access host memory, though it is supported by some devices. In this model, data movement between memories is managed by the compiler according to OpenACC directives. The programmer needs to be aware of device memory size, as well as memory bandwidth between host and device in order to effectively accelerate a region of code.
Current accelerators implement a weak memory model; they do not support memory coherence between operations executed by different execution units - an execution unit is a hardware abstraction which can execute one or more gangs. If an operation updates a memory location and another reads from the same location, or two operations store a value to the same location, the hardware may not guarantee repeatable results. Some potential errors of this type are prevented by the compiler, but it is possible to write an accelerator parallel region that produces inconsistent results. Memory coherence is guaranteed when memory operations referencing the same location are separated by an explicit barrier.
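Because host and device memories are separate, a structured data region can keep arrays on the device across several kernels so that each array is transferred only once. This sketch uses hypothetical names. The second kernel reads values the first kernel wrote; the two kernels follow each other in program order, which the default acc_model setting (auto_async_kernel) is described as preserving.

```fortran
! Hypothetical example: one data region shared by two kernels.
module acc_data_demo
  implicit none
contains
  ! Computes y(i) = 2*x(i) + 1 in two kernels that share device data.
  subroutine two_kernels(x, y)
    real, intent(in) :: x(:)
    real, intent(out) :: y(:)
    real, allocatable :: tmp(:)
    integer :: i
    allocate(tmp(size(x)))
    ! copyin: host to device at entry; copyout: device to host at exit;
    ! create: device-only scratch storage that is never transferred.
    !$acc data copyin(x) copyout(y) create(tmp)
    !$acc parallel loop
    do i = 1, size(x)
      tmp(i) = 2.0 * x(i)
    end do
    !$acc parallel loop
    do i = 1, size(x)
      y(i) = tmp(i) + 1.0
    end do
    !$acc end data
  end subroutine two_kernels
end module acc_data_demo
```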
Map the OpenACC Programming Model onto Accelerator Components
The compiler maps the OpenACC execution model (kernels, gangs, workers, vectors) onto the accelerator architecture as described in the following sections.
Stream Multiprocessors (SM) and Scalar Processor (SP) cores
The OpenACC execution model maps to the NVIDIA GPU hardware as follows (GPU terms are in parentheses): One or more OpenACC kernels may execute on a GPU. The compiler divides a kernel into one or more gangs (blocks) of vectors (threads). Several concurrent gangs (blocks) of threads may execute on one SM depending on several factors, including memory requirements, compiler optimizations, or user directives. A single block (gang) does not span SMs and remains on one SM until completion. When the SM encounters a block (gang), each gang (block) is further broken up into workers (warps), which are groups of threads that execute in parallel. Scheduling occurs at the granularity of the worker (warp). Individual threads within a warp start together and execute one common instruction at a time. If conditional branching occurs within a worker (warp), the warp serially executes each branch path taken, causing some threads to wait until the threads converge back to the same instruction. Data-dependent conditional code within a warp usually has a negative performance impact. Worker (warp) threads also fetch data from memory together, and when accessing global memory, the accesses of the threads within a warp are grouped to minimize transactions. Each thread in a worker (warp) is executed on a different SP core.
There may be up to 32 threads in a worker (warp) - a limit defined by the hardware.
See the intro_openacc(7)
man page for more detail on Partition Mapping.
Memory
There is a hierarchy of memory spaces used by OpenACC threads. Each thread has its own private local memory. Each gang of workers of threads has shared memory visible to all threads of the gang. All OpenACC threads running on a GPU have access to the same global memory. Global memory on the accelerator is accessible to the host CPU.
Mixed Model Support
OpenMP directives may appear only inside OpenACC data or host_data regions. OpenMP directives are not allowed inside any other OpenACC directives.
OpenACC directives may not appear inside OpenMP constructs. To nest OpenACC directives inside OpenMP constructs, place them in procedures that are called from the OpenMP construct and are not inlined.
Compile with OpenACC
The HPE CCE compiler recognizes OpenACC directives by default. Use either the ftn or cc command to compile.
The HPE CCE compiler does not produce CUDA code. It generates PTX (Parallel Thread Execution) instructions which are then translated into assembly.
Note the following interactions between directives and command line options.
-x
The -x option accepts one or more directives as arguments. Directives specified with the -x option are ignored during compilation. To ignore all directives, specify -x all. To ignore accelerator directives, specify -x acc.
-h [no]acc
-h noacc disables OpenACC directives.
-h acc_model=option[:option ...]
Explicitly controls the execution and memory model utilized by the accelerator support system. The option arguments identify the type of behavior desired. There are three option sets. Only one member of a set may be used at a time; however, all three sets may be used together.
Default:
auto_async_kernel:fast_addr:no_deep_copy
option Set 1:
auto_async_none
Execute kernels and updates synchronously, unless there is an async clause present on the kernels or update directive.
auto_async_kernel
(Default) Execute all kernels asynchronously ensuring program order is maintained.
auto_async_all
Execute all kernels and data transfers asynchronously, ensuring program order is maintained.
option Set 2:
no_fast_addr
Use default types for addressing.
fast_addr
(Default) Attempt to use 32-bit integers in all addressing to improve performance. Base addresses remain 64-bit. Performance improves because fewer registers and faster arithmetic can potentially be used for offset calculations. This optimization may result in incorrect behavior for codes that use any of the following within accelerator regions: very large arrays (offsets would require more than 32 bits), very large array lower bounds (maximum offset plus lower bound is greater than 32 bits), or bitfields and other bit operations.
option Set 3:
no_deep_copy
Do not look inside of an object type to transfer sub-objects. Allocatable members of derived type objects will not be allocated on the device.
deep_copy
Look inside of derived type objects and recreate the derived type on the accelerator recursively. A derived type object that contains an allocatable member will have memory allocated on the device for the member.
Module Support
To compile, ensure that the PrgEnv-cray module is loaded. Then load either the craype-accel-nvidia35 module for Kepler support or the craype-accel-nvidia60 module for Pascal support.
The craype-accel-host module supports compiling and running an OpenACC application on the host x86 processor. This provides source code portability between systems with and without an accelerator. The accelerator directives are automatically converted at compile time to equivalent OpenMP directives.
Use either the ftn
or cc
command to compile.
Debug
Use either Allinea DDT or Rogue Wave TotalView.
The following applies to all debuggers:
To enable debugging, compile with the -g option.
When compiling with the debug option (-g), HPE CCE may require additional memory from the accelerator heap, exceeding the 8MB default. In this case, there will be malloc failures during compilation. The environment variable CRAY_ACC_MALLOC_HEAPSIZE specifies the accelerator heap size in bytes. It may be necessary to increase the accelerator heap size to 32MB (33554432), 64MB (67108864), or greater by setting CRAY_ACC_MALLOC_HEAPSIZE accordingly.
Debug one rank/image/thread/PE per node.
HPE CCE does not generate CUDA code, but generates PTX code. Debuggers will not display CUDA intermediate code.
To enter an OpenACC region using a debugger, breakpoints may be set inside the OpenACC region. It is not possible to do a single step into the region from the code immediately prior to the start of an OpenACC directive.
OpenACC Directives
For information on the OpenACC directives, see the OpenACC 2.0 Specification available at http://www.openacc-standard.org.
For the most current information regarding the HPE implementation of OpenACC, see the intro_openacc(7)
man page. See the OpenACC.EXAMPLES(7)
man page for example OpenACC codes.
Runtime Routines
Runtime routines defined by the standard specification are supported unless otherwise noted in the intro_openacc(7)
man page.
Extended OpenACC Run Time Library Routines
Extended OpenACC run time library routines are HPE-specific low level routines that give object oriented programmers a mechanism for moving objects from the host CPU to the accelerator and copying memory between the host and the accelerator. These routines are implemented in C. See the intro_openacc(7)
man page.
HPE Cray Fortran provides a wrapper interface to the C routines using ISO C bindings. To use these routine bindings from Fortran, include the header file openacc_lib.h or use the openacc_lib module. Please see the example “Using_OPENACC_LIB” on the OpenACC.EXAMPLES(7)
man page.
Environment Variables
The following environment variables are defined by the API specification:
ACC_DEVICE_NUM
ACC_DEVICE_TYPE
The following environment variable is HPE specific:
CRAY_ACC_MALLOC_HEAPSIZE
Specifies the accelerator heap size in bytes. The accelerator heap size defaults to 8MB. When compiling with the debug option (-g), HPE CCE may require additional memory from the accelerator heap, exceeding the 8MB default. In this case, there will be
malloc
failures during compilation. It may be necessary to increase the accelerator heap size to 32MB (33554432), 64MB (67108864), or greater.
OpenACC Examples
See the OpenACC.EXAMPLES(7)
man page for example OpenACC codes.
Conformance Checks
The amount of error-checking of edit descriptors with input/output (I/O) list items during formatted READ and WRITE statements can be selected through a compiler driver option or through an environment variable.
By default, the compiler provides only limited error-checking.
Use the compiler driver options to choose the table to be used for the conformance check. The table is then part of the executable and no environment variable is required. The compiler driver options allow a choice of checking or no checking with a particular version of the Fortran standard for formatted READ and WRITE. See the following tables: RELAXED Compatibility Between Data Types and Data Edit Descriptors, STRICT77 Compatibility Between Data Types and Data Edit Descriptors, and STRICT90 and STRICT95 Compatibility Between Data Types and Data Edit Descriptors in Input/Output Editing.
The environment variable FORMAT_TYPE_CHECKING is evaluated during execution. The environment variable overrides a table chosen through the compiler driver option. The environment variable provides an intermediate type of checking that is not provided by the compiler driver option. The environment variable FORMAT_TYPE_CHECKING is described in Set Environment Variables to the HPE Cray Fortran Compiler.
To select the least amount of checking, use one or more of the following ftn command line options.
On Cray Linux Environment (CLE) systems with formatted READ, use:
ftn -Wl,--defsym,_RCHK=_RNOCHK *.f
(note the double dashes that precede defsym)
On CLE systems with formatted WRITE, use:
ftn -Wl,--defsym,_WCHK=_WNOCHK *.f
On CLE systems with both formatted READ and WRITE, use:
ftn -Wl,--defsym,_WCHK=_WNOCHK -Wl,--defsym,_RCHK=_RNOCHK *.f
To select a strict amount of checking for either FORTRAN 77 or Fortran 90, use one or more of the following ftn command line options. In each pair below, the first command selects FORTRAN 77 checking and the second selects Fortran 90 checking.
On CLE systems with formatted READ, use:
ftn -Wl,--defsym,_RCHK=_RCHK77 *.f
ftn -Wl,--defsym,_RCHK=_RCHK90 *.f
On CLE systems with formatted WRITE, use:
ftn -Wl,--defsym,_WCHK=_WCHK77 *.f
ftn -Wl,--defsym,_WCHK=_WCHK90 *.f
On CLE systems with both formatted READ and WRITE, use:
ftn -Wl,--defsym,_WCHK=_WCHK77 -Wl,--defsym,_RCHK=_RCHK77 *.f
ftn -Wl,--defsym,_WCHK=_WCHK90 -Wl,--defsym,_RCHK=_RCHK90 *.f
HPE Cray Fortran Language Extensions
The HPE Cray Fortran Compiler supports extended features beyond those specified by the current standard. Some of these extensions are widely implemented in other compilers and likely to become standard features in the future, while others are unique and specific to HPE Cray systems. The implementation of any extension may change in order to conform to future language standards.
The listings provided by the compiler identify language extensions when the -e n
command line option is specified.
128-Bit Precision
The Fortran compiler supports 128-bit floating-point and 256-bit complex predefined types using the x86-64 ABI definitions for type names and data layout. These types are sometimes referred to as “quad-precision”. In Fortran, use real(kind=16) and complex(kind=16) to declare variables of these types. In C and C++, use __float128 and __float128 complex.
Fortran and C forms of intrinsic math functions (for example, QSIN
, QCOS
, QTAN
, QSQRT
, sinq
, cosq
, tanq
) offer full support for quad-precision types. See the intro_quad_precision(3i)
man page for a complete list of intrinsic functions that support quad-precision.
The base type itself uses 128 bits of storage with a guaranteed minimum alignment on a 128-bit boundary. It is little-endian, has a 15-bit exponent, a 113-bit mantissa, and an exponent bias of 16383, and is compatible with the gcc implementation.
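A minimal sketch of quad-precision arithmetic with real(kind=16), as described above. Generic intrinsics such as atan and sin resolve to their quad-precision forms for kind=16 arguments. The module and function names are hypothetical, and kind=16 is the kind number this section gives for CCE; selected_real_kind(33) is a more portable way to request the same precision. The test checks the 113-bit significand and the exponent range implied by the 15-bit exponent.

```fortran
! Hypothetical example: quad-precision arithmetic with real(kind=16).
module quad_demo
  implicit none
  integer, parameter :: qp = 16   ! 128-bit real kind, as stated above
contains
  ! Returns pi to quad precision, computed as 4*atan(1).
  function quad_pi() result(p)
    real(kind=qp) :: p
    p = 4.0_qp * atan(1.0_qp)
  end function quad_pi
end module quad_demo
```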
Characters, Lexical Tokens, and Source Form
Characters Allowed in Names
Variables, named constants, program units, common blocks, procedures, arguments, constructs, derived types (types for structures), namelist groups, structure components, dummy arguments, and function results are among the elements in a program that have a name. As extensions, the Cray Fortran compiler permits the following characters in names:
alphanumeric_character  is  letter
                        or  digit
                        or  underscore
                        or  currency_symbol
currency_symbol         is  $
A name must begin with a letter and can consist of letters, digits, and underscores. The Cray Fortran compiler permits use of the dollar sign ($) in a name, but it cannot be the first character of a name.
Cray does not recommend using $ in user names because it can cause conflicts with the names of internal variables or library routines.
Switch Source Forms
The Cray Fortran compiler allows switching between fixed and free source forms within a source or include file by using the FIXED and FREE compiler directives.
Continuation Line Limit
The Cray Fortran compiler allows a statement to have an unlimited number of continuation lines in both fixed and free source form.
Statement and Line length
In free source form, the Cray Fortran compiler allows up to 10,000 characters per line, with a total statement length up to 1,000,000 characters.
D Lines in Fixed Source Form
The Cray Fortran compiler allows a D or d character to occur in column one in fixed source form. Typically, the compiler treats a line with a D or d character in column one as a comment line. When the -e d
command line option is in effect, however, the compiler replaces the D or d character with a blank and treats the rest of the line as a source statement. This can be used, for example, for debugging purposes if the rest of the line contains a PRINT statement.
This functionality is controlled through the -e d
and -d d
options on the compiler command line. For more information about these options, see the ftn(1)
man page.
Types
The Cray Fortran compiler supports the following additional data types. This preserves compatibility with other vendors' systems.
Cray pointer
Cray character pointer
Boolean (or typeless)
The Cray Fortran compiler also supports the TYPEALIAS statement as a means of creating alternate names for existing types and supports an expanded form of the ENUM statement.
Alternate Form of LOGICAL Constants
The Cray Fortran compiler no longer accepts .T. and .F. as alternate forms of .true. and .false., respectively.
Cray Pointer Type
The Cray POINTER statement declares one variable to be a Cray pointer (that is, to have the Cray pointer data type) and another variable to be its pointee. The value of the Cray pointer is the address of the pointee. This POINTER statement has the following format:
POINTER (pointer_name, pointee_name [(array_spec)]) [, (pointer_name, pointee_name [(array_spec)])] ...
pointer_name
Pointer to the corresponding pointee_name. pointer_name contains the address of pointee_name. Only a scalar variable can be declared type Cray pointer; constants, arrays, coarrays, statement functions, and external functions cannot.
pointee_name
Pointee of corresponding pointer_name. Must be a variable name, array declarator, or array name. The value of pointer_name is used as the address for any reference to pointee_name; therefore, pointee_name is not assigned storage. If pointee_name is an array declarator, it can be explicit-shape (with either constant or nonconstant bounds) or assumed-size.
array_spec
If present, this must be either an explicit_shape_spec_list (with either constant or nonconstant bounds) or an assumed_size_spec. A codimension used to indicate a coarray may not appear in array_spec.
Fortran pointers are declared as follows:
POINTER :: object-name-list
Cray Fortran pointers and Fortran standard pointers cannot be mixed.
Example:
POINTER(P,B),(Q,C)
This statement declares Cray pointer P and its pointee B, and Cray pointer Q and pointee C; the pointer’s current value is used as the address of the pointee whenever the pointee is referenced.
An array that is named as a pointee in a Cray POINTER statement is a pointee array. Its array declarator can appear in a separate type or DIMENSION statement or in the pointer list itself. In a subprogram, the dimension declarator can contain references to variables in a common block or to dummy arguments. As with nonconstant bound array arguments to subprograms, the size of each dimension is evaluated on entrance to the subprogram, not when the pointee is referenced. For example:
POINTER(IX, X(N,0:M))
In addition, pointees must not be deferred-shape or assumed-shape arrays. An assumed-size pointee array is not allowed in a main program unit.
Cray pointers can be used to access user-managed storage by dynamically associating variables and arrays with particular locations in a block of storage. Cray pointers are not convenient for manipulating linked lists because, for optimization purposes, the compiler assumes that no two pointers have the same value. Cray pointers can also be used to access absolute memory locations.
The range of a Cray pointer or Cray character pointer depends on the size of memory for the machine in use.
Restrictions on Cray pointers are as follows:
A Cray pointer variable should only be used to alias memory locations by using the LOC intrinsic.
A Cray pointer cannot be pointed to by another Cray or Fortran pointer; that is, a Cray pointer cannot also be a pointee or a target.
A Cray pointer cannot appear in a PARAMETER statement or in a type declaration statement that includes the PARAMETER attribute.
A Cray pointer variable cannot be declared to be of any other data type.
A Cray character pointer cannot appear in a DATA statement.
An array of Cray pointers is not allowed.
A Cray pointer cannot be a component of a structure.
Restrictions on Cray pointees are as follows:
A Cray pointee cannot appear in a SAVE, STATIC, DATA, EQUIVALENCE, COMMON, AUTOMATIC, or PARAMETER statement, or in a Fortran POINTER statement.
A Cray pointee cannot be a dummy argument; that is, it cannot appear in a FUNCTION, SUBROUTINE, or ENTRY statement.
A function value cannot be a Cray pointee.
A Cray pointee cannot be a structure component.
An equivalence object cannot be a Cray pointee.
Cray pointees can be of type character, but their Cray pointers are different from other Cray pointers; the two kinds cannot be mixed in the same expression.
The Cray pointer is a variable of type Cray pointer and can appear in a COMMON list or be a dummy argument in a subprogram.
The Cray pointee does not have an address until the value of the Cray pointer is defined; the pointee is stored starting at the location specified by the pointer. Any change in the value of a Cray pointer causes subsequent references to the corresponding pointee to refer to the new location.
Cray pointers can be assigned values in the following ways:
A Cray pointer can be set as an absolute address. For example:
Q = 0
Cray pointers can have integer expressions added to or subtracted from them and can be assigned to or from integer variables. For example:
P = Q + 100
However, Cray pointers are not integers. For example, assigning a Cray pointer to a real variable is not allowed.
The nonstandard LOC intrinsic function generates the address of a variable and can be used to define a Cray pointer, as follows:
P = LOC(X)
The following example uses Cray pointers in the ways just described:
SUBROUTINE SUB(N)
  INTEGER WORDS
  COMMON POOL(100000), WORDS(1000)
  INTEGER BLK(128), WORD64
  REAL A(1000), B(N), C(100000-N-1000)
  POINTER(PBLK,BLK), (IA,A), (IB,B), &
         (IC,C), (ADDRESS,WORD64)
  ADDRESS = LOC(WORDS) + 64*KIND(WORDS)
  PBLK = LOC(WORDS)
  IA = LOC(POOL)
  IB = IA + 1000*KIND(POOL)
  IC = IB + N*KIND(POOL)
END SUBROUTINE SUB
BLK is an array that is another name for the first 128 words of array WORDS. A is an array of length 1000; it is another name for the first 1000 elements of POOL. B follows A and is of length N. C follows B. A, B, and C are associated with POOL. WORD64 is the same as BLK(65) because BLK(1) is at the initial address of WORDS.
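For comparison, standard Fortran pointer association can express the same BLK and WORD64 views of WORDS in terms of array elements rather than address arithmetic. The following is a minimal sketch, not a replacement for the Cray pointer example above: it assumes WORDS can be given the TARGET attribute and moves it into a module, and the names pool_views and associate_views are hypothetical.

```fortran
! Hypothetical module: standard-pointer views of WORDS, for comparison
! with the Cray pointer example. Assumes WORDS may have the TARGET attribute.
module pool_views
  implicit none
  integer, target :: words(1000)
  integer, pointer :: blk(:) => null()
  integer, pointer :: word64 => null()
contains
  subroutine associate_views()
    ! BLK becomes another name for the first 128 elements of WORDS.
    blk => words(1:128)
    ! WORD64 becomes another name for WORDS(65), which is also BLK(65).
    word64 => words(65)
  end subroutine associate_views
end module pool_views
```

Because the association is written in array elements, it does not rely on KIND values matching storage sizes in bytes, as the LOC arithmetic in SUB does.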
If a pointee is of a noncharacter data type that is one machine word or longer, the address stored in a pointer is a word address. If the pointee is of type character or of a data type that is less than one word, the address is a byte address. The following example also uses Cray pointers:
PROGRAM TEST
  ! EXPLICIT-SHAPE POINTEES: ASSUMED-SIZE POINTEES ARE NOT ALLOWED
  ! IN A MAIN PROGRAM
  REAL X(10), Y(9), Z(9), A(10)
  POINTER (P_X,X)
  POINTER (P_Y,Y)
  POINTER (P_Z,Z)
  INTEGER*8 I,J
  ! USE LOC INTRINSIC TO SET POINTER MEMORY LOCATIONS
  ! *** RECOMMENDED USAGE, AS PORTABLE CRAY POINTERS ***
  P_X = LOC(A(1))
  P_Y = LOC(A(2))
  ! USE POINTER ARITHMETIC TO DEMONSTRATE COMPILER AND COMPILER
  ! FLAG DIFFERENCES
  ! *** USAGE NOT RECOMMENDED, HIGHLY NON-PORTABLE ***
  P_Z = P_X + 1
  I = P_Y
  J = P_Z
  IF ( I .EQ. J ) THEN
    PRINT *, 'NOT A BYTE-ADDRESSABLE MACHINE'
  ELSE
    PRINT *, 'BYTE-ADDRESSABLE MACHINE'
  ENDIF
END
On Cray systems, this prints the following:
BYTE-ADDRESSABLE MACHINE
Cray does not recommend the use of pointer arithmetic because it is not portable.
For purposes of optimization, the compiler assumes that the storage of a pointee is never overlaid on the storage of another variable; that is, it assumes that a pointee is not associated with another variable or array. This kind of association occurs when a Cray pointer has two pointees, or when two Cray pointers are given the same value. Although these practices are sometimes used deliberately (such as for equivalencing arrays), results can differ depending on whether optimization is turned on or off. The code developer is responsible for preventing such association. For example:
POINTER(P,B), (P,C)
REAL X, B, C
P = LOC(X)
B = 1.0
C = 2.0
PRINT *, B
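Because B and C have the same pointer, the assignment of 2.0 to C also gives B the value 2.0; without optimization, B prints as 2.0 even though it was assigned 1.0. Because the compiler assumes that no such association exists, an optimized build may instead print 1.0.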
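By contrast, when two standard Fortran pointers are associated with the same TARGET, the compiler must allow for that association, so the equivalent of the example above has a defined result. A minimal sketch; the module and function names are hypothetical.

```fortran
! Hypothetical module: two standard pointers associated with one target.
module alias_demo
  implicit none
contains
  function value_seen_through_b() result(seen)
    real :: seen
    real, target :: x
    real, pointer :: b, c
    b => x            ! B and C are both associated with X
    c => x
    b = 1.0
    c = 2.0           ! also changes the value seen through B
    seen = b          ! 2.0, because the compiler must allow for the aliasing
  end function value_seen_through_b
end module alias_demo
```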
As with a variable in common storage, a pointee, pointer, or argument to a LOC intrinsic function is stored in memory before a call to an external procedure and is read out of memory at its next reference. The variable is also stored before a RETURN or END statement of a subprogram.
Cray Character Pointer Type
If a pointee is declared as a character type, its Cray pointer is a Cray character pointer.
Restrictions for Cray pointers also apply to Cray character pointers. In addition, the following restrictions apply:
When included in an I/O statement iolist, a Cray character pointer is treated as an integer.
If the length of the pointee is explicitly declared (that is, not of an assumed length), any reference to that pointee uses the explicitly declared length.
If a pointee is declared with an assumed length (that is, as CHARACTER(*)), the length of the pointee comes from the associated Cray character pointer.
A Cray character pointer can be used in a relational operation only with another Cray character pointer. Such an operation applies only to the character address and bit offset; the length field is not used.
Boolean Type
A Boolean constant is a literal constant that occupies a single storage unit. There are no Boolean variables or arrays, and there is no Boolean type statement. Binary, octal, and hexadecimal constants are used to represent Boolean values. For more information about Boolean expressions, see Expressions and Assignment.
Alternate Form of ENUM Statement
An enumeration defines the name of a group of related values and the name of each value within the group. The Cray Fortran compiler allows the following additional form for enum_def (enumerations):
enum_def_stmt  is  ENUM, BIND(C) :: type_alias_name
               or  ENUM kind_selector :: type_alias_name
kind_selector specifies the integer kind of the enumerators. If it is not specified, the compiler uses the default integer kind.
type_alias_name is the name to assign to the group. This name is treated as a type alias name.
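The forms above extend the standard ENUM, BIND(C) construct, in which each enumerator without an explicit value is one greater than the previous enumerator. For reference, a minimal sketch of the standard form; the module and enumerator names are hypothetical. The standard form does not give the group a name, whereas the Cray form ENUM, BIND(C) :: type_alias_name additionally defines a type alias name for it.

```fortran
! Hypothetical names; standard ENUM, BIND(C) form for comparison with
! the Cray alternate forms. The kind of the enumerators is processor
! dependent (interoperable with a C enumeration type).
module color_enum
  implicit none
  enum, bind(c)
    enumerator :: red = 1, green, blue  ! green = 2, blue = 3
    enumerator :: black = 10
  end enum
end module color_enum
```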
TYPEALIAS Statement
A TYPEALIAS statement allows another name to be defined for an intrinsic data type or user-defined data type. Thus, the type alias and the type specification it aliases are interchangeable. Type aliases do not define a new type.
This is the form for type aliases:
TYPEALIAS forms:
type_alias_stmt  is  TYPEALIAS :: type_alias_list
type_alias       is  type_alias_name => type_spec
This example shows how a type alias can define another name for an intrinsic type, a user-defined type, and another type alias:
TYPEALIAS :: INTEGER_64 => INTEGER(KIND = 8), &
TYPE_ALIAS => TYPE(USER_DERIVED_TYPE), &
ALIAS_OF_TYPE_ALIAS => TYPE(TYPE_ALIAS)
INTEGER(KIND = 8) :: I
TYPE(INTEGER_64) :: X, Y
TYPE(TYPE_ALIAS) :: S
TYPE(ALIAS_OF_TYPE_ALIAS) :: T
A type alias and the data type it aliases can be used interchangeably. That is, explicit or implicit declarations that use a type alias have the same effect as if the aliased data type were used. For example, the above declarations of I, X, and Y are equivalent, as are the declarations of S and T.
If the type being aliased is a derived type, the type alias name can be used in place of the type name in a structure constructor for that type.
The following are allowed as the type_spec in a TYPEALIAS statement:
Any intrinsic type defined by the Cray Fortran compiler.
Any type alias in the same scoping unit.
Any derived type in the same scoping unit.
Data Object Declarations and Specifications
The Cray Fortran compiler accepts the following extensions to declarations. It allows arrays of rank up to 31, whereas the Fortran standard limits the rank to 15.
BOZ Constraints in DATA Statements
The Cray Fortran compiler permits a default real object to be initialized with a BOZ, typeless, or character (used as Hollerith) constant in a DATA statement. BOZ constants are formatted in binary, octal, or hexadecimal. No conversion of the BOZ value, typeless value, or character constant takes place.
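In standard Fortran, the same effect of storing a bit pattern in a default real without numeric conversion can be obtained with the TRANSFER intrinsic. A minimal sketch, assuming default REAL is 32-bit IEEE so that Z'3F800000' is the bit pattern of 1.0; the module and function names are hypothetical.

```fortran
! Hypothetical module. Assumes default REAL is 32-bit IEEE, so that
! Z'3F800000' is the bit pattern of 1.0 and Z'40000000' that of 2.0.
module boz_bits
  use, intrinsic :: iso_fortran_env, only: int32
  implicit none
contains
  function real_from_bits(bits) result(x)
    integer(int32), intent(in) :: bits
    real :: x
    ! TRANSFER copies the bits unchanged; no integer-to-real conversion occurs.
    x = transfer(bits, 0.0)
  end function real_from_bits
end module boz_bits
```

For example, real_from_bits(int(Z'3F800000', int32)) yields 1.0, whereas a numeric conversion of the same integer value would give approximately 1.065E9.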
The Cray Fortran compiler permits an integer object to be initialized with a BOZ, typeless, or character (used as Hollerith) constant in a type declaration statement. The Cray Fortran compiler also allows an integer object to be initialized with a typeless or character (used as Hollerith) constant in a DATA statement.
If the last item in the data_object_list is an array name, the value list can contain fewer values than the number of elements in the array. Any element that is not assigned a value is undefined.
The following alternate forms of BOZ constants are supported:
literal-constant               is  typeless-constant
typeless-constant              is  octal-typeless-constant
                               or  hexadecimal-typeless-constant
octal-typeless-constant        is  digit [digit]... B
                               or  "digit [digit]..."O
                               or  'digit [digit]...'O
hexadecimal-typeless-constant  is  X'hex-digit [hex-digit]...'
                               or  X"hex-digit [hex-digit]..."
                               or  'hex-digit [hex-digit]...'X
                               or  "hex-digit [hex-digit]..."X
AUTOMATIC Attribute and Statement
The Cray Fortran AUTOMATIC attribute specifies stack-based storage for a variable or array. Such variables and arrays are undefined upon entering and exiting the procedure. The following is the format for the AUTOMATIC specification:
type, AUTOMATIC [, attribute-list] [::] entity-list
automatic-stmt  is  AUTOMATIC [::] entity-list
entity-list
For entity-list, specify a variable name or an array declarator. If an entity-list item is an array, it must be declared with an explicit-shape-spec with constant bounds. If an entity-list item is a pointer, it must be declared with a deferred-shape-spec.
If an entity-list item has the same name as the function in which it is declared, the entity-list item must be scalar and of type integer, real, logical, complex, or double precision.
If the entity-list item is a pointer, the AUTOMATIC attribute applies to the pointer itself and not to any target that may become associated with the pointer.
Subject to the rules governing combinations of attributes, attribute-list can contain the following:
DIMENSION
TARGET
POINTER
VOLATILE
The following entities cannot have the AUTOMATIC attribute:
Pointers or arrays used as function results
Dummy arguments
Statement functions
Automatic array or character data objects
An entity-list item cannot have the following characteristics:
It cannot be defined in the scoping unit of a module.
It cannot be a common block item.
It cannot be specified more than once within the same scoping unit.
It cannot be initialized with a DATA statement or with a type declaration statement.
It cannot also have the SAVE or STATIC attribute.
It cannot be specified as a Cray pointee.
IMPLICIT Statement
Implicit Extensions
The Cray Fortran compiler accepts the IMPLICIT AUTOMATIC or IMPLICIT STATIC syntax. It is recommended that none of the IMPLICIT extensions be used in new code.
Storage Association of Data Objects
EQUIVALENCE Statement Extensions
The Cray Fortran compiler allows equivalencing of character data with noncharacter data, which the Fortran standard does not permit. This usage is not recommended, however, because alignment and padding differ across platforms, making the code less portable.
COMMON Statement Extensions
The Cray Fortran compiler treats named common blocks and blank common blocks identically, as follows:
Variables in blank common and variables in named common blocks can be initialized.
Named common blocks and blank common are always saved.
Named common blocks of the same name and blank common can be of different sizes in different scoping units.
Expressions and Assignment
Expressions
In Fortran, calculations are specified by writing expressions. Expressions look much like algebraic formulas in mathematics, particularly when the expressions involve calculations on numerical values.
Expressions often involve nonnumeric values, such as character strings, logical values, or structures; these also can be considered to be formulas that involve nonnumeric quantities rather than numeric ones.
Rules for Forming Expressions
The Cray Fortran compiler supports exclusive disjunct expressions, which are formed with the .XOR. operator.
Intrinsic and Defined Operations
Cray supports the following intrinsic operators as extensions: the less-than-or-greater-than operator (<> or .LG.), the .XOR. operator, and the bitwise masking forms of the logical operators.
The Cray Fortran less-than-or-greater-than intrinsic operation is represented by the <> operator or the .LG. keyword. It performs the less-than-or-greater-than comparison specified by the IEEE standard for floating-point arithmetic. Only values of type real can appear on either side of the <> or .LG. operators. If the operands do not have the same kind type parameter, the compiler converts them to the same kind.
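A minimal sketch of this comparison in standard Fortran, assuming that <> and .LG. are true exactly when the operands are ordered and unequal; the function lg is hypothetical. Unlike .NE., which is true when an operand is a NaN, this form is false for NaN operands. The sketch does not model any floating-point exception that <> may signal.

```fortran
! Hypothetical module: a standard-Fortran approximation of X .LG. Y.
module lg_demo
  implicit none
contains
  elemental logical function lg(x, y)
    real, intent(in) :: x, y
    ! True only when the operands are ordered and unequal. If either
    ! operand is a NaN, both comparisons are false, so the result is false.
    lg = (x < y) .or. (x > y)
  end function lg
end module lg_demo
```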
The Cray Fortran compiler no longer allows abbreviations for logical and masking operators. The abbreviations .A., .O., .N., and .X. are no longer synonyms for .AND., .OR., .NOT., and .XOR., respectively. This change does not affect user-defined operators and operator overloads; therefore, users can create user-defined operators to behave as shorthand operators.
The Boolean masking operators, which are extensions to Fortran, can be redefined as defined operators. If a masking operator is redefined, the definition overrides the intrinsic masking operator definition. See Bitwise Logical Expressions for a list of the operators.
Intrinsic Operations
In the following table, the symbols I, R, Z, C, L, B, and P stand for the types integer, real, complex, character, logical, Boolean, and Cray pointer, respectively. Where more than one type for x2 is given, the type of the result of the operation is given in the same relative position in the next column. Boolean and Cray pointer types are extensions of the Fortran standard.
| Intrinsic operator | Type of x1 | Type of x2 | Type of result |
|---|---|---|---|
| Unary +, - | | I, R, Z, B, P | I, R, Z, I, P |
| Binary +, -, *, /, ** | I | I, R, Z, B, P | I, R, Z, I, P |
| | R | I, R, Z, B | R, R, Z, R |
| | Z | I, R, Z | Z, Z, Z |
| | B | I, R, B, P | I, R, B, P |
| | P | I, B, P | P, P, P (for Cray pointers, only + and - are allowed) |
| // | C | C | C |
| .EQ., ==, .NE., /= | I | I, R, Z, B, P | L, L, L, L, L |
| | R | I, R, Z, B, P | L, L, L, L, L |
| | Z | I, R, Z, B, P | L, L, L, L, L |
| | B | I, R, Z, B, P | L, L, L, L, L |
| | P | I, R, Z, B, P | L, L, L, L, L |
| | C | C | L |
| .GT., >, .GE., >=, .LT., <, .LE., <= | I | I, R, B, P | L, L, L, L |
| | R | I, R, B | L, L, L |
| | C | C | L |
| | P | I, P | L, L |
| .LG., <> | R | R | L |
| .NOT. | | L | L |
| | | I, R, B | B |
| .AND., .OR., .EQV., .NEQV., .XOR. | L | L | L |
| | I, R, B | I, R, B | B |
A binary arithmetic operator (+, -, *, /, or **) can be followed immediately by a unary + or - operator that applies to the second operand, as in A * -B. This is an extension to the Fortran standard.
The operators .NOT., .AND., .OR., .EQV., .NEQV., and .XOR. can also be used in the Cray Fortran compiler’s bitwise masking expressions; these are extensions to the Fortran standard. The result is Boolean (or typeless) and has no kind type parameters.
Bitwise Logical Expressions
A bitwise logical expression (also called a masking expression) is an expression in which a logical operator operates on individual bits within integer, real, Cray pointer, or Boolean operands, giving a result of type Boolean. Each operand is treated as a single storage unit. The result is a single storage unit, which is either 32 or 64 bits depending on the -s option specified during compilation. Boolean values and bitwise logical expressions use the same operators but are different from logical values and expressions.
| Operator category | Intrinsic operator | Operand types |
|---|---|---|
| Bitwise masking (Boolean) expressions | .NOT., .AND., .OR., .XOR., .EQV., .NEQV. | Integer, real, typeless, or Cray pointer |
Bitwise logical operators can also be written as functions; for example A .AND. B can be written as IAND(A,B) and .NOT. A can be written as NOT(A).
| x1 (row) / x2 (column) | Integer | Real | Boolean | Pointer | Logical | Character |
|---|---|---|---|---|---|---|
| Integer | Masking operation, Boolean result | Masking operation, Boolean result | Masking operation, Boolean result | Masking operation, Boolean result | Not valid | Not valid** |
| Real | Masking operation, Boolean result | Masking operation, Boolean result | Masking operation, Boolean result | Masking operation, Boolean result | Not valid | Not valid** |
| Boolean | Masking operation, Boolean result | Masking operation, Boolean result | Masking operation, Boolean result | Masking operation, Boolean result | Not valid | Not valid** |
| Pointer | Masking operation, Boolean result | Masking operation, Boolean result | Masking operation, Boolean result | Masking operation, Boolean result | Not valid | Not valid** |
| Logical | Not valid** | Not valid** | Not valid** | Not valid** | Logical operation, logical result | Not valid** |
| Character | Not valid** | Not valid** | Not valid** | Not valid** | Not valid | Not valid** |
x1 and x2 represent operands for a logical or bitwise expression, using operators .NOT., .AND., .OR., .XOR., .NEQV., and .EQV..
** Indicates that if the operand is a character operand of 32 or fewer characters, the operand is treated as a Hollerith constant and is allowed.
Bitwise logical expressions can be combined with expressions of Boolean or other types by using arithmetic, relational, and logical operators. Evaluation of an arithmetic or relational operator processes a bitwise logical expression with no type conversion. Boolean data is never automatically converted to another type.
A bitwise logical expression performs the indicated logical operation separately on each bit. The interpretation of individual bits in bitwise multiplication-exprs, summation-exprs, and general expressions is the same as for logical expressions. The results of binary 1 and 0 correspond to the logical results TRUE and FALSE, respectively, in each of the bit positions. These values are summarized as follows:
.NOT. 1100 = 0011
1100 .AND. 1010 = 1000
1100 .OR. 1010 = 1110
1100 .XOR. 1010 = 0110
1100 .EQV. 1010 = 1001
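The bit patterns above can be reproduced with standard bit intrinsics. The text above gives IAND and NOT as the function forms of .AND. and .NOT.; using IOR for .OR., IEOR for .XOR., and NOT(IEOR(...)) for bitwise .EQV. is an assumption of this minimal sketch, chosen to match the table. The patterns are held in default integers, and results of NOT are masked to the low four bits because NOT inverts every bit of the storage unit. Module and function names are hypothetical.

```fortran
! Hypothetical module: standard bit intrinsics reproducing the 4-bit table.
module mask_demo
  implicit none
  ! Keep only the low four bits so results line up with the 4-bit table.
  integer, parameter :: low4 = int(b'1111')
contains
  function masked_results(a, b) result(r)
    integer, intent(in) :: a, b
    integer :: r(5)
    r(1) = iand(not(a), low4)          ! .NOT. a
    r(2) = iand(a, b)                  ! a .AND. b
    r(3) = ior(a, b)                   ! a .OR. b  (assumed equivalent)
    r(4) = ieor(a, b)                  ! a .XOR. b (assumed equivalent)
    r(5) = iand(not(ieor(a, b)), low4) ! a .EQV. b (assumed equivalent)
  end function masked_results
end module mask_demo
```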
Assignment
The Cray Fortran compiler supports Boolean and Cray pointer intrinsic assignments. The Cray Fortran compiler supports type Boolean or BOZ constants in assignment statements in which the variable is of type integer or real. The bits specified by the constant are moved into the variable with no type conversion.
Array Reference
The Cray Fortran compiler allows arrays to be referenced with fewer than the declared number of dimensions. The subscripts specified in the array reference are used for the leftmost dimensions, and the lower bounds are used for the rightmost subscripts that were omitted. This extension to the Fortran standard applies to both arrays and coarrays.
When the option to note deviations from the Fortran standard is in effect (-en), the compiler issues messages for this type of array reference.
Input/Output Statements
The Fortran standard does not specifically describe the implementation of I/O processing. This section provides information about processor-dependent areas and the implementation of the support for I/O.
File Connection
OPEN Statement
The OPEN statement specifies the connection properties between the file and the unit. The Values for Keyword Specifier Variables in an OPEN Statement table indicates the keyword specifiers in an OPEN statement that are Cray Fortran compiler extensions.
Values for Keyword Specifier Variables in an OPEN Statement
| Specifier | Possible Values | Default Value |
|---|---|---|
| FORM | SYSTEM | Unformatted with no record marks |
| CONVERT | LITTLE_ENDIAN, BIG_ENDIAN, CRAY, NATIVE | NATIVE |
The FORM specifier has the following format:
FORM= scalar-char-expr
A file opened with FORM="SYSTEM" is unformatted and has no record marks.
The CONVERT specifier converts unformatted data between big-endian and little-endian representations. It overrides any numeric conversion specified with the assign command or by a compilation option.
The CONVERT specifier has the following format:
CONVERT="format-specifier"
format-specifier describes the format of the file being opened and applies only to that file. It can be one of the following strings:
LITTLE_ENDIAN
Specifies little-endian integer data and IEEE floating-point data. It has no effect except to override any numeric conversion specified with the assign command or by a compilation option.
BIG_ENDIAN
Specifies big-endian integer data and IEEE floating-point data. This has the same effect as specifying -hbyteswapio at compilation, but it applies on a per-file basis. The assign -Nswap_endian f:filename command also converts the named file to BIG_ENDIAN format.
CRAY
Indicates big-endian integer data and Cray floating-point data of size REAL(8) or COMPLEX(8). It has the same effect as the assign -Ncray f:filename command.
NATIVE
Default. Same effect as "LITTLE_ENDIAN".
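On a little-endian system, the conversion that CONVERT="BIG_ENDIAN" performs on each 4-byte integer amounts to reversing its bytes. A minimal sketch of that reversal in standard Fortran; swap32 is a hypothetical helper for illustration and does not show how the runtime library actually performs the conversion.

```fortran
! Hypothetical module: byte reversal of a 4-byte integer, for illustration
! only; not the runtime library's implementation of CONVERT.
module endian_demo
  use, intrinsic :: iso_fortran_env, only: int8, int32
  implicit none
contains
  elemental function swap32(x) result(y)
    integer(int32), intent(in) :: x
    integer(int32) :: y
    integer(int8) :: bytes(4)
    ! View the 4-byte integer as 4 bytes, then reassemble in reverse order.
    bytes = transfer(x, 0_int8, size=4)
    y = transfer(bytes(4:1:-1), 0_int32)
  end function swap32
end module endian_demo
```

Applying swap32 twice returns the original value, which mirrors the fact that the same byte reversal converts data in either direction.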
Error, End-of-record, and End-of-file Conditions
End-of-file Condition and the END-specifier
Multiple End-of-file Records
The file position prior to data transfer depends on the method of access: sequential or direct. Although the Fortran standard does not allow files that contain an end-of-file to be positioned after the end-of-file prior to data transfer, the Cray Fortran compiler permits more than one end-of-file for some file structures.
Input/Output Editing
Data Edit Descriptors
Integer Editing
The Cray Fortran compiler allows w to be zero for the G edit descriptor, and it permits w to be omitted for the I, B, O, Z, or G edit descriptors.
The Cray Fortran compiler allows signed binary, octal, or hexadecimal values as input.
If the minimum digits (m) field is specified, the default field width is increased, if necessary, to allow for that minimum width.
Real Editing
The Cray Fortran compiler allows the use of B, O, and Z edit descriptors with REAL data items. The Cray Fortran compiler accepts the Dw.dEe edit descriptor.
The Cray Fortran compiler accepts the ZERO_WIDTH_PRECISION environment variable, which can be used to modify the default size of the width w field. This environment variable is examined only upon program startup. Changing the value of the environment variable during program execution has no effect. For more information about the ZERO_WIDTH_PRECISION environment variable, see ZERO_WIDTH_PRECISION.
The Cray Fortran compiler allows w to be zero or omitted for the D, E, EN, ES, or G edit descriptors.
The Cray Fortran compiler does not restrict the use of Ew.d and Dw.d to exponents less than or equal to 999, as the Fortran standard does; the standard requires the Ew.dEe form for larger exponents.
Default Fractional and Exponent Digits
| Data Size and Representation | w | d | e |
|---|---|---|---|
| 4-byte (32-bit) IEEE | 17 | 9 | 2 |
| 8-byte (64-bit) IEEE | 26 | 17 | 3 |
Logical Editing
The Cray Fortran compiler allows w to be zero or omitted on the L or G edit descriptors.
Character Editing
The Cray Fortran compiler allows w to be zero or omitted on the G edit descriptor.
Q Control Edit Descriptor
The Cray Fortran compiler supports the Q edit descriptor, which is used to determine the number of characters remaining in the input record. It has the following format:
Q
When a Q edit descriptor is encountered during execution of an input statement, the corresponding input list item must be of type integer. Interpretation of the Q edit descriptor causes the input list item to be defined with a value that represents the number of characters remaining to be read in the formatted record.
Specifically, if c is the character position within the current record of the next character to be read, and the record consists of n characters, then the item is defined with the following value:
MAX(n-c+1,0)
If no characters have yet been read, then the item is defined as n (the length of the record). If all the characters of the record have been read (c>n), then the item is defined as zero.
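The rule above can be checked by hand with a small model of the value a Q edit descriptor assigns, given the record length n and the position c of the next character to be read. The function q_value is hypothetical and is not part of the runtime library.

```fortran
! Hypothetical model of the value assigned by a Q edit descriptor.
module q_model
  implicit none
contains
  elemental integer function q_value(n, c)
    integer, intent(in) :: n  ! number of characters in the record
    integer, intent(in) :: c  ! position of the next character to be read
    ! Characters remaining in the record, never negative.
    q_value = max(n - c + 1, 0)
  end function q_value
end module q_model
```

For an 80-character record, q_value(80, 1) is 80 (nothing read yet) and q_value(80, 81) is 0 (record fully read).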
The Q edit descriptor must not be encountered during the execution of an output statement.
The following example code uses Q on input:
PROGRAM QDEMO
INTEGER N
CHARACTER LINE * 80
READ (*, FMT='(Q,A)') N, LINE(1:MIN(N,80))
PRINT *, LINE(1:MIN(N,80))
END PROGRAM QDEMO
List-directed Formatting
Input values are generally accepted as list-directed input if they are the same as those required for explicit formatting with an edit descriptor. The exceptions are as follows:
When the data list item is of type integer, the constant must be of a form suitable for the I edit descriptor. The Cray Fortran compiler permits binary, octal, and hexadecimal based values in a list-directed input record to correspond to I edit descriptors.
Namelist Formatting Extensions
The Cray Fortran compiler has extended the namelist feature. The following additional rules govern namelist processing:
An ampersand (&) or dollar sign ($) can precede the namelist group name or terminate namelist group input. If an ampersand precedes the namelist group name, either the slash (/) or the ampersand must terminate the namelist group input. If the dollar sign precedes the namelist group name, either the slash or the dollar sign must terminate the namelist group input.
Octal and hexadecimal constants are allowed as input to integer and single-precision real namelist group items. An error is generated if octal and hexadecimal constants are specified as input to character, complex, or double-precision real namelist group items.
Octal constants must be of the following form:
O"123"
O'123'
o"123"
o'123'
Hexadecimal constants must be of the following form:
Z"1a3"
Z'1a3'
z"1a3"
z'1a3'
I/O Editing
Usually, data is stored in memory as the values of variables in some binary form. On the other hand, formatted data records in a file consist of characters. Thus, when data is read from a formatted record, it must be converted from characters to the internal representation. When data is written to a formatted record, it must be converted from the internal representation into a string of characters.
The tables below list the control and data edit descriptor extensions supported by the Cray Fortran compiler and provide a brief description of each.
Summary of Control Edit Descriptors
| Descriptor | Description |
|---|---|
| $ or | Suppress carriage control |
Summary of Data Edit Descriptors
| Descriptor | Description |
|---|---|
| Q | Return number of characters left in record |
The following tables show the use of the Cray Fortran compiler’s edit descriptors with all intrinsic data types. In these tables:
NA indicates invalid usage that is not allowed.
I,O indicates that usage is allowed for both input and output.
I indicates legal usage for input only.
Default Compatibility Between I/O List Data Types and Data Edit Descriptors
| Data types | Q | Z | R | O | L | I | G | F | ES | EN | E | D | B | A |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Int | I | I,O | I,O | I,O | NA | I,O | I,O | NA | NA | NA | NA | NA | I,O | I,O |
| Real | NA | I,O | I,O | I,O | NA | NA | I,O | I,O | I,O | I,O | I,O | I,O | I,O | I,O |
| Comp | NA | I,O | I,O | I,O | NA | NA | I,O | I,O | I,O | I,O | I,O | I,O | I,O | I,O |
| Log | NA | I,O | I,O | I,O | I,O | NA | I,O | NA | NA | NA | NA | NA | I,O | I,O |
| Char | NA | NA | NA | NA | NA | NA | I,O | NA | NA | NA | NA | NA | NA | I,O |
The following table, RELAXED Compatibility Between Data Types and Data Edit Descriptors, shows the restrictions for the various data types that are allowed when the FORMAT_TYPE_CHECKING environment variable is set to RELAXED. Not all data edit descriptors support all data sizes; for example, a 16-byte real variable cannot be read or written with an I edit descriptor.
RELAXED Compatibility Between Data Types and Data Edit Descriptors
| Data types | Q | Z | R | O | L | I | G | F | ES | EN | E | D | B | A |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Int | I | I,O | I,O | I,O | I,O | I,O | I,O | I,O | I,O | I,O | I,O | NA | I,O | I,O |
| Real | NA | I,O | I,O | I,O | I,O | I,O | I,O | I,O | I,O | I,O | I,O | I,O | I,O | I,O |
| Comp | NA | I,O | I,O | I,O | NA | NA | I,O | I,O | I,O | I,O | I,O | I,O | I,O | I,O |
| Log | NA | I,O | I,O | I,O | I,O | I,O | I,O | I,O | I,O | I,O | I,O | NA | I,O | I,O |
| Char | NA | NA | NA | NA | NA | NA | I,O | NA | NA | NA | NA | NA | NA | I,O |
STRICT77 Compatibility Between Data Types and Data Edit Descriptors shows the restrictions for the various data types that are allowed when the FORMAT_TYPE_CHECKING environment variable is set to STRICT77.
STRICT77 Compatibility Between Data Types and Data Edit Descriptors
| Data types | Q | Z | R | O | L | I | G | F | ES | EN | E | D | B | A |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Int | NA | I,O | NA | I,O | NA | I,O | NA | NA | NA | NA | NA | NA | I,O | NA |
| Real | NA | NA | NA | NA | NA | NA | I,O | I,O | NA | NA | I,O | I,O | NA | NA |
| Comp | NA | NA | NA | NA | NA | NA | I,O | I,O | NA | NA | I,O | I,O | NA | NA |
| Log | NA | NA | NA | NA | I,O | NA | NA | NA | NA | NA | NA | NA | NA | NA |
| Char | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | I,O |
The STRICT90 or STRICT95 Compatibility Between Data Types and Data Edit Descriptors table below shows which data types are allowed with each data edit descriptor when the FORMAT_TYPE_CHECKING environment variable is set to STRICT90 or STRICT95.
STRICT90 or STRICT95 Compatibility Between Data Types and Data Edit Descriptors
Data types | Q | Z | R | O | L | I | G | F | ES | EN | E | D | B | A |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Int | NA | I,O | NA | I,O | NA | I,O | I,O | NA | NA | NA | NA | NA | I,O | NA |
Real | NA | NA | NA | NA | NA | NA | I,O | I,O | I,O | I,O | I,O | I,O | NA | NA |
Comp | NA | NA | NA | NA | NA | NA | I,O | I,O | I,O | I,O | I,O | I,O | NA | NA |
Log | NA | NA | NA | NA | I,O | NA | I,O | NA | NA | NA | NA | NA | NA | NA |
Char | NA | NA | NA | NA | NA | NA | I,O | NA | NA | NA | NA | NA | NA | I,O |
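Under each FORMAT_TYPE_CHECKING setting in the tables above, an integer list item is accepted by the I, B, O, and Z edit descriptors. The following is a minimal sketch (the module and function names are hypothetical) that formats one default integer with all four through an internal WRITE; the field widths assume values from 0 through 255.

```fortran
module int_edit_demo
  implicit none
contains
  ! Format n with the I, B, O, and Z edit descriptors. Every
  ! FORMAT_TYPE_CHECKING setting accepts these four for integer items.
  ! The field widths below assume 0 <= n <= 255.
  function formats(n) result(s)
    integer, intent(in) :: n
    character(len=40) :: s
    write(s, '(I4, 1X, B8.8, 1X, O3.3, 1X, Z2.2)') n, n, n, n
  end function formats
end module int_edit_demo
```

For example, formats(200) yields ' 200 11001000 310 C8'.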
Program Units
Main Program
Program Statement Extension
The HPE Cray Fortran compiler supports the use of a parenthesized list of args at the end of a program statement. The compiler ignores any args specified after program-name.
Block Data Program Units
Block Data Program Unit Extension
The HPE Cray Fortran compiler permits named common blocks to appear in more than one block data program unit.
Procedures
Procedure Interface
Interface Duplication
The HPE Cray Fortran compiler allows specification of an interface body for the program unit being compiled if the interface body matches the program unit definition.
Procedure Definition
Recursive Function Extension
The HPE Cray Fortran compiler allows direct recursion for functions that do not specify a RESULT clause on the FUNCTION statement.
Empty CONTAINS Sections
The HPE Cray Fortran compiler allows a CONTAINS statement with no internal or module procedure following.
Intrinsic Procedures and Modules
Intrinsic Procedures
The HPE Cray Fortran compiler implements intrinsic procedures in addition to those required by the standard. These procedures have the status of intrinsic procedures, but programs that use them may not be portable. It is recommended that such procedures be declared INTRINSIC so that other processors can diagnose whether or not they are intrinsic for those processors.
The nonstandard intrinsic procedures supported by the HPE Cray Fortran compiler are summarized in the following table. For more information about a particular procedure, see its man page.
Procedure | Description |
---|---|
ACOSD | Arccosine, value in degrees |
AMO_AADD | Atomic memory add |
AMO_AADDF | Atomic memory add, return new |
AMO_AFADD | Atomic memory add, return old |
AMO_AAX | Atomic memory AND and XOR |
AMO_AFAX | Atomic memory AND and XOR, return old |
AMO_AANDF | Atomic memory AND, return new |
AMO_AFAND | Atomic memory AND, return old |
AMO_ANANDF | Atomic memory NAND, return new |
AMO_AFNAND | Atomic memory NAND, return old |
AMO_AORF | Atomic memory OR, return new |
AMO_AFOR | Atomic memory OR, return old |
AMO_AXORF | Atomic memory XOR, return new |
AMO_AFXOR | Atomic memory XOR, return old |
AMO_ACSWAP | Atomic memory swap, return old |
AMO_ASWAP | Atomic memory swap, return new |
AMO_AFLUSH | Atomic memory flush forces |
ASIND | Arcsine, value in degrees |
ATAND | Arctangent, value in degrees |
ATAND2 | Arctangent, value in degrees |
CO_BCAST | Broadcast a coarray to all images in an application |
CO_SUM | Sum of corresponding elements on all images in a coarray application |
CO_MIN, CO_MAX | Minimum or maximum value of corresponding elements on all images in a coarray application |
COSD | Cosine, argument in degrees |
COT | Cotangent |
EXIT | Program termination |
FREE | Free Cray pointee memory |
GET_BORROW_S@ | Get scalar borrow bit |
GSYNC | Complete outstanding memory references |
IBCHNG | Reverse bit within a word |
ILEN | Length in bits of an integer |
INT_MULT_UPPER | Upper bits of integer product |
LOC | Address of argument |
MALLOC | Allocate Cray pointee memory |
MASK | Create a bit mask in a word |
SET_BORROW_S@ | Set scalar borrow bits |
SET_CARRY_S@ | Set scalar carry bits |
SIND | Sine, argument in degrees |
SIZEOF | Size of argument in bytes |
SUB_BORROW_S@ | Subtract scalar with borrow |
TAND | Tangent, argument in degrees |
CO_BCAST, CO_SUM, CO_MIN, and CO_MAX are collective intrinsic subroutines, which are extensions of the Fortran 2008 standard. Support for teams is deferred. For specific information about these routines, see the co_bcast(3i), co_max(3i), and co_sum(3i) man pages.
Many intrinsic procedures have both a vector and a scalar version. If a vector version of an intrinsic procedure exists and the intrinsic is called within a vectorizable loop, the compiler uses the vector version of the intrinsic. For information about which intrinsic procedures vectorize, see the intro_intrin(3i) man page.
For more information about the atomic memory intrinsic procedures, see the amo(3i) man page.
Exceptions and IEEE Arithmetic
The Exceptions
The intrinsic module IEEE_EXCEPTIONS supplied with the Cray Fortran compiler contains three named constants in addition to those specified by the standard. These are of type IEEE_STATUS_TYPE and can be used as arguments to the IEEE_SET_STATUS subroutine. Their definitions correspond to common combinations of settings and allow for simple and fast changes to the IEEE mode settings. The constants are:
Name | Effect of CALL IEEE_SET_STATUS (Name) |
---|---|
ieee_cri_nostop_mode | Clears all currently set exception flags; disables halting for all exceptions; enables setting of all exception flags; sets rounding mode to round_to_nearest |
ieee_cri_default_mode | Clears all currently set exception flags; enables halting for overflow, divide_by_zero, and invalid; disables halting for underflow and inexact; enables setting of all exception flags; sets rounding mode to round_to_nearest |
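These constants are passed to IEEE_SET_STATUS in the same way as a status value saved with IEEE_GET_STATUS. As a portable sketch (the module and procedure names are hypothetical, and the Cray-specific constants appear only in a comment because other compilers do not provide them), a procedure can save the caller's status, relax halting for one operation, and restore everything afterward:

```fortran
module ieee_region_demo
  use, intrinsic :: ieee_exceptions
  implicit none
contains
  ! Divide a by b with halting on divide-by-zero turned off, report whether
  ! the divide_by_zero flag was raised, then restore the caller's IEEE status.
  subroutine quiet_divide(a, b, q, raised)
    real, intent(in) :: a, b
    real, intent(out) :: q
    logical, intent(out) :: raised
    type(ieee_status_type) :: saved

    call ieee_get_status(saved)   ! save flags, halting modes, and rounding mode
    ! With CCE, CALL IEEE_SET_STATUS(ieee_cri_nostop_mode) would clear all flags
    ! and disable all halting in one call (and also set round_to_nearest);
    ! it is a Cray extension, so the portable calls below are used instead.
    if (ieee_support_halting(ieee_divide_by_zero)) &
      call ieee_set_halting_mode(ieee_divide_by_zero, .false.)
    call ieee_set_flag(ieee_divide_by_zero, .false.)
    q = a / b
    call ieee_get_flag(ieee_divide_by_zero, raised)
    call ieee_set_status(saved)   ! undo every change made above, including flags
  end subroutine quiet_divide
end module ieee_region_demo
```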
Compile and Execute Programs Containing Coarrays
The programming environment provides various commands, tools, and products for compiling and executing programs that contain coarrays.
ftn and aprun Options Affecting Coarrays
The compiler recognizes coarray syntax by default. The -h nocaf option disables coarray syntax recognition.
When an a.out file that has been compiled and linked with the -h caf option is executed, an image is created and executed on every processing element assigned to the job. Images 1 through NUM_IMAGES() are assigned consecutively to processing elements 0 through N$PES-1. The THIS_IMAGE() and NUM_IMAGES() functions return the image number of the current image and the total number of images at run time, respectively.
To set the number of processing elements assigned to a job at compile time, specify the -X option on the ftn command. The number of processing elements can also be set at run time by launching the a.out file with the aprun command and its -n option. If different -X values are used when compiling and linking different object files, or if the number of PEs specified at run time differs from the number specified when compiling and linking, a run-time error occurs.
Bounds checking is performed by specifying the -Rb option on the ftn command line. This feature is not implemented for codimensions of coarrays.
For more information about the ftn and aprun commands, see the ftn(1) and aprun(1) man pages.
Interoperate with Other Message Passing and Data Passing Models
Coarrays can interoperate with all other message and data passing models, which allows coarrays to be introduced incrementally into existing applications. However, mixing language-based PGAS with SHMEM is not officially supported, although it may work in some cases.
These models are implemented through procedure calls, so the language interaction between coarrays and these models is well defined.
MPI and SHMEM generally use processing element numbers, which start at zero, but the coarray model generally deals with image numbers, which start at one.
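Code that mixes coarrays with MPI or SHMEM therefore typically converts between the two numbering schemes. A minimal sketch, assuming the consecutive mapping described earlier (image i runs on PE i-1) and, as an additional assumption, that the MPI rank in MPI_COMM_WORLD equals the PE number; the module and function names are hypothetical:

```fortran
module image_pe_map
  implicit none
contains
  ! Hypothetical helpers, assuming image i runs on PE i-1.
  elemental integer function pe_of_image(image)
    integer, intent(in) :: image   ! coarray image number, 1 .. NUM_IMAGES()
    pe_of_image = image - 1
  end function pe_of_image

  elemental integer function image_of_pe(pe)
    integer, intent(in) :: pe      ! SHMEM PE number (or MPI rank), 0 .. N$PES-1
    image_of_pe = pe + 1
  end function image_of_pe
end module image_pe_map
```

In a coarray program, pe_of_image(this_image()) would then give the PE number of the current image under these assumptions.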
Coarrays are symmetric for the purposes of SHMEM programming. However, pointers in coarrays of derived type do not necessarily point to symmetric data.
For more information about the other message passing and data passing models, see the following man pages:
intro_mpi(3)
intro_shmem(3)
Optimize Programs with Coarrays
Programs containing coarrays benefit from all the usual steps taken to improve run time performance of code that runs on a single image.
HPE Cray Fortran Deferred Implementation and Optional Features
ISO_10646 Character Set
The Fortran 2003 features that support the ISO_10646 character set are not supported. This includes declarations, constants, and operations on variables of type character(kind=4), as well as the corresponding I/O operations. Support for this feature is optional in Fortran 2018.
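Code that must build both with CCE and with compilers that do provide this character kind can test for it with the standard SELECTED_CHAR_KIND intrinsic, which returns -1 for an unsupported character set. A minimal sketch; the module, constant, and function names are hypothetical:

```fortran
module charset_probe
  implicit none
  ! -1 when the compiler does not support the ISO 10646 character set,
  ! which is the expected value with CCE given the text above.
  integer, parameter :: ucs4 = selected_char_kind('ISO_10646')
contains
  logical function has_iso_10646()
    has_iso_10646 = ucs4 > 0
  end function has_iso_10646
end module charset_probe
```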
Restrictions on Unlimited Polymorphic Variables
Unlimited polymorphic variables whose dynamic types are integer(1), integer(2), logical(1), or logical(2) are not supported unless the -dh option is specified to disable packed storage for short integers and logicals.
HPE Cray Fortran Implementation Specifics
The Fortran standard specifies the rules for writing a standard conforming Fortran program. Many of the details of how such a program is compiled and executed are intentionally not specified or are explicitly specified as being processor-dependent. This chapter describes the implementation used by the HPE Cray Fortran compiler. Included are descriptions of the internal representations used for data objects and the values of processor-dependent language parameters.
Companion Processor
For the purpose of C interoperability, the Fortran standard refers to a companion processor. The companion processor for the HPE Cray Fortran compiler is the HPE Cray C compiler.
INCLUDE Line
There is no limit to the nesting level for INCLUDE lines. The character literal constant in an INCLUDE line is interpreted as the name of the file to be included. This name is case-sensitive and may be prefixed with a directory path specified with the -I compiler command line option.
INTEGER Kinds and Values
INTEGER kind type parameters of 1, 2, 4, and 8 are supported. The default kind type parameter is 4 unless the -sdefault64 or -sinteger64 command line option is specified, in which case the default kind type parameter is 8. The interpretation of kinds 1 and 2 depends on whether the -dh command line option is specified. Integer values are represented as two's complement binary values.
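Because the default integer kind changes with -sdefault64 and -sinteger64, code that needs a particular size can name it through the ISO_FORTRAN_ENV kind constants instead of relying on the default. A minimal sketch (the module, constant, and function names are hypothetical) that also reports the size of a default INTEGER at run time:

```fortran
module int_kind_probe
  use, intrinsic :: iso_fortran_env, only: int8, int16, int32, int64
  implicit none
  ! Named kinds for 8-, 16-, 32-, and 64-bit integers; with CCE these are
  ! expected to correspond to the kinds 1, 2, 4, and 8 listed above.
  integer, parameter :: int_kinds(4) = [int8, int16, int32, int64]
contains
  ! Bits in a default INTEGER: 32 by default, 64 when compiled with
  ! -sdefault64 or -sinteger64 (per the text above).
  integer function default_int_bits()
    default_int_bits = storage_size(0)
  end function default_int_bits
end module int_kind_probe
```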
REAL Kinds and Values
REAL kind type parameters of 4, 8, and 16 are supported. The default kind type parameter is 4 unless the -sdefault64 or -sreal64 command line option is specified, in which case the default kind type parameter is 8. Real values are represented in the format specified by the IEEE 754 standard, with kinds 4 and 8 corresponding to the 32-bit and 64-bit IEEE representations.
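The correspondence with the IEEE 754 formats can be checked through the number of significand bits reported by DIGITS: 24 for the 32-bit format and 53 for the 64-bit format. A minimal sketch (the module and procedure names are hypothetical):

```fortran
module real_kind_probe
  use, intrinsic :: iso_fortran_env, only: real32, real64
  implicit none
contains
  ! Significand bits (including the implicit leading bit) of the 32-bit
  ! and 64-bit reals: 24 and 53 for IEEE 754 binary32 and binary64.
  subroutine significand_bits(d32, d64)
    integer, intent(out) :: d32, d64
    d32 = digits(1.0_real32)
    d64 = digits(1.0_real64)
  end subroutine significand_bits
end module real_kind_probe
```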
DOUBLE PRECISION Kinds and Values
The DOUBLE PRECISION type is an alternate specification of a REAL type. The kind type parameter of that REAL type is twice the value of the kind type parameter for default REAL unless the -sdefault64 or -sreal64 command line option is specified, in which case the kind type parameters for DOUBLE PRECISION and default REAL are the same, and REAL constants with a D exponent are treated as if the D were an E. Note that if the -sdefault64 or -sreal64 option is specified, the compiler is not standard conforming.
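Because -sdefault64 and -sreal64 make DOUBLE PRECISION the same as default REAL, code that relies on the extra precision of DOUBLE PRECISION can detect that situation at run time. A minimal sketch (the module and function names are hypothetical):

```fortran
module dp_probe
  implicit none
contains
  ! .true. when DOUBLE PRECISION carries more decimal precision than default
  ! REAL, as the standard requires; per the text above this is .false. when
  ! the code is compiled with -sdefault64 or -sreal64.
  logical function dp_is_wider()
    dp_is_wider = precision(1.0d0) > precision(1.0)
  end function dp_is_wider
end module dp_probe
```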
LOGICAL Kinds and Values
LOGICAL kind type parameters of 1, 2, 4, and 8 are supported. The default kind type parameter is 4 unless the -sdefault64 or -sinteger64 command line option is specified, in which case the default kind type parameter is 8. The interpretation of kinds 1 and 2 depends on whether the -dh command line option is specified. Logical values are represented by a bit sequence in which the low-order bit is set to 1 for the value .true. and to 0 for .false., and the other bits in the representation are set to 0.
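The representation described above can be observed by reinterpreting a default LOGICAL as a default INTEGER of the same size with TRANSFER. This is a minimal sketch (the module and function names are hypothetical); the result of such a TRANSFER is processor dependent, so the values 1 and 0 are expected with CCE per the text above but are not guaranteed by the standard.

```fortran
module logical_bits
  implicit none
contains
  ! Reinterpret the bits of a default LOGICAL as a default INTEGER.
  ! Per the text above, CCE stores .true. as 1 and .false. as 0.
  integer function logical_as_int(l)
    logical, intent(in) :: l
    logical_as_int = transfer(l, 0)
  end function logical_as_int
end module logical_bits
```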
CHARACTER Kinds and Values
The CHARACTER kind type parameter of 1 is supported. The default kind type parameter is 1. Character values are represented using the 8-bit ASCII character encoding.
Cray Pointers
Cray pointers are 64-bit objects.
ENUM Kind
An enumerator that specifies the BIND(C) attribute creates values with a kind type parameter of 4.
Storage Issues
This section describes how the HPE Cray Fortran compiler uses storage, including how this compiler accommodates programs that use overindexing of blank common.
Storage Units and Sequences
The size of the numeric storage unit is 32 bits unless the -sdefault64 option is specified, in which case the numeric storage unit is 64 bits. If the -sreal64 or -sinteger64 option is specified alone, or if the -dp option is specified in addition to -sdefault64 or -sreal64, the relative sizes of the storage assigned for default intrinsic types do not conform to the standard. In this case, storage sequence associations involving variables declared with default intrinsic noncharacter types may be invalid and should be avoided.
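Whether the default types keep the standard storage-unit relationships can be checked at run time by comparing STORAGE_SIZE results with NUMERIC_STORAGE_SIZE from ISO_FORTRAN_ENV. A minimal sketch (the module and function names are hypothetical); a .false. result indicates that COMMON or EQUIVALENCE associations between default noncharacter types should be avoided, as described above.

```fortran
module storage_probe
  use, intrinsic :: iso_fortran_env, only: numeric_storage_size
  implicit none
contains
  ! .true. when default INTEGER, REAL, and LOGICAL each occupy one numeric
  ! storage unit and DOUBLE PRECISION occupies two, as the standard requires.
  logical function storage_is_standard()
    storage_is_standard = &
      storage_size(0)      == numeric_storage_size .and. &
      storage_size(0.0)    == numeric_storage_size .and. &
      storage_size(.true.) == numeric_storage_size .and. &
      storage_size(0.0d0)  == 2 * numeric_storage_size
  end function storage_is_standard
end module storage_probe
```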
Static and Stack Storage
The HPE Cray Fortran compiler allocates variables to storage according to the following criteria:
Variables in common blocks are always allocated in the order in which they appear in COMMON statements.
Data in modules are statically allocated.
User variables that are defined or referenced in a program unit, and that also appear in SAVE or DATA statements, are allocated to static storage, but not necessarily in the order shown in the source program.
Other referenced user variables are assigned to the stack. If -ev is specified on the HPE Cray Fortran compiler command line, referenced variables are allocated to static storage. This allocation does not necessarily depend on the order in which the variables appear in the source program.
Compiler-generated variables are assigned to a register or to memory (to the stack or heap), depending on how the variable is used. Compiler-generated variables include DO-loop trip counts, dummy argument addresses, temporaries used in expression evaluation, argument lists, and variables storing adjustable dimension bounds at entries.
Automatic objects may be allocated to either the stack or to the heap, depending on how much stack space is available when the objects are allocated.
Heap or stack allocation can be used for some compiler-generated temporary data such as automatic arrays and array temporaries.
Unsaved variables may be assigned to a register by optimization and not allocated storage.
Unreferenced user variables not appearing in COMMON statements are not allocated storage.
Dynamic Memory Allocation
Many FORTRAN 77 programs contain a memory allocation scheme that expands an array in a common block located in central memory at the end of the program. This practice of expanding a blank common block or a dynamic common block (sometimes referred to as overindexing) causes conflicts between user management of memory and the dynamic memory requirements of CLE libraries. It is recommended that such programs be modified so that they do not expand blank common blocks, particularly when migrating from other environments.
The image below shows the structure of a program under the CLE operating systems in relation to expanding a blank common block. In both figures, the user area includes code, data and common blocks.
Finalization
A finalizable object in a module is not finalized in the event that there is no longer any active procedure referencing the module.
A finalizable object that is allocated via pointer allocation is not finalized in the event that it later becomes unreachable due to all pointers to that object having their pointer association status changed.
ALLOCATE Error Status
If an error occurs during the execution of an ALLOCATE statement with a stat= specifier, subsequent items in the allocation list are not allocated.
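Because a failing ALLOCATE leaves the later items unallocated, a program that allocates several objects in one statement should test each one with ALLOCATED before using or deallocating it. A minimal sketch (the module and procedure names are hypothetical) using STAT= and ERRMSG=:

```fortran
module alloc_demo
  implicit none
contains
  ! Allocate two work arrays in one statement. If the statement fails,
  ! items after the failing one are left unallocated (per the text above),
  ! so each array is checked individually before cleaning up.
  subroutine alloc_pair(n, a, b, ok, msg)
    integer, intent(in) :: n
    real, allocatable, intent(out) :: a(:), b(:)
    logical, intent(out) :: ok
    character(len=*), intent(inout) :: msg   ! set only if an error occurs
    integer :: st

    allocate(a(n), b(n), stat=st, errmsg=msg)
    ok = st == 0
    if (.not. ok) then
      if (allocated(a)) deallocate(a)
      if (allocated(b)) deallocate(b)
    end if
  end subroutine alloc_pair
end module alloc_demo
```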
DEALLOCATE Error Status
If an error occurs during the execution of a DEALLOCATE statement with a stat= specifier, subsequent items in the deallocation list are not deallocated.
ALLOCATABLE Module Variable Status
An unsaved allocatable module variable remains allocated if it is allocated when the execution of an END or RETURN statement results in no active program unit having access to the module.
Kind of a Logical Expression
For an expression such as x1 op x2 where op is a logical intrinsic binary operator and the operands are of type logical with different kind type parameters, the kind type parameter of the result is the larger kind type parameter of the operands.
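For instance, when a LOGICAL(4) operand is combined with a LOGICAL(8) operand, the result has kind 8. The standard leaves this kind processor dependent, so the following minimal sketch (the module and function names are hypothetical) simply reports it:

```fortran
module logical_kind_demo
  implicit none
contains
  ! Kind of (a .and. b) for a LOGICAL(4) a and a LOGICAL(8) b.
  ! Per the text above, CCE gives the larger kind, 8.
  integer function mixed_and_kind()
    logical(4) :: a
    logical(8) :: b
    a = .true.
    b = .false.
    mixed_and_kind = kind(a .and. b)
  end function mixed_and_kind
end module logical_kind_demo
```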
STOP Code Availability
If a STOP code is specified in a STOP statement, its value is output to stderr when the STOP statement is executed.
When the stop code is a string of digits, only the least-significant 8 bits of its integer value are used as the process exit status; for example, STOP 300 produces exit status 44 (300 - 256). When the stop code is of type character or does not appear, the process exit status is zero.
Stream File Record Structure Position
A formatted file written with stream access may be later read as a record file. In that case, embedded newline characters (char(10)) indicate the end of a record and the terminating newline character is not considered part of the record.
The file storage unit for a formatted stream file is a byte. The position is the ordinal byte number in the file; the first byte is position 1. Positions corresponding to newline characters (char(10)) that were inserted by the I/O library as part of record output do not correspond to positions of user-written data.
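A minimal sketch of this behavior (the module, procedure, and file names are hypothetical): two records are written through a formatted stream connection, the file position after the first record is queried with INQUIRE, and the file is then reopened and read as an ordinary sequential file in which the newlines act as record terminators.

```fortran
module stream_demo
  implicit none
contains
  ! Write two records through a formatted stream connection, then read the
  ! same file back as an ordinary sequential (record) file.
  subroutine roundtrip(fname, first, second, pos_after_first)
    character(len=*), intent(in) :: fname
    character(len=16), intent(out) :: first, second
    integer, intent(out) :: pos_after_first
    integer :: u

    open(newunit=u, file=fname, access='stream', form='formatted', &
         status='replace', action='write')
    write(u, '(a)') 'abc'                  ! "abc" plus a record-ending newline
    inquire(unit=u, pos=pos_after_first)   ! next byte position; first byte is 1
    write(u, '(a)') 'defg'
    close(u)

    ! The embedded newlines now act as record terminators.
    open(newunit=u, file=fname, access='sequential', form='formatted', &
         status='old', action='read')
    read(u, '(a)') first
    read(u, '(a)') second
    close(u, status='delete')
  end subroutine roundtrip
end module stream_demo
```

Here 'abc' occupies positions 1 through 3 and the inserted newline occupies position 4, so the inquired position is 5.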
File Unit Numbers
The values of INPUT_UNIT, OUTPUT_UNIT, and ERROR_UNIT defined in the ISO_Fortran_env module are 100, 101, and 102, respectively. These three unit numbers are reserved and may not be used for other purposes. The files connected to these units are the same files used by the companion C processor for standard input (stdin), output (stdout), and error (stderr). An asterisk (*) specified as the unit in a READ statement refers to unit 100. An asterisk specified as the unit in a WRITE statement, as well as the unit used by PRINT statements, refers to unit 101. All positive default integer values are available for use as unit numbers.
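Because these unit numbers differ between compilers, portable code can refer to them through the ISO_FORTRAN_ENV constants rather than hard-coding 100, 101, or 102. A minimal sketch with a hypothetical WARN subroutine:

```fortran
module unit_demo
  use, intrinsic :: iso_fortran_env, only: input_unit, output_unit, error_unit
  implicit none
contains
  ! Write a diagnostic to standard error through the named constant rather
  ! than a hard-coded unit number (102 with CCE, per the text above).
  subroutine warn(msg)
    character(len=*), intent(in) :: msg
    write(error_unit, '(2a)') 'warning: ', msg
  end subroutine warn
end module unit_demo
```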
OPEN Specifiers
If the ACTION= specifier is omitted from an OPEN statement, the default value is determined by the protections associated with the file. If both reading and writing are permitted, the default value is READWRITE.
If the ENCODING= specifier is omitted or specified as DEFAULT in an OPEN statement for a formatted file, the encoding used is ASCII.
The case of the name specified in a FILE= specifier in an OPEN statement is significant.
If the FILE= specifier is omitted, the file name is formed by prepending fort. to the unit number (for example, fort.10 for unit 10).
If the RECL= specifier is omitted from an OPEN statement for a sequential access file, the default value for the maximum record length is 32767 (2**15-1).
If the file is connected for unformatted I/O, the length is measured in 8-bit bytes.
The FORM= specifier may also be SYSTEM for unformatted files.
If the ROUND= specifier is omitted from an OPEN statement, the default value is NEAREST. Specifying a value of PROCESSOR_DEFINED is equivalent to specifying NEAREST.
If the STATUS= specifier is omitted or specified as UNKNOWN in an OPEN statement, the specification is equivalent to OLD if the file exists; otherwise, it is equivalent to NEW. If STATUS="SCRATCH" is specified, the file is placed in the directory specified by the TMPDIR environment variable. If TMPDIR is not set, or the file cannot be created in the specified directory for some other reason, the file is placed in the /tmp directory. If /tmp does not exist or cannot be accessed, the program aborts.
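Because several of these OPEN defaults are processor dependent, code that moves between compilers can state them explicitly and confirm them with INQUIRE. A minimal sketch (the module, procedure, and file names are hypothetical; RECL=256 is an arbitrary illustrative value):

```fortran
module open_demo
  implicit none
contains
  ! Open a file with the processor-dependent defaults spelled out, write one
  ! record, and ask the run-time library which ACTION and FORM it used.
  subroutine open_explicit(fname, action_used, form_used)
    character(len=*), intent(in) :: fname
    character(len=16), intent(out) :: action_used, form_used
    integer :: u

    open(newunit=u, file=fname, status='replace', action='readwrite', &
         form='formatted', access='sequential', recl=256, round='nearest')
    write(u, '(a)') 'hello'
    inquire(unit=u, action=action_used, form=form_used)
    close(u, status='delete')
  end subroutine open_explicit
end module open_demo
```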
FLUSH Statement
Execution of a FLUSH statement causes memory resident buffers to be flushed to the physical file. Output to the unit specified by ERROR_UNIT in the ISO_Fortran_env module is never buffered; execution of FLUSH on that unit has no effect.
Asynchronous I/O
The ASYNCHRONOUS= specifier may be set to YES to allow asynchronous I/O for a unit or file.
Asynchronous I/O is used if the FFIO layer attached to the file provides asynchronous access.
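A minimal sketch of an asynchronous transfer (the module, procedure, and file names are hypothetical). The WRITE may return before the data are transferred, so the buffer must not be modified until the matching WAIT completes; whether any overlap actually occurs with CCE depends on the FFIO layer, as described above.

```fortran
module async_demo
  implicit none
contains
  ! Write a buffer with an asynchronous transfer, wait for it, and read it
  ! back the same way.
  subroutine async_roundtrip(fname, n, ok)
    character(len=*), intent(in) :: fname
    integer, intent(in) :: n
    logical, intent(out) :: ok
    real, allocatable, asynchronous :: outbuf(:), inbuf(:)
    integer :: u, i, id

    allocate(outbuf(n), inbuf(n))
    outbuf = [(real(i), i = 1, n)]
    open(newunit=u, file=fname, access='stream', form='unformatted', &
         status='replace', action='readwrite', asynchronous='yes')
    write(u, asynchronous='yes', id=id) outbuf
    ! Independent work could run here; outbuf must not be modified yet.
    wait(u, id=id)
    rewind(u)
    read(u, asynchronous='yes', id=id) inbuf
    wait(u, id=id)                ! inbuf is defined only after this WAIT
    close(u, status='delete')
    ok = all(inbuf == outbuf)
  end subroutine async_roundtrip
end module async_demo
```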
REAL I/O of an IEEE NaN
An IEEE NaN may be used as an I/O value for the F, E, D, or G edit descriptor or for list-directed or namelist I/O.
Input of an IEEE NaN
The form of NaN is an optional sign followed by the string ‘NAN’ optionally followed by a hexadecimal digit string enclosed in parentheses. The input is case insensitive. Some examples are:
NaN - quiet NaN
nAN() - quiet NaN
-nan(ffffffff) - quiet NaN
NAn(7f800001) - signalling NaN
NaN(ffc00001) - quiet NaN
NaN(ff800001) - signalling NaN
The internal value for the NaN becomes a quiet NaN if the hexadecimal string is not present or is not a valid NaN.
A ‘+’ or ‘-’ preceding the NaN on input is used as the high order bit of the corresponding READ input list item. An explicit sign overrides the sign bit from the hexadecimal string. The internal value becomes the hexadecimal string if it represents an IEEE NaN in the internal data type. Otherwise, the form of the internal value is undefined.
Output of an IEEE NaN
The form of an IEEE NaN for the F, E, D, or G edit descriptor or for list-directed or namelist output is:
If the field width w is absent, zero, or at least (5 + 1/4 of the size of the internal value in bits), the output consists of the string ‘NaN’ followed by the hexadecimal representation of the internal value within a set of parentheses. An example of the output field is:
NaN(7fc00000)
If the field width w is at least 3 but less than (5 + 1/4 of the size of the internal value in bits), the string ‘NaN’ will be right-justified in the field with blank fill on the left.
If the field width w is 1 or 2, the field is filled with asterisks.
The output field has no ‘+’ or ‘-’; the sign is contained in the hexadecimal string.
To get the same internal value for a NaN, write it with a list-directed write statement and read it with a list-directed read statement.
To write and then read the same NaN, the field width w in D, E, F, or G must be at least the number of hexadecimal digits of the internal datum plus 5.
REAL(4): w >= 13
REAL(8): w >= 21
REAL(16): w >= 37
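A minimal sketch of the list-directed round trip recommended above, using an internal file (the module and function names are hypothetical). With CCE the written text includes the hexadecimal payload, as in NaN(7fc00000), so the internal value is preserved; the sketch checks only that the result is still a NaN, which is all that can be relied on with other compilers.

```fortran
module nan_io_demo
  implicit none
contains
  ! Write x with a list-directed internal WRITE and read it back with a
  ! list-directed READ.
  real function list_roundtrip(x)
    real, intent(in) :: x
    character(len=64) :: buf
    write(buf, *) x
    read(buf, *) list_roundtrip
  end function list_roundtrip
end module nan_io_demo
```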
List-directed and NAMELIST Output Default Formats
The length of the output value in NAMELIST and list-directed output depends on the value being written. Blanks and unnecessary trailing zeroes are removed unless the -W option to the assign command is specified, which turns off this compression.
By default, full-precision printing is assumed unless a precision is specified by the LISTIO_PRECISION environment variable (for more information about the LISTIO_PRECISION environment variable, see LISTIO_PRECISION).
The form of list-directed and NAMELIST output can be changed by using the assign command with one of the following options.
Option | Effect |
---|---|
-S on | Suppress comma-delimited output; use blank spaces instead |
-W on | Disable compression of floating-point values |
-y on | Disable the repeat-count form; write as many copies of the value as needed |
-U on | Set all three of the above |
For example, consider this code:
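program namelist_demo
integer(4), dimension(2) :: ia
real(4), dimension(2) :: ra
! NAMELIST is a specification statement, so it precedes the assignments
NAMELIST/TNAMEL/ra,ia
ia = 102
ra = 200.10
write(6,TNAMEL)
print *, ' ia=',ia
print *, ' ra=',ra
print *, ia, ra
end program namelist_demo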
When compiled and executed with the default settings, it produces the following output:
&TNAMEL RA = 2*200.100006, IA = 2*102
ia = 2*102
ra = 2*200.100006
2*102, 2*200.100006
However, if the FILENV environment variable names an assign environment file and the assign -U command is used to change the output behavior, as shown below:
% setenv FILENV ASGTMP
% assign -U on g:sf
The same code now produces the following output:
&TNAMEL RA = 200.1000 200.1000 IA = 102
102 /
ia = 102 102
ra = 200.1000 200.1000
102 102 200.1000 200.1000
For more information about the assign command and the assign environment, see Enhanced I/O: Using the assign Environment.
Random Number Generator
A multiplicative congruential generator with period 2**46 is used to produce the output of the RANDOM_NUMBER intrinsic subroutine. The seed array contains one 64-bit integer value.
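Because the seed size differs between compilers, portable code queries it with RANDOM_SEED(SIZE=) before supplying a seed. A minimal sketch (the module and procedure names are hypothetical) that restarts the generator from a fixed seed so that a sequence can be reproduced:

```fortran
module rng_demo
  implicit none
contains
  ! Restart RANDOM_NUMBER from a fixed seed. The seed length is queried
  ! rather than assumed, because it differs between compilers.
  subroutine reseed(value)
    integer, intent(in) :: value
    integer :: n
    integer, allocatable :: seed(:)
    call random_seed(size=n)
    allocate(seed(n))
    seed = value
    call random_seed(put=seed)
  end subroutine reseed
end module rng_demo
```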
Timing Intrinsics
A call to the SYSTEM_CLOCK intrinsic subroutine with the COUNT argument present is translated into inline instructions that read the hardware clock register directly. See the descriptions of the -es and -ds command line options for information about the values returned for the count and count rate. For fine-grained timing, HPE recommends using a 64-bit COUNT argument.
The CPU_TIME subroutine obtains the value of its argument from the getrusage system call. Its execution time is significantly longer than that of SYSTEM_CLOCK, but the values returned are closer to those used by system accounting utilities.
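A minimal sketch (the module and procedure names are hypothetical) that times the same loop by wall clock with SYSTEM_CLOCK, using 64-bit arguments as recommended above, and by processor time with CPU_TIME:

```fortran
module timing_demo
  use, intrinsic :: iso_fortran_env, only: int64, real64
  implicit none
contains
  ! Time a simple loop by wall clock and by processor time. The loop
  ! result is returned so that the work cannot be optimized away.
  subroutine time_work(n, wall_s, cpu_s, checksum)
    integer, intent(in) :: n
    real(real64), intent(out) :: wall_s, cpu_s, checksum
    integer(int64) :: t0, t1, rate
    real(real64) :: c0, c1
    integer :: i

    call system_clock(t0, rate)
    call cpu_time(c0)
    checksum = 0.0_real64
    do i = 1, n
      checksum = checksum + sqrt(real(i, real64))
    end do
    call cpu_time(c1)
    call system_clock(t1)

    wall_s = 0.0_real64            ! stays 0 if no clock is available (rate == 0)
    if (rate > 0) wall_s = real(t1 - t0, real64) / real(rate, real64)
    cpu_s = c1 - c0
  end subroutine time_work
end module timing_demo
```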
IEEE Intrinsic Modules
The IEEE intrinsic modules IEEE_EXCEPTIONS, IEEE_ARITHMETIC, and IEEE_FEATURES are supplied. Denormal numbers are not supported on HPE Cray hardware. The IEEE_SUPPORT_DENORMAL inquiry function returns .false. for all kinds of arguments.
At the start of program execution, all floating point exception traps are disabled.
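A minimal sketch (the module and procedure names are hypothetical) that queries both properties at run time with the standard inquiry procedures; with CCE, per the text above, both results are expected to be .false. when called at the start of execution.

```fortran
module ieee_probe
  use, intrinsic :: ieee_arithmetic
  implicit none
contains
  ! Report whether default REAL supports denormal (subnormal) values and
  ! whether an overflow currently halts the program.
  subroutine ieee_report(denormals, halts_on_overflow)
    logical, intent(out) :: denormals, halts_on_overflow
    denormals = ieee_support_denormal(1.0)
    halts_on_overflow = .false.
    if (ieee_support_halting(ieee_overflow)) &
      call ieee_get_halting_mode(ieee_overflow, halts_on_overflow)
  end subroutine ieee_report
end module ieee_probe
```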
Enhanced I/O: Using the assign Environment
Fortran programs often need the ability to alter details of a file connection, such as device residency, an alternative file name, a file space allocation scheme or structure, or data conversion properties. These file connection details taken together comprise the assign environment, and they can be modified by using the assign command and the assign library interface.
The assign environment can also be accessed from C/C++ by using the ffassign library interface. For more information, see the assign(1), assign(3f), and ffassign(3c) man pages.
Understand the assign Environment
The assign command information is stored in the assign environment file, .assign, or in a shell environment variable. To begin using the assign environment to control a program’s I/O behavior, follow these steps.
1. Set the FILENV environment variable to the desired path:
   setenv FILENV environment-file
2. Run the assign command to define the current assign environment:
   assign arguments assign-object
   For example:
   assign -F cachea g:su
3. Run the program.
4. If the I/O performance observed during program execution is not satisfactory, return to step 2, use the assign command to adjust the assign environment, and try again.
The assign command passes information to Fortran OPEN statements and to the ffopen routine to identify the following elements:
Unit numbers
File names
File name patterns that have attributes associated with them
The assign object is the file name, file name pattern, unit number, or type of I/O open request to which the assign environment applies. When the unit or file is opened from Fortran, the environment defined by the assign command is used to establish the properties of the connection.
Assign Objects and Open Processing
The I/O library routines apply options to a file connection for all related assign objects.
If the assign object is a unit, the application of options to the unit occurs whenever that unit is connected.
If the assign object is a file name or pattern, the application of options to the file connection occurs whenever a matching file name is opened from a Fortran program.
When any of the library I/O routines opens a file, it uses the specified assign environment options for any assign objects that apply to the open request. Any of the following assign objects or categories can apply to a given open request.
Assign-object | Applies To |
---|---|
g:all | All open requests |
g:su | Open sequential unformatted |
g:du | Open direct unformatted |
g:sf | Open sequential formatted |
g:df | Open direct formatted |
g:ff | ffopen |
u:unit-number | Open unit-number |
p:pattern | When a file whose name matches pattern is opened. The assign environment can contain only one p:assign-object that matches the current open file. The exception is that a p:%pattern (which uses the % wildcard character) is silently ignored if a more specific pattern also matches the current file name being opened. |
f:filename | Whenever file filename is opened. |
Options from the assign objects in these categories are collected to create the complete set of options used for any particular open. The options are collected in the listed order, with options collected later in the list of assign objects overriding those collected earlier.
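A minimal sketch (the module, procedure, and file names are hypothetical) of an OPEN to which several assign objects apply. When this sequential unformatted OPEN of unit 9 runs under CCE, options are collected from g:all, then g:su, then u:9, and then from a matching p: pattern or f: file name, with later ones overriding earlier ones. The Fortran source does not change; only the assign environment does.

```fortran
module assign_objects_demo
  implicit none
contains
  ! Sequential unformatted OPEN of unit 9. Under CCE the assign objects
  ! g:all, g:su, u:9, and any matching p:pattern or f:filename apply, in
  ! that order. Compilers without an assign environment perform a plain OPEN.
  subroutine write_unit9(fname, values)
    character(len=*), intent(in) :: fname
    real, intent(in) :: values(:)
    open(unit=9, file=fname, form='unformatted', access='sequential', &
         status='replace', action='write')
    write(9) values
    close(9)
  end subroutine write_unit9
end module assign_objects_demo
```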
assign Command Syntax
Here is the syntax for the assign command:
assign -I -O -a actualfile -b bs -f fortstd -m setting -s ft -t -u bufcnt -y setting -B setting -C charcon -D fildes -F spec,specs -N numcon -R -S setting -T setting -U setting -V -W setting -Y setting -Z setting assign-object
The following specifications cannot be used with any other options:
assign -R assign-object
assign -V assign-object
A summary of the command options follows. For details, see the assign(1) and intro_ffio(3f) man pages.
Control options:
-I
Specifies an incremental use of assign. All attributes are added to the attributes already assigned to the current assign-object. This option and the -O option are mutually exclusive.
-O
Specifies a replacement use of assign. This is the default control option. All currently existing assign attributes for the current assign-object are replaced. This option and the -I option are mutually exclusive.
-R
Removes all assign attributes for assign-object. If assign-object is not specified, all currently assigned attributes for all assign-objects are removed.
-V
Views attributes for assign-object. If assign-object is not specified, all currently assigned attributes for all assign-objects are printed.
Attribute options:
-a actualfile
The file= specifier or the actual file name.
-b bs
Library buffer size in 4096-byte (512-word) blocks.
-f fortstd
Specifies the type of Fortran with which the file is to be compatible. Used by Fortran I/O. The valid values for fortstd are:
- `90` - Causes the Fortran file to be compatible with the current Cray Fortran compiler.
- `95` - Causes the Fortran file to be compatible with Cray Fortran 95. If this value is set, the list-directed and namelist output of a floating-point zero remains 0.E+0.
A file's compatibility is established when it is opened. By default, a Fortran file is compatible with the language from which an OPEN statement or implicit open caused the file to be connected.
-m setting
Special handling of a file that will be accessed concurrently by several processes or tasks. Special handling includes skipping the check that only one Fortran unit is connected to a file and ensuring that the file is not truncated to its true size by the I/O buffering routines. Enter either on or off for setting.
-s ft
File type. Enter text, cos, blocked, unblocked, u, sbin, or bin for ft. The default is text.
-t
Temporary file.
-u bufcnt
Buffer count. Specifies the number of buffers to be allocated for a file.
-y setting
Suppresses repeat counts in list-directed output. setting can be either on or off. The default setting is off.
-B setting
Activates or suppresses the passing of the O_DIRECT flag to the open(2) system call. Enter either on or off for setting. This is an important feature for I/O optimization; if this is on, it enables reads and writes directly to and from the user program buffer.
-C charcon
Character set conversion information. Enter ascii or ebcdic for charcon. If the -C option is specified, the -F option must also be specified.
-D fildes
Specifies a connection to a standard file. Enter stdin, stdout, or stderr for fildes.
-F spec,specs
Flexible file I/O (FFIO) specification. See the assign(1) man page for details about allowed values for spec and for details about hardware platform support. See the intro_ffio(3f) man page for details about specifying the FFIO layers.
-N numcon
Foreign numeric conversion specification. See the assign(1) man page for details about allowed values for numcon and for details about hardware platform support.
-S setting
Suppresses use of a comma as a separator in list-directed output. Enter either on or off for setting. The default setting is off.
-T setting
Activates or suppresses truncation after write for sequential Fortran files. Enter either on or off for setting.
-U setting
Produces a non-UNICOS form of list-directed output. This is a global setting that sets the value for the -y, -S, and -W options. Enter either on or off for setting. The default setting is off.
-W setting
Suppresses compressed width in list-directed output. Enter either on or off for setting. The default setting is off.
-Y setting
Skips unmatched namelist groups in a namelist input record. Enter either on or off for setting. The default setting is on.
-Z setting
Recognizes -0.0 for IEEE floating-point systems and writes the minus sign for edit-directed, list-directed, and namelist output. Enter either on or off for setting. The default setting is on.
assign-object
Specify either a file name or a unit number for assign-object. The assign command associates the attributes with the file or unit specified. These attributes are used during the processing of Fortran OPEN statements or during implicit file opens.
Use one of the following formats for assign-object:
- f:filename
- g:io-type, where io-type can be su, sf, du, df, or ff (sequential unformatted, sequential formatted, direct unformatted, direct formatted, or files opened with ffopen(3C); for example, g:ff for ffopen(3C))
- p:pattern (for example, p:file%)
- u:unit-number (for example, u:9)
- filename
When the p:pattern form is used, the % and _ wildcard characters can be used. The % matches any string of zero or more characters. The _ matches any single character. The % behaves like the * in shell file name matching, except that % also matches strings of characters that contain the / character.
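The wildcard rules can be illustrated with a short C sketch. The function name assign_pattern_match is hypothetical and is not part of the I/O library; it only models the documented behavior of % and _ for p:pattern objects, not the library's actual matching code.

```c
#include <stdbool.h>

/* Hypothetical model of p:pattern matching:
   '%' matches any run of zero or more characters, including '/';
   '_' matches exactly one character; every other character matches itself. */
static bool assign_pattern_match(const char *pattern, const char *name)
{
    if (*pattern == '\0')
        return *name == '\0';          /* both must end together */
    if (*pattern == '%') {
        /* Try every possible length for the run consumed by '%'. */
        for (const char *s = name; ; s++) {
            if (assign_pattern_match(pattern + 1, s))
                return true;
            if (*s == '\0')
                return false;          /* no suffix of name matched */
        }
    }
    if (*name == '\0')
        return false;                  /* pattern needs a character, name has none */
    if (*pattern == '_' || *pattern == *name)
        return assign_pattern_match(pattern + 1, name + 1);
    return false;
}
```

Under these rules, p:file% matches both file1 and file, and p:%.dat matches run/out.dat because % also spans the / character.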
Use the Library Routines
The assign, asnunit, asnfile, and asnrm routines can be called from a Fortran program to access and update the assign environment. The assign routine provides an easy interface to assign processing from a Fortran program. The asnunit and asnfile routines assign attributes to units and files, respectively. The asnrm routine removes all entries currently in the assign environment.
The calling sequences for library routines are as follows:
call assign (cmd, ier)
call asnunit (iunit,astring,ier)
call asnfile (fname,astring,ier)
call asnrm (ier)
Where:
cmd
Fortran character variable containing a complete assign command in the format acceptable to the pxfsystem routine.
ier
Integer variable that is assigned the exit status on return from the library interface routine.
iunit
Integer variable or constant that contains the unit number to which attributes are assigned.
astring
Fortran character variable that contains any attribute options and option values from the assign command. Control options -I, -O, and -R can also be passed.
fname
Character variable or constant that contains the file name to which attributes are assigned.
A status of 0 indicates normal return. A status greater than 0 indicates a specific error status. Use the explain command to determine the meaning of the error status.
The following calls are equivalent to the assign -s u f:file command:
call assign('assign -s u f:file',ier)
call asnfile('file','-s u',ier)
The following call is equivalent to executing the assign -I -n 2 u:99 command:
iun = 99
call asnunit(iun,'-I -n 2',ier)
The following call is equivalent to executing the assign -R command:
call asnrm(ier)
Tune File Connection Behavior
Use Alternative File Names
The -a option specifies the actual file name to which a connection is made. This option allows files to be created in different directories without changing the FILE= specifier on an OPEN statement.
For example, consider the following assign command issued to open unit 1:
assign -a /tmp/mydir/tmpfile u:1
The program then opens unit 1 with any of the following statements:
WRITE(1) variable ! implicit open
OPEN(1) ! unnamed open
OPEN(1,FORM='FORMATTED') ! unnamed open
Unit 1 is connected to file /tmp/mydir/tmpfile. Without the -a attribute, unit 1 would be connected to file fort.1.
When the -a attribute is associated with a file, any Fortran open that is set to connect to the file causes a connection to the actual file name. An assign command of the following form causes a connection to file $FILENV/joe:
assign -a $FILENV/joe ftfile
This is true when the following statement is executed in a program:
OPEN(IUN,FILE='ftfile')
If the following assign command is issued and in effect, any Fortran INQUIRE statement whose FILE= specification is foo refers to the file named actual instead of the file named foo for purposes of the EXIST=, OPENED=, or UNIT= specifiers:
assign -a actual f:foo
If the following assign command is issued and in effect, the -a attribute does not affect INQUIRE statements with a UNIT= specifier:
assign -a actual ftfile
After the following OPEN statement is executed, INQUIRE(UNIT=n,NAME=fname) returns a value of ftfile in fname, as if no assign had occurred:
OPEN(n,file='ftfile')
The I/O library routines use only the actual file (-a) attributes from the assign environment when processing an INQUIRE statement. During an INQUIRE statement that contains a FILE= specifier, the I/O library searches the assign environment for a reference to the file name that the FILE= specifier supplies. If an assign-by-filename exists for the file name, the I/O library determines whether an actual name from the -a option is associated with the file name. If the assign-by-filename supplied an actual name, the I/O library uses that name to return values for the EXIST=, OPENED=, and UNIT= specifiers; otherwise, it uses the file name. The name returned for the NAME= specifier is the file name supplied in the FILE= specifier. The actual file name is not returned.
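The INQUIRE lookup described above can be modeled in C as follows. The struct assign_entry and the function inquire_lookup_name are hypothetical illustrations of the documented rules, not the I/O library's internal data structures, and only assign-by-filename entries are modeled.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one assign-by-filename entry, e.g. assign -a actual f:foo. */
struct assign_entry {
    const char *file_name;   /* name given in f:filename */
    const char *actual_name; /* value of -a, or NULL if no -a was given */
};

/* Returns the name used for the EXIST=, OPENED=, and UNIT= specifiers when
   INQUIRE is given FILE=file_name. NAME= is not modeled here: it always
   returns file_name itself. */
const char *inquire_lookup_name(const struct assign_entry *env, size_t n,
                                const char *file_name)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(env[i].file_name, file_name) == 0)
            /* An assign-by-filename exists: use its -a name if it has one. */
            return env[i].actual_name != NULL ? env[i].actual_name : file_name;
    }
    return file_name;  /* no assign-by-filename: use the name as given */
}
```

For an environment containing assign -a actual f:foo, an INQUIRE with FILE='foo' uses actual for EXIST=, OPENED=, and UNIT=, while NAME= still returns foo.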
Specify File Structure
A file structure defines the way records are delimited and how the end-of-file is represented. The assign command supports two mutually exclusive file structure options:
To select a structure using an FFIO layer, use assign -F
To select a structure explicitly, use assign -s
Using FFIO layers is more flexible than selecting structures explicitly. FFIO allows nested file structures, buffer size specifications, and support for file structures not available through the -s option. Better I/O performance is realized by using the -F option and FFIO layers.
The remainder of this section covers the -s option.
Fortran sequential unformatted I/O uses four different file structures: f77 blocked structure, text structure, unblocked structure, and COS blocked structure. By default, the f77 blocked structure is used unless a file structure is selected at open time. If an alternative file structure is needed, select it by using either the -s or the -F option (but not both) on the assign command. The following examples show how to use different assign command options to select different file structures.
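Structure | assign command |
---|---|
F77 blocked | (default for sequential unformatted files) |
text | assign -s text |
unblocked | assign -s unblocked |
COS blocked | assign -s cos |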
The following examples show how to adjust blocking:
To select an unblocked file structure for a sequential unformatted file:
IUN = 1
CALL ASNUNIT(IUN,'-s unblocked',IER)
OPEN(IUN,FORM='UNFORMATTED',ACCESS='SEQUENTIAL')
The assign -s u command can also be used to specify the unblocked file structure for a sequential unformatted file. When this option is selected, I/O is unbuffered: each Fortran READ or WRITE statement results in a read or write system call. For example:
CALL ASNFILE('fort.1','-s u',IER)
OPEN(1,FORM='UNFORMATTED',ACCESS='SEQUENTIAL')
To assign unit 10 a COS blocked structure:
assign -s cos u:10
The full set of options allowed with the assign -s command is as follows:
bin (not recommended)
blocked
cos
sbin
text
unblocked
Access and Form | assign -s ft Defaults | assign -s ft Options |
---|---|---|
Sequential unformatted, BUFFER IN and BUFFER OUT | blocked / cos / f77 | |
Direct unformatted | unblocked | |
Sequential formatted | text | |
Direct formatted | text | |
Unblocked File Structure
A file with an unblocked file structure contains undelimited records. Because it does not contain any record control words, it does not have record boundaries. The unblocked file structure can be specified for a file opened with either unformatted sequential access or unformatted direct access. It is the default file structure for a file opened as an unformatted direct-access file.
Do not attempt to use a BACKSPACE statement to reposition a file with an unblocked file structure. Since record boundaries do not exist, the file cannot be repositioned to a previous record.
BUFFER IN and BUFFER OUT statements can specify a file having an unbuffered and unblocked file structure. If the file is specified with assign -s u, BUFFER IN and BUFFER OUT statements can perform asynchronous unformatted I/O.
There are several ways to use the assign command to specify an unblocked file structure. All of them result in a similar file structure but differ in library buffering style, use of truncation on a file, alignment of data, and recognition of an end-of-file record in the file. The following unblocked data file structure specifications are available:
Specification | Buffering |
---|---|
assign -s unblocked | Library-buffered |
assign -s u | No library buffering |
assign -s sbin | Buffering that is compatible with standard I/O; for example, both library and system buffering |
The type of file processing for an unblocked data file structure depends on the assign -s ft option that is declared or assumed for a Fortran file.
For more information about buffering, see Specify Buffer Behavior.
An I/O request for a file specified using the assign -s unblocked command does not need to be a multiple of a specific number of bytes. Such a file is truncated after the last record is written to the file. Padding occurs for files specified with the assign -s bin command and the assign -s unblocked command; it usually occurs when noncharacter variables follow character variables in an unformatted direct-access file.
No padding is done in an unformatted sequential-access file. An unformatted direct-access file created by a Fortran program on CLE systems contains records that are all the same length. The end-of-file record is recognized in sequential-access files.
assign -s sbin File Processing
Use an assign -s sbin specification for a Fortran file opened with either unformatted direct access or unformatted sequential access. The file does not contain record delimiters. The file created for assign -s sbin in this instance has an unblocked data file structure and uses unblocked file processing.
The assign -s sbin option can be specified for a Fortran file that is declared as formatted sequential access. Because the file contains records that are delimited with the newline character, it is not an unblocked data file structure; it is the same as a text file structure.
The assign -s sbin option is compatible with the standard C I/O functions.
HPE discourages the use of assign -s sbin because it typically yields poor I/O performance. If an FFIO layer cannot be used, using assign -s text for formatted files and assign -s unblocked for unformatted files usually produces better I/O performance than using assign -s sbin.
assign -s bin File Processing
An I/O request for a file that is specified with assign -s bin does not need to be a multiple of a specific number of bytes. Padding occurs when noncharacter variables follow character variables in an unformatted record.
The I/O library uses an internal buffer for the records. If opened for sequential access, a file is not truncated after each record is written to the file.
assign -s u File Processing
The assign -s u command specifies undefined or unknown file processing. An assign -s u specification can be used for a Fortran file declared as unformatted sequential or direct access. Because the file does not contain record delimiters, it has an unblocked data file structure. Both synchronous and asynchronous BUFFER IN and BUFFER OUT processing can be used with u file processing.
Fortran sequential files declared by using assign -s u are not truncated after the last word written. To truncate such a file, execute an explicit ENDFILE statement on it.
text File Structure
The text file structure consists of a stream of 8-bit ASCII characters. Every record in a text file is terminated by a newline character (\n, ASCII 012 octal). Some utilities may omit the newline character on the last record, but the Fortran library treats such an occurrence as a malformed record. This file structure may be specified for a file that is declared as either formatted sequential access or formatted direct access. It is the default file structure for formatted sequential access and formatted direct access files.
The assign -s text command specifies the library-buffered text file structure. Both library and system buffering are done for all text file structures.
An I/O request for a file using assign -s text does not need to be a multiple of a specific number of bytes.
BUFFER IN and BUFFER OUT statements cannot be used with this structure. Use a BACKSPACE statement to reposition a file with this structure.
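A minimal C sketch of how text records are delimited follows. count_text_records is a hypothetical helper, not a library routine; it counts newline-terminated records and flags a final record that lacks its newline, which the Fortran library treats as malformed.

```c
#include <stddef.h>

/* Counts newline-terminated records in buf[0..len). Sets *malformed to 1
   when bytes remain after the last newline (an unterminated final record),
   otherwise to 0. */
size_t count_text_records(const char *buf, size_t len, int *malformed)
{
    size_t records = 0;
    for (size_t i = 0; i < len; i++)
        if (buf[i] == '\n')   /* newline (012 octal) ends a record */
            records++;
    /* A non-empty buffer that does not end in a newline has a malformed last record. */
    *malformed = (len > 0 && buf[len - 1] != '\n');
    return records;
}
```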
cos or blocked File Structure
The cos or blocked file structure uses control words to mark the beginning of each sector and to delimit each record. Specify this file structure for a file that is declared as unformatted sequential access. Synchronous BUFFER IN and BUFFER OUT statements can create and access files with this file structure.
Specify this file structure with one of the following assign commands:
assign -s cos
assign -s blocked
assign -F cos
assign -F blocked
These four assign commands result in the same file structure.
An I/O request on a blocked file is library buffered.
In a cos file structure, one or more ENDFILE records are allowed. BACKSPACE statements can be used to reposition a file with this structure.
A blocked file is a stream of words that contains control words called Block Control Word (BCW) and Record Control Words (RCW) to delimit records. Each record is terminated by an EOR (end-of-record) RCW. At the beginning of the stream, and every 512 words thereafter (including any RCWs), a BCW is inserted. An end-of-file (EOF) control word marks a special record that is always empty. Fortran considers this empty record to be an endfile record. The end-of-data (EOD) control word is always the last control word in any blocked file. The EOD is always immediately preceded by either an EOR, or by an EOF and a BCW.
Each control word contains a count of the number of data words to be found between it and the next control word. In the case of the EOD, this count is 0. Because there is a BCW every 512 words, these counts never point forward more than 511 words.
A record always begins at a word boundary. If a record ends in the middle of a word, the rest of that word is zero filled; the ubc field of the closing RCW contains the number of unused bits in the last word.
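The word-padding rule can be expressed as a small C function. This sketch assumes 64-bit (8-byte) words, which matches the 64-bit control words shown in the BCW and RCW layouts; record_words is a hypothetical name.

```c
#include <stdint.h>

/* Assumes 64-bit (8-byte) words. For a record of nbytes data bytes, returns
   the number of words it occupies and stores in *ubc the number of unused,
   zero-filled low-order bits in its last word. */
uint64_t record_words(uint64_t nbytes, unsigned *ubc)
{
    uint64_t words = (nbytes + 7) / 8;            /* round up to whole words */
    *ubc = (unsigned)((words * 8 - nbytes) * 8);  /* unused bytes converted to bits */
    return words;
}
```

For example, a 10-byte record occupies 2 words and leaves 48 unused bits. At most 56 bits can be unused, which fits in the 6-bit ubc field.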
The following illustration and table represent the structure of a BCW. Field widths in bits are shown in parentheses.
m | unused | bdf | unused | bn | fwi |
---|---|---|---|---|---|
(4) | (7) | (1) | (19) | (24) | (9) |
Field | Bits | Description |
---|---|---|
m | 0-3 | Type of control word; 0 for BCW |
bdf | 11 | Bad data flag (1 bit; 1 = bad data) |
bn | 31-54 | Block number (modulo 2^24) |
fwi | 55-63 | Forward index; the number of words to the next control word |
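The BCW fields can be extracted with shifts and masks. The C sketch below assumes that bit 0 is the most significant bit of the 64-bit word, which is consistent with the 9-bit fwi field (bits 55-63) holding counts of at most 511 words. The names cw_field, struct bcw, and decode_bcw are hypothetical.

```c
#include <stdint.h>

/* Extracts bits first..last (inclusive) of a 64-bit control word,
   numbering bit 0 as the most significant bit (assumed convention). */
static uint64_t cw_field(uint64_t word, unsigned first, unsigned last)
{
    unsigned width = last - first + 1;
    uint64_t mask = (width == 64) ? ~UINT64_C(0) : ((UINT64_C(1) << width) - 1);
    return (word >> (63 - last)) & mask;
}

struct bcw {
    unsigned m;    /* bits 0-3: control word type, 0 for a BCW */
    unsigned bdf;  /* bit 11: bad data flag */
    uint32_t bn;   /* bits 31-54: block number modulo 2^24 */
    unsigned fwi;  /* bits 55-63: words to the next control word */
};

struct bcw decode_bcw(uint64_t word)
{
    struct bcw b;
    b.m   = (unsigned)cw_field(word, 0, 3);
    b.bdf = (unsigned)cw_field(word, 11, 11);
    b.bn  = (uint32_t)cw_field(word, 31, 54);
    b.fwi = (unsigned)cw_field(word, 55, 63);
    return b;
}
```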
The following illustration and table represent the structure of an RCW. Field widths in bits are shown in parentheses.
m | ubc | tran | bdf | srs | unused | pfi | pri | fwi |
---|---|---|---|---|---|---|---|---|
(4) | (6) | (1) | (1) | (1) | (7) | (20) | (15) | (9) |
Field | Bits | Description |
---|---|---|
m | 0-3 | Type of control word; 10 (octal) for EOR, 16 (octal) for EOF, and 17 (octal) for EOD |
ubc | 4-9 | Unused bit count; number of unused low-order bits in the last word of the previous record |
tran | 10 | Transparent record field (unused) |
bdf | 11 | Bad data flag (unused) |
srs | 12 | Skip remainder of sector (unused) |
pfi | 20-39 | Previous file index; offset modulo 2^20 to the block where the current file starts (as defined by the last EOF) |
pri | 40-54 | Previous record index; offset modulo 2^15 to the block where the current record starts |
fwi | 55-63 | Forward index; the number of words to the next control word |
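An RCW can be decoded the same way. As with the BCW sketch, this C code assumes bit 0 is the most significant bit of the 64-bit word; cw_field, struct rcw, decode_rcw, and the RCW_* constants are hypothetical names, and the unused tran, bdf, and srs bits are ignored.

```c
#include <stdint.h>

/* Extracts bits first..last (inclusive), with bit 0 as the most significant bit. */
static uint64_t cw_field(uint64_t word, unsigned first, unsigned last)
{
    unsigned width = last - first + 1;
    uint64_t mask = (width == 64) ? ~UINT64_C(0) : ((UINT64_C(1) << width) - 1);
    return (word >> (63 - last)) & mask;
}

/* Control word types from the m field, written in octal as in the table. */
enum { RCW_EOR = 010, RCW_EOF = 016, RCW_EOD = 017 };

struct rcw {
    unsigned m;    /* bits 0-3: EOR, EOF, or EOD */
    unsigned ubc;  /* bits 4-9: unused low-order bits in the last data word */
    uint32_t pfi;  /* bits 20-39: previous file index, modulo 2^20 */
    uint32_t pri;  /* bits 40-54: previous record index, modulo 2^15 */
    unsigned fwi;  /* bits 55-63: words to the next control word */
};

struct rcw decode_rcw(uint64_t word)
{
    struct rcw r;
    r.m   = (unsigned)cw_field(word, 0, 3);
    r.ubc = (unsigned)cw_field(word, 4, 9);
    r.pfi = (uint32_t)cw_field(word, 20, 39);
    r.pri = (uint32_t)cw_field(word, 40, 54);
    r.fwi = (unsigned)cw_field(word, 55, 63);
    return r;
}
```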
Specify Buffer Behavior
A buffer is a temporary storage location for data while the data is being transferred. Buffers are often used for the following purposes:
Small I/O requests can be collected into a buffer, and the overhead of making many relatively expensive system calls can be greatly reduced.
Many data file structures such as cos contain control words. During the write process, a buffer can be used as a work area where control words can be inserted into the data stream (a process called blocking). The blocked data is then written to the device. During the read process, the same buffer work area can be used to remove the control words before passing the data on to the user (called deblocking).
When data access is random, the same data may be requested many times. A cache is a buffer that keeps old requests in the buffer in case these requests are needed again. A cache that is sufficiently large or efficient can avoid a large part of the physical I/O by having the data ready in a buffer. When the data is often found in the cache buffer, it is referred to as having a high hit rate. For example, if the entire file fits in the cache and the file is present in the cache, no more physical requests are required to perform the I/O. In this case, the hit rate is 100%.
Running the I/O devices and the processors in parallel often improves performance, so it is useful to keep processors busy while data is being moved. When writing, data can be copied to the buffer at memory-to-memory copy speed and an asynchronous I/O request issued; control then returns immediately to the program, which continues to execute as if the I/O were complete (a process called write-behind). A similar process called read-ahead can be used while reading: data is read into a buffer before the actual request is issued for it, so that when it is needed, it is already in the buffer and can be transferred to the user at very high speed.
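The first purpose above, collecting small requests into a buffer, can be sketched in C. The names struct wbuf, wbuf_write, and wbuf_flush are hypothetical; the sketch counts the write(2) calls a library would issue instead of performing them, and assumes a buffer of one 4,096-byte block.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical library buffer that collects small writes. */
struct wbuf {
    char   data[4096];  /* assumed capacity: one 4,096-byte block */
    size_t used;        /* bytes currently held in data */
    size_t syscalls;    /* number of simulated write(2) calls */
};

static void wbuf_flush(struct wbuf *b)
{
    if (b->used > 0) {
        b->syscalls++;  /* a real library would call write(2) here */
        b->used = 0;
    }
}

void wbuf_write(struct wbuf *b, const void *src, size_t n)
{
    const char *p = src;
    while (n > 0) {
        size_t room = sizeof b->data - b->used;
        size_t chunk = n < room ? n : room;
        memcpy(b->data + b->used, p, chunk);  /* memory-to-memory copy */
        b->used += chunk;
        p += chunk;
        n -= chunk;
        if (b->used == sizeof b->data)
            wbuf_flush(b);  /* buffer full: one system call for many requests */
    }
}
```

Writing 1,024 records of 8 bytes each through this buffer issues 2 simulated system calls instead of 1,024.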
Unless direct I/O is enabled (assign -B on), data is staged in the system buffer cache. While this can yield improved performance, it also means that performance is affected by competition among programs for the system buffer cache. To minimize this effect, avoid public caches when possible.
In many cases, the best asynchronous I/O performance can be realized by using the FFIO cachea layer (assign -F cachea). This layer supports read-ahead, write-behind, and improved cache reuse.
The size of the buffer used for a Fortran file can have a substantial effect on I/O performance. A larger buffer size usually decreases the system time needed to process sequential files. However, large buffers increase a program’s memory usage; therefore, optimizing the buffer size for each file accessed in a program on a case-by-case basis can help increase I/O performance and minimize memory usage.
The -b option on the assign command specifies a buffer size, in blocks, for the unit. The -b option can be used with the -s option, but it cannot be used with the -F option. Use the -F option to provide I/O path specifications that include buffer sizes; the -b and -u options do not apply when -F is specified. For more information about the selection of buffer sizes, see the assign(1) man page.
The following examples of buffer size specification illustrate using the assign -b option:
If unit 1 is a large sequential file for which many Fortran READ or WRITE statements are issued, increase the buffer size to a large value by using the following assign command:
assign -b buffer-size u:1
If the file foo is a small file or is accessed infrequently, minimize the buffer size by using the following assign command:
assign -b 1 f:foo
Default Buffer Sizes
The Fortran I/O library automatically selects default buffer sizes according to file access type, as shown in the table, Default Buffer Sizes for Fortran I/O Library Routines. Override the defaults by using the assign command. The following subsections describe the default buffer sizes on various systems.
One block is 4,096 bytes on CLE systems.
Access Type | Default Buffer Size |
---|---|
Sequential formatted | 16 blocks (65,536 bytes) |
Sequential unformatted | 128 blocks (524,288 bytes) |
Direct formatted | The smaller of the record length in bytes + 1 or 16 blocks (65,536 bytes) |
Direct unformatted | The larger of the record length or 16 blocks (65,536 bytes) |
Four buffers of default size are allocated. For more information, see the description of the cachea layer in the intro_ffio(3F) man page.
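The default buffer size rules in the table can be restated as a C function. The sketch assumes 4,096-byte blocks, as on CLE systems, and reads the direct unformatted entry as the larger of the record length and 16 blocks; default_buffer_bytes and enum access_type are hypothetical names, not library interfaces.

```c
#include <stdint.h>

#define BLOCK_BYTES 4096u  /* one block on CLE systems */

enum access_type { SEQ_FORMATTED, SEQ_UNFORMATTED, DIRECT_FORMATTED, DIRECT_UNFORMATTED };

/* Default buffer size in bytes; recl is the record length in bytes and is
   ignored for sequential access. */
uint64_t default_buffer_bytes(enum access_type t, uint64_t recl)
{
    const uint64_t sixteen = 16u * BLOCK_BYTES;  /* 65,536 bytes */
    switch (t) {
    case SEQ_FORMATTED:      return sixteen;
    case SEQ_UNFORMATTED:    return 128u * BLOCK_BYTES;  /* 524,288 bytes */
    case DIRECT_FORMATTED:   return (recl + 1 < sixteen) ? recl + 1 : sixteen;
    case DIRECT_UNFORMATTED: return (recl > sixteen) ? recl : sixteen;
    }
    return 0;  /* not reached for valid access types */
}
```

Sizes given to assign -b are in the same blocks, so assign -b 1 corresponds to 4,096 bytes.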
Library Buffering
The term library buffering refers to a buffer that the I/O library associates with a file. When a file is opened, the I/O library checks the access, form, and any attributes declared on the assign command to determine the type of processing that should be used on the file. Buffers are an integral part of the processing.
If the file is assigned with one of the following assign options, library buffering is used:
-s blocked
-F spec (buffering as defined by spec)
-s cos
-s bin
-s unblocked
The -F option specifies flexible file I/O (FFIO), which uses library buffering if the specifications selected include a need for buffering. In some cases, more than one set of buffers might be used in processing a file. For example, the -F bufa,cos option specifies two library buffers for a read of a blank compressed COS blocked file. One buffer handles the blocking and deblocking associated with the COS blocked control words, and the second buffer is used as a work area to process blank compression. In other cases (for example, -F system), no library buffering occurs.
System Cache
The operating system uses a set of buffers in kernel memory for I/O operations. These are collectively called the system cache. The I/O library uses system calls to move data between the user memory space and the system buffer. The system cache ensures that the actual I/O to the logical device is well formed, and it tries to remember recent data in order to reduce physical I/O requests.
The following assign command options can be expected to use the system cache:
-s sbin
-F spec (FFIO, depends on spec)
For the assign -F cachea command, a library buffer ensures that the actual system calls are well formed and the system buffer cache is bypassed. This is not true for the assign -s u option: if assign -s u is used to bypass the system cache, all requests must be well formed.
Unbuffered I/O
The simplest form of buffering is none at all; this unbuffered I/O is known as direct I/O. For sufficiently large, well-formed requests, buffering is not necessary and can add unnecessary overhead and delay. The following assign command specifies unbuffered I/O:
assign -s u ...
This form of the assign command bypasses both library buffering and the system cache for all well-formed requests. The data is transferred directly between the user data area and the logical device. Requests that are not well formed result in I/O errors.
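This section does not define a well-formed request. A common requirement for direct I/O on Linux file systems (see the O_DIRECT notes in open(2)) is that the file offset, the transfer length, and the user buffer address all be multiples of the logical block size. The C sketch below checks that condition under an assumed 4,096-byte alignment; is_well_formed and ALIGN are hypothetical, and the real requirement depends on the file system and device.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed alignment; the actual value depends on the file system and device. */
#define ALIGN 4096u

/* Models a "well formed" request as one whose buffer address, file offset,
   and length are all multiples of ALIGN bytes. */
bool is_well_formed(const void *buf, uint64_t offset, size_t length)
{
    return ((uintptr_t)buf % ALIGN == 0) &&
           (offset % ALIGN == 0) &&
           (length % ALIGN == 0);
}
```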
Specify Foreign File Formats
The Fortran I/O library can read and write files with record blocking and data formats native to operating systems from other vendors. The assign -F command specifies a foreign record blocking; the assign -C command specifies the type of character conversion; the -N option specifies the type of numeric data conversion. When -N or -C is specified, the data is converted automatically during the processing of Fortran READ and WRITE statements. For example, assume that a record in file fgnfile contains the following character and integer data:
character*4 ch
integer int
open(iun,FILE='fgnfile',FORM='UNFORMATTED')
read(iun) ch, int
Use the following assign command to specify foreign record blocking and foreign data formats for character and integer data:
assign -F ibm.vbs -N ibm -C ebcdic fgnfile
One of the most common uses of the assign command is to swap big-endian for little-endian files. To access big-endian unformatted files on a little-endian system, use the following command:
assign -N swap_endian fgnfile
This assumes the file is a normal f77 unformatted file with 32-bit record control images with a byte count. The library routines swap both the control images and the data when reading or writing the file.
If all unformatted sequential files are the opposite endianness, use the following command:
assign -N swap_endian g:su
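The record layout that -N swap_endian assumes can be illustrated in C. A normal f77 unformatted record is a 4-byte byte count, the data, and a repeat of the same 4-byte byte count. The sketch decodes a big-endian image byte by byte, which on a little-endian host gives the same result as byte-swapping each 4-byte value; it assumes the record holds 32-bit integers, and read_be_f77_record is a hypothetical name rather than a library routine.

```c
#include <stddef.h>
#include <stdint.h>

/* Reads a 32-bit big-endian value regardless of host byte order. */
static uint32_t load_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Reads one f77 sequential unformatted record of 32-bit integers from a
   big-endian image in buf[0..len). Converts both the record control images
   (byte counts) and the data. Returns the number of integers stored in out,
   or -1 if the record is malformed or holds more than max values. */
long read_be_f77_record(const unsigned char *buf, size_t len,
                        int32_t *out, size_t max)
{
    if (len < 8)
        return -1;
    uint32_t head = load_be32(buf);             /* leading byte count */
    if (head % 4 != 0 || head > len - 8)
        return -1;
    uint32_t tail = load_be32(buf + 4 + head);  /* trailing byte count */
    if (tail != head || head / 4 > max)
        return -1;
    for (uint32_t i = 0; i < head / 4; i++)
        out[i] = (int32_t)load_be32(buf + 4 + 4 * i);
    return (long)(head / 4);
}
```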
Specify Memory Resident Files
The assign -F mr command specifies that a file will be memory resident. Because the mr flexible file I/O layer does not define a record-based file structure, it must be nested beneath a file structure layer when record blocking is needed.
For example, if unit 2 is a sequential unformatted file that is to be memory resident, the following Fortran statements connect the unit:
CALL ASNUNIT (2,'-F cos,mr',IER)
OPEN(2,FORM='UNFORMATTED')
The -F cos,mr specification selects COS blocked structure with memory residency.
Use and Suppress File Truncation
The assign -T option activates or suppresses truncation after the writing of a sequential Fortran file. The -T on option specifies truncation; this behavior is consistent with the Fortran standard and is the default setting for most assign -s ft specifications.
The assign(1) man page lists the default setting of the -T option for each -s ft specification. It also indicates whether suppression or truncation is allowed for each of these specifications.
FFIO layers that are specified by using the -F option vary in their support for suppression of truncation with -T off.
The following figure, Access Methods and Default Buffer Sizes, summarizes the available access methods and the default buffer sizes.
Define the Assign Environment File
The assign command information is stored in the assign environment file. The location of the active assign environment file must be provided by setting the FILENV environment variable to the desired path and file name.
Use Local Assign Mode
The assign environment information is usually stored in the .assign environment file. Programs that do not require the use of the global .assign environment file can activate local assign mode. If local assign mode is selected, the assign environment is stored in memory, so other processes cannot adversely affect the assign environment used by the program.
The ASNCTL routine selects local assign mode when it is called in one of the following ways:
CALL ASNCTL('LOCAL',1,IER)
CALL ASNCTL('NEWLOCAL',1,IER)
Local assign mode
In the following example, a Fortran program activates local assign mode and then specifies an unblocked data file structure for a unit before opening it. The -I option is passed to ASNUNIT to ensure that any assign attributes continue to have an effect at the time of file connection.
C     Switch to local assign environment
      CALL ASNCTL('LOCAL',1,IER)
      IUN = 11
C     Assign the unblocked file structure
      CALL ASNUNIT(IUN,'-I -s unblocked',IER)
C     Open unit 11
      OPEN(IUN,FORM='UNFORMATTED')
If a program contains all necessary assign statements as calls to ASSIGN, ASNUNIT, and ASNFILE, or if a program requires total shielding from any assign commands, use the second form of a call to ASNCTL, as follows:
C     New (empty) local assign environment
      CALL ASNCTL('NEWLOCAL',1,IER)
      IUN = 11
C     Assign a large buffer size
      CALL ASNUNIT(IUN,'-b 336',IER)
C     Open unit 11
      OPEN(IUN,FORM='UNFORMATTED')
Interlanguage Communication
The Clang C and C++ compilers provide mechanisms for declaring external functions written in other languages. This enables the writing of portions of an application in C, C++, Fortran, or assembly language, which can be useful in cases where the other languages provide performance advantages or utilities not available in C or C++.
The HPE Cray Compiling Environment LLD now differs from upstream LLD regarding COMMON symbol resolution. When a COMMON symbol is found within a .bss or another uninitialized data section, CCE LLD defines that symbol. If the symbol is from a .data section, where data is initialized, CCE LLD leaves the symbol undefined and lets the dynamic linker resolve it at runtime.
Fortran, C, C++ Interoperability
The HPE Cray Compiler supports interoperability mechanisms specified in the Fortran 2008 standard, ISO/IEC 1539-1:2010, and TS 29113 Further Interoperability of Fortran and C.
The Fortran 2008 standard describes interoperability features for:
Intrinsic Types
The Fortran intrinsic module ISO_C_BINDING provides interoperability between Fortran intrinsic types and C types. The ISO_C_BINDING module provides named constants that can be used as KIND type parameters compatible with C types.
In addition to the named constants required by the Fortran standard, the HPE Cray compiler provides, as an extension, definitions for 128-bit floating-point and complex types. C_FLOAT128 and C_FLOAT128_COMPLEX correspond to the C types __float128 and __float128 complex.
Derived Types and Structures
Use the BIND attribute when creating an interoperable type:
USE ISO_C_BINDING
TYPE, BIND(C) :: THIS_TYPE
. . .
END TYPE THIS_TYPE
Global Variables
Use the BIND attribute with a common block or module variable declaration:
USE ISO_C_BINDING
INTEGER(C_INT), BIND(C) :: EXTERN
INTEGER(C_LONG) :: CVAR
BIND(C, NAME='var') :: CVAR
COMMON /A/ I, J
REAL(C_FLOAT) :: I, J
BIND(C) :: /A/
Pointers
ISO_C_BINDING provides a derived type, c_ptr, that interoperates with any C object pointer type. The Fortran named constant c_null_ptr is equivalent to the C value NULL.
Subroutines and Functions
Declare a Fortran procedure with the BIND attribute. Procedure arguments must be of interoperable type. By default, the Fortran compiler converts the procedure name to lowercase (myfunction); this is the binding label, the corresponding name that is known to the C compiler.
FUNCTION MYFUNCTION(X, Y), BIND(C)
Specify a different binding label:
FUNCTION MYFUNCTION(X, Y), BIND(C, NAME='C_Myfunction')
A function result must be scalar and of interoperable type. The C prototype for a subroutine must have a void return type.
TS 29113 describes further interoperability features including:
C descriptors
ISO_Fortran_binding.h defines the C structure CFI_cdesc_t, which facilitates using Fortran data objects from within a C function.
ISO_Fortran_binding.h
Contains additional C structure definitions and macro definitions to interoperate with an allocatable or data pointer argument.
BIND(C) Syntax
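The proc-language-binding-spec, the BIND(C) clause on a FUNCTION or SUBROUTINE statement, allows Fortran procedures to interoperate with C functions. The optional comma in FUNCTION name(), BIND(C) is an HPE extension to the Fortran standard; standard-conforming code writes FUNCTION name() BIND(C).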
ISO_C_BINDING
The ISO_C_BINDING module provides interoperability between Fortran intrinsic types and C types. It provides named constants that can be used as KIND type parameters compatible with C types.
In addition to the named constants required by the Fortran 2008 standard, the HPE Cray compiler provides, as an extension, definitions for 128-bit floating-point and complex types: C_FLOAT128 and C_FLOAT128_COMPLEX correspond to the C types __float128 and __float128 complex.
Interlanguage Communication Examples
Interlanguage Communication using Common Block/Global
// common_c.c : example of function called from common.f90
#include <stdio.h>
#include <stdlib.h>
#include <ISO_Fortran_binding.h>
// globals that match up to the common blocks in common.f90
float c_single;
struct common {
double var1;
int var2;
} multiple;
int c_int_array[100];
// c function called from Fortran
void global_var_common(void)
{
int i;
// just prints and sets the globals
printf(" In global_var_common\n");
printf(" c_single: %f\n", c_single);
printf(" multiple: %f, %d\n", multiple.var1, multiple.var2 );
printf(" c_int_array: %d, %d\n", c_int_array[0], c_int_array[99]);
c_single = 2 * c_single;
multiple.var1 = 77.77;
multiple.var2 = 17;
for(i=0; i<100; i++ ) {
c_int_array[i] = c_int_array[i] * 3;
}
} // end of global_var_common
! common.f90
! Needs common_c.c
program common_block
use, intrinsic :: iso_c_binding
implicit none
!
! declare the common blocks for c globals
! one with a single real variable
real(c_float) r_var
common /c_single/ r_var
! one with an integer array
integer(c_int) :: i_array(100)
common / array / i_array
! one with two variables
real(c_double) :: var1
integer(c_int) :: var2
common / multiple / var1, var2
! do the bind c on the common blocks, renaming one
BIND(C,name="c_int_array") :: / array /
BIND(C) :: / multiple /, /c_single/
call sub1()
end program common_block
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
subroutine sub1( )
use, intrinsic :: iso_c_binding
implicit none
integer :: i ! implied-DO index used below
! declare the common blocks for c globals
! one with a single real variable
real(c_float) r_var
common /c_single/ r_var
! one with an integer array
integer(c_int) :: i_array(100)
common / array / i_array
real(c_double) var1
integer(c_int) var2
common / multiple / var1, var2
! do the bind c on the common blocks, renaming array
BIND(C,name="c_int_array") :: / array /
BIND(C) :: / multiple /, /c_single/
interface
subroutine global_var_common( ) bind(c)
use,intrinsic :: iso_c_binding
implicit none
end subroutine global_var_common
end interface
r_var = -99.3
var1 = 88.88
var2 = -13
i_array = [(i,i=1,100)]
! call the c function
call global_var_common( )
print *, "In sub1"
print *, " r_var : ", r_var
print *, " var1 : ", var1
print *, " var2 : ", var2
print *, " array : ", i_array(1), i_array(100)
end subroutine
Interlanguage Communication using Derived Structure
// c program that calls the Fortran subroutine with struct argument, f2008 C.11.3
//*********************************************************************
#include <stdio.h>
#include <stdlib.h>
#include <ISO_Fortran_binding.h>
// declare the structure type
struct pass {
int lenc, lenf;
float *c, *f;
};
// prototype for the Fortran function
void simulation(long alpha, double *beta, long *gamma, double delta[], struct pass *arrays);
// program that calls the Fortran subroutine
int main(void)
{
int i;
long alpha, gamma;
double beta, delta[100];
struct pass arrays;
alpha = 1234L;
gamma = 5678L;
beta = 12.34;
for(i=0; i<100; i++ ) {
delta[i] = i+1;
}
// fill in some of the structure
arrays.lenc = 100;
arrays.lenf = 0;
arrays.c = (float *) malloc( 100*sizeof(float) );
arrays.f = NULL;
for(i=0; i<100; i++ ) {
arrays.c[i] = 2*(i+1);
}
// reference the Fortran subroutine
simulation(alpha, &beta, &gamma, delta, &arrays);
printf(" After simulation\n");
printf(" alpha: %ld, beta: %f\n", alpha, beta );
printf(" gamma: %ld\n", gamma );
printf(" arrays.lenc: %d\n", arrays.lenc);
printf(" arrays.c[0],[arrays.lenc-1],: %f, %f\n", arrays.c[0], arrays.c[arrays.lenc-1]);
printf(" arrays.lenf: %d\n", arrays.lenf);
printf(" arrays.f[0],[arrays.lenf-1],: %f, %f\n", arrays.f[0], arrays.f[arrays.lenf-1]);
free(arrays.c); // allocated with malloc above; arrays.f is owned by Fortran
return 0;
} // end of main
! Example derived type/structure interoperability, f2008 C.11.3
!**************************************************************
subroutine simulation(alpha, beta, gamma, delta, arrays) bind(c)
use, intrinsic :: iso_c_binding
implicit none
integer (c_long), value :: alpha
real (c_double), intent(inout) :: beta
integer (c_long), intent(out) :: gamma
real (c_double),dimension(*),intent(in) :: delta
type, bind(c) :: pass
integer (c_int) :: lenc, lenf
type (c_ptr) :: c, f
end type pass
type (pass), intent(inout) :: arrays
real (c_float), allocatable, target, save :: eta(:)
real (c_float), pointer :: c_array(:)
integer i
print *, "In simulation"
print *, " alpha: ", alpha, ", beta: ", beta
print *, " delta(1),(100): ", delta(1), delta(100)
! associate c_array with an array allocated in c
call c_f_pointer (arrays%c, c_array, [arrays%lenc])
print *, " c_array(1),(arrays%lenc): ", c_array(1), c_array(arrays%lenc)
! allocate an array and make it available in c
arrays%lenf = 100
allocate (eta(arrays%lenf))
arrays%f = c_loc(eta)
eta = [(i*3,i=1,arrays%lenf)]
! change argument values
c_array = c_array * 2.0
gamma = 77
beta = -55.66
end subroutine simulation
Interlanguage Communication using Module
// c function called from module.f90
#include <stdio.h>
#include <stdlib.h>
#include <ISO_Fortran_binding.h>
// globals that match up to the module variables in module.f90
float r_var;
double var1;
int var2;
int c_int_array[100];
// c function called from Fortran
void global_var_module(void)
{
int i;
// just prints and sets the globals
printf(" In global_var_module\n");
printf(" r_var : %f\n", r_var);
printf(" var1 : %f\n", var1 );
printf(" var2 : %d\n", var2 );
printf(" c_int_array: %d, %d\n", c_int_array[0], c_int_array[99]);
r_var = 2 * r_var;
var1 = 77.77;
var2 = 17;
for(i=0; i<100; i++ ) {
c_int_array[i] = c_int_array[i] * 3;
}
} // end of global_var_module
! Example of module/global variable interoperability.
! Needs c function from module_c.c
! ********************************************************************
module module_example_mod
use, intrinsic :: iso_c_binding
implicit none
real(c_float) r_var
integer(c_int) :: i_array(100)
real(c_double) :: var1
integer(c_int) :: var2
BIND(C,name="c_int_array") :: i_array
BIND(C) :: r_var, var1, var2
end module module_example_mod
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
program module_example
use module_example_mod
implicit none
call sub1()
end program module_example
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
subroutine sub1( )
use module_example_mod
implicit none
integer :: i ! implied-DO index used below
interface ! for the c function
subroutine global_var_module( ) bind(c)
use,intrinsic :: iso_c_binding
implicit none
end subroutine global_var_module
end interface
r_var = -99.3
var1 = 88.88
var2 = -13
i_array = [(i,i=1,100)]
! call the c function
call global_var_module( )
print *, "In sub1"
print *, " r_var : ", r_var
print *, " var1 : ", var1
print *, " var2 : ", var2
print *, " array : ", i_array(1), i_array(100)
end subroutine