Cray Compiler Fortran Reference

Fortran Compiler Introduction

The HPE Cray Compiling Environment (CCE) Fortran compiler supports HPE Cray systems and the Fortran 2018 standard (ISO/IEC 1539-1:2018), with some exceptions and deferred features as noted elsewhere. The HPE Cray Fortran compiler is also documented in man pages, beginning with the crayftn(1) man page. Where the information in this manual differs from the man page, the information in the man page is presumed to be more current.

The HPE Cray Fortran Programming Environment

The HPE Cray Fortran Programming Environment consists of the tools and libraries used to develop Fortran applications. These are:

  • The ftn command, which invokes the HPE Cray Fortran compiler. The ftn command is properly termed a compiler driver, as it is used both to compile source code into object code and to link object code files and libraries to create executable files. This compiling and linking can be performed either as separate processes or as one contiguous process, which has significant implications for file handling considerations. These implications are described later in this section. See the crayftn(1) man page for more information.

  • HPE Cray Scientific and Math Libraries (CSML) - a set of high performance libraries that provide portability for scientific applications by implementing APIs for arrays (NetCDF), sparse and dense linear algebra (BLAS, LAPACK, ScaLAPACK) and fast Fourier transforms (FFTW).

  • The ftnlx command, which generates listings and checks for possible errors in Fortran programs. See the ftnlx(1) man page for more information.

  • The ftnsplit command, which splits named Fortran files into separate files with one program unit per file. See the ftnsplit(1) man page for more information.

  • The ftnmgen command, which invokes the Fortran makefile generator. See the ftnmgen(1) man page for more information.

HPE Cray Fortran Compiler Messages

The HPE Cray Fortran compiler can produce many messages during compilation and linking. To expand on these messages, use the explain command. For more information, see the explain(1) man page.

Document-specific Conventions

Cray pointer: The term Cray pointer refers to the Cray pointer data type extension.

Fortran Standard Compatibility

In the Fortran standard, the term processor means the combination of a Fortran compiler and the computing system that executes the code. A processor conforms to the standard if it compiles and executes programs that conform to the standard, provided that the Fortran program is not too large or complex for the computer system in question.

The compiler can be directed to flag and generate messages when nonstandard usage of Fortran is encountered. For more information about this command line option (ftn -en), see the crayftn(1) man page. When the option is in effect, the compiler prints messages for extensions to the standard that are used in the program. As required by the standard, the compiler also flags the following items and provides the reason that the item is being flagged:

  • Obsolescent features

  • Deleted features

  • Kind type parameters not supported

  • Violations of any syntax rules and the accompanying constraints

  • Characters not permitted by the processor

  • Illegal source form

  • Violations of the scope rules for names, labels, operators, and assignment symbols

The HPE Cray Fortran compiler includes extensions to the Fortran standard. Because the compiler processes standard-conforming programs according to the standard, it is considered to be a standard-conforming processor. When the option to note deviations from the Fortran standard is in effect (-en), extensions to the standard are flagged with ANSI messages when detected at compile time.

Fortran 2018 Compatibility

This release of HPE CCE fully supports the Fortran 2018 standard including coarray TEAMS, with one limitation. This release does not yet support the REDUCE intrinsic with CHARACTER arguments. Support for this feature is expected in a future release.

Fortran Extensions

The HPE Cray Fortran Compiler supports extended features beyond those specified by the current standard. For more information, see HPE Cray Fortran Language Extensions.

Invoke the HPE Cray Fortran Compiler

The ftn(1) command invokes the HPE Cray Fortran compiler when the HPE Cray Compiling Environment is loaded. Typically the ftn command processes the input files specified on the command line and generates a binary object file, and then loads the binary object file and generates the executable file a.out.

HPE Cray Fortran Command Syntax

The ftn command is a driver that invokes the HPE Cray Fortran Compiler when the HPE Cray Compiling Environment is loaded, and links in the libraries required in order to produce code that can be executed on HPE compute nodes. Valid ftn options include those of the ftn(1) driver, as well as those specific to the HPE Cray Fortran Compiler:

ftn
       [-A module_name[,module_name]...]
       [-b bin_obj_file]
       [-c]
       [-d disable]
       [-D identifier[=value]]
       [-e enable]
       [-f source_form]
       [-fbackslash]
       [-fopenmp]
       [-F]
       [-g]
       [-G debug_lvl]
       [-h arg]
       [-I incldir]
       [-J dir_name]
       [-K trap=opt[,opt]...]
       [-l libname]
       [-L ldir]
       [-m msg_lvl]
       [-M msgs]
       [-N col]
       [-o out_file]
       [-O opt[,opt]...]
       [-p module_site[,module_site...]]
       [-Q path]
       [-r list_opt]
       [-R runchk]
       [-s size]
       [-S]
       [-T]
       [-U identifier[,identifier]...]
       [-v]
       [-V]
       [--version]
       [-W phase,"opt...",]
       [-x dirlist]
       [-Y phase,dirname]
       [--]
       sourcefile [sourcefile ...]

sourcefile Suffix

The sourcefile.suffix names the file or files to be processed. The file suffixes indicate the content of each file and determine whether the preprocessor, compiler, assembler, or linker will be invoked. At least one source file must be specified, unless the -V option is specified.

Suffix                                  Processing

.f, .for                                Fixed-format source; compile
.F, .FOR                                Fixed-format source; preprocess, compile
.f90, .f95, .f03, .f08, .f18, .ftn      Free-format source; compile
.F90, .F95, .F03, .F08, .F18, .FTN      Free-format source; preprocess, compile
.o                                      Object file; link
.s                                      Assembler source; assemble

The source form specified on the -f source_form option overrides the source form implied by the file suffixes.

If only one source file is specified on the command line, the .o file is created and then deleted after the link step. To retain the .o file, use the -c option to disable the linker. Object files produced by the HPE Cray Fortran, C, C++, or assembler compilers can also be specified. Object files are passed to the linker in the order in which they appear on the ftn command line. If the linker is disabled by the -b or -c option, no files are passed to the linker.

File Types Used or Created by the Compiler

The compiler uses and creates several types of files during processing:

a.out

Default name of the executable output file. Use the compiler driver command line option -o to specify an executable name other than a.out.

.i

Files containing output from the source preprocessor

.o

Relocatable object code. During compilation, these relocatable object files are saved in the current directory automatically. If CrayPat is used to conduct performance analysis experiments, the object files created during compilation must be kept in order to preserve source-to-executable function mapping. To do so, use the -h keepfiles option.

.a

Library files containing external references

.s

Assembly language files. Files with .s extensions are assembled and written to the corresponding .o file.

.mod

By default, the compiler writes a MODULENAME.mod file for each module; MODULENAME is created by taking the name of the module and, if necessary, converting it to upper case. This file contains module information, including any contained procedures. If the -ef option is specified, the compiler writes modulename.mod for each module, rather than MODULENAME.mod.

Module information files

The compiler creates modules from MODULE program units. A module is referenced with the USE statement. The compiler creates a module information file, with the suffix .mod, for each module. By default, all .mod files are named MODULENAME.mod, where MODULENAME is the name of the module (in uppercase) specified in the MODULE or USE statement.

The options that change this are -e/dm, -e/df, -J, and -Q.

  • -em is the default.

  • -dm causes the information to be written to the binary .o file.

  • -ef modifies -em to write the information to the modulename.mod file rather than MODULENAME.mod. -ef is not allowed with -dm.

  • -J and -Q specify the directory in which .mod output is created and searched for. -J is only allowed with -em and -ef and only affects the location of module information. -Q is also allowed with -dm and affects all non-temporary files.
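As a minimal sketch of these naming rules, consider the following source file. The module name shapes_mod, the program name, and the file name shapes.f90 are hypothetical and chosen only for illustration.

```fortran
! shapes.f90 (hypothetical file name)
!
! Per the rules above, compiling this file is expected to produce:
!   default (-em)  : SHAPES_MOD.mod
!   with -ef       : shapes_mod.mod
!   with -dm       : no .mod file; module information goes into the .o file
module shapes_mod
  implicit none
  real, parameter :: pi = 3.14159265
contains
  pure function circle_area(r) result(a)
    real, intent(in) :: r
    real :: a
    a = pi * r * r
  end function circle_area
end module shapes_mod

program use_shapes
  ! This USE is satisfied from the current compilation, because the
  ! module is defined earlier in the same source file.
  use shapes_mod, only: circle_area
  implicit none
  print *, 'area =', circle_area(2.0)
end program use_shapes
```

If the module were moved to its own file, adding -J dir_name to both compilations would be one way to keep the .mod file out of the source directory while still letting the USE statement find it.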

The search order for satisfying module references in USE statements is as follows:

  1. The current compilation.

  2. The -J dir_name directory, if specified.

  3. Any directories or files specified with the -p and -I options, in order of specification.

  4. Any directories or files specified with the FORTRAN_MODULE_PATH environment variable.

  5. The current working directory or the -Q directory, if specified.

By default, when searching within a directory, the compiler first searches the .mod files, then the .o files, then the .a files, and then the directories, in the order specified.

For module compatibility purposes, the HPE Cray Fortran compiler supports the current release and two previous releases.

Fortran Command-line Options

The ftn command invokes the HPE Cray Fortran compiler and accepts the following options and arguments.

  • -A module_name[,module_name]...

    Directs the compiler to behave as if you entered a USE module_name statement for each module_name in your Fortran source code. The USE statements are entered in every program unit and interface body in the source file being compiled.

  • -b bin_obj_file

    Disables the link step and saves the binary object file of your program in bin_obj_file.

    Only one input file is allowed when the -b bin_obj_file option is specified. If you have more than one input file, use the -c option instead. If only one input file is being processed and neither the -b nor -c option is specified, the binary object file of your program is not saved after the link step is completed.

    If both -b bin_obj_file and -c are specified, the link step is disabled and the binary object file is written to bin_obj_file.

    Default: not set

  • -c

    Disables the link step and saves the binary object file version of your program in file.o, where file is the name of the source file. If there is more than one source file, a file.o is created for each input file specified.

    Default: not set

  • -d disable, -e enable

    Disable or enable compiling options. To specify more than one option, enter the options without separators between them; for example, -e aj. The disable and enable arguments can be one or more of the following options.

-d/-e Option

Action

0

Initialize all undefined local stack, static, and heap variables to 0 (zero). If a user variable is of type character, it is initialized to NUL. If logical, initialized to false. The stack variables are initialized upon each execution of the procedure. When used in combination with -ei, Real and Complex variables are initialized to signaling NaNs, while all other typed objects are initialized to 0. Objects in common blocks will be initialized if the common block is declared within a BLOCKDATA program unit compiled with this option.

Default: disabled

a

Abort compilation after encountering the first error.

Default: disabled

A

Treat all module variables as PUBLIC. Do not override any explicit PRIVATE statements or attributes. Disabling this option with -dA has the effect of including a PRIVATE statement in the specification part of the module.

Default: enabled

b

If enabled, issue a warning message rather than an error message when the compiler detects a call to a procedure with one or more dummy arguments having the TARGET, VOLATILE or ASYNCHRONOUS attribute and there is not an explicit interface definition.

Default: disabled

B

Generate binary output. If disabled, inhibit all optimization and allow only syntactic and semantic checking.

Default: enabled

c

Interface checking: use HPE’s system modules to check library calls in a compilation. If a user procedure has the same name as one in the library, this will produce errors, as the compiler does not skip user-specified procedures when performing checks.

Default: disabled

C

Enable/disable some types of standard call site checking. The current Fortran standard requires that the number and types of arguments agree between the caller and callee. These constraints are enforced in cases where the compiler can detect them; however, specifying -dC disables some of this error checking, which may be necessary in order to get some older Fortran codes to compile.

Note: If error-checking is disabled, unexpected compile-time or runtime errors may occur.

In addition, the compiler by default attempts to detect situations in which an interface block should be specified but is not. Specifying -dC disables this type of checking.

Default: enabled

d

Control a column-oriented debugging feature when using fixed source form. When enabled, the compiler replaces a D or d character appearing in column 1 of the source with a blank and treats the entire line as a valid source line. This feature is useful if you want to insert PRINT statements as part of your debugging process.

Default: disabled
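A minimal fixed-form sketch of such a debugging line follows. The file name debug_lines.f and all identifiers are hypothetical. When the file is compiled with -ed, the line that starts with D in column 1 is expected to be compiled as an ordinary PRINT statement. How the line is handled when the option is disabled is not stated here; treating it as a comment is the usual convention for this extension and is an assumption.

```fortran
C     debug_lines.f (hypothetical file name), fixed source form.
C     Labels occupy columns 1-5 and statements start in column 7.
      PROGRAM DBGLIN
      INTEGER I, TOTAL
      TOTAL = 0
      DO 10 I = 1, 5
         TOTAL = TOTAL + I
C        The next line has D in column 1: it is active only with -ed.
D        PRINT *, 'I =', I, ' TOTAL =', TOTAL
   10 CONTINUE
      PRINT *, 'TOTAL =', TOTAL
      END
```

This keeps the tracing statements in the source permanently while letting a single command-line option decide whether they are compiled.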

D

Enable all debugging options. This option is equivalent to specifying the -G0 option with the -m2, -rl, and -R bcdsp options.

Default: disabled

e

Enable/disable masking expression support for non-integer type operands. This allows masking expressions to be evaluated without type conversion. For example, if -ee is specified, real_variable = real_variable .or. another_real_variable is evaluated as a bitwise or and the assignment will occur without type conversion. By default, the compiler will turn .or. into the IOR intrinsic and type conversion will occur when the value is assigned to the real_variable. This option may not be supported in all situations because of new compiler optimization requirements.

Default: -de

E

Allow existing declarations to duplicate the declarations contained in a used module. Only existing declarations that declare the function name or generic name in an EXTERNAL or type statement are allowable under this option. Therefore, it is not necessary to modify the older code by removing the existing declarations. Because the declarations are not removed, the use associated objects duplicate declarations already in the code, which is not standard-conforming. However, this option allows the compiler to accept these statements as long as the declarations match the declarations in the module.

Existing declarations of a procedure must match the interface definitions in the module; otherwise an error is generated.

Default: disabled

f

This option is a modifier to the -em option. When this option is enabled, module files are created with lowercase names, as in modulename.mod. When it is disabled, module file creation is determined by the setting of the -em option. The compiler writes a modulename.mod file for each module; modulename is created by taking the name of the module and, if necessary, converting it to lowercase.

Default: disabled

F

Control preprocessor expansion of macros in Fortran source lines.

Default: enabled whenever preprocessing is enabled.

g

Allow branching into the code block for a DO or DO WHILE construct, which may be necessary in order to permit older codes to compile. Historically, codes used branches out of and into DO constructs. Current Fortran standards prohibit branching into a DO construct from outside of that construct and the compiler issues an error in this situation. Specifying the -eg option will allow codes with these constructs to compile, but performance may suffer as a result.

Default: disabled

h

Enable support for 8-bit and 16-bit INTEGER and LOGICAL types that use explicit kind or star values. By default (-eh), data objects declared as INTEGER(kind=1) or LOGICAL(kind=1) are 8 bits long and objects declared as INTEGER(kind=2) or LOGICAL(kind=2) are 16 bits long. When this option is disabled (-dh), data objects declared as INTEGER(kind=1), INTEGER(kind=2), LOGICAL(kind=1), or LOGICAL(kind=2) are 32 bits long.

Note: Vectorization of 8- and 16-bit objects is deferred.

Default: enabled
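One way to observe the effect of this option is to print the storage size of the small kinds. The following is a minimal sketch using the standard STORAGE_SIZE intrinsic; the program and variable names are hypothetical.

```fortran
program small_kind_sizes
  implicit none
  integer(kind=1) :: i1
  integer(kind=2) :: i2
  logical(kind=1) :: l1
  logical(kind=2) :: l2

  ! STORAGE_SIZE returns the size of its argument in bits.
  ! Per the description above, the sizes are expected to be 8 and 16
  ! bits with -eh (the default) and 32 bits with -dh.
  print *, 'integer(kind=1) bits:', storage_size(i1)
  print *, 'integer(kind=2) bits:', storage_size(i2)
  print *, 'logical(kind=1) bits:', storage_size(l1)
  print *, 'logical(kind=2) bits:', storage_size(l2)
end program small_kind_sizes
```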

i

Initialize all undefined local stack, static, and heap variables of type REAL or COMPLEX to an invalid value (signaling NaN). Stack variables are initialized upon each execution of the procedure. Objects in common blocks will be initialized if the common block is declared within a BLOCKDATA program unit compiled with this option.

Default: disabled

I

Treat all variables as if an IMPLICIT NONE statement had been specified. Do not override any IMPLICIT statements or explicit type statements. All variables must be typed.

Default: disabled
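The following minimal sketch shows the kind of mistake this option is meant to catch. The program is hypothetical and deliberately omits IMPLICIT NONE.

```fortran
program implicit_typo
  ! No IMPLICIT NONE here on purpose.
  integer :: count
  count = 10
  ! Typo for "count": under default implicit typing, coutn silently
  ! becomes a new REAL variable and count is left unchanged.
  coutn = count + 1
  print *, 'count =', count
end program implicit_typo
```

Compiled without the option, the program is accepted and the typo goes unnoticed. With -eI, the compiler is expected to reject the assignment because coutn has no explicit type.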

j

Execute DO loops at least once.

Default: disabled
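A minimal sketch of a loop whose behavior depends on this option follows; the names are hypothetical. Under the Fortran standard, a DO loop whose iteration count is zero does not execute its body.

```fortran
program one_trip
  implicit none
  integer :: i, n, trips

  n = 0
  trips = 0
  ! The iteration count of this loop is zero, so the standard
  ! requires the body to be skipped.
  do i = 1, n
    trips = trips + 1
  end do

  ! Standard behavior (-dj, the default): trips is 0.
  ! With -ej, the body is expected to run once, so trips would be 1.
  print *, 'trips =', trips
end program one_trip
```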

K

Allow character literal continuation in free source form without the leading & (ampersand character) on the continuing line. This is an extension to the Fortran standard.

Default: enabled

m

When this option is enabled, the compiler creates MODULENAME.mod files to hold module information for future compiles. The compiler writes a MODULENAME.mod file for each module; MODULENAME is created by taking the name of the module and, if necessary, converting it to uppercase.

The -ef option is a modifier to the -em option. If -ef is specified, the output goes to modulename.mod rather than MODULENAME.mod.

Default: enabled

n

Generate messages to note nonstandard Fortran usage.

N

-eN is the same as -en, except that the messages are ERROR level.

If -dN is specified, ANSI messages are not generated. This is the same as -dn.

If multiple -en, -dn, -eN, or -dN options are specified, the last one encountered takes precedence.

Default: -dN

o

Display to stderr the optimization options the compiler used for this compilation. This is the same as specifying -h display_opt.

Default: disabled

p

Enable or disable double-precision arithmetic. This option can be used only when the default data size is 64 bits (-s default64 or -s real64). Double-precision arithmetic is disabled by default.

When the -s default64 or -s real64 option is used and double-precision arithmetic disabled, variables declared on a DOUBLE PRECISION statement and constants specified with the D exponent are converted to default real type (64-bit). If double precision is enabled (-ep), they are handled as a double-precision type (128-bit).

Similarly, when the -s default64 or -s real64 option is used, variables declared on a DOUBLE COMPLEX statement and complex constants specified with the D exponent are mapped to the complex type in which each part has a default real type, so the complex variable is 128-bit. If double precision is enabled (-ep), each part has double-precision type, so the double complex variable is 256-bit.

Default: disabled
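The size mappings described above can be checked with a small program such as the following sketch. The program and variable names are hypothetical; the sizes quoted in the comments are those stated in this section, not measured results.

```fortran
program dp_sizes
  implicit none
  real                           :: r
  double precision               :: d
  complex                        :: c
  complex(kind=kind(1.0d0))      :: dc   ! standard spelling of a double complex

  ! With -s real64 and double precision disabled (-dp), this section
  ! states that d maps to the 64-bit default real and dc to 128 bits.
  ! With -s real64 -ep, d is stated to be 128 bits and dc 256 bits.
  print *, 'real             bits:', storage_size(r)
  print *, 'double precision bits:', storage_size(d)
  print *, 'complex          bits:', storage_size(c)
  print *, 'double complex   bits:', storage_size(dc)
end program dp_sizes
```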

P

Perform source preprocessing on Fortran source files but do not compile. When specified, source code is included by #include directives but not by Fortran INCLUDE lines. Generates file.i, which contains the source code after the preprocessing has been performed and the effects applied to the source program. If the -o out_file argument is also specified, the preprocessed source is written to out_file instead of file.i.

Default: disabled

q

Abort compilation if 100 or more errors are generated.

Default: enabled

Q

Control whether or not the compiler accepts variable names that begin with a leading underscore (_) character. For example, when -e Q is specified, the compiler accepts _ANT as a variable name. Enabling this option can cause collisions with system name space; for example, library entry point names.

Default: disabled

R

Compile all functions and subroutines as if they contained a RECURSIVE keyword.

Default: enabled

s

Scale the values of the count and count_rate arguments for the SYSTEM_CLOCK intrinsic subroutine down by a factor of 2**14 (16384) if the storage size of the values of each of the count and count_rate arguments is 32 bits.

Default: enabled
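Because the scaling applies only when the arguments are 32 bits wide, one way to avoid depending on this option is to pass 64-bit arguments. The following is a minimal sketch of that approach; the program and variable names are hypothetical.

```fortran
program clock_demo
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none
  integer(int64) :: t0, t1, rate
  integer :: i
  real :: x

  ! 64-bit count and count_rate arguments: the 32-bit scaling
  ! described above does not apply to these calls.
  call system_clock(count=t0, count_rate=rate)

  ! Some work to time.
  x = 0.0
  do i = 1, 1000000
    x = x + sqrt(real(i))
  end do

  call system_clock(count=t1)

  print *, 'checksum:', x
  print *, 'elapsed seconds:', real(t1 - t0) / real(rate)
end program clock_demo
```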

S

Generate assembly language output and save it in file.s.

Default: disabled

T

Control preprocessing of Fortran source files. When enabled, source preprocessing is performed. Macro expansion within Fortran source lines is enabled but can be controlled by the -e/d F command line option. When disabled (-dT), preprocessing of the Fortran source file is not performed, even for files with upper case suffixes such as file.F90.

Default: When not specified, the default is to honor the case of the file name suffix, and other preprocessing options such as -e/d Z, -e/d P, and -e/d F.

v

Allocate variables to static storage. These variables are treated as if they had appeared in a SAVE statement. Variables that are explicitly or implicitly defined as automatic variables are not allocated to static storage.

The following types of variables are not allocated to static storage: automatic variables (explicitly or implicitly stated), variables declared with the AUTOMATIC attribute, variables allocated in an ALLOCATE statement, and local variables in explicit recursive procedures. Variables with the ALLOCATABLE attribute remain allocated upon procedure exit, unless explicitly deallocated, but they are not allocated in static memory. Variables in explicit recursive procedures consist of those in functions, in subroutines, and in internal procedures within functions and subroutines that have been defined with the RECURSIVE attribute. The STACK compiler directive overrides this option.

Default: disabled

w

Enable support for automatic memory allocation for allocatable variables and arrays that are on the left-hand side of intrinsic assignment statements.

Using this option may degrade runtime performance, even when automatic memory allocation is not needed. It can affect optimizations for a code region containing an assignment to allocatable variables or arrays; for example, by preventing loop fusion for multiple array syntax assignment statements with the same shape.

Default: enabled
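A minimal sketch of the assignments this option governs follows; the names are hypothetical. With the option enabled, the left-hand side is allocated or reallocated to match the shape of the right-hand side, as the Fortran 2003 and later standards specify.

```fortran
program realloc_lhs
  implicit none
  integer, allocatable :: a(:)

  ! a is unallocated here; the assignment allocates it with 3 elements.
  a = [1, 2, 3]
  print *, 'size after first assignment: ', size(a)

  ! The shapes differ, so a is reallocated with 5 elements.
  a = [10, 20, 30, 40, 50]
  print *, 'size after second assignment:', size(a)
end program realloc_lhs
```

With -dw, code like this would need explicit ALLOCATE and DEALLOCATE statements before each assignment, which is the trade-off for the optimization opportunities mentioned above.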

X

If a module variable has initializers, implicit or explicit, and the variable has greater than 10,000 elements to be initialized, optionally create a new module procedure to do the initialization at runtime before MAIN is called. Enabling this option may significantly reduce compile time and reduce the size of the executable for some code, while increasing execution time. If performance is the only issue, disable this option.

Default: enabled

x

Cause the SYSTEM_CLOCK intrinsic to use clock_gettime().

Default: disabled

z

Initialize all memory allocated by Fortran ALLOCATE statements to zero. This option applies only for the current source file and should be specified for each source file compilation where this behavior is desired.

Default: disabled

Z

Perform source preprocessing and compilation on Fortran source files. When specified, source code is included by both #include directives and Fortran INCLUDE lines. Generates file.i, which contains the source code after the preprocessing has been performed and the effects applied to the source program.

Default: disabled

  • -D identifier[=value]

    Defines variables used for source preprocessing as if they had been defined by a #define source preprocessing directive. If a value is specified, there can be no spaces on either side of the equal sign. If no value is specified, the default value is 1.

    Compare to the -U identifier option.
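    As a minimal sketch of how a -D definition reaches the source, consider the following file. The file name build_info.F90 and the macro name USE_DEBUG are hypothetical. The upper-case suffix is what enables preprocessing by default.

```fortran
! build_info.F90 (hypothetical file name)
!
! Possible invocations, using only options described in this section:
!   ftn build_info.F90                 USE_DEBUG undefined
!   ftn -D USE_DEBUG build_info.F90    USE_DEBUG defined with value 1
!   ftn -D USE_DEBUG=2 build_info.F90  USE_DEBUG defined with value 2
program build_info
  implicit none
#ifdef USE_DEBUG
  ! Macro expansion in Fortran source lines replaces USE_DEBUG below
  ! with its value (see the -e/d F option).
  print *, 'debug build, level =', USE_DEBUG
#else
  print *, 'release build'
#endif
end program build_info
```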

  • -E

    Performs source preprocessing on Fortran source files, but does not compile. When specified, source code is included by #include directives but not by Fortran INCLUDE lines. The preprocessed source code is sent to stdout. This option overrides other preprocessing control, -e/dP and -e/dZ.

  • -f source_form

    Specifies whether the Fortran source file is written in fixed source form or free source form. For source_form, enter free or fixed.

    The default is fixed for source files that have .f, .F, .for, or .FOR extensions. The default is free for source files that have .f90, .F90, .f95, .F95, .f03, .F03, .f08, .F08, .f18, .F18, .ftn, or .FTN extensions.

    The upper-case file extensions, .F, .FOR, .F90, .F95, .F03, .F08, .F18, or .FTN, will enable source preprocessing by default.

  • -f backslash

    Changes the interpretation of backslashes in character literals from a single backslash character to C-style escape characters. The combinations \a, \b, \f, \n, \r, \t, \v, \\, and \0 are expanded to the ASCII characters alert, backspace, form feed, newline, carriage return, horizontal tab, vertical tab, backslash, and NUL, respectively.
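    The difference is easiest to see in the length of a literal. The following is a minimal sketch; the program name is hypothetical.

```fortran
program backslash_demo
  implicit none
  character(len=*), parameter :: s = 'one\ntwo'

  ! Without the option, the backslash is an ordinary character,
  ! so the literal has 8 characters: o n e \ n t w o.
  ! With the option, \n is expected to become a single newline
  ! character, giving a length of 7.
  print *, 'length =', len(s)
  print *, s
end program backslash_demo
```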

  • -f cray-program-library-path=program_library

    Create and use a persistent repository of compiler information specified by program_library. When used with -h wp, this option provides application-wide, cross-file, automatic inlining. This option is an alias to the -h pl=program_library compiler option.

  • -f [no-]openmp

    Enable or disable compiler recognition of OpenMP directives. Using -f no-openmp is similar to the -h thread0 option, in that it disables OpenMP, but unlike -h thread0, -f no-openmp does not affect autothreading. This option is an alias to -h [no]omp. The HPE Cray Programming Environment will link in the serial version of LibSci when -f no-openmp is used.

    Default: -f no-openmp (-h noomp)
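    A minimal sketch of a loop affected by this option follows; the names are hypothetical. When OpenMP recognition is disabled, the !$omp lines are ordinary comments and the loop runs serially, so the same source is valid either way.

```fortran
program omp_sum
  implicit none
  integer, parameter :: n = 1000
  integer :: i
  real :: total

  total = 0.0
  ! Recognized only when OpenMP is enabled (-f openmp or -h omp).
  !$omp parallel do reduction(+:total)
  do i = 1, n
    total = total + real(i)
  end do
  !$omp end parallel do

  ! The sum 1 + 2 + ... + 1000 is 500500 in both builds.
  print *, 'total =', total
end program omp_sum
```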

  • -f [no-]openmp-simd

    Enable or disable compiler recognition of OpenMP SIMD directives. This option may be enabled (-h omp_simd or -fopenmp-simd) when general OpenMP is disabled (-h noomp or -fno-openmp), allowing the compiler to take advantage of omp simd constructs for CPU vectorization without enabling CPU threading for omp parallel constructs. This option may not be disabled (-h noomp_simd or -fno-openmp-simd) when general OpenMP is enabled (-h omp or -fopenmp). Specifying -O0 with OpenMP disabled (-h noomp or -fno-openmp) will disable OpenMP SIMD recognition (-h noomp_simd or -fno-openmp-simd). This option is an alias to -h [no]omp_simd.

    Default: -f openmp-simd

  • -f denormal-fp-math=ieee

    When applied on the link step, begin execution with Gradual Underflow for denormals.

  • -f denormal-fp-math=preserve-sign

    When applied on the link step, begin execution with Abrupt Underflow (aka flush-to-zero) for denormals. This is default behavior for Fortran on CPUs.

  • -f pic, -f PIC

    Generates position independent code (PIC), which allows a virtual address change from one process to another, as is necessary in the case of shared, dynamically linked objects. The virtual addresses of the instructions and data in PIC code are not known until dynamic link time. These are aliases to -h pic and -h PIC.

  • -f sanitize=check

    Controls whether the compiler adds runtime checks for various forms of undefined or suspicious behavior. This feature is currently experimental and is disabled by default. If a check fails, a diagnostic message is produced at runtime explaining the problem.

    The following checks are currently supported:

    • -fsanitize=address: Enables AddressSanitizer, a memory error detector. Note: COMMON variables are not currently supported.

    • -fsanitize=thread: Enables ThreadSanitizer, a data race detector.

    Note:

    • AddressSanitizer and ThreadSanitizer cannot be used simultaneously.

    • Lower optimization levels are more likely to produce accurate sanitizer reports.

  • -F

    Macro expansion in Fortran source lines is now enabled by default whenever preprocessing is enabled. See the -e/d F option. The -F option is obsolete and supported for compatibility with legacy makefiles.

  • -g

    When specified with no optimization options or with -O0, provides debugging support identical to specifying the -G0 option. If any optimization option is specified, -g is ignored.

    Default: off

  • -G debug_lvl

    Controls the tradeoffs between ease of debugging and compiler optimizations. The compiler produces some level of internal debugger information (DWARF) at all times. This DWARF data provides function and source line information to debuggers for tracebacks and breakpoints, as well as type and location information about data variables.

    Note that the -g or -G options can be specified on a per-file basis, so that only part of an application pays the price for improved debugging.

    The -G debug_lvl arguments are as follows:

debug_lvl

Support

0

Full DWARF information is available for debugging, but at the cost of a slower and larger executable. Breakpoints can be set at each line. Most optimizations are disabled including floating point optimizations. This level of debugging implies -h ipa0, -h scalar0, -h thread0, -h vector0 and -h fp0. Pattern, fusion and unrolling are all off.

1

Most DWARF information is available with partial optimization. Some optimizations make tracebacks and limited breakpoints available in the debugger. Some scalar optimizations and all loop nest restructuring are disabled, but the source code will be visible and most symbols will be available. This allows block-by-block debugging, with the exception of innermost loops. The executable will be faster than with -g or -G0.

2

Partial DWARF information. Most optimizations, tracebacks and very limited breakpoints are available in the debugger. The source code will be visible and some symbols will be available. This level allows post-mortem debugging, but local information such as the value of a loop index variable is not necessarily reliable at this level because such information often is carried in registers in optimized code. The executable will be faster and smaller than with -G1.

  • -h arg

    The -h arg options enable you to access various compiler functions. Some of these options duplicate -O arg options.

  • -h [no]acc

    Enables or disables the compiler recognition of OpenACC accelerator directives. See the intro_openacc(7) man page.

    Default: noacc
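    A minimal sketch of a loop that this option affects follows; the names are hypothetical. With the default noacc, the !$acc line is an ordinary comment and the loop runs on the host.

```fortran
program acc_scale
  implicit none
  integer, parameter :: n = 1024
  real :: v(n)
  integer :: i

  v = 1.0
  ! Recognized only when OpenACC is enabled (-h acc). The copy clause
  ! moves v to the accelerator before the loop and back afterwards.
  !$acc parallel loop copy(v)
  do i = 1, n
    v(i) = 2.0 * v(i)
  end do

  ! Every element is doubled, so the sum is 2048 in both builds.
  print *, 'sum =', sum(v)
end program acc_scale
```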

  • -h acc_model=option[:option]...

    Explicitly control the execution and memory model utilized by the accelerator support system. The option arguments identify the type of behavior desired. There are three option sets. Only one member of a set may be used at a time; however, all three sets may be used together.

    Default: auto_async_kernel:fast_addr:no_deep_copy

    The option Set 1 is as follows:

    option

    Description

    auto_async_none

    Execute kernels and updates synchronously, unless there is an async clause present on the kernels or update directive.

    auto_async_kernel

    Default. Execute all kernels asynchronously ensuring program order is maintained.

    auto_async_all

    Execute all kernels and data transfers asynchronously, ensuring program order is maintained.

    The option Set 2 is as follows:

    option

    Description

    no_fast_addr

    Use default types for addressing.

    fast_addr

    Default. Attempt to use 32-bit integers in all addressing to improve performance. Base addresses remain 64-bit. Performance is improved by potentially using fewer registers and faster arithmetic for offset calculations. This optimization may result in incorrect behavior for codes that, within accelerator regions, make use of any of the following: very large arrays (offsets would require more than 32 bits); very large array lower bounds (maximum offset plus lower bound requires more than 32 bits); bitfields or other bit operations.

    The option Set 3 is as follows:

    no_deep_copy : Default. Do not look inside of an object type to transfer sub-objects. Allocatable members of derived type objects will not be allocated on the device.

    deep_copy : (Fortran only) Look inside of derived type objects and recreate the derived type on the accelerator recursively. A derived type object that contains an allocatable member will have memory allocated on the device for the member.

  • -h [no]add_paren

    Automatically add parentheses to select associative operations (+,-,*) to encourage left to right evaluation of floating point and complex expressions. Left to right evaluation is not required by the language standards but some applications may expect it.

    Default: noadd_paren
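
    The effect of grouping can be sketched with a short program. This is an illustration only: the values are chosen for the example, and explicit parentheses stand in for the two evaluation orders a compiler might otherwise choose between.

    ```fortran
    program add_paren_demo
      implicit none
      real :: a, b, c
      a = 1.0e20
      b = -1.0e20
      c = 1.0
      ! Left-to-right grouping: a and b cancel first, so c survives.
      print *, '(a + b) + c =', (a + b) + c
      ! Right-to-left grouping: c is absorbed into b before a is added.
      print *, 'a + (b + c) =', a + (b + c)
    end program add_paren_demo
    ```

    With IEEE single precision arithmetic, the first expression evaluates to 1.0 and the second to 0.0, because adding 1.0 to -1.0e20 does not change it.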

  • -h [no]aggress

    Cause the compiler to treat a subroutine, function, or main program as a single optimization region. Doing so can improve the optimization of large program units but also increases compile time and size.

    Default: noaggress

  • -h [no]align_arrays

    Enable or disable padding of arrays in static data. Some statically allocated arrays are aligned and padded for better cache behavior. Common block data is not affected.

    Default: align_arrays

  • -h [no]autoprefetch

    Enable or disable automatic prefetch optimization. This option does not affect the loop_info [no]prefetch directive.

    Default: autoprefetch

  • -h [no]autothread

    Enable or disable autothreading.

    Default: noautothread

  • -h [no]bounds

    Enable or disable checking of array bounds. Bounds checking is not performed on arrays dimensioned as (1). Enables -h overindex. Equivalent to the -Rb option.

  • -h byteswapio

    Force byte-swapping of all input and output files for direct and sequential unformatted I/O. Byteswapio is implemented during the linker phase so that it can be uniformly applied across the entire executable. This is a link-time option.

  • -h cachen

    Specify the level of automatic cache management to be performed, where n is a value from 0 to 3 with 0 being no cache management and 3 being the most aggressive. Note that cache blocking is controlled by the cblock level, a separate option.

    Default: cache0

  • -h cblockn

    Specify cache blocking policy, where n is a level from 0 to 3.

    0 : No cache blocking is performed. Directives are ignored.

    1 : Only block according to directives

    2 : Honor directives and block for largest private cache

    3 : Honor directives and block for largest cache (prorated for core sharing)

    Default: cblock0

  • -h [no]caf

    Enable the compiler to recognize coarray syntax. The macro _CRAY_COARRAY will be defined as 1 if -hcaf is specified on the command line.

    -hnocaf is required for Fortran code that will be linked with C++ code.

    PGAS behavior is determined by the number of physical cores on the node. For more information, see the intro_pgas(7) man page.

    Default: caf
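
    A minimal coarray program of the kind this option enables is sketched below. The program and variable names are illustrative, and how multiple images are launched depends on the system's job launcher.

    ```fortran
    program caf_demo
      implicit none
      integer :: counter[*]          ! one copy of counter on every image
      counter = this_image()
      sync all                       ! make every image's store visible
      if (this_image() == 1) then
        ! Image 1 reads the copy of counter held by the last image.
        print *, 'images:', num_images(), ' last image holds', counter[num_images()]
      end if
    end program caf_demo
    ```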

  • -h [no]concurrent

    Equivalent to adding a concurrent directive before every loop in the file, including loops created from array syntax. This option may provide significant performance improvements for some codes. The user must ensure that all loops are clearly parallel; private arrays, ambiguous reductions and other special forms may not yield valid parallel code in this mode. -hnoconcurrent honors existing concurrent directives. The default is -hnoconcurrent.
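
    The difference between a loop that is safe to treat as concurrent and one that is not can be sketched as follows. The loops are illustrative and are not taken from any particular application.

    ```fortran
    program concurrent_demo
      implicit none
      integer, parameter :: n = 8
      real :: a(n), b(n)
      integer :: i
      b = 1.0
      ! Independent iterations: safe to treat as concurrent.
      do i = 1, n
        a(i) = 2.0 * b(i)
      end do
      ! Loop-carried dependence: iteration i needs the a(i-1) written by
      ! the previous iteration, so this loop is not safely concurrent.
      do i = 2, n
        a(i) = a(i-1) + b(i)
      end do
      print *, a
    end program concurrent_demo
    ```

    A file containing loops like the second one is a poor candidate for -hconcurrent, since the option applies to every loop in the file.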

  • -h [no]contiguous

    Declare that every assumed shape array and array pointer target is contiguous, whether or not it has the CONTIGUOUS keyword, potentially increasing the range of permitted compiler optimizations. By default, the compiler does not assume that all array pointers are associated with contiguous targets or that all assumed shape arrays are contiguous, and there is no way to verify this at compile time.

    Use this option with caution. This additional level of compiler optimization is safe when the memory objects occupy contiguous blocks of memory. If there is potential for hidden dependencies between the memory locations to which the pointers are referring, do not use this option.

    Default: nocontiguous
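
    The following sketch shows one way a pointer can become associated with a non-contiguous target, which is the situation in which -h contiguous would be unsafe. It uses the standard IS_CONTIGUOUS intrinsic (Fortran 2008); the array and pointer names are illustrative.

    ```fortran
    program contiguous_demo
      implicit none
      real, target :: a(10)
      real, pointer :: p(:), q(:)
      integer :: i
      a = [(real(i), i = 1, 10)]
      p => a(1:10)      ! unit stride: contiguous target
      q => a(1:10:2)    ! stride 2: elements are not adjacent in memory
      print *, 'p contiguous? ', is_contiguous(p)
      print *, 'q contiguous? ', is_contiguous(q)
    end program contiguous_demo
    ```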

  • -h [no]contiguous_assumed_shape

    If contiguous_assumed_shape is specified, all assumed-shape dummy arguments are implicitly marked with the CONTIGUOUS attribute.

    Default: nocontiguous_assumed_shape

  • -h cpu=target_system

    Specify the target system on which the absolute binary file is to be executed, where target_system can be:

    • ivybridge

    • sandybridge

    • haswell

    • broadwell

    • mic-knl

    • x86-skylake

    • x86-cascadelake

    • x86-naples

    • arm-thunderx2

    If target_system is set during compilation of any source file, it must be set to the same target during linking and loading.

    Rather than setting this option directly, users should load one of the targeting modules (for example, craype-sandybridge). The targeting modules set CRAY_CPU_TARGET and define paths to the corresponding libraries. The compiler driver script translates CRAY_CPU_TARGET to the corresponding -h cpu=target_system option when calling the compiler.

    If a user wishes to override the current target_system value set by the module environment (via the CRAY_CPU_TARGET definition), they should do so by specifying -hcpu=target_system on the compiler command line.

  • -h craylibs_arch_override

    Forces the Cray math library to honor the processor architecture specified by the -h cpu option. Processor architecture is typically specified by loading one of the targeting modules, e.g., craype-sandybridge, but can be overridden at link time by using the -h cpu option.

    If the CRAYLIBS_ARCH_OVERRIDE environment variable is defined, it takes precedence over this option.

  • -h develop

    Reduce compile time at the expense of optimization, by omitting or scaling back optimizations that are known to increase compile time. This option is intended to be used when a program is under development and being recompiled frequently, and is different from and independent of the -O options. Consider using this option when using the -O0 or -O1 options results in a longer compile time, or when code compiled with the -O0 or -O1 options runs so slowly as to negate whatever time savings were gained by faster compilation.

    Default: off

  • -h dir_check

    Enable a run time check for the !dir$ collapse directive and check the validity of the loop_info count information. Equivalent to the -Rd option.

  • -h display_opt

    Display the compiler optimization settings currently in force. This option is identical to the -eo option.

  • -h dynamic

    Directs the compiler driver to create dynamically linked executable files and link dynamic libraries at runtime. Note that the preferred invocation is to call the generic ftn command with the -dynamic option, rather than using this compiler-specific option. Compare to the -h shared and -h static options and the CRAYPE_LINK_TYPE environment variable, and see the ftn(1) man page for more information.

  • -h error_on_warning

    If set, change the message level of all warning messages to error.

    Default: not set

  • -h find_dirs

    Issue warning messages for all unsupported Intel (!DIR$), Fujitsu (!OCL), PGI (!PGI$), GCC (!GCC$), and DEC (!DEC$) directives.

    Default: not set

  • -h flex_mp=level

    Control the aggressiveness of optimizations that may affect floating point and complex repeatability, for applications that require identical results as the number of ranks or threads varies. The valid values for level are:

    intolerant : Has the highest probability of repeatable results, but also the highest performance penalty.

    rigorous : Maintains the bit-reproducibility of intolerant but provides most of the performance benefits of strict.

    strict : Uses some safe optimizations and yields higher performance than intolerant, with a high probability of repeatable results.

    conservative : Uses more aggressive optimization and yields higher performance than strict, but results may not be sufficiently repeatable for some applications.

    default : Uses more aggressive optimization and yields higher performance than conservative, but results may not be sufficiently repeatable for some applications.

    tolerant : Uses most aggressive optimization and yields highest performance, but results may not be sufficiently repeatable for some applications.

    Default: default

  • -h [no]fma

    Enable or disable the generation of fused multiply add (FMA) instructions, if supported on the target hardware. FMA instructions are enabled by default at -hfp levels of 1 or higher, but disabled by default at -hfp0. This option can be used for debugging a numerically sensitive application. The use of FMAs is generally better for performance, but introduces different, although not necessarily incorrect, rounding. This option affects only compiler-generated FMA opportunities and does not affect pre-built libraries.

    Default: fma (enabled, except at -hfp0)
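
    The rounding difference can be illustrated with a small program. This is a sketch under the assumption of IEEE 64-bit arithmetic; the value of x is chosen so that the exact square has one bit that is lost when the product is rounded. Whether the compiler actually fuses the expression, or folds it at compile time, depends on the target and the options in effect, so no particular output is guaranteed.

    ```fortran
    program fma_demo
      use, intrinsic :: iso_fortran_env, only: real64
      implicit none
      real(real64) :: x, p, r
      x = 1.0_real64 + 2.0_real64**(-27)
      ! The exact square is 1 + 2**(-26) + 2**(-54); the last term is lost
      ! when the product is rounded to 64-bit precision.
      p = x * x
      ! Without fusion, x*x is rounded before the subtraction and r is 0.
      ! With a fused multiply-add, the unrounded product is used and r is
      ! 2**(-54).
      r = x * x - p
      print *, 'x*x - p =', r
    end program fma_demo
    ```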

  • -h [no]fortran_ptr_alias

    The nofortran_ptr_alias option indicates that storage accessed through a Fortran POINTER is accessible only from that POINTER. The compiler is free to assume no overlap with other POINTER-based storage or variables with the TARGET attribute. This is a very strong assertion. When applicable, it permits very aggressive optimization.

    Default: -h fortran_ptr_alias

  • -h [no]fortran_ptr_overlap

    The nofortran_ptr_overlap option indicates that storage accessed through one Fortran POINTER does not overlap with storage accessed through any other Fortran POINTER, while overlap with non-POINTER variables with the TARGET attribute is allowed.
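
    Code of the following shape has overlapping POINTER storage, so neither the nofortran_ptr_overlap nor the nofortran_ptr_alias assertion would hold for it. The program is an illustrative sketch; the names are not from any real application.

    ```fortran
    program ptr_overlap_demo
      implicit none
      real, target :: a(8)
      real, pointer :: p(:), q(:)
      integer :: i
      a = [(real(i), i = 1, 8)]
      p => a(1:6)
      q => a(3:8)       ! shares a(3:6) with p
      ! p(i+2) and q(i) are the same element, so iterations 3 and 4 read
      ! through p the values that iterations 1 and 2 stored through q.
      do i = 1, 4
        q(i) = p(i) + 1.0
      end do
      print *, a
    end program ptr_overlap_demo
    ```

    If the compiler were told that p and q cannot overlap, it would be free to reorder or vectorize the loads and stores in a way that changes the result of this loop.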

  • -h fpn[=[no]approx]

    Controls the level of floating point optimizations, where n is a value between 0 and 4, with 0 giving the compiler minimum freedom to optimize floating point operations and 4 giving it maximum freedom. The higher the level, the less the floating point values conform to the IEEE standard. Use -h fp4 only if your application uses algorithms which are tolerant of reduced precision. Do not use -h fp4 for codes that use Boost I/O or for any codes that do “roll your own” I/O.

    fp0 may provide well defined values from some intrinsic operations where the Fortran language standard does not specify behavior. For examples, see CEILING and FLOOR.

    Default: fp2

  • -h [no]fp_trap

    Control whether the compiler generates code compatible with floating point traps being enabled.

    Default: fp_trap if traps are enabled using the -K trap option or if -Ofp[0,1] is in effect. Otherwise, the default is nofp_trap.

  • -h [no]func_trace

    For use only with CrayPat (an HPE performance analysis tool). If this option is specified, the compiler inserts CrayPat entry points into each function in the compiled source file. The names of the entry points are __pat_tp_func_entry and __pat_tp_func_return.

    These are resolved by CrayPat when the program is instrumented using the pat_build command. When the instrumented program is executed and it encounters either of these entry points, CrayPat captures the address of the current function and its return address.

    Default: nofunc_trace

  • -h fusionn

    Control loop fusion globally and change the assertiveness of the FUSION directive. Loop fusion can improve the performance of loops, although in some rare cases it may degrade overall performance.

    The n argument enables you to turn loop fusion on or off and determine where fusion should occur. It also affects the assertiveness of the FUSION directive. The valid values for n are:

    0 : No fusion (ignore all FUSION directives and do not attempt to fuse other loops)

    1 : Attempt to fuse loops that are marked by the FUSION directive.

    2 : Attempt to fuse all loops (includes array syntax implied loops), except those marked with the NOFUSION directive.

    Default: fusion2
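
    As an illustration of what a fusion candidate looks like, the sketch below contains two adjacent loops over the same iteration space. The program is hypothetical; whether the compiler actually fuses such loops depends on the fusion level and its own analysis.

    ```fortran
    program fusion_demo
      implicit none
      integer, parameter :: n = 1000
      real :: a(n), b(n), c(n)
      integer :: i
      a = 1.0
      ! First loop: produces b from a.
      do i = 1, n
        b(i) = a(i) + 1.0
      end do
      ! Second loop: same bounds, consumes the b(i) just produced.
      ! A fused loop could compute both statements in one pass and
      ! reuse b(i) while it is still in a register.
      do i = 1, n
        c(i) = 2.0 * b(i)
      end do
      print *, b(n), c(n)
    end program fusion_demo
    ```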

  • -h gasp[=opt[:opt]]

    Request GASP (Global Address Space Performance Analysis) instrumentation. With no options specified, remote data accesses are profiled. When opt is specified, the compiler provides additional instrumentation as follows.

    local : Enables instrumentation of events generated by shared local accesses. Instrumenting these events can add runtime overhead to the application.

    functions : Enables function instrumentation. Sets -hipa0.

  • -h [no]heap_allocate

    -h heap_allocate forces all variable-size local arrays and temporary arrays to be allocated on the heap. !dir$ heap_allocate directives are ignored.

    -h noheap_allocate places variable-size local arrays and temporary arrays on the stack, except where the !dir$ heap_allocate directive applies.

    Default: noheap_allocate
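
    The kinds of objects this option affects can be sketched as follows. The function below is illustrative: work is a variable-size local (automatic) array, and the array expression passed to SUM may require a compiler temporary.

    ```fortran
    program heap_allocate_demo
      implicit none
      print *, scaled_sum(1000)
    contains
      function scaled_sum(n) result(s)
        integer, intent(in) :: n
        real :: s
        real :: work(n)        ! automatic array: size known only at run time
        integer :: i
        work = [(real(i), i = 1, n)]
        ! The expression 2.0 * work may also need a compiler temporary.
        s = sum(2.0 * work)
      end function scaled_sum
    end program heap_allocate_demo
    ```

    With large values of n, stack placement of such arrays can exceed the stack limit, which is a common reason to consider heap allocation.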

  • -h ignore_unknown_dirs

    Suppress generation of warning messages when compiler encounters an unknown directive.

    Default: not set

  • -h ipalevel

    Specifies the level of interprocedural optimization (IPA). level may be one of the following values.

    0 : All inlining/cloning disabled. All inlining and cloning compiler directives are ignored.

    1 : Inlining/cloning is attempted for call sites and routines that are under the control of a compiler directive.

    2 : Includes level 1. Inline a call site to an arbitrary depth as long as the expansion does not exceed some compiler-determined threshold. The call site must flatten for any expansion to occur. The call site is said to “flatten” when there are no calls present in the expanded code. The call site must reside within the body of a loop and the entire loop body must flatten. A loop body is said to “flatten” when all call sites within the body of the loop are flattened.

    3 : (Default) Includes levels 1 and 2. Inline call sites that contain constant actual argument(s). Additionally, any call site (regardless of location) that is below some small compiler-determined threshold will inline, provided that the call site flattens. If a routine does not inline, the compiler may clone said routine if there exists a performance benefit.

    4 : Includes levels 1, 2, and 3. Additionally, a call site does not have to reside in a loop body to inline, nor does the call site necessarily have to flatten.

    5 : Includes levels 1, 2, 3, and 4. Thresholds are raised and may allow for additional inlining/cloning that was not achieved at level 4.

  • -h nointerchange

    Inhibits the compiler’s attempts to interchange loops. Interchanging loops by having the compiler replace an inner loop with an outer loop can increase performance. The compiler performs this optimization by default.

    Specifying the -h nointerchange option is equivalent to specifying a NOINTERCHANGE directive prior to every loop. To disable loop interchange on individual loops, use the NOINTERCHANGE directive.
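
    A loop nest of the kind that benefits from interchange is sketched below. The example is illustrative; it relies only on the fact that Fortran arrays are stored in column-major order.

    ```fortran
    program interchange_demo
      implicit none
      integer, parameter :: n = 512
      real, allocatable :: a(:,:)
      integer :: i, j
      allocate(a(n, n))
      ! As written, the inner loop runs over the second subscript, so
      ! consecutive iterations touch elements n apart in memory.
      ! Swapping the i and j loops would make the accesses unit stride.
      do i = 1, n
        do j = 1, n
          a(i, j) = real(i + j)
        end do
      end do
      print *, a(1, 1), a(n, n)
    end program interchange_demo
    ```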

  • -h keepfiles

    The -h keepfiles option prevents the removal of the object (.o) and temporary assembly (.s) files after an executable is created. Normally the compiler automatically removes these files after linking them to create an executable. Because the original object files are required in order to instrument a program for performance analysis, use this option to preserve the object files if you plan to use CrayPat to conduct performance analysis experiments.

    Default: not set

  • -h keep_frame_pointer

    Retain call stack information back to main entry point for CrayPat performance sampling.

    Default: not set

  • -h list=a|c|d|e|E|i|l|m|o|s|T|x

    Produce a listing file. The valid arguments are:

    a : Include all reports in the listing (including source, cross references, options, lint, loopmarks, common block, and options used during compilation).

    c : Listing includes a COMMON block report (lists all common blocks and members of each block).

    d : Decompiles (translates) the intermediate representation of the compiler into listings that resemble the format of the source code. You can use these files to examine the restructuring and optimization changes made by the compiler, which can lead to insights about changes you can make to your Fortran source to improve its performance. The compiler produces two decompilation listing files per source file specified on the command line, with these two extensions: .opt and .cg.

    e : Expands included files in the source listing. This option is off by default.

    E : Same as -h list=e for Fortran.

    i : Used with the -h list=m option to intersperse loop optimization messages within the loopmark listing. By default, the messages are placed at the bottom of the program unit.

    l : Lists source code and includes lint style checking. The listing includes the COMMON block report (see the -h list=c option for more information about the COMMON block report).

    m : Produces a source listing with loopmark information. To provide a more complete report, this option automatically enables the -h negmsgs option to show why loops were not optimized. If you do not require this information, use the -h nonegmsgs option on the same command line. Loopmark information will not be displayed if the -d B option has been specified.

    o : Show all options used by the compiler during compilation.

    s : Lists source code.

    T : Retains file.T after processing rather than deleting it. The file.T can be used to call ftnlx directly. For more information, see the ftnlx(1) man page.

    x : Produces a cross-reference listing.

  • -h loop_trips=[tiny|small|medium|large|huge]

    Specifies runtime loop trip counts for all loops in a compiled source file. This information is used to better tune optimizations to the runtime characteristics of the application.

  • -h map_long_double_to_real16

    Maps the C_LONG_DOUBLE KIND from the ISO_C_BINDING module to a 128-bit REAL. This allows for interoperability in what would otherwise be an incompatible 80-bit extended precision C format. The corresponding C files must be compiled with the -mlong-double-128 option.
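
    Which representation C_LONG_DOUBLE currently maps to can be checked with standard inquiry intrinsics. The sketch below assumes the processor supports C_LONG_DOUBLE (the kind value is negative otherwise, and the program will not compile); the program name is illustrative.

    ```fortran
    program long_double_demo
      use, intrinsic :: iso_c_binding, only: c_long_double
      implicit none
      real(c_long_double) :: x
      x = 1.0_c_long_double
      print *, 'kind value        =', c_long_double
      ! 64 significand bits indicate the 80-bit extended format;
      ! 113 indicate the IEEE 128-bit format.
      print *, 'significand bits  =', digits(x)
      print *, 'decimal precision =', precision(x)
    end program long_double_demo
    ```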

  • -h [no]modinline

    Directs the compiler to create templates for module procedures and store them in file.o, MODULENAME.mod, or modulename.mod. These templates are used by IPA to inline/clone a routine. A USE statement makes the templates available to IPA.

    Default: modinline

  • -h [no]msgs

    Controls whether messages describing optimizations performed are written to stderr. Similar information in a more-readable format can be obtained by using the -rm option instead.

    Default: nomsgs

  • -h [no]negmsgs

    Controls whether messages explaining why optimizations such as vectorization or inlining did not occur are written to stderr. The -h negmsgs option enables the -h msgs option. The -rm option enables the -h negmsgs option.

    Default: nonegmsgs

  • -h network=nic

    Specifies the target machine’s interconnection attribute. The supported values are gemini and aries.

  • -h [no]omp

    Enable or disable compiler recognition of OpenMP directives. Using -h noomp is similar to the -h thread0 option, in that it disables OpenMP, but unlike -h thread0, -h noomp does not affect autothreading. CrayPE will link in the serial version of LibSci when -hnoomp is used.

    -fopenmp is a synonym for -h omp.

    Default: noomp

  • -h [no]omp_simd

    Enable or disable compiler recognition of OpenMP SIMD directives. This option may be enabled (-h omp_simd or -fopenmp-simd) when general OpenMP is disabled (-h noomp or -fno-openmp), allowing the compiler to take advantage of omp simd constructs for CPU vectorization without enabling CPU threading for OMP parallel constructs. This option may not be disabled (-h noomp_simd or -fno-openmp-simd) when general OpenMP is enabled (-h omp or -fopenmp). Specifying -O0 with OpenMP disabled (-h noomp or -fno-openmp) will disable OpenMP SIMD recognition (-h noomp_simd or -fno-openmp-simd). This option is an alias to -f[no-]openmp-simd.

    Default: omp_simd

  • -h [no]omp_trace

    Enable or disable the insertion of CrayPat OpenMP tracing calls.

    Default: noomp_trace

  • -h [no]overindex

    Declares that there exists an array subscript applied to a multidimensional array such that the subscript exceeds the declared bounds of its dimension. Such a subscript still must result in an access to the same multidimensional array object.

    Default: nooverindex

  • -h page_align_allocate

    The -h page_align_allocate option directs the compiler to force allocations of arrays larger than the memory page size to be aligned on a page boundary. This option affects only the ALLOCATE statements of the current source file; therefore it must be specified for each source file where this behavior is desired. Using this option can improve DIRECTIO performance.

  • -h [no]pattern

    Enables pattern matching for library substitution. The pattern matching feature searches your code for specific code patterns and replaces them with calls to highly optimized routines.

    The -h pattern option is enabled only for optimization levels -O2, -h vector2 or higher; there is no way to force pattern matching for lower levels.

    Specifying -h nopattern disables pattern matching and causes the compiler to ignore the PATTERN and NOPATTERN directives.

    Default: pattern

  • -h [no]pgas_runtime

    The -h pgas_runtime option directs the compiler driver to link with the runtime libraries required when linking programs that use UPC or coarrays. In general, a resource manager job launcher such as aprun or srun must be used to launch the resulting executable.

    The -h nopgas_runtime option prevents this runtime library environment from being added to the link line.

    Use the -hnopgas_runtime option when you have a program that does not use UPC or coarrays and you want to execute it outside of the aprun/srun job launch context. For example, you may want to test a serial program that does not contain any UPC or coarray code on a login or service node, or fork/exec an executable on a compute node. Also, compile non-coarray Fortran using the -hnocaf option.

    PGAS behavior is determined by the number of physical cores on the node. For more information, see the intro_pgas(7) man page.

    Default: pgas_runtime

  • -h pic|PIC

    Generates position independent code (PIC), which allows a virtual address change from one process to another, as is necessary in the case of shared, dynamically linked objects. The virtual addresses of the instructions and data in PIC code are not known until dynamic link time.

  • -h pl=program_library

    Create and use a persistent repository of compiler information specified by program_library. When used with -h wp, this option provides application-wide, cross-file, automatic inlining. The -f cray-program-library-path=program_library option is provided as an alias to -h pl=program_library to match the HPE CCE C/C++ compiler option.

    The program_library repository is implemented as a directory and the information contained in program_library is built up with each compiler invocation. Any compilation that does not have the -h pl option will not add information to this repository.

    Because of the persistence of program_library, it is the user’s responsibility to manage it. For example, rm -r program_library might be added to the make clean target in an application makefile. Because program_library is a directory, use rm -r to remove it.

    If an application makefile works by creating files in multiple directories during a single build, the program_library should be an absolute path, otherwise multiple and incomplete program library repositories will be created. For example, avoid -hpl=./PL.1 and use -hpl=/fullpath/builddir/PL.1 instead.

  • -h [no]preferred_vector_width[=64|128|256|512]

    Specify the preferred vector width to use when vectorizing loops. This option does not guarantee that the specified vector width will be used, only that it is preferred. The optimizer is still free to choose a smaller width if it is expected to perform better. As the set of acceptable widths is target-sensitive and fairly complicated, the optimizer diagnoses any illegal values.

    A value is not required when specifying nopreferred_vector_width.

  • -h profile_generate

    Enables instrumenting of source code for CrayPat profile-guided optimization. For more information, see the intro_craypat(1) and pat_build(1) man pages.

  • -h scalarn

    Specifies the level of scalar optimization, where n can be one of the following levels:

    0 : Disables scalar optimization.

    1 : Specifies conservative scalar optimization.

    2 : Specifies moderate scalar optimization.

    3 : Specifies aggressive scalar optimization.

    Default: scalar2

  • -h shared

    Create a library which may be dynamically linked at runtime. Note that the preferred invocation is to call the generic ftn command with the -shared option, rather than using this compiler-specific option. See the ftn(1) man page for more information.

  • -h shortcircuitn

    Specifies various levels of short circuit evaluation, an optimization in which the compiler evaluates only as much of a logical expression as is needed to determine its result. When enabled, the compiler attempts short circuit evaluation of logical expressions that are used in IF statement scalar logical expressions. This evaluation is performed on the .AND. and .OR. operators. n can be one of the following levels:

    0 : Disables short circuiting of IF and ELSEIF statement logical conditions.

    1 : Specifies short circuiting of IF and ELSEIF logical conditions only when a PRESENT, ALLOCATED, or ASSOCIATED intrinsic procedure is in the condition.

    2 : Specifies short circuiting of IF and ELSEIF logical conditions, and it is done left to right. This is the default for architectures without predicated vector support.

    3 : Specifies short circuiting of IF and ELSEIF logical conditions. It is an attempt to avoid making function calls. If either the left or right operand of an .AND. or .OR. operator contains function calls, short circuit evaluation is performed. This is the default for architectures with predicated vector support.
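
    Whether a function call in an IF condition is actually evaluated can be observed with a sketch like the following. The module, function, and counter names are hypothetical. The Fortran standard permits, but does not require, the processor to skip the call, so the printed count depends on the compiler and the short circuit level in effect.

    ```fortran
    module shortcircuit_demo_mod
      implicit none
      integer :: calls = 0
    contains
      logical function expensive_check()
        calls = calls + 1            ! records that the call was evaluated
        expensive_check = .true.
      end function expensive_check
    end module shortcircuit_demo_mod

    program shortcircuit_demo
      use shortcircuit_demo_mod
      implicit none
      logical :: flag
      flag = .false.
      ! flag alone decides the result, so the processor may skip the call.
      if (flag .and. expensive_check()) print *, 'condition was true'
      ! Prints 0 if the call was short-circuited, 1 if it was evaluated.
      print *, 'calls =', calls
    end program shortcircuit_demo
    ```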

  • -h static

    Directs the linker to use the static version of the libraries, not the dynamic version of the libraries, to create an executable file. Note that the preferred invocation is to call the generic ftn command with the -static option. See the ftn(1) man page for more information.

  • -h [no]safe_addr

    Provides assurance that most conditionally executed memory references are thread safe, which in turn supports a more aggressive use of speculative writes, thereby improving application performance. If -h nosafe_addr is specified, the optimizer performs speculative stores only when it can prove absolute thread safety using the information available within the application code.

    Default: safe_addr

  • -h [no]summary

    Controls whether the log summary is printed. The -h nosummary option stops the log summary from printing when any Warnings, Comments, Notes, or Optimization messages are issued. If Errors are issued, a log summary is always printed. If both -V[V[V]] and -h nosummary are specified, the last one specified wins.

    Default: summary

  • -h threadn

    Control the compilation and optimization of OpenMP directives, where n is a value from 0 to 3 with 0 being off and 3 specifying the most aggressive optimization. This option is identical to the -O threadn option. If -h thread1 is specified, it is equivalent to specifying -h nosafe_addr.

    Default: thread2

  • -h [no]thread_do_concurrent

    The -h thread_do_concurrent option permits DO CONCURRENT nests to be threaded unless prohibited by the loop_info prefer_nothread directive.

    The -h nothread_do_concurrent option prevents DO CONCURRENT nests from being threaded unless forced by the loop_info prefer_thread directive.

    Default: nothread_do_concurrent
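
    A DO CONCURRENT loop of the kind these options apply to is sketched below. The program is illustrative; it shows only the language construct, not how the compiler chooses to thread or offload it.

    ```fortran
    program do_concurrent_demo
      implicit none
      integer, parameter :: n = 1000
      real :: x(n), y(n)
      integer :: i
      x = 1.0
      ! Each iteration writes a distinct element and reads nothing written
      ! by another iteration, which is what DO CONCURRENT asserts.
      do concurrent (i = 1:n)
        y(i) = 2.0 * x(i) + real(i)
      end do
      print *, y(1), y(n)
    end program do_concurrent_demo
    ```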

  • -h [no]offload_do_concurrent

    The -h offload_do_concurrent option permits DO CONCURRENT nests to be offloaded to a GPU unless prohibited by the loop_info prefer_nothread_do_concurrent directive.

    Default: nooffload_do_concurrent

  • -h unrolln

    The -h unrolln option globally controls loop unrolling and changes the assertiveness of the UNROLL directive. By default, the compiler attempts to unroll all loops, unless the NOUNROLL directive is specified for a loop. Generally, unrolling loops increases single processor performance at the cost of increased compile time and code size.

    The n argument enables you to turn loop unrolling on or off and determine where unrolling should occur. It also affects the assertiveness of the UNROLL directive. Use one of these values for n:

    0 : No unrolling. (Ignore all UNROLL directives and do not attempt to unroll other loops.)

    1 : Attempt to unroll loops that are marked by the UNROLL directive.

    2 : Attempt to unroll all loops (includes array syntax implied loops), except those marked with the NOUNROLL directive.

    Default: unroll2

  • -h vectorn

    Specifies the level of automatic vectorizing to be performed. Vectorization can result in dramatic performance improvements with a small increase in object code size. Vectorization directives are unaffected by this option.

    The valid values for n are:

    0 : Minimal automatic vectorization. Characteristics include low compile time and small compile size. This option is compatible with all scalar optimization levels. The compiler will still vectorize array syntax in order to allow full source level debugging with reasonable performance. When this option is specified in conjunction with -hfp0 or -hfp1, then array syntax containing associative floating point or complex operations will not be vectorized.

    1 : Conservative vectorization. The -h vector1 option is compatible with -h scalar1, -h scalar2, and -h scalar3.

    2 : Moderate vectorization. Loop nests are restructured. The -h vector2 option is compatible with -h scalar2 or -h scalar3.

    3 : Aggressive vectorization.

    Default: vector2

  • -h vector_classic

    Prior to CCE 9.0, the Fortran NOVECTOR directive applied to the rest of the program unit, unless subsequently superseded by a VECTOR directive. Beginning with CCE 9.0, the VECTOR and NOVECTOR directives apply only to the next loop.

    The -h vector_classic option, if specified, provides the pre-CCE 9.0 behavior and causes the VECTOR and NOVECTOR directives to behave as toggle switches, controlling vectorization for the remainder of the program unit unless superseded by the countervailing directive.

    Default: not set

  • -h wp

    Enables the whole program mode. This option causes the compiler backend (IPA, optimizer, code generator) to be invoked at application link time, enabling whole program automatic inlining/cloning and future whole program interprocedural analysis (IPA) optimizations. Requires that -h pl=program_library is also specified. The options -h pl=program_library and -h wp should be specified on all compiler invocations and on the compiler link invocation.

    Since the -h wp option provides automatic application-wide inlining, the -Oipafrom option is no longer needed for cross-file inlining and using these two options together is not permitted.

    Since -h wp delays the compiler optimization step until link time, -c compiles will take less time and the link step will take longer. Normally, this is just a time shift from one build phase to another with roughly the same overall compile time. In some cases increased inlining may cause an increase in overall compile time. Using -h wp allows the compiler backend to be invoked in parallel during a build. Setting the environment variable NPROC controls the number of concurrent compiler backend invocations and this parallelism may reduce overall compile time.

    -h ipan guides the heuristics of inlining/cloning expansion, while specifying -h pl=program_library and -h wp determines the location and availability of the candidates for expansion.

    Default: not set

  • -h zero

    Initializes all undefined local stack variables to 0 (zero). If a user variable is of type character, it is initialized to NUL. The variables are initialized upon each execution of the procedure. This option is identical to the -e0 option.

    Default: not set

  • -h [no]zeroinc

    Cause the compiler to assume that a constant increment variable (CIV) can be incremented by zero. A CIV is a variable that is incremented only by a loop invariant value. For example, in a loop containing the statement J = J + K, where K is loop invariant and can be equal to zero, J is a CIV. -h zeroinc can cause less strength reduction to occur in loops that have variable increments.

    Default: nozeroinc
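
    The effect of a zero increment can be pictured with a short model. This is only an illustrative Python sketch, not compiler code; the function name civ_values and its parameters are hypothetical. It records the value a CIV takes on each loop trip. With a nonzero increment every trip sees a distinct value; with a zero increment every trip sees the same value, which is the case -h zeroinc tells the compiler to allow for.

```python
def civ_values(j_start, k, trips):
    """Return the value of the CIV j at the top of each loop trip.

    j is changed only by the loop-invariant increment k, which is what
    makes it a constant increment variable (CIV).
    """
    values = []
    j = j_start
    for _ in range(trips):
        values.append(j)
        j = j + k  # the only update to j inside the loop
    return values


# Nonzero increment: each trip sees a different j.
print(civ_values(1, 2, 4))
# Zero increment: every trip sees the same j.
print(civ_values(1, 0, 4))
```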

  • -I incldir

    Specifies a directory to be searched for files named in INCLUDE lines and #include directives. You must specify an -I option for each directory you want searched. Directories can be specified in incldir as full pathnames or as pathnames relative to the working directory.

    If no -I is specified, only the working directory and system directories are searched.

  • -J dir_name

    Specifies an alternate directory for the module information files. The compiler puts the .mod files in this directory and searches for .mod files in this directory. The compiler will search for modules stored in the directories specified using the -J dir_name option for the current compilation automatically; it is not necessary to use the -p option explicitly to make the compiler do this.

    -J is not allowed with -dm.

    By default, the files are written to the current working directory.

  • -K trap=opt[,opt]

    Enable traps for the specified exceptions. By default, no exceptions are trapped. Enabling traps using this option also has the effect of setting -h fp_trap.

    If the specified options contradict each other, the last option has priority. For example, -Ktrap=none,fp is equivalent to -Ktrap=fp.

    This option does not affect compile time optimizations; it detects runtime exceptions. This option is processed only at link time and affects the entire program; it is not processed when compiling subprograms. Therefore, traps may be set using this command line option at the beginning of execution of the main program only. The program may subsequently change these settings by calling intrinsic or library procedures. Use of this option may require the specification of -hfp_trap when compiling other files of the application.

    opt

    Exceptions

    denorm

    Trap on denormalized operands.

    divz

    Trap on divide-by-zero.

    fp

    Trap on divz, inv, or ovf exceptions.

    inexact

    Trap on inexact result (i.e. rounded result). Enabling traps for inexact results is not recommended.

    inv

    Trap on invalid operation.

    none

    Disables all traps (default).

    ovf

    Trap on overflow (i.e. the result of an operation is too large to be represented).

    unf

    Trap on underflow (i.e. the result of an operation is too small to be represented).
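
    The way an opt list combines can be sketched as follows. This is a hypothetical Python model, not the driver's actual logic: it assumes the list is processed left to right, that none clears everything enabled so far, and that fp expands to divz, inv, and ovf as the table states. The name resolve_traps is invented for illustration.

```python
# Exceptions that "fp" stands for, taken from the table above.
FP_GROUP = {"divz", "inv", "ovf"}
# Individual exception names from the table above.
KNOWN = {"denorm", "divz", "inexact", "inv", "ovf", "unf"}


def resolve_traps(spec):
    """Return the set of trapped exceptions for a comma-separated opt list.

    Assumption: options are applied left to right, so a later option
    overrides an earlier, contradictory one.
    """
    enabled = set()
    for opt in spec.split(","):
        if opt == "none":
            enabled.clear()
        elif opt == "fp":
            enabled |= FP_GROUP
        elif opt in KNOWN:
            enabled.add(opt)
        else:
            raise ValueError(f"unknown trap option: {opt}")
    return enabled


print(sorted(resolve_traps("none,fp")))
```

    Under this model, none,fp and fp produce the same set, which matches the equivalence stated above.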

  • -l libname

    Directs the compiler driver to search for the specified object library file when linking an executable. To request more than one library file, specify multiple -l options.

    When statically linking, the compiler driver searches for libraries by prepending ldir/lib on the front of libname and appending .a on the end of it, for each ldir that has been specified by using the -L option. It uses the first file it finds.

    When dynamically linking, the library search process is similar to the static case, with a few differences. The compiler driver searches for libraries by prepending ldir/lib on the front of libname and appending .so on the end of it, for each ldir that has been specified by using the -L option. If a matching .so is not found, the compiler driver replaces .so with .a and repeats the process from the beginning. It uses the first file it finds.

    There is no search order dependency for libraries.

    If you specify personal libraries by using the -l command line option, those libraries are added before the default HPE CCE library list. For example, when the following command line is issued, the linker looks for a library named libmylib.a (following the naming convention) and adds it to the top of the list of default libraries.

    % ftn -l mylib target.f
    
  • -L ldir

    Changes the -l option search algorithm to look for library files in directory ldir. To request more than one library directory, specify multiple -L options.

    Note that multiple -L options are treated cumulatively as if all ldir arguments appeared on one -L option preceding all -l options. Therefore, do not attempt to link functions of the same name from different libraries through the use of alternating -L and -l options.

    The compiler driver searches for library files in directory ldir before searching the default directories /opt/ctl/libs and /lib. For example, when statically linking, if -L ../mylib, -L /loclib, and -l m are specified, the compiler driver searches for the following files and uses the first one found:

    ../mylib/libm.a
    /loclib/libm.a
    /opt/ctl/libs/libm.a
    /lib/libm.a
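
    The search order described for the -l and -L options can be modeled with a few lines of Python. This is a sketch only: the function candidate_paths is hypothetical, and it assumes the default directories are searched after the -L directories in both the static and the dynamic case. It lists the file names in the order they would be tried; the first one that exists would be used.

```python
# Default directories named in the text above.
DEFAULT_DIRS = ["/opt/ctl/libs", "/lib"]


def candidate_paths(libname, ldirs, dynamic=False):
    """List library file names in the order the driver would try them.

    ldirs holds the -L directories in command-line order. For dynamic
    linking, every directory is tried with .so first, and the whole
    search is repeated with .a only if no .so was found.
    """
    dirs = list(ldirs) + DEFAULT_DIRS
    suffixes = [".so", ".a"] if dynamic else [".a"]
    return [f"{d}/lib{libname}{s}" for s in suffixes for d in dirs]


for path in candidate_paths("m", ["../mylib", "/loclib"]):
    print(path)
```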
    
  • -m msg_lvl

    Specifies the minimum compiler message levels to enable. The following list shows the integers to specify in order to generate each type of message and which messages are generated by default. Use the explain(1) command to view a message explanation.

    The -m message types are as follows:

    msg_lvl

    Message Types Enabled

    0

    Error, Warning, Caution, Note, and Comment

    1

    Error, Warning, Caution, and Note

    2

    Error, Warning, and Caution

    3

    Error and Warning (default)

    4

    Error
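
    The table above amounts to a threshold on message severity. The following Python sketch restates it; the names SEVERITIES and enabled_types are hypothetical and exist only for this illustration.

```python
# Message types ordered from least to most severe, as in the table above.
SEVERITIES = ["Comment", "Note", "Caution", "Warning", "Error"]


def enabled_types(msg_lvl=3):
    """Return the message types enabled by -m msg_lvl (default 3)."""
    if msg_lvl not in range(5):
        raise ValueError("msg_lvl must be an integer from 0 to 4")
    return SEVERITIES[msg_lvl:]


print(enabled_types(3))
```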

  • -M msgs

    The -M msgs option suppresses messages at the Warning, Caution, Note, and Comment levels and can change the default message severity to an Error or a Warning level. You cannot suppress or alter the severity of Error-level messages with this option.

    To suppress messages, specify one or more integer numbers that correspond to the HPE Cray Fortran Compiler messages you want to suppress. To specify more than one message number, specify a comma (but no spaces) between the message numbers. For example, -M 110,300 suppresses messages 110 and 300.

    To change a message’s severity to an Error level or a Warning level, specify an E (for Error) or a W (for Warning) and then the number of the message. For example, consider the following option: -M 300,E600,W400.

    This specification results in the following:

    • Message 300 is disabled and is not issued, provided that it is not an Error-level message by default. Error-level messages cannot be suppressed and cannot have their severity downgraded.

    • Message 600 is issued as an Error-level message, regardless of its default severity.

    • Message 400 is issued as a Warning-level message, provided that it is not an Error-level message by default.
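
    These rules can be restated as a small Python model. It is a sketch under stated assumptions: the function apply_msg_option is hypothetical, and the default severities passed to it in the example are invented for illustration, since the real defaults depend on the message. A result of None means the message is suppressed.

```python
def apply_msg_option(spec, default_severity):
    """Apply a -M msgs list to a table of default message severities.

    spec is the comma-separated list, for example "300,E600,W400".
    default_severity maps message numbers to their default severity.
    """
    result = dict(default_severity)
    for item in spec.split(","):
        prefix = item[0] if item[0] in "EW" else ""
        number = int(item[len(prefix):])
        is_error = default_severity.get(number) == "Error"
        if prefix == "E":
            result[number] = "Error"      # always allowed
        elif is_error:
            continue                      # Error-level messages are left alone
        elif prefix == "W":
            result[number] = "Warning"
        else:
            result[number] = None         # suppressed
    return result


# Hypothetical default severities for three message numbers.
defaults = {300: "Caution", 600: "Note", 400: "Comment"}
print(apply_msg_option("300,E600,W400", defaults))
```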

  • -N col

    Specifies the line width for fixed and free-format source lines. The value used for col specifies the maximum number of columns per line.

    • For fixed form sources, col can be set to 72, 80, 132, 255, or 1023.

    • For free form sources, col can be set to 132, 255, or 1023.

    • Characters in columns beyond the col specification are ignored.

    • By default, lines are 72 characters wide for fixed-format sources, and there is no line size limit for free-format sources.

  • -O opt[,opt]...

    Specifies optimization features. The opt values 0, 1, 2, and 3 (fast) enable you to specify increasing general levels of optimization. The other opt values enable you to select specific optimization features. All -O options with the exception of 0, 1, 2, and 3 have corresponding -h options available.

    The -O0, -O1, -O2, and -O3 (-Ofast) specifications do not directly correspond to the numeric optimization levels for scalar optimization and vectorization. For example, specifying -O3 does not necessarily enable scalar3 and vector3. HPE reserves the right to alter the specific optimizations performed at these levels from release to release. You can use the -eo option or the ftnlx command to display the optimization options used during compilation.

    The valid opt values are:

    opt

    Optimization Provided

    -O0

    Disables all optimizations, including floating point optimizations and OpenACC acceleration (equivalent to specifying -hfp0 and -hnoacc). To disable optimizations but leave acceleration enabled, specify -O0 -hacc. Some informational messages may not be issued.

    -O1, -O2, -O3

    Specify increasing general levels of optimization. -Ofast is a synonym for -O3.

    Default: O2

    [no]aggress

    Cause the compiler to treat a subroutine, function, or main program as a single optimization region. Doing so can improve the optimization of large program units but also increases compile time and size.

    Default: noaggress

    [no]autoprefetch

    Controls automatic prefetch optimization. Does not affect the loop_info [no]prefetch directive.

    Default: autoprefetch

    [no]autothread

    Enables or disables autothreading.

    Default: noautothread

    cachen

    Specify the level of automatic cache management, where n can be one of the following values:

    0: Specifies no automatic cache management; all memory references are allocated to cache. Both automatic cache blocking and manual cache blocking (by use of the BLOCKABLE directive) are shut off. Characteristics include low compile time. This option is compatible with all optimization levels.

    1: Specifies conservative automatic cache management. Characteristics include moderate compile time. Symbols are placed in the cache when the possibility of cache reuse exists and the predicted cache footprint of the symbol in isolation is small enough to experience reuse.

    2: Specifies moderately aggressive automatic cache management. Characteristics include moderate compile time. Symbols are placed in the cache when the possibility of cache reuse exists and the predicted state of the cache model is such that the symbol will be reused.

    3: Specifies aggressive automatic cache management. Characteristics include potentially high compile time. Symbols are placed in the cache when the possibility of cache reuse exists and the allocation of the symbol to the cache is predicted to increase the number of cache hits.

    fast

    Synonym for -O3.

    fpn

    Controls the level of floating point optimizations, where n is a value between 0 and 3, with 0 giving the compiler minimum freedom to optimize floating point operations and 3 giving it maximum freedom. The higher the level, the less the floating point values conform to the IEEE standard.

    When -hfp[0,1] is specified, it also has the effect of setting -hfp_trap.

    Default: fp2.

    fusionn

    Control loop fusion globally and change the assertiveness of the FUSION directive.

    Loop fusion can improve the performance of loops, although in some rare cases it may degrade overall performance.

    The n argument enables you to turn loop fusion on or off and determine where fusion should occur. It also affects the assertiveness of the FUSION directive. n can be one of the following values:

    0: No fusion (ignore all FUSION directives and do not attempt to fuse other loops)

    1: Attempt to fuse loops that are marked by the FUSION directive.

    2: Attempt to fuse all loops (includes array syntax implied loops), except those marked with the NOFUSION directive.

    Default: fusion2
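
    As a conceptual illustration of what loop fusion does, the following Python sketch shows two adjacent loops over the same range and an equivalent single loop. It is an analogy only, with hypothetical function names; the compiler applies the transformation to Fortran loops internally, and only when it can prove the result is unchanged.

```python
def separate_loops(a):
    """Two adjacent loops over the same iteration range."""
    n = len(a)
    b = [0] * n
    c = [0] * n
    for i in range(n):
        b[i] = a[i] * 2
    for i in range(n):
        c[i] = b[i] + 1
    return c


def fused_loop(a):
    """The same computation after fusing the two loop bodies."""
    n = len(a)
    b = [0] * n
    c = [0] * n
    for i in range(n):
        b[i] = a[i] * 2
        c[i] = b[i] + 1  # b[i] is reused while it is still at hand
    return c


print(separate_loops([1, 2, 3]) == fused_loop([1, 2, 3]))
```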

    ipalevel

    Controls the level of interprocedural analysis (IPA), which in turn controls the level of automatic inlining and cloning.

    -O ipalevel guides the heuristics of inlining/cloning expansion, while specifying -O ipafrom=source, or -h pl=program_library and -hwp, determines the location and availability of the candidates for expansion.

    Inlining is the process of replacing a user procedure call with the procedure definition itself. This saves subprogram call overhead and may allow better optimization of the inlined code. If all calls within a loop are inlined, the loop becomes a candidate for parallelization.

    Cloning is a situation in which a procedure is duplicated with modifications such that it will run more efficiently. For example, the compiler will clone a procedure for a specific call site when there are constant actual arguments present in that call site. When the clone is made, the dummy arguments are replaced with the constant actual arguments, and the original call to the procedure is replaced with a call to the duplicate copy.
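
    The idea behind cloning can be pictured with a short Python analogy. The functions scale and scale_clone_factor2 are hypothetical: the first stands for an original procedure with a dummy argument, the second for the specialized copy that would serve call sites passing the constant 2.

```python
def scale(values, factor):
    """Original procedure: factor is a dummy argument."""
    return [v * factor for v in values]


def scale_clone_factor2(values):
    """Clone for call sites that pass the constant 2 as factor.

    The dummy argument has been replaced by the constant, so the body
    can be optimized for that specific value.
    """
    return [v * 2 for v in values]


# A call such as scale(data, 2) would be redirected to the clone.
data = [1, 2, 3]
print(scale(data, 2) == scale_clone_factor2(data))
```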

    When -O ipalevel is used alone, the candidates for expansion are all those functions that are present in the input file to the compile step. If -O ipalevel is used in conjunction with -O ipafrom=source or in conjunction with -h pl=program_library and -hwp, the candidates for expansion are those functions present in source or program_library, respectively. The valid values for level are:

    0: Disable interprocedural analysis and optimizations. All inlining and cloning compiler directives are ignored.

    1: Directive IPA. Inlining/cloning is attempted for call sites and routines that are under the control of a compiler directive.

    2: Inlining. Inline a call site to an arbitrary depth as long as the expansion does not exceed some compiler-determined threshold. The call site must flatten for any expansion to occur. The call site is said to “flatten” when there are no calls present in the expanded code. The call site must reside within the body of a loop and the entire loop body must flatten. A loop body is said to “flatten” when all call sites within the body of the loop are flattened. Includes level 1.

    3: (Default) Constant actual argument inlining and tiny routine inlining. This includes levels 1 and 2, plus any call site that contains a constant actual argument. Additionally, any call nest (regardless of location) that is below some small compiler-determined threshold will be inlined, provided that call nest flattens completely. Cloning directives are recognized.

    4: Aggressive inlining. This includes levels 1, 2, and 3. Additionally, a call site does not have to reside in a loop body to inline, nor does the call site necessarily have to flatten.

    5: Aggressive inlining and aggressive cloning. Includes levels 1, 2, 3, and 4, plus routine cloning is attempted if inlining fails at a given call site.

    ipafrom=source[:source]...

    Explicitly indicate the procedures to consider for inlining/cloning.

    The source arguments identify each file or directory that contains the functions to consider for inlining/cloning. Whenever a call is encountered in the input program that matches a function in source, inlining/cloning is attempted for that call site.

    Note that blank spaces are not allowed on either side of the equal sign.

    All inlining directives are recognized with explicit inlining.

    Note that routines in source are not actually linked with the final program. They are simply templates for the inliner. To have a routine contained in source linked with the program, you must include it in an input file to the compilation.

    The valid source arguments are:

    Fortran source files: The routines in Fortran source files are candidates for inline expansion and must contain error-free code. Source files that are acceptable for inlining are files that have one of the following extensions: .f, .F, .for, .FOR, .f90, .F90, .f95, .F95, .f03, .F03, .f08, .F08, .f18, .F18, .ftn, or .FTN

    Module files: MODULENAME.mod and modulename.mod files that contain precompiled inlining templates can be specified. However, this is unnecessary, as the compiler will find these files when resolving the USE statement during compilation.

    Directories: A directory containing any of the Fortran source or module files described above.

    loop_trips=[tiny|small|medium|large|huge]

    Specifies runtime loop trip counts for all loops in a compiled source file. This information is used to tune optimizations to the runtime characteristics of the application.

    Default: none

    [no]msgs

    Cause the compiler to write optimization messages to stderr.

    Similar information in a more-readable format can be obtained by using the -h list=m (-rm) option instead. Specifying the -h list=m option enables -h msgs.

    Default: nomsgs

    [no]negmsgs

    Cause the compiler to generate messages to stderr to indicate why optimizations such as vectorization or inlining did not occur in a given instance.

    The -O negmsgs option enables the -O msgs option. The -rm option enables the -O negmsgs option.

    Default: nonegmsgs

    nointerchange

    Inhibit the compiler’s attempts to interchange loops. Interchanging loops by having the compiler replace an inner loop with an outer loop can increase performance. The compiler performs this optimization by default. Specifying the -O nointerchange option is equivalent to specifying a NOINTERCHANGE directive prior to every loop. To disable loop interchange on individual loops, use the NOINTERCHANGE directive.

    [no]omp

    Enable or disable compiler recognition of OpenMP directives. Using -O noomp is similar to the -O thread0 option, in that it disables OpenMP, but unlike -O thread0, noomp does not affect autothreading. The -O [no]omp option is identical to the -h [no]omp option.

    Default: noomp

    [no]overindex

    Declares that there exists an array subscript applied to a multidimensional array such that the subscript exceeds the declared bounds of its dimension. Such a subscript still must result in an access to the same multidimensional array object.

    Default: nooverindex
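
    Overindexing relies on Fortran's column-major storage order: a subscript that exceeds the bounds of its own dimension can still address an element inside the same array. The Python sketch below computes the storage offset for an array with lower bounds of 1; the function column_major_offset and the 3-by-4 array shape are hypothetical choices for illustration.

```python
def column_major_offset(subscripts, extents):
    """Return the zero-based storage offset of an element.

    Assumes every dimension has a lower bound of 1 and that elements
    are stored in column-major order, as Fortran arrays are.
    """
    offset = 0
    stride = 1
    for subscript, extent in zip(subscripts, extents):
        offset += (subscript - 1) * stride
        stride *= extent
    return offset


extents = (3, 4)  # as if declared A(3,4): 12 elements in total
# A(5,1) exceeds the first dimension's bound of 3 ...
print(column_major_offset((5, 1), extents))
# ... but lands on the same storage as A(2,2), still inside A.
print(column_major_offset((2, 2), extents))
```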

    [no]pattern

    Enables pattern matching for library substitution. The pattern matching feature searches your code for specific code patterns and replaces them with calls to highly optimized routines.

    The -O pattern option is enabled only for optimization levels -O2, -O vector2 or higher; there is no way to force pattern matching for lower levels.

    Specifying -O nopattern disables pattern matching and causes the compiler to ignore the PATTERN and NOPATTERN directives.

    Default: pattern

    scalarn

    Specifies the level of scalar optimization, where n can be one of the following levels:

    0: Disables scalar optimization.

    1: Specifies conservative scalar optimization.

    2: (Default) Specifies moderate scalar optimization.

    3: Specifies aggressive scalar optimization.

    shortcircuitn

    Specifies various levels of short circuit evaluation, which is an optimization in which the compiler analyzes all or part of a logical expression based on the results of a preliminary analysis. When enabled, the compiler attempts short circuit evaluation of logical expressions that are used in IF statement scalar logical expressions. This evaluation is performed on the .AND. and .OR. operators. The valid values for n are:

    0: Disables short circuiting of IF and ELSEIF statement logical conditions.

    1: Specifies short circuiting of IF and ELSEIF logical conditions only when a PRESENT, ALLOCATED, or ASSOCIATED intrinsic procedure is in the condition.

    2: Specifies short circuiting of IF and ELSEIF logical conditions, and it is done left to right. This is the default for x86-64 systems.

    3: Specifies short circuiting of IF and ELSEIF logical conditions. It is an attempt to avoid making function calls. If either the left or right operand to AND and OR operators contains function calls, short circuit evaluation is performed. This is the default for target CPUs other than x86-64.
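
    Short circuit evaluation itself can be demonstrated in Python, whose and operator always evaluates left to right and stops as soon as the result is known. This is an analogy to the left-to-right behavior described for level 2, not a description of the Fortran compiler; the names calls, expensive_check, and guarded are hypothetical. The guard on the left plays the role of a PRESENT, ALLOCATED, or ASSOCIATED test that protects the operand on the right.

```python
calls = []  # records every argument the function was actually called with


def expensive_check(x):
    """Stand-in for a function call the short circuit tries to avoid."""
    calls.append(x)
    return x > 0


def guarded(x):
    """Evaluate the right operand only when the left guard is true."""
    return x is not None and expensive_check(x)


print(guarded(None), calls)  # the function was never called
print(guarded(5), calls)     # the function was called once
```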

    [no]safe_addr

    Provides assurance that most conditionally executed memory references are thread safe, which in turn supports a more aggressive use of speculative writes, thereby improving application performance. If -O nosafe_addr is specified, the optimizer performs speculative stores only when it can prove absolute thread safety using the information available within the application code.

    Default: -O safe_addr

    threadn

    Control the compilation and optimization of OpenMP directives, where n is a value from 0 to 3, with 0 being off and 3 specifying the most aggressive optimization.

    Default: -O thread2.

    The valid values for n are:

    0: No autothreading or OpenMP threading. The -O thread0 option is similar to -O noomp, but -O noomp disables OpenMP only and does not affect autothreading.

    1: Specifies strict compliance with the OpenMP standard for directive compilation. Strict compliance is defined as no extra optimizations in or around OpenMP constructs. In other words, the compiler performs only the requested optimizations. If -O thread1 is specified, it is equivalent to specifying -O nosafe_addr.

    2: OpenMP parallel regions are subjected to some optimizations; that is, some parallel region expansion. Parallel region expansion is an optimization that merges two adjacent parallel regions in a compilation unit into a single parallel region.

    3: Full optimization: loop restructuring, including modifying iteration space for static schedules (breaking standard compliance). Reduction results may not be repeatable.

    unrolln

    The -O unrolln option globally controls loop unrolling and changes the assertiveness of the UNROLL directive. By default, the compiler attempts to unroll all loops, unless the NOUNROLL directive is specified for a loop. Generally, unrolling loops increases single processor performance at the cost of increased compile time and code size.

    The n argument enables you to turn loop unrolling on or off and determine where unrolling should occur. It also affects the assertiveness of the UNROLL directive.

    Default: -O unroll2.

    The valid values for n are:

    0: No unrolling. Ignore all UNROLL directives and do not attempt to unroll other loops.

    1: Attempt to unroll loops that are marked by the UNROLL directive.

    2: Attempt to unroll all loops (includes array syntax implied loops), except those marked with the NOUNROLL directive.
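
    To show what unrolling means, the following Python sketch places a simple summation loop next to a version unrolled by a factor of four with a cleanup loop for leftover iterations. It is a conceptual analogy with hypothetical function names; the compiler chooses its own unroll factors and performs the transformation on the Fortran loop internally.

```python
def sum_rolled(a):
    """Original loop: one element per trip."""
    total = 0
    for i in range(len(a)):
        total += a[i]
    return total


def sum_unrolled_by_4(a):
    """Same loop unrolled by four: fewer trips, larger loop body."""
    total = 0
    n = len(a)
    main = n - n % 4  # largest multiple of 4 not exceeding n
    for i in range(0, main, 4):
        total += a[i] + a[i + 1] + a[i + 2] + a[i + 3]
    for i in range(main, n):  # cleanup loop for the remaining elements
        total += a[i]
    return total


data = list(range(10))
print(sum_rolled(data) == sum_unrolled_by_4(data))
```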

    vectorn

    Specifies the level of automatic vectorizing to be performed. Vectorization can result in dramatic performance improvements with a small increase in object code size. Vectorization directives are unaffected by this option.

    Default: -O vector2.

    The valid values for n are:

    0: Minimal automatic vectorization. Characteristics include low compile time and small compile size. This option is compatible with all scalar optimization levels. The compiler will still vectorize array syntax in order to allow full source level debugging with reasonable performance. When this option is specified in conjunction with -Ofp0 or -Ofp1, array syntax containing associative floating point or complex operations will not be vectorized.

    1: Conservative vectorization. The -O vector1 option is compatible with -O scalar1, -O scalar2, and -O scalar3.

    2: Moderate vectorization. Loop nests are restructured. The -O vector2 option is compatible with -O scalar2 or -O scalar3.

    3: Aggressive vectorization.

    [no]zeroinc

    Cause the compiler to assume that a constant increment variable (CIV) can be incremented by zero. A CIV is a variable that is incremented only by a loop invariant value. For example, in a loop containing the statement J = J + K, where K is loop invariant and can be equal to zero, J is a CIV. -O zeroinc can cause less strength reduction to occur in loops that have variable increments.

    Default: nozeroinc.

  • -o out_file

    Override the default executable file name, a.out, with the name specified in the out_file argument.

    If both the -o out_file and -c options are specified, the link step is disabled and the binary file is written to out_file.

    If both the -o out_file and -eP (preprocess only) options are specified, the preprocessed source is written to out_file.

  • -p module_site

    Specify where to look for Fortran modules to satisfy USE statements. The module_site argument specifies the name of a file or directory to search for modules. The module_site specified can be a .mod file, .o (object) file, .a (archive) file, or a directory.

  • -Q path

    Specifies the directory to contain all saved nontemporary files from this compilation (for example, all .o and .mod files). Specific file types (such as .o files) are saved to a different directory if the -b, -J, -o, or -eS options are used.

    By default, this option is disabled and the compiler puts all nontemporary files in the current working directory.

  • -r list_opt

    Produces a listing file. Note that the -rd argument does not invoke the ftnlx(1) command. All others do.

    The -r list_opt command is equivalent to -h list=option.

    The valid list_opt arguments are:

  • a

    Includes all reports in the listing (source, cross references, lint, loopmarks, common block, and options used during compilation).

  • c

    Listing includes a COMMON block report (lists all common blocks and members of each block).

  • d

    Decompiles (translates) the intermediate representation of the compiler into listings that resemble the format of the source code. You can use these files to examine the restructuring and optimization changes made by the compiler, which can lead to insights about changes you can make to your Fortran source to improve its performance.

    The compiler produces two decompilation listing files, with these extensions, per source file specified on the command line: .opt and .cg.

  • e

    Expands included files in the source listing. This option is off by default.

  • E

    Same as -re.

  • i

    Used with the -rm option to intersperse loop optimization messages within the loopmark listing. By default, the messages are placed at the bottom of the program unit.

  • l

    Lists source code and includes lint style checking. The listing includes the COMMON block report (see the -rc option for more information about the COMMON block report).

  • m

    Produces a source listing with loopmark information. To provide a more complete report, this option automatically enables the -O negmsgs option to show why loops were not optimized. If you do not require this information, use the -O nonegmsgs option on the same command line. Loopmark information will not be displayed if the -dB option has been specified.

  • o

    Show all options used by the compiler during compilation.

  • s

    Lists source code.

  • T

    Retains file.T after processing rather than deleting it. The file.T can be used to call ftnlx directly.

  • x

    Produces a cross-reference listing.

  • -R runchk

    Specifies any of a group of runtime checks for your program. To specify more than one type of checking, specify consecutive runchk arguments: for example, -R bs. By default, no runtime checks are performed.

    The valid runchk arguments are:

  • b

    Enables checking of array bounds. Bounds checking is not performed on arrays dimensioned as (1). Enables -Ooverindex.

  • c

    Enables conformance checking of array operands in array expressions.

  • d

    Enables a run time check for the !dir$ collapse directive and checks the validity of the loop_info count information.

  • p

    Generates run time code to check the association or allocation status of referenced POINTER variables, ALLOCATABLE arrays, or assumed-shape arrays.

  • s

    Enables checking of character substring bounds.

  • -s size

    The -s size option allows you to modify the sizes of variables, literal constants, and intrinsic function results declared as type REAL, INTEGER, LOGICAL, COMPLEX, DOUBLE COMPLEX, or DOUBLE PRECISION. The valid values for size are:

  • byte_pointer

    Applies a byte scaling factor to integers used in pointer arithmetic involving Cray pointers. That is, Cray pointers are moved on byte instead of word boundaries.

  • default32

    Adjusts the data size of default types as follows:

    • 32 bits: REAL, INTEGER, LOGICAL

    • 64 bits: COMPLEX, DOUBLE PRECISION

    • 128 bits: DOUBLE COMPLEX

      The data sizes of integers and logicals that use explicit kind and star values are not affected by this option. However, they are affected by the -eh option.

  • default64

    Adjusts the data size of default types as follows:

    • 64 bits: REAL, INTEGER, LOGICAL

    • 64 bits: DOUBLE PRECISION (implied -dp)

    • 128 bits: COMPLEX

    • 128 bits: DOUBLE COMPLEX (implied -dp)

    If you use -s default64 at compile time, you must also specify this option when invoking the ftn command for linking.

  • integer32

    Adjusts the default data size of default integers and logicals to 32 bits.

  • real32

    Adjusts the data size of default real types as follows:

    • 32 bits: REAL

    • 64 bits: DOUBLE PRECISION

    • 64 bits: COMPLEX

    • 128 bits: DOUBLE COMPLEX

    The data sizes of integers and logicals that use explicit kind and star values are not affected by this option. However, they are affected by the -eh option.

  • real64

    Adjusts the data size of default real types as follows:

    • 64 bits: REAL

    • 64 bits: DOUBLE PRECISION (implied -dp)

    • 128 bits: COMPLEX

    • 128 bits: DOUBLE COMPLEX (implied -dp)

    If you use -s default64 at compile time, you must also specify this option when invoking the ftn command for linking.

  • word_pointer

    Applies a word scaling factor to integers used in pointer arithmetic involving Cray pointers. That is, Cray pointers are moved on word instead of byte boundaries.

    The default data size options (for example, -s default64) do not affect the size of data that explicitly declare the size of the data (for example, REAL(KIND=4) X).

    REAL(KIND=16) and COMPLEX(KIND=16) support 128-bit floating point and 256-bit complex types, sometimes referred to as quad-precision.

  • -S

    Generates assembly language output and saves it in file.s. Has the same effect as -eS. By default, this option is off.

  • -T

    Disables the compiler but displays all options currently in effect. By default, this option is off.

  • -U identifier[,identifier]...

    This option undefines variables used for source preprocessing. This option removes the initial definition of a predefined macro or sets a user predefined macro to an undefined state.

    The -D identifier[=value] option defines variables used for source preprocessing. If both -D and -U are used for the same identifier, in any order, the identifier is undefined.

    This option is ignored unless one of the following conditions is true:

    • The Fortran input source file is specified as file.extension, where extension is one of the following: .F, .FOR, .F90, .F95, .F03, .F08, .F18, or .FTN.

    • The -eP or -eZ options have been specified.
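
    The interaction between -D and -U stated above (an identifier named by both is undefined, whatever the order) can be sketched in Python. The function effective_macros and its input format, a list of (flag, argument) pairs, are hypothetical conveniences for this illustration; an identifier defined without a value is recorded as None here because the value it receives is not stated in this section.

```python
def effective_macros(options):
    """Return the identifiers left defined after all -D and -U options.

    options is a list of (flag, argument) pairs such as
    [("D", "DEBUG=1"), ("U", "DEBUG")], in command-line order.
    """
    defined = {}
    undefined = set()
    for flag, argument in options:
        if flag == "D":
            name, sep, value = argument.partition("=")
            defined[name] = value if sep else None
        elif flag == "U":
            undefined.update(argument.split(","))
    # -U wins regardless of where it appears relative to -D.
    return {n: v for n, v in defined.items() if n not in undefined}


print(effective_macros([("D", "DEBUG=1"), ("U", "DEBUG"), ("D", "N=4")]))
```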

  • -v

    Prints information about each compilation phase to the standard error file (stderr). The information describes what the compiler, lister, and linker are doing and what they are calling. By default, this option is off.

  • -V

    Directs each compilation phase to send a message containing version information to the standard error file (stderr). You can specify this option without specifying an input file name; that is, specifying ftn -V is valid. By default, this option is off.

  • --version

    Directs each compilation phase to send a message containing version information to stdout. You can specify this option without specifying an input file name; that is, specifying ftn --version is valid. Note that -version is incorrect; it must be --version. By default, this option is off.

  • -W phase,"opt..."

    Passes arguments directly to a phase of the compiling system.

    The valid values for phase are:

    phase       System Phase     Command
    0 (zero)    Compiler         ftn
    a           Assembler        as
    c           CUDA linker      arg
    l           Linker           ld
    r           Lister           ftnlx
    x           PTX assembler    arg

    Arguments to be passed to system phases can be entered in either of two styles. If spaces appear within a string to be passed, the string is enclosed in double quotes. When double quotes are not used, spaces cannot appear in the string. Commas can appear wherever spaces normally appear; an option and its argument can be either separated by a comma or not separated. If a comma is part of an argument, it must be preceded by the \ character. For example, any of the following command lines would send -e name to the linker:

    • % ftn -Wl,"-e name" file.F08

    • % ftn -Wl,-e,name file.F08

    • % ftn -Wl,"-ename" file.F08

    -Wa,"assembler_opt" passes the assembler_opt option directly to the as command, directing it to enable all pseudos, regardless of location field name. This option is meaningful to the system only when file.s is specified as an input file on the command line. For more information about assembler options, see the as(1) man page.

    The -Wr,"lister_opt" option passes lister_opt directly to the ftnlx command. For example, specifying -Wr,"-o cfile.o" passes the argument cfile.o directly to the ftnlx command’s -o option; this directs ftnlx to override the default output listing and put the output in cfile.o. When specifying the -Wr,"lister_opt" option, also specify the -h list_opt option. For more information about lister options, see the ftnlx(1) man page.

    The -Wl,-rpath ldir option changes the run time library search algorithm to look for files in directory ldir. To request more than one library directory, specify multiple -rpath options. Note that a library may be found at link time with an -L option, but may not be found at run time if a corresponding -rpath option was not supplied on the link line. Also note that the compiler driver does not pass the -rpath option to the linker. You must explicitly specify -Wl when using this option.

    At link time, all ldir arguments are added to the executable. The dynamic linker will search these paths first for shared dynamic libraries at run time, with one exception. The Linux environment variable LD_LIBRARY_PATH precedes all other search paths for shared dynamically linked libraries. The use of LD_LIBRARY_PATH is discouraged. Caution should be used when setting LD_LIBRARY_PATH, as doing so will change the shared dynamically linked library search paths for all executable files in your environment.

    -Wx,"arg" passes command line arguments to the PTX assembler for OpenACC applications.

    -Wc,"arg" passes command line arguments to the CUDA linker for OpenACC applications.


  • -x dirlist

    Disables specified directives or specified classes of directives. If specifying a multiword directive, either enclose the directive name in quotation marks or remove the spaces between the words in the directive’s name. By default, no directives or specified classes of directives are disabled.

    dirlist can be one of the following options:

    Option             Description
    acc                All OpenACC API directives.
    all                All compiler and OpenMP Fortran API directives.
    dec                All !DEC$ directives.
    dir                All !DIR$ directives.
    directive          One or more compiler directives. If specifying more than one, separate them with commas, as follows: -x INLINEALWAYS,"NO SIDE EFFECTS",BOUNDS
    gcc                All gcc directives.
    intel              All Intel directives.
    ocl                All Fujitsu directives.
    pgi                All PGI directives.
    omp                All OpenMP Fortran API directives.
    conditional_omp    All C$ and !$ conditional compilation lines.

  • -Y phase,dirname

    Specifies a new directory (dirname) from which the designated phase should be executed. The phase argument can be one or more of the following values:

    phase    System Phase    Command
    0        Compiler        ftn
    a        Assembler       as

  • --

    Signifies the end of options. After this symbol, specify the files to be processed.

  • sourcefile [sourcefile…]

    Fortran source files to be processed. Possible suffixes of sourcefile indicate the following:

    Suffix                                 Description
    .f, .for                               Fixed-format source, compile
    .F, .FOR                               Fixed-format source, preprocess, compile
    .f90, .f95, .f03, .f08, .f18, .ftn     Free-format source, compile
    .F90, .F95, .F03, .F08, .F18, .FTN     Free-format source, preprocess, compile
    .o                                     Object file, link
    .s                                     Assembler source, assemble

    The source form specified using the -f source_form option overrides the source form implied by the file suffixes.

    If only one source file is specified on the command line, the .o file is created and deleted. To retain the .o file, use the -c option to disable the linker. You can specify object files produced by HPE Cray Fortran, C, C++, or assembler compilers. Object files are passed to the linker in the order in which they appear on the ftn command line. If the linker is disabled by the -b or -c option, no files are passed to the linker.

    The source filename and path lengths are limited depending on the system. On Linux, the filename must be shorter than 250 characters. The path length can be up to 4096 characters. If the source file is a symlink, the chain of symlinks must not exceed 40 levels.

Set Environment Variables for the HPE Cray Fortran Compiler

Environment variables are predefined shell variables, taken from the execution environment, that determine some of the shell characteristics. Several environment variables pertain to the HPE Cray Fortran compiler. The HPE Cray Fortran compiler recognizes general and multiprocessing environment variables.

The multiprocessing variables in the following sections affect the way the program will perform on multiple processors. Use environment variables to tune the system for parallel processing without rebuilding libraries or other system software.

The variables allow for controlling parallel processing at compile time and at run time. Compile time environment variables apply to all compilations in a session.

The following examples show how to set an environment variable:

  • With the standard shell, enter:

    $ CRAY_FTN_OPTIONS=options
    $ export CRAY_FTN_OPTIONS
    
  • With the C shell, enter:

    % setenv CRAY_FTN_OPTIONS options
    

The following sections describe the environment variables recognized by the HPE Cray Fortran compiler.

Many of the environment variables described in this chapter refer to the default system locations of Programming Environment components. If the HPE Cray Fortran Compiler Programming Environment has been installed in a non-default location, see the system support staff for path information.

CRAY_FTN_OPTIONS

The CRAY_FTN_OPTIONS environment variable specifies additional options to attach to the command line. These options follow the options specified directly on the command line: they are inserted at the rightmost portion of the command line, before the input files and binary files are listed. File names cannot appear in the variable. This allows the environment variable to be set once and have the specified set of options used in all compilations. This is especially useful for adding options to compilations done with build tools.

For example, assume that this environment variable was set as follows:

% setenv CRAY_FTN_OPTIONS -G0

With the variable set, the following two command line specifications are equivalent:

% ftn -c t.f
% ftn -c -G0 t.f

FORTRAN_MODULE_PATH

As with the HPE Cray Fortran compiler -p module_site command line option, this environment variable allows for the specification of the files or the directory to search for the modules to use. The files specified can be a .mod file, .o (object) file, .a (archive) file, or a directory. The compiler appends the contents specified by the FORTRAN_MODULE_PATH environment variable to anything specified with the -p module_site command line option.

Since the FORTRAN_MODULE_PATH environment variable can specify multiple files and directories, a colon separates each path as shown in the following example:

% setenv FORTRAN_MODULE_PATH 'path1:path2:path3'

LISTIO_PRECISION

The LISTIO_PRECISION environment variable controls the number of digits of precision printed by list-directed output. The LISTIO_PRECISION environment variable can be set to FULL or PRECISION.

  • FULL prints full precision (default).

  • PRECISION prints x or x+1 decimal digits, where x is the value of the PRECISION intrinsic function for a given real value. This is a smaller number of digits, which usually ensures that the last decimal digit is accurate to within 1 unit. This number of digits is usually insufficient to assure that subsequent input will restore a bit-identical floating-point value.
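
The following is a minimal sketch, not taken from the original manual, for observing the effect of this variable. It prints the value of the PRECISION intrinsic for two real kinds and then writes the same values with list-directed output. The program name and variable names are illustrative, and it assumes that LISTIO_PRECISION is set in the environment in which the program runs.

```fortran
program listio_digits
  use, intrinsic :: iso_fortran_env, only: real32, real64
  implicit none
  real(real32) :: s
  real(real64) :: d

  s = 1.0_real32 / 3.0_real32
  d = 1.0_real64 / 3.0_real64

  ! PRECISION returns the decimal precision of the argument's kind.
  ! This is the value x referred to in the PRECISION setting above.
  print *, 'precision(real32) =', precision(s)
  print *, 'precision(real64) =', precision(d)

  ! List-directed output: the number of digits printed for these two
  ! lines is what LISTIO_PRECISION is described as controlling.
  print *, s
  print *, d
end program listio_digits
```

Running the program once with LISTIO_PRECISION set to FULL and once with it set to PRECISION, and comparing the last two lines of output, shows the difference in printed digits.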

NLSPATH

The NLSPATH environment variable specifies the message system library catalog path. This environment variable affects compiler interactions with the message system. For more information about this environment variable, see the catopen(3) man page.

NPROC

The NPROC environment variable specifies the maximum number of processes to be run. Setting NPROC to a number other than 1 can speed up a compilation if machine resources permit.

The effect of NPROC is seen at compilation time, not at execution time. NPROC requests a number of compilations to be done in parallel. It affects all the compilers and also the make command.

For example, assume that NPROC is set as follows:

% setenv NPROC 2

The following command is entered:

% ftn -o t main.f sub.f

In this example, the compilations from .f files to .o files for main.f and sub.f happen in parallel, and when both are done, the link step is performed. If NPROC is unset or set to 1, main.f is compiled to main.o, then sub.f is compiled to sub.o, and then the link step is performed.

NPROC can be set to any value, but large values can overload the system. For debugging purposes, NPROC should be set to 1. By default, NPROC is 1.

ZERO_WIDTH_PRECISION

The ZERO_WIDTH_PRECISION environment variable controls the field width when field width w of Fw.d is zero on output. The ZERO_WIDTH_PRECISION environment variable can be set to PRECISION or HALF.

  • PRECISION specifies that full precision will be written. This is the default.

  • HALF specifies that half of the full precision will be written.

Run Time Environment Variables

Run time environment variables allow for adjusting the following elements of the run time environment:

  • CRAY_ACC_DEBUG

    • Write accelerator-related activity to stdout for debugging purposes. Valid output levels range from 0, which indicates no output, through 3, which indicates verbose.

    • Default: 0

  • CRAY_AUTO_APRUN_OPTIONS

    • Default options for automatic aprun. See the aprun(1) man page.

  • CRAY_RANK_THREAD_PREFIX

    • Prepend a string identifying the MPI rank and OpenMP thread ID to each line written to stdout and stderr.

  • CRAY_MALLOPT_OFF (Only relevant if -hsystem_alloc is specified)

    • If set, then the system default mallopt parameters are used, instead of the compiler default parameters. For most programs, run time performance is improved by using the compiler defaults, but more memory may be used.

  • FORMAT_TYPE_CHECKING

  • MALLOC_MMAP_MAX_ (Only relevant if -hsystem_alloc is specified)

    • Specifies the maximum number of memory chunks to allocate with mmap. The compiler default value is 0. For most programs, run time performance is improved by using the compiler default, but more memory may be used.

  • MALLOC_TRIM_THRESHOLD_ (Only relevant if -hsystem_alloc is specified)

    • Specifies the minimum size of the unused memory region at the top of the heap before the region is returned to the operating system. The compiler default value is 536870912 bytes. For most programs, run time performance is improved by using the compiler default, but more memory may be used.

  • NO_STOP_MESSAGE

    • If set, and if the STOP stop_code statement in the Fortran code does not specify the optional stop_code, then STOP messages are not produced when this statement is executed.

  • PGAS_ERROR_FILE

    • Specifies the location to which libpgas (the library which provides an interface to the internal system network) error messages are written. The default is stderr. If stdout is specified, errors will be written to standard output.

  • TMPDIR

    • Compiler temporary files and user scratch files are placed in the directory specified by the TMPDIR environment variable.

  • CRAYLIBS_ARCH_OVERRIDE

    • Override the default HPE Cray math library run time selection and specify the library to use by CPU architecture. The valid options are: ivybridge, sandybridge, haswell, broadwell, mic-knl, x86-skylake, x86-cascadelake, x86-naples, or arm-thunderx2.

      Can be used to specify that a lowest-common-denominator math library be used instead of the default selection, thus ensuring that identical computations produce identical results regardless of the type of compute node CPU actually used. The trade-off is that specifying an older library may affect performance on a newer CPU. For example, if ivybridge is specified, the code will run and produce identical results on a haswell compute node, but performance may be reduced.

      Default: If not set, the library specific to the type of CPU selected at run time is used.

aprun Resource Limits

The aprun command always forwards its own CPU and core resource limits (RLIMIT_CPU and RLIMIT_CORE) to the compute nodes, where those limits are set for the application. If a -m value is specified, RLIMIT_RSS is also forwarded.

If the APRUN_XFER_LIMITS run time environment variable is set to a non-zero value, the following resource limits are also forwarded:

  • RLIMIT_FSIZE

  • RLIMIT_DATA

  • RLIMIT_STACK

  • RLIMIT_RSS

  • RLIMIT_NPROC

  • RLIMIT_NOFILE

  • RLIMIT_MEMLOCK

  • RLIMIT_AS

  • RLIMIT_LOCKS

  • RLIMIT_SIGPENDING

  • RLIMIT_MSGQUEUE

  • RLIMIT_NICE

  • RLIMIT_RTPRIO

This forwarding is disabled by default.

This forwarding of user resource limits can cause problems on systems where the login node’s limits are more restrictive than the default compute node limits.

HPE Cray Fortran Directives

Directives are instructions that may be inserted into source code in order to specify certain kinds of special processing to be performed by the compiler during compilation.

Directives are not Fortran language statements. Directives are often compiler-specific, and if the HPE CCE compiler encounters a directive that is not supported by HPE CCE, the compiler will generate a message, ignore the directive, and continue with the compilation.

HPE Cray Fortran Directive Use

A directive line begins with the characters CDIR$ or !DIR$. How to specify a directive depends on the source form being used.

  • If using fixed source form, indicate a directive line by placing CDIR$ or !DIR$ in columns 1 through 5. If the compiler encounters a non-blank character in column 6, the line is assumed to be a directive continuation line. Columns 7 and beyond can contain one or more directives. Characters entered in columns beyond the default column width are ignored.

  • If using free source form, indicate a directive by placing !DIR$ followed by a space and then one or more directives. If the position following the !DIR$ contains a character other than a blank space, tab, or newline character, the line is assumed to be a continuation line. For example, the asterisk (*) character in column 6 on the second line in the following example indicates that it is a continuation of the first line:

    !DIR$ NOSIDEEFFECTS
    !DIR$*ab
    

    Note the following:

    • The !DIR$ need not start in column 1, but it must be the first text on the line.

    • The FIXED and FREE directives must appear alone on a directive line and cannot be continued.

    • Do not use source preprocessor (#) directives within multiline compiler directives.

To specify more than one directive on a line, separate the directives with commas. Some directives require that one or more arguments be specified; when specifying a directive of this type, no other directive can appear on the line.

Spaces can precede, follow, or be embedded within a directive, regardless of the source form.

Code portability is maintained by using the !DIR$ form of the directive. In the following example, the ! character in column 1 causes other compilers to treat the HPE Cray Fortran compiler directive as a comment:

      A=10
!DIR$ NOVECTOR
      DO 10,I=1,10...
        

Range and Placement of Directives

FIXED and FREE directives can appear anywhere in the source code. All other directives must appear within the program unit where they are to be applied.

The following directives must be placed in the declarative portion of a program unit and apply only to that program unit:

  • CACHE

  • CACHE_NT

  • COPY_ASSUMED_SHAPE

  • IGNORE_TKR

  • MEMORY

  • NAME

  • NOSIDEEFFECTS

  • SAME_TBS

  • STACK

  • WEAK

The following directives toggle a compiler feature on or off at the point at which the directive appears in the code. These directives remain in effect until the opposite directive appears, the directive is reset, or until the end of the program unit, at which time the command line settings become the default for the remainder of the compilation:

  • [NO]BOUNDS

  • [NO]CLONE

  • [NO]COLLAPSE

  • [NO]FUSION

  • [NO]INLINE

  • [NO]PATTERN

  • [NO]PIPELINE

  • [NO]UNROLL

  • [NO]VECTOR

RESETCLONE and RESETINLINE apply at the point at which they appear in the code and reset cloning or inlining back to the defaults.

The SUPPRESS directive applies at the point at which it appears.

The following directives apply only to the next loop or block of code encountered lexically:

  • BLOCKABLE

  • BLOCKINGSIZE|NOBLOCKING

  • CONCURRENT

  • HAND_TUNED

  • [NO]INTERCHANGE

  • IVDEP

  • NEXTSCALAR

  • NOFISSION

  • PERMUTATION

  • PREFERVECTOR

  • PROBABILITY

  • SAFE_ADDRESS

  • SAFE_CONDITIONAL

  • LOOP_INFO

The following directives alter the status of entities in ways that affect compilation. They do not apply to particular ranges of code.

  • IGNORE_TKR

  • INLINEALWAYS|INLINENEVER

  • CLONEALWAYS|CLONENEVER

  • NAME

  • NOSIDEEFFECTS

Interaction with the Command Line

Note the following interactions between directives and the ftn command line options.

-x

The -x option accepts one or more directives as arguments. Directives specified with the -x option are ignored during compilation. To ignore all compiler directives, specify -x all.

-O0

The -O0 option disables all compiler optimizations. All scalar optimization, vector optimization, and tasking directives are ignored.

-O ipan

The -O ipa0 option disables all inlining and cloning optimizations. All inlining and cloning directives are ignored.

-O scalarn

The -O scalar0 option disables all scalar optimizations. All scalar optimization directives are ignored.

-O vectorn

The -O vector0 option disables all vector optimizations. All vector optimization directives are ignored.

BLOCKABLE

!DIR$ BLOCKABLE (do_variable,do_variable[,do_variable]…)

The BLOCKABLE directive specifies that it is legal and desirable to cache block the subsequent loop nest, even when the compiler has not made such a determination. To be legally blockable, the nest must be perfect (without code between constituent loops), rectangular (trip counts of member loops are fixed over the lifetime of the nest), and fully permutable (loop interchange and unrolling are legal at all levels). This directive both permits and requests blocking of the indicated loop nest.

The directive arguments are a comma-delimited list of two or more loop control variables, do_variable.

If a BLOCKINGSIZE directive is also provided for the indicated loop, the following rules apply.

  • If BLOCKINGSIZE is at least 2, the indicated BLOCKINGSIZE is used.

  • If BLOCKINGSIZE is 0, the loop itself is not blocked and it is treated as an inner loop (as part of the nest that traverses the cache block tile).

  • If BLOCKINGSIZE is 1, the loop itself is not blocked and it is treated as an outer loop (as a loop in the nest that moves from tile to tile).

When no BLOCKINGSIZE directive is supplied, the compiler chooses the BLOCKINGSIZE according to its own heuristics.

Example 1: BLOCKINGSIZE 1 followed by its equivalent

subroutine EX1(A, B, n)
  real A(n,n), B(n,n)

!dir$ blockable(i,j)
!dir$ blockingsize(512)
  do j = 1, n
!dir$ blockingsize(1)
      do i = 1, n-1
          A(j,i) = B(j,i) + B(j,i+1)
      enddo
  enddo
end subroutine EX1

subroutine EX1m(A, B, n)
  real A(n,n), B(n,n)

  do js = 1, n, 512
      do i = 1, n-1
          do j = js, min( n, js+511 )
              A(j,i) = B(j,i) + B(j,i+1)
          enddo
      enddo
  enddo
end subroutine EX1m

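Notice that blockingsize(1) is applied to the loop that is innermost in the source (i), which is then treated as an outer loop that moves from tile to tile, while blockingsize(0) is typically applied to a loop that is outer in the source and should run inside the tile.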
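
As a sanity check on the claimed equivalence, the following sketch (not part of the original manual) runs the directive-annotated nest from EX1 and the hand-blocked nest from EX1m on the same input and compares the results. The program name, array names, and problem size are illustrative. Each element is computed by a single addition of the same two operands in both nests, so the results are expected to match exactly.

```fortran
program check_blocking
  implicit none
  integer, parameter :: n = 1000        ! not a multiple of 512, so the last tile is partial
  real, allocatable :: a_ref(:,:), a_blk(:,:), b(:,:)
  integer :: i, j, js

  allocate(a_ref(n,n), a_blk(n,n), b(n,n))
  call random_number(b)
  a_ref = 0.0
  a_blk = 0.0

  ! Nest from EX1. Compilers that do not recognize !dir$ treat
  ! the directive lines as ordinary comments.
!dir$ blockable(i,j)
!dir$ blockingsize(512)
  do j = 1, n
!dir$ blockingsize(1)
     do i = 1, n-1
        a_ref(j,i) = b(j,i) + b(j,i+1)
     end do
  end do

  ! Hand-blocked nest from EX1m: js steps from tile to tile,
  ! i is the unblocked outer loop, j runs inside the tile.
  do js = 1, n, 512
     do i = 1, n-1
        do j = js, min(n, js+511)
           a_blk(j,i) = b(j,i) + b(j,i+1)
        end do
     end do
  end do

  if (any(a_ref /= a_blk)) error stop 'blocked and unblocked results differ'
  print *, 'blocked and unblocked loops agree'
end program check_blocking
```

The program stops with an error if the two nests disagree, so it can also serve as a quick regression check when experimenting with different block sizes.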

Example 2: BLOCKINGSIZE > 1 at both levels

subroutine EX2(A, B, n)
  real A(n,n), B(n,n)

!dir$ blockable(i,j)
!dir$ blockingsize(32)
  do j = 1, n-1
!dir$ blockingsize(128)
      do i = 1, n-1
          A(i,j) = B(i,j) + B(i+1,j) + B(i,j+1)
      enddo
  enddo
end subroutine EX2

subroutine EX2m(A, B, n)
  real A(n,n), B(n,n)

  do js = 1, n-1, 32
      do is = 1, n-1, 128
          do j = js, min( n-1, js+31 )
              do i = is, min( n-1, is+127 )
                  A(i,j) = B(i,j) + B(i+1,j) + B(i,j+1)
              enddo
          enddo
      enddo
  enddo
end subroutine EX2m

BLOCKINGSIZE, NOBLOCKING

!DIR$ BLOCKINGSIZE(n1[,n2])

!DIR$ NOBLOCKING

The BLOCKINGSIZE directive asserts that the loop following the directive is involved in a cache blocking situation for the primary or secondary cache.

The NOBLOCKING directive prevents the compiler from involving the subsequent loop in a cache blocking situation.

The BLOCKINGSIZE directive supports the following arguments:

n1[,n2]

where n1 and n2 are integers that indicate the block sizes. If the loop is involved in a blocking situation, it will have a block size of n1 for the primary cache and n2 for the secondary cache. The compiler attempts to include this loop within such a block but cannot guarantee this inclusion.

  • For n1, specify a value such that n1 .GE. 0.

  • For n2, specify a value such that n2 .LE. 2**30.

  • If n1 or n2 is 0, the loop is not blocked, but the entire loop is inside the block.

Example: Using !DIR$ BLOCKINGSIZE

In this example, the compiler makes 20 x 20 blocks when blocking, but it could block the loop nest such that loop K is not included in the tile.

SUBROUTINE AMAT(X,Y,Z,N,M,MM)
REAL(KIND=8) X(100,100), Y(100,100), Z(100,100)
DO K = 1, N
!DIR$ BLOCKABLE(J,I)
!DIR$ BLOCKING SIZE (20)
   DO J = 1, M
!DIR$ BLOCKING SIZE (20)
      DO I = 1, MM
         Z(I,K) = Z(I,K) + X(I,J)*Y(J,K)
      END DO
   END DO
END DO
END

If K is excluded, you can add a BLOCKINGSIZE(0) directive just before loop K to specify that the compiler should generate a loop such as the following example:

SUBROUTINE AMAT(X,Y,Z,N,M,MM)
REAL(KIND=8) X(100,100), Y(100,100), Z(100,100)
DO JJ = 1, M, 20
   DO II = 1, MM, 20
      DO K = 1, N
         DO J = JJ, MIN(M, JJ+19)
            DO I = II, MIN(MM, II+19)
               Z(I,K) = Z(I,K) + X(I,J)*Y(J,K)
            END DO
         END DO
      END DO
   END DO
END DO
END

Note that an INTERCHANGE directive can be applied to the same loop nest as a BLOCKINGSIZE directive. The BLOCKINGSIZE directive applies to the loop it directly precedes; it moves with that loop when an interchange is applied.
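
The tiled nest shown above can be checked against the original AMAT nest with a short program. This is a sketch, not part of the original manual: the program name, the extents (chosen so that the last tiles are partial), and the comparison tolerance are assumptions made for illustration.

```fortran
program check_amat_blocking
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 37, m = 45, mm = 53   ! deliberately not multiples of 20
  real(dp) :: x(100,100), y(100,100), z_ref(100,100), z_blk(100,100)
  integer :: i, j, k, ii, jj

  call random_number(x)
  call random_number(y)
  z_ref = 0.0_dp
  z_blk = 0.0_dp

  ! Original loop order: K, J, I.
  do k = 1, n
     do j = 1, m
        do i = 1, mm
           z_ref(i,k) = z_ref(i,k) + x(i,j)*y(j,k)
        end do
     end do
  end do

  ! Tiled order: 20 x 20 tiles over (J, I), with the K loop inside the tile.
  do jj = 1, m, 20
     do ii = 1, mm, 20
        do k = 1, n
           do j = jj, min(m, jj+19)
              do i = ii, min(mm, ii+19)
                 z_blk(i,k) = z_blk(i,k) + x(i,j)*y(j,k)
              end do
           end do
        end do
     end do
  end do

  ! For a fixed (I,K) both nests accumulate over J in ascending order,
  ! so any difference should be at rounding level at most.
  if (maxval(abs(z_ref(1:mm,1:n) - z_blk(1:mm,1:n))) > 1.0e-10_dp) &
     error stop 'tiled result differs from reference'
  print *, 'tiled and reference results agree'
end program check_amat_blocking
```

A tolerance is used instead of an exact comparison because a compiler may generate different multiply-add sequences for the two nests.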


BOUNDS, NOBOUNDS

!DIR$ BOUNDS [array[,array]…]

!DIR$ NOBOUNDS [array[,array]…]

The BOUNDS directive specifies that pointer and array references are to be checked. The NOBOUNDS directive specifies that this checking is to be disabled.

The BOUNDS and NOBOUNDS directives support this optional argument:

array

The name of an array. The name cannot be a subobject of a derived type. When no array name is specified, the directive applies to all arrays.

Array bounds checking provides a check of most array references at both compile time and run time to ensure that each subscript is within the array’s declared size. Bounds checking behavior differs with the optimization level. Bounds checking is not performed on arrays dimensioned as 1. Bounds checking also enables -Ooverindex. Complete checking is guaranteed only when optimization is turned off by specifying -O0 on the ftn command line.

The -h [no]bounds (-Rb) command line option controls bounds checking for a whole compilation. The BOUNDS and NOBOUNDS directives toggle the feature on and off within a program unit. Either directive can specify particular arrays or can apply to all arrays.

BOUNDS remains in effect for a given array until the appearance of a NOBOUNDS directive that applies to that array, or until the end of the program unit. Bounds checking can be enabled and disabled many times in a single program unit.

To be effective, these directives must follow the declarations for all affected arrays. It is suggested that they be placed at the end of a program unit’s specification statements unless they are meant to control particular ranges of code.

The bounds checking feature detects any reference to an array element whose subscript exceeds the array’s declared size.

      REAL A(10)
C DETECTED AT COMPILE TIME:
      A(11) = X
C DETECTED AT RUN TIME IF IFUN(M) EXCEEDS 10:
      A(IFUN(M)) = W

The compiler generates an error message if it detects that an array element or array section reference with an out-of-bounds subscript attempts to reference memory. If the compiler cannot detect the out-of-bounds subscript (for example, if the subscript includes a function reference), a message is issued for out-of-bounds subscripts when the program runs, but the program is allowed to complete execution.

Bounds checking does not inhibit vectorization but typically increases program run time. If an array’s last dimension declarator is *, checking is not performed on the last dimension’s upper bound. Arrays in formatted WRITE and READ statements are not checked.

Array bounds checking does not prevent operand range errors that result when operand prefetching attempts to access an invalid address outside an array. Bounds checking is needed when very large values are used to calculate addresses for memory references.

If bounds checking detects an out-of-bounds array reference, a message is issued for only the first out-of-bounds array reference in the loop.

      DIMENSION A(10)
      MAX = 20
      A(MAX) = 2
      DO 10 I = 1, MAX
         A(I) = I
10    CONTINUE
      CALL TWO(MAX,A)
      END
      SUBROUTINE TWO(MAX,A)
      REAL A(*)  ! NO UPPER BOUNDS CHECKING DONE
      END

The following messages are issued for the preceding program:

lib-1961 a.out: WARNING
  Subscript 20 is out of range for dimension 1 for array
  'A' at line 3 in file 't.f' with bounds 1:10.
lib-1962 a.out: WARNING
  Subscript 1:20:1 is out of range for dimension 1 for array
  'A' at line 5 in file 't.f' with bounds 1:10.
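
The placement rules described above can be illustrated with a small free-form program. This is a sketch, not part of the original manual: the program and array names are illustrative, and every reference is kept in bounds so that the program behaves the same whether or not the directives are honored.

```fortran
program bounds_toggle
  implicit none
  real :: a(10), b(10)
  integer :: i
  ! The directives follow the declarations of the arrays they name,
  ! at the end of the specification statements.
!DIR$ BOUNDS a
!DIR$ NOBOUNDS b

  do i = 1, size(a)
     a(i) = real(i)        ! references to a are checked when BOUNDS is honored
     b(i) = 2.0 * a(i)     ! checking is disabled for references to b
  end do

  ! 2 * (1 + 2 + ... + 10) = 110, which is exactly representable.
  if (sum(b) /= 110.0) error stop 'unexpected sum'
  print *, 'sum(b) =', sum(b)
end program bounds_toggle
```

Because the directives begin with !, compilers that do not support them treat these lines as comments and the program remains portable.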

CACHE

!DIR$ CACHE base_name[,base_name …]

Scope: Declaration

To use the CACHE directive, place it only in the specification part, before any executable statement.

The CACHE directive asserts that all memory operations with the specified symbols as the base are to be allocated in cache. This is an advisory directive. The CACHE directive is meaningful for stores in that it allows the user to override a decision made by the automatic cache management.

base_name

The base name of the object that should be placed into the cache. This can be the base name of any object such as an array, scalar structure, and so on, without member references like C[10]. If a pointer is specified in the list, only the references, not the pointer itself, are cached.

This directive overrides automatic cache management decisions (-h cachen, -O cachen) made on the compiler command line. The cache directive may be locally overridden by the use of the LOOP_INFO directive.

CACHE_NT

!DIR$ CACHE_NT base_name[,base_name …]

Scope: Declaration

To use the CACHE_NT directive, place it only in the specification part, before any executable statement.

Use the CACHE_NT directive to identify objects that should not be placed in cache. This is an advisory directive that specifies objects that should use non-temporal reads and writes.

base_name

The base name of the object that should use non-temporal reads and writes. This can be the base name of any object such as an array, scalar structure, and so on, without member references like C[10]. If a pointer is specified in the list, only the references, not the pointer itself, will have the cache non-temporal property.

Advisory directives are directives the compiler will honor if conditions permit it to. When this directive is honored, the performance of code may be improved because the cache is not occupied by objects that have a lower cache reuse rate. In theory, this makes room for objects that have a higher cache reuse rate.

This directive may be locally overridden by use of a LOOP_INFO directive. This directive overrides automatic cache management decisions (-O cachen) made on the compiler command line.
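
The following sketch, not part of the original manual, shows one plausible placement of CACHE_NT for an array that is written once and not reused. The module, procedure, and variable names are hypothetical, and whether the compiler honors the directive for a dummy argument in this situation is an assumption; the directive is advisory in any case.

```fortran
module stream_kernels
  implicit none
contains
  subroutine stream_copy(dst, src, n)
    integer, intent(in) :: n
    real, intent(in)    :: src(n)
    real, intent(out)   :: dst(n)
    integer :: i
    ! Placed in the specification part, before any executable statement.
    ! dst is written once and not reread here, so it has low cache reuse.
!DIR$ CACHE_NT dst
    do i = 1, n
       dst(i) = src(i)
    end do
  end subroutine stream_copy
end module stream_kernels

program cache_nt_demo
  use stream_kernels
  implicit none
  integer, parameter :: n = 100000
  real, allocatable :: a(:), b(:)

  allocate(a(n), b(n))
  call random_number(a)
  call stream_copy(b, a, n)

  ! The directive is a performance hint only; results must be unchanged.
  if (any(b /= a)) error stop 'copy mismatch'
  print *, 'copy verified'
end program cache_nt_demo
```

The check at the end reflects the advisory nature of the directive: it may change how memory is accessed, but not what the program computes.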

CLONE, NOCLONE, RESETCLONE, CLONEALWAYS, CLONENEVER

!DIR$ CLONE

!DIR$ NOCLONE

!DIR$ RESETCLONE

!DIR$ CLONEALWAYS [name [, name] … ]

!DIR$ CLONENEVER [name [, name] … ]

Cloning is the attempt to duplicate a procedure under certain conditions and replace dummy arguments with associated constant actual arguments throughout the cloned procedure. The compiler attempts to clone a procedure when a call site contains actual arguments that are scalar integer and/or scalar logical constants. When the constants are exposed to the optimizer, it can generate more efficient code. The cloning directives control whether cloning is attempted over a range of code.

The following directives remain in effect until a different cloning directive is encountered or until the end of the program unit.

  • CLONE forces cloning to be attempted at all call sites if the conditions exist for cloning to be done.

  • NOCLONE disables all cloning.

  • RESETCLONE returns the cloning to the state specified on the compiler command line.

  • CLONEALWAYS instructs the compiler to attempt to clone one or more specific procedures, as specified by a comma-delimited list of name values.

  • CLONENEVER prevents cloning of a comma-delimited list of procedures as specified by name.

In the cases of CLONEALWAYS and CLONENEVER, if the directive is placed in the definition of the function, cloning is always or never attempted at every call site to name. If the directive is placed in a function other than the definition, cloning is always or never attempted at every call to name within the specific function containing the directive. An error message is issued if both CLONEALWAYS and CLONENEVER are specified for the same procedure within the same program unit.

Use the compiler -h negmsgs option to see messages that highlight where cloning did occur and conditions that may have inhibited cloning.
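
The following sketch, not part of the original manual, shows a call site of the kind the cloning description refers to: the actual arguments include scalar integer constants. The module and procedure names are hypothetical. The CLONEALWAYS directive is placed in the caller, so under the rules above it would apply to calls to scale_by within that program unit only.

```fortran
module kernels
  implicit none
contains
  subroutine scale_by(x, n, factor)
    integer, intent(in) :: n, factor
    real, intent(inout) :: x(n)
    integer :: i
    do i = 1, n
       x(i) = x(i) * real(factor)
    end do
  end subroutine scale_by
end module kernels

program clone_demo
  use kernels
  implicit none
  real :: x(8)
  ! Request that cloning be attempted for calls to scale_by in this program unit.
!DIR$ CLONEALWAYS scale_by

  x = 1.0
  ! 8 and 3 are scalar integer constants, so a clone of scale_by
  ! could replace n and factor with these values.
  call scale_by(x, 8, 3)

  if (any(x /= 3.0)) error stop 'unexpected result'
  print *, 'x(1) =', x(1)
end program clone_demo
```

Compiling with the -h negmsgs option mentioned above is the way to find out whether cloning actually took place at this call site.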

COLLAPSE, NOCOLLAPSE

!DIR$ COLLAPSE [(do_var1,do_var2[,do_var3 …])]

!DIR$ NOCOLLAPSE

The COLLAPSE directive controls collapse of the immediately following loop nest. The directive enables the compiler to assume appropriate conformity between trip counts. The compiler diagnoses misuse at compile time (when able) or at run time if the -Rd option is specified during compilation.

The COLLAPSE directive supports one option:

do_var

The names of the DO variables of the participating loops. When the COLLAPSE directive is applied to a loop nest, the do_var variables must be listed in order of increasing access stride. When the COLLAPSE directive is applied to an array assignment statement, the (do_var1, do_var2 [,do_var3 … ]) syntax is omitted.

The NOCOLLAPSE directive disqualifies the immediately following loop from collapsing with any other loop. Collapse is almost always desirable, so use the NOCOLLAPSE directive sparingly. A NOCOLLAPSE directive placed immediately before an array assignment statement has no effect.

Loop collapse is a special form of loop coalesce. Any perfect loop nest may be coalesced into a single loop, with explicit rediscovery of the intermediate values of original loop control variables. The rediscovery cost, which generally involves integer division, is quite high; therefore coalesce is rarely suitable for vectorization. By definition, loop collapse occurs when loop coalesce may be done without the rediscovery overhead. To meet this requirement, all memory access must have uniform stride.

In Fortran arrays, uniform stride is achieved when a computation can flow from one column of a multidimensional array into the next, viewing the array as a flat sequence. Hence, array sections such as A(:,3:7) are generally suitable for collapse, while a section like A(1:n-1,:) lacks the needed uniformity. Care must be taken when applying the COLLAPSE directive to assumed shape dummy arguments and Fortran pointers because the underlying storage need not be contiguous.

Example 1: COLLAPSE directive

In this example, the COLLAPSE directive collapses loop I and loop J into a single loop. The directive enables the compiler to assume appropriate conformity between trip counts and array extents.

SUBROUTINE S(A, N, N1, N2)
     REAL A(N, *)
!DIR$ COLLAPSE (I, J)
     DO I = 1, N1
          DO J = 1, N2
               A(I,J) = A(I,J) + 42.0
          ENDDO
     ENDDO
END

This results in code that is equivalent to the following. However, the following code is only an example to show the resulting behavior, and should not be coded directly as-is because as program source, it violates the Fortran language standard.

SUBROUTINE S(A, N, N1, N2)
     REAL A(N, *)
     DO IJ = 1, N1*N2
           A(IJ, 1) = A(IJ, 1) + 42.0
     ENDDO
END

Example 2: COLLAPSE directive using array syntax

In this example, the directive enables the compiler to assume appropriate conformity between trip counts and array extents.

SUBROUTINE S( A, B )
     REAL, DIMENSION(:,:) :: A, B
!DIR$ COLLAPSE
     A = B ! USER PROMISES UNIFORM ACCESS STRIDE.
END

CONCURRENT

!DIR$ CONCURRENT [SAFE_DISTANCE=n]

Scope: Local

The CONCURRENT directive indicates that no data dependence exists between array references in different iterations of the loop. This directive affects the loop that immediately follows it. This can be useful for vectorization optimizations.

The CONCURRENT directive supports one argument:

n : An integer that represents the number of additional consecutive loop iterations that can be executed in parallel without danger of data conflict. n must be an integer constant > 0. If SAFE_DISTANCE=n is not specified, the distance is assumed to be infinite and the compiler ignores all cross-iteration dependencies.

The CONCURRENT directive is ignored if the SAFE_DISTANCE argument is used and vectorization is requested on the command line.

Consider the following example:

!DIR$ CONCURRENT SAFE_DISTANCE=3
      DO I = K+1, N
        X(I) = A(I) + X(I-K)
      ENDDO

The CONCURRENT directive in this example informs the optimizer that the relationship K>3 is true. This allows the compiler to load all of the following array references safely during the Ith loop iteration:

X(I-K)
X(I-K+1)
X(I-K+2)
X(I-K+3)
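
Example: CONCURRENT directive without SAFE_DISTANCE (illustrative sketch)

As a further illustration, which is not part of the original manual, CONCURRENT without SAFE_DISTANCE can be applied to a loop with indirect addressing, where the compiler cannot prove that iterations are independent. The names UPDATE, X, A, and IDX are hypothetical, and the sketch assumes that IDX contains no repeated values. The programmer, not the compiler, is responsible for that guarantee.

```fortran
SUBROUTINE UPDATE(X, A, IDX, N)
  INTEGER, INTENT(IN) :: N, IDX(N)
  REAL, INTENT(IN)    :: A(N)
  REAL, INTENT(INOUT) :: X(*)
  INTEGER :: I
  ! ASSUMPTION: IDX HOLDS NO REPEATED VALUES, SO NO TWO ITERATIONS
  ! TOUCH THE SAME ELEMENT OF X.
!DIR$ CONCURRENT
  DO I = 1, N
     X(IDX(I)) = X(IDX(I)) + A(I)
  END DO
END SUBROUTINE UPDATE
```

If IDX could contain repeated values, the directive would be incorrect and the results unpredictable.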

COPY_ASSUMED_SHAPE

!DIR$ COPY_ASSUMED_SHAPE [array [,array] …]

The COPY_ASSUMED_SHAPE directive copies assumed-shape dummy array arguments into contiguous local temporary storage upon entry to the procedure in which the directive appears. During execution, it is the temporary storage that is used when the assumed-shape dummy array argument is referenced or defined.

The COPY_ASSUMED_SHAPE directive applies only to the program unit in which it appears.

The COPY_ASSUMED_SHAPE directive supports one argument:

array

The name of an array to be copied to temporary storage. If no array names are specified, all assumed-shape dummy arrays are copied to temporary contiguous storage upon entry to the procedure. When the procedure is exited, the arrays in temporary storage are copied back to the dummy argument arrays. If one or more arrays are specified, only those arrays specified are copied. The arrays specified must not have the TARGET attribute.

All arrays specified, or all assumed-shape dummy arrays (if specified without array arguments), on a single COPY_ASSUMED_SHAPE directive must be shape conformant with each other. Incorrect code may be generated if the arrays are not. The -R c command line option can be used to verify whether the arrays are shape conformant.

Except when the dummy argument is declared with the CONTIGUOUS attribute, assumed-shape dummy arguments cannot be assumed to be stored in contiguous storage. In the case of multidimensional arrays, the elements cannot be assumed to be stored with uniform stride between each element of the array. These conditions can arise, for example, when an actual array argument associated with an assumed-shape dummy array is a non-unit strided array slice or section.

If the compiler cannot determine whether an assumed-shape dummy array is stored contiguously or with a uniform stride between each element, some optimizations are inhibited in order to ensure that correct code is generated. If an assumed-shape dummy array is passed to a procedure and becomes associated with an explicit-shape dummy array argument, additional copy-in and copy-out operations may occur at the call site. For multidimensional assumed-shape arrays, some classes of loop optimizations cannot be performed when an assumed-shape dummy array is referenced or defined in a loop or an array assignment statement. The lost optimizations and the additional copy operations performed can significantly reduce the performance of a procedure that uses assumed-shape dummy arrays when compared to an equivalent procedure that uses explicit-shape array dummy arguments.

The COPY_ASSUMED_SHAPE directive causes a single copy to occur upon entry and again on exit. The compiler generates a test at run time to determine whether the array is contiguous. If the array is contiguous, the array is not copied. This directive allows the compiler to perform all the optimizations it would otherwise perform if explicit-shape dummy arrays were used. If there is sufficient work in the procedure using assumed-shape dummy arrays, the performance improvements gained by the compiler outweigh the cost of the copy operations upon entry and exit of the procedure.
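
Example: COPY_ASSUMED_SHAPE directive (illustrative sketch)

The following sketch shows one plausible use of the directive. It is not taken from the compiler documentation. The names SMOOTH, A, and B are hypothetical, and the placement of the directive in the specification part is an assumption. A and B are assumed to be shape conformant, and neither has the TARGET attribute. Callers need an explicit interface for SMOOTH, for example by placing it in a module.

```fortran
SUBROUTINE SMOOTH(A, B)
  REAL, INTENT(INOUT) :: A(:,:)
  REAL, INTENT(IN)    :: B(:,:)
  ! COPY A AND B TO CONTIGUOUS TEMPORARIES ON ENTRY (UNLESS THEY ARE
  ! ALREADY CONTIGUOUS) AND COPY BACK ON EXIT.
!DIR$ COPY_ASSUMED_SHAPE A, B
  INTEGER :: I, J
  DO J = 2, SIZE(A,2) - 1
     DO I = 2, SIZE(A,1) - 1
        A(I,J) = 0.25 * (B(I-1,J) + B(I+1,J) + B(I,J-1) + B(I,J+1))
     END DO
  END DO
END SUBROUTINE SMOOTH
```

The -R c option described above can be used to check the shape conformance assumption at run time.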

FREE, FIXED

!DIR$ FREE

!DIR$ FIXED

The FREE and FIXED directives specify whether the source code in the program unit is written in free source or fixed source form. These directives override the -f option, if specified on the ftn command line.

These directives apply to the source file in which they appear and allow for switching source forms within a source file.

Source form can be changed from within an INCLUDE file. After the INCLUDE file is processed, the source form reverts to the source form that was being used prior to processing the INCLUDE file.

The source preprocessor does not recognize the FREE or FIXED directives. These directives must not be specified in a file that is to be submitted to the source preprocessor. To specify source form with such files, use the -f fixed or -f free option on the ftn command line.
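
Example: Switching source form within a file (illustrative sketch)

The following sketch assumes a file that is compiled as fixed source form (for example, with -f fixed) and is not passed through the source preprocessor. The procedure names are hypothetical. Because a line beginning with ! is a comment in both source forms, each directive is acceptable in the form that is active where it appears.

```fortran
      SUBROUTINE FIXED_PART(X)
      REAL X
C     FIXED FORM: STATEMENTS START IN COLUMN 7, CONTINUATION IN COLUMN 6.
      X = X
     &    + 1.0
      END
!DIR$ FREE
subroutine free_part(x)
  real, intent(inout) :: x
  ! Free form: statements may start in any column, continuation uses &.
  x = x + &
      1.0
end subroutine free_part
!DIR$ FIXED
      SUBROUTINE FIXED_AGAIN(X)
      REAL X
      X = 2.0 * X
      END
```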

FUSION, NOFUSION

!DIR$ FUSION

!DIR$ NOFUSION

The FUSION and NOFUSION directives direct the compiler to attempt or not attempt loop fusion on the loop following the directive, thus permitting fine-tuning of the selection of which loops the compiler should attempt to fuse. The FUSION directive should be placed immediately before the DO statement of the loop that should be fused.

The FUSION directive instructs the compiler to attempt loop fusion on the following loop unless -h nofusion was specified on the compiler command line.

The NOFUSION directive instructs the compiler to not attempt loop fusion on the following loop even when the -h fusion option is specified on the compiler command line.

If it is desired that only a few loops out of many should be fused, use the FUSION directive with the -O fusion1 option to confine loop fusion to these few loops. Conversely, if there are only a few loops out of many that should not be fused, use the NOFUSION directive with the -O fusion2 option to specify no fusion for these loops.
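
Example: FUSION directive (illustrative sketch)

The following sketch marks two adjacent loops with identical iteration spaces as fusion candidates. It is not taken from the compiler documentation. The names are hypothetical, and marking both loops rather than only one is an assumption. Whether the loops are actually fused remains the compiler's decision.

```fortran
SUBROUTINE TWO_PASS(A, B, C, N)
  INTEGER, INTENT(IN) :: N
  REAL, INTENT(IN)    :: C(N)
  REAL, INTENT(OUT)   :: A(N), B(N)
  INTEGER :: I
!DIR$ FUSION
  DO I = 1, N
     A(I) = 2.0 * C(I)
  END DO
!DIR$ FUSION
  DO I = 1, N
     B(I) = A(I) + C(I)      ! USES A(I) FROM THE SAME ITERATION; FUSION IS LEGAL.
  END DO
END SUBROUTINE TWO_PASS
```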

HAND_TUNED

!DIR$ HAND_TUNED

Assert that the code in the next loop nest has been arranged by hand for maximum performance and the compiler should restrict some of the more aggressive automatic expression rewrites. The compiler should still fully optimize and vectorize the loop within the constraints of the directive. The HAND_TUNED directive applies to the next loop in the same manner as the CONCURRENT and SAFE_ADDRESS directives.

Use of this directive may severely impede performance. Use carefully and evaluate performance before and after employing this directive.

HEAP_ALLOCATE, NOHEAP_ALLOCATE

!DIR$ HEAP_ALLOCATE

!DIR$ NOHEAP_ALLOCATE

HEAP_ALLOCATE puts variable-size array temporaries on the heap until a NOHEAP_ALLOCATE directive is found or the current scope ends. Following scope exit, array placement returns to the policy in force prior to entry to the scope just exited.

NOHEAP_ALLOCATE puts variable-size array temporaries on the stack until a HEAP_ALLOCATE directive is found or the current scope ends. Following scope exit, array placement returns to the policy in force prior to entry to the scope just exited.

The OPTIMIZE directive recognizes the -h heap_allocate and -h noheap_allocate options. The latter is the default. A -h heap_allocate argument on the command line overrides the -h noheap_allocate option in the OPTIMIZE directive, but a -h noheap_allocate argument on the command line does not override the OPTIMIZE directive.
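
Example: HEAP_ALLOCATE and NOHEAP_ALLOCATE directives (illustrative sketch)

The following sketch is an assumption about typical use and is not taken from the compiler documentation. The names are hypothetical. The first assignment reads a reversed section of the array it assigns to, so the compiler may need a temporary of N elements. Whether a temporary is actually created is up to the compiler.

```fortran
SUBROUTINE REVERSE_SCALE(A, B, N)
  INTEGER, INTENT(IN) :: N
  REAL, INTENT(INOUT) :: A(N)
  REAL, INTENT(IN)    :: B(N)
!DIR$ HEAP_ALLOCATE
  A = B * A(N:1:-1)          ! ANY VARIABLE-SIZE TEMPORARY GOES ON THE HEAP.
!DIR$ NOHEAP_ALLOCATE
  A = A + A(N:1:-1)          ! ANY VARIABLE-SIZE TEMPORARY GOES ON THE STACK.
END SUBROUTINE REVERSE_SCALE
```

Heap placement can help when N is large enough that a stack temporary would risk exceeding the stack limit.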

IGNORE_TKR

!DIR$ IGNORE_TKR [ [ (letter) dummy_arg] … ]

The IGNORE_TKR directive directs the compiler to ignore the Type, Kind, and/or Rank of specified dummy arguments in the procedure interface.

This directive supports the following arguments:

letter

This can be T, K, R, or any combination of these letters, for example TK or KR. The letter applies only to the dummy argument it precedes. If letter appears, dummy_arg must appear.

dummy_arg

If specified, it indicates the dummy arguments for which TKR rules should be ignored. If not specified, TKR rules are ignored for all dummy arguments in the procedure that contains the directive.

The directive causes the compiler to ignore the type, kind, and/or rank of the specified dummy arguments when resolving a generic call to a specific call. The compiler also ignores the type, kind, and/or rank on the specified dummy arguments when checking all the specifics in a generic call for ambiguities.

The following example instructs the compiler to ignore type, kind, and/or rank rules for the dummy arguments of the following subroutine fragment:

subroutine example(A,B,C,D)
!DIR$ IGNORE_TKR A, (R) B, (TK) C, (K) D

Dummy Argument    What’s Ignored
A                 Type, kind, and rank
B                 Rank only
C                 Type and kind
D                 Kind only

INLINE, NOINLINE, RESETINLINE, INLINEALWAYS, INLINENEVER

!DIR$ INLINE

!DIR$ NOINLINE

!DIR$ RESETINLINE

!DIR$ INLINEALWAYS [name [, name] … ]

!DIR$ INLINENEVER [name [, name] … ]

Inlining replaces calls to user-defined functions with the code that represents the function. This can improve performance by saving the expense of the function call overhead. It also increases the possibility of additional code optimization. Inlining may increase object code size.

The following directives remain in effect until a different inlining directive is encountered or until the end of the program unit.

  • INLINE instructs the compiler to attempt to inline functions at all call sites

  • NOINLINE disables all inlining

  • RESETINLINE returns inlining to the state specified on the compiler command line by the -O ipan option, or to the default state if no option was specified

  • INLINEALWAYS instructs the compiler to attempt to inline one or more specific procedures, as specified by a comma-delimited list of name values

  • INLINENEVER prevents inlining of a comma-delimited list of procedures as specified by name

In the cases of INLINEALWAYS and INLINENEVER, if the directive is placed in the definition of the function, inlining is always or never attempted at every call site to name. If the directive is placed in a function other than the definition, inlining is always or never attempted at every call to name within the specific function containing the directive. An error message is issued if both INLINEALWAYS and INLINENEVER are specified for the same procedure within the same program unit.

Example: INLINEALWAYS and INLINENEVER directives

SUBROUTINE S()
!DIR$ INLINEALWAYS S     ! THIS SAYS ATTEMPT
                         ! INLINING OF S AT ALL CALLS.
     ...
END SUBROUTINE
SUBROUTINE T
!DIR$ INLINENEVER S      ! DO NOT INLINE ANY CALLS TO S
                         ! IN SUBROUTINE T.
     CALL S()
     ...
END SUBROUTINE
SUBROUTINE V
!DIR$ NOINLINE           ! HAS HIGHER PRECEDENCE THAN INLINEALWAYS.
     CALL S()            ! DO NOT INLINE THIS CALL TO S.
!DIR$ INLINE
     CALL S()            ! ATTEMPT INLINING OF THIS CALL TO S.
     ...
END SUBROUTINE
SUBROUTINE W
     CALL S()            ! ATTEMPT INLINING OF THIS CALL TO S.
     ...
END SUBROUTINE

INTERCHANGE, NOINTERCHANGE

!DIR$ INTERCHANGE (do_var1,do_var2[,do_var3 … ])

!DIR$ NOINTERCHANGE

Scope: Local

The INTERCHANGE directive specifies that the order of the two or more loops immediately following the directive should be interchanged.

The NOINTERCHANGE directive inhibits loop interchange on the loop that immediately follows the directive.

The INTERCHANGE directive supports one option:

do_var

Specifies two or more DO variable names. The do_var names can be specified in any order, and the compiler will reorder the loops. The loops must be perfectly nested. If the loops are not perfectly nested, the results will be unpredictable.

The loops affected by the INTERCHANGE directive are designated by their DO variable names. The compiler will reorder the loops such that the loop with do_var1 is outermost, then loop do_var2, then loop do_var3, and so on.

Example: INTERCHANGE directive

In this example, the interchange directive reorders the loops. The K loop becomes the outermost, followed by J, and the I loop becomes the innermost.

!DIR$ INTERCHANGE (K, J, I)
     DO I = 1,NSIZE1
          DO K = 1,NSIZE1
               DO J = 1,NSIZE1
                    X(I,J) = X(I,J) + Y(I,K) * Z(K,J)
               ENDDO
          ENDDO
     ENDDO
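
The effect of the directive can be pictured as the following hand-written loop nest. This is a sketch for illustration only: the compiler performs the reordering internally and the source is unchanged. The enclosing subroutine and its declarations are hypothetical and were added to make the fragment complete.

```fortran
SUBROUTINE MULT_KJI(X, Y, Z, NSIZE1)
  INTEGER, INTENT(IN) :: NSIZE1
  REAL, INTENT(INOUT) :: X(NSIZE1,NSIZE1)
  REAL, INTENT(IN)    :: Y(NSIZE1,NSIZE1), Z(NSIZE1,NSIZE1)
  INTEGER :: I, J, K
  ! LOOP ORDER AFTER INTERCHANGE (K, J, I): K OUTERMOST, I INNERMOST.
  DO K = 1,NSIZE1
     DO J = 1,NSIZE1
        DO I = 1,NSIZE1
           X(I,J) = X(I,J) + Y(I,K) * Z(K,J)
        ENDDO
     ENDDO
  ENDDO
END SUBROUTINE MULT_KJI
```

With I innermost, X and Y are accessed with unit stride, because Fortran stores arrays in column-major order.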

IVDEP

!DIR$ IVDEP [ SAFEVL=vlen| INFINITEVL ]

Ignore vector dependencies in the loop immediately following the directive. The IVDEP directive supports these arguments.

vlen

Specifies a vector length in which no dependency will occur. vlen must be an integer between 1 and 1024 inclusive.

INFINITEVL

Specifies an infinite safe vector length. That is, no dependency will occur at any vector length. This is the default. If vlen is not specified, the vector length used is infinity.

When the IVDEP directive appears before a loop, the compiler ignores vector dependencies, including explicit dependencies, in any attempt to vectorize the loop. IVDEP applies only to the first loop that follows the directive within the same program unit. An IVDEP directive before a DO CONCURRENT loop has no effect.

For array operations, Fortran requires that the complete right-hand side (RHS) expression be evaluated before the assignment to the array or array section on the left-hand side (LHS). If possible dependencies exist between the RHS expression and the LHS assignment target, the compiler creates temporary storage to hold the RHS expression result. If an IVDEP directive appears before an array syntax statement, the compiler ignores potential dependencies and suppresses the creation and use of array temporaries for that statement. Array syntax statements allow arrays to be referenced in a compact manner: either the array name, or the array name with a section subscript, specifies actions on all the elements of an array or array section without using DO loops.

Whether or not IVDEP is used, conditions other than vector dependencies can inhibit vectorization.

If a loop with an IVDEP directive is enclosed within another loop with an IVDEP directive, the IVDEP directive on the outer loop is ignored.

When the Cray compiler vectorizes a loop, it may reorder the statements in the source code to remove vector dependencies. When IVDEP is specified, the statements in the loop or array syntax statement are assumed to contain no dependencies as written, and the Cray compiler does not reorder loop statements.
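
Example: IVDEP directive with SAFEVL (illustrative sketch)

The following sketch is not taken from the compiler documentation. The names are hypothetical, and the code assumes that the caller always passes K >= 8. Under that assumption, any 8 consecutive iterations read only elements of A that were written by earlier iterations, so a vector length of 8 is safe.

```fortran
SUBROUTINE SHIFT_ADD(A, B, N, K)
  INTEGER, INTENT(IN) :: N, K
  REAL, INTENT(INOUT) :: A(N)
  REAL, INTENT(IN)    :: B(N)
  INTEGER :: I
  ! ASSUMPTION (PROGRAMMER'S PROMISE): K >= 8 AT EVERY CALL.
!DIR$ IVDEP SAFEVL=8
  DO I = K + 1, N
     A(I) = A(I-K) + B(I)
  END DO
END SUBROUTINE SHIFT_ADD
```

If K could be smaller than 8, the directive would permit incorrect vector code.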

LOOP_INFO

!DIR$ LOOP_INFO [options]

The LOOP_INFO directive allows additional information to be specified about the behavior of a loop, including run-time trip count and hints on cache allocation strategy. This information is provided to the optimizer and can produce faster code sequences. The LOOP_INFO directive supports a large number of optional arguments.

The following trip count arguments use the variable c to indicate an expression that evaluates to an integer constant at compilation time. Use these immediately before a DO loop to indicate minimum, maximum, and estimated trip counts. The compiler will diagnose misuse at compile time when able to, or when option -h dir_check is specified.

MIN_TRIPS(c)

Specifies guaranteed minimum number of trips.

EST_TRIPS(c)

Specifies estimated or average number of trips.

MAX_TRIPS(c)

Specifies guaranteed maximum number of trips.

The following cache allocation arguments use the variable symbol to indicate the base name of the object that should or should not be placed into cache. This can be the base name of any object, such as an array or scalar structure without member references. If specifying a pointer in the list, only the references, not the pointer itself, are subject to the instruction. For cache allocation hints, use the LOOP_INFO directives to override default settings, CACHE or CACHE_NT directives, or automatic cache management decisions. The cache hints are local and apply only to the specified loop nest.

CACHE(symbol[,symbol] …)

Specifies that symbol, or a comma-delimited list of symbols, is to be allocated in cache. This is the default if no hint is specified and the CACHE_NT directive is not specified.

CACHE_NT(symbol[,symbol] … )

Specifies that symbol, or a comma-delimited list of symbols, is to use non-temporal reads and writes, and not be allocated in cache.

The following optional arguments do not require variables.

PREFETCH

Specifies a preference that prefetches be performed for the following loop.

NOPREFETCH

Specifies a preference that no prefetches be performed for the following loop.

PREFER_THREAD

The PREFER_THREAD and PREFER_NOTHREAD directives are special cases of the LOOP_INFO advisory directive. Use these directives to indicate a preference for turning threading on or off for the subsequent loop. Use !DIR$ LOOP_INFO PREFER_THREAD to indicate your preference that the loop following the directive be threaded.

PREFER_NOTHREAD

Use !DIR$ LOOP_INFO PREFER_NOTHREAD to indicate that the loop should not be threaded.

The PREFETCH directive instructs the compiler to preload scalar data into the first-level cache to improve the frequency of cache hits and to lower latency. Prefetch instructions are generated in situations where the compiler expects them to improve performance. Strategic use of prefetch instructions can hide latency for scalar loads that feed vector instructions or scalar loads in purely scalar loops. Prefetch instructions are generated at default and higher levels of optimization. Thus, they are turned off at -O0 or -O1. Prefetch instructions can be turned off at the loop level by specifying the NOPREFETCH directive.
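
Example: LOOP_INFO directive (illustrative sketch)

The following sketch shows a trip count hint and a cache hint. It is not taken from the compiler documentation. The names and the trip count value are hypothetical, and each directive carries a single argument because the rules for combining arguments are not covered here.

```fortran
SUBROUTINE STREAM_SCALE(A, B, N)
  INTEGER, INTENT(IN) :: N
  REAL, INTENT(OUT)   :: A(N)
  REAL, INTENT(IN)    :: B(N)
  INTEGER :: I
  ! HINT (ASSUMPTION): THIS LOOP TYPICALLY RUNS ABOUT 1000 ITERATIONS.
!DIR$ LOOP_INFO EST_TRIPS(1000)
  DO I = 1, N
     A(I) = 2.0 * B(I)
  END DO
  ! HINT: B IS NOT REUSED AFTER THIS LOOP, SO REQUEST NON-TEMPORAL ACCESS.
!DIR$ LOOP_INFO CACHE_NT(B)
  DO I = 1, N
     A(I) = A(I) + B(I)
  END DO
END SUBROUTINE STREAM_SCALE
```

EST_TRIPS is only an estimate, so the code remains correct for any N. MIN_TRIPS and MAX_TRIPS are guarantees and must hold at run time.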

MEMORY

UNSUPPORTED FEATURE:

The Cray !DIR$ MEMORY directive is no longer supported. Users are encouraged to prepare to transition to OpenMP 5.0 allocators instead, which provide similar capabilities through standard mechanisms. HPE CCE currently provides partial functional support for OpenMP 5.0 allocators, including support for the “pinned” allocator trait when targeting an NVIDIA or AMD GPU. Support for the “high bandwidth” predefined memory space is planned for a future HPE CCE release.

NAME

!DIR$ NAME (fortran_name="external_name" [, fortran_name="external_name" ] … )

Scope: Global

The NAME directive allows the specification of a case-sensitive external name or a name that contains characters outside of the Fortran character set. The NAME directive supports the following arguments:

fortran_name

The name used for the object throughout the Fortran program.

external_name

The external form of the name.

The rules for Fortran naming do not apply to the external_name string. Any character sequence is valid.

The NAME directive can be used, for example, to write calls to C routines. The Fortran standard BIND feature provides some of the capability of the NAME directive.

Example: Calling a C routine from a Fortran program

PROGRAM MAIN
!DIR$ NAME (FOO="XyZ")
CALL FOO ! XyZ is really being called
END PROGRAM

NEXTSCALAR

!DIR$ NEXTSCALAR

The NEXTSCALAR directive disables vectorization for the first DO or DO WHILE loop following the directive. The directive applies to one loop only; the first loop that appears after the directive but within the same program unit. If the NEXTSCALAR directive appears before any array syntax statement, it disables vectorization for the array syntax statement.

NEXTSCALAR is ignored if vectorization has been disabled.
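
Example: NEXTSCALAR directive (illustrative sketch)

The following sketch keeps a very short loop scalar. It is not taken from the compiler documentation, and the names are hypothetical. Only the loop immediately after the directive is affected.

```fortran
SUBROUTINE SHORT_SUM(A, S)
  REAL, INTENT(IN)  :: A(4)
  REAL, INTENT(OUT) :: S
  INTEGER :: I
  S = 0.0
!DIR$ NEXTSCALAR
  DO I = 1, 4                ! THIS LOOP IS NOT VECTORIZED.
     S = S + A(I)
  END DO
END SUBROUTINE SHORT_SUM
```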

NOFISSION

!DIR$ NOFISSION

The NOFISSION directive instructs the compiler not to split the loop immediately following the directive. This directive should be placed immediately before the DO statement of the loop that should not be split.

Fission is prevented only for the loop level specified. Loops nested within the indicated loop remain fission candidates unless likewise annotated.

NOSIDEEFFECTS

!DIR$ NOSIDEEFFECTS f[, f … ]

The NOSIDEEFFECTS directive allows the compiler to keep information in registers across a single call to a subprogram without reloading the information from memory after returning from the subprogram. This directive is not needed for intrinsic functions or VFUNCTIONS.

This directive supports one argument:

f

Symbolic name of a subprogram that the user is sure has no side effects. f must not be the name of a dummy procedure, module procedure, or internal procedure.

NOSIDEEFFECTS declares that a called subprogram does not redefine any variables that meet the following conditions:

  • Local to the calling program

  • Passed as arguments to the subprogram

  • Accessible to the calling subprogram through host association

  • Declared in a common block or module

  • Accessible through USE association

A procedure declared NOSIDEEFFECTS should not define variables in a common block or module shared by a program unit in the calling chain. All arguments should have the INTENT(IN) attribute; that is, the procedure must not modify its arguments. If these conditions are not met, results are unpredictable.

The NOSIDEEFFECTS directive must appear in the specification part of a program unit and must appear before the first executable statement.

The compiler may move invocations of a NOSIDEEFFECTS subprogram from the body of a DO loop to the loop preamble if the arguments to that function are invariant in the loop. This may affect the results of the program, particularly if the NOSIDEEFFECTS subprogram calls functions such as the random number generator or the real-time clock.

The effects of the NOSIDEEFFECTS directive are similar to those that can be obtained by specifying the PURE prefix on a function or a subroutine declaration.
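
Example: NOSIDEEFFECTS directive (illustrative sketch)

The following sketch declares an external function free of side effects. It is not taken from the compiler documentation, and the names ACCUMULATE and WEIGHT are hypothetical. WEIGHT is an external procedure, as required, and it modifies neither its argument nor any common block or module data.

```fortran
SUBROUTINE ACCUMULATE(A, N, TOTAL)
  INTEGER, INTENT(IN) :: N
  REAL, INTENT(IN)    :: A(N)
  REAL, INTENT(OUT)   :: TOTAL
  REAL, EXTERNAL      :: WEIGHT
  ! THE DIRECTIVE APPEARS IN THE SPECIFICATION PART, BEFORE THE FIRST
  ! EXECUTABLE STATEMENT.
!DIR$ NOSIDEEFFECTS WEIGHT
  INTEGER :: I
  TOTAL = 0.0
  DO I = 1, N
     TOTAL = TOTAL + A(I) * WEIGHT(I)   ! TOTAL MAY STAY IN A REGISTER ACROSS THE CALL.
  END DO
END SUBROUTINE ACCUMULATE

REAL FUNCTION WEIGHT(I)
  INTEGER, INTENT(IN) :: I
  WEIGHT = 1.0 / REAL(I)                ! READS ITS ARGUMENT ONLY; NO SIDE EFFECTS.
END FUNCTION WEIGHT
```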

OPTIMIZE

!DIR$ OPTIMIZE [(option[ option])]

The OPTIMIZE directive enables optimization in the function in which it appears, overriding the optimization level set via the compiler command line. The OPTIMIZE directive with no option specified is equivalent to OPTIMIZE -O2.

The OPTIMIZE directive may only appear in the declarative section of a program unit. A program unit may be a program, subroutine, function, module, or submodule, but not a block data program unit. An OPTIMIZE directive does not affect any modules accessed with the USE statement in the program unit that contains the directive. It does affect contained procedures that do not include an explicit OPTIMIZE directive of their own.

The OPTIMIZE directive accepts the following subset of the command line options that control optimization. Refer to Fortran Command-line Options or the crayftn(1) man page for more detailed information.

  • -Olevel

  • -h acc

  • -h acc_model=

  • -h add_paren

  • -h [no]aggress

  • -h align_arrays

  • -h [no]autothread

  • -h [no]autoprefetch

  • -h cachen

  • -h concurrent

  • -h contiguous

  • -h contiguous_assumed_shape

  • -h flex_mp=level

  • -h fpn

  • -h fp_trap

  • -h fusionn

  • -h [no]heap_allocate

  • -h infinitevl

  • -h loop_trips

  • -h msgs

  • -h negmsgs

  • -h nointerchange

  • -h omp

  • -h overindex

  • -h page_align_allocate

  • -h [no]pattern

  • -h preferred_vector_width=

  • -h scalarn

  • -h shortcircuitlevel

  • -h threadn

  • -h unrolln

  • -h vectorn

  • -h zero
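
Example: OPTIMIZE directive (illustrative sketch)

The following sketch lowers the optimization level for a single subroutine and disables pattern matching in it. It is not taken from the compiler documentation. The subroutine name is hypothetical, and the way the options are written inside the parentheses is an assumption based on the syntax line and the option list above. Consult the crayftn(1) man page for the authoritative form.

```fortran
SUBROUTINE CAREFUL(A, N)
  ! ASSUMED OPTION SPELLING; APPLIES TO THIS SUBROUTINE ONLY.
!DIR$ OPTIMIZE (-O1 -h nopattern)
  INTEGER, INTENT(IN) :: N
  REAL, INTENT(INOUT) :: A(N)
  A = 0.5 * A
END SUBROUTINE CAREFUL
```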

PATTERN, NOPATTERN

!DIR$ PATTERN

!DIR$ NOPATTERN

By default, the compiler detects coding patterns in source code sequences and replaces these sequences with calls to optimized library routines. In most cases, this replacement improves performance. There are cases, however, in which this substitution degrades performance. This can occur, for example, in loops with very low trip counts.

The NOPATTERN directive disables pattern matching and causes the compiler to generate inline code for the loop immediately following the directive. When the NOPATTERN directive is encountered, pattern matching is suspended for the remainder of the program unit or until a PATTERN directive is encountered.

The PATTERN directive is used to resume pattern matching within the program.

When the -O nopattern command line option is in effect, the NOPATTERN and PATTERN compiler directives are ignored.

In the following example, the compiler normally would detect that the loop is a matrix multiply and replace it with a call to a matrix multiply library routine. By preceding the loop with a NOPATTERN directive, however, pattern matching is inhibited and no replacement is done.

!DIR$ NOPATTERN
       DO k= 1,n
         DO i= 1,n
           DO j= 1,m
             A(i,j) = A(i,j) + B(i,k) * C(k,j)
           END DO
         END DO
       END DO

PERMUTATION

!DIR$ PERMUTATION (symbol [, symbol] … )

symbol

Integer array that has no repeated values for the entire routine.

The PERMUTATION directive specifies that an integer array has no repeated values. This directive is useful when the integer array is used as a subscript for another array (vector-valued subscript). This directive may improve code performance.

The PERMUTATION directive is not a loop-based directive. It applies to the entire enclosing routine, regardless of where it is placed.

In a sequence of array accesses that read array element values from the specified symbols with no intervening accesses that modify the array element values, each of the accessed elements will have a distinct value.

When an array element that has a subscript that is an element of an integer array with a subscript that depends on the loop index is on the left side of the equal sign in a loop, many-to-one assignment is possible. Many-to-one assignment occurs if any repeated elements exist in the subscripting array. If it is known that the integer array is used merely to permute the elements of the subscripted array, it can often be determined that many-to-one assignment does not exist with that array reference.

Sometimes a vector-valued subscript is used as a means of indirect addressing because the elements of interest in an array are sparsely distributed; in this case, an integer array is used to select only the desired elements, and no repeated elements exist in the integer array. In the following example, the PERMUTATION directive does not apply to the array A. Rather, it applies to the integer array used to index into it, IPNT. By knowing that IPNT is a permutation, the compiler can safely generate an unordered scatter for the write to A.

!DIR$ PERMUTATION(IPNT) ! IPNT has no repeated values
      ...
      DO I = 1, N
         A(IPNT(I)) = B(I) + C(I)
      END DO

PGAS BUFFERED_ASYNC

!DIR$ PGAS BUFFERED_ASYNC

The PGAS BUFFERED_ASYNC directive batches PGAS operations into bulk data transfers. PGAS data references made by the single statement immediately following the PGAS BUFFERED_ASYNC directive will be batched into bulk data transfers.

Before using this directive, the user should port the code to use the PGAS DEFER_SYNC directive.

No ordering or correctness guarantees between buffered async (BA) and non-BA references are made. No ordering guarantees between BA references are made. Users should insert a fence or barrier if they require ordering guarantees. This directive will allow the compiler to violate language ordering semantics.

Both fence and barrier imply global visibility for BA references. It is the user’s responsibility to ensure BA references do not target overlapping memory.

No automatic progress guarantees are made. The only way to guarantee progress is if both the source and target are actively making BA references or are inside a barrier/fence. User implemented spin-wait routines may encounter deadlock.

The purpose of the BUFFERED_ASYNC directive is to achieve higher performance by batching small references into bulk data transfers. This should only be applied to references targeting non-contiguous irregular memory where the compiler is unable to pattern match to an optimized communication pattern.

Care should be taken to ensure many thousands of BA operations take place before a fence. There is overhead added to achieve bulk data transfers. Using BA references may greatly increase the application’s memory footprint.
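
Example: PGAS BUFFERED_ASYNC directive with coarrays (illustrative sketch)

The following sketch batches many small remote writes and then uses a barrier to make them visible. It is not taken from the compiler documentation. The names are hypothetical, and the code assumes that no two iterations target the same element on the same image. Callers need an explicit interface because X is a coarray dummy argument.

```fortran
subroutine scatter_put( x, idx, img, v, n )
    integer, intent(in) :: n, idx(n), img(n), v(n)
    integer :: x(*)[*]
    integer :: i
    ! Assumption: the (idx(i), img(i)) pairs are all distinct, so the
    ! buffered references never target overlapping memory.
    do i = 1, n
        !dir$ pgas buffered_async
        x(idx(i))[img(i)] = v(i)
    end do
    sync all    ! barrier: the buffered writes become globally visible
end subroutine
```

As noted above, the batching pays off only when many thousands of references occur before the barrier.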

PGAS DEFER_SYNC

!DIR$ PGAS DEFER_SYNC

The PGAS DEFER_SYNC directive defers the synchronization of PGAS data. PGAS data references made by the single statement immediately following the PGAS DEFER_SYNC directive will not be synchronized until the next fence instruction.

The compiler normally synchronizes the references in a statement as late as possible without violating program semantics. The purpose of the DEFER_SYNC directive is to synchronize the references even later, beyond where the compiler can determine it is safe.

For example, if there is a remote-memory access (RMA) put near the end of a subroutine, the compiler must guard against the put value being read back immediately after the subroutine returns, so the put is synchronized just before returning. The programmer, however, may know that the value is not read back and can insert a PGAS DEFER_SYNC directive.

Example: Coarray Fortran

subroutine my_put( x, image, value )
    integer :: x[*], image, value
    !dir$ pgas defer_sync
    x[image] = value
end subroutine

PIPELINE, NOPIPELINE

!DIR$ PIPELINE

!DIR$ NOPIPELINE

Software-based vector pipelining (software vector pipelining) provides additional optimization beyond the normal hardware-based vector pipelining. In software vector pipelining, the compiler analyzes all vector loops and automatically attempts to pipeline a loop if doing so can be expected to produce a significant performance gain. This optimization also performs any necessary loop unrolling.

In some cases the compiler either does not pipeline a loop that could be pipelined or pipelines a loop without producing performance gains. In these situations, use the PIPELINE or NOPIPELINE directive to advise the compiler to pipeline or not pipeline the loop immediately following the directive.

Software vector pipelining is valid only for the innermost loop of a loop nest. These directives are advisory only. While the NOPIPELINE directive can be used to inhibit automatic pipelining, and the PIPELINE directive can be used to attempt to override the compiler’s decision not to pipeline a loop, the compiler cannot be forced to pipeline a loop that cannot be pipelined.

Vector loops that have been pipelined generate compile-time messages to that effect, if optimization messaging is enabled (-h msgs).
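
As a minimal sketch of directive placement, the following self-contained program applies NOPIPELINE to one innermost loop and PIPELINE to another. The program name, array names, and sizes are illustrative assumptions, not taken from the compiler documentation. Compilers that do not recognize !DIR$ lines treat them as comments, and both directives remain advisory.

```fortran
program pipeline_sketch
    implicit none
    integer, parameter :: n = 1000
    real :: a(n), b(n), c(n)
    integer :: i

    b = 1.0
    c = 2.0

    ! Advise the compiler not to pipeline the loop that follows.
!dir$ nopipeline
    do i = 1, n
        a(i) = b(i) + c(i)
    end do

    ! Advise the compiler to attempt pipelining of this innermost loop.
!dir$ pipeline
    do i = 1, n
        a(i) = a(i) * b(i) + c(i)
    end do

    print *, a(1), a(n)
end program pipeline_sketch
```

Compiling with optimization messaging enabled (-h msgs) is one way to check whether the compiler honored either hint.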

PREFERVECTOR

!DIR$ PREFERVECTOR

Directs the compiler to vectorize the loop immediately following the directive if the loop nest contains more than one loop that can be vectorized. The directive states a vectorization preference and does not guarantee that the loop has no memory-dependence hazard.

In the following example, both loops can be vectorized, but the compiler generates vector code for the outer DO I loop:

!DIR$ PREFERVECTOR
      DO I = 1, N
        DO J = 1, M
          A(I) = A(I) + B(J,I)
        END DO
      END DO

PREFETCH

!DIR$ PREFETCH [([lines(num)][, level(num)] [, write][, nt])] var[, var] …

PREFETCH is a general directive that instructs the compiler to generate explicit prefetch instructions to load data from memory into cache prior to read or write access.

The PREFETCH directive supports the following options:

lines(num)

Specifies the number of cache lines to be prefetched. num is an expression that evaluates to an integer constant at compilation time. By default, the number of cache lines prefetched is 1.

level(num)

Specifies the level of cache into which data is loaded. num is an expression that evaluates to an integer constant at compilation time. The cache level defaults to 1, the level closest to the processing unit.

write

Specifies that the prefetch is for data to be written. When data is to be written, a prefetch instruction can move a block into the cache so that the expected store will be to the cache. Prefetch for write generally brings the data into the cache in an exclusive or modified state. By default, the prefetch is for data to be read. If the target architecture does not support prefetch for write, the prefetch will automatically become a prefetch for read.

nt

Specifies that the prefetch is for non-temporal data. By default, the prefetch is for temporal data. Data with temporal locality (persistence) is expected to be accessed multiple times.

var

The memory location to be prefetched, which can be any valid variable, member, or array element reference.

The compiler issues the prefetch instruction when it encounters the PREFETCH directive. The directive allows the user to influence almost every aspect of prefetch behavior. The default behavior prefetches one cache line, into L1 cache, for read access, and assumes temporal locality.

The PREFETCH directive can be used inside and outside of loops, in a loop preamble, or before a function call to reduce cache-miss memory latency.

The compiler will attempt to avoid multiple prefetches to the same cache line, which can be created as a result of optimization.

All variables specified on the same PREFETCH directive line share the same behavior. If different behavior is needed for different variables, use multiple PREFETCH directive lines.

The general PREFETCH directive supersedes the effects of any relevant loop_info [no]prefetch directives and the -h [no]autoprefetch command line option.

The Cray Fortran compiler command line option -x prefetch can be used to disable all general PREFETCH directives in Fortran source code.

Example: PREFETCH directive

real*8 a(m,n), b(n,p), c(m,p), arow(n)
...
do j = 1, p
!dir$ prefetch (lines(3), nt) arow(1),b(1,j)
    do k = 1, n, 4
!dir$ prefetch (nt) arow(k+24),b(k+24,j)
        c(i,j) = c(i,j) + arow(k) * b(k,j)
        c(i,j) = c(i,j) + arow(k+1) * b(k+1,j)
        c(i,j) = c(i,j) + arow(k+2) * b(k+2,j)
        c(i,j) = c(i,j) + arow(k+3) * b(k+3,j)
    enddo
enddo

PREPROCESS

!DIR$ PREPROCESS [expand_macros]

The PREPROCESS directive allows an include file to be preprocessed when the compilation does not specify the preprocessing command line option. This directive does not cause preprocessing of included files, unless they too use the directive. If the preprocessing command line option is used, preprocessing occurs normally for all files.

To take effect, the directive must be the first line in the include file and in each included file that needs to be preprocessed.

The PREPROCESS directive supports this option:

expand_macros

The optional expand_macros clause allows the compiler to expand all macros within the include files. Without this clause, macro expansion occurs only within preprocessing directives.

PROBABILITY, PROBABILITY_ALMOST_ALWAYS, PROBABILITY_ALMOST_NEVER

!DIR$ PROBABILITY const

!DIR$ PROBABILITY_ALMOST_ALWAYS

!DIR$ PROBABILITY_ALMOST_NEVER

The probability directives specify information used by interprocedural analysis (IPA) and the optimizer to produce faster code sequences. The specified probability is a hint, rather than a statement of fact. This information is used to guide inlining decisions, branch elimination optimizations, branch hint marking, and the choice of the optimal algorithmic approach to the vectorization of conditional code. These directives can appear anywhere executable code is legal. Each directive applies to the block of code where it appears. The directive should not be applied to a conditional test directly; rather, it should be used to indicate the relative probability of a THEN or ELSE code block being executed.

The PROBABILITY directive supports one argument.

const

Expression that evaluates to a floating point constant at compilation time. (0.0 <= const <= 1.0.)

Specify almost_never and almost_always by using the probability const values 0.0 and 1.0, respectively.

This example states that the probability of entering the block of code with the assignment statement is 0.3 or 30%. This also means that A(I) is expected to be greater than B(I) 30% of the time. Note that the probability directive appears within the conditional block of code, rather than before it. This removes some of the ambiguity that has plagued other implementations that tie the directive directly to the conditional code.

      IF ( A(I) > B(I) ) THEN
!DIR$ PROBABILITY 0.3
         A(I) = B(I)
      ENDIF

For vector IF code, a very low probability (<0.1) or PROBABILITY_ALMOST_NEVER causes the compiler to use the vector gather/scatter methods used for sparse IF vector code instead of the vector merge methods used for denser IF code. For example:

      DO I = 1,N
         IF ( A(I) > 0.0 ) THEN
!DIR$ PROBABILITY_ALMOST_NEVER
            B(I) = B(I)/A(I) + A(I)/B(I) ! EVALUATE USING
                                         ! SPARSE METHODS
         ENDIF
      ENDDO

Note that the PROBABILITY directive appears within the conditional, rather than before the condition. This removes some of the ambiguity of tying the directive directly to the conditional test.

SAFE_ADDRESS

!DIR$ SAFE_ADDRESS

Scope: Local

Specifies that it is safe to speculatively execute memory references within all conditional branches of a loop; these memory references can be safely executed in each iteration of the loop. For most code, this directive can improve performance significantly by preloading vector expressions. However, most loops do not require this directive to have preloading performed. SAFE_ADDRESS is required only when the safety of the operation cannot be determined or index expressions are very complicated.

The SAFE_ADDRESS directive is an advisory directive. That is, the compiler may override the directive if it determines the directive is not beneficial. If the directive is not used on a loop and the compiler determines that it would benefit from the directive, it issues a message indicating such. The message is similar to this:

DO I = 1,N
FTN-6375 FTN_DRIVER.EXE: VECTOR X7, FILE = 10928.F, LINE = 110
  A LOOP STARTING AT LINE 110 WOULD BENEFIT FROM "!DIR$ SAFE_ADDRESS"

If the directive is used on a loop and the compiler determines that the loop does not benefit from it, the compiler issues a message that states the directive is superfluous and can be removed. To see the messages, use the -O msgs option.

Incorrect use of the directive can result in segmentation faults, bus errors, or excessive page faulting. However, it should not result in incorrect answers. Incorrect usage can result in very severe performance degradations or program aborts.

In this example, the compiler will not preload vector expressions, because the value of J is unknown. However, if it is known that references to B(I,J) are safe to evaluate for all iterations of the loop, regardless of the condition, the SAFE_ADDRESS directive can be used. With the directive, the compiler can load B(I,J) with a full vector mask, merge 0.0 where the condition is true, and store the resulting vector using a full mask.

SUBROUTINE X3( A, B, N, M, J )
REAL A(N), B(N,M)

!DIR$ SAFE_ADDRESS
DO I = 1,64            ! VECTORIZED LOOP
   IF ( A(I).NE.0.0 ) THEN
      B(I,J) = 0.0     ! VALUE OF 'J' IS UNKNOWN
   ENDIF
ENDDO
END

SAFE_CONDITIONAL

!DIR$ SAFE_CONDITIONAL

The SAFE_CONDITIONAL directive specifies that it is safe to execute all memory references and arithmetic operations within all conditional branches of the subsequent scalar or vector loop nest. It can improve performance by allowing the hoisting of invariant expressions from conditional code and by allowing prefetching of memory references.

The SAFE_CONDITIONAL directive is an advisory directive. The compiler may override the directive if it determines the directive is not beneficial.

Incorrect use of the directive can result in segmentation faults, bus errors, excessive page faulting, or arithmetic aborts. However, it should not result in incorrect answers. Incorrect usage can result in severe performance degradations or program aborts.

In the following example, the compiler cannot precompute the invariant expression s1*s2 because these values are unknown and may cause an arithmetic trap if executed unconditionally. However, if the condition is known to be true at least once, then it is safe to use the SAFE_CONDITIONAL directive and execute s1*s2 speculatively. With the directive, the compiler evaluates s1*s2 outside of the loop, rather than under control of the conditional code. In addition, all control flow is removed from the body of the vector loop as s1*s2 no longer poses a safety risk.

SUBROUTINE SAFE_COND( A, N, S1, S2 )
    REAL A(N), S1, S2

!DIR$ SAFE_CONDITIONAL
    DO I = 1,N
        IF ( A(I) /= 0.0 ) THEN
            A(I) = A(I) + S1*S2
        ENDIF
    ENDDO
    END

SAME_TBS

!DIR$ SAME_TBS (array, array[, array])

The SAME_TBS directive informs the compiler that the specified assumed-shape arrays are of the same rank and type, and that they have identical low-bound, extent, and stride multiplier information for corresponding dimensions.

This information allows the compiler to generate more efficient code by reducing the number of potentially distinct intermediate values required for array element accesses. This may offer significant execution performance improvement when using assumed-shape dummy arrays of corresponding type, low-bound, extent, and stride.

The SAME_TBS directive supports this option:

array

Two or more array arguments are required. array is the name of an assumed-shape dummy array. The arrays specified must not have the TARGET attribute. All arrays specified on a single SAME_TBS directive must have the same element type, bounds, and strides. Use the -Rd command line option to verify that the arrays have the same element type, bounds, and strides.

The SAME_TBS directive applies only to the program unit in which it appears.

Ordinarily, for multidimensional assumed-shape arrays, some classes of loop optimizations cannot be performed when an assumed-shape dummy array is referenced or defined in a loop or an array assignment statement. The lost optimizations and the additional copy operations performed can significantly reduce the performance of a procedure that uses assumed-shape dummy arrays when compared to an equivalent procedure that uses explicit-shape array dummy arguments. This directive may provide significant performance improvement depending on certain factors such as greater numbers of assumed-shape arrays and smaller array sizes.
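
The following sketch shows one way the directive might be applied to three assumed-shape dummy arrays. The module, procedure, and array names are hypothetical, and placing the directive after the declarations is an assumption rather than a documented requirement. The caller passes whole arrays of identical shape, so the asserted bounds and strides actually match.

```fortran
module same_tbs_sketch
    implicit none
contains
    subroutine add_fields(a, b, c)
        real, intent(out) :: a(:,:)
        real, intent(in)  :: b(:,:), c(:,:)
        integer :: i, j
        ! Assert that a, b, and c share rank, type, bounds, and strides.
!dir$ same_tbs (a, b, c)

        do j = 1, size(a, 2)
            do i = 1, size(a, 1)
                a(i,j) = b(i,j) + c(i,j)
            end do
        end do
    end subroutine add_fields
end module same_tbs_sketch

program demo
    use same_tbs_sketch
    implicit none
    real :: x(4,3), y(4,3), z(4,3)

    y = 1.0
    z = 2.0
    ! All three actual arguments are contiguous arrays of the same shape.
    call add_fields(x, y, z)
    print *, x(1,1), x(4,3)
end program demo
```

If a caller could pass array sections with differing strides, the assertion would no longer hold and the directive should not be used.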

STACK

!DIR$ STACK

The STACK directive causes storage to be allocated to the stack in the program unit that contains the directive. This directive overrides the -ev command line option in specific program units of a compilation unit.

Data specified in the specification part of a module or in a DATA statement is always allocated to static storage. This directive has no effect on static storage allocation.

All SAVE statements are honored in program units that also contain a STACK directive. This directive does not override the SAVE statement.

If the compiler finds a STACK directive and a SAVE statement without any objects specified in the same program unit, a warning message is issued.

The following rules apply when using this directive:

  • It must be specified within the scope of a program unit.

  • If it is specified in the specification part of a module, a message is issued. The STACK directive is allowed in the scope of a module procedure.

  • If it is specified within the scope of an interface body, a message is issued.
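
A minimal sketch of the rules above, using hypothetical names: the directive is placed inside a module procedure (which is allowed) rather than in the module's specification part, and a variable with the SAVE attribute keeps its static storage. Placing the directive immediately after the SUBROUTINE statement is an assumption made for illustration.

```fortran
module stack_sketch
    implicit none
contains
    subroutine running_sum(x, total)
        ! Request stack allocation for this procedure's local storage.
!dir$ stack
        real, intent(in)  :: x(:)
        real, intent(out) :: total
        integer, save :: ncalls = 0   ! SAVE is honored, so this stays static
        real :: partial               ! ordinary local variable
        integer :: i

        ncalls = ncalls + 1
        partial = 0.0
        do i = 1, size(x)
            partial = partial + x(i)
        end do
        total = partial
        print *, 'call number', ncalls
    end subroutine running_sum
end module stack_sketch

program demo
    use stack_sketch
    implicit none
    real :: t

    call running_sum([1.0, 2.0, 3.0], t)
    call running_sum([4.0, 5.0], t)
    print *, t
end program demo
```

Because ncalls is saved, the second call reports call number 2 regardless of the STACK directive.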

SUPPRESS

!DIR$ SUPPRESS [var [,var] … ]

Scope: Local and Global

The SUPPRESS directive suppresses scalar optimization for all variables or only for those specified at the point where the directive appears. This often prevents or adversely affects vectorization of any loop that contains SUPPRESS.

The SUPPRESS directive supports an optional comma-delimited list of variables.

var

Variable that is to be stored in memory. If more than one variable is specified, use a comma to separate the variables. If no variables are listed, all variables in the program unit are stored.

At the point at which !DIR$ SUPPRESS appears in the source code, variables in registers are stored to memory (to be read out at their next reference), and expressions containing any of the affected variables are recomputed at their next reference after !DIR$ SUPPRESS. The effect on optimization is equivalent to that of an external subroutine call with an argument list that includes the variables specified by !DIR$ SUPPRESS (or, if no variable list is included, all variables in the program unit).

Example: SUPPRESS directive

Below is an example of the SUPPRESS directive used with an IF statement. The directive takes effect only if it is on an execution path. Optimization proceeds normally if the directive path is not executed because of a GOTO or IF. In this example, optimization replaces the reference to A in the PRINT statement with the constant 1.0, even though !DIR$ SUPPRESS appears between A=1.0 and the PRINT statement. The IF statement can cause the execution path to bypass !DIR$ SUPPRESS. If SUPPRESS appears before the IF statement, A in PRINT * is not replaced by the constant 1.0.

SUBROUTINE SUB (L)
    LOGICAL L
    A = 1.0 ! A is local
    IF (L) THEN
!DIR$ SUPPRESS ! Has no effect if L is false
        CALL ROUTINE()
    ELSE
        PRINT *, A
    END IF
    END

UNROLL, NOUNROLL

!DIR$ UNROLL [n]

!DIR$ NOUNROLL

Scope: Local

The UNROLL directive allows the user to control unrolling for individual loops or to specify no unrolling of a loop. Loop unrolling can improve program performance by revealing cross-iteration memory optimization opportunities such as read-after-write and read-after-read.

The UNROLL directive supports one argument:

n

Where n specifies either no loop unrolling (n = 0 or 1) or the total number of loop body copies to be generated (2 <= n <= 63).

If a value for n is not specified, the compiler will determine the number of copies to generate based on the number of statements in the loop nest.

The NOUNROLL directive disables loop unrolling for the next loop. This is equivalent to specifying UNROLL 0 or UNROLL 1.

The UNROLL directive can be used only on loops with iteration counts that can be calculated before entering the loop. If UNROLL is specified on a loop that is not the innermost loop in a loop nest, the inner loops must be nested perfectly. That is, each loop in the nest can contain only one loop, and only the innermost loop can contain work. Note that the compiler cannot always safely unroll non-innermost loops due to data dependencies. In these cases, this directive is ignored.

The advantages of loop unrolling include:

  • Improved loop scheduling by increasing basic block size

  • Reduced loop overhead

  • Improved chances for cache hits

Example 1: Unroll outer loops

In the following example, assume that the outer loop of the nest will be unrolled by 2:

!DIR$ UNROLL 2
         DO I = 1, 10
             DO J = 1,100
                     A(J,I) = B(J,I) + 1
             END DO
         END DO

With outer loop unrolling, the compiler produces the following nest, in which the two bodies of the inner loop are adjacent:

DO I = 1, 10, 2
     DO J = 1,100
             A(J,I) = B(J,I) + 1
     END DO
     DO J = 1,100
             A(J,I+1) = B(J,I+1) + 1
     END DO
END DO

The compiler jams, or fuses, the inner two loop bodies together, producing the following nest:

DO I = 1, 10, 2
     DO J = 1,100
             A(J,I) = B(J,I) + 1
             A(J,I+1) = B(J,I+1) + 1
     END DO
END DO

Example 2: Illegal unrolling of outer loops

Outer loop unrolling is not always legal because the transformation can change the semantics of the original program. For example, unrolling the following loop nest on the outer loop would change the program semantics because of the dependency between A(...,I) and A(...,I+1).

!DIR$ UNROLL 2
         DO I = 1, 10
             DO J = 1,100
                A(J,I) = A(J-1,I+1) + 1
             END DO
         END DO

Example 3: Unroll nearest neighbor pattern

The following example shows unrolling with a nearest neighbor pattern. This allows register reuse of B(I,J+1) and reduces loads of B from 2 to 1.5 per original iteration.

!DIR$ UNROLL 2
      DO J = 1,N
         DO I = 1,N      ! VECTORIZE
            A(I,J) = B(I,J) + B(I,J+1)
         ENDDO
      ENDDO

The preceding code fragment is converted to the following code:

DO J = 1,N,2       ! UNROLLED FOR REUSE OF B(I,J+1)
   DO I = 1,N      ! VECTORIZED
      A(I,J) = B(I,J) + B(I,J+1)
      A(I,J+1) = B(I,J+1) + B(I,J+2)
   END DO
END DO

VECTOR, NOVECTOR

!DIR$ VECTOR [clause [,clause] … ]

!DIR$ NOVECTOR

The VECTOR and NOVECTOR directives apply only to the next loop.

The NOVECTOR directive suppresses compiler attempts to vectorize loops and array syntax statements. It overrides any other vectorization-related directives, as well as the -O vectorn command line option. This directive is ignored if vectorization or scalar optimization has been disabled.

The VECTOR directive supports the following optional clauses:

ALWAYS

Vectorize the loop that immediately follows the directive. This directive states a vectorization preference and does not guarantee that the loop has no memory-dependence hazard. This directive has the same effect as the PREFERVECTOR directive.

ALIGNED

Directs the compiler to generate aligned data movement instructions for array references when vectorizing. For current Intel processors, data alignment is necessary for efficient vectorization. Use with care to improve performance. If some of the access patterns are actually unaligned, using the ALIGNED clause may generate incorrect code. This clause also directs the compiler to ignore explicit and implicit vector dependencies.

UNALIGNED

Directs the compiler to generate unaligned data movement instructions for all array references when vectorizing.

Differences between HPE CCE versions

Prior to CCE 9.0, the NOVECTOR directive applied to the rest of the program unit unless subsequently superseded by a VECTOR directive. The NOVECTOR and VECTOR directives behaved as toggle switches, controlling vectorization for the remainder of the program unit unless superseded by the countervailing directive.

Beginning with CCE 9.0, the VECTOR and NOVECTOR directives apply only to the next loop. The HPE Cray Fortran -h vector_classic command line option restores the pre-CCE 9.0 behavior.
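
The next-loop scoping can be illustrated with a short sketch. The program and variable names are illustrative assumptions. Under the CCE 9.0 and later behavior, only the first loop is affected by NOVECTOR and the second loop remains a vectorization candidate.

```fortran
program vector_sketch
    implicit none
    integer, parameter :: n = 1024
    real :: a(n), b(n)
    integer :: i

    b = 1.0

    ! Suppress vectorization of the next loop only.
!dir$ novector
    do i = 1, n
        a(i) = 2.0 * b(i)
    end do

    ! Not covered by the NOVECTOR above (CCE 9.0 and later behavior).
    do i = 1, n
        b(i) = a(i) + 1.0
    end do

    print *, a(n), b(n)
end program vector_sketch
```

With -h vector_classic, the same NOVECTOR would instead apply to both loops, because no VECTOR directive supersedes it.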

WEAK

!DIR$ WEAK procedure_name [,procedure_name] …

!DIR$ WEAK procedure_name=stub_name [,procedure_name1=stub_name1] …

Scope: Global

The WEAK directive specifies an external identifier that may remain unresolved throughout the compilation. The WEAK directive supports the following arguments:

procedure_name

A weak object in the form of a variable or procedure.

stub_name

A stub procedure that exists in the code. The stub_name will be called if a strong reference does not exist for procedure_name. The stub_name procedure must have the same name and dummy argument list as procedure_name.

A weak external does not increase the total memory requirements of a program. The WEAK directive can prevent the compiler driver from adding the binary to a program, resulting in a smaller program and less use of memory.

The first form of the directive allows the declaration of one or more weak references on one line.

The second form allows the assigning of a strong reference to a weak reference.

Declaring an object as a weak external directs the linker to do one of these tasks:

  • Link the object if it is already linked. That is, if a strong reference already exists or is defined by the program, that reference will be used.

  • If a strong reference is specified in the weak directive (second form), and a reference is not defined by the program, then the strong reference is assigned to the weak reference.

  • If no strong reference exists, the object is left as an unsatisfied external. The linker does not display an unsatisfied external message for unresolved weak references.

Note that the linker treats weak externals as unsatisfied externals, so they remain silently unresolved if no strong reference occurs during compilation. Thus, it is the developer’s responsibility to ensure that run time references to weak external names do not occur unless the linker (using some “strong” reference elsewhere) has actually linked the entry point in question.

The attributes that weak externals must have depend on the form of the weak directive used:

  • First form, weak externals must be declared, but not defined or initialized, in the source file.

  • Second form, weak externals may be declared, but not defined or initialized, in the source file.

  • Either form, weak externals cannot be declared with a static storage class.

Source Preprocessing

Source preprocessing helps port a program from one platform to another by allowing source text to be specified that is platform specific.

For a source file to be preprocessed automatically, it must have an uppercase extension, either .F or .FOR (for a file in fixed source form), or .F90, .F95, .F03, .F08, .F18, or .FTN (for a file in free source form). To specify preprocessing of source files with other extensions, including lowercase ones, use the -eP or -eZ options described in Command Line Options.

General Rules

Alter the source code through source preprocessing directives. These directives are fully explained below in Directives. The directives must be used according to the following rules:

  • Do not use source preprocessor (#) directives within multiline compiler directives (CDIR$, !DIR$, C$OMP, or !$OMP).

  • A source file that contains an #if directive cannot be included without a balancing #endif directive within the same file.

  • The #if directive includes the #ifdef and #ifndef directives.

  • If a directive is too long for one source line, the backslash character (\) is used to continue the directive on successive lines. Successive lines of the directive can begin in any column.

  • The backslash character (\) can appear in any location within a directive in which white space can occur. A backslash character (\) in a comment is treated as a comment character. It is not recognized as signaling continuation.

  • Every directive begins with the pound character (#), and the pound character (#) must be in column 1.

  • Blank and tab (HT) characters can appear between the pound character (#) and the directive keyword.

  • Form feed (FF) or vertical tab (VT) characters cannot be used to separate tokens on a directive line. That is, a source preprocessing line must be continued, by using a backslash character (\), if it spans source lines.

  • Blanks are significant, so the use of spaces within a source preprocessing directive is independent of the source form of the file. The fields of a source preprocessing directive must be separated by blank or tab (HT) characters.

  • Any user-specified identifier that is used in a directive must follow Fortran rules for identifier formation. The exceptions to this rule are as follows:

    • The first character in a source preprocessing name (a macro name) can be an underscore character (_).

    • Source preprocessing names are significant in their first 132 characters whereas a typical Fortran identifier is significant only in its first 63 characters.

  • Source preprocessing identifier names are case sensitive.

  • Numeric literal constants must be integer literal constants or real literal constants, as defined for Fortran.

  • Comments written in the style of the C language, beginning with /* and ending with */, can appear anywhere within a source preprocessing directive in which blanks or tabs can appear. The comment, however, must begin and end on a single source line.

  • Directive syntax allows an identifier to contain the ! character. Therefore, placing the ! character to start a Fortran comment on the same line as the directive should be avoided.
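
Several of these rules can be seen together in a short sketch. The macro names NX and NY and the program name are hypothetical, and the file is assumed to be saved with an uppercase extension such as .F90 so that it is preprocessed. Each # is in column 1, the #if expression is continued with backslashes, the continuation lines begin in different columns, and no Fortran ! comment is placed on a directive line.

```fortran
program continuation_sketch
    implicit none
#define NX 64
#define NY 32
#if defined(NX) && \
        defined(NY) && \
    (NX * NY) > 1024
    print *, 'large grid selected at preprocessing time'
#else
    print *, 'small grid selected at preprocessing time'
#endif
end program continuation_sketch
```

Because 64 * 32 exceeds 1024, only the first PRINT statement survives preprocessing.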

Directives

The blanks shown in the syntax descriptions of the source preprocessing directives are significant. The tab character (HT) can be used in place of a blank. Multiple blanks can appear wherever a single blank appears in a syntax description.

  • #include Directive

    The #include directive directs the system to use the content of a file. Just as with the INCLUDE line path processing defined by the Fortran standard, an #include directive effectively replaces that directive line by the content of filename. This directive has the following formats:

    #include "filename"
    #include <filename>
    

    filename

     A file or directory to be used.
    
     In the first form, if filename does not begin with a slash (/) character, the system searches for the named file, first in the directory of the file containing the `#include` directive, then in the sequence of directories specified by the `-I` option(s) on the `ftn` command line, and then the standard (default) sequence. If filename begins with a slash (/) character, it is used as is and is assumed to be the full path to the file.
    
     The second form directs the search to begin in the sequence of directories specified by the `-I` option(s) on the `ftn` command line and then search the standard (default) sequence.
    

    The Fortran standard prohibits recursion in INCLUDE files, so recursion is also prohibited in the #include form.

    The #include directives can be nested.

    When the compiler is invoked to do only source preprocessing, not compilation, text will be included by #include directives but not by Fortran INCLUDE lines.

  • #define Directive

    The #define directive declares a variable name and assigns a value to the variable. It also allows the definition of a function-like macro. This directive has the following formats:

    #define identifier value
    #define identifier(dummy_arg_list) value
    

    The first format defines an object-like macro (also called a source preprocessing variable), and the second defines a function-like macro. In the second format, the left parenthesis that begins the dummy_arg_list must immediately follow the identifier, with no intervening white space.

    identifier

     The name of the variable or macro being defined.
    
     Rules for Fortran variable names apply; that is, the name cannot have a leading underscore character (_). For example, `ORIG` is a valid name, but `_ORIG` is invalid.
    

    dummy_arg_list

     A list of dummy argument identifiers.
    

    value

     The value is a sequence of tokens. The value can be continued onto more than one line using backslash (\) characters.
    

    If a preprocessor identifier appears in a subsequent #define directive without being the subject of an intervening #undef directive, and the value in the second #define directive is different from the value in the first #define directive, then the preprocessor issues a warning message about the redefinition. The second directive’s value is used.

    When an object-like macro’s identifier is encountered as a token in the source file, it is replaced with the value specified in the macro’s definition. This is referred to as an invocation of the macro.

    The invocation of a function-like macro is more complicated. It consists of the macro’s identifier, immediately followed by a left parenthesis with no intervening white space, then a list of actual arguments separated by commas, and finally a terminating right parenthesis. There must be the same number of actual arguments in the invocation as there are dummy arguments in the #define directive. Each actual argument must be balanced in terms of any internal parentheses. The invocation is replaced with the value given in the macro’s definition, with each occurrence of any dummy argument in the definition replaced with the corresponding actual argument in the invocation.

    For example, the following program prints Hello, world. when compiled and run:

          PROGRAM P
    #define GREETING 'Hello, world.'
          PRINT *, GREETING
          END PROGRAM P
    

    The following program also prints Hello, world. when compiled and run:

          PROGRAM P
    #define GREETING(str1, str2) str1, str2
          PRINT *, GREETING('Hello, ', 'world.')
          END PROGRAM P
    
  • #undef Directive

    The #undef directive sets the definition state of identifier to an undefined value. If identifier is not currently defined, the #undef directive has no effect. This directive has the following format:

    #undef identifier

    identifier

     The name of the variable or macro being undefined.
    
  • # (Null) Directive

    The null directive simply consists of the pound character (#) in column 1 with no significant characters following it. That is, the remainder of the line is typically blank or is a source preprocessing comment. This directive is generally used for spacing out other directive lines.

  • Conditional Directives

    Conditional directives cause lines of code to either be produced by the source preprocessor or to be skipped. The conditional directives within a source file form if-groups. An if-group begins with an #if, #ifdef, or #ifndef directive, followed by lines of source code that may or may not be skipped. Several similarities exist between the Fortran IF construct and if-groups:

    • The #elif directive corresponds to the ELSE IF statement.

    • The #else directive corresponds to the ELSE statement.

    • Just as an IF construct must be terminated with an END IF statement, an if-group must be terminated with an #endif directive.

    • Just as with an IF construct, any of the blocks of source statements in an if-group can be empty. For example, the following directives can be written:

      #if MIN_VALUE == 1
      #else
        ...
      #endif
      

    Determining which group of source lines (if any) to compile in an if-group is essentially the same as the Fortran determination of which block of an IF construct should be executed.

  • #if Directive

    The #if directive has the following format:

    #if expression

    expression

     An expression. The values in expression must be integer literal constants or previously defined preprocessor variables. The expression is an integer constant expression as defined by the C language standard. All the operators in the expression are C operators, not Fortran operators. The expression is evaluated according to C language rules, not Fortran expression evaluation rules.
    
     Note that unlike the Fortran `IF` construct and `IF` statement logical expressions, expression in an `#if` directive need not be enclosed in parentheses.
    

    The #if expression can also contain the unary defined operator, which can be used in either of the following formats:

    • defined identifier

    • defined (identifier)

    When the defined subexpression is evaluated, the value is 1 if identifier is currently defined, and 0 if it is not.

    All currently defined source preprocessing variables in expression, except those that are operands of defined unary operators, are replaced with their values. During this evaluation, all source preprocessing variables that are undefined evaluate to 0.

    Note that the following two directives are not equivalent:

    • #if X

    • #if defined(X)

    In the first case, the condition is true if X has a nonzero value. In the second case, the condition is true if X has been defined, even if its value is 0.

  • #ifdef Directive

    The #ifdef directive is used to determine if identifier is predefined by the source preprocessor, has been named in a #define directive, or has been named in a ftn -D command line option. This directive has the following format:

    #ifdef identifier

    The #ifdef directive is equivalent to either of the following two directives:

    • #if defined identifier

    • #if defined (identifier)

  • #ifndef Directive

    The #ifndef directive tests whether identifier is not defined. This directive has the following format:

    #ifndef identifier

    This directive is equivalent to either of the following two directives:

    • #if !defined identifier

    • #if !defined (identifier)

  • #elif Directive

    The #elif directive serves the same purpose in an if-group as does the ELSE IF statement of a Fortran IF construct. This directive has the following format:

    #elif expression

    expression

    The expression follows all the rules of the integer constant expression in an #if directive.

  • #else Directive

    The #else directive serves the same purpose in an if-group as does the ELSE statement of a Fortran IF construct. This directive has the following format:

    #else

  • #endif Directive

    The #endif directive serves the same purpose in an if-group as does the END IF statement of a Fortran IF construct. This directive has the following format:

    #endif
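
The conditional directives described above can be combined into a single if-group. The following is a minimal sketch, not a definitive pattern. DEBUG_LEVEL is a hypothetical user macro (it is not predefined by the compiler) and is assumed to expand to an integer whenever it is defined. The module and function names are also illustrative. The file must be source preprocessed for the directives to take effect.

```fortran
! Sketch only. DEBUG_LEVEL is a hypothetical user macro, not one defined by
! the compiler; it is assumed to expand to an integer when it is defined.
MODULE build_info
  IMPLICIT NONE
CONTAINS
  ! Return a label for the branch of the if-group that was compiled.
  FUNCTION build_mode() RESULT(mode)
    CHARACTER(LEN=:), ALLOCATABLE :: mode
#if defined(DEBUG_LEVEL) && DEBUG_LEVEL >= 2
    mode = 'verbose'
#elif defined(DEBUG_LEVEL)
    mode = 'debug'
#else
    mode = 'release'
#endif
  END FUNCTION build_mode
END MODULE build_info
```

A caller would USE build_info and print build_mode(). Under the assumptions above, compiling with -D DEBUG_LEVEL=2 would select the first branch, and compiling without -D would select the #else branch.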

Predefined Macros

The HPE Cray Fortran compiler source preprocessing supports a number of predefined macros. They are divided into groups as follows:

  • Macros based on the host machine:

    Macro

    Description

    unix, __unix, __unix__

    Always defined. (The leading characters in the second form consist of 2 consecutive underscores; the third form consists of 2 leading and 2 trailing underscores.)

  • Macros based on CLE system targets:

    Macro

    Description

    _ADDR64

    Defined for CLE systems as targets. The target system must have 64-bit address registers.

    _MAXVL_8, _MAXVL_16, _MAXVL_32, _MAXVL_64, _MAXVL_128

    MAXVL (Maximum Vector Length) is defined by dividing the size of the widest hardware vector register by the number of bits in the data type. For x86 targets supporting AVX512, the values are 64, 32, 16, 8, and 4. For x86 targets not supporting AVX512, the values are 32, 16, 8, 4, and 2. For ARM targets, the values vary according to the maximum hardware vector length supported by the system.

  • Macros based on the HPE Cray Fortran compiler:

    Macro

    Description

    _CRAYFTN

    Defined as 1.

    _CRAY_COARRAY

    Defined as 1 if -hcaf is specified on the command line. If -hnocaf is specified, this macro is undefined.

    _OPENMP

    Defined as the publication date of the OpenMP standard supported, as a string of the form yyyymm.

    _RELEASE_MAJOR

    Defined as the major release level of the compiler.

    _RELEASE_MINOR

    Defined as the minor release level of the compiler.

    _RELEASE_PATCHLEVEL

    Represents the patch level of the compiler (the third field in the version string).

    _RELEASE_STRING

    Defined as a string that describes the version of the compiler.

  • Macros based on the source file:

    Macro

    Description

    __line__, __LINE__

    Defined to be the line number of the current source line in the source file.

    __file__, __FILE__

    Defined to be the name of the current source file.

    __date__, __DATE__

    Defined to be the current date in the form mm/dd/yy.

    __time__, __TIME__

    Defined to be the current time in the form hh:mm:ss.
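
Predefined macros are typically tested in conditional directives so that one source file can adapt to the compiler and its options. The following is a minimal sketch that relies only on whether _CRAYFTN and _CRAY_COARRAY are defined. The module and function names are hypothetical, and the file must be source preprocessed.

```fortran
! Sketch only: the module and function names are illustrative.
MODULE compiler_info
  IMPLICIT NONE
CONTAINS
  ! Report whether the HPE Cray Fortran compiler built this file.
  FUNCTION compiler_family() RESULT(family)
    CHARACTER(LEN=:), ALLOCATABLE :: family
#ifdef _CRAYFTN
    family = 'cray'
#else
    family = 'other'
#endif
  END FUNCTION compiler_family

  ! Report whether _CRAY_COARRAY was defined (-hcaf) at compile time.
  FUNCTION coarrays_enabled() RESULT(enabled)
    LOGICAL :: enabled
#ifdef _CRAY_COARRAY
    enabled = .TRUE.
#else
    enabled = .FALSE.
#endif
  END FUNCTION coarrays_enabled
END MODULE compiler_info
```

Testing with #ifdef rather than using the macro value inside a Fortran statement keeps the sketch independent of how macro expansion in source statements is configured.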

Command Line Options

The following ftn command line options affect source preprocessing. See the crayftn(1) man page for more information about these options.

  • The -D identifier=value option defines variables used for source preprocessing.

  • The -dF option controls macro expansion in Fortran source statements.

  • The -eP option performs source preprocessing on file.f90, file.F90, file.F95, file.F03, file.F08, file.F18, file.ftn, or file.FTN but does not compile. The -eP option produces file.i.

  • The -eZ option performs source preprocessing and compilation on file.f90, file.F90, file.F95, file.F03, file.F08, file.F18, file.ftn, or file.FTN. The -eZ option produces file.i.

  • The -U identifier, identifier … option undefines variables used for source preprocessing. For more information about this option, see Fortran Command-line Options.

    The -D identifier=value and -U identifier, identifier … options are ignored unless one of the following is true:

    • The Fortran input source file is specified as either file.f90, file.F90, file.F95, file.F03, file.F08, file.F18, file.ftn, or file.FTN.

    • The -eP or -eZ options have been specified.

OpenMP Overview

The OpenMP API provides a parallel programming model that is portable across shared memory architectures from HPE and other vendors. The OpenMP specification is accessible at https://www.openmp.org. OpenMP is disabled by default in HPE CCE and must be explicitly enabled using the -homp or -fopenmp option.

Supported Version

CCE supports full OpenMP 5.0 and partial OpenMP 5.1 and 5.2. The following OpenMP 5.1 features are supported:

  • masked construct without filter clause (Fortran)

  • metadirective dynamic user condition and target_device selectors (Fortran)

  • assume and assumes directives (Fortran)

  • nothing directive (Fortran)

The following OpenMP 5.2 features are supported:

  • otherwise clause for metadirective (Fortran)

Compiling

OpenMP is disabled by default and must be explicitly enabled. These HPE CCE options affect OpenMP applications:

  • -h [no]omp

  • -f openmp (synonym for -h omp)

  • -h threadn

Executing

For OpenMP applications, use both the OMP_NUM_THREADS environment variable to specify the number of threads and the aprun -d depth option to specify the number of CPUs hosting the threads. The number of threads specified by OMP_NUM_THREADS should not exceed the number of cores in the CPU. If neither the OMP_NUM_THREADS environment variable nor the omp_set_num_threads call is used to set the number of OpenMP threads, the system defaults to 1 thread.
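
One way to confirm how many threads a parallel region actually receives is to query the team size from inside the region. The following is a minimal sketch with hypothetical module and function names. Lines that begin with the !$ sentinel are compiled only when OpenMP is enabled, so the function returns 1 in a build without OpenMP.

```fortran
! Sketch only: team_query and team_size are illustrative names.
MODULE team_query
  !$ USE omp_lib, ONLY: omp_get_num_threads
  IMPLICIT NONE
CONTAINS
  ! Return the number of threads in a parallel region (1 without OpenMP).
  FUNCTION team_size() RESULT(n)
    INTEGER :: n
    INTEGER :: team
    team = 1                      ! serial fallback
    !$OMP PARALLEL SHARED(team)
    !$OMP SINGLE
    !$ team = omp_get_num_threads()
    !$OMP END SINGLE
    !$OMP END PARALLEL
    n = team
  END FUNCTION team_size
END MODULE team_query
```

When OpenMP is enabled and OMP_NUM_THREADS is set, the returned value would be expected to match the requested thread count, provided the runtime can supply that many threads.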

Debugging

The -g option is compatible with the -homp option, and together the options provide debugging support for OpenMP directives. The -g option, when specified with no optimization options or with -O0, provides debugging support identical to specifying the -G0 option. If any optimization is specified, -g is ignored.

OpenMP Implementation Defined Behavior

The OpenMP Application Program Interface Specification presents a list of implementation-defined behaviors. The HPE implementation is described in the following sections.

Atomicity of memory access by multiple threads

When multiple threads access the same shared memory location and at least one access is a write, the accesses should be ordered by explicit synchronization to avoid data races and the potential for non-deterministic results. Always use explicit synchronization for any access smaller than one byte.

Internal Control Variables (ICVs)

| ICV | Initial Value | Note |
|---|---|---|
| nthreads-var | 1 | |
| dyn-var | TRUE | Behaves according to Algorithm 2-1 of the specification. |
| run-sched-var | static | |
| stacksize-var | 128 MB | |
| wait-policy-var | ACTIVE | |
| thread-limit-var | 64 | Threads may be dynamically created up to an upper limit which is 4 times the number of cores/node. It is up to the programmer to try to limit oversubscription. |
| max-active-levels-var | 4095 | |
| def-sched-var | static | The chunksize is rounded up to improve alignment for vectorized loops. |

Dynamic Adjustment of Threads

The ICV dyn-var is enabled by default. Threads may be dynamically created up to an upper limit which is 4 times the number of cores/node. It is up to the programmer to try to limit oversubscription.

If a parallel region is encountered while dynamic adjustment of the number of threads is disabled, and the number of threads specified for the parallel region exceeds the number that the runtime system can supply, the program terminates. The number of physical processors actually hosting the threads at any given time is fixed at program startup and is specified by the aprun -d depth option. The OMP_NESTED environment variable and the omp_set_nested() call control nested parallelism. To enable nesting, set OMP_NESTED to true or use the omp_set_nested() call. Nesting is disabled by default.
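
The values that the runtime actually uses for these internal control variables can be checked with the standard OpenMP query routines. The following is a minimal sketch with hypothetical module and subroutine names. It prints only what the runtime reports, and lines that begin with the !$ sentinel are skipped when OpenMP is not enabled.

```fortran
! Sketch only: icv_report and report_icvs are illustrative names.
MODULE icv_report
  !$ USE omp_lib
  IMPLICIT NONE
CONTAINS
  ! Print several ICVs as reported by the OpenMP runtime.
  SUBROUTINE report_icvs()
    !$ INTEGER(KIND=omp_sched_kind) :: sched
    !$ INTEGER :: chunk
    LOGICAL :: have_omp
    have_omp = .FALSE.
    !$ have_omp = .TRUE.
    !$ CALL omp_get_schedule(sched, chunk)
    !$ PRINT *, 'nthreads-var (max threads) :', omp_get_max_threads()
    !$ PRINT *, 'dyn-var                    :', omp_get_dynamic()
    !$ PRINT *, 'thread-limit-var           :', omp_get_thread_limit()
    !$ PRINT *, 'max-active-levels-var      :', omp_get_max_active_levels()
    !$ PRINT *, 'run-sched-var (kind, chunk):', sched, chunk
    IF (.NOT. have_omp) PRINT *, 'OpenMP is not enabled in this build'
  END SUBROUTINE report_icvs
END MODULE icv_report
```

Calling report_icvs() once at program start, outside any parallel region, gives a record of the settings in effect for a particular run.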

Directives and Clauses

  • atomic directive

    • When supported by the target architecture, atomic directives are lowered into hardware atomic instructions. Otherwise, atomicity is guaranteed with a lock. OpenMP atomic directives are compatible with C11 and C++11 atomic operations, as well as GNU atomic builtins.

  • for directive

    • For the schedule(guided,chunk) clause, the size of the initial chunk for the master thread and other team members is approximately equal to the trip count divided by the number of threads.

    • For the schedule(runtime) clause, the schedule type and, optionally, chunk size can be chosen at runtime by setting the OMP_SCHEDULE environment variable. If this environment variable is not set, the default behavior of the schedule(runtime) clause is as if the schedule(static) clause appeared instead.

    • In the absence of the schedule clause, the default schedule is static and the default chunk size is approximately the number of iterations divided by the number of threads.

    • The integer type or kind used to compute the iteration count of a collapsed loop is a signed 64-bit integer, regardless of how the original induction variables and loop bounds are defined. If the schedule(runtime) clause is specified and run-sched-var is auto, then the HPE implementation generates a static schedule.

  • do and parallel do directives

    • For the schedule(guided,chunk) clause, the size of the initial chunk for the master thread and other team members is approximately equal to the trip count divided by the number of threads.

    • For the schedule(runtime) clause, the schedule type and, optionally, chunk size can be chosen at runtime by setting the OMP_SCHEDULE environment variable. If this environment variable is not set, the default behavior of the schedule(runtime) clause is as if the schedule(static) clause appeared instead.

    • In the absence of the schedule clause, the default schedule is static and the default chunk size is approximately the number of iterations divided by the number of threads.

    • The integer type or kind used to compute the iteration count of a collapsed loop is a signed 64-bit integer, regardless of how the original induction variables and loop bounds are defined. If the schedule(runtime) clause is specified and run-sched-var is auto, then the HPE implementation generates a static schedule.

  • parallel directive

    • If a parallel region is encountered while dynamic adjustment of the number of threads is disabled, and the number of threads specified for the parallel region exceeds the number that the runtime system can supply, the program terminates.

    • The number of physical processors actually hosting the threads at any given time is fixed at program startup and is specified by the aprun -d depth option.

    • The OMP_NESTED environment variable and the omp_set_nested() call control nested parallelism. To enable nesting, set OMP_NESTED to true or use the omp_set_nested() call. Nesting is disabled by default.

  • private clause

    • If a variable is declared as private, the variable is referenced in the definition of a statement function, and the statement function is used within the lexical extent of the directive construct, then the statement function references the private version of the variable.

  • sections construct

    • Multiple structured blocks within a single sections construct are scheduled in lexical order and an individual block is assigned to the first thread that reaches it. It is possible for a different thread to execute each section block, or for a single thread to execute multiple section blocks. There is no guaranteed order of execution of structured blocks within a sections construct.

  • single directive

    • A single block is assigned to the first thread in the team to reach the block; this thread may or may not be the master thread.

  • threadprivate directive

    • The threadprivate directive specifies that variables are replicated, with each thread having its own copy. If the dynamic threads mechanism is enabled, the definition and association status of a thread’s copy of the variable is undefined, and the allocation status of an allocatable array is undefined.

  • thread_limit clause

    • The thread_limit clause places a limit on the number of threads that a team construct may create. For non-host GPU accelerator targets, this clause controls the number of CUDA threads per thread block. Only constant integer expressions are supported. If HPE CCE does not support a thread_limit expression, then it will issue a warning message indicating the default value that will be used instead.
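
The schedule(runtime) behavior described for the do and parallel do directives can be illustrated with a loop whose schedule is left to the OMP_SCHEDULE environment variable. The following is a minimal sketch with hypothetical module and function names. The integer reduction makes the result independent of the schedule that is selected.

```fortran
! Sketch only: runtime_schedule_demo and sum_to are illustrative names.
MODULE runtime_schedule_demo
  IMPLICIT NONE
CONTAINS
  ! Sum the integers 1..n; the loop schedule is chosen at run time.
  FUNCTION sum_to(n) RESULT(total)
    INTEGER, INTENT(IN) :: n
    INTEGER :: total
    INTEGER :: i, acc
    acc = 0
    !$OMP PARALLEL DO SCHEDULE(RUNTIME) REDUCTION(+:acc)
    DO i = 1, n
      acc = acc + i
    END DO
    !$OMP END PARALLEL DO
    total = acc
  END FUNCTION sum_to
END MODULE runtime_schedule_demo
```

Setting OMP_SCHEDULE before launch (for example, to a dynamic schedule with a chunk size) changes how iterations are distributed without recompiling. If the variable is not set, the text above states that the loop behaves as if schedule(static) had been specified.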

Library Routines

  • omp_get_max_active_levels()

    • The omp_get_max_active_levels() routine returns the maximum number of nested parallel levels currently allowed. There is a single max-active-levels-var internal control variable for the entire runtime system. Thus, a call to omp_get_max_active_levels() will bind to all threads, regardless of which thread calls it.

  • omp_get_nested()

    • The deprecated omp_get_nested() routine returns whether nested parallelism is enabled or disabled, according to the value of the max-active-levels-var internal control variable. The default is false.

  • omp_set_dynamic()

    • The omp_set_dynamic() routine enables or disables dynamic adjustment of the number of threads available for the execution of subsequent parallel regions by setting the value of the dyn-var internal control variable. The default is on.

  • omp_set_max_active_levels()

    • Sets the max-active-levels-var internal control variable. Defaults to 64. If the argument is less than 1, then the variable is set to 1.

  • omp_set_nested()

    • The deprecated omp_set_nested() routine enables or disables nested parallelism, by setting the max-active-levels-var internal control variable. The default is false.

  • omp_set_num_threads()

    • Sets the nthreads-var internal control variable to a positive integer. If the argument is less than 1, then sets nthreads-var to 1.

  • omp_set_schedule()

    • Sets the schedule type as defined by the current specification. There are no implementation-defined schedule types.

Runtime Library Definitions

It is implementation-defined whether the include file omp_lib.h or the module omp_lib (or both) is provided. It is implementation-defined whether any of the OpenMP runtime library routines that take an argument are extended with a generic interface so that arguments of different KIND type can be accommodated. In the HPE implementation, both omp_lib.h and the module omp_lib are provided. HPE Cray Fortran uses generic interfaces for routines. If an OMP runtime library routine is defined to be generic, use of arguments of kind other than those specified by OMP_*_KIND constants is undefined.
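
Because arguments of other kinds are undefined, it is safest to declare schedule-related variables with the kind constants from the omp_lib module. The following is a minimal sketch using the standard omp_set_schedule and omp_get_schedule routines. The module and function names are hypothetical, and the function returns 0 when OpenMP is not enabled.

```fortran
! Sketch only: sched_setup and use_dynamic_schedule are illustrative names.
MODULE sched_setup
  !$ USE omp_lib
  IMPLICIT NONE
CONTAINS
  ! Request a dynamic runtime schedule with the given chunk size and return
  ! the chunk size that the runtime reports back (0 without OpenMP).
  FUNCTION use_dynamic_schedule(chunk) RESULT(reported)
    INTEGER, INTENT(IN) :: chunk
    INTEGER :: reported
    !$ INTEGER(KIND=omp_sched_kind) :: kind_out
    reported = 0
    !$ CALL omp_set_schedule(omp_sched_dynamic, chunk)
    !$ CALL omp_get_schedule(kind_out, reported)
  END FUNCTION use_dynamic_schedule
END MODULE sched_setup
```

Using the module rather than an untyped external call lets the compiler check the argument kinds at compile time.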

Environment Variables

CRAY_OMP_CHECK_AFFINITY

This environment variable is superseded by OMP_DISPLAY_AFFINITY. HPE recommends that users use OMP_DISPLAY_AFFINITY instead of this environment variable.

CRAY_OMP_CHECK_AFFINITY is a run time environment variable. Set it to TRUE to display affinity binding for each OpenMP thread. The messages contain the hostname, process identifier, OS thread identifier, OpenMP thread identifier, and affinity binding.

OMP_DISPLAY_AFFINITY

This is a runtime environment variable. Set it to TRUE to display formatted affinity binding for each OpenMP thread. The default format includes the hostname, process identifier, OS thread identifier, OpenMP thread identifier, and affinity binding. The format can be changed using the OMP_AFFINITY_FORMAT environment variable, which is documented in the OpenMP 5.0 API Syntax Reference Guide.

OMP_DYNAMIC

The default value is true.

OMP_MAX_ACTIVE_LEVELS

The default value is 64.

OMP_NESTED

This environment variable is deprecated. Use OMP_MAX_ACTIVE_LEVELS instead.

OMP_NUM_THREADS

If this environment variable is not set and you do not use the omp_set_num_threads() routine to set the number of OpenMP threads, the default is the maximum number of available CPUs on the system.

The maximum number of threads per compute node is 4 times the number of allocated processors. If the requested value of OMP_NUM_THREADS is more than the number of threads an implementation can support, the behavior of the program depends on the value of the OMP_DYNAMIC environment variable. If OMP_DYNAMIC is false, the program terminates. If OMP_DYNAMIC is true, it uses up to 4 times the number of allocated processors.

OMP_PROC_BIND

When set to false, the OpenMP runtime does not attempt to set or change affinity binding for OpenMP threads. When not false, this environment variable controls the policy for binding threads to places. Care must be taken when using OpenMP affinity binding with other binding mechanisms. For example, when launching an application with ALPS aprun, the -cc cpu affinity binding option (the default) should only be used with OMP_PROC_BIND=false or OMP_PROC_BIND=auto; otherwise, the ALPS/CLE binding will severely over-constrain OpenMP binding. When setting OMP_PROC_BIND to a value other than false or auto, applications should be launched with -cc depth or -cc none. Using -cc depth is particularly important when running multiple PEs per compute node, since it will allow each PE to bind to CPUs in non-overlapping subsets of the node. Valid values for this environment variable are true, false, auto, or a comma-separated list of spread, close, and master. A value of true is mapped to spread.

The default value for OMP_PROC_BIND is auto, an HPE-specific extension. The auto binding policy directs the OpenMP runtime library to select the affinity binding setting that it determines to be most appropriate for a given situation. If there is only a single place in the place-partition-var ICV, and that place corresponds to the initial affinity mask of the master thread, then the auto binding policy maps to false (i.e., binding is disabled). Otherwise, the auto binding policy causes threads to bind in a manner that partitions the available places across OpenMP threads.

OMP_PLACES

This environment variable has no effect if OMP_PROC_BIND=false; when OMP_PROC_BIND is not false, then OMP_PLACES defines a set of places, or CPU affinity masks, to which threads are bound. When using the threads, cores, and sockets keywords, places are constructed according to the CPU topology presented by Linux. However, the place list is always constrained by the initial affinity mask of the master thread. As a result, specific numeric CPU identifiers appearing in OMP_PLACES will map onto CPUs in the initial CPU affinity mask. If an application is launched with -cc none, then numeric CPU identifiers will exactly match Linux CPU numbers. If instead it is launched with -cc depth, then numeric CPU identifier 0 will map to the first CPU in the initial affinity mask for the master thread; identifier 1 will map to the second CPU in the initial mask, and so on. This allows the same OMP_PLACES environment variable for all PEs to be used, even when launching multiple PEs per node; the -cc depth setting ensures that each PE begins executing with a non-overlapping initial affinity mask, allowing each instance of the OpenMP runtime to assign thread affinity within those non-overlapping affinity masks.

The default value of OMP_PLACES depends on the value of OMP_PROC_BIND. If OMP_PROC_BIND is auto, then the default value for OMP_PLACES is cores. Otherwise, the default value of OMP_PLACES is threads.

OMP_SCHEDULE

The default value for this environment variable is static. For the schedule(runtime) clause, the schedule type and, optionally, chunk size can be chosen at run time by setting the OMP_SCHEDULE environment variable.

OMP_STACKSIZE

The default value is 128 MB.

OMP_THREAD_LIMIT

Sets the number of OpenMP threads to use for the entire OpenMP program by setting the thread-limit-var ICV. The HPE implementation defaults to 4 times the number of available processors.

OMP_WAIT_POLICY

Provides a hint to an OpenMP implementation about the desired behavior of waiting threads by setting the wait-policy-var ICV. Possible values are ACTIVE and PASSIVE, as defined by the OpenMP specification, and AUTO, an HPE-specific extension. The default value for this environment variable is AUTO, which directs the OpenMP runtime library to select the most appropriate wait policy for the situation. In general, the AUTO policy behaves like ACTIVE, unless the number of OpenMP threads or affinity binding results in oversubscription of the available hardware processors. If oversubscription is detected, the AUTO policy behaves like PASSIVE.
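
When diagnosing thread placement or scheduling, it can help to log which of these environment variables were set for a run. The following is a minimal sketch that uses the standard GET_ENVIRONMENT_VARIABLE intrinsic. The module and procedure names are hypothetical, and the fallback strings simply repeat the defaults documented in this section; they are not queried from the runtime.

```fortran
! Sketch only: omp_env_report and its procedures are illustrative names.
MODULE omp_env_report
  IMPLICIT NONE
CONTAINS
  ! Return the value of an environment variable, or fallback if it is
  ! unset or empty.
  FUNCTION env_or_default(name, fallback) RESULT(val)
    CHARACTER(LEN=*), INTENT(IN) :: name, fallback
    CHARACTER(LEN=:), ALLOCATABLE :: val
    INTEGER :: length, status
    CALL GET_ENVIRONMENT_VARIABLE(name, LENGTH=length, STATUS=status)
    IF (status /= 0 .OR. length == 0) THEN
      val = fallback
    ELSE
      ALLOCATE(CHARACTER(LEN=length) :: val)
      CALL GET_ENVIRONMENT_VARIABLE(name, VALUE=val)
    END IF
  END FUNCTION env_or_default

  ! Print selected OpenMP settings; fallbacks are the documented defaults.
  SUBROUTINE report_omp_env()
    PRINT '(2A)', 'OMP_DYNAMIC           = ', env_or_default('OMP_DYNAMIC', 'true')
    PRINT '(2A)', 'OMP_MAX_ACTIVE_LEVELS = ', env_or_default('OMP_MAX_ACTIVE_LEVELS', '64')
    PRINT '(2A)', 'OMP_PROC_BIND         = ', env_or_default('OMP_PROC_BIND', 'auto')
    PRINT '(2A)', 'OMP_SCHEDULE          = ', env_or_default('OMP_SCHEDULE', 'static')
    PRINT '(2A)', 'OMP_STACKSIZE         = ', env_or_default('OMP_STACKSIZE', '128 MB')
    PRINT '(2A)', 'OMP_WAIT_POLICY       = ', env_or_default('OMP_WAIT_POLICY', 'AUTO')
  END SUBROUTINE report_omp_env
END MODULE omp_env_report
```

The report reflects only the environment of the process. Values changed later through library routines such as omp_set_schedule() are not visible to it.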

HPE-specific OpenMP API

This section describes OpenMP API specific to HPE.

cray_omp_set_wait_policy

subroutine cray_omp_set_wait_policy ( policy )
    character(*), intent(in) :: policy

void cray_omp_set_wait_policy( const char *policy );

This routine allows dynamic modification of the wait-policy-var ICV value, which corresponds to the OMP_WAIT_POLICY environment variable. The policy argument provides a hint to the OpenMP runtime library environment about the desired behavior of waiting threads; acceptable values are AUTO, ACTIVE, or PASSIVE (case insensitive). It is an error to call this routine in an active parallel region. The OpenMP runtime library supports a “wait policy” and a “contention policy,” both of which can be set with the following environment variables:

OMP_WAIT_POLICY=(AUTO|ACTIVE|PASSIVE)
CRAY_OMP_CONTENTION_POLICY=(Automatic|Standard|MonitorMwait)

These environment variables allow the policies to be set once at program launch for the entire execution. However, in some circumstances it would be useful for the programmer to explicitly change the policy at various points during a program’s execution. This HPE-specific routine allows the programmer to dynamically change the wait policy (and potentially the contention policy). This addresses the situation when an application needs OpenMP for the first part of program execution, but there is a clear point after which OpenMP is no longer used. Unfortunately, the idle OpenMP threads still consume resources since they are waiting for more work, resulting in performance degradation for the remainder of the application. A passive-waiting policy might eliminate the performance degradation after OpenMP is no longer needed, but the developer may still want an active-waiting policy for the OpenMP-intensive region of the application. This routine notifies all threads of the policy change at the same time, regardless of whether they are idle or active (to avoid deadlock from waiting and signaling threads using different policies).
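
The scenario described above, in which OpenMP is used only during the first phase of a program, might be handled as in the following minimal sketch. The module and subroutine names are hypothetical. The sketch assumes that cray_omp_set_wait_policy can be called as an external subroutine with the interface shown above, and that _OPENMP is defined only when OpenMP is enabled. The preprocessor guard turns the call into a no-op for other compilers, so the file must be source preprocessed.

```fortran
! Sketch only: wait_policy_demo and finish_openmp_phase are illustrative names.
MODULE wait_policy_demo
  IMPLICIT NONE
CONTAINS
  ! Call after the last OpenMP region, outside any active parallel region,
  ! so that idle threads stop consuming CPU resources.
  SUBROUTINE finish_openmp_phase(switched)
    LOGICAL, INTENT(OUT) :: switched
    switched = .FALSE.
#if defined(_CRAYFTN) && defined(_OPENMP)
    ! Assumption: the routine is resolved by the HPE OpenMP runtime library.
    CALL cray_omp_set_wait_policy('PASSIVE')
    switched = .TRUE.
#endif
  END SUBROUTINE finish_openmp_phase
END MODULE wait_policy_demo
```

A program that later returns to OpenMP-intensive work could call the routine again with 'ACTIVE' or 'AUTO' before entering that phase.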

omp_lib

If the omp_lib module is not used and the kind of the actual argument does not match the kind of the dummy argument, the behavior of the procedure is undefined.

omp_get_wtime, omp_get_wtick

These procedures return real(kind=8) values instead of double precision values.

Optimizations

A certain amount of overhead is associated with multiprocessing a loop. If the work occurring in the loop is small, the loop can actually run slower by multiprocessing than by single processing. To avoid this, make the amount of work inside the multiprocessed region as large as possible, as is shown in the following examples.

Consider the following code:

DO K = 1, N
   DO I = 1, N
      DO J = 1, N
         A(I,J) = A(I,J) + B(I,K) * C(K,J)
      END DO
   END DO
END DO

For the preceding code fragment, parallelize the J loop or the I loop. The K loop cannot be parallelized because different iterations of the K loop read and write the same values of A(I,J). Try to parallelize the outermost DO loop if possible, because it encloses the most work. In this example, that is the I loop. For this example, use the technique called loop interchange. Although the parallelizable loops are not the outermost ones, the loops can be reordered to make one of them outermost.

Thus, loop interchange would produce the following code fragment:

!$OMP PARALLEL DO PRIVATE(I, J, K)
DO I = 1, N
   DO K = 1, N
      DO J = 1, N
         A(I,J) = A(I,J) + B(I,K) * C(K,J)
      END DO
   END DO
END DO

Now the parallelizable loop encloses more work and shows better performance. In practice, relatively few loops can be reordered in this way. However, it does occasionally happen that several loops in a nest of loops are candidates for parallelization. In such a case, it is usually best to parallelize the outermost one.
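
The interchanged loop nest can be wrapped in a procedure and checked against the MATMUL intrinsic. The following is a minimal sketch that assumes square matrices of equal size. The module and subroutine names are hypothetical.

```fortran
! Sketch only: matmul_accumulate and accumulate_product are illustrative names.
MODULE matmul_accumulate
  IMPLICIT NONE
CONTAINS
  ! Compute A = A + B*C for square N x N matrices with the interchanged nest.
  SUBROUTINE accumulate_product(a, b, c)
    REAL, INTENT(INOUT) :: a(:,:)
    REAL, INTENT(IN)    :: b(:,:), c(:,:)
    INTEGER :: i, j, k, n
    n = SIZE(a, 1)
    ! Each iteration of the I loop updates a different row of A, so the
    ! iterations are independent and can run on different threads.
    !$OMP PARALLEL DO PRIVATE(i, j, k)
    DO i = 1, n
      DO k = 1, n
        DO j = 1, n
          a(i,j) = a(i,j) + b(i,k) * c(k,j)
        END DO
      END DO
    END DO
    !$OMP END PARALLEL DO
  END SUBROUTINE accumulate_product
END MODULE matmul_accumulate
```

For each element A(I,J), the K iterations still run in increasing order within a single thread, so the interchange does not change the order in which terms are added to that element.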

Occasionally, the only loop available to be parallelized has a fairly small amount of work. It may be worthwhile to force certain loops to run without parallelism or to select between a parallel version and a serial version, on the basis of the length of the loop.

The loop in the following example is worth parallelizing only if N is sufficiently large. To overcome the parallel loop overhead, N needs to be around 1000, depending on the specific hardware and the context of the program. The optimized version uses an IF clause on the PARALLEL DO directive:

!$OMP PARALLEL DO IF (N .GE. 1000), PRIVATE(I)
DO I = 1, N
   A(I) = A(I) + X*B(I)
END DO

OpenACC Use

HPE CCE supports full OpenACC 2.0 and partial OpenACC 2.6 for Fortran (OpenACC is not supported for C or C++). The following OpenACC 2.6 features are supported:

  • attach/detach behavior and clauses

  • default(present) clause

  • Implied present-or behavior for copy, copyin, copyout, and create data clauses

OpenACC directives are supported for offloading to NVIDIA GPUs, AMD GPUs, or the current CPU target. An appropriate accelerator target module must be loaded in order to use OpenACC directives.

OpenACC is a parallel programming model that facilitates the use of an accelerator device attached to a host CPU. The OpenACC API allows the programmer to supplement information available to the compilers in order to offload code from a host CPU to an attached accelerator device.

This release supports the OpenACC Application Programming Interface standard developed by PGI, Cray Inc., and NVIDIA, with support from CAPS enterprise. For further information, refer to http://www.openacc-standard.org.

For the most current information regarding the HPE implementation of OpenACC, see the intro_openacc(7) man page. See the OpenACC.EXAMPLES(7) man page for example OpenACC codes.

OpenACC Execution Model

The CPU host offloads compute intensive regions to the accelerator device. The accelerator executes parallel regions, which contain work sharing loops executed as kernels on the accelerator. The CPU host manages execution on the accelerator by allocating memory on the accelerator, initiating data transfer, sending code, passing arguments to the region, waiting for completion, transferring accelerator results back to the CPU host and releasing memory.

The accelerator on the HPE system supports multiple levels of parallelism. The accelerator executes a kernel composed of parallel threads or vectors. Vectors (threads) are grouped into sets called workers. Threads in a set of workers are scheduled together and execute together. Workers are grouped into larger sets called gangs. One or more gangs may comprise a kernel. To summarize, a kernel is executed as a set of gangs of workers of vectors.

The compiler determines the number of gangs/workers/vectors based on the problem and then maps the vectors, workers, and gangs onto the accelerator architecture. Specifying the number of gangs, workers, or vectors is optional but may permit tuning to a particular target architecture. The way that the compiler maps a particular problem onto a constellation of gangs, workers, and vectors which are then mapped onto the accelerator architecture is implementation defined.
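
The gang and vector levels described above can be requested explicitly on a loop nest. The following is a minimal sketch using standard OpenACC directives. The module and subroutine names are hypothetical, and the choice of which loop maps to which level is illustrative rather than a tuning recommendation. Without OpenACC enabled, the directives are treated as comments and the loops run serially on the host.

```fortran
! Sketch only: acc_levels_demo and scale_columns are illustrative names.
MODULE acc_levels_demo
  IMPLICIT NONE
CONTAINS
  ! Multiply every element of A by factor.
  SUBROUTINE scale_columns(a, factor)
    REAL, INTENT(INOUT) :: a(:,:)
    REAL, INTENT(IN)    :: factor
    INTEGER :: i, j, n, m
    n = SIZE(a, 1)
    m = SIZE(a, 2)
    !$ACC PARALLEL COPY(a)
    !$ACC LOOP GANG
    DO j = 1, m            ! columns are distributed across gangs
      !$ACC LOOP VECTOR
      DO i = 1, n          ! elements of a column are processed by vector lanes
        a(i,j) = factor * a(i,j)
      END DO
    END DO
    !$ACC END PARALLEL
  END SUBROUTINE scale_columns
END MODULE acc_levels_demo
```

Omitting the GANG and VECTOR clauses leaves the mapping to the compiler, which is the default behavior described above.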

OpenACC terminology is situated in the context of the PGAS programming model. In the PGAS model, there may be one or more Processing Elements (PEs) per node. Each PE is multi-threaded and each thread can execute vector instructions. The PGAS thread concept is not the same as the OpenACC thread concept.

OpenACC Memory Model

The memory on the accelerator is separate from host memory. Accelerator device memory is not mapped onto the host’s virtual memory space. All data movement between host and accelerator memory is initiated by the host through the library functions that move data. Also, it is not assumed that the accelerator can access host memory, though it is supported by some devices. In this model, data movement between memories is managed by the compiler according to OpenACC directives. The programmer needs to be aware of device memory size, as well as memory bandwidth between host and device in order to effectively accelerate a region of code.

Current accelerators implement a weak memory model; they do not support memory coherence between operations executed by different execution units - an execution unit is a hardware abstraction which can execute one or more gangs. If an operation updates a memory location and another reads from the same location, or two operations store a value to the same location, the hardware may not guarantee repeatable results. Some potential errors of this type are prevented by the compiler, but it is possible to write an accelerator parallel region that produces inconsistent results. Memory coherence is guaranteed when memory operations referencing the same location are separated by an explicit barrier.
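
Because host and accelerator memories are separate, it is usually worthwhile to keep arrays on the device across several compute regions. The following is a minimal sketch using a standard OpenACC data region. The module and subroutine names are hypothetical. Without OpenACC enabled, the directives are treated as comments and the code runs on the host.

```fortran
! Sketch only: acc_data_demo and fused_update are illustrative names.
MODULE acc_data_demo
  IMPLICIT NONE
CONTAINS
  ! Compute y = (y + alpha*x) * x in two compute regions that share one
  ! data region, so x and y are transferred once rather than per region.
  SUBROUTINE fused_update(x, y, alpha)
    REAL, INTENT(IN)    :: x(:)
    REAL, INTENT(INOUT) :: y(:)
    REAL, INTENT(IN)    :: alpha
    INTEGER :: i, n
    n = SIZE(y)
    !$ACC DATA COPYIN(x) COPY(y)
    !$ACC PARALLEL LOOP
    DO i = 1, n
      y(i) = y(i) + alpha * x(i)
    END DO
    !$ACC PARALLEL LOOP
    DO i = 1, n
      y(i) = y(i) * x(i)
    END DO
    !$ACC END DATA
  END SUBROUTINE fused_update
END MODULE acc_data_demo
```

The two compute regions run one after the other, and each iteration touches only its own element of y, so the update does not depend on coherence between execution units.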

Map the OpenACC Programming Model onto Accelerator Components

The compiler maps the OpenACC execution model (kernels, gangs, workers, vectors) onto the accelerator architecture as described in the following sections.

Streaming Multiprocessors (SM) and Scalar Processor (SP) Cores

The OpenACC execution model maps to the NVIDIA GPU hardware as follows (GPU terms are in parentheses). One or more OpenACC kernels may execute on a GPU. The compiler divides a kernel into one or more gangs (blocks) of vectors (threads). Several concurrent gangs (blocks) of threads may execute on one SM, depending on several factors, including memory requirements, compiler optimizations, and user directives. A single gang (block) does not span SMs and remains on one SM until completion.

When the SM encounters a gang (block), the gang is further divided into workers (warps), which are groups of threads that execute in parallel. Scheduling occurs at the granularity of the worker (warp). Individual threads within a warp start together and execute one common instruction at a time. If conditional branching occurs within a worker (warp), the warp serially executes each branch path taken, causing some threads to wait until the threads converge back to the same instruction. Data-dependent conditional code within a warp therefore usually has a negative performance impact. Worker (warp) threads also fetch data from memory together; when accessing global memory, the accesses of the threads within a warp are grouped to minimize transactions. Each thread in a worker (warp) is executed on a different SP core.

There may be up to 32 threads in a worker (warp) - a limit defined by the hardware.

See the intro_openacc(7) man page for more detail on Partition Mapping.

Memory

There is a hierarchy of memory spaces used by OpenACC threads. Each thread has its own private local memory. Each gang has shared memory that is visible to all threads of that gang. All OpenACC threads running on a GPU have access to the same global memory. Global memory on the accelerator is accessible to the host CPU.

Mixed Model Support

OpenMP directives may appear inside of OpenACC data or host data regions only. OpenMP directives are not allowed inside of any other OpenACC directives.

OpenACC directives may not appear inside OpenMP directives. To nest OpenACC directives inside OpenMP constructs, place them in called procedures that are not inlined.
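A minimal sketch of the permitted combination follows: an OpenMP loop placed inside an OpenACC data region but outside any OpenACC compute construct. The array names are assumptions made for illustration.

```fortran
program mixed_model
  implicit none
  integer, parameter :: n = 1000
  real :: a(n), b(n), c(n)
  integer :: i

  a = 1.0
  c = 0.0

  !$acc data copyin(a) copyout(b)

  ! Accelerator work on the device copies of a and b.
  !$acc parallel loop
  do i = 1, n
     b(i) = 2.0 * a(i)
  end do

  ! Host work inside the data region, outside any compute construct.
  !$omp parallel do
  do i = 1, n
     c(i) = a(i) + 1.0
  end do
  !$omp end parallel do

  !$acc end data

  print *, b(1), c(1)
end program mixed_model
```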

Compile with OpenACC

The HPE CCE compiler recognizes OpenACC directives by default. Use either the ftn or cc command to compile.

The HPE CCE compiler does not produce CUDA code. It generates PTX (Parallel Thread Execution) instructions which are then translated into assembly.

Note the following interactions between directives and command line options.

  • -x

    The -x option accepts one or more directives as arguments. Directives specified with the -x option are ignored during compilation. To ignore all directives, specify -x all. To ignore accelerator directives, specify -x acc.

  • -h [no]acc

    -h noacc disables OpenACC directives.

  • -h acc_model=option [:option ...]

    Explicitly controls the execution and memory model utilized by the accelerator support system. The option arguments identify the type of behavior desired. There are three option sets. Only one member of a set may be used at a time; however, all three sets may be used together.

    Default: auto_async_kernel:fast_addr:no_deep_copy

    option Set 1:

    auto_async_none

    Execute kernels and updates synchronously, unless there is an async clause present on the kernels or update directive.

    auto_async_kernel

    (Default) Execute all kernels asynchronously ensuring program order is maintained.

    auto_async_all

    Execute all kernels and data transfers asynchronously, ensuring program order is maintained.

    option Set 2:

    no_fast_addr

    Use default types for addressing.

    fast_addr

    (Default) Attempt to use 32 bit integers in all addressing to improve performance. Base addresses remain as 64 bit. The performance is improved by potentially using fewer registers and faster arithmetic for offset calculations. This optimization may result in incorrect behavior for codes that make use within accelerator regions of any of the following: very large arrays (offsets would require greater than 32 bits), very large array lower bounds (max offset plus lower bound is greater than 32 bits), bitfields/other bit operations.

    option Set 3:

    no_deep_copy

    (Default) Do not look inside of an object type to transfer sub-objects. Allocatable members of derived type objects will not be allocated on the device.

    deep_copy

    Look inside of derived type objects and recreate the derived type on the accelerator recursively. A derived type object that contains an allocatable member will have memory allocated on the device for the member.

Module Support

To compile, ensure that the PrgEnv-cray module is loaded. Then, load either the craype-accel-nvidia35 module for Kepler support or the craype-accel-nvidia60 module for Pascal support.

The craype-accel-host module supports compiling and running an OpenACC application on the host X86 processor. This provides source code portability between systems with and without an accelerator. The accelerator directives are automatically converted at compile time to OpenMP equivalent directives.

Use either the ftn or cc command to compile.

Debug

Use either Allinea DDT or Rogue Wave TotalView.

The following applies to all debuggers:

  • To enable debugging, compile using the -g option.

  • When compiling with the debug option (-g), HPE CCE may require additional memory from the accelerator heap, exceeding the 8MB default. In this case, there will be malloc failures during compilation. The environment variable CRAY_ACC_MALLOC_HEAPSIZE specifies the accelerator heap size in bytes. It may be necessary to increase the accelerator heap size to 32MB (33554432), 64MB (67108864), or greater by setting CRAY_ACC_MALLOC_HEAPSIZE accordingly.

  • Debug one rank/image/thread/PE per node.

  • HPE CCE does not generate CUDA code, but generates PTX code. Debuggers will not display CUDA intermediate code.

  • To enter an OpenACC region using a debugger, breakpoints may be set inside the OpenACC region. It is not possible to do a single step into the region from the code immediately prior to the start of an OpenACC directive.

OpenACC Directives

For information on the OpenACC directives, see the OpenACC 2.0 Specification available at http://www.openacc-standard.org.

For the most current information regarding the HPE implementation of OpenACC, see the intro_openacc(7) man page. See the OpenACC.EXAMPLES(7) man page for example OpenACC codes.

Runtime Routines

Runtime routines defined by the standard specification are supported unless otherwise noted in the intro_openacc(7) man page.

Extended OpenACC Run Time Library Routines

Extended OpenACC run time library routines are HPE-specific low level routines that give object oriented programmers a mechanism for moving objects from the host CPU to the accelerator and copying memory between the host and the accelerator. These routines are implemented in C. See the intro_openacc(7) man page.

HPE Cray Fortran provides a wrapper interface to the C routines using ISO C bindings. To use these routine bindings from Fortran, include the header file openacc_lib.h or use the openacc_lib module. Please see the example “Using_OPENACC_LIB” on the OpenACC.EXAMPLES(7) man page.

Environment Variables

The following environment variables are defined by the API specification:

  • ACC_DEVICE_NUM

  • ACC_DEVICE_TYPE

The following environment variable is HPE specific:

  • CRAY_ACC_MALLOC_HEAPSIZE

    Specifies the accelerator heap size in bytes. The accelerator heap size defaults to 8MB. When compiling with the debug option (-g), HPE CCE may require additional memory from the accelerator heap, exceeding the 8MB default. In this case, there will be malloc failures during compilation. It may be necessary to increase the accelerator heap size to 32MB (33554432), 64MB (67108864), or greater.
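As a small illustration, the following sketch computes the 32MB byte count quoted above and reports whether CRAY_ACC_MALLOC_HEAPSIZE is set in the current environment. It uses only the standard GET_ENVIRONMENT_VARIABLE intrinsic; the program and variable names are assumptions, and the program only reads the variable, it does not change the heap size.

```fortran
program show_heapsize
  use iso_fortran_env, only: int64
  implicit none
  character(len=32) :: val
  integer :: stat, nlen
  integer(int64) :: mb32

  ! 32 MB expressed in bytes, matching the value quoted in the text.
  mb32 = 32_int64 * 1024_int64 * 1024_int64
  print *, '32 MB in bytes:', mb32

  call get_environment_variable('CRAY_ACC_MALLOC_HEAPSIZE', val, nlen, stat)
  if (stat == 0 .and. nlen > 0) then
     print *, 'CRAY_ACC_MALLOC_HEAPSIZE = ', trim(val)
  else
     print *, 'CRAY_ACC_MALLOC_HEAPSIZE is not set; the default applies'
  end if
end program show_heapsize
```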

OpenACC Examples

See the OpenACC.EXAMPLES(7) man page for example OpenACC codes.

Conformance Checks

The amount of error-checking of edit descriptors with input/output (I/O) list items during formatted READ and WRITE statements can be selected through a compiler driver option or through an environment variable.

By default, the compiler provides only limited error-checking.

Use the compiler driver options to choose the table to be used for the conformance check. The table is then part of the executable and no environment variable is required. The compiler driver options allow a choice of checking or no checking with a particular version of the Fortran standard for formatted READ and WRITE. See the following tables: RELAXED Compatibility Between Data Types and Data Edit Descriptors, STRICT77 Compatibility Between Data Types and Data Edit Descriptors, and STRICT90 and STRICT95 Compatibility Between Data Types and Data Edit Descriptors in Input/Output Editing.

The environment variable FORMAT_TYPE_CHECKING is evaluated during execution. The environment variable overrides a table chosen through the compiler driver option. The environment variable provides an intermediate type of checking that is not provided by the compiler driver option. The environment variable FORMAT_TYPE_CHECKING is described in Set Environment Variables to the HPE Cray Fortran Compiler.
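To make concrete what is being checked, the sketch below pairs I/O list items with data edit descriptors. The first two statements use matching pairs. The commented-out statement shows the kind of mismatch (a real item with an I descriptor) that the conformance check examines; whether it is accepted depends on the table in effect, so no outcome is claimed here. The names are assumptions.

```fortran
program edit_descriptor_match
  implicit none
  integer :: n
  real :: x

  n = 42
  x = 3.5

  ! Matching pairs: I with an integer item, F with a real item.
  write (*, '(I6)') n
  write (*, '(F8.2)') x

  ! A mismatched pair such as the following is what the conformance
  ! check inspects at run time; it is left commented out here.
  ! write (*, '(I6)') x
end program edit_descriptor_match
```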

To select the least amount of checking, use one or more of the following ftn command line options.

  • On Cray Linux Environment (CLE) systems with formatted READ, use:

    ftn -Wl,--defsym,_RCHK=_RNOCHK *.f

    Note the double dashes that precede defsym.

  • On CLE systems with formatted WRITE, use:

    ftn -Wl,--defsym,_WCHK=_WNOCHK *.f

  • On CLE systems with both formatted READ and WRITE, use:

    ftn -Wl,--defsym,_WCHK=_WNOCHK -Wl,--defsym,_RCHK=_RNOCHK *.f

To select strict checking against either FORTRAN 77 or Fortran 90, use one or more of the following ftn command line options.

  • On CLE systems with formatted READ, use:

    ftn -Wl,--defsym,_RCHK=_RCHK77 *.f
    ftn -Wl,--defsym,_RCHK=_RCHK90 *.f

  • On CLE systems with formatted WRITE, use:

    ftn -Wl,--defsym,_WCHK=_WCHK77 *.f
    ftn -Wl,--defsym,_WCHK=_WCHK90 *.f

  • On CLE systems with both formatted READ and WRITE, use:

    ftn -Wl,--defsym,_WCHK=_WCHK77 -Wl,--defsym,_RCHK=_RCHK77 *.f
    ftn -Wl,--defsym,_WCHK=_WCHK90 -Wl,--defsym,_RCHK=_RCHK90 *.f
    

HPE Cray Fortran Language Extensions

The HPE Cray Fortran Compiler supports extended features beyond those specified by the current standard. Some of these extensions are widely implemented in other compilers and likely to become standard features in the future, while others are unique and specific to HPE Cray systems. The implementation of any extension may change in order to conform to future language standards.

The listings provided by the compiler identify language extensions when the -e n command line option is specified.

128-Bit Precision

The Fortran Compiler supports 128-bit floating point and 256-bit complex predefined types using the X86-64 ABI definitions for type names and data layout. These types are sometimes referred to as “quad-precision”. In Fortran, use real(kind=16) and complex(kind=16) to declare variables of these types. In C and C++, use __float128 and __float128 complex.

Fortran and C forms of intrinsic math functions (for example, QSIN, QCOS, QTAN, QSQRT, sinq, cosq, tanq) offer full support for quad-precision types. See the intro_quad_precision(3i) man page for a complete list of intrinsic functions that support quad-precision.

The base type uses 128 bits of storage, with a guaranteed minimum alignment on a 128-bit boundary. It is little endian, with a 15-bit exponent, a 113-bit mantissa, and an exponent bias of 16383, and it is compatible with the gcc implementation.
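The following sketch declares quad-precision variables with kind=16, as described above, and uses standard inquiry intrinsics to report the storage size and numeric model of the type. The program and variable names are assumptions; the values printed depend on the compiler and are not asserted here.

```fortran
program quad_demo
  implicit none
  real(kind=16) :: q
  complex(kind=16) :: z

  q = 2.0_16
  z = cmplx(1.0_16, 2.0_16, kind=16)

  ! Inquiry intrinsics report properties of the type, not of the value.
  print *, 'storage bits (real)   :', storage_size(q)
  print *, 'storage bits (complex):', storage_size(z)
  print *, 'significand digits    :', digits(q)
  print *, 'max exponent          :', maxexponent(q)

  ! Generic intrinsics such as SQRT accept quad-precision arguments.
  print *, 'sqrt(2)               :', sqrt(q)
end program quad_demo
```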

Characters, Lexical Tokens, and Source Form

Characters Allowed in Names

Variables, named constants, program units, common blocks, procedures, arguments, constructs, derived types (types for structures), namelist groups, structure components, dummy arguments, and function results are among the elements in a program that have a name. As extensions, the Cray Fortran compiler permits the following characters in names:

alphanumeric_character

is

currency_symbol

currency_symbol

is

$

A name must begin with a letter and can consist of letters, digits, and underscores. The Cray Fortran compiler permits use of the dollar sign ($) in a name, but it cannot be the first character of a name.

Cray does not recommend using $ in user names because it can cause conflicts with the names of internal variables or library routines.
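For illustration only, the sketch below uses a dollar sign in a variable name in the position the extension permits (not as the first character). The name count$total is an assumption; other compilers may reject it or require an option to accept it.

```fortran
program dollar_name
  implicit none
  integer :: count$total   ! extension: $ accepted after the first character

  count$total = 3
  print *, count$total
end program dollar_name
```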

Switch Source Forms

The Cray Fortran compiler allows switching between fixed and free source forms within a source or include file by using the FIXED and FREE compiler directives.

Continuation Line Limit

The Cray Fortran compiler allows a statement to have an unlimited number of continuation lines; this also applies in free source form.

Statement and Line length

In free source form, the Cray Fortran compiler allows up to 10,000 characters per line, with a total statement length up to 1,000,000 characters.

D Lines in Fixed Source Form

The Cray Fortran compiler allows a D or d character to occur in column one in fixed source form. Typically, the compiler treats a line with a D or d character in column one as a comment line. When the -e d command line option is in effect, however, the compiler replaces the D or d character with a blank and treats the rest of the line as a source statement. This can be used, for example, for debugging purposes if the rest of the line contains a PRINT statement.

This functionality is controlled through the -e d and -d d options on the compiler command line. For more information about these options, see the ftn(1) man page.
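A short fixed source form sketch of a D line follows. The program is an assumption made for illustration. By default the line beginning with D in column one is treated as a comment; when the -e d option is in effect, the D is replaced with a blank and the PRINT statement is compiled.

```fortran
      PROGRAM DLINES
      INTEGER I, ISUM
      ISUM = 0
      DO 10 I = 1, 5
         ISUM = ISUM + I
D        PRINT *, 'DEBUG: I =', I, ' ISUM =', ISUM
   10 CONTINUE
      PRINT *, 'SUM =', ISUM
      END
```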

Types

The Cray Fortran compiler supports the following additional data types. This preserves compatibility with other vendors’ systems.

  • Cray pointer

  • Cray character pointer

  • Boolean (or typeless)

The Cray Fortran compiler also supports the TYPEALIAS statement as a means of creating alternate names for existing types and supports an expanded form of the ENUM statement.

Alternate Form of LOGICAL Constants

The Cray Fortran compiler no longer accepts .T. and .F. as alternate forms of .true. and .false., respectively.

Cray Pointer Type

The Cray POINTER statement declares one variable to be a Cray pointer (that is, to have the Cray pointer data type) and another variable to be its pointee. The value of the Cray pointer is the address of the pointee. This POINTER statement has the following format:

POINTER (pointer_name, pointee_name [(array_spec)])
      [, (pointer_name, pointee_name [(array_spec)])] ...

pointer_name

Pointer to the corresponding pointee_name. pointer_name contains the address of pointee_name. Only a scalar variable can be declared type Cray pointer; constants, arrays, coarrays, statement functions, and external functions cannot.

pointee_name

Pointee of corresponding pointer_name. Must be a variable name, array declarator, or array name. The value of pointer_name is used as the address for any reference to pointee_name; therefore, pointee_name is not assigned storage. If pointee_name is an array declarator, it can be explicit-shape (with either constant or nonconstant bounds) or assumed-size.

array_spec

If present, this must be either an explicit_shape_spec_list (with either constant or nonconstant bounds) or an assumed_size_spec. A codimension used to indicate a coarray may not appear in array_spec.

Fortran pointers are declared as follows:

POINTER ::  object-name-list

Cray pointers and Fortran standard pointers cannot be mixed.

Example:

POINTER(P,B),(Q,C)

This statement declares Cray pointer P and its pointee B, and Cray pointer Q and pointee C; the pointer’s current value is used as the address of the pointee whenever the pointee is referenced.

An array that is named as a pointee in a Cray POINTER statement is a pointee array. Its array declarator can appear in a separate type or DIMENSION statement or in the pointer list itself. In a subprogram, the dimension declarator can contain references to variables in a common block or to dummy arguments. As with nonconstant bound array arguments to subprograms, the size of each dimension is evaluated on entrance to the subprogram, not when the pointee is referenced. For example:

POINTER(IX, X(N,0:M))

In addition, pointees must not be deferred-shape or assumed-shape arrays. An assumed-size pointee array is not allowed in a main program unit.

Pointers can be used to access user-managed storage by dynamically associating variables and arrays to particular locations in a block of storage. Cray pointers do not provide convenient manipulation of linked lists because, for optimization purposes, it is assumed that no two pointers have the same value. Cray pointers also allow the accessing of absolute memory locations.

The range of a Cray pointer or Cray character pointer depends on the size of memory for the machine in use.

Restrictions on Cray pointers are as follows:

  • A Cray pointer variable should only be used to alias memory locations by using the LOC intrinsic.

  • A Cray pointer cannot be pointed to by another Cray or Fortran pointer; that is, a Cray pointer cannot also be a pointee or a target.

  • A Cray pointer cannot appear in a PARAMETER statement or in a type declaration statement that includes the PARAMETER attribute.

  • A Cray pointer variable cannot be declared to be of any other data type.

  • A Cray character pointer cannot appear in a DATA statement.

  • An array of Cray pointers is not allowed.

  • A Cray pointer cannot be a component of a structure.

Restrictions on Cray pointees are as follows:

  • A Cray pointee cannot appear in a SAVE, STATIC, DATA, EQUIVALENCE, COMMON, AUTOMATIC, or PARAMETER statement or Fortran pointer statement.

  • A Cray pointee cannot be a dummy argument; that is, it cannot appear in a FUNCTION, SUBROUTINE, or ENTRY statement.

  • A function value cannot be a Cray pointee.

  • A Cray pointee cannot be a structure component.

  • An equivalence object cannot be a Cray pointee.

Cray pointees can be of type character, but their Cray pointers are different from other Cray pointers; the two kinds cannot be mixed in the same expression.

The Cray pointer is a variable of type Cray pointer and can appear in a COMMON list or be a dummy argument in a subprogram.

The Cray pointee does not have an address until the value of the Cray pointer is defined; the pointee is stored starting at the location specified by the pointer. Any change in the value of a Cray pointer causes subsequent references to the corresponding pointee to refer to the new location.

Cray pointers can be assigned values in the following ways:

  • A Cray pointer can be set as an absolute address. For example:

    Q = 0
    
  • Cray pointers can have integer expressions added to or subtracted from them and can be assigned to or from integer variables. For example:

    P = Q + 100
    

    However, Cray pointers are not integers. For example, assigning a Cray pointer to a real variable is not allowed.

    The (nonstandard) LOC intrinsic function generates the address of a variable and can be used to define a Cray pointer, as follows:

    P = LOC(X)
    

    The following example uses Cray pointers in the ways just described:

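    SUBROUTINE SUB(N)
     INTEGER WORDS
     COMMON POOL(100000), WORDS(1000)
     INTEGER BLK(128), WORD64
     REAL A(1000), B(N), C(100000-N-1000)
     ! Each pair names a Cray pointer and its pointee
     POINTER(PBLK,BLK), (IA,A), (IB,B), &
           (IC,C), (ADDRESS,WORD64)
     ! The KIND value is used here as the element size in bytes
     ADDRESS = LOC(WORDS) + 64*KIND(WORDS)   ! WORD64 overlays WORDS(65)
     PBLK = LOC(WORDS)                       ! BLK overlays WORDS(1:128)
     IA = LOC(POOL)                          ! A overlays POOL(1:1000)
     IB = IA + 1000*KIND(POOL)               ! B follows A
     IC = IB + N*KIND(POOL)                  ! C follows B
    END SUBROUTINE SUB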
    

    BLK is an array that is another name for the first 128 words of array WORDS. A is an array of length 1000; it is another name for the first 1000 elements of POOL. B follows A and is of length N. C follows B. A, B, and C are associated with POOL. WORD64 is the same as BLK(65) because BLK(1) is at the initial address of WORDS.

    If a pointee is of a noncharacter data type that is one machine word or longer, the address stored in a pointer is a word address. If the pointee is of type character or of a data type that is less than one word, the address is a byte address. The following example also uses Cray pointers:

    PROGRAM TEST
    REAL X(10), Y(9), Z(9), A(10)
    POINTER (P_X,X)
    POINTER (P_Y,Y)
    POINTER (P_Z,Z)
    INTEGER*8 I,J
            
    !USE LOC INTRINSIC TO SET POINTER MEMORY LOCATIONS
    !*** RECOMMENDED USAGE, AS PORTABLE CRAY POINTERS ***
    P_X = LOC(A(1))
    P_Y = LOC(A(2))
            
    !USE POINTER ARITHMETIC TO DEMONSTRATE COMPILER AND COMPILER
    !FLAG DIFFERENCES
    !*** USAGE NOT RECOMMENDED, HIGHLY NON-PORTABLE ***
    P_Z = P_X + 1
            
    I = P_Y
    J = P_Z
            
    IF ( I .EQ. J ) THEN
       PRINT *, 'NOT A BYTE-ADDRESSABLE MACHINE'
    ELSE
       PRINT *, 'BYTE-ADDRESSABLE MACHINE'
    ENDIF
            
    END
    

    On Cray systems, this prints the following:

    BYTE-ADDRESSABLE MACHINE
    

    Cray does not recommend the use of pointer arithmetic because it is not portable.

    For purposes of optimization, the compiler assumes that the storage of a pointee is never overlaid on the storage of another variable; that is, it assumes that a pointee is not associated with another variable or array. This kind of association occurs when a Cray pointer has two pointees, or when two Cray pointers are given the same value. Although these practices are sometimes used deliberately (such as for equivalencing arrays), results can differ depending on whether optimization is turned on or off. The code developer is responsible for preventing such association. For example:

    POINTER(P,B), (P,C)
    REAL X, B, C
    P = LOC(X)
    B = 1.0
    C = 2.0
    PRINT *, B
    

    Because B and C have the same pointer, the assignment of 2.0 to C gives the same value to B; therefore, B will print as 2.0 even though it was assigned 1.0.

    As with a variable in common storage, a pointee, pointer, or argument to a LOC intrinsic function is stored in memory before a call to an external procedure and is read out of memory at its next reference. The variable is also stored before a RETURN or END statement of a subprogram.

Cray Character Pointer Type

If a pointee is declared as a character type, its Cray pointer is a Cray character pointer.

Restrictions for Cray pointers also apply to Cray character pointers. In addition, the following restrictions apply:

  • When included in an I/O statement iolist, a Cray character pointer is treated as an integer.

  • If the length of the pointee is explicitly declared (that is, not of an assumed length), any reference to that pointee uses the explicitly declared length.

  • If a pointee is declared with an assumed length (that is, as CHARACTER(*)), the length of the pointee comes from the associated Cray character pointer.

  • A Cray character pointer can be used in a relational operation only with another Cray character pointer. Such an operation applies only to the character address and bit offset; the length field is not used.

Boolean Type

A Boolean constant represents the literal constant of a single storage unit. There are no Boolean variables or arrays, and there is no Boolean type statement. Binary, octal, and hexadecimal constants are used to represent Boolean values. For more information about Boolean expressions, see Expressions and Assignment.

Alternate Form of ENUM Statement

An enumeration defines the name of a group of related values and the name of each value within the group. The Cray Fortran compiler allows the following additional form for enum_def (enumerations):

enum_def forms

enum_def_stmt

is

ENUM, BIND(C) :: type_alias_name

or

ENUM [kind_selector] :: type_alias_name

  • kind_selector is optional. If it is not specified, the compiler uses the default integer kind.

  • type_alias_name is the name to assign to the group. This name is treated as a type alias name.

TYPEALIAS Statement

A TYPEALIAS statement allows another name to be defined for an intrinsic data type or user-defined data type. Thus, the type alias and the type specification it aliases are interchangeable. Type aliases do not define a new type.

This is the form for type aliases:

TYPEALIAS forms

type_alias_stmt

is

TYPEALIAS :: type_alias_list

type_alias

is

type_alias_name => type_spec

This example shows how a type alias can define another name for an intrinsic type, a user-defined type, and another type alias:

TYPEALIAS :: INTEGER_64 => INTEGER(KIND = 8), &
             TYPE_ALIAS => TYPE(USER_DERIVED_TYPE), &
             ALIAS_OF_TYPE_ALIAS => TYPE(TYPE_ALIAS)
      
INTEGER(KIND = 8) :: I
TYPE(INTEGER_64) :: X, Y
TYPE(TYPE_ALIAS) :: S
TYPE(ALIAS_OF_TYPE_ALIAS) :: T

A type alias and the data type it aliases can be used interchangeably. That is, explicit or implicit declarations that use a type alias have the same effect as if the data type being aliased were used. For example, the above declarations of I, X, and Y are the same. Also, S and T are the same.

If the type being aliased is a derived type, the type alias name can be used to declare a structure constructor for the type.

The following are allowed as the type_spec in a TYPEALIAS statement:

  • Any intrinsic type defined by the Cray Fortran compiler.

  • Any type alias in the same scoping unit.

  • Any derived type in the same scoping unit.
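The sketch below assembles the pieces above into a complete program: an alias for an intrinsic type, an alias for a derived type in the same scoping unit, and a structure constructor written with the alias name. It follows the TYPEALIAS form shown in this section, which is a Cray extension and is not expected to compile elsewhere. The names point, pt, and real_64 are assumptions made for illustration.

```fortran
program typealias_demo
  implicit none

  type point
     real :: x, y
  end type point

  typealias :: real_64 => real(kind=8), &
               pt => type(point)

  type(real_64) :: r     ! same effect as real(kind=8) :: r
  type(pt) :: p          ! same effect as type(point) :: p

  r = 1.5d0
  p = pt(1.0, 2.0)       ! structure constructor through the alias name
  print *, r, p%x, p%y
end program typealias_demo
```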

Data Object Declarations and Specifications

The Cray Fortran compiler accepts the following extensions to declarations. The maximum rank is equal to 31. The standard requires a maximum rank of 15.

BOZ Constraints in DATA Statements

The Cray Fortran compiler permits a default real object to be initialized with a BOZ, typeless, or character (used as Hollerith) constant in a DATA statement. BOZ constants are formatted in binary, octal, or hexadecimal. No conversion of the BOZ value, typeless value, or character constant takes place.

The Cray Fortran compiler permits an integer object to be initialized with a BOZ, typeless, or character (used as Hollerith) constant in a type declaration statement. The Cray Fortran compiler also allows an integer object to be initialized with a typeless or character (used as Hollerith) constant in a DATA statement.

If the last item in the data_object_list is an array name, the value list can contain fewer values than the number of elements in the array. Any element that is not assigned a value is undefined.

The following alternate forms of BOZ constants are supported:

literal-constant

is

typeless-constant

typeless-constant

is

octal-typeless-constant

or

hexadecimal-typeless-constant

octal-typeless-constant

is

digit digit… B

or

"digit digit…"O

or

'digit digit…'O

hexadecimal-typeless-constant

is

X'hex-digit hex-digit…'

or

X"hex-digit hex-digit…"

or

'hex-digit hex-digit…'X

or

"hex-digit hex-digit…"X
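A brief sketch of these forms in DATA statements follows. It relies on the Cray extensions described in this section and is illustrative only; the variable names are assumptions, and the last statement assumes a 32-bit default real so that the bit pattern fills the object exactly. Because no conversion takes place, the real object receives the bits as written.

```fortran
program boz_forms
  implicit none
  integer :: i, j, k
  real :: r

  data i /777B/            ! octal, trailing-B form
  data j /X'FF'/           ! hexadecimal, leading-X form
  data k /'FF'X/           ! hexadecimal, trailing-X form
  data r /X'3F800000'/     ! bits stored unconverted into a default real

  print *, i, j, k, r
end program boz_forms
```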

AUTOMATIC Attribute and Statement

The Cray Fortran AUTOMATIC attribute specifies stack-based storage for a variable or array. Such variables and arrays are undefined upon entering and exiting the procedure. The following is the format for the AUTOMATIC specification:

type, AUTOMATIC [, attribute-list] [::] entity-list

automatic-stmt

is

AUTOMATIC [::] entity-list

entity-list

For entity-list, specify a variable name or an array declarator. If an entity-list item is an array, it must be declared with an explicit-shape-spec with constant bounds. If an entity-list item is a pointer, it must be declared with a deferred-shape-spec.

If an entity-list item has the same name as the function in which it is declared, the entity-list item must be scalar and of type integer, real, logical, complex, or double precision.

If the entity-list item is a pointer, the AUTOMATIC attribute applies to the pointer itself and not to any target that may become associated with the pointer.

Subject to the rules governing combinations of attributes, attribute-list can contain the following:

  • DIMENSION

  • TARGET

  • POINTER

  • VOLATILE

The following entities cannot have the AUTOMATIC attribute:

  • Pointers or arrays used as function results

  • Dummy arguments

  • Statement functions

  • Automatic array or character data objects

An entity-list item cannot have the following characteristics:

  • It cannot be defined in the scoping unit of a module.

  • It cannot be a common block item.

  • It cannot be specified more than once within the same scoping unit.

  • It cannot be initialized with a DATA statement or with a type declaration statement.

  • It cannot also have the SAVE or STATIC attribute.

  • It cannot be specified as a Cray pointee.
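The sketch below shows both the attribute form and the statement form in a subprogram, within the restrictions listed above (explicit-shape array with constant bounds, no initialization, no SAVE). AUTOMATIC is an extension, so other compilers may need an option to accept it. The names accumulate, scratch, and total are assumptions.

```fortran
program automatic_demo
  implicit none
  real :: t

  call accumulate(10, t)
  print *, t
end program automatic_demo

subroutine accumulate(n, total)
  implicit none
  integer, intent(in) :: n
  real, intent(out) :: total
  real, automatic :: scratch(100)   ! attribute form; constant bounds
  integer :: i
  automatic :: i                    ! statement form

  ! scratch and i are stack-based and undefined on entry, so every
  ! element that is read below is assigned first.
  do i = 1, min(n, 100)
     scratch(i) = real(i)
  end do
  total = sum(scratch(1:min(n, 100)))
end subroutine accumulate
```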

IMPLICIT Statement

Implicit Extensions

The Cray Fortran compiler accepts the IMPLICIT AUTOMATIC or IMPLICIT STATIC syntax. It is recommended that none of the IMPLICIT extensions be used in new code.

Storage Association of Data Objects

EQUIVALENCE Statement Extensions

The Cray Fortran compiler allows equivalencing of character data with noncharacter data. The Fortran standard does not address this. Equivalencing in this manner is not recommended, however, because alignment and padding differ across platforms, which makes the code less portable.

COMMON Statement Extensions

The Cray Fortran compiler treats named common blocks and blank common blocks identically, as follows:

  • Variables in blank common and variables in named common blocks can be initialized.

  • Named common blocks and blank common are always saved.

  • Named common blocks of the same name and blank common can be of different sizes in different scoping units.

Expressions and Assignment

Expressions

In Fortran, calculations are specified by writing expressions. Expressions look much like algebraic formulas in mathematics, particularly when the expressions involve calculations on numerical values.

Expressions often involve nonnumeric values, such as character strings, logical values, or structures; these also can be considered to be formulas that involve nonnumeric quantities rather than numeric ones.

Rules for Forming Expressions

The Cray Fortran compiler supports exclusive disjunct expressions of the form:

exclusive-disjunct-expr

is

[exclusive-disjunct-expr .XOR.] inclusive-disjunct-expr

Intrinsic and Defined Operations

Cray supports the following intrinsic operators as extensions:

less_greater_op

is

.LG.

or

<>

exclusive_disjunct_op

is

.XOR.

The Cray Fortran less than or greater than intrinsic operation is represented by the <> operator and the .LG. keyword. This operation is suggested by the IEEE standard for floating-point arithmetic, and the Cray Fortran compiler supports this operator. Only values of type real can appear on either side of the <> or .LG. operators. If the operands are not of the same kind type value, the compiler converts them to equivalent kind types. The <> and .LG. operators perform a less-than-or-greater-than operation as specified in the IEEE standard for floating-point arithmetic.
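A minimal sketch of the extension operators described in this section follows: the two spellings of the less-than-or-greater-than operation on real operands, and .XOR. on logical operands. The variable names are assumptions, and compilers without these extensions will reject the code.

```fortran
program lg_xor_demo
  implicit none
  real :: x, y
  logical :: p, q

  x = 1.0
  y = 2.0
  p = .true.
  q = .false.

  ! .LG. and <> are the same operation: true when x < y or x > y.
  if (x .lg. y) print *, 'x and y are ordered and unequal'
  if (x <> y)   print *, 'same test, operator form'

  ! .XOR. on logical operands is true when exactly one operand is true.
  print *, p .xor. q
end program lg_xor_demo
```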

The Cray Fortran compiler no longer allows abbreviations for logical and masking operators. The abbreviations .A., .O., .N., and .X. are no longer synonyms for .AND., .OR., .NOT., and .XOR., respectively. This change does not affect user-defined operators and operator overloads; therefore, users can create user-defined operators to behave as shorthand operators.

The masking or Boolean operators and their abbreviations, which are extensions to Fortran, can be redefined as defined operators. If a masking operator is redefined, the definition overrides the intrinsic masking operator definition. See Bitwise Logical Expressions for a list of the operators.

Intrinsic Operations

In the following table, the symbols I, R, Z, C, L, B, and P stand for the types integer, real, complex, character, logical, Boolean, and Cray pointer, respectively. Where more than one type for x2 is given, the type of the result of the operation is given in the same relative position in the next column. Boolean and Cray pointer types are extensions of the Fortran standard.

Intrinsic operator                      Type of x1   Type of x2       Type of result
Unary +, -                                           I, R, Z, B, P    I, R, Z, I, P
Binary +, -, *, /, **                   I            I, R, Z, B, P    I, R, Z, I, P
                                        R            I, R, Z, B       R, R, Z, R
                                        Z            I, R, Z          Z, Z, Z
                                        B            I, R, B, P       I, R, B, P
                                        P            I, B, P          P, P, P
                                        (For Cray pointer, only + and - are allowed.)
//                                      C            C                C
.EQ., ==, .NE., /=                      I            I, R, Z, B, P    L, L, L, L, L
                                        R            I, R, Z, B, P    L, L, L, L, L
                                        Z            I, R, Z, B, P    L, L, L, L, L
                                        B            I, R, Z, B, P    L, L, L, L, L
                                        P            I, R, Z, B, P    L, L, L, L, L
                                        C            C                L
.GT., >, .GE., >=, .LT., <, .LE., <=    I            I, R, B, P       L, L, L, L
                                        R            I, R, B          L, L, L
                                        C            C                L
                                        P            I, P             L, L
.LG., <>                                R            R                L
.NOT.                                                L                L
                                                     I, R, B          B
.AND., .OR., .EQV., .NEQV., .XOR.       L            L                L
                                        I, R, B      I, R, B          B

An arithmetic operator (binary +, -, *, /, or **) may be followed directly by a unary operator (+ or -) applied to the second operand, as in A * -B. This is an extension to the Fortran standard.

The operators .NOT., .AND., .OR., .EQV., and .XOR. can also be used in the Cray Fortran compiler’s bitwise masking expressions; these are extensions to the Fortran standard. The result is Boolean (or typeless) and has no kind type parameters.

Bitwise Logical Expressions

A bitwise logical expression (also called a masking expression) is an expression in which a logical operator operates on individual bits within integer, real, Cray pointer, or Boolean operands, giving a result of type Boolean. Each operand is treated as a single storage unit. The result is a single storage unit, which is either 32 or 64 bits depending on the -s option specified during compilation. Boolean values and bitwise logical expressions use the same operators but are different from logical values and expressions.

Operator category                       Intrinsic operator                         Operand types
Bitwise masking (Boolean) expressions   .NOT., .AND., .OR., .XOR., .EQV., .NEQV.   Integer, real, typeless, or Cray pointer

Bitwise logical operators can also be written as functions; for example A .AND. B can be written as IAND(A,B) and .NOT. A can be written as NOT(A).

In the following table, x1 and x2 represent the operands of a logical or bitwise expression that uses the operators .NOT., .AND., .OR., .XOR., .NEQV., and .EQV. The table entries are:

  • M: masking operation, Boolean result.

  • L: logical operation, logical result.

  • NV: not valid.

  • NV**: not valid, except that if the operand is a character operand of 32 or fewer characters, the operand is treated as a Hollerith constant and is allowed.

x1 \ x2     Integer   Real   Boolean   Pointer   Logical   Character
Integer     M         M      M         M         NV        NV**
Real        M         M      M         M         NV        NV**
Boolean     M         M      M         M         NV        NV**
Pointer     M         M      M         M         NV        NV**
Logical     NV**      NV**   NV**      NV**      L         NV**
Character   NV**      NV**   NV**      NV**      NV        NV**

Bitwise logical expressions can be combined with expressions of Boolean or other types by using arithmetic, relational, and logical operators. Evaluation of an arithmetic or relational operator processes a bitwise logical expression with no type conversion. Boolean data is never automatically converted to another type.

A bitwise logical expression performs the indicated logical operation separately on each bit. The interpretation of individual bits in bitwise multiplication-exprs, summation-exprs, and general expressions is the same as for logical expressions. The results of binary 1 and 0 correspond to the logical results TRUE and FALSE, respectively, in each of the bit positions. These values are summarized as follows:

.NOT. 1100            1100           1100           1100           1100
     =0011      .AND. 1010      .OR. 1010     .XOR. 1010     .EQV. 1010
                      ----           ----           ----           ----
                      1000           1110           0110           1001
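The same bit patterns can be reproduced with the standard bit intrinsic functions, which is one way to check the table without relying on the masking-operator extension. This is a sketch only: the constant low4 is a hypothetical helper that trims each result to the four bits shown, and .EQV. is expressed as the complement of IEOR.

```fortran
program mask_demo
  implicit none
  integer, parameter :: a = int(b'1100'), b = int(b'1010')
  integer, parameter :: low4 = int(b'1111')   ! keep only the four bits shown

  print '(A,B4.4)', '.NOT. 1100       = ', iand(not(a), low4)
  print '(A,B4.4)', '1100 .AND. 1010  = ', iand(a, b)
  print '(A,B4.4)', '1100 .OR.  1010  = ', ior(a, b)
  print '(A,B4.4)', '1100 .XOR. 1010  = ', ieor(a, b)
  print '(A,B4.4)', '1100 .EQV. 1010  = ', iand(not(ieor(a, b)), low4)
end program mask_demo
```

The printed patterns are 0011, 1000, 1110, 0110, and 1001, matching the summary above.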

Assignment

The Cray Fortran compiler supports Boolean and Cray pointer intrinsic assignments. The Cray Fortran compiler supports type Boolean or BOZ constants in assignment statements in which the variable is of type integer or real. The bits specified by the constant are moved into the variable with no type conversion.
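To illustrate what "no type conversion" means, the sketch below uses the standard TRANSFER intrinsic to copy a bit pattern into a real variable and contrasts it with an ordinary integer-to-real conversion. It assumes the 32-bit IEEE representation for real32. Under the extension described above, the direct form x = z'3F800000' would presumably have the same effect as the TRANSFER call; that form is not exercised here.

```fortran
program boz_bits_demo
  use, intrinsic :: iso_fortran_env, only: int32, real32
  implicit none
  integer(int32) :: bits
  real(real32)   :: x

  bits = int(z'3F800000', int32)   ! IEEE single-precision bit pattern of 1.0
  x = transfer(bits, x)            ! copy the bits; no numeric conversion
  print *, 'bits reinterpreted: ', x            ! the pattern decodes to 1.0
  print *, 'value converted:    ', real(bits)   ! a large value, about 1.07E+09
end program boz_bits_demo
```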

Array Reference

The Cray Fortran compiler allows arrays to be referenced with fewer than the declared number of dimensions. The subscripts specified in the array reference are used for the leftmost dimensions, and the lower bounds are used for the rightmost subscripts that were omitted. This extension to the Fortran standard applies to both arrays and coarrays.

When the option to note deviations from the Fortran standard (-en) is in effect, this type of array reference causes a compilation message.

Input/Output Statements

The Fortran standard does not specifically describe the implementation of I/O processing. This section provides information about processor-dependent areas and the implementation of the support for I/O.

File Connection

OPEN Statement

The OPEN statement specifies the connection properties between the file and the unit. The Values for Keyword Specifier Variables in an OPEN Statement table indicates the keyword specifiers in an OPEN statement that are Cray Fortran compiler extensions.

Specifier   Possible Values                            Default Value
FORM        SYSTEM                                     Unformatted with no record marks
CONVERT     LITTLE_ENDIAN, BIG_ENDIAN, CRAY, NATIVE    NATIVE

The FORM specifier has the following format:

FORM=scalar-char-expr

A file opened with FORM="SYSTEM" is unformatted and has no record marks.

The CONVERT specifier converts unformatted data between big-endian and little-endian representation. It overrides any numeric conversion specified with the assign command or by a compilation option.

The CONVERT specifier has the following format:

CONVERT="format-specifier"

format-specifier describes the format of the file being opened and applies only to that file. It may be one of the following strings:

  • LITTLE_ENDIAN: Specifies little-endian integer data and IEEE floating-point data. It has no effect except to override any numeric conversion specified with the assign command or by a compilation option.

  • BIG_ENDIAN: Specifies big-endian integer data and IEEE floating-point data. This has the same effect as specifying -hbyteswapio at compile time, but it applies on a per-file basis. The assign -Nswap_endian f:filename command also converts the named file to BIG_ENDIAN format.

  • CRAY: Indicates big-endian integer data and Cray floating-point data of size REAL(8) or COMPLEX(8). It has the same effect as the command assign -Ncray f:filename.

  • NATIVE: Default. Same effect as LITTLE_ENDIAN.
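A minimal sketch of the CONVERT specifier in use follows. It writes one unformatted record in big-endian order and reads it back with the same setting, so the value read should match the value written. The file name convert_demo.dat is hypothetical, and the program relies on the CONVERT= extension described above, so it is not portable to processors that lack it.

```fortran
program convert_demo
  implicit none
  integer :: u, val

  ! Write one unformatted record in big-endian byte order.
  ! CONVERT= is the extension described above; the file name is hypothetical.
  open (newunit=u, file='convert_demo.dat', form='unformatted', &
        status='replace', convert='BIG_ENDIAN')
  write (u) 258
  close (u)

  ! Read it back with the same CONVERT= value so the bytes are swapped again.
  open (newunit=u, file='convert_demo.dat', form='unformatted', &
        status='old', convert='BIG_ENDIAN')
  read (u) val
  close (u, status='delete')

  print *, 'value read back: ', val   ! expected to match the value written
end program convert_demo
```

Because the setting is given per file, other units in the same program can keep the NATIVE default.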

Error, End-of-record, and End-of-file Conditions

End-of-file Condition and the END-specifier

Multiple End-of-file Records

The file position prior to data transfer depends on the method of access: sequential or direct. Although the Fortran standard does not allow files that contain an end-of-file to be positioned after the end-of-file prior to data transfer, the Cray Fortran compiler permits more than one end-of-file for some file structures.

Input/Output Editing

Data Edit Descriptors

Integer Editing

The Cray Fortran compiler allows w to be zero for the G edit descriptor, and it permits w to be omitted for the I, B, O, Z, or G edit descriptors.

The Cray Fortran compiler allows signed binary, octal, or hexadecimal values as input.

If the minimum digits (m) field is specified, the default field width is increased, if necessary, to allow for that minimum width.

Real Editing

The Cray Fortran compiler allows the use of B, O, and Z edit descriptors of REAL data items. The Cray Fortran compiler accepts the Dw.dEe edit descriptor.

The Cray Fortran compiler accepts the ZERO_WIDTH_PRECISION environment variable, which can be used to modify the default size of the width w field. This environment variable is examined only upon program startup. Changing the value of the environment variable during program execution has no effect. For more information about the ZERO_WIDTH_PRECISION environment variable, see ZERO_WIDTH_PRECISION.

The Cray Fortran compiler allows w to be zero or omitted for the D, E, EN, ES, or G edit descriptors.

The Cray Fortran compiler does not restrict the use of Ew.d and Dw.d to an exponent less than or equal to 999. The Ew.dEe form must be used.

Default Fractional and Exponent Digits

Data Size and Representation   w    d    e
4-byte (32-bit) IEEE           17   9    2
8-byte (64-bit) IEEE           26   17   3

Logical Editing

The Cray Fortran compiler allows w to be zero or omitted on the L or G edit descriptors.

Character Editing

The Cray Fortran compiler allows w to be zero or omitted on the G edit descriptor.

Q Control Edit Descriptor

The Cray Fortran compiler supports the Q edit descriptor, which is used to determine the number of characters remaining in the input record. It has the following format:

Q

When a Q edit descriptor is encountered during execution of an input statement, the corresponding input list item must be of type integer. Interpretation of the Q edit descriptor causes the input list item to be defined with a value that represents the number of characters remaining to be read in the formatted record.

For example, if c is the character position within the current record of the next character to be read, and the record consists of n characters, then the item is defined with the value MAX(n-c+1,0).

If no characters have yet been read, then the item is defined as n (the length of the record). If all the characters of the record have been read (c>n), then the item is defined as zero.

The Q edit descriptor must not be encountered during the execution of an output statement.

The following example code uses Q on input:

INTEGER N
CHARACTER LINE * 80
READ (*, FMT='(Q,A)') N, LINE(1:N)
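The value that Q returns can be worked through with a small standard Fortran sketch of the MAX(n-c+1,0) rule given above. The helper q_value is hypothetical and only evaluates the formula; it does not use the Q edit descriptor itself.

```fortran
program q_value_demo
  implicit none
  integer, parameter :: n = 10   ! record length in characters
  integer :: c                   ! position of the next character to be read

  do c = 1, 11, 5                ! c = 1, 6, 11
    print '(A,I2,A,I2)', 'c = ', c, '  remaining = ', q_value(n, c)
  end do

contains

  ! Hypothetical helper: the value the text gives for a Q input item.
  pure integer function q_value(n, c)
    integer, intent(in) :: n, c
    q_value = max(n - c + 1, 0)
  end function q_value

end program q_value_demo
```

For a 10-character record this prints 10 when nothing has been read (c = 1), 5 when the next character is at position 6, and 0 once the record is exhausted (c = 11).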

List-directed Formatting

Input values are generally accepted as list-directed input if they are the same as those required for explicit formatting with an edit descriptor. The exceptions are as follows:

  • When the data list item is of type integer, the constant must be of a form suitable for the I edit descriptor. The Cray Fortran compiler permits binary, octal, and hexadecimal based values in a list-directed input record to correspond to I edit descriptors.

Namelist Formatting Extensions

The Cray Fortran compiler has extended the namelist feature. The following additional rules govern namelist processing:

  • An ampersand (&) or dollar sign ($) can precede the namelist group name or terminate namelist group input. If an ampersand precedes the namelist group name, either the slash (/) or the ampersand must terminate the namelist group input. If the dollar sign precedes the namelist group name, either the slash or the dollar sign must terminate the namelist group input.

  • Octal and hexadecimal constants are allowed as input to integer and single-precision real namelist group items. An error is generated if octal and hexadecimal constants are specified as input to character, complex, or double-precision real namelist group items.

  • Octal constants must be of the following form:

    • O"123"

    • O'123'

    • o"123"

    • o'123'

  • Hexadecimal constants must be of the following form:

    • Z"1a3"

    • Z'1a3'

    • z"1a3"

    • z'1a3'
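The sketch below reads a namelist group from an internal file using the standard ampersand and slash delimiters. The group name grp and its items are hypothetical. The dollar-sign delimiters and octal or hexadecimal input described above are shown only in comments, since they depend on the extensions and are not exercised here.

```fortran
program namelist_demo
  implicit none
  integer :: n
  real    :: x
  character(len=40) :: record
  namelist /grp/ n, x

  n = 0
  x = 0.0

  ! Standard form: ampersand before the group name, slash as terminator.
  record = '&grp n=5, x=2.5 /'
  read (record, nml=grp)
  print '(A,I0,A,F4.1)', 'n = ', n, ', x = ', x

  ! Forms the text describes as extensions (not exercised here):
  !   $grp n=5, x=2.5 $        dollar sign as prefix and terminator
  !   &grp n=O"17" /           octal input for an integer item
end program namelist_demo
```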

I/O Editing

Usually, data is stored in memory as the values of variables in some binary form. On the other hand, formatted data records in a file consist of characters. Thus, when data is read from a formatted record, it must be converted from characters to the internal representation. When data is written to a formatted record, it must be converted from the internal representation into a string of characters.

The tables below list the control and data edit descriptor extensions supported by the Cray Fortran compiler and provide a brief description of each.

Summary of Control Edit Descriptors

Descriptor   Description
$ or \       Suppress carriage control

Summary of Data Edit Descriptors

Descriptor   Description
Q            Return number of characters left in record

The following tables show the use of the Cray Fortran compiler’s edit descriptors with all intrinsic data types. In these tables:

  • NA indicates invalid usage that is not allowed.

  • I,O indicates that usage is allowed for both input and output.

  • I indicates legal usage for input only.


Default Compatibility Between I/O List Data Types and Data Edit Descriptors

Data types  Q    Z    R    O    L    I    G    F    ES   EN   E    D    B    A
Int         I    I,O  I,O  I,O  NA   I,O  I,O  NA   NA   NA   NA   NA   I,O  I,O
Real        NA   I,O  I,O  I,O  NA   NA   I,O  I,O  I,O  I,O  I,O  I,O  I,O  I,O
Comp        NA   I,O  I,O  I,O  NA   NA   I,O  I,O  I,O  I,O  I,O  I,O  I,O  I,O
Log         NA   I,O  I,O  I,O  I,O  NA   I,O  NA   NA   NA   NA   NA   I,O  I,O
Char        NA   NA   NA   NA   NA   NA   I,O  NA   NA   NA   NA   NA   NA   I,O

The table below, RELAXED Compatibility Between Data Types and Data Edit Descriptors, shows the restrictions for the various data types that are allowed when the FORMAT_TYPE_CHECKING environment variable is set to RELAXED. Not all data edit descriptors support all data sizes; for example, a 16-byte real variable cannot be read or written with an I edit descriptor.

RELAXED Compatibility Between Data Types and Data Edit Descriptors

Data types  Q    Z    R    O    L    I    G    F    ES   EN   E    D    B    A
Int         I    I,O  I,O  I,O  I,O  I,O  I,O  I,O  I,O  I,O  I,O  NA   I,O  I,O
Real        NA   I,O  I,O  I,O  I,O  I,O  I,O  I,O  I,O  I,O  I,O  I,O  I,O  I,O
Comp        NA   I,O  I,O  I,O  NA   NA   I,O  I,O  I,O  I,O  I,O  I,O  I,O  I,O
Log         NA   I,O  I,O  I,O  I,O  I,O  I,O  I,O  I,O  I,O  I,O  NA   I,O  I,O
Char        NA   NA   NA   NA   NA   NA   I,O  NA   NA   NA   NA   NA   NA   I,O

STRICT77 Compatibility Between Data Types and Data Edit Descriptors shows the restrictions for the various data types that are allowed when the FORMAT_TYPE_CHECKING environment variable is set to STRICT77.

STRICT77 Compatibility Between Data Types and Data Edit Descriptors

Data types  Q    Z    R    O    L    I    G    F    ES   EN   E    D    B    A
Int         NA   I,O  NA   I,O  NA   I,O  NA   NA   NA   NA   NA   NA   I,O  NA
Real        NA   NA   NA   NA   NA   NA   I,O  I,O  NA   NA   I,O  I,O  NA   NA
Comp        NA   NA   NA   NA   NA   NA   I,O  I,O  NA   NA   I,O  I,O  NA   NA
Log         NA   NA   NA   NA   I,O  NA   NA   NA   NA   NA   NA   NA   NA   NA
Char        NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   I,O

STRICT90 and STRICT95 Compatibility Between Data Types and Data Edit Descriptors shows the restrictions for the various data types that are allowed when the FORMAT_TYPE_CHECKING environment variable is set to STRICT90 or STRICT95.

STRICT90 or STRICT95 Compatibility Between Data Types and Data Edit Descriptors

Data types  Q    Z    R    O    L    I    G    F    ES   EN   E    D    B    A
Int         NA   I,O  NA   I,O  NA   I,O  I,O  NA   NA   NA   NA   NA   I,O  NA
Real        NA   NA   NA   NA   NA   NA   I,O  I,O  I,O  I,O  I,O  I,O  NA   NA
Comp        NA   NA   NA   NA   NA   NA   I,O  I,O  I,O  I,O  I,O  I,O  NA   NA
Log         NA   NA   NA   NA   I,O  NA   I,O  NA   NA   NA   NA   NA   NA   NA
Char        NA   NA   NA   NA   NA   NA   I,O  NA   NA   NA   NA   NA   NA   I,O

Program Units

Main Program

Program Statement Extension

The HPE Cray Fortran compiler supports the use of a parenthesized list of args at the end of a program statement. The compiler ignores any args specified after program-name.

Block Data Program Units

Block Data Program Unit Extension

The HPE Cray Fortran compiler permits named common blocks to appear in more than one block data program unit.

Procedures

Procedure Interface

Interface Duplication

The HPE Cray Fortran compiler allows specification of an interface body for the program unit being compiled if the interface body matches the program unit definition.

Procedure Definition

Recursive Function Extension

The HPE Cray Fortran compiler allows direct recursion for functions that do not specify a RESULT clause on the FUNCTION statement.

Empty CONTAINS Sections

The HPE Cray Fortran compiler allows a CONTAINS statement with no internal or module procedure following.

Intrinsic Procedures and Modules

Intrinsic Procedures

The HPE Cray Fortran compiler has implemented intrinsic procedures in addition to the ones required by the standard. These procedures have the status of intrinsic procedures, but programs that use them may not be portable. It is recommended that such procedures be declared INTRINSIC to allow other processors to diagnose whether or not they are intrinsic for those processors.

The nonstandard intrinsic procedures supported by the HPE Cray Fortran compiler are summarized in the following table. For more information about a particular procedure, see its man page.

Procedure         Description
ACOSD             Arccosine, value in degrees
AMO_AADD          Atomic memory add
AMO_AADDF         Atomic memory add, return new
AMO_AFADD         Atomic memory add, return old
AMO_AAX           Atomic memory AND and XOR
AMO_AFAX          Atomic memory AND and XOR, return old
AMO_AANDF         Atomic memory AND, return new
AMO_AFAND         Atomic memory AND, return old
AMO_ANANDF        Atomic memory NAND, return new
AMO_AFNAND        Atomic memory NAND, return old
AMO_AORF          Atomic memory OR, return new
AMO_AFOR          Atomic memory OR, return old
AMO_AXORF         Atomic memory XOR, return new
AMO_AFXOR         Atomic memory XOR, return old
AMO_ACSWAP        Atomic memory swap, return old
AMO_ASWAP         Atomic memory swap, return new
AMO_AFLUSH        Atomic memory flush; forces var to be written to memory
ASIND             Arcsine, value in degrees
ATAND             Arctangent, value in degrees
ATAND2            Arctangent, value in degrees
CO_BCAST          Broadcast a coarray to all images in an application
CO_SUM            Sum of corresponding elements on all images in a coarray application
CO_MIN, CO_MAX    Minimum or maximum value of corresponding elements on all images in a coarray application
COSD              Cosine, argument in degrees
COT               Cotangent
EXIT              Program termination
FREE              Free Cray pointee memory
GET_BORROW_S@     Get scalar borrow bit
GSYNC             Complete outstanding memory references
IBCHNG            Reverse bit within a word
ILEN              Length in bits of an integer
INT_MULT_UPPER    Upper bits of integer product
LOC               Address of argument
MALLOC            Allocate Cray pointee memory
MASK              Creates a bit mask in a word
SET_BORROW_S@     Set scalar borrow bits
SET_CARRY_S@      Set scalar carry bits
SIND              Sine, argument in degrees
SIZEOF            Size of argument in bytes
SUB_BORROW_S@     Subtract scalar with borrow
TAND              Tangent, argument in degrees

CO_BCAST, CO_SUM, CO_MIN, and CO_MAX are collective intrinsic subroutines, which are extensions of the Fortran 2008 standard. Support for teams is deferred. For specific information about these routines, see the co_bcast(3i), co_max(3i), and co_sum(3i) man pages.

Many intrinsic procedures have both a vector and a scalar version. If a vector version of an intrinsic procedure exists, and the intrinsic is called within a vectorizable loop, the compiler uses the vector version of the intrinsic. For information about which intrinsic procedures vectorize, see the intro_intrin(3i) man page.

For more information about the atomic memory intrinsic procedures see the amo(3i) man page.

Exceptions and IEEE Arithmetic

The Exceptions

The intrinsic module IEEE_EXCEPTIONS supplied with the Cray Fortran compiler contains three named constants in addition to those specified by the standard. These are of type IEEE_STATUS_TYPE and can be used as arguments to the IEEE_SET_STATUS subroutine. Their definitions correspond to common combinations of settings and allow for simple and fast changes to the IEEE mode settings. The constants are:

Name                     Effect of CALL IEEE_SET_STATUS (Name)
ieee_cri_nostop_mode     Clears all currently set exception flags
                         Disables halting for all exceptions
                         Enables setting of all exception flags
                         Sets rounding mode to round_to_nearest
ieee_cri_default_mode    Clears all currently set exception flags
                         Enables halting for overflow, divide_by_zero, and invalid
                         Disables halting for underflow and inexact
                         Enables setting of all exception flags
                         Sets rounding mode to round_to_nearest
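For comparison, the sketch below spells out settings roughly corresponding to ieee_cri_nostop_mode using only standard IEEE module procedures. It is an approximation under that assumption, not the definition of the constant; with the Cray compiler the three calls would presumably collapse into a single CALL IEEE_SET_STATUS (ieee_cri_nostop_mode).

```fortran
program ieee_nostop_demo
  use, intrinsic :: ieee_arithmetic
  implicit none
  real, volatile :: zero   ! volatile so the division happens at run time
  real :: q
  logical :: raised

  ! Roughly the settings the table lists for ieee_cri_nostop_mode,
  ! written with standard IEEE module procedures.
  call ieee_set_flag(ieee_all, .false.)           ! clear all exception flags
  call ieee_set_halting_mode(ieee_all, .false.)   ! do not halt on any exception
  call ieee_set_rounding_mode(ieee_nearest)       ! round to nearest

  zero = 0.0
  q = 1.0 / zero                                  ! signals divide_by_zero
  call ieee_get_flag(ieee_divide_by_zero, raised)

  print *, 'quotient:                   ', q
  print *, 'divide_by_zero flag raised: ', raised
end program ieee_nostop_demo
```

With halting disabled, the program is expected to continue past the division and report the flag instead of terminating.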

Compile and Execute Programs Containing Coarrays

There are various commands, tools, and products available in the programming environment to use for compiling and executing programs containing coarrays.

ftn and aprun Options Affecting Coarrays

The compiler recognizes coarray syntax by default. The -h nocaf option disables coarray syntax recognition.

Upon execution of an a.out file that has been compiled and linked with the -h caf option, an image is created and executed on every processing element assigned to the job. Images 1 through NUM_IMAGES are assigned to processing elements 0 through N$PES-1, consecutively. The functions THIS_IMAGE() and NUM_IMAGES() may be used to retrieve the image number of the current image, or the total number of images at run time, respectively.
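A minimal sketch of the image numbering follows. Each image reports its own number, the total number of images, and the processing element it would run on under the mapping described above (image k on processing element k-1). The program name is hypothetical.

```fortran
program image_demo
  implicit none
  integer :: me, n

  me = this_image()    ! image numbers run from 1 to num_images()
  n  = num_images()

  ! Per the text, image k runs on processing element k-1.
  print '(A,I0,A,I0,A,I0)', 'image ', me, ' of ', n, &
        ', processing element ', me - 1
end program image_demo
```

The order in which the images print their lines is not defined.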

To set the number of processing elements assigned to a job at compile time, specify the -X option on the ftn command. The number of processing elements can also be set at run time by launching the a.out file with the aprun command and its -n option. If different -X values are used when compiling and linking different object files, or if the number of PEs specified at run time differs from that specified when compiling and linking, a run time error occurs.

Bounds checking is performed by specifying the -Rb option on the ftn command line. This feature is not implemented for codimensions of coarrays.

For more information about the ftn and aprun commands, see the ftn(1) and aprun(1) man pages.

Interoperate with Other Message Passing and Data Passing Models

Coarrays can interoperate with all other message and data passing models. This allows for the introduction of coarrays into existing application codes incrementally. However, while it may work in some cases, mixing language-based PGAS with SHMEM is not officially supported.

These models are implemented through procedure calls, so the language interaction between coarrays and these models is well defined.

MPI and SHMEM generally use processing element numbers, which start at zero, but the coarray model generally deals with image numbers, which start at one.

Coarrays are symmetric for the purposes of SHMEM programming. Pointers in coarrays of derived type, however, may not necessarily point to symmetric data.

For more information about the other message passing and data passing models, see the following man pages:

  • intro_mpi(3)

  • intro_shmem(3)

Optimize Programs with Coarrays

Programs containing coarrays benefit from all the usual steps taken to improve run time performance of code that runs on a single image.

HPE Cray Fortran Deferred Implementation and Optional Features

ISO_10646 Character Set

The Fortran 2003 features related to supporting the ISO_10646 character set are not supported. This includes declarations, constants, and operations on variables of character(kind=4) and I/O operations. Support for this feature is optional in Fortran 2018.

Restrictions on Unlimited Polymorphic Variables

Unlimited polymorphic variables whose dynamic types are integer(1), integer(2), logical(1), or logical(2) are not supported, unless the -dh option is specified to disable packed storage for short integers and logicals.

HPE Cray Fortran Implementation Specifics

The Fortran standard specifies the rules for writing a standard conforming Fortran program. Many of the details of how such a program is compiled and executed are intentionally not specified or are explicitly specified as being processor-dependent. This chapter describes the implementation used by the HPE Cray Fortran compiler. Included are descriptions of the internal representations used for data objects and the values of processor-dependent language parameters.

Companion Processor

For the purpose of C interoperability, the Fortran standard refers to a companion processor. The companion processor for the HPE Cray Fortran compiler is the HPE Cray C compiler.

INCLUDE Line

There is no limit to the nesting level for INCLUDE lines. The character literal constant in an INCLUDE line is interpreted as the name of the file to be included. This case-sensitive name may be prefixed with additional characters based on the -I compiler command line option.

INTEGER Kinds and Values

INTEGER kind type parameters of 1, 2, 4, and 8 are supported. The default kind type parameter is 4 unless the -sdefault64 or -sinteger64 command line option is specified, in which case the default kind type parameter is 8. The interpretation of kinds 1 and 2 depends on whether the -dh command line option is specified. Integer values are represented as two's complement binary values.

REAL Kinds and Values

REAL kind type parameters of 4, 8, and 16 are supported. The default kind type parameter is 4 unless the -sdefault64 or -sreal64 command line option is specified, in which case, the default kind type parameter is 8. Real values are represented in the format specified by the IEEE 754 standard, with kinds 4 and 8 corresponding to the 32 and 64 bit IEEE representations.

DOUBLE PRECISION Kinds and Values

The DOUBLE PRECISION type is an alternate specification of a REAL type. The kind type parameter of that REAL type is twice the value of the kind type parameter for default REAL unless the -sdefault64 or -sreal64 command line options are specified, in which case, the kind type parameter for DOUBLE PRECISION and default REAL are the same, and REAL constants with a D exponent are treated as if the D were an E. Note that if the -sdefault64 or -sreal64 options are specified, the compiler is not standard conforming.

LOGICAL Kinds and Values

LOGICAL kind type parameters of 1, 2, 4, and 8 are supported. The default kind type parameter is 4 unless the -sdefault64 or -sinteger64 command line option is specified, in which case the default kind type parameter is 8. The interpretation of kinds 1 and 2 depends on whether the -dh command line option is specified. Logical values are represented by a bit sequence in which the low order bit is set to 1 for the value .true. and to 0 for .false., and the other bits in the representation are set to 0.

CHARACTER Kinds and Values

The CHARACTER kind type parameter of 1 is supported. The default kind type parameter is 1. Character values are represented using the 8-bit ASCII character encoding.
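Because the default kinds depend on command line options such as -sdefault64, it can be useful to query them from the program itself. The sketch below prints the supported and default kinds using standard inquiry facilities; the output depends on the processor and on the options in effect, so none is shown.

```fortran
program kind_query_demo
  use, intrinsic :: iso_fortran_env, only: integer_kinds, real_kinds, &
                                           logical_kinds, character_kinds, &
                                           numeric_storage_size
  implicit none

  print *, 'supported INTEGER kinds:    ', integer_kinds
  print *, 'supported REAL kinds:       ', real_kinds
  print *, 'supported LOGICAL kinds:    ', logical_kinds
  print *, 'supported CHARACTER kinds:  ', character_kinds

  print *, 'default INTEGER kind:       ', kind(1)
  print *, 'default REAL kind:          ', kind(1.0)
  print *, 'DOUBLE PRECISION kind:      ', kind(1.0d0)
  print *, 'default LOGICAL kind:       ', kind(.true.)
  print *, 'default CHARACTER kind:     ', kind('a')
  print *, 'numeric storage unit, bits: ', numeric_storage_size
end program kind_query_demo
```

Compiling the same source with and without the -s options described above is one way to see their effect on the default kinds.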

Cray Pointers

Cray pointers are 64-bit objects.

ENUM Kind

An enumerator that specifies the BIND(C) attribute creates values with a kind type parameter of 4.

Storage Issues

This section describes how the HPE Cray Fortran compiler uses storage, including how this compiler accommodates programs that use overindexing of blank common.

Storage Units and Sequences

The size of the numeric storage unit is 32 bits, unless the -sdefault64 option is specified, in which case the numeric storage unit is 64 bits. If the -sreal64 or -sinteger64 option is specified alone, or the -dp option is specified in addition to -sdefault64 or -sreal64, the relative sizes of the storage assigned for default intrinsic types do not conform to the standard. In this case, storage sequence associations involving variables declared with default intrinsic noncharacter types may be invalid and should be avoided.

Static and Stack Storage

The HPE Cray Fortran compiler allocates variables to storage according to the following criteria:

  • Variables in common blocks are always allocated in the order in which they appear in COMMON statements.

  • Data in modules are statically allocated.

  • User variables that are defined or referenced in a program unit, and that also appear in SAVE or DATA statements, are allocated to static storage, but not necessarily in the order shown in the source program.

  • Other referenced user variables are assigned to the stack. If -ev is specified on the HPE Cray Fortran compiler command line, referenced variables are allocated to static storage. This allocation does not necessarily depend on the order in which the variables appear in the source program.

  • Compiler-generated variables are assigned to a register or to memory (to the stack or heap), depending on how the variable is used. Compiler-generated variables include DO-loop trip counts, dummy argument addresses, temporaries used in expression evaluation, argument lists, and variables storing adjustable dimension bounds at entries.

  • Automatic objects may be allocated to either the stack or to the heap, depending on how much stack space is available when the objects are allocated.

  • Heap or stack allocation can be used for some compiler-generated temporary data such as automatic arrays and array temporaries.

  • Unsaved variables may be assigned to a register by optimization and not allocated storage.

  • Unreferenced user variables not appearing in COMMON statements are not allocated storage.

Dynamic Memory Allocation

Many FORTRAN 77 programs contain a memory allocation scheme that expands an array in a common block located in central memory at the end of the program. This practice of expanding a blank common block or expanding a dynamic common block (sometimes referred to as overindexing) causes conflicts between user management of memory and the dynamic memory requirements of CLE libraries. It is recommended that programs be modified so that they do not expand blank common blocks, particularly when migrating from other environments.
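One possible modification, sketched below under the assumption that the work area can be turned into an allocatable array, is to grow the array explicitly with ALLOCATE and MOVE_ALLOC instead of indexing past the end of a common block. The names work and bigger are hypothetical.

```fortran
program grow_array_demo
  implicit none
  real, allocatable :: work(:), bigger(:)
  integer :: n

  n = 4
  allocate (work(n))
  work = 1.0

  ! Instead of indexing past the end of a common block, allocate a larger
  ! array, copy the old contents, and move the allocation back to work.
  allocate (bigger(2*n))
  bigger(1:n)  = work
  bigger(n+1:) = 0.0
  call move_alloc(from=bigger, to=work)

  print *, 'new size: ', size(work), '  sum: ', sum(work)
end program grow_array_demo
```

After the MOVE_ALLOC call, work has eight elements and bigger is deallocated, so memory management stays with the runtime library rather than with user code.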

The image below shows the structure of a program under the CLE operating systems in relation to expanding a blank common block. In both figures, the user area includes code, data and common blocks.

Figure: Memory Use

Finalization

A finalizable object in a module is not finalized in the event that there is no longer any active procedure referencing the module.

A finalizable object that is allocated via pointer allocation is not finalized in the event that it later becomes unreachable due to all pointers to that object having their pointer association status changed.

ALLOCATE Error Status

If an error occurs during the execution of an ALLOCATE statement with a stat= specifier, subsequent items in the allocation list are not allocated.

DEALLOCATE Error Status

If an error occurs during the execution of a DEALLOCATE statement with a stat= specifier, subsequent items in the deallocation list are not deallocated.
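The sketch below provokes an ALLOCATE error in a portable way, by allocating an array that is already allocated, and then inspects the item that follows it in the allocation list. The variable names are hypothetical. Whether the later item is allocated is processor-dependent in general; the comment reflects the behavior the text describes for this compiler.

```fortran
program alloc_stat_demo
  implicit none
  real, allocatable :: a(:), b(:)
  integer :: ierr
  character(len=80) :: msg

  allocate (a(10))

  ! a is already allocated, so this statement raises an error condition.
  ! With stat= present, execution continues and ierr is nonzero.
  allocate (a(5), b(5), stat=ierr, errmsg=msg)

  if (ierr /= 0) then
    print *, 'ALLOCATE failed: ', trim(msg)
    ! Per the text, b follows the failing item and is left unallocated.
    print *, 'b allocated? ', allocated(b)
  end if
end program alloc_stat_demo
```

A program that needs every item can test ALLOCATED for each one after a failure, or allocate the items in separate statements.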

ALLOCATABLE Module Variable Status

An unsaved allocatable module variable remains allocated if it is allocated when the execution of an END or RETURN statement results in no active program unit having access to the module.

Kind of a Logical Expression

For an expression such as x1 op x2 where op is a logical intrinsic binary operator and the operands are of type logical with different kind type parameters, the kind type parameter of the result is the larger kind type parameter of the operands.

STOP Code Availability

If a STOP code is specified in a STOP statement, its value is output to stderr when the STOP statement is executed.

When the stop code is a string of digits, only the least-significant 8 bits of the integer value are used as the process exit status. When the stop code is of type character or does not appear, the value zero is the process exit status.
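The effect of keeping only the least-significant 8 bits can be worked out with IAND. The sketch below computes the exit status that the rule above implies for a few sample stop codes; the codes themselves are arbitrary examples.

```fortran
program stop_code_demo
  implicit none
  integer, parameter :: codes(4) = [3, 255, 256, 300]
  integer :: i

  do i = 1, size(codes)
    ! Keep the least-significant 8 bits, as the text describes.
    print '(A,I3,A,I3)', 'STOP ', codes(i), '  -> exit status ', &
          iand(codes(i), 255)
  end do
end program stop_code_demo
```

The computed values are 3, 255, 0, and 44, so stop codes above 255 do not map to distinct exit statuses, and STOP 256 is indistinguishable from a successful exit.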

Stream File Record Structure Position

A formatted file written with stream access may be later read as a record file. In that case, embedded newline characters (char(10)) indicate the end of a record and the terminating newline character is not considered part of the record.

The file storage unit for a formatted stream file is a byte. The position is the ordinal byte number in the file; the first byte is position 1. Positions corresponding to newline characters (char(10)) that were inserted by the I/O library as part of record output do not correspond to positions of user-written data.
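The following sketch, written in standard Fortran with an arbitrary file name, writes a formatted stream file, queries the byte position, and then reads the file back as a record file. Under the rules above, the position reported after writing the three characters and their newline is expected to be 5.

```fortran
program stream_position_sketch
  implicit none
  integer :: u, p
  character(len=16) :: line

  open(newunit=u, file='stream_demo.txt', access='stream', form='formatted', &
       status='replace', action='readwrite')
  write(u, '(a)') 'abc'          ! 3 data bytes; the library appends a newline
  inquire(unit=u, pos=p)         ! position of the next byte to be written
  print '(a,i0)', 'next position: ', p
  write(u, '(a)') 'de'
  close(u)

  ! Re-open the same file for ordinary sequential (record) access.
  open(newunit=u, file='stream_demo.txt', access='sequential', &
       form='formatted', status='old', action='read')
  read(u, '(a)') line            ! first record, without its newline
  print '(3a)', '[', trim(line), ']'
  close(u, status='delete')
end program stream_position_sketch
```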

File Unit Numbers

The values of INPUT_UNIT, OUTPUT_UNIT, and ERROR_UNIT defined in the ISO_Fortran_env module are 100, 101, and 102, respectively. These three unit numbers are reserved and may not be used for other purposes. The files connected to these units are the same files used by the companion C processor for standard input (stdin), output (stdout), and error (stderr). An asterisk (*) specified as the unit for a READ statement specifies unit 100. An asterisk specified as the unit for a WRITE statement, and the implicit unit for PRINT statements, is unit 101. All other positive default integer values are available for use as unit numbers.
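Portable code can refer to these units by name rather than by number. The sketch below prints the three values (which are processor dependent; the values quoted above are those documented for this compiler) and writes to the output and error units.

```fortran
program unit_numbers_sketch
  use, intrinsic :: iso_fortran_env, only: input_unit, output_unit, error_unit
  implicit none

  ! Report the preconnected unit numbers chosen by the processor.
  write(output_unit, '(a,3(1x,i0))') 'input/output/error units:', &
       input_unit, output_unit, error_unit

  ! The asterisk unit and OUTPUT_UNIT refer to the same file.
  write(*, '(a)') 'written through the asterisk unit'
  write(output_unit, '(a)') 'written through OUTPUT_UNIT'

  ! Diagnostics go to standard error.
  write(error_unit, '(a)') 'written to ERROR_UNIT'
end program unit_numbers_sketch
```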

OPEN Specifiers

If the ACTION= specifier is omitted from an OPEN statement, the default value is determined by the protections associated with the file. If both reading and writing are permitted, the default value is READWRITE.

If the ENCODING= specifier is omitted or specified as DEFAULT in an OPEN statement for a formatted file, the encoding used is ASCII.

The case of the name specified in a FILE= specifier in an OPEN statement is significant.

If the FILE= specifier is omitted, the file name is formed by prepending fort. to the unit number. For example, unit 1 is connected to the file fort.1.

If the RECL= specifier is omitted from an OPEN statement for a sequential access file, the default value for the maximum record length is 32767 (2**15-1).

If the file is connected for unformatted I/O, the length is measured in 8-bit bytes.

The FORM= specifier may also be SYSTEM for unformatted files.

If the ROUND= specifier is omitted from an OPEN statement, the default value is NEAREST. Specifying a value of PROCESSOR_DEFINED is equivalent to specifying NEAREST.

If the STATUS= specifier is omitted or specified as UNKNOWN in an OPEN statement, the specification is equivalent to OLD if the file exists; otherwise, it is equivalent to NEW. If STATUS="SCRATCH" is specified, the file is placed in the directory specified by the TMPDIR environment variable. If TMPDIR is not set, or the file cannot be created in the specified directory for some other reason, the file is placed in the /tmp directory. If /tmp does not exist, or cannot be accessed, the program aborts.
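The defaults described in this section can be inspected at run time with an INQUIRE statement. The following sketch opens an arbitrary unit (10) with most specifiers omitted and prints what the processor chose; the values printed depend on the compiler and on file permissions, so none are assumed here.

```fortran
program open_defaults_sketch
  implicit none
  integer, parameter :: u = 10          ! arbitrary unit number
  character(len=256) :: fname
  character(len=32)  :: act, enc, rnd
  integer :: rl, ios

  ! No FILE=, ACTION=, ENCODING=, RECL=, ROUND=, or STATUS= is given.
  open(unit=u, form='formatted', iostat=ios)
  if (ios /= 0) error stop 'open failed'

  ! Ask the I/O library which values it selected.
  inquire(unit=u, name=fname, action=act, encoding=enc, recl=rl, round=rnd)
  print '(2a)',   'name     = ', trim(fname)
  print '(2a)',   'action   = ', trim(act)
  print '(2a)',   'encoding = ', trim(enc)
  print '(a,i0)', 'recl     = ', rl
  print '(2a)',   'round    = ', trim(rnd)

  close(u, status='delete')             ! remove the file created by the open
end program open_defaults_sketch
```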

FLUSH Statement

Execution of a FLUSH statement causes memory resident buffers to be flushed to the physical file. Output to the unit specified by ERROR_UNIT in the ISO_Fortran_env module is never buffered; execution of FLUSH on that unit has no effect.

Asynchronous I/O

The ASYNCHRONOUS= specifier may be set to YES to allow asynchronous I/O for a unit or file.

Asynchronous I/O is used if the FFIO layer attached to the file provides asynchronous access.
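A minimal sketch of a standard asynchronous write follows. The file name and buffer size are illustrative. Whether the transfer actually overlaps with computation depends, as noted above, on the FFIO layer attached to the file; the program is valid either way.

```fortran
program async_write_sketch
  implicit none
  integer :: u, id, ios
  real, asynchronous :: buf(1000)       ! buffer involved in pending I/O

  buf = 1.0
  open(newunit=u, file='async_demo.dat', form='unformatted', &
       access='sequential', asynchronous='yes', status='replace', iostat=ios)
  if (ios /= 0) error stop 'open failed'

  ! Start the transfer; control may return before the data is written.
  write(u, asynchronous='yes', id=id) buf

  ! ... computation that does not reference buf could run here ...

  ! Block until the pending transfer identified by id has completed.
  wait(unit=u, id=id, iostat=ios)
  if (ios /= 0) print *, 'wait reported iostat =', ios

  close(u, status='delete')
end program async_write_sketch
```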

REAL I/O of an IEEE NaN

An IEEE NaN may be used as an I/O value for the F, E, D, or G edit descriptor or for list-directed or namelist I/O.

Input of an IEEE NaN

The form of NaN is an optional sign followed by the string ‘NAN’ optionally followed by a hexadecimal digit string enclosed in parentheses. The input is case insensitive. Some examples are:

NaN                  - quiet NaN
nAN()                - quiet NaN
-nan(ffffffff)       - quiet NaN
NAn(7f800001)        - signalling NaN
NaN(ffc00001)        - quiet NaN
NaN(ff800001)        - signalling NaN

The internal value for the NaN becomes a quiet NaN if the hexadecimal string is not present or is not a valid NaN.

A ‘+’ or ‘-’ preceding the NaN on input is used as the high order bit of the corresponding READ input list item. An explicit sign overrides the sign bit from the hexadecimal string. The internal value becomes the hexadecimal string if it represents an IEEE NaN in the internal data type. Otherwise, the form of the internal value is undefined.
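The input forms above can be tried with a list-directed read from an internal file. This is a sketch only: the three sample strings are taken from the examples above, IOSTAT= guards against processors that reject a form, and the bit pattern printed may differ on hardware that quiets signalling NaNs when they are copied.

```fortran
program nan_input_sketch
  use, intrinsic :: iso_fortran_env, only: real32, int32
  use, intrinsic :: ieee_arithmetic, only: ieee_is_nan
  implicit none
  character(len=16), parameter :: fields(3) = &
       [character(len=16) :: 'NaN', '-nan(ffffffff)', 'NAn(7f800001)']
  real(real32) :: x
  integer :: i, ios

  do i = 1, size(fields)
     ! List-directed read from an internal file.
     read(fields(i), *, iostat=ios) x
     if (ios /= 0) then
        print '(2a)', trim(fields(i)), ' -> not accepted by this processor'
     else
        ! TRANSFER exposes the stored bit pattern as a 32-bit integer.
        print '(2a,l2,a,z8.8)', trim(fields(i)), ' -> is NaN:', &
             ieee_is_nan(x), ', bits: ', transfer(x, 0_int32)
     end if
  end do
end program nan_input_sketch
```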

Output of an IEEE NaN

The form of an IEEE NaN for the F, E, D, or G edit descriptor or for list-directed or namelist output is:

  • If the field width w is absent, zero, or at least (5 + 1/4 of the size of the internal value in bits), the output consists of the string ‘NaN’ followed by the hexadecimal representation of the internal value within a set of parentheses. An example of the output field is:

    NaN(7fc00000)
    
  • If the field width w is at least 3 but less than (5 + 1/4 of the size of the internal value in bits), the string ‘NaN’ will be right-justified in the field with blank fill on the left.

  • If the field width w is 1 or 2, the field is filled with asterisks.

The output field has no ‘+’ or ‘-’; the sign is contained in the hexadecimal string.

To get the same internal value for a NaN, write it with a list-directed write statement and read it with a list-directed read statement.

To write and then read the same NaN, the field width w in D, E, F, or G must be at least the number of hexadecimal digits of the internal datum plus 5.

REAL(4):   w >= 13
REAL(8):   w >= 21
REAL(16):  w >= 37
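The minimum widths follow from one hexadecimal digit per 4 bits plus 5 characters for ‘NaN’ and the two parentheses. The sketch below derives them with the standard STORAGE_SIZE intrinsic for the two most common real kinds; the variable names are illustrative.

```fortran
program nan_width_sketch
  use, intrinsic :: iso_fortran_env, only: real32, real64
  implicit none
  real(real32) :: x4
  real(real64) :: x8

  ! Hex digits = bits / 4; add 5 for 'NaN', '(' and ')'.
  print '(a,i0)', 'REAL(4) minimum w: ', 5 + storage_size(x4) / 4   ! 13
  print '(a,i0)', 'REAL(8) minimum w: ', 5 + storage_size(x8) / 4   ! 21
end program nan_width_sketch
```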

List-directed and NAMELIST Output Default Formats

The length of the output value in NAMELIST and list-directed output depends on the value being written. Blanks and unnecessary trailing zeroes are removed unless the -W option to the assign command is specified, which turns off this compression.

By default, full-precision printing is assumed unless a precision is specified by the LISTIO_PRECISION environment variable (for more information about the LISTIO_PRECISION environment variable, see LISTIO_PRECISION).

The form of list-directed and NAMELIST output can be changed by using the assign command with one of the following options.

assign Option    Effect
-S               Suppress comma-delimited output; use blank spaces instead
-W               Disable compression of floating-point values
-y               Disable the repeat-count form; write as many copies of the value as needed
-U               Set all three of the above

For example, consider this code:

integer(4), dimension(2) :: ia
real(4), dimension(2) :: ra
NAMELIST/TNAMEL/ia,ra
ia = 102
ra = 200.10
write(6,TNAMEL)
print *, ' ia=',ia
print *, ' ra=',ra
print *, ia, ra
end

When compiled and executed with the default settings, it produces the following output:

&TNAMEL  RA = 2*200.100006, IA = 2*102
ia = 2*102
ra = 2*200.100006
2*102,  2*200.100006

However, suppose the FILENV environment variable is set to a file and the assign -U command is used to change the output behavior, as shown below:

% setenv FILENV ASGTMP
% assign -U on g:sf

The same code now produces the following output:

&TNAMEL  RA =    200.1000        200.1000     IA =         102
102 /
ia =          102         102
ra =     200.1000        200.1000
102         102    200.1000        200.1000

For more information about the assign command and Assign Environment, see Enhanced I/O: Using the assign Environment.

Random Number Generator

A multiplicative congruential generator with period 2**46 is used to produce the output of the RANDOM_NUMBER intrinsic subroutine. The seed array contains one 64-bit integer value.
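Because the size of the seed array is processor dependent, portable code queries it rather than assuming a value. The sketch below saves the generator state, draws a few numbers, restores the state, and draws again; the second draw is expected to repeat the first.

```fortran
program random_seed_sketch
  implicit none
  integer :: n
  integer, allocatable :: seed(:)
  real :: r(3)

  call random_seed(size=n)        ! number of integers in the seed array
  allocate(seed(n))
  call random_seed(get=seed)      ! save the current generator state

  call random_number(r)
  print *, 'seed size:  ', n
  print *, 'first draw: ', r

  call random_seed(put=seed)      ! restore the saved state
  call random_number(r)
  print *, 'second draw:', r
end program random_seed_sketch
```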

Timing Intrinsics

A call to the SYSTEM_CLOCK intrinsic subroutine with the COUNT argument present translates into inline instructions that directly access the hardware clock register. See the description of the -es and -ds command line options for information about the values returned for the count and count rate. For fine-grained timing, HPE recommends using a 64-bit COUNT argument.

The CPU_TIME subroutine obtains the value of its argument from the getrusage system call. Its execution time is significantly longer than for the SYSTEM_CLOCK routine, but the values returned are closer to those used by system accounting utilities.
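The two timers can be used side by side. The following sketch times an arbitrary loop with a 64-bit COUNT argument, as recommended above, and with CPU_TIME; the loop body and iteration count are placeholders.

```fortran
program timing_sketch
  use, intrinsic :: iso_fortran_env, only: int64, real64
  implicit none
  integer(int64) :: t0, t1, rate
  real(real64) :: c0, c1, s
  integer :: i

  call system_clock(count=t0, count_rate=rate)   ! 64-bit count and rate
  call cpu_time(c0)

  ! Placeholder workload to be timed.
  s = 0.0_real64
  do i = 1, 10000000
     s = s + sqrt(real(i, real64))
  end do

  call cpu_time(c1)
  call system_clock(count=t1)

  print *, 'checksum      :', s
  print *, 'wall time (s) :', real(t1 - t0, real64) / real(rate, real64)
  print *, 'CPU time (s)  :', c1 - c0
end program timing_sketch
```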

IEEE Intrinsic Modules

The IEEE intrinsic modules IEEE_EXCEPTIONS, IEEE_ARITHMETIC, and IEEE_FEATURES are supplied. Denormal numbers are not supported on HPE Cray hardware. The IEEE_SUPPORT_DENORMAL inquiry function returns .false. for all kinds of arguments.

At the start of program execution, all floating point exception traps are disabled.
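Both statements can be checked from a program with the standard inquiry routines. This sketch queries denormal support for default real and the initial halting mode for one exception (divide by zero, chosen arbitrarily).

```fortran
program ieee_inquiry_sketch
  use, intrinsic :: ieee_arithmetic
  implicit none
  real :: x
  logical :: halting

  x = 1.0
  ! Inquiry only; the value of x is not used.
  print *, 'denormals supported for default real: ', ieee_support_denormal(x)

  ! Halting control is optional, so test for support first.
  if (ieee_support_halting(ieee_divide_by_zero)) then
     call ieee_get_halting_mode(ieee_divide_by_zero, halting)
     print *, 'divide-by-zero trap enabled at start: ', halting
  end if
end program ieee_inquiry_sketch
```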

Enhanced I/O: Using the assign Environment

Fortran programs often need the ability to alter details of a file connection, such as device residency, an alternative file name, a file space allocation scheme or structure, or data conversion properties. These file connection details taken together comprise the assign environment, and they can be modified by using the assign command and assign library interface.

The assign environment can also be accessed from C/C++ by using the ffassign library interface. For more information, see the assign(1), assign(3f), and ffassign(3c) man pages.

Understand the assign Environment

The assign command information is stored in the assign environment file, .assign, or in a shell environment variable. To begin using the assign environment to control a program’s I/O behavior, follow these steps.

  1. Set the FILENV environment variable to the desired path.

     setenv FILENV environment-file

  2. Run the assign command to define the current assign environment.

     assign arguments assign-object

     For example:

     assign -F cachea g:su

  3. Run the program.

  4. If not satisfied with the I/O performance observed during program execution, return to step 2, use the assign command to adjust the assign environment, and try again.

The assign command passes information to Fortran open statements and to the ffopen routine to identify the following elements:

  • A list of unit numbers

  • File names

  • File name patterns that have attributes associated with them

The assign object is the file name, file name pattern, unit number, or type of I/O open request to which the assign environment applies. When the unit or file is opened from Fortran, the environment defined by the assign command is used to establish the properties of the connection.

Assign Objects and Open Processing

The I/O library routines apply options to a file connection for all related assign objects.

If the assign object is a unit, the application of options to the unit occurs whenever that unit is connected.

If the assign object is a file name or pattern, the application of options to the file connection occurs whenever a matching file name is opened from a Fortran program.

When any of the library I/O routines opens a file, it uses the specified assign environment options for any assign objects that apply to the open request. Any of the following assign objects or categories can apply to a given open request.

Assign-object    Applies To
g:all            All open requests
g:su             Open sequential unformatted
g:du             Open direct unformatted
g:sf             Open sequential formatted
g:df             Open direct formatted
g:ff             ffopen
u:unit-number    Open unit-number
p:pattern        When a file whose name matches pattern is opened
f:filename       Whenever file filename is opened

The assign environment can contain only one p: assign-object that matches the current open file. The exception is that the p:% pattern (which uses the % wildcard character) is silently ignored if a more specific pattern also matches the current file name being opened.

Options from the assign objects in these categories are collected to create the complete set of options used for any particular open. The options are collected in the listed order, with options collected later in the list of assign objects overriding those collected earlier.

assign Command Syntax

Here is the syntax for the assign command:

assign -I -O -a actualfile -b bs -f fortstd -m setting -s ft -t -u bufcnt -y setting -B setting -C charcon -D fildes -F spec,specs -N numcon -R -S setting -T setting -U setting -V -W setting -Y setting -Z setting assign-object

The following specifications cannot be used with any other options:

assign -R assign-object
assign -V assign-object

A summary of the command options follows. For details, see the assign(1) and intro_ffio(3f) man pages.

Control options:

-I

Specifies an incremental use of assign. All attributes are added to the attributes already assigned to the current assign-object. This option and the -O option are mutually exclusive.

-O

Specifies a replacement use of assign. This is the default control option. All currently existing assign attributes for the current assign-object are replaced. This option and the -I option are mutually exclusive.

-R

Removes all assign attributes for assign-object. If assign-object is not specified, all currently assigned attributes for all assign-objects are removed.

-V

Views attributes for assign-object. If assign-object is not specified, all currently assigned attributes for all assign-objects are printed.

Attribute options:

-a actualfile

The file= specifier or the actual file name.

-b bs

Library buffer size in 4096-byte (512-word) blocks.

-f fortstd

Specifies the type of Fortran with which to be compatible. Used by Fortran I/O. The valid values for fortstd are:

  • 90 - Causes the Fortran file to be compatible with the current Cray Fortran compiler.

  • 95 - Causes the Fortran file to be compatible with Cray Fortran 95. If this value is set, the list-directed and namelist output of a floating point will remain 0.E+0.

A file's compatibility is established when it is opened. By default, a Fortran file is compatible with the language from which an OPEN statement or implicit open caused the file to be connected.

-m setting

Special handling of a file that will be accessed concurrently by several processes or tasks. Special handling includes skipping the check that only one Fortran unit is connected to a file and ensuring that the file is not truncated to its true size by the I/O buffering routines. Enter either on or off for setting.

-s ft

File type. Enter text, cos, blocked, unblocked, u, sbin, or bin for ft. The default is text.

-t

Temporary file.

-u bufcnt

Buffer count. Specifies the number of buffers to be allocated for a file.

-y setting

Suppresses repeat counts in list-directed output. setting can be either on or off. The default setting is off.

-B setting

Activates or suppresses the passing of the O_DIRECT flag to the open(2) system call. Enter either on or off for setting. This is an important feature for I/O optimization; if this is on, it enables reads and writes directly to and from the user program buffer.

-C charcon

Character set conversion information. Enter ascii, or ebcdic for charcon. If the -C option is specified, the -F option must also be specified.

-D fildes

Specifies a connection to a standard file. Enter stdin, stdout, or stderr for fildes.

-F spec ,specs

Flexible file I/O (FFIO) specification. See the assign(1) man page for details about allowed values for spec and for details about hardware platform support. See the intro_ffio(3f) man page for details about specifying the FFIO layers.

-N numcon

Foreign numeric conversion specification. See the assign(1) man page for details about allowed values for numcon and for details about hardware platform support.

-S setting

Suppresses use of a comma as a separator in list-directed output. Enter either on or off for setting. The default setting is off.

-T setting

Activates or suppresses truncation after write for sequential Fortran files. Enter either on or off for setting.

-U setting

Produces a non-UNICOS form of list-directed output. This is a global setting that sets the value for the -y, -S, and -W options. Enter either on or off for setting. The default setting is off.

-W setting

Suppresses compressed width in list-directed output. Enter either on or off for setting. The default setting is off.

-Y setting

Skips unmatched namelist groups in a namelist input record. Enter either on or off for setting. The default setting is on.

-Z setting

Recognizes -0.0 for IEEE floating-point systems and writes the minus sign for edit-directed, list-directed, and namelist output. Enter either on or off for setting. The default setting is on.

assign-object

Specify either a file name or a unit number for assign-object. The assign command associates the attributes with the file or unit specified. These attributes are used during the processing of Fortran open statements or during implicit file opens.

Use one of the following formats for assign-object:

  • f:filename

  • g:io-type, where io-type can be su, sf, du, df, or ff (for example, g:ff for ffopen(3C))

  • p:pattern (for example, p:file%)

  • u:unit-number (for example, u:9)

  • filename

    When the p:pattern form is used, the % and _ wildcard characters can be used. The % matches any string of 0 or more characters. The _ matches any single character. The % performs like the * when doing file name matching in shells. However, the % character also matches strings of characters containing the / character.
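The wildcard rules can be modeled in a few lines of standard Fortran. The following is an illustrative sketch only, not the I/O library's implementation; the module and function names are hypothetical. Arguments are expected to be passed without trailing blanks (for example, by using TRIM).

```fortran
module assign_pattern_sketch
  implicit none
contains
  ! Returns .true. if name matches pattern, where % matches any string of
  ! zero or more characters (including /) and _ matches any one character.
  recursive function matches(pattern, name) result(ok)
    character(len=*), intent(in) :: pattern, name
    logical :: ok
    integer :: k

    if (len(pattern) == 0) then
       ok = (len(name) == 0)
    else if (pattern(1:1) == '%') then
       ! Try every possible length for the text consumed by %.
       ok = .false.
       do k = 0, len(name)
          if (matches(pattern(2:), name(k+1:))) then
             ok = .true.
             exit
          end if
       end do
    else if (len(name) == 0) then
       ok = .false.
    else if (pattern(1:1) == '_' .or. pattern(1:1) == name(1:1)) then
       ok = matches(pattern(2:), name(2:))
    else
       ok = .false.
    end if
  end function matches
end module assign_pattern_sketch

program pattern_demo
  use assign_pattern_sketch, only: matches
  implicit none
  print *, matches('file%', 'file.dat')            ! T
  print *, matches('file_', 'file12')              ! F: _ matches one character
  print *, matches('%.dat', '/tmp/run/out.dat')    ! T: % also matches /
end program pattern_demo
```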

Use the Library Routines

The assign, asnunit, asnfile, and asnrm routines can be called from a Fortran program to access and update the assign environment. The assign routine provides an easy interface to assign processing from a Fortran program. The asnunit and asnfile routines assign attributes to units and files, respectively. The asnrm routine removes all entries currently in the assign environment.

The calling sequences for library routines are as follows:

call assign (cmd, ier)

call asnunit (iunit,astring,ier)

call asnfile (fname,astring,ier)

call asnrm (ier)

Where:

cmd

Fortran character variable containing a complete assign command in the format acceptable to the pxfsystem routine.

ier

Integer variable that is assigned the exit status on return from the library interface routine.

iunit

Integer variable or constant that contains the unit number to which attributes are assigned.

astring

Fortran character variable that contains any attribute options and option values from the assign command. Control options -I, -O, and -R can also be passed.

fname

Character variable or constant that contains the file name to which attributes are assigned.

A status of 0 indicates normal return. A status of greater than 0 indicates a specific error status. Use the explain command to determine the meaning of the error status.

The following calls are equivalent to the assign -s u f:file command:

call assign('assign -s u f:file',ier)
call asnfile('file','-s u',ier)

The following call is equivalent to executing the assign -I -n 2 u:99 command:

iun = 99
call asnunit(iun,'-I -n 2',ier)

The following call is equivalent to executing the assign -R command:

call asnrm(ier)

Tune File Connection Behavior

Use Alternative File Names

The -a option specifies the actual file name to which a connection is made. This option allows files to be created in different directories without changing the FILE= specifier on an OPEN statement.

For example, consider the following assign command issued to open unit 1:

assign -a /tmp/mydir/tmpfile u:1

The program then opens unit 1 with any of the following statements:

WRITE(1) variable          ! implicit open
OPEN(1)                    ! unnamed open
OPEN(1,FORM='FORMATTED')   ! unnamed open

Unit 1 is connected to file /tmp/mydir/tmpfile. Without the -a attribute, unit 1 would be connected to file fort.1.

When the -a attribute is associated with a file, any Fortran open that is set to connect to the file causes a connection to the actual file name. An assign command of the following form causes a connection to file $FILENV/joe:

assign -a $FILENV/joe ftfile

This is true when the following statement is executed in a program:

OPEN(IUN,FILE='ftfile')

If the following assign command is issued and in effect, any Fortran INQUIRE statement whose FILE= specification is foo refers to the file named actual instead of the file named foo for purposes of the EXIST=, OPENED=, or UNIT= specifiers:

assign -a actual f:foo

If the following assign command is issued and in effect, the -a attribute does not affect INQUIRE statements with a UNIT= specifier:

assign -a actual ftfile

When the following OPEN statement is executed, INQUIRE(UNIT=n,NAME=fname) returns a value of ftfile in fname, as if no assign had occurred:

OPEN(n,file='ftfile')

The I/O library routines use only the actual file (-a) attributes from the assign environment when processing an INQUIRE statement. During an INQUIRE statement that contains a FILE= specifier, the I/O library searches the assign environment for a reference to the file name that the FILE= specifier supplies. If an assign-by-filename exists for the file name, the I/O library determines whether an actual name from the -a option is associated with the file name. If the assign-by-filename supplied an actual name, the I/O library uses that name to return values for the EXIST=, OPENED=, and UNIT= specifiers; otherwise, it uses the file name. The name returned for the NAME= specifier is the file name supplied in the FILE= specifier. The actual file name is not returned.

Specify File Structure

A file structure defines the way records are delimited and how the end-of-file is represented. The assign command supports two mutually exclusive file structure options:

  • To select a structure using an FFIO layer, use assign -F

  • To select a structure explicitly, use assign -s

Using FFIO layers is more flexible than selecting structures explicitly. FFIO allows nested file structures, buffer size specifications, and support for file structures not available through the -s option. Better I/O performance is realized by using the -F option and FFIO layers.

The remainder of this section covers the -s option.

Fortran sequential unformatted I/O uses four different file structures: f77 blocked structure, text structure, unblocked structure, and COS blocked structure. By default, the f77 blocked structure is used unless a file structure is selected at open time. If an alternative file structure is needed, the user can select a file structure by using the -s or -F option on the assign command.

The -s and -F options are mutually exclusive. The following examples show how to use different assign command options to select different file structures.

Structure      assign Command(s)
F77 blocked    assign -F f77
text           assign -F text, assign -s text
unblocked      assign -F system, assign -s unblocked
COS blocked    assign -F cos, assign -s cos

The following examples show how to adjust blocking:

  • To select an unblocked file structure for a sequential unformatted file:

    IUN = 1
    CALL ASNUNIT(IUN,'-s unblocked',IER)
    OPEN(IUN,FORM='UNFORMATTED',ACCESS='SEQUENTIAL')
    
  • The assign -s u command can also be used to specify the unblocked file structure for a sequential unformatted file. When this option is selected, I/O is unbuffered, and each Fortran READ or WRITE statement results in a read or write system call. For example:

    CALL ASNFILE('fort.1','-s u',IER)
    OPEN(1,FORM='UNFORMATTED',ACCESS='SEQUENTIAL')
    
  • To assign unit 10 a COS blocked structure:

    assign -s cos u:10
    

The full set of options allowed with the assign -s command is as follows:

  • bin (not recommended)

  • blocked

  • cos

  • sbin

  • text

  • unblocked

Access and Form                                     assign -s ft Defaults    assign -s ft Options
Sequential unformatted, BUFFER IN and BUFFER OUT    blocked / cos / f77      bin sbin u unblocked
Direct unformatted                                  unblocked                bin sbin u unblocked
Sequential formatted                                text                     blocked cos sbin/text
Direct formatted                                    text                     sbin/text

Unblocked File Structure

A file with an unblocked file structure contains undelimited records. Because it does not contain any record control words, it does not have record boundaries. The unblocked file structure can be specified for a file opened with either unformatted sequential access or unformatted direct access. It is the default file structure for a file opened as an unformatted direct-access file.

Do not attempt to use a BACKSPACE statement to reposition a file with an unblocked file structure. Since record boundaries do not exist, the file cannot be repositioned to a previous record.

BUFFER IN and BUFFER OUT statements can specify a file having an unbuffered and unblocked file structure. If the file is specified with assign -s u, BUFFER IN and BUFFER OUT statements can perform asynchronous unformatted I/O.

There are several ways to use the assign command to specify unblocked file structure. All ways result in a similar file structure but with different library buffering styles, use of truncation on a file, alignment of data, and recognition of an end-of-file record in the file. The following unblocked data file structure specifications are available:

Specification          Structure
assign -s unblocked    Library-buffered
assign -F system       No library buffering
assign -s sbin         Buffering that is compatible with standard I/O; for example, both library and system buffering

The type of file processing for an unblocked data file structure depends on the assign -s ft option that is declared or assumed for a Fortran file.

For more information about buffering, see Specify Buffer Behavior.

An I/O request for a file specified using the assign -s unblocked command does not need to be a multiple of a specific number of bytes. Such a file is truncated after the last record is written to the file. Padding occurs for files specified with the assign -s bin command and the assign -s unblocked command. Padding usually occurs when noncharacter variables follow character variables in an unformatted direct-access file.

No padding is done in an unformatted sequential access file. An unformatted direct-access file created by a Fortran program on CLE systems contains records that are the same length. The end-of-file record is recognized in sequential-access files.

assign -s sbin File Processing

Use an assign -s sbin specification for a Fortran file opened with either unformatted direct access or unformatted sequential access. The file does not contain record delimiters. The file created for assign -s sbin in this instance has an unblocked data file structure and uses unblocked file processing.

The assign -s sbin option can be specified for a Fortran file that is declared as formatted sequential access. Because the file contains records that are delimited with the new-line character, it is not an unblocked data file structure. It is the same as a text file structure.

The assign -s sbin option is compatible with the standard C I/O functions.

HPE discourages the use of assign -s sbin because it typically yields poor I/O performance. If an FFIO layer cannot be used, using assign -s text for formatted files and assign -s unblocked for unformatted files usually produces better I/O performance than using assign -s sbin.

assign -s bin File Processing

An I/O request for a file that is specified with assign -s bin does not need to be a multiple of a specific number of bytes. Padding occurs when noncharacter variables follow character variables in an unformatted record.

The I/O library uses an internal buffer for the records. If opened for sequential access, a file is not truncated after each record is written to the file.

assign -s u File Processing

The assign -s u command specifies undefined or unknown file processing. An assign -s u specification can be specified for a Fortran file declared as unformatted sequential or direct access. Because the file does not contain record delimiters, it has an unblocked data file structure. Both synchronous and asynchronous BUFFER IN and BUFFER OUT processing can be used with u file processing.

Fortran sequential files declared by using assign -s u are not truncated after the last word written. The user must execute an explicit ENDFILE statement on the file.

text File Structure

The text file structure consists of a stream of 8-bit ASCII characters. Every record in a text file is terminated by a newline character (\n, ASCII 012). Some utilities may omit the newline character on the last record, but the Fortran library treats such an occurrence as a malformed record. This file structure may be specified for a file that is declared as either formatted sequential access or formatted direct access. It is the default file structure for formatted sequential access and formatted direct access files.

The assign -s text command specifies the library-buffered text file structure. Both library and system buffering are done for all text file structures.

An I/O request for a file using assign -s text does not need to be a multiple of a specific number of bytes.

BUFFER IN and BUFFER OUT statements cannot be used with this structure. Use a BACKSPACE statement to reposition a file with this structure.

cos or blocked File Structure

The cos or blocked file structure uses control words to mark the beginning of each sector and to delimit each record. Specify this file structure for a file that is declared as unformatted sequential access. Synchronous BUFFER IN and BUFFER OUT statements can create and access files with this file structure.

Specify this file structure with one of the following assign commands:

assign -s cos
assign -s blocked
assign -F cos
assign -F blocked        

These four assign commands result in the same file structure.

An I/O request on a blocked file is library buffered.

In a cos file structure, one or more ENDFILE records are allowed. BACKSPACE statements can be used to reposition a file with this structure.

A blocked file is a stream of words that contains control words, called Block Control Words (BCWs) and Record Control Words (RCWs), to delimit records. Each record is terminated by an EOR (end-of-record) RCW. At the beginning of the stream, and every 512 words thereafter (including any RCWs), a BCW is inserted. An end-of-file (EOF) control word marks a special record that is always empty. Fortran considers this empty record to be an endfile record. The end-of-data (EOD) control word is always the last control word in any blocked file. The EOD is always immediately preceded by either an EOR, or by an EOF and a BCW.

Each control word contains a count of the number of data words to be found between it and the next control word. In the case of the EOD, this count is 0. Because there is a BCW every 512 words, these counts never point forward more than 511 words.

A record always begins at a word boundary. If a record ends in the middle of a word, the rest of that word is zero filled; the ubc field of the closing RCW contains the number of unused bits in the last word.
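As a worked example of the padding rule, the sketch below computes how many 64-bit words a record occupies and the resulting unused bit count. The 13-byte record length is arbitrary, and the 64-bit word size is taken from the control word layouts in this section.

```fortran
program record_padding_sketch
  implicit none
  integer, parameter :: bits_per_word = 64
  integer :: nbytes, nbits, nwords, ubc

  nbytes = 13                                        ! illustrative record length
  nbits  = nbytes * 8                                ! 104 data bits
  nwords = (nbits + bits_per_word - 1) / bits_per_word   ! round up: 2 words
  ubc    = nwords * bits_per_word - nbits            ! zero-filled bits: 24

  print '(a,i0)', 'words occupied  : ', nwords
  print '(a,i0)', 'unused bit count: ', ubc
end program record_padding_sketch
```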

The following illustration and table represent the structure of a BCW.

m      unused   bdf    unused   bn      fwi
(4)    (7)      (1)    (19)     (24)    (9)

Field    Bits     Description
m        0-3      Type of control word; 0 for BCW
bdf      11       Bad Data flag (1-bit, 1=bad data)
bn       31-54    Block number (modulo 2**24)
fwi      55-63    Forward index; the number of words to the next control word

The following illustration and table is a representation of the structure of an RCW.

Field          m     ubc   tran   bdf   srs   unused   pfi    pri    fwi
Width (bits)   (4)   (6)   (1)    (1)   (1)   (7)      (20)   (15)   (9)

Field   Bits    Description
m       0-3     Type of control word; 10 (octal) for EOR, 16 (octal) for EOF, and 17 (octal) for EOD
ubc     4-9     Unused bit count; number of unused low-order bits in last word of previous record
tran    10      Transparent record field (unused)
bdf     11      Bad data flag (unused)
srs     12      Skip remainder of sector (unused)
pfi     20-39   Previous file index; offset modulo 2^20 to the block where the current file starts (as defined by the last EOF)
pri     40-54   Previous record index; offset modulo 2^15 to the block where the current record starts
fwi     55-63   Forward index; the number of words to the next control word
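As an illustration of the RCW layout, the following C sketch packs field values into a 64-bit word. It assumes that bit 0 is the most significant bit and that the type codes are octal (10, 16, and 17), since a 4-bit field cannot hold larger values; the names rcw_pack, rcw_type, and rcw_ubc are hypothetical.

```c
#include <stdint.h>

/* Control word type codes (field m). Assumption: the codes are octal. */
enum { CW_BCW = 0, CW_EOR = 010, CW_EOF = 016, CW_EOD = 017 };

/* Build an RCW from its fields. The unused tran, bdf, and srs bits and
   the 7 unused bits are left as zero. Each value is masked to its
   field width before it is shifted into place. */
uint64_t rcw_pack(unsigned m, unsigned ubc, uint32_t pfi, uint32_t pri,
                  unsigned fwi)
{
    return ((uint64_t)(m   & 0xFu)     << 60)   /* bits 0-3   */
         | ((uint64_t)(ubc & 0x3Fu)    << 54)   /* bits 4-9   */
         | ((uint64_t)(pfi & 0xFFFFFu) << 24)   /* bits 20-39 */
         | ((uint64_t)(pri & 0x7FFFu)  << 9)    /* bits 40-54 */
         |  (uint64_t)(fwi & 0x1FFu);           /* bits 55-63 */
}

/* Type of a control word: the top 4 bits. */
unsigned rcw_type(uint64_t word)
{
    return (unsigned)(word >> 60);
}

/* Unused bit count of the record that this RCW closes. */
unsigned rcw_ubc(uint64_t word)
{
    return (unsigned)((word >> 54) & 0x3Fu);
}
```

For example, an EOR that closes a record with 48 unused bits and is followed by three data words would be built as rcw_pack(CW_EOR, 48, 0, 0, 3).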

Specify Buffer Behavior

A buffer is a temporary storage location for data while the data is being transferred. Buffers are often used for the following purposes:

  • Small I/O requests can be collected into a buffer, and the overhead of making many relatively expensive system calls can be greatly reduced.

  • Many data file structures such as cos contain control words. During the write process, a buffer can be used as a work area where control words can be inserted into the data stream (a process called blocking). The blocked data is then written to the device. During the read process, the same buffer work area can be used to remove the control words before passing the data on to the user (called deblocking).

  • When data access is random, the same data may be requested many times. A cache is a buffer that keeps old requests in the buffer in case these requests are needed again. A cache that is sufficiently large or efficient can avoid a large part of the physical I/O by having the data ready in a buffer. When the data is often found in the cache buffer, it is referred to as having a high hit rate. For example, if the entire file fits in the cache and the file is present in the cache, no more physical requests are required to perform the I/O. In this case, the hit rate is 100%.

  • Running the I/O devices and the processors in parallel often improves performance; therefore, it is useful to keep processors busy while data is being moved. To do this when writing, data can be transferred to the buffer at memory-to-memory copy speed. Use an asynchronous I/O request. The control is then immediately returned to the program, which continues to execute as if the I/O were complete (a process called write-behind). A similar process called read-ahead can be used while reading; in this process, data is read into a buffer before the actual request is issued for it. When it is needed, it is already in the buffer and can be transferred to the user at very high speed.

  • When direct I/O is enabled (assign -B on), data is staged in the system buffer cache. While this can yield improved performance, it also means that performance is affected by program competition for system buffer cache. To minimize this effect, avoid public caches when possible.

  • In many cases, the best asynchronous I/O performance can be realized by using the FFIO cachea layer (assign -F cachea). This layer supports read-ahead, write-behind, and improved cache reuse.

    The size of the buffer used for a Fortran file can have a substantial effect on I/O performance. A larger buffer size usually decreases the system time needed to process sequential files. However, large buffers increase a program’s memory usage; therefore, optimizing the buffer size for each file accessed in a program on a case-by-case basis can help increase I/O performance and minimize memory usage.

    The -b option on the assign command specifies a buffer size, in blocks, for the unit. The -b option can be used with the -s option, but it cannot be used with the -F option. Use the -F option to provide I/O path specifications that include buffer sizes; the -b and -u options do not apply when -F is specified.

    For more information about the selection of buffer sizes, see the assign(1) man page.
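The effect of collecting small requests into a buffer, described in the first item of the list above, can be estimated with a short simulation. The following C sketch counts buffer flushes, with each flush standing in for one system call; it is a simplified model, not the library's buffering logic, and the function name buffered_flushes is hypothetical.

```c
#include <stddef.h>

/* Count how many times a buffer of buffer_bytes bytes must be flushed
   when nrequests writes of request_bytes bytes each pass through it.
   Each flush stands in for one system call. buffer_bytes must be > 0. */
size_t buffered_flushes(size_t nrequests, size_t request_bytes,
                        size_t buffer_bytes)
{
    size_t used = 0, flushes = 0;
    for (size_t r = 0; r < nrequests; r++) {
        size_t left = request_bytes;
        while (left > 0) {
            size_t room = buffer_bytes - used;
            size_t take = left < room ? left : room;
            used += take;
            left -= take;
            if (used == buffer_bytes) {   /* buffer full: write it out */
                flushes++;
                used = 0;
            }
        }
    }
    if (used > 0)                         /* final partial buffer at close */
        flushes++;
    return flushes;
}
```

With 1,000 requests of 100 bytes each, an unbuffered program would issue 1,000 system calls, while buffered_flushes(1000, 100, 4096) reports 25 flushes through a 4,096-byte buffer.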

The following examples of buffer size specification illustrate using the assign -b and assign -F options:

  • If unit 1 is a large sequential file for which many Fortran READ or WRITE statements are issued, increase the buffer size to a large value, using the following assign command:

    assign -b buffer-size u:1
    
  • If the file foo is a small file or is accessed infrequently, minimize the buffer size using the following assign command:

    assign -b 1 f:foo
    

Default Buffer Sizes

The Fortran I/O library automatically selects default buffer sizes according to file access type as shown in the table, Default Buffer Sizes for Fortran I/O Library Routines. Override the defaults by using the assign command. The following subsections describe the default buffer sizes on various systems.

One block is 4,096 bytes on CLE systems.

Access Type              Default Buffer Size
Sequential formatted     16 blocks (65,536 bytes)
Sequential unformatted   128 blocks (524,288 bytes)
Direct formatted         The smaller of the record length in bytes + 1 or 16 blocks (65,536 bytes)
Direct unformatted       The larger of the record length or 16 blocks (65,536 bytes)

Four buffers of default size are allocated. For more information, see the description of the cachea layer in the intro_ffio(3F) man page.
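The defaults in the table can be expressed as a small selection function. The following C sketch assumes the 4,096-byte block size given for CLE systems and reads the direct unformatted entry as the larger of the record length or 16 blocks; the type access_type and the function default_buffer_bytes are hypothetical names, not library interfaces.

```c
#include <stddef.h>

enum { BLOCK_BYTES = 4096 };   /* one block on CLE systems */

typedef enum {
    SEQ_FORMATTED,
    SEQ_UNFORMATTED,
    DIRECT_FORMATTED,
    DIRECT_UNFORMATTED
} access_type;

/* Default buffer size in bytes for one buffer. recl_bytes is the record
   length and is only consulted for direct access files. */
size_t default_buffer_bytes(access_type type, size_t recl_bytes)
{
    const size_t sixteen_blocks = 16u * BLOCK_BYTES;

    switch (type) {
    case SEQ_FORMATTED:
        return sixteen_blocks;
    case SEQ_UNFORMATTED:
        return 128u * BLOCK_BYTES;
    case DIRECT_FORMATTED:      /* smaller of recl + 1 or 16 blocks */
        return recl_bytes + 1u < sixteen_blocks ? recl_bytes + 1u
                                                : sixteen_blocks;
    case DIRECT_UNFORMATTED:    /* larger of recl or 16 blocks (assumed) */
        return recl_bytes > sixteen_blocks ? recl_bytes : sixteen_blocks;
    }
    return 0;   /* not reached for valid access_type values */
}
```

For a direct formatted file with 80-byte records this gives 81 bytes, while a direct unformatted file with the same record length gets the full 65,536 bytes.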

Library Buffering

The term library buffering refers to a buffer that the I/O library associates with a file. When a file is opened, the I/O library checks the access, form, and any attributes declared on the assign command to determine the type of processing that should be used on the file. Buffers are an integral part of the processing.

If the file is assigned with one of the following assign options, library buffering is used:

  • -s blocked

  • -F spec (buffering as defined by spec)

  • -s cos

  • -s bin

  • -s unblocked

The -F option specifies flexible file I/O (FFIO), which uses library buffering if the specifications selected include a need for buffering. In some cases, more than one set of buffers might be used in processing a file. For example, the -F bufa,cos option specifies two library buffers for a read of a blank compressed COS blocked file. One buffer handles the blocking and deblocking associated with the COS blocked control words, and the second buffer is used as a work area to process blank compression. In other cases (for example, -F system), no library buffering occurs.

System Cache

The operating system uses a set of buffers in kernel memory for I/O operations. These are collectively called the system cache. The I/O library uses system calls to move data between the user memory space and the system buffer. The system cache ensures that the actual I/O to the logical device is well formed, and it tries to remember recent data in order to reduce physical I/O requests.

The following assign command options can be expected to use system cache:

  • -s sbin

  • -F spec (FFIO, depends on spec)

For the assign -F cachea command, a library buffer ensures that the actual system calls are well formed, and the system buffer cache is bypassed. This is not true for the assign -s u option. To bypass the system cache with assign -s u, all requests must be well formed.

Unbuffered I/O

The simplest form of buffering is none at all; this unbuffered I/O is known as direct I/O. For sufficiently large, well-formed requests, buffering is not necessary and can add unnecessary overhead and delay. The following assign command specifies unbuffered I/O:

assign -s u  ...

The assign -s u specification bypasses both library buffering and the system cache for all well-formed requests. The data is transferred directly between the user data area and the logical device. Requests that are not well formed result in I/O errors.

Specify Foreign File Formats

The Fortran I/O library can read and write files with record blocking and data formats native to operating systems from other vendors. The assign -F command specifies a foreign record blocking; the assign -C command specifies the type of character conversion; the -N option specifies the type of numeric data conversion. When -N or -C is specified, the data is converted automatically during the processing of Fortran READ and WRITE statements. For example, assume that a record in file fgnfile contains the following character and integer data:

character*4 ch
integer int
open(iun,FILE='fgnfile',FORM='UNFORMATTED')
read(iun) ch, int

Use the following assign command to specify foreign record blocking and foreign data formats for character and integer data:

assign -F ibm.vbs -N ibm -C ebcdic fgnfile

One of the most common uses of the assign command is to swap big-endian for little-endian files. To access big-endian unformatted files on a little-endian system, use the following command:

assign -N swap_endian fgnfile

This assumes the file is a normal f77 unformatted file with 32-bit record control images with a byte count. The library routines swap both the control images and the data when reading or writing the file.
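Because each record control image is a 32-bit byte count, swapping its endianness means reversing its four bytes. The following C sketch shows that reversal; it is a minimal illustration rather than the library's implementation, and the function name swap32 is hypothetical.

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit value, for example a record
   length read from a file written with the opposite endianness. */
uint32_t swap32(uint32_t v)
{
    return (v >> 24)
         | ((v >> 8) & 0x0000FF00u)
         | ((v << 8) & 0x00FF0000u)
         | (v << 24);
}
```

A 16-byte record written on a big-endian system would appear to a little-endian reader as the count 0x10000000 until it is swapped back to 0x00000010.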

If all unformatted sequential files are the opposite endianness, use the following command:

assign -N swap_endian g:su

Specify Memory Resident Files

The assign -F mr command specifies that a file will be memory resident. Because the mr flexible file I/O layer does not define a record-based file structure, it must be nested beneath a file structure layer when record blocking is needed.

For example, if unit 2 is a sequential unformatted file that is to be memory resident, the following Fortran statements connect the unit:

CALL ASNUNIT (2,'-F cos,mr',IER)
OPEN(2,FORM='UNFORMATTED')

The -F cos,mr specification selects COS blocked structure with memory residency.

Use and Suppress File Truncation

The assign -T option activates or suppresses truncation after the writing of a sequential Fortran file. The -T on option specifies truncation; this behavior is consistent with the Fortran standard and is the default setting for most assign -s fs specifications.

The assign(1) man page lists the default setting of the -T option for each -s fs specification. It also indicates if suppression or truncation is allowed for each of these specifications.

FFIO layers that are specified by using the -F option vary in their support for suppression of truncation with -T off.

The following figure, Access Methods and Default Buffer Sizes, summarizes the available access methods and the default buffer sizes.

Access Methods and Default Buffer Sizes

Define the Assign Environment File

The assign command information is stored in the assign environment file. The location of the active assign environment file must be provided by setting the FILENV environment variable to the desired path and file name.

Use Local Assign Mode

The assign environment information is usually stored in the .assign environment file. Programs that do not require the use of the global .assign environment file can activate local assign mode. If local assign mode is selected, the assign environment will be stored in memory. Thus, other processes cannot adversely affect the assign environment used by the program.

The ASNCTL routine selects local assign mode when it is called in one of the following ways:

CALL ASNCTL('LOCAL',1,IER)
CALL ASNCTL('NEWLOCAL',1,IER)

Local assign mode

In the following example, a Fortran program activates local assign mode and then specifies an unblocked data file structure for a unit before opening it. The -I option is passed to ASNUNIT to ensure that any assign attributes continue to have an effect at the time of file connection.

C    Switch to local assign environment
     CALL ASNCTL('LOCAL',1,IER)
     IUN = 11
C    Assign the unblocked file structure
     CALL ASNUNIT(IUN,'-I -s unblocked',IER)
C    Open unit 11
     OPEN(IUN,FORM='UNFORMATTED')

If a program contains all necessary assign statements as calls to ASSIGN, ASNUNIT, and ASNFILE, or if a program requires total shielding from any assign commands, use the second form of a call to ASNCTL, as follows:

C    New (empty) local assign environment
     CALL ASNCTL('NEWLOCAL',1,IER)
     IUN = 11
C    Assign a large buffer size
     CALL ASNUNIT(IUN,'-b 336',IER)
C    Open unit 11
     OPEN(IUN,FORM='UNFORMATTED')

Interlanguage Communication

The Clang C and C++ compilers provide mechanisms for declaring external functions written in other languages. This enables the writing of portions of an application in C, C++, Fortran, or assembly language, which can be useful in cases where the other languages provide performance advantages or utilities not available in C or C++.

The HPE Cray Compiling Environment LLD now differs from upstream LLD regarding COMMON symbol resolution. When a COMMON symbol is found within a .bss or another uninitialized data section, the linker defines that symbol. But if a symbol is from a .data section, where data is initialized, the linker leaves that symbol undefined and lets the dynamic linker resolve it at run time.

Fortran, C, C++ Interoperability

The HPE Cray Compiler supports interoperability mechanisms specified in the Fortran 2008 standard, ISO/IEC 1539-1:2010, and TS 29113 Further Interoperability of Fortran and C.

The Fortran 2008 standard describes interoperability features for:

Intrinsic Types

The Fortran intrinsic module ISO_C_BINDING provides interoperability between Fortran intrinsic types and C types. The ISO_C_BINDING module provides named constants which can be used as KIND type parameters, compatible with C types.

In addition to the named constants required by the Fortran standard, the HPE Cray compiler provides, as an extension, definitions for 128-bit floating-point and complex types. C_FLOAT128 and C_FLOAT128_COMPLEX correspond to C types __float128 and __float128 complex.

Derived Types and Structures

Use the BIND attribute when creating an interoperable type:

USE ISO_C_BINDING
TYPE, BIND(C) :: THIS_TYPE
    . . .
END TYPE THIS_TYPE

Global Variables

Use the BIND attribute with a common block declaration, or module variable:

USE ISO_C_BINDING
INTEGER(C_INT), BIND(C) :: EXTERN
INTEGER(C_LONG) :: CVAR
BIND(C, NAME='var') :: CVAR
COMMON /A/ I, J
REAL(C_FLOAT) :: I, J
BIND(C) :: /A/

Pointers

ISO_C_BINDING provides a derived type, c_ptr, that interoperates with any C pointer type. Also, the Fortran named constant c_null_ptr is equivalent to the C value NULL.

Subroutines and Functions

Declare a Fortran procedure with the BIND attribute. Procedure arguments must be of interoperable type. By default the Fortran compiler converts the procedure name to lower-case (myfunction); this is the binding label, or corresponding name which is known to the C compiler.

FUNCTION MYFUNCTION(X, Y), BIND(C)
    
Specify a different binding label:

FUNCTION MYFUNCTION(X, Y), BIND(C, NAME='C_Myfunction')

A function result must be scalar and of interoperable type. A subroutine prototype must have a void result.
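The default binding label rule described above (the procedure name converted to lower case) can be mimicked in a few lines of C, which may help when working out which symbol name the C side has to declare. This is only a sketch of the naming rule, not of anything the compiler exposes; the function name default_binding_label is hypothetical.

```c
#include <ctype.h>
#include <stddef.h>

/* Write the default binding label for fortran_name into out (at most
   out_size - 1 characters plus the terminator) and return out. */
char *default_binding_label(const char *fortran_name, char *out,
                            size_t out_size)
{
    size_t i = 0;
    if (out_size == 0)
        return out;
    while (fortran_name[i] != '\0' && i + 1 < out_size) {
        out[i] = (char)tolower((unsigned char)fortran_name[i]);
        i++;
    }
    out[i] = '\0';
    return out;
}
```

Given the name MYFUNCTION, the function produces myfunction, which is the name a C caller would use unless a different label is given with NAME=.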

TS 29113 describes further interoperability features including:

C descriptors

ISO_Fortran_binding.h defines C structure CFI_cdesc_t which facilitates using Fortran data objects from within a C function.

ISO_Fortran_binding.h

Contains additional C structure definitions and macro definitions to interoperate with an allocatable, or data pointer argument.

BIND(C) Syntax

The proc-language-binding-spec specification allows Fortran programs to interoperate with C objects. The optional comma in FUNCTION name(), BIND(C) is an HPE extension to the Fortran standard.

ISO_C_BINDING

The ISO_C_BINDING module provides interoperability between Fortran intrinsic types and C types. The ISO_C_BINDING module provides named constants which can be used as KIND type parameters, compatible with C types.

In addition to the named constants required by the Fortran 2008 standard, the HPE Cray compiler provides, as an extension, definitions for 128-bit floating-point and complex types. C_FLOAT128 and C_FLOAT128_COMPLEX correspond to C types __float128 and __float128 complex.

Interlanguage Communication Examples

Interlanguage Communication using Common Block/Global

//  common_c.c : example of function called from common.f90

#include <stdio.h>
#include <stdlib.h>
#include <ISO_Fortran_binding.h>

//   globals that match up to the common blocks in common.f90
float c_single;

struct common {
  double var1;
  int var2;
} multiple;

int c_int_array[100];

// c function called from Fortran
void global_var_common()
{
  int i;
  //  just prints and sets the globals


  printf(" In global_var_common\n");
  printf("   c_single: %f\n", c_single);
  printf("   multiple: %f, %d\n", multiple.var1, multiple.var2 );
  printf("   c_int_array: %d, %d\n", c_int_array[0], c_int_array[99]);

  c_single = 2 * c_single;
  multiple.var1 = 77.77;
  multiple.var2 = 17;
  for(i=0; i<100; i++ ) {
     c_int_array[i] = c_int_array[i] * 3;
  }

}  // end of global_var_common

! common.f90
! Needs common_c.c

program common_block
  use, intrinsic :: iso_c_binding
!  use check_error
  implicit none
!
!  declare the common blocks for c globals
!  one with a single real variable
  real(c_float) r_var
  common /c_single/ r_var
!  one with an integer array
  integer(c_int) i_array(100)
  common / array / i_array
!  one with two variables
  real(c_double) :: var1
  integer(c_int) :: var2
  common / multiple / var1, var2
!   do the bind c on the common blocks, renaming one
BIND(C,name="c_int_array") :: / array /
BIND(C) :: / multiple /,  /c_single/

call sub1()

end program common_block

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
subroutine sub1( )
  use, intrinsic :: iso_c_binding

!  declare the common blocks for c globals
!  one with a single real variable
  real(c_float) r_var
  common /c_single/ r_var
!  one with an integer array
  integer(c_int) i_array(100)
  common / array / i_array
  real(c_double) var1
  integer(c_int) var2
  common / multiple / var1, var2
!   do the bind c on the common blocks, renaming array
BIND(C,name="c_int_array") :: / array /
BIND(C) :: / multiple /,  /c_single/

  interface
     subroutine global_var_common( )  bind(c)
       use,intrinsic :: iso_c_binding
       implicit none
     end subroutine global_var_common
  end interface

r_var = -99.3
var1 = 88.88
var2 = -13
i_array = [(i,i=1,100)]


! call the c function
call global_var_common( )

print *, "In sub1"
print *, "  r_var : ", r_var
print *, "  var1  : ", var1
print *, "  var2  : ", var2
print *, "  array : ", i_array(1), i_array(100)

end subroutine

Interlanguage Communication using Derived Structure

// c program that calls the Fortran subroutine with struct argument, f2008 C.11.3
//*********************************************************************
#include <stdio.h>
#include <stdlib.h>
#include <ISO_Fortran_binding.h>

  //  declare the structure type
struct pass {
int lenc, lenf;
float *c, *f;
};

  //  prototype for the Fortran function
void simulation(long alpha, double *beta, long *gamma, double delta[], struct pass *arrays);

//  program that calls the Fortran subroutine
int main ( )
{
  int i;
  long alpha, gamma;
  double beta, delta[100];
  struct pass arrays;

  alpha = 1234L;
  gamma = 5678L;
  beta = 12.34;
  for(i=0; i<100; i++ ) {
     delta[i] = i+1;
  }

  //  fill in some of the structure
  arrays.lenc = 100;
  arrays.lenf = 0;
  arrays.c = (float *) malloc( 100*sizeof(float) );
  arrays.f = NULL;
  for(i=0; i<100; i++ ) {
     arrays.c[i] = 2*(i+1);
  }

  //  reference the Fortran subroutine
  simulation(alpha, &beta, &gamma, delta, &arrays);

  printf(" After simulation\n");
  // alpha and gamma are long, so they need the %ld conversion
  printf("   alpha: %ld, beta: %f\n", alpha, beta );
  printf("   gamma: %ld\n", gamma );
  printf("   arrays.lenc: %d\n", arrays.lenc);
  printf("   arrays.c[0],[arrays.lenc-1],: %f, %f\n", arrays.c[0], arrays.c[arrays.lenc-1]);
  printf("   arrays.lenf: %d\n", arrays.lenf);
  printf("   arrays.f[0],[arrays.lenf-1],: %f, %f\n", arrays.f[0], arrays.f[arrays.lenf-1]);

}  //  end of main

! Example derived type/structure interoperability, f2008 C.11.3
!**************************************************************
subroutine simulation(alpha, beta, gamma, delta, arrays) bind(c)
  use, intrinsic :: iso_c_binding
  implicit none
  integer (c_long), value :: alpha
  real (c_double), intent(inout) :: beta
  integer (c_long), intent(out) :: gamma
  real (c_double),dimension(*),intent(in) :: delta
  type, bind(c) :: pass
    integer (c_int) :: lenc, lenf
    type (c_ptr) :: c, f
  end type pass
  type (pass), intent(inout) :: arrays
  real (c_float), allocatable, target, save :: eta(:)
  real (c_float), pointer :: c_array(:)
  integer i

  print *, "In simulation"
  print *, "  alpha: ", alpha, ", beta: ", beta
  print *, "  delta(1),(100): ", delta(1), delta(100)

  ! associate c_array with an array allocated in c
  call c_f_pointer (arrays%c, c_array, [arrays%lenc])
  print *, "  c_array(1),(arrays%lenc): ", c_array(1), c_array(arrays%lenc)

  ! allocate an array and make it available in c
  arrays%lenf = 100
  allocate (eta(arrays%lenf))
  arrays%f = c_loc(eta)
  eta = [(i*3,i=1,arrays%lenf)]

  ! change argument values
  c_array = c_array * 2.0
  gamma = 77
  beta = -55.66

end subroutine simulation

Interlanguage Communication using Module

//  c function called from module.f90

#include <stdio.h>
#include <stdlib.h>
#include <ISO_Fortran_binding.h>

//   globals that match up to the module variables in module.f90
float r_var;
double var1;
int var2;
int c_int_array[100];

//  c function called from Fortran
void global_var_module()
{
  int i;
  //  just prints and sets the globals


  printf(" In global_var_module\n");
  printf("   r_var      : %f\n", r_var);
  printf("   var1       : %f\n", var1 );
  printf("   var2       : %d\n", var2 );
  printf("   c_int_array: %d, %d\n", c_int_array[0], c_int_array[99]);

  r_var = 2 * r_var;
  var1 = 77.77;
  var2 = 17;
  for(i=0; i<100; i++ ) {
     c_int_array[i] = c_int_array[i] * 3;
  }

}  // end of global_var_module

! Example of module/global variable interoperability. 
! Needs c function from module_c.c
! ********************************************************************
module module_example_mod
  use, intrinsic :: iso_c_binding
  real(c_float) r_var
  integer(c_int) i_array(100)
  real(c_double) :: var1
  integer(c_int) :: var2
BIND(C,name="c_int_array") :: i_array
BIND(C) :: r_var, var1, var2
end module module_example_mod

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
program module_example
  use module_example_mod
  implicit none

call sub1()

end program module_example

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
subroutine sub1( )
  use module_example_mod

  interface  ! for the c function
     subroutine global_var_module( )  bind(c)
       use,intrinsic :: iso_c_binding
       implicit none
     end subroutine global_var_module
  end interface

r_var = -99.3
var1 = 88.88
var2 = -13
i_array = [(i,i=1,100)]


! call the c function
call global_var_module( )

print *, "In sub1"
print *, "  r_var : ", r_var
print *, "  var1  : ", var1
print *, "  var2  : ", var2
print *, "  array : ", i_array(1), i_array(100)

end subroutine