clang - the HPE Cray Clang C, C++, UPC, and HIP compiler

SYNOPSIS

C: cc [options] filename …

C++: CC [options] filename …

UPC: cc -hupc [options] filename …

HIP: CC [options] -x hip filename …

Invoking as clang or clang++ is discouraged; doing so may not find necessary include paths and will not link automatically with HPE Cray libraries.

SUPPORT

HPE Cray supports compiling the C, C++, UPC, and HIP languages and the OpenMP parallel programming model for targets available on HPE Cray systems. Using this compiler for other languages, models, or targets is not supported; any documentation related to such features is provided as-is for reference purposes only. This man page describes changes made to the base Clang compiler, which provides more complete documentation here: https://clang.llvm.org/docs/UsersManual.html

DESCRIPTION

clang is a C, C++, and Objective-C compiler which encompasses preprocessing, parsing, optimization, code generation, assembly, and linking. Depending on which high-level mode setting is passed, Clang will stop before doing a full link. While Clang is highly integrated, it is important to understand the stages of compilation, to understand how to invoke it. These stages are:

Driver

The clang executable is actually a small driver which controls the overall execution of other tools such as the compiler, assembler and linker. Typically you do not need to interact with the driver, but you transparently use it to run the other tools.

Preprocessing

This stage handles tokenization of the input source file, macro expansion, #include expansion and handling of other preprocessor directives. The output of this stage is typically called a “.i” (for C), “.ii” (for C++), “.mi” (for Objective-C), or “.mii” (for Objective-C++) file.

Parsing and Semantic Analysis

This stage parses the input file, translating preprocessor tokens into a parse tree. Once in the form of a parse tree, it applies semantic analysis to compute types for expressions and to determine whether the code is well formed. This stage is responsible for generating most of the compiler warnings as well as parse errors. The output of this stage is an “Abstract Syntax Tree” (AST).

Code Generation and Optimization

This stage translates an AST into low-level intermediate code (known as “LLVM IR”) and ultimately to machine code. This phase is responsible for optimizing the generated code and handling target-specific code generation. The output of this stage is typically called a “.s” file or “assembly” file.

Clang also supports the use of an integrated assembler, in which the code generator produces object files directly. This avoids the overhead of generating the “.s” file and of calling the target assembler.

Assembler

This stage runs the target assembler to translate the output of the compiler into a target object file. The output of this stage is typically called a “.o” file or “object” file.

Linker

This stage runs the target linker to merge multiple object files into an executable or dynamic library. The output of this stage is typically called an “a.out”, “.dylib” or “.so” file.

Clang Static Analyzer

The Clang Static Analyzer is a tool that scans source code to try to find bugs through code analysis. This tool uses many parts of Clang and is built into the same driver. Please see <https://clang-analyzer.llvm.org> for more details on how to use the static analyzer.

HPE CRAY ENHANCEMENTS

HPE Cray has modified Clang/LLVM to improve the performance of generated code and to provide additional features. In general, performance improvements are enabled by default at appropriate optimization levels, but features must be requested by an option.

The compiler predefines the macro __cray__ in addition to all of the usual Clang predefined macros.
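As a minimal sketch of how source code might detect this macro, the following C fragment selects a code path at compile time. The function names are hypothetical, and whether -fno-cray leaves __cray__ defined is not stated here, so treat the result as a hint rather than a guarantee.

```c
/* Returns 1 when the __cray__ macro is predefined by the compiler,
   0 otherwise. The function name is illustrative only. */
int built_with_cray_macro(void)
{
#ifdef __cray__
    return 1;
#else
    return 0;
#endif
}

/* Human-readable label for the compiler that built this file. */
const char *compiler_description(void)
{
    return built_with_cray_macro() ? "HPE Cray Clang" : "other compiler";
}
```

The same #ifdef guard can be used to wrap code that relies on HPE Cray-specific pragmas or options, so the file still builds with other compilers.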

General

-fcray, -fno-cray

Select the compiler’s default behavior, which provides the basis for customization by other options. The default is -fcray, which enables HPE Cray enhancements, whereas -fno-cray disables HPE Cray enhancements. The last instance of -fcray and -fno-cray applies. The position of -fcray or -fno-cray relative to other options does not matter. For example, with -fcray, other options that disable specific HPE Cray enhancements are honored, and with -fno-cray, other options that enable specific HPE Cray enhancements are honored.

Note that -fno-cray is intended to help diagnose whether a problem is caused by an HPE Cray enhancement or is present in the base Clang/LLVM distribution. Either way, the problem should be reported to HPE Cray to receive the fastest response.

-fenhanced-asm=<verbosity>: Emit descriptive comments in assembly code output. The default is -fenhanced-asm=1. Greater levels of verbosity will include more provenance information for inlined code. Use -fenhanced-asm=0 to disable.

-fenhanced-ir=<verbosity>: Emit descriptive comments in IR output. The default is -fenhanced-ir=1. Greater levels of verbosity will include more provenance information for inlined code. Use -fenhanced-ir=0 to disable.

Performance Options

Clang does not apply optimizations unless they are requested. For best performance, -Ofast with -flto is recommended. For applications that are sensitive to floating-point optimizations, it may be necessary to adjust the floating-point optimization level using one of the options below. For applications that require bit reproducibility (i.e., which are designed to calculate the same result no matter how the work is distributed among a constant product of MPI ranks and OpenMP threads), it may be necessary to forgo floating-point optimization by using -O3 instead of -Ofast.

-fast: Enables -Ofast and link-time optimization.

-ffp=<level>

Select a level for HPE Cray floating-point math optimizations and math library functions. Requesting the lowest level, -ffp=0, generates code with the highest precision and grants the compiler minimal freedom to optimize floating-point operations, whereas requesting the highest level, -ffp=4, grants the compiler maximal freedom to optimize aggressively but will likely result in lower precision.

Requesting levels 1 through 4 will flush denormals to zero and imply -funsafe-math-optimizations and -fno-math-errno; if those options are subsequently changed, then this option may not work as expected. With -fcray, -ffp=3 is implied by -ffast-math or -Ofast. Using -ffp=0 will prevent the use of HPE Cray math libraries and disable all HPE Cray floating-point optimizations.

Supported values for level are { 0, 1, 2, 3, 4 }.
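To illustrate why floating-point optimization levels can affect results and bit reproducibility, the following C sketch adds the same three values in two different orders. It is not HPE Cray code; the function names are hypothetical, and the example assumes IEEE 754 double arithmetic. Optimizations such as those implied by -funsafe-math-optimizations are permitted to reorder additions in this way.

```c
/* Computes (a + b) + c. The volatile intermediate forces the partial sum
   to be rounded and stored as a double, so the stated order is the one
   that executes even under aggressive optimization. */
double sum_left_to_right(double a, double b, double c)
{
    volatile double partial = a + b;
    return partial + c;
}

/* Computes a + (b + c) from the same inputs. */
double sum_right_to_left(double a, double b, double c)
{
    volatile double partial = b + c;
    return a + partial;
}
```

With the inputs 1e20, -1e20, and 1.0, the first function returns 1.0 and the second returns 0.0, because 1.0 is lost when it is added to -1e20. A program that depends on one particular summation order may therefore need a lower -ffp level.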

-fcray-mallopt, -fno-cray-mallopt: Optimize malloc by using HPE Cray’s custom mallopt(3) parameters, which for most programs improves performance but may cause higher memory usage. This is a link-time option. The default is -fcray-mallopt. These optimizations can also be disabled at runtime if -fcray-mallopt was used at link-time by setting CRAY_MALLOPT_OFF.

-fivdep, -fno-ivdep: Enable or disable #pragma ivdep handling. The default is -fivdep.

-flocal-restrict, -fno-local-restrict: Honor restrict-qualified pointers declared in a block scope by assuming that they do not alias with other restrict-qualified pointers declared in the same block scope. The default is -flocal-restrict.
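The following C sketch shows the kind of block-scope restrict-qualified pointers this option refers to. The function and variable names are hypothetical, and the caller is assumed to pass non-overlapping arrays; how much the optimizer benefits in practice is not stated here.

```c
#include <stddef.h>

/* Computes dst[i] += scale * src[i] for i in [0, n).
   The caller must guarantee that dst and src do not overlap. */
void scaled_accumulate(float *dst, const float *src, size_t n, float scale)
{
    {
        /* Both pointers are restrict-qualified and declared in the same
           block scope, so with -flocal-restrict the compiler may assume
           they do not alias each other. */
        float *restrict d = dst;
        const float *restrict s = src;

        for (size_t i = 0; i < n; ++i)
            d[i] += scale * s[i];
    }
}
```

If the two arrays can overlap, the restrict qualifiers must be removed, since the assumption would then be false regardless of the option setting.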

-floop-trips=<scale>

Optimize assuming loops with statically unknown trip counts have trip counts at the scale of <scale>.

Valid values for <scale> are:

huge

Assume loops have trip counts large enough such that referenced data will not fit in the cache.

Feature Options

-fsave-decompile: Generate decompile (.dc) and IR (.ll) files prior to optimization, vectorization, and code generation, as well as after LTO. A decompile is a higher-level presentation of the IR that looks similar to C source code, but cannot be compiled. Use the decompile to gain insight about restructuring and optimization changes made by the compiler.

-fsave-loopmark: Generate a loopmark listing file (.lst) that shows which optimizations were applied to which parts of the source code.

-floopmark-style=<style>

Specifies the style of the loopmark listing file.

Valid values for <style> are:

grouped

Places all messages at the end of the listing.

interspersed

Places each message after the relevant source code line.

The default value is grouped.

-finstrument-loops: Instrument loops to gather profile data to use with CrayPAT.

-finstrument-openmp: Turns the insertion of the CrayPat OpenMP and accelerator tracing calls on and off.

-fcray-program-library-path=<directory>

Create and use a persistent repository of compiler information specified by <directory>.

The program library repository is implemented as a directory and the information contained in the program library is built up with each compiler invocation. Any compilation that does not have the -fcray-program-library-path option will not add information to this repository.

Because of the persistence of the program library, it is the user’s responsibility to manage it. For example, “rm -r <directory>” might be added to the “make clean” target in an application Makefile. Because the program library is a directory, use “rm -r” to remove it.

If an application Makefile works by creating files in multiple directories during a single build, then <directory> should be an absolute path, otherwise multiple and incomplete program library repositories will be created. For example, avoid “-fcray-program-library-path=./pl” and instead use “-fcray-program-library-path=/fullpath/builddir/pl”.

This option may be specified with either an equals sign or a space before “<directory>”.

-fcray-trapping-math

Generate optimized trap-safe floating point code. This option disables any optimization which would introduce a trap where one did not exist in the source code.

The default is -fno-cray-trapping-math.

Linker Options

-ffpe-trap=<list>

Enable traps at runtime for the specified exceptions. This option accepts a comma separated list of values. If the specified values contradict each other, the last value has priority.

This option does not affect compile time optimizations; it detects runtime exceptions. This option is processed only at link time and affects the entire program; it is not processed when compiling subprograms. Therefore, traps may be set using this command line option at the beginning of execution of the main program only. The program may subsequently change these settings by calling intrinsic or library procedures.

The default is -ffpe-trap=none, which means no exceptions are trapped. Supported values and the exceptions they trap are:

none: Disables all traps.
invalid: Trap on invalid operation.
zero: Trap on divide-by-zero.
fp: Trap on zero, invalid, or overflow.
inexact: Trap on inexact result (or rounded result). Enabling traps for inexact results is not recommended.
overflow: Trap on overflow (or the result of an operation is too large to be represented).
underflow: Trap on underflow (or the result of an operation is too small to be represented).
denormal: Trap on denormalized operands.

Uninitialized Variable Policy Control

Uninitialized variables can be a source of programming errors. These options provide control over how the compiler treats such variables. There are separate options for integer and floating-point types so that integer variables may be initialized to zero and floating-point variables may be initialized to NaN. Many bit patterns qualify as a NaN; these options use a quiet NaN of all ones because using a repeated byte pattern makes it possible to initialize large arrays using memset. These options apply only to integer and floating-point variables that are not part of structures, because structures could require an arbitrarily complex initialization sequence.
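The memset remark above can be checked with a short C sketch. It is an illustration, not the compiler's implementation: the function names are hypothetical, and the bit test assumes the IEEE 754 binary64 layout, in which a value is a quiet NaN when all exponent bits and the top mantissa bit are set.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fills n doubles with the all-ones byte pattern (0xFF in every byte). */
void fill_with_all_ones(double *values, size_t n)
{
    memset(values, 0xFF, n * sizeof *values);
}

/* Returns 1 if the bit pattern of value is a quiet NaN, 0 otherwise.
   Inspecting the bits avoids isnan(), which fast-math style options
   may optimize away. Assumes IEEE 754 binary64. */
int is_quiet_nan_bits(double value)
{
    const uint64_t exponent_mask = 0x7FF0000000000000ULL;
    const uint64_t quiet_bit     = 0x0008000000000000ULL;
    uint64_t bits;

    memcpy(&bits, &value, sizeof bits);
    return (bits & exponent_mask) == exponent_mask && (bits & quiet_bit) != 0;
}
```

Because every byte is identical, a single memset call can initialize an array of any length, which is the property the options rely on.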

-funinitialized-heap-ints=<uninitialized | zero>: Initializes integer memory allocated by “malloc” or “new” to zero. For this option to have any effect, the void pointer returned by malloc must be typecast immediately to a pointer to an integer type because otherwise the compiler does not know how the memory will be used. For example, (int*)malloc(…).

-funinitialized-heap-floats=<uninitialized | nan>: Initializes floating-point memory allocated by “malloc” or “new” to a quiet NaN of all ones. For this option to have any effect, the void pointer returned by malloc must be typecast immediately to a pointer to a floating-point type because otherwise the compiler does not know how the memory will be used. For example, (double*)malloc(…).

-funinitialized-stack-ints=<uninitialized | zero>: Initializes stack integer variables to zero. If the -ftrivial-auto-var-init option is present, then it has precedence and this option does nothing.

-funinitialized-stack-floats=<uninitialized | nan>: Initializes stack floating-point variables to NaN. If the -ftrivial-auto-var-init option is present, then it has precedence and this option does nothing.

-funinitialized-static-floats=<zero | nan>: Initializes static floating-point variables to NaN.

Unified Parallel C (UPC) Options

-hupc, -hdefault: -hupc configures the compiler driver to expect UPC source code. Source files with a .upc extension are automatically treated as UPC code, but this option permits a file with any other extension (typically .c) to be understood as UPC code. -hdefault cancels this behavior; if both -hupc and -hdefault appear in a command line, whichever appears last takes precedence and applies to all source files in the command line.

-fupc-auto-amo, -fno-upc-auto-amo: Automatically use network atomics for remote updates to reduce latency. For example, x += 1 can be performed as a remote atomic add. If an update is recognized as local to the current thread, then no atomic is used. These atomics are intended as a performance optimization only and shall not be relied upon to prevent race conditions. Enabled at -O1 and above.

-fupc-buffered-async, -fno-upc-buffered-async: Set aside memory in the UPC runtime library for aggregating random remote accesses designated with “#pragma pgas buffered_async”. Disabled by default.

-fupc-pattern, -fno-upc-pattern: Identify simple communication loops and aggregate the remote accesses into a single function call which replaces the loop. Enabled at -O1 and above.

-fupc-threads=<N>: Set the number of threads for a static THREADS translation. This option causes __UPC_STATIC_THREADS__ to be defined instead of __UPC_DYNAMIC_THREADS__ and replaces all uses of the UPC keyword THREADS with the value N.

HIP Support and Options

HIP is supported only for AMD GPU targets and requires an AMD ROCm install for HIP header files and runtime libraries.

Several flags must be specified explicitly to compile and link HIP source files. For example, the following command lines will compile and link a HIP source file targeting an AMD MI250X GPU:

CC --offload-arch=gfx90a --rocm-path=<ROCM-INSTALL-PATH> -c -x hip [options] filename …

CC --rocm-path=<ROCM-INSTALL-PATH> [options] filename …

The following compiler options are relevant for compiling and linking HIP source files:

-x hip: Enable HIP compilation for any input files that appear after this option on the command line. This option should not be used on a link line with object files as input, since CCE will treat the object files as HIP source. Or, the -x none flag can be used to cancel a prior -x hip flag on the link line.

--rocm-path=<ROCM-INSTALL-PATH>: Specifies the location of a ROCm install; used to locate HIP header files.

--offload-arch=<gfx908 | gfx90a | gfx942>

Specifies the HIP offload target architecture. CCE currently supports gfx908 (AMD MI100), gfx90a (AMD MI250X), and gfx942 (AMD MI300A). This flag can be specified multiple times to produce a “fat binary” that contains device code for multiple GPUs.

This flag also accepts the LLVM target ID syntax, which is a target processor followed by a colon-delimited list of processor features. Each feature is a pre-defined string, xnack or sramecc, followed by a plus or minus sign to enable or disable the setting (e.g., gfx90a:xnack+ or gfx90a:xnack-). Any unspecified processor features receive a default value of “any”, which ensures the resulting executable will run correctly on a processor with or without that feature. The xnack processor feature is needed to run with unified memory for AMD GPUs.

--cuda-offload-arch=<gfx908 | gfx90a | gfx942>: A synonym for --offload-arch.

-fgpu-rdc, -fno-gpu-rdc: Generate relocatable device code, allowing separate compilation of HIP source files with cross-file references. Compiling with -fgpu-rdc will produce a bundled HIP offload object file that requires linking with --hip-link. Compiling with -fno-gpu-rdc will produce ordinary host object files that do not need to be linked with --hip-link. However, -fno-gpu-rdc requires that all HIP device code in a HIP source file must be completely self-contained, without referencing any external user-defined symbols. The default is -fno-gpu-rdc.

--hip-link: Enables device linking for bundled HIP offload object files. This option is required when linking object files compiled with -fgpu-rdc.

-munsafe-fp-atomics, -mno-unsafe-fp-atomics: Enables use of native floating-point atomic instructions, which are not used by default for AMD MI250X GPUs because they are only safe for coarse-grained memory; floating-point atomic instructions operating on fine-grained memory will be silently ignored. In general, memory granularity cannot be determined statically, so by default the compiler will always generate atomic compare-and-swap loops for floating-point atomic operations. (Integer atomic instructions, including atomic compare-and-swap, are safe for any memory granularity.) The -munsafe-fp-atomics compiler flag may be used to enable generation of native floating-point atomic instructions, but users are responsible for ensuring that atomic operations do not target fine-grained memory. The default is -mno-unsafe-fp-atomics, which prevents the compiler from generating native floating-point atomic instructions for operations that may target fine-grained memory at runtime.

Language Extensions

#pragma ivdep

When placed before a for, while, or do while loop, #pragma ivdep causes the compiler to ignore vector dependencies in the loop, including explicit dependencies, when attempting to vectorize the loop. This allows the compiler to vectorize many loops that are potentially unsafe to vectorize.

Reductions within the loop are allowed, except for reductions into global arrays. For example, a[0] += 3 is not allowed if a is a global array.

Even with #pragma ivdep, conditions other than vector dependencies can still inhibit vectorization.
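The following C sketch shows one place #pragma ivdep might be used: a loop with indirect indexing, where the compiler cannot prove that iterations are independent. The function name is hypothetical, the caller is assumed to supply an index array with no repeated entries, and guarding the pragma with __cray__ is a choice made here so that other compilers do not warn about an unknown pragma. Whether the loop is actually vectorized depends on the other conditions mentioned above.

```c
#include <stddef.h>

/* Adds b[i] into a[idx[i]] for i in [0, n).
   The caller must guarantee that idx contains no repeated entries;
   otherwise ignoring the dependencies would give wrong results. */
void scatter_add(double *a, const double *b, const size_t *idx, size_t n)
{
#ifdef __cray__
#pragma ivdep
#endif
    for (size_t i = 0; i < n; ++i)
        a[idx[i]] += b[i];
}
```

The array a is passed as a parameter rather than declared as a global array, in keeping with the restriction on reductions described above.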

INTEROPERABILITY

CCE Fortran

Mixed-language programs that exchange long double data between Fortran and C or C++ object files will not work correctly on x86 targets. CCE Fortran assumes a 64-bit C_LONG_DOUBLE type, whereas Clang uses an 80-bit long double type padded to 128 bits of storage. To assist in making such programs work, the following options are available. IMPORTANT: When using a non-default long double format, avoid passing long double data to library functions, which will continue to expect the default format.

-mlong-double-64: Make the x86 “long double” type equivalent to the “double” type. This type matches CCE Fortran C_DOUBLE or C_LONG_DOUBLE.

-mlong-double-128: Make the x86 “long double” type equivalent to the “__float128” type. This type matches CCE Fortran C_FLOAT128.

-mlong-double-80: Make the x86 “long double” type equivalent to an 80-bit floating-point type that is padded to 128 bits of storage. This option is the default.

Other options related to CCE Fortran:

-ffortran-byte-swap-io: Tell the Fortran runtime I/O subsystem to byte-swap input and output files for direct and sequential unformatted I/O. This is a link-time option to be used when linking with CCE Fortran object files.

OPTIONS

Stage Selection Options

-E: Run the preprocessor stage.

-fsyntax-only: Run the preprocessor, parser and semantic analysis stages.

-S: Run the previous stages as well as LLVM generation and optimization stages and target-specific code generation, producing an assembly file.

-c: Run all of the above, plus the assembler, generating a target “.o” object file.

no stage selection option: If no stage selection option is specified, all stages above are run, and the linker is run to combine the results into an executable or shared library.

Language Selection and Mode Options

-x <language>: Treat subsequent input files as having type language.

-std=<standard>

Specify the language standard to compile for.

Supported values for the C language are:

c89

c90

iso9899:1990

ISO C 1990

iso9899:199409

ISO C 1990 with amendment 1

gnu89

gnu90

ISO C 1990 with GNU extensions

c99

iso9899:1999

ISO C 1999

gnu99

ISO C 1999 with GNU extensions

c11

iso9899:2011

ISO C 2011

gnu11

ISO C 2011 with GNU extensions

c17

iso9899:2017

ISO C 2017

gnu17

ISO C 2017 with GNU extensions

The default C language standard is gnu17, except on PS4, where it is gnu99.
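The language standard selected by -std can be observed from within C source through predefined macros. The sketch below is illustrative and the function names are hypothetical. __STDC_VERSION__ is defined by the C standard from the 1994 amendment onward, and __STRICT_ANSI__ is defined by GCC and Clang when a strict ISO mode such as -std=c17 is selected instead of a GNU mode such as -std=gnu17.

```c
/* Returns the value of __STDC_VERSION__ (for example 199901L for C99,
   201112L for C11, 201710L for C17), or 0 when the macro is not
   defined, as in plain C89/C90 mode. */
long c_standard_version(void)
{
#ifdef __STDC_VERSION__
    return __STDC_VERSION__;
#else
    return 0L;
#endif
}

/* Returns 1 in a strict ISO mode, 0 when GNU extensions are enabled. */
int strict_iso_mode(void)
{
#ifdef __STRICT_ANSI__
    return 1;
#else
    return 0;
#endif
}
```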

Supported values for the C++ language are:

c++98

c++03

ISO C++ 1998 with amendments

gnu++98

gnu++03

ISO C++ 1998 with amendments and GNU extensions

c++11

ISO C++ 2011 with amendments

gnu++11

ISO C++ 2011 with amendments and GNU extensions

c++14

ISO C++ 2014 with amendments

gnu++14

ISO C++ 2014 with amendments and GNU extensions

c++17

ISO C++ 2017 with amendments

gnu++17

ISO C++ 2017 with amendments and GNU extensions

c++20

ISO C++ 2020 with amendments

gnu++20

ISO C++ 2020 with amendments and GNU extensions

c++23

ISO C++ 2023 with amendments

gnu++23

ISO C++ 2023 with amendments and GNU extensions

c++2c

Working draft for C++2c

gnu++2c

Working draft for C++2c with GNU extensions

The default C++ language standard is gnu++17.

Supported values for the OpenCL language are:

cl1.0

OpenCL 1.0

cl1.1

OpenCL 1.1

cl1.2

OpenCL 1.2

cl2.0

OpenCL 2.0

The default OpenCL language standard is cl1.0.

Supported values for the CUDA language are:

cuda

NVIDIA CUDA(tm)

-stdlib=<library>: Specify the C++ standard library to use; supported options are libstdc++ and libc++. If not specified, platform default will be used.

-rtlib=<library>: Specify the compiler runtime library to use; supported options are libgcc and compiler-rt. If not specified, compiler-rt will be used when -fcray is enabled, otherwise the platform default will be used.

-ansi: Same as -std=c89.

-ObjC, -ObjC++: Treat source input files as Objective-C and Object-C++ inputs respectively.

-trigraphs: Enable trigraphs.

-ffreestanding: Indicate that the file should be compiled for a freestanding, not a hosted, environment. Note that it is assumed that a freestanding environment will additionally provide memcpy, memmove, memset and memcmp implementations, as these are needed for efficient codegen for many programs.

-fno-builtin: Disable special handling and optimizations of well-known library functions, like strlen() and malloc().

-fno-builtin-<function>: Disable special handling and optimizations for the specific library function. For example, -fno-builtin-strlen removes any special handling for the strlen() library function.

-fno-builtin-std-<function>

Disable special handling and optimizations for the specific C++ standard library function in namespace std. For example, -fno-builtin-std-move_if_noexcept removes any special handling for the std::move_if_noexcept() library function.

For C standard library functions that the C++ standard library also provides in namespace std, use -fno-builtin-<function> instead.

-fmath-errno: Indicate that math functions should be treated as updating errno.

-fpascal-strings: Enable support for Pascal-style strings with “\pfoo”.

-fms-extensions: Enable support for Microsoft extensions.

-fmsc-version=: Set _MSC_VER. When on Windows, this defaults to either the same value as the currently installed version of cl.exe, or 1933. Not set otherwise.

-fborland-extensions: Enable support for Borland extensions.

-fwritable-strings: Make all string literals default to writable. This disables uniquing of strings and other optimizations.

-flax-vector-conversions, -flax-vector-conversions=<kind>, -fno-lax-vector-conversions

Allow loose type checking rules for implicit vector conversions. Possible values of <kind>:

none: allow no implicit conversions between vectors
integer: allow implicit bitcasts between integer vectors of the same overall bit-width
all: allow implicit bitcasts between any vectors of the same overall bit-width

<kind> defaults to integer if unspecified.

-fblocks: Enable the “Blocks” language feature.

-fobjc-abi-version=<version>: Select the Objective-C ABI version to use. Available versions are 1 (legacy “fragile” ABI), 2 (non-fragile ABI 1), and 3 (non-fragile ABI 2).

-fobjc-nonfragile-abi-version=<version>: Select the Objective-C non-fragile ABI version to use by default. This will only be used as the Objective-C ABI when the non-fragile ABI is enabled (either via -fobjc-nonfragile-abi, or because it is the platform default).

-fobjc-nonfragile-abi, -fno-objc-nonfragile-abi: Enable use of the Objective-C non-fragile ABI. On platforms for which this is the default ABI, it can be disabled with -fno-objc-nonfragile-abi.

Target Selection Options

Clang fully supports cross compilation as an inherent part of its design. Depending on how your version of Clang is configured, it may have support for a number of cross compilers, or may only support a native target.

-arch <architecture>: Specify the architecture to build for (Mac OS X specific).

-target <architecture>: Specify the architecture to build for (all platforms).

-mmacos-version-min=<version>: When building for macOS, specify the minimum version supported by your application.

-miphoneos-version-min: When building for iPhone OS, specify the minimum version supported by your application.

--print-supported-cpus: Print out a list of supported processors for the given target (specified through --target=<architecture> or -arch <architecture>). If no target is specified, the system default target will be used.

-mcpu=?, -mtune=?: Acts as an alias for --print-supported-cpus.

-mcpu=help, -mtune=help: Acts as an alias for --print-supported-cpus.

-march=<cpu>: Specify that Clang should generate code for a specific processor family member and later. For example, if you specify -march=i486, the compiler is allowed to generate instructions that are valid on i486 and later processors, but which may not exist on earlier ones.

--print-enabled-extensions: Prints the list of extensions that are enabled for the target specified by the combination of –target, -march, and -mcpu values. Currently, this option is only supported on AArch64 and RISC-V. On RISC-V, this option also prints out the ISA string of enabled extensions.

--print-supported-extensions: Prints the list of all extensions that are supported for every CPU target for an architecture (specified through --target=<architecture> or -arch <architecture>). If no target is specified, the system default target will be used. Currently, this option is only supported on AArch64 and RISC-V.

Code Generation Options

-O0, -O1, -O2, -O3, -Ofast, -Os, -Oz, -Og, -O, -O4: Specify which optimization level to use:

-O0 Means “no optimization”: this level compiles the fastest and generates the most debuggable code.

-O1 Somewhere between -O0 and -O2.

-O2 Moderate level of optimization which enables most optimizations.

-O3 Like -O2, except that it enables optimizations that take longer to perform or that may generate larger code (in an attempt to make the program run faster).

-Ofast Enables all the optimizations from -O3 along with other aggressive optimizations that may violate strict compliance with language standards. This is deprecated in Clang 19 and a warning is emitted that -O3 in combination with -ffast-math should be used instead if the request for non-standard math behavior is intended. There is no timeline yet for removal; the aim is to discourage use of -Ofast due to the surprising behavior of an optimization flag changing the observable behavior of correct code.

-Os Like -O2 with extra optimizations to reduce code size.

-Oz Like -Os (and thus -O2), but reduces code size further.

-Og Like -O1. In future versions, this option might disable different optimizations in order to improve debuggability.

-O Equivalent to -O1.

-O4 and higher

Currently equivalent to -O3

-g, -gline-tables-only, -gmodules: Control debug information output. Note that Clang debug information works best at -O0. When more than one option starting with -g is specified, the last one wins:

-g Generate debug information.

-gline-tables-only Generate only line table debug information. This allows for symbolicated backtraces with inlining information, but does not include any information about variables, their locations or types.

-gmodules Generate debug information that contains external references to types defined in Clang modules or precompiled headers instead of emitting redundant debug type information into every object file. This option transparently switches the Clang module format to object file containers that hold the Clang module together with the debug information. When compiling a program that uses Clang modules or precompiled headers, this option produces complete debug information with faster compile times and much smaller object files.

This option should not be used when building static libraries for distribution to other machines because the debug info will contain references to the module cache on the machine the object files in the library were built on.

-fstandalone-debug, -fno-standalone-debug

Clang supports a number of optimizations to reduce the size of debug information in the binary. They work based on the assumption that the debug type information can be spread out over multiple compilation units. For instance, Clang will not emit type definitions for types that are not needed by a module and could be replaced with a forward declaration. Further, Clang will only emit type info for a dynamic C++ class in the module that contains the vtable for the class.

The -fstandalone-debug option turns off these optimizations. This is useful when working with 3rd-party libraries that don’t come with debug information. This is the default on Darwin. Note that Clang will never emit type information for types that are not referenced at all by the program.

-feliminate-unused-debug-types: By default, Clang does not emit type information for types that are defined but not used in a program. To retain the debug info for these unused types, the negation -fno-eliminate-unused-debug-types can be used.

-fexceptions: Allow exceptions to be thrown through Clang compiled stack frames (on many targets, this will enable unwind information for functions that might have an exception thrown through them). For most targets, this is enabled by default for C++.

-ftrapv: Generate code to catch integer overflow errors. Signed integer overflow is undefined in C. With this flag, extra code is generated to detect this and abort when it happens.

-fvisibility: This flag sets the default visibility level.

-fcommon, -fno-common: This flag specifies that variables without initializers get common linkage. It can be disabled with -fno-common.

-ftls-model=<model>: Set the default thread-local storage (TLS) model to use for thread-local variables. Valid values are: “global-dynamic”, “local-dynamic”, “initial-exec” and “local-exec”. The default is “global-dynamic”. The default model can be overridden with the tls_model attribute. The compiler will try to choose a more efficient model if possible.

-flto, -flto=full, -flto=thin, -emit-llvm

Generate output files in LLVM formats, suitable for link time optimization. When used with -S this generates LLVM intermediate language assembly files, otherwise this generates LLVM bitcode format object files (which may be passed to the linker depending on the stage selection options).

The default for -flto is “full”, in which the LLVM bitcode is suitable for monolithic Link Time Optimization (LTO), where the linker merges all such modules into a single combined module for optimization. With “thin”, ThinLTO compilation is invoked instead.

Note

On Darwin, when using -flto along with -g and compiling and linking in separate steps, you also need to pass -Wl,-object_path_lto,<lto-filename>.o at the linking step to instruct the ld64 linker not to delete the temporary object file generated during Link Time Optimization (this flag is automatically passed to the linker by Clang if compilation and linking are done in a single step). This allows debugging the executable as well as generating the .dSYM bundle using dsymutil(1).

Driver Options

-###: Print (but do not run) the commands to run for this compilation.

--help: Display available options.

-Qunused-arguments: Do not emit any warnings for unused driver arguments.

-Wa,<args>: Pass the comma separated arguments in args to the assembler.

-Wl,<args>: Pass the comma separated arguments in args to the linker.

-Wp,<args>: Pass the comma separated arguments in args to the preprocessor.

-Xanalyzer <arg>: Pass arg to the static analyzer.

-Xassembler <arg>: Pass arg to the assembler.

-Xlinker <arg>: Pass arg to the linker.

-Xpreprocessor <arg>: Pass arg to the preprocessor.

-o <file>: Write output to file.

-print-file-name=<file>: Print the full library path of file.

-print-libgcc-file-name: Print the library path for the currently used compiler runtime library (“libgcc.a” or “libclang_rt.builtins.*.a”).

-print-prog-name=<name>: Print the full program path of name.

-print-search-dirs: Print the paths used for finding libraries and programs.

-save-temps: Save intermediate compilation results.

-save-stats, -save-stats=cwd, -save-stats=obj

Save internal code generation (LLVM) statistics to a file in the current directory (-save-stats or -save-stats=cwd) or in the directory of the output file (-save-stats=obj).

You can also use environment variables to control the statistics reporting. Setting CC_PRINT_INTERNAL_STAT to 1 enables the feature; the report goes to stdout in JSON format.

Setting CC_PRINT_INTERNAL_STAT_FILE to a file path makes the compiler write the statistics to the given file in JSON format.

Note that -save-stats takes precedence over CC_PRINT_INTERNAL_STAT and CC_PRINT_INTERNAL_STAT_FILE.

-integrated-as, -no-integrated-as: Used to enable and disable, respectively, the use of the integrated assembler. Whether the integrated assembler is on by default is target dependent.

-time: Time individual commands.

-ftime-report: Print timing summary of each stage of compilation.

-v: Show commands to run and use verbose output.

Diagnostics Options

-fshow-column, -fshow-source-location, -fcaret-diagnostics, -fdiagnostics-fixit-info, -fdiagnostics-parseable-fixits, -fdiagnostics-print-source-range-info, -fprint-source-range-info, -fdiagnostics-show-option, -fmessage-length: These options control how Clang prints out information about diagnostics (errors and warnings). Please see the Clang User’s Manual for more information.

Preprocessor Options

-D<macroname>=<value>: Adds an implicit #define into the predefines buffer which is read before the source file is preprocessed.

-U<macroname>: Adds an implicit #undef into the predefines buffer which is read before the source file is preprocessed.

-include <filename>: Adds an implicit #include into the predefines buffer which is read before the source file is preprocessed.

-I<directory>: Add the specified directory to the search path for include files.

-F<directory>: Add the specified directory to the search path for framework include files.

-nostdinc: Do not search the standard system directories or compiler builtin directories for include files.

-nostdlibinc: Do not search the standard system directories for include files, but do search compiler builtin include directories.

-nobuiltininc: Do not search clang’s builtin directory for include files.

-nostdinc++: Do not search the system C++ standard library directory for include files.

-fkeep-system-includes

Usable only with -E. Do not copy the preprocessed content of “system” headers to the output; instead, preserve the #include directive. This can greatly reduce the volume of text produced by -E, which can be helpful when trying to produce a “small” reproducible test case.

This option does not guarantee reproducibility, however. If the including source defines preprocessor symbols that influence the behavior of system headers (for example, _XOPEN_SOURCE), the operation of -E will remove that definition and thus can change the semantics of the included header. Also, using a different version of the system headers (especially a different version of the STL) may result in different behavior. Always verify the preprocessed file by compiling it separately.

ENVIRONMENT

MALLOC_MMAP_MAX_: Specifies the maximum number of memory chunks to allocate with mmap. When -fcray-mallopt (default) is used, the compiler changes this from the glibc default to 0. For most HPC programs, runtime performance is improved by this setting, but more memory may be consumed. The default glibc behavior can be restored by linking with -fno-cray-mallopt or setting CRAY_MALLOPT_OFF at runtime. A custom setting of MALLOC_MMAP_MAX_ (see mallopt(3) for details) will also override this HPE Cray change.

MALLOC_TRIM_THRESHOLD_: Specifies the minimum size of the unused memory region at the top of the heap before the region is returned to the operating system. When -fcray-mallopt (default) is used, the compiler changes this from the glibc default to 536870912 bytes. For most HPC programs, runtime performance is improved by this setting, but more memory may be consumed. The default glibc behavior can be restored by linking with -fno-cray-mallopt or setting CRAY_MALLOPT_OFF at runtime. A custom setting of MALLOC_TRIM_THRESHOLD_ (see mallopt(3) for details) will also override this HPE Cray change.

TMPDIR, TEMP, TMP: These environment variables are checked, in order, for the location to write temporary files used during the compilation process.

CPATH

If this environment variable is present, it is treated as a delimited list of paths to be added to the default system include path list. The delimiter is the platform-dependent delimiter, as used in the PATH environment variable.

Empty components in the environment variable are ignored.
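The handling of empty components can be illustrated with a small sketch. The helper below is hypothetical and is not the compiler's own parser; it simply counts the non-empty entries of a delimited list, so a value such as "/a::/b:" yields two paths.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts the non-empty components of a
   delimiter-separated path list, skipping empty components.
   Assumption: delimiter is not the NUL character. */
size_t count_path_components(const char *list, char delimiter) {
    size_t count = 0;
    const char *start = list;
    while (*start != '\0') {
        const char *end = strchr(start, delimiter);
        size_t length = (end != NULL) ? (size_t)(end - start) : strlen(start);
        if (length > 0)
            count++;            /* a real path component */
        if (end == NULL)
            break;              /* last component reached */
        start = end + 1;        /* step past the delimiter */
    }
    return count;
}
```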

C_INCLUDE_PATH, OBJC_INCLUDE_PATH, CPLUS_INCLUDE_PATH, OBJCPLUS_INCLUDE_PATH: These environment variables specify additional paths, as for CPATH, which are only used when processing the appropriate language.

MACOSX_DEPLOYMENT_TARGET: If -mmacos-version-min is unspecified, the default deployment target is read from this environment variable. This option only affects Darwin targets.

BUGS

Please report bugs to HPE Cray. Most bug reports should include preprocessed source files (use the -E option) and the full output of the compiler, along with the information needed to reproduce the problem.