Cray LibSci

Description

Cray LibSci is a collection of numerical routines tuned for performance on the CPUs of Cray systems. Most codes obtain better performance by calling Cray LibSci routines than by using public-domain or user-written versions.

Note: Cray EX series systems can also use the Cray LibSci Accelerator routines for enhanced performance on AMD and Nvidia GPU compute nodes. For more information, see Cray LibSci_ACC.

Most Cray LibSci components contain both single‐processor and parallel routines explicitly optimized to make the best use of Cray systems and interconnect architectures. The general components of Cray LibSci are:

  • BLAS (Basic Linear Algebra Subroutines)

  • CBLAS (C interface to the legacy BLAS)

  • BLACS (Basic Linear Algebra Communication Subprograms)

  • LAPACK (Linear Algebra routines)

  • ScaLAPACK (parallel Linear Algebra routines)

  • CrayBLAS (BLAS routines highly optimized for Cray systems)
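
For illustration, the sketch below calls the CBLAS interface from C. It is a minimal example only; it assumes the standard cblas.h header and that the code is built with a craype compiler wrapper (for example, cc) while the cray-libsci module is loaded, in which case the LibSci link line is normally added automatically.

    /* Minimal sketch: C = alpha*A*B + beta*C through the CBLAS interface.
     * Assumes the standard cblas.h header provided with Cray LibSci. */
    #include <stdio.h>
    #include <cblas.h>

    int main(void) {
        /* 2x2 matrices in row-major storage. */
        double A[4] = {1.0, 2.0, 3.0, 4.0};
        double B[4] = {5.0, 6.0, 7.0, 8.0};
        double C[4] = {0.0, 0.0, 0.0, 0.0};

        cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                    2, 2, 2,           /* M, N, K        */
                    1.0, A, 2,         /* alpha, A, lda  */
                    B, 2,              /* B, ldb         */
                    0.0, C, 2);        /* beta, C, ldc   */

        printf("C = [%g %g; %g %g]\n", C[0], C[1], C[2], C[3]);
        return 0;
    }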

Serial and Parallel Libraries

Cray LibSci includes both serial (non-threaded) and parallel multi-threaded (OpenMP) versions of its libraries. In addition, the libraries are structured so that routines that must be linked with an MPI library (ScaLAPACK) are separate from those that do not require MPI linking (BLAS and LAPACK).

This table shows the standard set of libraries included in Cray LibSci, where comp is one of the supported compilers: cray, gnu, amd, aocc, intel, and nvidia.

Library Name            Link Type        Threading                           MPI
libsci_comp.a           Static           Serial non-threaded                 non-MPI
libsci_comp.so          Dynamic Shared   Serial non-threaded                 non-MPI
libsci_comp_mp.a        Static           Parallel multi-threaded (OpenMP)    non-MPI
libsci_comp_mp.so       Dynamic Shared   Parallel multi-threaded (OpenMP)    non-MPI
libsci_comp_mpi.a       Static           Serial non-threaded                 MPI
libsci_comp_mpi.so      Dynamic Shared   Serial non-threaded                 MPI
libsci_comp_mpi_mp.a    Static           Parallel multi-threaded (OpenMP)    MPI
libsci_comp_mpi_mp.so   Dynamic Shared   Parallel multi-threaded (OpenMP)    MPI

If the parallel multi‐threaded (OpenMP) version is linked into an application, the library uses the value of OMP_NUM_THREADS to determine the maximum number of threads to use.
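
As a quick run-time check, a hedged sketch like the one below reports the thread budget that a program linked against the multi-threaded (_mp) libraries would see. It assumes an OpenMP-capable build (see the Linking section for the per-compiler flag) and that OMP_NUM_THREADS is exported before launch.

    /* Hedged sketch: report the OpenMP thread budget visible to the process.
     * The multi-threaded LibSci uses at most this many threads per call. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <omp.h>

    int main(void) {
        const char *env = getenv("OMP_NUM_THREADS");
        printf("OMP_NUM_THREADS       = %s\n", env ? env : "(not set)");
        printf("omp_get_max_threads() = %d\n", omp_get_max_threads());
        return 0;
    }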

Linking

Link line generation is controlled by craype; it is compiler-specific and based on user-specified options. The following table shows the default link behavior for each supported compiler and the option that switches to the OpenMP-threaded libraries.

Compiler   Default Link               To Switch to OpenMP-Threaded Libs
CCE        serial non-threaded libs   -homp (Fortran) or -fopenmp (C/C++)
GNU        serial non-threaded libs   -fopenmp
AMD        serial non-threaded libs   -fopenmp
AOCC       serial non-threaded libs   -fopenmp
Intel      serial non-threaded libs   -qopenmp
Nvidia     serial non-threaded libs   -mp
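
As an illustration only, the sketch below is a small C file whose OpenMP directive requires the flag from the table; per the table, the same flag is what switches craype to the threaded (_mp) LibSci libraries. The wrapper invocations in the comments are assumptions that depend on the loaded PrgEnv and cray-libsci modules.

    /* Illustrative build lines (assumed; adjust to your PrgEnv):
     *   cc -fopenmp example.c     CCE, GNU, AMD, or AOCC C compiler
     *   cc -qopenmp example.c     Intel
     *   cc -mp example.c          Nvidia
     * With the flag present, craype links the OpenMP-threaded LibSci libs. */
    #include <stdio.h>
    #include <cblas.h>

    int main(void) {
        enum { N = 1000 };
        double x[N], y[N];

        /* User-level OpenMP work can coexist with the threaded LibSci. */
        #pragma omp parallel for
        for (int i = 0; i < N; i++) {
            x[i] = 1.0;
            y[i] = 2.0;
        }

        cblas_daxpy(N, 3.0, x, 1, y, 1);   /* y := 3*x + y */
        printf("y[0] = %g\n", y[0]);
        return 0;
    }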

Overriding CPU Architecture Selection

By default, Cray LibSci automatically detects the CPU architecture and selects the optimal LibSci kernels for that CPU. However, the user can override this behavior at run time by setting the environment variable LIBSCI_ARCH_OVERRIDE.
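
The variable is normally exported in the shell or batch script before launch (for example, export LIBSCI_ARCH_OVERRIDE=rome). The sketch below is a hypothetical in-process alternative; whether it takes effect depends on when LibSci reads the variable, so setting it in the launch environment remains the documented mechanism.

    /* Hypothetical sketch: request the "rome" kernels before any LibSci call.
     * Assumption: LibSci reads LIBSCI_ARCH_OVERRIDE lazily; if it is read at
     * load time, set the variable in the launch environment instead. */
    #define _POSIX_C_SOURCE 200112L
    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        if (setenv("LIBSCI_ARCH_OVERRIDE", "rome", 1) != 0) {
            perror("setenv");
            return 1;
        }
        printf("LIBSCI_ARCH_OVERRIDE=%s\n", getenv("LIBSCI_ARCH_OVERRIDE"));
        /* ... subsequent Cray LibSci calls ... */
        return 0;
    }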

Stack Size and Segmentation Faults

Cray LibSci allocates internal buffers on the stack and therefore expects an unlimited stack size. If your application segfaults when linked against Cray LibSci, try setting the stack size from the command line with the ulimit -s unlimited command. If this is not possible, set the environment variable CRAYBLAS_ALLOC_TYPE to 2 on Cray EX and XD platforms.
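
Where the launch environment cannot be changed, a hedged programmatic fallback is to raise the stack soft limit toward the hard limit at the start of main, as sketched below. This is an assumption about typical Linux behavior (the main-thread stack may grow up to the soft limit) and does not replace ulimit -s unlimited or CRAYBLAS_ALLOC_TYPE=2.

    /* Hedged sketch: raise the stack soft limit to the hard limit at startup.
     * On typical Linux systems the main-thread stack can grow up to the soft
     * limit; "unlimited" still requires an unlimited hard limit. */
    #include <stdio.h>
    #include <sys/resource.h>

    int main(void) {
        struct rlimit rl;
        if (getrlimit(RLIMIT_STACK, &rl) == 0) {
            rl.rlim_cur = rl.rlim_max;              /* soft limit -> hard limit */
            if (setrlimit(RLIMIT_STACK, &rl) != 0)
                perror("setrlimit(RLIMIT_STACK)");
        }
        /* ... Cray LibSci calls follow ... */
        return 0;
    }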

Environment Variables

LIBSCI_ARCH_OVERRIDE

    If set, overrides automatic CPU architecture detection and
    forces use of the specified Cray LibSci kernels. The valid
    values for x86_64 are: haswell, broadwell, naples, rome,
    milan, trento, skylake, cascadelake, icelake, sapphirerapids,
    genoa, bergamo, mi300a, and turin. The valid values for
    aarch64 are: thunderx2, tx2, and grace.

    Default: not set (automatic CPU detection enabled)

See Also