Cray LibSci

Description

Cray LibSci is a collection of numerical routines tuned for performance on the CPUs of Cray systems. Most codes obtain better performance by calling Cray LibSci routines than by using public-domain or user-written versions.

Note: Cray EX series systems can also use the Cray LibSci Accelerator routines for enhanced performance on AMD and Nvidia GPU compute nodes. For more information, see Cray LibSci_ACC.

Most Cray LibSci components contain both single‐processor and parallel routines explicitly optimized to make the best use of Cray systems and interconnect architectures. The general components of Cray LibSci are:

  • BLAS (Basic Linear Algebra Subroutines)

  • CBLAS (C interface to the legacy BLAS)

  • BLACS (Basic Linear Algebra Communication Subprograms)

  • LAPACK (Linear Algebra routines)

  • ScaLAPACK (parallel Linear Algebra routines)

  • CrayBLAS (BLAS routines highly optimized for Cray systems)
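
For illustration, the sketch below calls the CBLAS interface from C. It is a minimal example only; it assumes the standard cblas.h header and that the code is built with a craype compiler wrapper (for example, cc) while the cray-libsci module is loaded, in which case the LibSci link line is normally added automatically.

    /* Minimal sketch: C = alpha*A*B + beta*C through the CBLAS interface.
     * Assumes the standard cblas.h header provided with Cray LibSci. */
    #include <stdio.h>
    #include <cblas.h>

    int main(void) {
        /* 2x2 matrices in row-major storage. */
        double A[4] = {1.0, 2.0, 3.0, 4.0};
        double B[4] = {5.0, 6.0, 7.0, 8.0};
        double C[4] = {0.0, 0.0, 0.0, 0.0};

        cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                    2, 2, 2,           /* M, N, K        */
                    1.0, A, 2,         /* alpha, A, lda  */
                    B, 2,              /* B, ldb         */
                    0.0, C, 2);        /* beta, C, ldc   */

        printf("C = [%g %g; %g %g]\n", C[0], C[1], C[2], C[3]);
        return 0;
    }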

Serial and Parallel Libraries

Cray LibSci includes both serial (non-threaded) and parallel multi-threaded (OpenMP) versions of its libraries. In addition, the libraries are structured so that routines that must be linked with an MPI library (ScaLAPACK) are separate from those that do not require MPI linking (BLAS and LAPACK).

This table shows the standard set of libraries included in Cray LibSci, where comp is one of the supported compilers: cray, gnu, amd, aocc, intel, and nvidia.

Library Name            Link Type        Threading                           MPI
libsci_comp.a           Static           Serial non-threaded                 non-MPI
libsci_comp.so          Dynamic Shared   Serial non-threaded                 non-MPI
libsci_comp_mp.a        Static           Parallel multi-threaded (OpenMP)    non-MPI
libsci_comp_mp.so       Dynamic Shared   Parallel multi-threaded (OpenMP)    non-MPI
libsci_comp_mpi.a       Static           Serial non-threaded                 MPI
libsci_comp_mpi.so      Dynamic Shared   Serial non-threaded                 MPI
libsci_comp_mpi_mp.a    Static           Parallel multi-threaded (OpenMP)    MPI
libsci_comp_mpi_mp.so   Dynamic Shared   Parallel multi-threaded (OpenMP)    MPI

If the parallel multi‐threaded (OpenMP) version is linked into an application, the library uses the value of OMP_NUM_THREADS to determine the maximum number of threads to use.
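
As a quick run-time check, a hedged sketch like the one below reports the thread budget that a program linked against the multi-threaded (_mp) libraries would see. It assumes an OpenMP-capable build (see the Linking section for the per-compiler flag) and that OMP_NUM_THREADS is exported before launch.

    /* Hedged sketch: report the OpenMP thread budget visible to the process.
     * The multi-threaded LibSci uses at most this many threads per call. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <omp.h>

    int main(void) {
        const char *env = getenv("OMP_NUM_THREADS");
        printf("OMP_NUM_THREADS       = %s\n", env ? env : "(not set)");
        printf("omp_get_max_threads() = %d\n", omp_get_max_threads());
        return 0;
    }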

Linking

Link line generation is controlled by craype; it is compiler-specific and based on user-specified options. The following table shows the default link behavior for each supported compiler and the option that switches to the OpenMP-threaded libraries.

Compiler   Default Link               To Switch to OpenMP-Threaded Libs
CCE        serial non-threaded libs   -homp (Fortran) or -fopenmp (C/C++)
GNU        serial non-threaded libs   -fopenmp
AMD        serial non-threaded libs   -fopenmp
AOCC       serial non-threaded libs   -fopenmp
Intel      serial non-threaded libs   -qopenmp
Nvidia     serial non-threaded libs   -mp
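
As an illustration only, the sketch below is a small C file whose OpenMP directive requires the flag from the table; per the table, the same flag is what switches craype to the threaded (_mp) LibSci libraries. The wrapper invocations in the comments are assumptions that depend on the loaded PrgEnv and cray-libsci modules.

    /* Illustrative build lines (assumed; adjust to your PrgEnv):
     *   cc -fopenmp example.c     CCE, GNU, AMD, or AOCC C compiler
     *   cc -qopenmp example.c     Intel
     *   cc -mp example.c          Nvidia
     * With the flag present, craype links the OpenMP-threaded LibSci libs. */
    #include <stdio.h>
    #include <cblas.h>

    int main(void) {
        enum { N = 1000 };
        double x[N], y[N];

        /* User-level OpenMP work can coexist with the threaded LibSci. */
        #pragma omp parallel for
        for (int i = 0; i < N; i++) {
            x[i] = 1.0;
            y[i] = 2.0;
        }

        cblas_daxpy(N, 3.0, x, 1, y, 1);   /* y := 3*x + y */
        printf("y[0] = %g\n", y[0]);
        return 0;
    }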

Overriding CPU Architecture Selection

By default, Cray LibSci automatically detects the CPU architecture and selects the optimal LibSci kernels for that CPU. However, the user can override this behavior at run time by setting the environment variable LIBSCI_ARCH_OVERRIDE.
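
The variable is normally exported in the shell or batch script before launch (for example, export LIBSCI_ARCH_OVERRIDE=rome). The sketch below is a hypothetical in-process alternative; whether it takes effect depends on when LibSci reads the variable, so setting it in the launch environment remains the documented mechanism.

    /* Hypothetical sketch: request the "rome" kernels before any LibSci call.
     * Assumption: LibSci reads LIBSCI_ARCH_OVERRIDE lazily; if it is read at
     * load time, set the variable in the launch environment instead. */
    #define _POSIX_C_SOURCE 200112L
    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        if (setenv("LIBSCI_ARCH_OVERRIDE", "rome", 1) != 0) {
            perror("setenv");
            return 1;
        }
        printf("LIBSCI_ARCH_OVERRIDE=%s\n", getenv("LIBSCI_ARCH_OVERRIDE"));
        /* ... subsequent Cray LibSci calls ... */
        return 0;
    }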

Stack Size and Segmentation Faults

Cray LibSci allocates internal buffers on the stack and therefore expects an unlimited stack size. If your application segfaults when linked against Cray LibSci, try setting the stack size from the command line with the ulimit -s unlimited command. If this is not possible, set the environment variable CRAYBLAS_ALLOC_TYPE to 2 on Cray EX and XD platforms.
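
Where the launch environment cannot be changed, a hedged programmatic fallback is to raise the stack soft limit toward the hard limit at the start of main, as sketched below. This is an assumption about typical Linux behavior (the main-thread stack may grow up to the soft limit) and does not replace ulimit -s unlimited or CRAYBLAS_ALLOC_TYPE=2.

    /* Hedged sketch: raise the stack soft limit to the hard limit at startup.
     * On typical Linux systems the main-thread stack can grow up to the soft
     * limit; "unlimited" still requires an unlimited hard limit. */
    #include <stdio.h>
    #include <sys/resource.h>

    int main(void) {
        struct rlimit rl;
        if (getrlimit(RLIMIT_STACK, &rl) == 0) {
            rl.rlim_cur = rl.rlim_max;              /* soft limit -> hard limit */
            if (setrlimit(RLIMIT_STACK, &rl) != 0)
                perror("setrlimit(RLIMIT_STACK)");
        }
        /* ... Cray LibSci calls follow ... */
        return 0;
    }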

Environment Variables

LIBSCI_ARCH_OVERRIDE

    If set, overrides automatic CPU architecture detection and
    forces use of the specified Cray LibSci kernels. The valid
    values for x86_64 are: haswell, broadwell, naples, rome,
    milan, trento, skylake, cascadelake, icelake, sapphirerapids,
    genoa, bergamo, mi300a, and turin. The valid values for
    aarch64 are: thunderx2, tx2, and grace.

    Default: not set (automatic CPU detection enabled)

See Also