Cray LibSci
Description
Cray LibSci is a collection of numerical routines tuned for CPU performance on Cray systems. For most codes, most users will obtain better performance by calling Cray LibSci routines from their applications instead of public-domain or user-written versions.
Note: Cray EX series systems can also use the Cray LibSci Accelerator routines for enhanced performance on AMD and Nvidia GPU compute nodes. For more information, see Cray LibSci_ACC.
Most Cray LibSci components contain both single‐processor and parallel routines explicitly optimized to make the best use of Cray systems and interconnect architectures. The general components of Cray LibSci are:
- BLAS (Basic Linear Algebra Subroutines)
- CBLAS (C interface to the legacy BLAS)
- BLACS (Basic Linear Algebra Communication Subprograms)
- LAPACK (Linear Algebra routines)
- ScaLAPACK (parallel Linear Algebra routines)
- CrayBLAS (a library of BLAS routines highly optimized for Cray systems)
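For example, the following sketch (an illustration, not taken from the Cray LibSci documentation) calls the standard CBLAS interface; when it is built with the craype compiler wrappers while the cray-libsci module is loaded, the dgemm call resolves to the tuned LibSci/CrayBLAS implementation rather than a reference BLAS. The header name and build command are assumptions about a typical installation.

```c
/* Minimal CBLAS sketch: C = A * B for two 2x2 row-major matrices.
 * Assumptions: the cblas.h header shipped with the installed LibSci,
 * and a build via the craype wrapper (e.g. `cc dgemm_demo.c`), which
 * links Cray LibSci automatically when the cray-libsci module is loaded. */
#include <stdio.h>
#include <cblas.h>

int main(void) {
    double A[4] = {1.0, 2.0, 3.0, 4.0};
    double B[4] = {5.0, 6.0, 7.0, 8.0};
    double C[4] = {0.0};

    /* C = 1.0 * A * B + 0.0 * C */
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                2, 2, 2, 1.0, A, 2, B, 2, 0.0, C, 2);

    printf("C = [%g %g; %g %g]\n", C[0], C[1], C[2], C[3]);
    return 0;
}
```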
Serial and Parallel Libraries
Cray LibSci includes both serial (non-threaded) and parallel (OpenMP multi-threaded) versions of its libraries. In addition, the libraries are structured so that routines that must be linked with an MPI library (ScaLAPACK) are kept separate from those that do not require MPI linking (BLAS and LAPACK).
The following table shows the standard set of libraries included in Cray LibSci, where comp is one of the supported compilers: cray, gnu, amd, aocc, intel, or nvidia.
| Library Name | Link Type | Threading | MPI |
|---|---|---|---|
| libsci_comp.a | Static | Serial non-threaded | non-MPI |
| libsci_comp.so | Dynamic Shared | Serial non-threaded | non-MPI |
| libsci_comp_mp.a | Static | Parallel multi-threaded (OpenMP) | non-MPI |
| libsci_comp_mp.so | Dynamic Shared | Parallel multi-threaded (OpenMP) | non-MPI |
| libsci_comp_mpi.a | Static | Serial non-threaded | MPI |
| libsci_comp_mpi.so | Dynamic Shared | Serial non-threaded | MPI |
| libsci_comp_mpi_mp.a | Static | Parallel multi-threaded (OpenMP) | MPI |
| libsci_comp_mpi_mp.so | Dynamic Shared | Parallel multi-threaded (OpenMP) | MPI |
If the parallel multi-threaded (OpenMP) version is linked into an application, the library uses the value of OMP_NUM_THREADS to determine the maximum number of threads to use.
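As a rough illustration (assuming an OpenMP-enabled build that links one of the _mp libraries), the same OMP_NUM_THREADS limit the library observes is also visible to the application through the OpenMP runtime:

```c
/* Sketch: report the thread cap that an OpenMP-enabled build sees.
 * When one of the _mp LibSci libraries is linked, OMP_NUM_THREADS
 * bounds the number of threads the library may use as well. */
#include <stdio.h>
#include <omp.h>

int main(void) {
    printf("Maximum OpenMP threads: %d\n", omp_get_max_threads());
    return 0;
}
```

Running this with OMP_NUM_THREADS=8 in the environment, for example, should report a maximum of 8 threads.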
Linking
Link line generation is controlled by craype. It is compiler‐specific and based on user‐specified options.
| Compiler | Default Link | To Switch OpenMP Threaded Libs |
|---|---|---|
| CCE | serial non-threaded libs | |
| GNU | serial non-threaded libs | |
| AMD | serial non-threaded libs | |
| AOCC | serial non-threaded libs | |
| Intel | serial non-threaded libs | |
| Nvidia | serial non-threaded libs | |
Overriding CPU Architecture Selection
By default, Cray LibSci automatically detects the CPU architecture and selects the optimal LibSci kernels for that CPU. However, the user can override this behavior at run time by setting the environment variable LIBSCI_ARCH_OVERRIDE.
Stack Size and Segmentation Faults
Cray LibSci allocates internal buffers on the stack and therefore expects an unlimited stack size. If your application segfaults when linked to Cray LibSci, try setting the stack size from the command line using the ulimit -s unlimited command. If this is not possible, set the environment variable CRAYBLAS_ALLOC_TYPE to 2 on Cray EX and XD platforms.
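Where changing the shell limit is not practical, a process can also raise its own soft stack limit at startup; the sketch below is an illustration, not part of Cray LibSci, and simply raises the soft limit to the hard limit before any LibSci calls are made.

```c
/* Sketch: raise this process's soft stack limit to the hard limit.
 * This mirrors the effect of `ulimit -s` for the current process only
 * and must run before any stack-hungry LibSci routines are called. */
#include <stdio.h>
#include <sys/resource.h>

int main(void) {
    struct rlimit rl;

    if (getrlimit(RLIMIT_STACK, &rl) == 0) {
        rl.rlim_cur = rl.rlim_max;              /* soft limit -> hard limit */
        if (setrlimit(RLIMIT_STACK, &rl) != 0)
            perror("setrlimit(RLIMIT_STACK)");
    }

    /* ... LibSci calls go here ... */
    return 0;
}
```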
Environment Variables
LIBSCI_ARCH_OVERRIDE
If set, overrides automatic CPU architecture detection and
forces use of the specified Cray LibSci kernels. The valid
values for x86_64 are: haswell, broadwell, naples, rome,
milan, trento, skylake, cascadelake, icelake, sapphirerapids,
genoa, bergamo, mi300a, and turin. The valid values for
aarch64 are: thunderx2, tx2, and grace.
Default: not set (automatic CPU detection enabled)
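A quick way to confirm what the library will see at run time is to print the relevant environment variables from the application itself; the following is a hypothetical diagnostic sketch, not part of Cray LibSci.

```c
/* Hypothetical diagnostic: print the LibSci-related environment
 * variables as seen by the running process. */
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    const char *vars[] = {
        "LIBSCI_ARCH_OVERRIDE",  /* CPU kernel selection override   */
        "CRAYBLAS_ALLOC_TYPE",   /* buffer allocation strategy      */
        "OMP_NUM_THREADS"        /* thread cap for the _mp variants */
    };

    for (size_t i = 0; i < sizeof vars / sizeof vars[0]; ++i) {
        const char *val = getenv(vars[i]);
        printf("%-22s = %s\n", vars[i], val ? val : "(not set)");
    }
    return 0;
}
```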