ScaLAPACK

Description

The ScaLAPACK library contains routines for solving real or complex general, triangular, or positive definite distributed systems. It also contains routines for reducing distributed matrices to condensed form and an eigenvalue problem solver for real symmetric distributed matrices. In addition, it includes the PBLAS, a set of routines that perform basic operations on distributed matrices and vectors.

For more information about the ScaLAPACK library, see the ScaLAPACK web site http://www.netlib.org/scalapack/.

Changes from Public Domain Version

ScaLAPACK is a software package provided by Univ. of Tennessee; Univ. of California, Berkeley; Univ. of Colorado, Denver. A version of the package is available in the public domain on the World Wide Web at the following URL http://www.netlib.org/.

The calling sequences to all ScaLAPACK routines remain unchanged.

Initialization

Some ScaLAPACK routines require the Basic Linear Algebra Communication Subprograms (BLACS) to be initialized, which can be done through a call to BLACS_GRIDINIT. In addition, each distributed array that is passed as an argument to a ScaLAPACK routine requires a descriptor, which is set through a call to DESCINIT. If a call is required, it is documented on the man page for the routine.

Environment Variables

Environment Variables for PBLAS Routines

LIBSCI_ALT_PGEMM

    If set, uses an alternate, experimental algorithm for the
    pdgemm and pzgemm routines that may perform better for
    select use cases. Users should independently verify whether
    the alternate algorithm is faster than the default
    algorithm for their application.

    Default: not set

LIBSCI_ALT_SCALA

    If set, uses an alternate set of ScaLAPACK communication
    parameters at runtime that may yield better performance for
    select applications and use cases.

    Default: not set
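
One way to check whether an alternate PBLAS setting helps is to run the same job once without and once with the variable. The following is a minimal sketch for a POSIX shell, not a prescribed procedure: run_app is a hypothetical stand-in for the real application launch line, and the value 1 is an assumption, since the description above only requires the variable to be set.

```shell
# Hypothetical stand-in for the real application launch; it only reports
# which PBLAS variables the job would see. Replace with your own launch line.
run_app() {
    echo "LIBSCI_ALT_PGEMM=${LIBSCI_ALT_PGEMM:-unset} LIBSCI_ALT_SCALA=${LIBSCI_ALT_SCALA:-unset}"
}

# Baseline run: make sure neither variable is set.
unset LIBSCI_ALT_PGEMM LIBSCI_ALT_SCALA
baseline=$(run_app)

# Alternate run: the export happens inside the command substitution's
# subshell, so it does not leak into the rest of the session.
alt=$( export LIBSCI_ALT_PGEMM=1; run_app )

echo "baseline: ${baseline}"
echo "alt:      ${alt}"
```

Scoping the variable to a single run keeps the baseline and alternate measurements from contaminating each other.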

Environment Variables for Eigensolvers

LIBSCI_OPT_PDSYEV

    If set, ScaLAPACK replaces the default pdsyev algorithm
    (real symmetric eigensolver) with a divide‐and‐conquer
    algorithm that is optimized for the case where both
    eigenvectors and eigenvalues are calculated. This
    optimization temporarily allocates additional scratch
    memory internally. It does not require users to change
    their existing pdsyev calls.

    Default: not set

LIBSCI_OPT_PZHEEV

    If set, ScaLAPACK replaces the default pzheev algorithm
    (complex Hermitian eigensolver) with a divide‐and‐conquer
    algorithm that is optimized for the case where both
    eigenvectors and eigenvalues are calculated. This
    optimization temporarily allocates additional scratch
    memory internally. It does not require users to change
    their existing pzheev calls.

    Default: not set

LIBSCI_OPT_PDSYEVD

    If set, ScaLAPACK replaces the default pdsyevd algorithm
    (real symmetric eigensolver) with an MRRR algorithm that
    may provide added performance. Users are required to
    perform a workspace query with the environment variable set
    at runtime so that appropriately sized workspaces are
    passed to the function call.

    Default: not set

LIBSCI_USE_ELPA

    ELPA is not provided with cray‐libsci. However, if this
    environment variable is set to 1, an external, user‐supplied
    version of ELPA can be used as a backend for the following
    ScaLAPACK eigenvalue solver routines in cray‐libsci: pdsyev,
    pdsyevd, pdsyevr, pzheev, pzheevd, pzheevr. If this
    environment variable is set, the external ELPA library must
    be provided during application linking via the
    PE_SCI_EXT_LIBPATH and PE_SCI_EXT_LIBNAME environment
    variables. The backend uses the ELPA API associated with
    ELPA release 2016.05.004. ELPA releases with different APIs
    are not supported. Additionally, dynamic linking is not
    supported on Cray XC systems with this variable. For Cray CS
    systems, LD_LIBRARY_PATH or the ld.so.cache must be modified
    to reflect the location of the library given in
    PE_SCI_EXT_LIBPATH.

    For building ELPA with the Cray Programming Environment,
    refer to ${CRAY_LIBSCI_DIR}/etc/README.

    Default: not set

PE_SCI_EXT_LIBPATH

    If set, the value is inserted into the linker's path during
    linking.
    e.g.: export PE_SCI_EXT_LIBPATH="-L/path/to/elpa/lib"

    Default: not set

PE_SCI_EXT_LIBNAME

    If set, the value is passed to the linker during the linking
    step.
    e.g.: export PE_SCI_EXT_LIBNAME="-Wl,--undefined=elpa_get_communicators\ -lelpa_openmp"

    Note: If using the LIBSCI_USE_ELPA variable, the
    "-Wl,--undefined=elpa_get_communicators\ " option is
    necessary. Otherwise, the ELPA library may not get included
    in the compiled application binary.
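
The three ELPA-related variables are meant to be used together, so a combined setup may be easier to follow than the individual examples. The sketch below assumes a POSIX shell and an ELPA build that provides libelpa_openmp; ELPA_LIB_DIR is a hypothetical helper variable introduced only for this example, and its path is a placeholder.

```shell
# Hypothetical ELPA install location; replace with the real path.
ELPA_LIB_DIR="/path/to/elpa/lib"

# Request the external ELPA backend for the supported eigensolvers.
export LIBSCI_USE_ELPA=1

# Library search path and library name passed to the linker.
# The --undefined option keeps the ELPA library from being dropped at link time.
export PE_SCI_EXT_LIBPATH="-L${ELPA_LIB_DIR}"
export PE_SCI_EXT_LIBNAME="-Wl,--undefined=elpa_get_communicators\ -lelpa_openmp"

# Print the settings that will be in effect for the link step.
printf '%s\n' "LIBSCI_USE_ELPA=${LIBSCI_USE_ELPA}" \
              "PE_SCI_EXT_LIBPATH=${PE_SCI_EXT_LIBPATH}" \
              "PE_SCI_EXT_LIBNAME=${PE_SCI_EXT_LIBNAME}"
```

These settings need to be in place before the application is linked, because the external library is supplied at link time rather than at run time.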

Environment Variables for SVD Routines

LIBSCI_OPT_PDGESVD

    If set, uses the alternate polar‐decomposition algorithm
    pdgeqsvd to complete the SVD computation. This algorithm has
    shown improved performance relative to PDGESVD, specifically
    with ill‐conditioned matrices. This optimization temporarily
    allocates additional scratch memory internally. It does not
    require users to change their existing PDGESVD calls. Note
    that the JOBVT and JOBU arguments are ignored if this
    environment variable is set, and the left and right singular
    vectors are always returned. Additionally, the VT matrix is
    transposed, which is not done for direct calls to pdgeqsvd.
    This is done to comply with the PDGESVD documentation. For
    more information, see intro_qdwh(3) and pdgeqsvd(3).

Overview of Environment Variables for LU and Cholesky Routines

The LU and Cholesky routines in this version of ScaLAPACK have been modified to allow the user to choose a broadcast algorithm before or during program execution. The default value of the BLACS broadcast has been changed from S‐Ring to I‐Ring to take advantage of the fastest BLACS broadcast algorithm.

Note: Testing has shown performance improvements of up to 30% for LU routines and 10% for Cholesky routines when using the new default broadcast value. The performance improvement is highly dependent on the shape and size of the grid. The best performance numbers have been seen with rectangular grids, i.e., Q=2P or Q=4P. In some cases, performance improvements of up to 600% have been seen.

The following environment variables enable the user to specify the broadcast algorithm before beginning program execution. These environment variables can also be set during program execution by using the associated helper routines.

Environment Variables for LU Routines

SCALAPACK_LU_RBCAST

    Defines the type of row‐wise broadcast topology used for
    P[C,S,D,Z]GETRF routines.

    Default: IRING

    Other routines affected: P[C,S,D,Z]GESV and P[C,S,D,Z]GESVX

SCALAPACK_LU_CBCAST

    Defines the type of column‐wise broadcast topology used for
    P[C,S,D,Z]GETRF routines.

    Default: MPI

    Other routines affected: P[C,S,D,Z]GESV and P[C,S,D,Z]GESVX

Environment Variables for Cholesky Routines

SCALAPACK_LLT_RBCAST

    Defines the type of row‐wise broadcast topology used for a
    lower triangular matrix for P[C,S,D,Z]POTRF routines.

    Default: IRING

    Other routines affected: P[C,S,D,Z]POSV, P[C,S,D,Z]POSVX,
    P[S,D]SYGVX, and P[C,Z]HEGVX

SCALAPACK_LLT_CBCAST

    Defines the type of column‐wise broadcast topology used for
    a lower triangular matrix for P[C,S,D,Z]POTRF routines.

    Default: MPI

    Other routines affected: P[C,S,D,Z]POSV, P[C,S,D,Z]POSVX,
    P[S,D]SYGVX, and P[C,Z]HEGVX

SCALAPACK_UTU_RBCAST

    Defines the type of row‐wise broadcast topology used for an
    upper triangular matrix for P[C,S,D,Z]POTRF routines.

    Default: MPI

    Other routines affected: P[C,S,D,Z]POSV, P[C,S,D,Z]POSVX,
    P[S,D]SYGVX, and P[C,Z]HEGVX

SCALAPACK_UTU_CBCAST

    Defines the type of column‐wise broadcast topology used for
    an upper triangular matrix for P[C,S,D,Z]POTRF routines.

    Default: IRING

    Other routines affected: P[C,S,D,Z]POSV, P[C,S,D,Z]POSVX,
    P[S,D]SYGVX, and P[C,Z]HEGVX

Broadcast Topology Strings

All the above environment variables use the following strings to specify the broadcast topology.

IRING     increasing ring
DRING     decreasing ring
SRING     split ring
MRING     multi‐ring
HYPR      hypercube
MPI       MPI bcast
TREE      tree
FULL      fully connected
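
As a minimal sketch of choosing broadcast topologies before program execution, the shell fragment below checks a string against the table above and then exports it. The functions is_valid_bcast and set_bcast are hypothetical helpers written for this example and are not part of the library; the values SRING and TREE are arbitrary illustrations, not recommendations.

```shell
# Hypothetical helper: succeeds only for the documented topology strings.
is_valid_bcast() {
    case "$1" in
        IRING|DRING|SRING|MRING|HYPR|MPI|TREE|FULL) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: set_bcast VARIABLE TOPOLOGY
# Exports VARIABLE=TOPOLOGY, or reports an error and leaves it untouched.
set_bcast() {
    if is_valid_bcast "$2"; then
        export "$1=$2"
    else
        echo "unknown broadcast topology: $2" >&2
        return 1
    fi
}

# Override the row-wise and column-wise broadcasts used by the LU routines.
set_bcast SCALAPACK_LU_RBCAST SRING
set_bcast SCALAPACK_LU_CBCAST TREE

echo "SCALAPACK_LU_RBCAST=${SCALAPACK_LU_RBCAST}"
echo "SCALAPACK_LU_CBCAST=${SCALAPACK_LU_CBCAST}"
```

Validating the string in the job script catches a mistyped topology name before the run starts. The same pattern applies to the SCALAPACK_LLT_* and SCALAPACK_UTU_* variables.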

Helper Routines

The following helper routines enable users to set the respective environment variables from inside a program, where the bcast_str argument is one of the above broadcast topology strings.

cray_scalapack_lu_rbcast_set(bcast_str)
cray_scalapack_lu_cbcast_set(bcast_str)
cray_scalapack_llt_rbcast_set(bcast_str)
cray_scalapack_llt_cbcast_set(bcast_str)
cray_scalapack_utu_rbcast_set(bcast_str)
cray_scalapack_utu_cbcast_set(bcast_str)