ScaLAPACK
Description
The ScaLAPACK library contains routines for solving real or complex general, triangular, or positive definite distributed linear systems. It also contains routines for reducing distributed matrices to condensed form and an eigenvalue problem solver for real symmetric distributed matrices. Finally, it includes the PBLAS, a set of routines that perform basic operations on distributed matrices and vectors.
For more information about the ScaLAPACK library, see the ScaLAPACK web site http://www.netlib.org/scalapack/.
Changes from Public Domain Version
ScaLAPACK is a software package provided by Univ. of Tennessee; Univ. of California, Berkeley; and Univ. of Colorado, Denver. A public-domain version of the package is available at http://www.netlib.org/.
The calling sequences to all ScaLAPACK routines remain unchanged.
Initialization
Some of the ScaLAPACK routines require the Basic Linear Algebra
Communication Subprograms (BLACS) to be initialized, which can be
done through a call to BLACS_GRIDINIT. In addition, each distributed
array that is passed as an argument to a ScaLAPACK routine requires
a descriptor, which is set through a call to DESCINIT. If a call is
required, it is documented on the man page for the routine.
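Among other values, DESCINIT takes a local leading dimension (LLD) that must be at least max(1, number of matrix rows stored on the calling process). ScaLAPACK computes that row count with its NUMROC tool function. The C sketch below reimplements the same block-cyclic counting rule for illustration only; local_count and min_lld are hypothetical names, not LibSci entry points. In a real program you would call NUMROC itself, using the process coordinates returned by BLACS_GRIDINFO after BLACS_GRIDINIT.

```c
/* Hypothetical helper (not a LibSci routine): number of rows (or columns)
   of an n-long dimension, split into blocks of nb and dealt block-cyclically
   over nprocs processes starting at process isrc, that land on process
   iproc. Follows the same counting rule as ScaLAPACK's NUMROC. */
static int local_count(int n, int nb, int iproc, int isrc, int nprocs)
{
    int mydist  = (nprocs + iproc - isrc) % nprocs; /* distance from the source process */
    int nblocks = n / nb;                           /* number of full blocks */
    int count   = (nblocks / nprocs) * nb;          /* full blocks every process receives */
    int extra   = nblocks % nprocs;                 /* full blocks left over */

    if (mydist < extra)
        count += nb;          /* this process gets one more full block */
    else if (mydist == extra)
        count += n % nb;      /* this process gets the trailing partial block */
    return count;
}

/* Hypothetical helper: smallest leading dimension a process may pass to
   DESCINIT for an m-row matrix distributed in mb-row blocks over nprow
   process rows, with the first block on process row rsrc. */
static int min_lld(int m, int mb, int myrow, int rsrc, int nprow)
{
    int mloc = local_count(m, mb, myrow, rsrc, nprow);
    return mloc > 1 ? mloc : 1; /* max(1, local rows) */
}
```

For example, a 10 x 10 matrix in 3 x 3 blocks on a 2 x 3 process grid gives 6 local rows on process row 0 and 4 on process row 1, and 4, 3, and 3 local columns on process columns 0, 1, and 2.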
Environment Variables
Environment Variables for PBLAS Routines
LIBSCI_ALT_PGEMM
If set, uses an alternate, experimental algorithm for the
pdgemm and pzgemm routines that may perform better in
some use cases. Users should independently verify whether
the alternate algorithm is faster than the default
algorithm for their application.
Default: not set
LIBSCI_ALT_SCALA
If set, uses an alternate set of SCALAPACK communication
parameters at runtime that may yield better performance for
select applications and use cases.
Default: not set
Environment Variables for Eigensolvers
LIBSCI_OPT_PDSYEV
If set, the default pdsyev algorithm (real symmetric
eigensolver) is replaced with a divide-and-conquer
algorithm to improve performance when both eigenvalues and
eigenvectors are computed. This optimization temporarily
allocates additional internal scratch memory. Users do not
need to change their existing pdsyev calls.
Default: not set
LIBSCI_OPT_PZHEEV
If set, the default pzheev algorithm (complex Hermitian
eigensolver) is replaced with a divide-and-conquer
algorithm to improve performance when both eigenvalues and
eigenvectors are computed. This optimization temporarily
allocates additional internal scratch memory. Users do not
need to change their existing pzheev calls.
Default: not set
LIBSCI_OPT_PDSYEVD
If set, the default pdsyevd algorithm (real symmetric
eigensolver) is replaced with an MRRR algorithm that may
provide better performance. Users must perform a workspace
query (a call with LWORK = -1) at run time while the
environment variable is set, so that the workspace sizes
are appropriate for the function call.
Default: not set
LIBSCI_USE_ELPA
ELPA is not provided with cray-libsci. However, if this
environment variable is set to 1, an external, user-supplied
ELPA library can be used as a backend for the following
ScaLAPACK eigensolver routines in cray-libsci: pdsyev,
pdsyevd, pdsyevr, pzheev, pzheevd, and pzheevr. If this
environment variable is set, the external ELPA library must
be provided when the application is linked, through the
PE_SCI_EXT_LIBPATH and PE_SCI_EXT_LIBNAME environment
variables. The backend uses the ELPA API of ELPA release
2016.05.004; ELPA releases with different APIs are not
supported.
Additionally, dynamic linking is not supported on Cray XC
systems when this variable is set. On Cray CS systems,
LD_LIBRARY_PATH or ld.so.cache must be updated to include
the location of the library specified in
PE_SCI_EXT_LIBPATH.
For building ELPA with the Cray Programming Environment,
refer to ${CRAY_LIBSCI_DIR}/etc/README.
Default: not set
PE_SCI_EXT_LIBPATH
If set, the value is added to the linker's library search
path during linking.
Example: export PE_SCI_EXT_LIBPATH="-L/path/to/elpa/lib"
Default: not set
PE_SCI_EXT_LIBNAME
If set, the value is passed to the linker during the linking
step.
Example: export PE_SCI_EXT_LIBNAME="-Wl,--undefined=elpa_get_communicators\ -lelpa_openmp"
Note: When LIBSCI_USE_ELPA is set, the
"-Wl,--undefined=elpa_get_communicators\ " option is required.
Otherwise, the ELPA library may not be included in the
compiled application binary.
Environment Variables for SVD Routines
LIBSCI_OPT_PDGESVD
If set, PDGESVD uses the alternate polar-decomposition
algorithm pdgeqsvd to compute the SVD; pdgeqsvd has shown
improved performance relative to PDGESVD, particularly for
ill-conditioned matrices. This optimization temporarily
allocates additional internal scratch memory. Users do not
need to change their existing PDGESVD calls. Note that the
JOBU and JOBVT arguments are ignored when this environment
variable is set, and both the left and right singular
vectors are always returned. Additionally, the VT matrix is
transposed, which is not done for direct calls to pdgeqsvd,
so that the results conform to the PDGESVD documentation.
For more information, see intro_qdwh(3) and pdgeqsvd(3).
Overview of Environment Variables for LU and Cholesky Routines
The LU and Cholesky routines in this version of ScaLAPACK have been modified to allow the user to choose a broadcast algorithm before or during program execution. The default value of the BLACS broadcast has been changed from S‐Ring to I‐Ring to take advantage of the fastest BLACS broadcast algorithm.
Note: Testing has shown performance improvements of up to 30% for LU routines and 10% for Cholesky routines when using the new default broadcast value. The improvement depends strongly on the shape and size of the process grid. The best results have been seen with rectangular grids with Q=2P or Q=4P (P process rows, Q process columns). In some cases, performance improvements of up to 600% have been seen.
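The grid shape is fixed when the BLACS grid is created, through the NPROW (P) and NPCOL (Q) arguments of BLACS_GRIDINIT. As a sketch of how an application might aim for the Q=2P or Q=4P shapes mentioned in the note, the hypothetical helper choose_grid below picks the factorization P x Q of the process count that minimizes |Q - ratio*P|. It is not part of LibSci, and whether a given shape actually helps still depends on the problem size.

```c
#include <stdlib.h>

/* Hypothetical helper (not a LibSci routine): choose a P x Q process grid
   (P rows, Q columns, P * Q == nprocs) whose shape is closest to
   Q == ratio * P. On ties, the smaller P wins. Returns 0 on success,
   -1 on invalid input. */
static int choose_grid(int nprocs, int ratio, int *p_out, int *q_out)
{
    int best_p = 0, best_q = 0;
    long best_err = -1;

    if (nprocs < 1 || ratio < 1)
        return -1;
    for (int p = 1; p <= nprocs; ++p) {
        if (nprocs % p != 0)
            continue;                                /* P must divide the process count */
        int q = nprocs / p;
        long err = labs((long)q - (long)ratio * p);  /* distance from Q = ratio * P */
        if (best_err < 0 || err < best_err) {
            best_err = err;
            best_p = p;
            best_q = q;
        }
    }
    *p_out = best_p;
    *q_out = best_q;
    return 0;
}
```

With 32 processes and ratio 2 this yields a 4 x 8 grid; with 64 processes and ratio 4 it yields 4 x 16. The resulting P and Q would then be passed to BLACS_GRIDINIT.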
The following environment variables enable the user to specify the broadcast algorithm before beginning program execution. These environment variables can also be set during program execution by using the associated helper routines.
Environment Variables for LU Routines
SCALAPACK_LU_RBCAST
Defines the type of row‐wise broadcast topology used for
P[C,S,D,Z]GETRF routines.
Default: IRING
Other routines affected: P[C,S,D,Z]GESV and P[C,S,D,Z]GESVX
SCALAPACK_LU_CBCAST
Defines the type of column‐wise broadcast topology used for
P[C,S,D,Z]GETRF routines.
Default: MPI
Other routines affected: P[C,S,D,Z]GESV and P[C,S,D,Z]GESVX
Environment Variables for Cholesky Routines
SCALAPACK_LLT_RBCAST
Defines the type of row‐wise broadcast topology used for a
lower triangular matrix for P[C,S,D,Z]POTRF routines.
Default: IRING
Other routines affected: P[C,S,D,Z]POSV, P[C,S,D,Z]POSVX,
P[S,D]SYGVX, and P[C,Z]HEGVX
SCALAPACK_LLT_CBCAST
Defines the type of column‐wise broadcast topology used for
a lower triangular matrix for P[C,S,D,Z]POTRF routines.
Default: MPI
Other routines affected: P[C,S,D,Z]POSV, P[C,S,D,Z]POSVX,
P[S,D]SYGVX, and P[C,Z]HEGVX
SCALAPACK_UTU_RBCAST
Defines the type of row‐wise broadcast topology used for an
upper triangular matrix for P[C,S,D,Z]POTRF routines.
Default: MPI
Other routines affected: P[C,S,D,Z]POSV, P[C,S,D,Z]POSVX,
P[S,D]SYGVX, and P[C,Z]HEGVX
SCALAPACK_UTU_CBCAST
Defines the type of column‐wise broadcast topology used for
an upper triangular matrix for P[C,S,D,Z]POTRF routines.
Default: IRING
Other routines affected: P[C,S,D,Z]POSV, P[C,S,D,Z]POSVX,
P[S,D]SYGVX, and P[C,Z]HEGVX
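When comparing runs, it can help to log which broadcast topology each routine family would use. The sketch below encodes the defaults listed above and reports the environment value when one is set. It only mirrors this man page and does not query LibSci, so it cannot see changes made through the helper routines at run time, and it does not check that the value is a valid topology string. The function name effective_bcast is hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Documented defaults for the LU and Cholesky broadcast variables,
   copied from this man page (not read from LibSci). */
static const struct {
    const char *name;
    const char *dflt;
} bcast_vars[] = {
    { "SCALAPACK_LU_RBCAST",  "IRING" },
    { "SCALAPACK_LU_CBCAST",  "MPI"   },
    { "SCALAPACK_LLT_RBCAST", "IRING" },
    { "SCALAPACK_LLT_CBCAST", "MPI"   },
    { "SCALAPACK_UTU_RBCAST", "MPI"   },
    { "SCALAPACK_UTU_CBCAST", "IRING" },
};

/* Hypothetical helper: return the environment value of `name` if it is
   set, otherwise its documented default. Returns NULL for names that are
   not in the table. */
static const char *effective_bcast(const char *name)
{
    for (size_t i = 0; i < sizeof bcast_vars / sizeof bcast_vars[0]; ++i) {
        if (strcmp(bcast_vars[i].name, name) == 0) {
            const char *v = getenv(name);
            return v != NULL ? v : bcast_vars[i].dflt;
        }
    }
    return NULL;
}
```

For example, in a job where SCALAPACK_UTU_RBCAST is not exported, effective_bcast("SCALAPACK_UTU_RBCAST") reports MPI, the documented default.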
Broadcast Topology Strings
All the above environment variables use the following strings to specify the broadcast topology.
IRING increasing ring
DRING decreasing ring
SRING split ring
MRING multi‐ring
HYPR hypercube
MPI MPI bcast
TREE tree
FULL fully connected
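The sketch below does two things: it checks a string against the topology names listed above (for example, before passing it to one of the helper routines), and it models the order in which processes in one grid row or column receive data in an increasing or decreasing ring. The ring model follows the usual BLACS meaning of those topologies and is a conceptual illustration, not LibSci's implementation; is_bcast_topology and ring_order are hypothetical names.

```c
#include <string.h>

/* Topology strings listed in this man page. */
static const char *const bcast_topologies[] = {
    "IRING", "DRING", "SRING", "MRING", "HYPR", "MPI", "TREE", "FULL"
};

/* Hypothetical helper: return 1 if s exactly matches one of the listed
   strings (case-sensitive), else 0. Whether LibSci accepts other
   spellings is not documented here. */
static int is_bcast_topology(const char *s)
{
    for (size_t i = 0; i < sizeof bcast_topologies / sizeof bcast_topologies[0]; ++i)
        if (strcmp(s, bcast_topologies[i]) == 0)
            return 1;
    return 0;
}

/* Hypothetical conceptual model of a ring broadcast among nprocs processes
   (0 <= root < nprocs): order[k] is the k-th process to hold the data,
   starting with the root. increasing != 0 models IRING (root, root+1, ...);
   increasing == 0 models DRING (root, root-1, ...). Indices wrap around. */
static void ring_order(int root, int nprocs, int increasing, int *order)
{
    int step = increasing ? 1 : nprocs - 1; /* +1 or -1 modulo nprocs */
    for (int k = 0; k < nprocs; ++k)
        order[k] = (root + k * step) % nprocs;
}
```

With root 1 among 4 processes, the increasing ring visits processes 1, 2, 3, 0 and the decreasing ring visits 1, 0, 3, 2.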
Helper Routines
The following helper routines enable users to set the respective environment variables from inside a program, where the bcast_str argument is one of the above broadcast topology strings.
cray_scalapack_lu_rbcast_set(bcast_str)
cray_scalapack_lu_cbcast_set(bcast_str)
cray_scalapack_llt_rbcast_set(bcast_str)
cray_scalapack_llt_cbcast_set(bcast_str)
cray_scalapack_utu_rbcast_set(bcast_str)
cray_scalapack_utu_cbcast_set(bcast_str)