Copyright and Version
© Copyright 2022-2024 Hewlett Packard Enterprise Development LP. All third-party marks are the property of their respective owners.
Doc version: -LocalBuild
Doc git hash: 898e74b1bcdba046cce65e32fc5aa4391548bc4d
Generated: Thu Aug 29 2024
About the HPE CPE installation guide: HPCM on HPE Cray Supercomputing EX and HPE Cray Supercomputing systems
The HPE Cray Programming Environment Installation Guide: HPCM on HPE Cray Supercomputing EX and HPE Cray Supercomputing Systems (S-8022) includes procedures to install HPE Cray Programming Environment (CPE) and Parallel Application Launch Service (PALS) with HPE Performance Cluster Manager (HPCM) on HPE Cray Supercomputing EX and HPE Cray Supercomputing Systems.
This publication is intended for system administrators receiving their first release of this product or upgrading from a previous release. The information assumes that the administrator has a good understanding of Linux system administration and HPCM.
Release information
This publication supports installing CPE 24.07 on HPE Cray Supercomputing EX systems with HPCM 1.11, HPE Cray Supercomputing Operating System (COS) 24.7 (COS Base 3.1.0/USS 1.1.0), and:
SLES15 SP5 (X86), or
SLES15 SP5 (AArch64).
This publication also supports installing CPE 24.07 on HPE Cray Supercomputing EX systems with HPCM 1.11 and:
RHEL 8.10,
RHEL 9.4 (X86), or
RHEL 9.4 (AArch64).
COS 23.11 (and later) components comprise:
COS Base
HPE Cray Supercomputing User Services Software (USS)
HPE SUSE Linux Enterprise Server
Variable substitutions
Use the following variable substitutions throughout the included procedures.
<CPE_RELEASE> = 24.07
<CPE_VERSION> = 24.07
<spX> or <SPX> = sp5 or SP5 (applicable to systems running HPE Cray Operating System 24.7)
<RHELX-X> = rhel-8.10 or rhel-9.4 (applicable to systems running X86 or AArch64)
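As a convenience only (not part of the official procedures), these substitution values can be captured as shell variables on the admin node so that later commands can be pasted with the placeholders expanded. The variable names below are illustrative:
admin# CPE_RELEASE=24.07
admin# CPE_VERSION=24.07
admin# SPX=sp5                # SLES 15 service pack for systems running COS 24.7
admin# RHELX_X=rhel-9.4       # or rhel-8.10, depending on the RHEL release in use
admin# echo "cpe-${CPE_RELEASE}-sles15-${SPX}-hpcm-${CPE_VERSION}.tar.gz"
cpe-24.07-sles15-sp5-hpcm-24.07.tar.gz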
Record of revision
New in the CPE 24.07 publication
Updated the Release information section.
Added the Downloading HPE Cray Supercomputing EX software section.
Incorporated editorial updates.
New in the CPE 24.03 publication
Updated the Release information section.
Updated the Variable substitutions section.
Updated the Install HPE Cray Programming Environment on Red Hat Enterprise Linux chapter.
Updated the Complete CPE installation on SLES section.
Updated the Module Path Aliases and Current Compatibility Versions section.
Incorporated editorial updates.
New in the CPE 23.12 publication
Updated the Release information section.
Deleted the Slurm installation note section.
Updated the Objective and Optional sections of the Install HPE Cray Programming Environment on SUSE Linux Enterprise Server chapter.
Updated the procedure in the Install Independent HPE Cray Programming Environment RPM on SUSE Linux Enterprise Server section.
Updated the procedure in the Install Independent HPE Cray Programming Environment RPM on Red Hat Enterprise Linux section.
Updated the compiler versions in the Module Path Aliases and Current Compatibility Versions section.
New in the CPE 23.09 publication
Updated the Release information and Variable substitutions sections.
Updated the Objective and the Procedure sections of the Install HPE Cray Programming Environment on SUSE Linux Enterprise Server chapter.
Updated the Objective and the Procedure sections of the Install HPE Cray Programming Environment on Red Hat Enterprise Linux chapter.
Updated the procedure in the Create modulefiles for third-party products section.
Updated the procedure in the Complete CPE Installation on RHEL section.
Moved the Configuring the ATP Slurm SPANK plugin (Conditional) procedure to its own chapter.
Removed the:
Configure a Workload Manager section in the Install HPE Cray Programming Environment on SUSE Linux Enterprise Server chapter
Troubleshooting slurmctld segfault on startup section
Configure a Workload Manager section in the Install HPE Cray Programming Environment on Red Hat Enterprise Linux chapter.
Configure a Workload Manager chapter
New in the CPE 23.05 publication
Updated the Release information and Variable substitutions sections.
Updated the Configure PBS on SLES for systems with Slingshot and HPE 200GB NICs and Configure PBS on RHEL for systems with Slingshot and HPE 200GB NICs sections. The procedures now clarify node types in the instructions.
Updated instructions on how to configure vnid, palsd, and hodagd in the Configure PBS on RHEL for systems with Slingshot and HPE 200GB NICs section.
Updated the procedure in the Generating Slurm topology configuration section.
Updated the procedure in the Create modulefiles for third-party products section.
Added the new Troubleshooting slurmctld segfault on startup section.
Added the new Installation Prerequisites section.
Added the new Enabling PALS to use the Spindle package section.
Added the new Configuring Slingshot traffic classes in PBS section.
Updated the compatible compiler version table in the Module Path Aliases and Current Compatibility Versions section: added the SLES compatible compiler versions alongside the RHEL compiler versions, updated the RHEL cce and gcc compatible compiler versions, and added the RHEL compatible compiler version.
Made minor edits throughout the guide.
New in the CPE 23.02 (Rev. A) publication
Updated the Create modulefiles for third-party products section to provide additional detail on loading the craypkg-gen utility for Intel systems.
Added the Enabling PALS to use the Spindle package section.
Updated the Configure PBS on RHEL for systems with Slingshot and HPE 200GB NICs and Configure PBS on SLES for Systems with Slingshot and HPE 200GB NICs sections.
New in the CPE 23.02 publication
Updated the SwitchParameters option listing and resources listing in Configuring Slurm for Systems with Slingshot and HPE 200GB NICs.
Publication Title | Date
---|---
HPE Cray Programming Environment Installation Guide: HPCM on HPE Cray Supercomputing EX and HPE Cray Supercomputing Systems (24.07) S-8022 | August 2024
HPE Cray Programming Environment Installation Guide: HPCM on HPE Cray EX and HPE Cray Supercomputer Systems (24.03) S-8022 | May 2024
HPE Cray Programming Environment Installation Guide: HPCM on HPE Cray EX and HPE Cray Supercomputer Systems (23.12) S-8022 | December 2023
HPE Cray Programming Environment Installation Guide: HPCM on HPE Cray EX and HPE Cray Supercomputer Systems (23.09) S-8022 | September 2023
HPE Cray Programming Environment Installation Guide: HPCM on HPE Cray EX and HPE Cray Supercomputer Systems (23.05) S-8022 | June 2023
HPE Cray Programming Environment Installation Guide: HPCM on HPE Cray EX and HPE Cray Supercomputer Systems (23.02-Rev A) S-8022 | March 2023
HPE Cray Programming Environment Installation Guide: HPCM on HPE Cray EX and HPE Cray Supercomputer Systems (23.02) S-8022 | February 2023
HPE Cray Programming Environment Installation Guide: HPCM on HPE Cray EX and HPE Cray Supercomputer Systems (22.12) S-8022 | December 2022
HPE Cray Programming Environment Installation Guide: HPCM on HPE Cray EX and HPE Cray Supercomputer Systems (22.11) S-8022 | November 2022
HPE Cray Programming Environment Installation Guide: HPCM on HPE Cray EX and HPE Cray Supercomputer Systems (22.10) S-8022 | October 2022
HPE Cray Programming Environment Installation Guide: HPCM on HPE Cray EX and HPE Cray Supercomputer Systems (22.09) S-8022 | September 2022
HPE Cray Programming Environment Installation Guide: HPCM on HPE Cray EX and HPE Cray Supercomputer Systems (22.08 Rev A) S-8022 | August 2022
HPE Cray Programming Environment Installation Guide: HPCM on HPE Cray EX and HPE Cray Supercomputer Systems (22.08) S-8022 | August 2022
HPE Cray Programming Environment Installation Guide: HPCM on HPE Cray EX and HPE Cray Supercomputer Systems (22.06) S-8022 | June 2022
HPE Cray Programming Environment Installation Guide: HPCM on HPE Cray EX and HPE Cray Supercomputer Systems (22.05) S-8022 | May 2022
HPE Cray Programming Environment Installation Guide: HPCM on HPE Cray EX and HPE Cray Supercomputer Systems (22.04) S-8022 | April 2022
HPE Cray Programming Environment Installation Guide: HPCM on HPE Cray EX and HPE Cray Supercomputer Systems (22.03) S-8022 | March 2022
HPE Cray Programming Environment Installation Guide: HPCM on HPE Cray EX and HPE Cray Supercomputer Systems (22.02) S-8022 | February 2022
HPE Cray Programming Environment Installation Guide: HPCM on HPE Cray EX and HPE Cray Supercomputer Systems (21.02 - 21.12) S-8022 | Feb - Dec 2021
Downloading HPE Cray Supercomputing EX software
To download HPE Cray Supercomputing EX software, refer to the HPE Support Center or download it directly from My HPE Software Center. The HPE Support Center contains a wealth of documentation, training videos, knowledge articles, and alerts for HPE Cray Supercomputing EX systems. It provides the most detailed information about a release as well as direct links to product firmware, software, and patches available through My HPE Software Center.
Downloading the software through the HPE Support Center
HPE recommends downloading software through the HPE Support Center because of the many other resources available on the website.
Visit the HPE Cray Supercomputing EX product page on the HPE Support Center.
Search for specific product info, such as the full software name or recipe name and version.
For example, search for “Slingshot 2.1” or “Cray System Software with CSM 24.3.0.”
Find the desired software in the search results and select it to review details.
Select Obtain Software and select Sign in Now when prompted.
If a customer’s Entitlement Order Number (EON) is tied to specific hardware rather than software, the software is available without providing account credentials. Access the software instead by selecting Download Software and skip the next step in this procedure.
Enter account credentials when prompted and accept the HPE License Terms.
To download software, customers must ensure their Entitlement Order Number (EON) is active under My Contracts & Warranties on My HPE Software Center. If customers have trouble with the EON or are not entitled to a product, they must contact their HPE contract administrator or sales representative for assistance.
Choose the needed software and documentation files to download and select curl Copy to access the files.
Just like the software files, the documentation files change with each release. In addition to the official documentation, valuable information for a release is often available in files that include the phrase README in their name. Be sure to select and review these files in detail.
HPE recommends the curl Copy option, which downloads a single text file with curl commands to use on the desired system. You must run the curl commands within 24 hours of downloading them or download new commands if more than 24 hours have passed.
To validate the security of the downloads, you can later compare the files on the desired system against the checksums provided by HPE underneath each selected download.
Save the text file to a central location.
On the system where the software will be downloaded, run the text file that includes the curl commands as a shell script.
For example:
ncn-m001# bash -x <TEXT_FILE_PATH>
The -x option in this example tracks the download progress of each curl command in the text file.
Downloading the software directly from My HPE Software Center
Users already familiar with a release can save time by downloading software directly from My HPE Software Center.
Visit My HPE Software Center and select Sign in.
Enter account credentials when prompted and select Software in the left navigation bar.
Search for specific product info, such as the full software name or recipe name and version.
For example, search for “Slingshot 2.1” or “Cray System Software with CSM 24.3.0.”
Find the desired software in the search results and review details by selecting Product Details under the Action column.
Select Go To Downloads Page and accept the HPE License Terms.
To download software, customers must ensure their Entitlement Order Number (EON) is active under My Contracts & Warranties. If customers have trouble with the EON or are not entitled to a product, they must contact their HPE contract administrator or sales representative for assistance.
Choose the needed software and documentation files to download and select curl Copy to access the files.
Just like the software files, the documentation files change with each release. In addition to the official documentation, valuable information for a release is often available in files that include the phrase README in their name. Be sure to select and review these files in detail.
HPE recommends the curl Copy option, which downloads a single text file with curl commands to use on the desired system. You must run the curl commands within 24 hours of downloading them or download new commands if more than 24 hours have passed.
To validate the security of the downloads, you can later compare the files on the desired system against the checksums provided by HPE underneath each selected download; a minimal checksum check is sketched after this procedure.
Save the text file to a central location.
On the system where the software will be downloaded, run the text file that includes the curl commands as a shell script.
For example:
ncn-m001# bash -x <TEXT_FILE_PATH>
The -x option in this example tracks the download progress of each curl command in the text file.
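As a minimal sketch of the checksum comparison mentioned above, assuming HPE publishes SHA-256 checksums for the selected downloads (the tarball name is the placeholder used elsewhere in this guide):
admin# sha256sum cpe-<CPE_RELEASE>-<RHELX-X>-hpcm-<CPE_VERSION>.tar.gz
Compare the printed value against the checksum shown underneath the selected download on the HPE Support Center or My HPE Software Center page; the two values must match exactly.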
Installation Prerequisites
For systems that include Scalable Unit (SU) leaders, HPE only supports
NFS for CPE deployments using leader aliases. HPCM does not support
exporting the filesystem from the admin node if leaders are in place.
All nodes deploying CPE subsequently must have a leader IP alias
assigned. Use the scalable bittorrent
transport to accommodate these
requirements. Compute nodes on an HPE Cray Supercomputing EX system are
often already set up with SU leader aliases during auto-discovery. Any
other nodes must be manually configured. Note also that multiple nodes
can be managed together by passing a list or range of nodes to -n.
Set up the transport for service nodes. Use bittorrent as the transport.
IMPORTANT: This setup is required, even if you are not reinstalling the node. The running node is not affected; the node is affected only if it is provisioned again in the future.
admin# cm node set --transport bittorrent -n n1
Create the bittorrent tarball if it is not available yet:
admin# cm image refresh --bittorrent -i ubu-22.04.1
Get the IP address of an SU leader node. The following example uses leader1, but any SU leader node can be used:
admin# ssh leader1 ctdb ip
Pick one of the IP addresses returned in the previous step, and assign it to the nodes:
admin# cm node set -n n1 --su-leader 172.23.255.1
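As a minimal sketch, the same prerequisite setup can be applied to several nodes at once. The node names below are placeholders, and a comma-separated list is assumed to be one of the accepted forms of the list or range that -n takes (see the note at the start of this section):
admin# cm node set --transport bittorrent -n n1,n2,n3
admin# ssh leader1 ctdb ip
admin# cm node set -n n1,n2,n3 --su-leader 172.23.255.1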
Install HPE Cray Programming Environment on SUSE Linux Enterprise Server
OBJECTIVE
This procedure provides instructions for installing Cray Programming Environment (CPE), and optionally sets Lmod as the default module handling system.
See the Release information section for systems running Slurm.
OPTIONAL
For HPE Cray Supercomputing EX or HPE Cray Supercomputing systems with GPU compute nodes that are not running the Cray Operating System (COS), the rocm/x.x.x or cudatoolkit/x.x.x GPU toolkit modulefile is required. Refer to the GPU Toolkit Modulefile Templates for Cray PE for environment modulefiles and pkg-config file templates.
Systems running COS typically have GPU toolkit modulefiles pre-installed and ready for use.
PROCEDURE
Before you begin:
Obtain the required cpe-<CPE_RELEASE>-sles15-<spX>-hpcm-<CPE_VERSION>.tar.gz files.
Enable repositories (for the installation of cpe-support), including:
SLE Module Basesystem
SLE Module HPC
Note that to use UCX with Cray-MPICH:
Cray-MPICH using the UCX netmod is supported on SLES 15 <SPX>, or other, systems with the HPCM installer.
HPE does not distribute UCX directly.
Mellanox provides a UCX solution as a part of their HPC-X software toolkit. This solution is the recommended path. Open source and Linux distro packages provide a functional, although not necessarily performant, alternative.
Extract the CPE tarball:
admin# tar xf cpe-<CPE_RELEASE>-sles15-<spX>-hpcm-<CPE_VERSION>.tar.gz
Run the install.sh script, passing additional compiler environments to be installed as arguments:
admin# cpe-<CPE_RELEASE>-sles15-<spX>/install.sh <amd> <aocc> <intel> <nvidia>
Enable the CPE repository:
admin# cm repo select cpe-<CPE_RELEASE>-sles15-<spX>
Import the RPM public key:
admin# rpm --root /opt/clmgr/image/images/<IMAGE_NAME> \
  --import cpe-<CPE_RELEASE>-sles15-<spX>/*.asc
Install the cm-pe-integration and cpe-support RPMs, included in the CPE repository, into the image:
admin# cm image zypper -i <IMAGE_NAME> install cm-pe-integration cpe-support
(Use the cm image show command to display available images.)
(Optional) Modify /opt/clmgr/image/images/<IMAGE_NAME>/etc/cray-pe.d/pe_releases to indicate which CPE releases to install. The first release in the list is the default. For example, the following contents indicate that CPE 21.09 and 21.07 are to be installed, with 21.09 as the default release:
21.09
21.07
(Optional) While you can use either the default CPE Module Environment system or Lmod on a sitewide basis (as the systems are mutually exclusive and cannot both run on the same system), set Lmod as the default module handling system in the image:
In /opt/clmgr/image/images/<IMAGE_NAME>/etc/cray-pe.d/cray-pe-configuration.csh, change set module_prog = environment modules to set module_prog = lmod.
In /opt/clmgr/image/images/<IMAGE_NAME>/etc/cray-pe.d/cray-pe-configuration.sh, change module_prog = environment modules to module_prog = lmod.
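As a minimal scripted alternative, assuming the configuration files contain the exact strings shown above, both edits can be made with sed from the admin node:
admin# sed -i 's/module_prog = environment modules/module_prog = lmod/' \
  /opt/clmgr/image/images/<IMAGE_NAME>/etc/cray-pe.d/cray-pe-configuration.csh \
  /opt/clmgr/image/images/<IMAGE_NAME>/etc/cray-pe.d/cray-pe-configuration.sh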
Complete CPE Installation on SLES
Disable the CPE repository:
admin# cm repo unselect cpe-<CPE_RELEASE>-sles15-<spX>
Update revision history with a comment:
admin# cm image revision commit -i <IMAGE_NAME> -m "Update CPE to <CPE_RELEASE>"
Activate the image for disk-less nodes only:
admin# cm image activate -i <IMAGE_NAME>
Verify the CPE installation:
Reboot one compute node:
admin# cm node provision -i <IMAGE_NAME> -n nid0001
Connect to the booted compute node, and verify that CPE modules are loaded.
Example:
admin# ssh nid0001
nid0001# module list
Currently Loaded Modules:
  1) craype-x86-rome           5) cce/17.0.0           9) cray-libsci/23.12.5
  2) libfabric/1.15.2.0        6) craype/2.7.30       10) PrgEnv-cray/8.5.0
  3) craype-network-ofi        7) cray-dsmml/0.2.2
  4) perftools-base/23.12.0    8) cray-mpich/8.1.28
See the HPE HPCM release announcement for current CPE release product versions.
Reboot the remaining compute nodes on the system if the installation appears correct.
Install HPE Cray Programming Environment on Red Hat Enterprise Linux
OBJECTIVE
This procedure provides instructions for installing Cray Programming Environment (CPE), and optionally sets Lmod as the default module handling system.
OPTIONAL
For HPE Cray EX or HPE Cray supercomputer systems with GPU compute nodes that are not running the Cray Operating System (COS):
If a rocm/x.x.x or cudatoolkit/x.x.x GPU toolkit modulefile is required, refer to the GPU Toolkit Modulefile Templates for Cray PE, which provides environment modulefiles and pkg-config file templates.
Systems running COS typically have GPU toolkit modulefiles pre-installed and ready for use.
IMPORTANT: Throughout this procedure, replace instances of:
<CPE_RELEASE>
<CPE_VERSION>
<RHELX-X>
with the values specified in Release Information.
PROCEDURE
Before you begin:
Obtain the cpe-<CPE_RELEASE>-<RHELX-X>-hpcm-<CPE_VERSION>.tar.gz files.
Enable repositories (for the installation of cpe-support), including:
RHEL BaseOS
RHEL AppStream
RHEL CRB
Note that to use UCX with Cray-MPICH:
Cray-MPICH using the UCX netmod is supported on RHEL 8.7 systems with the HPCM installer.
HPE does not distribute UCX directly.
Mellanox provides a UCX solution as a part of their HPC-X software toolkit. This solution is the recommended path. Open source and Linux distro packages provide a functional, although not necessarily performant, alternative.
Extract the CPE tarball:
admin# tar xf cpe-<CPE_RELEASE>-<RHELX-X>-hpcm-<CPE_VERSION>.tar.gz
Run the install.sh script, passing the compiler environments to be installed as arguments:
admin# cpe-<CPE_RELEASE>-<RHELX-X>/install.sh <amd> <aocc> <intel> <nvidia>
Enable the CPE repository:
admin# cm repo select cpe-<CPE_RELEASE>-<RHELX-X>
Import the RPM public key:
admin# rpm --root /opt/clmgr/image/images/<IMAGE_NAME> \
  --import cpe-<CPE_RELEASE>-<RHELX-X>/*.asc
Install the cm-pe-integration and cpe-support RPMs, included in the CPE repository, into the image:
admin# cm image dnf -i <IMAGE_NAME> install cm-pe-integration cpe-support
(Use the cm image show command to display available images.)
(Optional) Modify /opt/clmgr/image/images/<IMAGE_NAME>/etc/cray-pe.d/pe_releases to indicate which CPE releases to install. The first release in the list is the default. For example, the following contents indicate that CPE 21.09 and 21.07 are to be installed, with 21.09 as the default release:
21.09
21.07
(Optional) While you can establish the use of either the default CPE Module Environment system or Lmod on a sitewide basis (as the systems are mutually exclusive and cannot both run on the same system), set Lmod as the default module handling system in the image:
In /opt/clmgr/image/images/<IMAGE_NAME>/etc/cray-pe.d/cray-pe-configuration.csh, change set module_prog = environment modules to set module_prog = lmod.
In /opt/clmgr/image/images/<IMAGE_NAME>/etc/cray-pe.d/cray-pe-configuration.sh, change module_prog = environment modules to module_prog = lmod.
Complete CPE Installation on RHEL
Disable the CPE repository:
admin# cm repo unselect cpe-<CPE_RELEASE>-<RHELX-X>
Update revision history with a comment:
admin# cm image revision commit -i <IMAGE_NAME> -m "Update CPE to <CPE_RELEASE>"
Activate the image for disk-less nodes only:
admin# cm image activate -i <IMAGE_NAME>
Verify the CPE installation:
Reboot one compute node:
admin# cm node provision -i <IMAGE_NAME> -n nid0001
Connect to the booted compute node, and verify that CPE modules are loaded.
Example:
admin# ssh nid0001
nid0001# module list
Currently Loaded Modules:
  1) craype-x86-rome           5) cce/17.0.0           9) cray-libsci/23.12.5
  2) libfabric/1.15.2.0        6) craype/2.7.30       10) PrgEnv-cray/8.5.0
  3) craype-network-ofi        7) cray-dsmml/0.2.2
  4) perftools-base/23.12.0    8) cray-mpich/8.1.28
See the HPE HPCM release announcement for current CPE release product versions.
Reboot the remaining compute nodes on the system if the installation appears correct.
Configuring the ATP Slurm SPANK plugin (Conditional)
ATP requires a Slurm plugin to start analysis tools alongside job launches. Before HPE CPE 22.10, ATP included a global Slurm plugin file. This global plugin file had to be recompiled to match the updated Slurm plugin API when Slurm was updated. To resolve this requirement, since HPE CPE 22.10, ATP has been designed to build and configure its plugin as part of the module loading process instead of relying on a single global plugin. If your Slurm system is configured to use the global ATP Slurm plugin and job launches are working as expected, it is not necessary to remove it from the system configuration.
Note that the following error might occur:
srun: error: spank: /opt/cray/pe/atp/libAtpDispatch.so: Incompatible plugin version
If you see the above error when running Slurm jobs, remove the include /etc/plugstack.conf.d/* line from your Slurm plugin configuration file to disable the global ATP plugin. This modification disables the use of the potentially outdated ATP plugin. Users will still have the correct plugin built and configured when loading the ATP module.
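As a minimal sketch, assuming the Slurm plugin configuration file is /etc/slurm/plugstack.conf (the location can vary by site), the change amounts to deleting or commenting out the include line. Before the change, the file contains:
include /etc/plugstack.conf.d/*
After the change, the global ATP plugin is no longer loaded:
#include /etc/plugstack.conf.d/*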
Create Modulefiles for third-party products
PREREQUISITES
Download and install third-party packages before initiating this procedure.
OBJECTIVE
These instructions describe how to create a modulefile for third-party products and use craypkg-gen to create a modulefile for a specific version of a supported third-party product. This usage allows a site to set a specific version as default.
PROCEDURE
The following steps are necessary and can be embedded in a script where a third-party product is being installed.
Load the craypkg-gen module:
admin# source /opt/cray/pe/modules/default/init/bash
admin# module use /opt/cray/pe/modulefiles
admin# module load craypkg-gen
Generate module and set default scripts for products:
AMD Optimizing C/C++ Compiler (requires craypkg-gen >= 1.3.16):
admin# craypkg-gen -m /opt/AMD/aocc-compiler-<MODULE_VERSION>/
NVIDIA HPC SDK (requires craypkg-gen >= 1.3.16):
admin# craypkg-gen -m /opt/nvidia/hpc_sdk/Linux_x86_64/<MODULE_VERSION>/
Intel oneAPI
admin# craypkg-gen -m /opt/intel/oneapi/compilers/<MODULE_VERSION>/
Note: The Intel compiler must be installed in a directory or a symbolic link that follows the <PREFIX>/oneapi/compiler/<VERSION> format before craypkg-gen can create an Intel modulefile. The craypkg-gen utility creates the intel, intel-classic, and intel-oneapi modulefiles after the process completes successfully.
AMD ROCm:
admin# craypkg-gen -m /opt/rocm-<MODULE_VERSION>
Run a set default script:
admin# /opt/admin-pe/set_default_craypkg/set_default_<MODULE_NAME>_<MODULE_VERSION>
Lmod Custom Dynamic Hierarchy
Lmod enables a user to dynamically modify their user environment through Lua modules. The CPE implementation of Lmod capitalizes on its hierarchical structure, including the Lmod module auto-swapping functionality. This capability means that module dependencies determine the branches of the tree-like hierarchy. Lmod allows static and dynamic hierarchical module paths. Lmod provides full support for static paths, which build the hierarchy based on the current set of modules loaded. Alongside static paths, CPE implements dynamic paths for a subset of the Lmod hierarchy (compilers, networks, CPUs, and MPIs). Dynamic paths give an advanced level of flexibility for detecting multiple dependency paths and allow custom paths to join existing Lmod hierarchy in CPE without modifying customer modulefiles.
Static Lmod Hierarchy
Modules dependent on one or more modules being loaded are not visible to
a user until their prerequisite modules are loaded. When the
prerequisite modules are loaded, it adds the static paths of the
dependent modules to the MODULEPATH
environment variable, thereby
exposing the dependent modules to the user. For more detailed
information on the Lmod static module hierarchy, consult the User
Guide for
Lmod.
Dynamic Lmod Hierarchy
The CPE Lmod custom dynamic hierarchy abbreviates the overall Lmod
hierarchy tree by relying on compatibility and not directly on a
prerequisite version. Therefore, dependent modules do not need to exist
in a new branch every time their prerequisite modules change versions.
Instead, dynamic paths use a compatibility version that increases when a
new prerequisite module version breaks compatibility in some way. The
number following the path alias of the module (for example, 1.0 in x86-rome/1.0 and ofi/1.0) identifies the compatible version.
Module Path Aliases and Current Compatibility Versions
Compatible versions listed in the following tables include the minimum supported versions.
Compiler | RHEL Module Alias/Compatible Version | SLES Module Alias/Compatible Version
---|---|---
 | amd/4.0 | amd/4.0
 | crayclang/16.0 | crayclang/17.0
 | gnu/10.0 | gnu/12.0
 | aocc/4.1 | aocc/4.1
 | intel/2023.2 | intel/2023.2
 | nvidia/20 | nvidia/20
 | nvidia/23.11 | nvidia/23.11
Network | Module Alias/Compatible Version
---|---
 | none/1.0
 | ofi/1.0
 | ucx/1.0
CPU | Module Alias/Compatible Version
---|---
 | x86-milan/1.0
 | x86-rome/1.0
 | x86-trento/1.0
MPI | Module Alias/Compatible Version
---|---
 | cray-mpich/8.0
 | cray-mpich/8.0
 | cray-mpich/8.0
 | cray-mpich/8.0
 | cray-mpich/8.0
 | cray-mpich/8.0
Custom Dynamic Hierarchy
The CPE custom dynamic hierarchy extension allows custom module paths to join the existing CPE Lmod hierarchy implementation without modifying customer modulefiles. The custom dynamic module types that CPE supports include:
Compiler
Network
CPU
MPI
Compiler/Network
Compiler/CPU
Compiler/Network/CPU/MPI
As each custom dynamic module type loads, a handshake occurs using
special pre-defined environment variables. When all hierarchical
prerequisites are met, the paths of the dependent modulefiles are added
to the MODULEPATH
environment variable, thereby exposing the dependent
modules to the user.
For Lmod to assist a user optimally, load the compiler, network, CPU, and MPI module. Lmod cannot detect modules hidden in dynamic paths without one of each type of module being loaded.
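For example, loading one module of each type (module names taken from the examples elsewhere in this guide) gives Lmod a complete set of prerequisites to resolve the dynamic paths against:
# module load cce craype-network-ofi craype-x86-rome cray-mpich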
Create a custom dynamic hierarchy
PREREQUISITES
Set Lmod as the default module handling system before initiating this procedure.
OBJECTIVE
For the CPE custom dynamic hierarchy to detect the desired Lmod module path, one or more custom dynamic environment variables must be created according to the requirements defined within this procedure.
PROCEDURE
To create a custom dynamic environment variable:
Begin the environment variable name with LMOD_CUSTOM_.
Append the descriptor of the module type that the environment variable will represent. The module types and descriptors are:
Module Type | Descriptor
---|---
Compiler | COMPILER_
Network | NETWORK_
CPU | CPU_
MPI | MPI_
Compiler/Network | COMNET_
Compiler/CPU | COMCPU_
Compiler/Network/CPU/MPI | CNCM_
Example: The custom dynamic environment variable for the combined compiler and CPU module begins with LMOD_CUSTOM_COMCPU_.
Following the descriptor, append all prerequisite module aliases along with their respective compatible versions. See Module Path Aliases and Current Compatibility Versions for more information. The format of the module path alias/compatible version string for each module type is shown below.
Module Type: Module Path Alias/Compatible Version String
Compiler: <compiler_name>/<compatible_version>
Network: <network_name>/<compatible_version>
CPU: <cpu_name>/<compatible_version>
MPI: <compiler_name>/<compatible_version>/<network_name>/<compatible_version>/<mpi_name>/<compatible_version>
Compiler/Network: <compiler_name>/<compatible_version>/<network_name>/<compatible_version>
Compiler/CPU: <compiler_name>/<compatible_version>/<cpu_name>/<compatible_version>
Compiler/Network/CPU/MPI: <compiler_name>/<compatible_version>/<network_name>/<compatible_version>/<cpu_name>/<compatible_version>/<mpi_name>/<compatible_version>
To create an acceptably formatted environment variable name, replace all slashes and dots in the module alias/compatible version string with underscores. Also, all letters must be in uppercase format.
Example Module Path Alias/Compatible Version Strings:
Compiler = cce
The path alias/compatible version string (values found in Module Path Aliases and Current Compatibility Versions) is crayclang/10.0; therefore, the text added to the environment variable name is CRAYCLANG_10_0.
Network = craype-network-ofi
The path alias/compatible version string is ofi/1.0; therefore, the environment variable text is OFI_1_0.
CPU = craype-x86-rome
The path alias/compatible version string is x86-rome/1.0; therefore, the environment variable text is X86_ROME_1_0.
MPI = cray-mpich
cray-mpich has two prerequisite module types (compiler and network). Therefore, the environment variable must include the alias/compatible version for the desired compiler, network, and MPI. For a cray-mpich module dependent on cce and craype-network-ofi, the path alias/compatible version string is crayclang/10.0/ofi/1.0/cray-mpich/8.0; therefore, the environment variable text is CRAYCLANG_10_0_OFI_1_0_CRAY_MPICH_8_0.
Compiler/Network = cce with craype-network-ofi
The path alias/compatible version string is crayclang/10.0/ofi/1.0; therefore, the environment variable text is CRAYCLANG_10_0_OFI_1_0.
Compiler/CPU = cce with craype-x86-rome
The path alias/compatible version string is crayclang/10.0/x86-rome/1.0; therefore, the environment variable text is CRAYCLANG_10_0_X86_ROME_1_0.
Compiler/Network/CPU/MPI = cce, craype-network-ofi, craype-x86-rome, and cray-mpich
The path alias/compatible version string is crayclang/10.0/ofi/1.0/x86-rome/1.0/cray-mpich/8.0; therefore, the environment variable text is CRAYCLANG_10_0_OFI_1_0_X86_ROME_1_0_CRAY_MPICH_8_0.
Append _PREFIX following the final module/compatibility text instance.
Example: Network = craype-network-ofi
The custom dynamic environment variable is LMOD_CUSTOM_NETWORK_OFI_1_0_PREFIX.
Creation of the custom dynamic environment variable is now complete.
Add the custom dynamic environment variable to the user environment by exporting it with its value set to the Lmod module path:
# export LMOD_CUSTOM_NETWORK_OFI_1_0_PREFIX=<lmod_module_path>
Example: Network = craype-network-ofi
All modulefiles in <lmod_module_path> are shown to users whenever craype-network-ofi is loaded.
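As a further hedged sketch, the same pattern covers the other module types. The following exposes a site-provided compiler module path whenever a compiler matching the crayclang/17.0 alias (cce on SLES, per the table above) is loaded; substitute the site's own Lmod module path for the placeholder:
# export LMOD_CUSTOM_COMPILER_CRAYCLANG_17_0_PREFIX=<lmod_module_path>
All modulefiles in <lmod_module_path> then become visible whenever the matching cce compiler module is loaded.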