About the HPE CPE installation guide: HPCM on HPE Cray Supercomputing EX and HPE Cray Supercomputing systems

The HPE Cray Programming Environment Installation Guide: HPCM on HPE Cray Supercomputing EX and HPE Cray Supercomputing Systems (S-8022) includes procedures to install HPE Cray Programming Environment (CPE) and Parallel Application Launch Service (PALS) with HPE Performance Cluster Manager (HPCM) on HPE Cray Supercomputing EX and HPE Cray Supercomputing Systems.

This publication is intended for system administrators receiving their first release of this product or upgrading from a previous release. The information assumes that the administrator has a good understanding of Linux system administration and HPCM.

Release information

This publication supports installing CPE 24.07 on HPE Cray Supercomputing EX systems with HPCM 1.11, HPE Cray Supercomputing Operating System (COS) 24.7 (COS Base 3.1.0/USS 1.1.0), and:

  • SLES15 SP5 (X86), or

  • SLES15 SP5 (AArch64).

This publication also supports installing CPE 24.07 on HPE Cray Supercomputing EX systems with HPCM 1.11 and:

  • RHEL 8.10,

  • RHEL 9.4 (X86), or

  • RHEL 9.4 (AArch64).

COS 23.11 (and later) comprises the following components:

  • COS Base

  • HPE Cray Supercomputing User Services Software (USS)

  • HPE SUSE Linux Enterprise Server

Variable substitutions

Use the following variable substitutions throughout the included procedures.

  • <CPE_RELEASE> = 24.07

  • <CPE_VERSION> = 24.07

  • <spX> or <SPX> = sp5 or SP5 (applicable to systems running HPE Cray Supercomputing Operating System (COS) 24.7)

  • <RHELX-X> = rhel-8.10

  • <RHELX-X> = rhel-9.4 (applicable to systems running X86 or AArch64)
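For example, with <CPE_RELEASE> = 24.07, <CPE_VERSION> = 24.07, and <spX> = sp5, the SLES tarball name used in the installation procedures resolves to the following (shown only to illustrate the substitutions):

    cpe-24.07-sles15-sp5-hpcm-24.07.tar.gz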

Record of revision

New in the CPE 24.07 publication

New in the CPE 24.03 publication

New in the CPE 23.12 publication

New in the CPE 23.09 publication

New in the CPE 23.05 publication

  • Updated the Release information and Variable substitutions sections.

  • Updated the Configure PBS on SLES for systems with Slingshot and HPE 200GB NICs and Configure PBS on RHEL for systems with Slingshot and HPE 200GB NICs sections. The procedures now clarify node types in the instructions.

  • Updated instructions on how to configure vnid, palsd, and hodagd in the Configure PBS on RHEL for systems with Slingshot and HPE 200GB NICs section.

  • Updated the procedure in the Generating Slurm topology configuration section.

  • Updated the procedure in the Create modulefiles for third-party products section.

  • Added the new Troubleshooting slurmctld segfault on startup section.

  • Added the new Installation Prerequisites section.

  • Added the new Enabling PALS to use the Spindle package section.

  • Added the new Configuring Slingshot traffic classes in PBS section.

  • Updated the compatible compiler version table in the Module Path Aliases and Current Compatibility Versions section. Added the SLES compatible compiler versions alongside the RHEL compiler versions and updated the RHEL cce and gcc compatible compiler versions.

  • Made minor edits throughout the guide.

New in the CPE 23.02 (Rev. A) publication

  • Updated the Create modulefiles for third-party products section to provide additional detail on loading the craypkg-gen utility for Intel systems.

  • Added the Enabling PALS to use the Spindle package section.

  • Updated the Configure PBS on RHEL for systems with Slingshot and HPE 200GB NICs and Configure PBS on SLES for Systems with Slingshot and HPE 200GB NICs sections.

New in the CPE 23.02 publication

  • Updated the SwitchParameters option listing and resources listing in Configuring Slurm for Systems with Slingshot and HPE 200GB NICs.

Publication Title | Date
HPE Cray Programming Environment Installation Guide: HPCM on HPE Cray Supercomputing EX and HPE Cray Supercomputing Systems (24.07) S-8022 | August 2024
HPE Cray Programming Environment Installation Guide: HPCM on HPE Cray EX and HPE Cray Supercomputer Systems (24.03) S-8022 | May 2024
HPE Cray Programming Environment Installation Guide: HPCM on HPE Cray EX and HPE Cray Supercomputer Systems (23.12) S-8022 | December 2023
HPE Cray Programming Environment Installation Guide: HPCM on HPE Cray EX and HPE Cray Supercomputer Systems (23.09) S-8022 | September 2023
HPE Cray Programming Environment Installation Guide: HPCM on HPE Cray EX and HPE Cray Supercomputer Systems (23.05) S-8022 | June 2023
HPE Cray Programming Environment Installation Guide: HPCM on HPE Cray EX and HPE Cray Supercomputer Systems (23.02-Rev A) S-8022 | March 2023
HPE Cray Programming Environment Installation Guide: HPCM on HPE Cray EX and HPE Cray Supercomputer Systems (23.02) S-8022 | February 2023
HPE Cray Programming Environment Installation Guide: HPCM on HPE Cray EX and HPE Cray Supercomputer Systems (22.12) S-8022 | December 2022
HPE Cray Programming Environment Installation Guide: HPCM on HPE Cray EX and HPE Cray Supercomputer Systems (22.11) S-8022 | November 2022
HPE Cray Programming Environment Installation Guide: HPCM on HPE Cray EX and HPE Cray Supercomputer Systems (22.10) S-8022 | October 2022
HPE Cray Programming Environment Installation Guide: HPCM on HPE Cray EX and HPE Cray Supercomputer Systems (22.09) S-8022 | September 2022
HPE Cray Programming Environment Installation Guide: HPCM on HPE Cray EX and HPE Cray Supercomputer Systems (22.08 Rev A) S-8022 | August 2022
HPE Cray Programming Environment Installation Guide: HPCM on HPE Cray EX and HPE Cray Supercomputer Systems (22.08) S-8022 | August 2022
HPE Cray Programming Environment Installation Guide: HPCM on HPE Cray EX and HPE Cray Supercomputer Systems (22.06) S-8022 | June 2022
HPE Cray Programming Environment Installation Guide: HPCM on HPE Cray EX and HPE Cray Supercomputer Systems (22.05) S-8022 | May 2022
HPE Cray Programming Environment Installation Guide: HPCM on HPE Cray EX and HPE Cray Supercomputer Systems (22.04) S-8022 | April 2022
HPE Cray Programming Environment Installation Guide: HPCM on HPE Cray EX and HPE Cray Supercomputer Systems (22.03) S-8022 | March 2022
HPE Cray Programming Environment Installation Guide: HPCM on HPE Cray EX and HPE Cray Supercomputer Systems (22.02) S-8022 | February 2022
HPE Cray Programming Environment Installation Guide: HPCM on HPE Cray EX and HPE Cray Supercomputer Systems (21.02 - 21.12) S-8022 | Feb - Dec 2021

Downloading HPE Cray Supercomputing EX software

To download HPE Cray Supercomputing EX software, refer to the HPE Support Center or download it directly from My HPE Software Center. The HPE Support Center contains a wealth of documentation, training videos, knowledge articles, and alerts for HPE Cray Supercomputing EX systems. It provides the most detailed information about a release as well as direct links to product firmware, software, and patches available through My HPE Software Center.

Downloading the software through the HPE Support Center

HPE recommends downloading software through the HPE Support Center because of the many other resources available on the website.

  1. Visit the HPE Cray Supercomputing EX product page on the HPE Support Center.

  2. Search for specific product info, such as the full software name or recipe name and version.

    For example, search for “Slingshot 2.1” or “Cray System Software with CSM 24.3.0.”

  3. Find the desired software in the search results and select it to review details.

  4. Select Obtain Software and select Sign in Now when prompted.

    If a customer’s Entitlement Order Number (EON) is tied to specific hardware rather than software, the software is available without providing account credentials. Access the software instead by selecting Download Software and skip the next step in this procedure.

  5. Enter account credentials when prompted and accept the HPE License Terms.

    To download software, customers must ensure their Entitlement Order Number (EON) is active under My Contracts & Warranties on My HPE Software Center. If customers have trouble with the EON or are not entitled to a product, they must contact their HPE contract administrator or sales representative for assistance.

  6. Choose the needed software and documentation files to download and select curl Copy to access the files.

    Just like the software files, the documentation files change with each release. In addition to the official documentation, valuable information for a release is often available in files that include the phrase README in their name. Be sure to select and review these files in detail.

    HPE recommends the curl Copy option, which downloads a single text file with curl commands to use on the desired system. You must run the curl commands within 24 hours of downloading them or download new commands if more than 24 hours have passed.

    To validate the security of the downloads, you can later compare the files on the desired system against the checksums provided by HPE underneath each selected download (see the checksum example after this procedure).

  7. Save the text file to a central location.

  8. On the system where the software will be downloaded, execute the text file that contains the curl commands as a shell script.

    For example:

    ncn-m001# bash -x <TEXT_FILE_PATH>
    

    The -x option in this example prints each command in the text file as it executes, letting you track the progress of the curl downloads.
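To check a downloaded file against the checksum HPE provides underneath each selected download, use a standard checksum utility on the system. A minimal sketch, assuming the published checksum is SHA-256 (confirm the algorithm shown on the download page):

    ncn-m001# sha256sum <DOWNLOADED_FILE>

Compare the printed value with the checksum listed for that download.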

Downloading the software directly from the My HPE Software Center

Users already familiar with a release can save time by downloading software directly from My HPE Software Center.

  1. Visit My HPE Software Center and select Sign in.

  2. Enter account credentials when prompted and select Software in the left navigation bar.

  3. Search for specific product info, such as the full software name or recipe name and version.

    For example, search for “Slingshot 2.1” or “Cray System Software with CSM 24.3.0.”

  4. Find the desired software in the search results and review details by selecting Product Details under the Action column.


  5. Select Go To Downloads Page and accept the HPE License Terms.

    To download software, customers must ensure their Entitlement Order Number (EON) is active under My Contracts & Warranties. If customers have trouble with the EON or are not entitled to a product, they must contact their HPE contract administrator or sales representative for assistance.

  6. Choose the needed software and documentation files to download and select curl Copy to access the files.

    Just like the software files, the documentation files change with each release. In addition to the official documentation, valuable information for a release is often available in files that include the phrase README in their name. Be sure to select and review these files in detail.

    HPE recommends the curl Copy option, which downloads a single text file with curl commands to use on the desired system. You must run the curl commands within 24 hours of downloading them or download new commands if more than 24 hours have passed.

    To validate the security of the downloads, you can later compare the files on the desired system against the checksums provided by HPE underneath each selected download.

  7. Save the text file to a central location.

  8. On the system where the software will be downloaded, execute the text file that contains the curl commands as a shell script.

    For example:

    ncn-m001# bash -x <TEXT_FILE_PATH>
    

    The -x option in this example prints each command in the text file as it executes, letting you track the progress of the curl downloads.

Installation Prerequisites

For systems that include Scalable Unit (SU) leaders, HPE supports only NFS for CPE deployments using leader aliases. HPCM does not support exporting the filesystem from the admin node if leaders are in place, so every node that subsequently deploys CPE must have a leader IP alias assigned. Use the scalable bittorrent transport to accommodate these requirements. Compute nodes on an HPE Cray Supercomputing EX system are typically assigned SU leader aliases during auto-discovery; any other nodes must be configured manually. Multiple nodes can be managed together by passing a list or range of nodes to -n, as shown in the example after these steps.

  1. Set the transport for service nodes to bittorrent.

    IMPORTANT: This setup is required even if you are not reinstalling the node. The running node is not affected; the setting takes effect only when the node is provisioned again.

    admin# cm node set --transport bittorrent -n n1

  2. Create the bittorrent tarball if it is not available yet:

    admin# cm image refresh --bittorrent -i ubu-22.04.1

  3. Get the IP address of an SU leader node. The following example uses leader1, but any SU leader node can be used:

    admin# ssh leader1 ctdb ip

  4. Pick one of the IP addresses returned in the previous step, and assign it to the nodes:

    admin# cm node set -n n1 --su-leader 172.23.255.1
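As noted above, multiple nodes can be configured in a single command by passing a list or range of nodes to -n. A minimal sketch, assuming your HPCM release accepts a comma-separated node list (check cm node set --help for the exact list/range syntax your release supports):

    admin# cm node set --transport bittorrent -n n1,n2,n3
    admin# cm node set --su-leader 172.23.255.1 -n n1,n2,n3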

Install HPE Cray Programming Environment on SUSE Linux Enterprise Server

OBJECTIVE

This procedure provides instructions for installing the Cray Programming Environment (CPE) and, optionally, setting Lmod as the default module handling system.

See the Release information section for systems running Slurm.

OPTIONAL

  • For HPE Cray Supercomputing EX or HPE Cray Supercomputing systems with GPU compute nodes that are not running COS, the rocm/x.x.x or cudatoolkit/x.x.x GPU toolkit modulefile is required. See the GPU Toolkit Modulefile Templates for Cray PE for environment modulefiles and pkg-config file templates.

  • Systems running COS typically have GPU toolkit modulefiles pre-installed and ready for use.

PROCEDURE

  1. Before you begin:

    1. Obtain the required cpe-<CPE_RELEASE>-sles15-<spX>-hpcm-<CPE_VERSION>.tar.gz files.

    2. Enable repositories (for the installation of cpe-support), including:

      • SLE Module Basesystem

      • SLE Module HPC

    3. Note that to use UCX with Cray-MPICH:

      • Cray-MPICH using the UCX netmod is supported on SLES 15 <SPX> and other systems with the HPCM installer.

      • HPE does not distribute UCX directly.

      • Mellanox provides a UCX solution as part of its HPC-X software toolkit; this solution is the recommended path. Open-source and Linux distribution packages provide a functional, although not necessarily performant, alternative.

  2. Extract the CPE tarball:

    admin# tar xf cpe-<CPE_RELEASE>-sles15-<spX>-hpcm-<CPE_VERSION>.tar.gz 
    
  3. Run the install.sh script, passing additional compiler environments to be installed as arguments:

    admin# cpe-<CPE_RELEASE>-sles15-<spX>/install.sh <amd> <aocc> <intel> <nvidia> 
    
  4. Enable the CPE repository:

    admin# cm repo select cpe-<CPE_RELEASE>-sles15-<spX>
    
  5. Import the RPM public key:

    admin# rpm --root /opt/clmgr/image/images/<IMAGE_NAME> \
    --import cpe-<CPE_RELEASE>-sles15-<spX>/*.asc
    
  6. Install the cm-pe-integration and cpe-support RPMs, included in the CPE repository, into the image:

    admin# cm image zypper -i <IMAGE_NAME> install cm-pe-integration cpe-support
    

    (Use the cm image show command to display available images.)

  7. (Optional) Modify /opt/clmgr/image/images/<IMAGE_NAME>/etc/cray-pe.d/pe_releases to indicate which CPE releases to install. The first release in the list is the default. For example, the following contents indicate that CPE 21.09 and 21.07 are to be installed with 21.09 as the default release:

    21.09
    21.07
    
  8. (Optional) The default CPE Module Environment system and Lmod are mutually exclusive; only one can run on a system. To set Lmod sitewide as the default module handling system in the image:

    1. In /opt/clmgr/image/images/<IMAGE_NAME>/etc/cray-pe.d/cray-pe-configuration.csh, change set module_prog = environment modules to set module_prog = lmod

    2. In /opt/clmgr/image/images/<IMAGE_NAME>/etc/cray-pe.d/cray-pe-configuration.sh, change module_prog = environment modules to module_prog = lmod
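    Rather than editing the two configuration files by hand, the change can be scripted. A minimal sketch using standard sed, assuming the files contain the lines exactly as quoted in the previous step:

    admin# sed -i 's/module_prog = environment modules/module_prog = lmod/' \
    /opt/clmgr/image/images/<IMAGE_NAME>/etc/cray-pe.d/cray-pe-configuration.csh \
    /opt/clmgr/image/images/<IMAGE_NAME>/etc/cray-pe.d/cray-pe-configuration.sh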

Complete CPE Installation on SLES

  1. Disable the CPE repository:

    admin# cm repo unselect cpe-<CPE_RELEASE>-sles15-<spX>
    
  2. Update revision history with a comment:

    admin# cm image revision commit -i <IMAGE_NAME> -m "Update CPE to <CPE_RELEASE>"
    
  3. Activate the image for disk-less nodes only:

    admin# cm image activate -i <IMAGE_NAME>
    
  4. Verify the CPE installation:

    1. Reboot one compute node:

      admin# cm node provision -i <IMAGE_NAME> -n nid0001
      
    2. Connect to the booted compute node, and verify that CPE modules are loaded.

      Example:

      admin# ssh nid0001
      nid0001# module list
      Currently Loaded Modules:
      1) craype-x86-rome            5) cce/17.0.0             9) cray-libsci/23.12.5
      2) libfabric/1.15.2.0         6) craype/2.7.30         10) PrgEnv-cray/8.5.0
      3) craype-network-ofi         7) cray-dsmml/0.2.2
      4) perftools-base/23.12.0     8) cray-mpich/8.1.28
      

      See the HPE HPCM release announcement for current CPE release product versions.

  5. Reboot the remaining compute nodes on the system if the installation appears correct.
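If additional compute nodes share the same image, they can be reprovisioned with the same command. A sketch, assuming cm node provision accepts the same node list/range syntax for -n as cm node set (confirm with cm node provision --help):

    admin# cm node provision -i <IMAGE_NAME> -n <NODE_LIST>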

Install HPE Cray Programming Environment on Red Hat Enterprise Linux

OBJECTIVE

This procedure provides instructions for installing the Cray Programming Environment (CPE) and, optionally, setting Lmod as the default module handling system.

OPTIONAL

  • For HPE Cray Supercomputing EX or HPE Cray Supercomputing systems with GPU compute nodes that are not running the Cray Operating System (COS), the rocm/x.x.x or cudatoolkit/x.x.x GPU toolkit modulefile is required. Refer to the GPU Toolkit Modulefile Templates for Cray PE, which provides environment modulefiles and pkg-config file templates.

  • Systems running COS typically have GPU toolkit modulefiles pre-installed and ready for use.

IMPORTANT: Throughout this procedure, replace instances of:

  • <CPE_RELEASE>

  • <CPE_VERSION>

  • <RHELX-X>

with the values specified in Release Information.

PROCEDURE

  1. Before you begin:

    1. Obtain the cpe-<CPE_RELEASE>-<RHELX-X>-hpcm-<CPE_VERSION>.tar.gz files.

    2. Enable repositories (for the installation of cpe-support), including:

      • RHEL BaseOS

      • RHEL AppStream

      • RHEL CRB

    3. Note that to use UCX with Cray-MPICH:

      • Cray-MPICH using the UCX netmod is supported on RHEL <RHELX-X> systems with the HPCM installer.

      • HPE does not distribute UCX directly.

      • Mellanox provides a UCX solution as part of its HPC-X software toolkit; this solution is the recommended path. Open-source and Linux distribution packages provide a functional, although not necessarily performant, alternative.

  2. Extract the CPE tarball:

    admin# tar xf cpe-<CPE_RELEASE>-<RHELX-X>-hpcm-<CPE_VERSION>.tar.gz
    
  3. Run the install.sh script, passing the compiler environments to be installed as arguments:

    admin# cpe-<CPE_RELEASE>-<RHELX-X>/install.sh <amd> <aocc> <intel> <nvidia>
    
  4. Enable the CPE repository:

    admin# cm repo select cpe-<CPE_RELEASE>-<RHELX-X>
    
  5. Import the RPM public key:

    admin# rpm --root /opt/clmgr/image/images/<IMAGE_NAME> \
    --import cpe-<CPE_RELEASE>-<RHELX-X>/*.asc
    
  6. Install the cm-pe-integration and cpe-support RPMs, included in the CPE repository, into the image:

    admin# cm image dnf -i <IMAGE_NAME> install cm-pe-integration cpe-support
    

    (Use the cm image show command to display available images.)

  7. (Optional) Modify /opt/clmgr/image/images/<IMAGE_NAME>/etc/cray-pe.d/pe_releases to indicate which CPE releases to install. The first release in the list is the default. For example, the following contents indicate that CPE 21.09 and 21.07 are to be installed with 21.09 as the default release:

    21.09
    21.07
    
  8. (Optional) The default CPE Module Environment system and Lmod are mutually exclusive; only one can run on a system. To set Lmod sitewide as the default module handling system in the image:

    1. In /opt/clmgr/image/images/<IMAGE_NAME>/etc/cray-pe.d/cray-pe-configuration.csh, change set module_prog = environment modules to set module_prog = lmod

    2. In /opt/clmgr/image/images/<IMAGE_NAME>/etc/cray-pe.d/cray-pe-configuration.sh, change module_prog = environment modules to module_prog = lmod

Complete CPE Installation on RHEL

  1. Disable the CPE repository:

    admin# cm repo unselect cpe-<CPE_RELEASE>-<RHELX-X>
    
  2. Update revision history with a comment:

    admin# cm image revision commit -i <IMAGE_NAME> -m "Update CPE to <CPE_RELEASE>"
    
  3. Activate the image for disk-less nodes only:

    admin# cm image activate -i <IMAGE_NAME>
    
  4. Verify the CPE installation:

    1. Reboot one compute node:

      admin# cm node provision -i <IMAGE_NAME> -n nid0001
      
    2. Connect to the booted compute node, and verify that CPE modules are loaded.

      Example:

      admin# ssh nid0001
      nid0001# module list
      Currently Loaded Modules:
      1) craype-x86-rome            5) cce/17.0.0             9) cray-libsci/23.12.5
      2) libfabric/1.15.2.0         6) craype/2.7.30         10) PrgEnv-cray/8.5.0
      3) craype-network-ofi         7) cray-dsmml/0.2.2
      4) perftools-base/23.12.0     8) cray-mpich/8.1.28
      

      See the HPE HPCM release announcement for current CPE release product versions.

  5. Reboot the remaining compute nodes on the system if the installation appears correct.

Configuring the ATP Slurm SPANK plugin (Conditional)

ATP requires a Slurm plugin to start analysis tools alongside job launches. Before HPE CPE 22.10, ATP included a global Slurm plugin file. This global plugin file had to be recompiled to match the updated Slurm plugin API when Slurm was updated. To resolve this requirement, since HPE CPE 22.10, ATP has been designed to build and configure its plugin as part of the module loading process instead of relying on a single global plugin. If your Slurm system is configured to use the global ATP Slurm plugin and job launches are working as expected, it is not necessary to remove it from the system configuration.

Note that the following error might occur:

srun: error: spank: /opt/cray/pe/atp/libAtpDispatch.so: Incompatible plugin version

If you see the above error when running Slurm jobs, remove the include /etc/plugstack.conf.d/* line from your Slurm plugin configuration file to disable the global ATP plugin. This modification disables the use of the potentially outdated ATP plugin. Users will still have the correct plugin built and configured when loading the ATP module.
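For reference, a sketch of the relevant line in a Slurm plugin stack configuration file. The path /etc/slurm/plugstack.conf is an assumption; the file name and location may differ at your site:

    # /etc/slurm/plugstack.conf
    # Remove or comment out the following line to disable the global ATP plugin:
    #include /etc/plugstack.conf.d/*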

Create Modulefiles for third-party products

PREREQUISITES

Download and install third-party packages before initiating this procedure.

OBJECTIVE

These instructions describe how to create a modulefile for third-party products and use craypkg-gen to create a modulefile for a specific version of a supported third-party product. This usage allows a site to set a specific version as the default.

PROCEDURE

The following steps are necessary and can be embedded in a script that installs a third-party product; a sketch of such a script follows the steps.

  1. Load the craypkg-gen module:

    admin# source /opt/cray/pe/modules/default/init/bash
    admin# module use /opt/cray/pe/modulefiles
    admin# module load craypkg-gen
    
  2. Generate module and set default scripts for products:

    AMD Optimizing C/C++ Compiler (requires craypkg-gen >= 1.3.16)

    admin# craypkg-gen -m /opt/AMD/aocc-compiler-<MODULE_VERSION>/
    

    NVIDIA HPC SDK (requires craypkg-gen >= 1.3.16)

    admin# craypkg-gen -m /opt/nvidia/hpc_sdk/Linux_x86_64/<MODULE_VERSION>/
    

    Intel oneAPI

    admin# craypkg-gen -m /opt/intel/oneapi/compilers/<MODULE_VERSION>/
    

    Note: The Intel compiler must be installed in a directory or a symbolic link that follows the <PREFIX>/oneapi/compiler/<VERSION> format before craypkg-gen can create an Intel modulefile. The craypkg-gen utility creates the intel, intel-classic, and intel-oneapi modulefiles after the process completes successfully.

    AMD ROCm

    admin# craypkg-gen -m /opt/rocm-<MODULE_VERSION>
    
  3. Run the corresponding set default script:

    admin# /opt/admin-pe/set_default_craypkg/set_default_<MODULE_NAME>_<MODULE_VERSION>
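A minimal sketch of embedding these steps in an installation script, using AOCC as an example. The version 4.1.0 and the resulting set_default script name are hypothetical; use the values produced on your system:

    #!/bin/bash
    # Generate a modulefile for a newly installed AOCC and set it as the default.
    source /opt/cray/pe/modules/default/init/bash
    module use /opt/cray/pe/modulefiles
    module load craypkg-gen
    craypkg-gen -m /opt/AMD/aocc-compiler-4.1.0/
    /opt/admin-pe/set_default_craypkg/set_default_aocc_4.1.0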
    

Lmod Custom Dynamic Hierarchy

Lmod enables a user to dynamically modify their user environment through Lua modules. The CPE implementation of Lmod capitalizes on its hierarchical structure, including the Lmod module auto-swapping functionality. This capability means that module dependencies determine the branches of the tree-like hierarchy. Lmod allows static and dynamic hierarchical module paths. Lmod provides full support for static paths, which build the hierarchy based on the current set of modules loaded. Alongside static paths, CPE implements dynamic paths for a subset of the Lmod hierarchy (compilers, networks, CPUs, and MPIs). Dynamic paths give an advanced level of flexibility for detecting multiple dependency paths and allow custom paths to join the existing Lmod hierarchy in CPE without modifying customer modulefiles.

Static Lmod Hierarchy

Modules that depend on one or more other modules are not visible to a user until their prerequisite modules are loaded. Loading the prerequisite modules adds the static paths of the dependent modules to the MODULEPATH environment variable, thereby exposing the dependent modules to the user. For more detailed information on the Lmod static module hierarchy, consult the User Guide for Lmod.

Dynamic Lmod Hierarchy

The CPE Lmod custom dynamic hierarchy abbreviates the overall Lmod hierarchy tree by relying on compatibility and not directly on a prerequisite version. Therefore, dependent modules do not need to exist in a new branch every time their prerequisite modules change versions. Instead, dynamic paths use a compatibility version that increases when a new prerequisite module version breaks compatibility in some way. The number following the path alias of the module (for example, 1.0 in x86-rome/1.0 and ofi/1.0) identifies the compatible version.

Module Path Aliases and Current Compatibility Versions

Compatible versions listed in the following tables include the minimum supported versions.

Compiler | RHEL Module Alias/Compatible Version | SLES Module Alias/Compatible Version
amd | amd/4.0 | amd/4.0
cce | crayclang/16.0 | crayclang/17.0
gcc | gnu/10.0 | gnu/12.0
aocc | aocc/4.1 | aocc/4.1
intel | intel/2023.2 | intel/2023.2
nvidia (x86) | nvidia/20 | nvidia/20
nvidia (aarch64) | nvidia/23.11 | nvidia/23.11

Network | Module Alias/Compatible Version
craype-network-none | none/1.0
craype-network-ofi | ofi/1.0
craype-network-ucx | ucx/1.0

CPU | Module Alias/Compatible Version
craype-x86-milan | x86-milan/1.0
craype-x86-rome | x86-rome/1.0
craype-x86-trento | x86-trento/1.0

MPI | Module Alias/Compatible Version
cray-mpich | cray-mpich/8.0
cray-mpich-abi | cray-mpich/8.0
cray-mpich-abi-pre-intel-5.0 | cray-mpich/8.0
cray-mpich-ucx | cray-mpich/8.0
cray-mpich-ucx-abi | cray-mpich/8.0
cray-mpich-ucx-abi-pre-intel-5.0 | cray-mpich/8.0

Custom Dynamic Hierarchy

The CPE custom dynamic hierarchy extension allows custom module paths to join the existing CPE Lmod hierarchy implementation without modifying customer modulefiles. The custom dynamic module types that CPE supports include:

  • Compiler

  • Network

  • CPU

  • MPI

  • Compiler/Network

  • Compiler/CPU

  • Compiler/Network/CPU/MPI

As each custom dynamic module type loads, a handshake occurs using special pre-defined environment variables. When all hierarchical prerequisites are met, the paths of the dependent modulefiles are added to the MODULEPATH environment variable, thereby exposing the dependent modules to the user.

For Lmod to assist a user optimally, load the compiler, network, CPU, and MPI modules. Lmod cannot detect modules hidden in dynamic paths without one module of each type being loaded.
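For example, a user can load one module of each type in a single command. This sketch uses module names from the tables above and assumes a standard CPE environment where these modules are available:

    nid0001# module load cce craype-network-ofi craype-x86-rome cray-mpich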

Create a custom dynamic hierarchy

PREREQUISITES

Set Lmod as the default module handling system before initiating this procedure.

OBJECTIVE

For the CPE custom dynamic hierarchy to detect the desired Lmod module path, one or more custom dynamic environment variables must be created according to the requirements defined within this procedure.

PROCEDURE

To create a custom dynamic environment variable:

  1. Begin the environment variable name with LMOD_CUSTOM_.

  2. Append the descriptor of the module type that the environment variable will represent. The module types and descriptors are:

    Module Type | Descriptor
    Compiler | COMPILER_
    Network | NETWORK_
    CPU | CPU_
    MPI | MPI_
    Compiler/Network | COMNET_
    Compiler/CPU | COMCPU_
    Compiler/Network/CPU/MPI | CNCM_

    Example: The custom dynamic environment variable for the combined compiler and CPU module begins with LMOD_CUSTOM_COMCPU_.

  3. Following the descriptor, append all prerequisite module aliases along with their respective compatible versions. See Module Path Aliases and Current Compatibility Versions for more information. The format of the module path alias/compatible version string for each module type is:

    Module Type | Module Path Alias/Compatible Version String
    Compiler | <compiler_name>/<compatible_version>
    Network | <network_name>/<compatible_version>
    CPU | <cpu_name>/<compatible_version>
    MPI | <compiler_name>/<compatible_version>/<network_name>/<compatible_version>/<mpi_name>/<compatible_version>
    Compiler/Network | <compiler_name>/<compatible_version>/<network_name>/<compatible_version>
    Compiler/CPU | <compiler_name>/<compatible_version>/<cpu_name>/<compatible_version>
    Compiler/Network/CPU/MPI | <compiler_name>/<compatible_version>/<network_name>/<compatible_version>/<cpu_name>/<compatible_version>/<mpi_name>/<compatible_version>

    To create an acceptably formatted environment variable name, replace all slashes, dots, and dashes in the module alias/compatible version string with underscores, and write all letters in uppercase.

    Example Module Path Alias/Compatible Version Strings:

  • Compiler = cce

    The path alias/compatible version string (values found in Module Path Aliases and Current Compatibility Versions) is crayclang/10.0; therefore, the text added to the environment variable name is:

    CRAYCLANG_10_0

  • Network = craype-network-ofi

    The path alias/compatible version string is ofi/1.0; therefore, the environment variable text is:

    OFI_1_0

  • CPU = craype-x86-rome

    The path alias/compatible version string is x86-rome/1.0; therefore, the environment variable text is:

    X86_ROME_1_0

  • MPI = cray-mpich

    cray-mpich has two prerequisite module types (compiler and network). Therefore, the environment variable must include the alias/compatible version for the desired compiler, network, and MPI. For a cray-mpich module dependent on cce and craype-network-ofi, the path alias/compatible version string is crayclang/10.0/ofi/1.0/cray-mpich/8.0; therefore, the environment variable text is:

    CRAYCLANG_10_0_OFI_1_0_CRAY_MPICH_8_0

  • Compiler/Network = cce with craype-network-ofi

    The path alias/compatible version string is crayclang/10.0/ofi/1.0; therefore, the environment variable text is:

    CRAYCLANG_10_0_OFI_1_0

  • Compiler/CPU = cce with craype-x86-rome

    The path alias/compatible version string is crayclang/10.0/x86-rome/1.0; therefore, the environment variable text is:

    CRAYCLANG_10_0_X86_ROME_1_0

  • Compiler/Network/CPU/MPI = cce, craype-network-ofi, craype-x86-rome, and cray-mpich

    The path alias/compatible version string is crayclang/10.0/ofi/1.0/x86-rome/1.0/cray-mpich/8.0; therefore, the environment variable text is:

    CRAYCLANG_10_0_OFI_1_0_X86_ROME_1_0_CRAY_MPICH_8_0

  4. Append _PREFIX following the final module/compatibility text:

    Example: Network = craype-network-ofi

    The custom dynamic environment variable is LMOD_CUSTOM_NETWORK_OFI_1_0_PREFIX.

    Creation of the custom dynamic environment variable is now complete.

  5. Add the custom dynamic environment variable to the user environment by exporting it with its value set to the Lmod module path:

    # export LMOD_CUSTOM_NETWORK_OFI_1_0_PREFIX=<lmod_module_path>
    

    Example: Network = craype-network-ofi

    All modulefiles in <lmod_module_path> are shown to users whenever craype-network-ofi is loaded.
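As a further illustration, combining the Compiler/CPU string built earlier in this procedure with the _PREFIX suffix (the module path /opt/site/lmod/modulefiles is hypothetical):

    # export LMOD_CUSTOM_COMCPU_CRAYCLANG_10_0_X86_ROME_1_0_PREFIX=/opt/site/lmod/modulefiles

All modulefiles in that path are shown to users whenever the cce compiler and craype-x86-rome CPU modules are both loaded.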