Running XGC

Overview

Prepare three files (input, adios2cfg.xml, and petsc.rc) and place them in a run directory. Additional data files can be located in a separate directory, which is specified by the Fortran namelist parameter sml_input_file_dir=[PATH_TO_INPUT_DIR] in the input file. Create a subdirectory restart_dir in the run directory for the checkpoint-restart functionality.
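
As a minimal sketch (with placeholder paths to adjust for your own case), preparing a run directory might look like this:

    # placeholder paths -- adjust to your own run, template, and input directories
    mkdir -p /path/to/run_dir/restart_dir
    cd /path/to/run_dir
    cp /path/to/templates/input /path/to/templates/adios2cfg.xml /path/to/templates/petsc.rc .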

Preparation of a run directory

Three files are required in the run directory.

Additional files are located in the directory specified by the input parameter sml_input_file_dir:

  • .eqd file: Magnetic equilibrium file generated from an EFIT geqdsk file with utils/efit2eqd. It stores the toroidal and poloidal magnetic field data (I = R*Bt and psi = poloidal flux / 2pi) as well as the magnetic axis and X-point locations.

  • Mesh files: Triangle mesh files (*.node and *.ele); the file formats are described in the Triangle documentation. To generate the mesh files for XGC, you can use RPI’s XGC mesh generator, which also produces the required *.flx.aif file describing the flux-surface information together with the .node and .ele files.

  • Profile files: Initial equilibrium profiles (density and temperature) can be specified by an analytic formula or read from text files. The file format is plain text (ASCII): the number of data points, two-column data of normalized poloidal flux and profile value, and an end flag (-1). The normalized poloidal flux is zero on the magnetic axis and unity at the magnetic separatrix. The units of the profiles are \(m^{-3}\) for density and eV for temperature; when a toroidal flow (toroidal angular velocity) profile is required, its unit is rad/s. Below is a sample Fortran code that reads the profile data in the expected format, followed by an example data file.

    read(funit,*) num
    allocate(psi(num),var(num))
    do i=1, num
       read(funit,*) psi(i),var(i)
    enddo
    read(funit,*) flag
    if(flag/=-1) then
       print *, 'error in profile ftn init : invalid number of data or ending flag -1 is not set correctly', flag, ftn%filename
       stop
    endif
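
For reference, a hypothetical profile file in this format might contain the following (the values are purely illustrative, e.g. a density profile in \(m^{-3}\)):

    4
    0.00  4.0e19
    0.30  3.5e19
    0.70  2.5e19
    1.00  1.0e19
    -1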
    

For the checkpoint-restart functionality, a subdirectory called restart_dir must be present in your run directory. You can create it manually or have your execution script take care of this.
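
For example, a batch script can create the directory right before launching XGC, as the Summit example below does:

    mkdir -p restart_dir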

ADIOS2 configuration

XGC’s outputs are written with ADIOS2 and can be configured and controlled through ADIOS2’s XML config file (adios2cfg.xml). A default ADIOS2 config file is provided in the repository (XGC_core/adios2cfg.xml). To tune I/O performance, users can edit the ADIOS2 config file before launching a run.

Each XGC output has a unique ADIOS2 I/O name (e.g., restart, restartf0, f3d), for which users can specify the ADIOS2 engine and its parameters. An example is shown below:

<?xml version="1.0"?>
<adios-config>
    <io name="restart">
        <engine type="BP4">
            <parameter key="Profile" value="Off"/>
        </engine>
    </io>
</adios-config>

In the ADIOS2 config file, users can specify the following parameters for each io block:

  • I/O name: the ADIOS2 name of XGC’s output (e.g., restart)

  • Engine name: the name of the ADIOS2 I/O engine. ADIOS2 provides various engines for file output, data streaming, etc. Examples are BP4 (native ADIOS file format), HDF5, and SST (for in-memory data exchange and streaming).

  • Engine parameters: ADIOS2 provides a wide range of options to control the engine. Users can provide the key-value pairs of options in the XML. The full list of engines and options can be found in the ADIOS2 user manual.

Here we list a few commonly used parameters for the BP4 engine; an example io block combining them follows the list:

  1. Profile: Turn ON or OFF the writing of ADIOS profiling information.

  2. BurstBufferPath: Redirect the output file to another location. This feature can be used on machines that have local NVMe/SSDs. On Summit at OLCF, use “/mnt/bb/<username>” for the path, where <username> is your user account name. Temporary files on the accelerated storage are deleted automatically after the application closes the output and ADIOS drains all data to the file system, unless draining is turned off with the BurstBufferDrain parameter.

  3. BurstBufferDrain: Turn ON or OFF the burst buffer draining.

  4. NumAggregators: Control the number of sub-files to be written. It ranges between 1 and the number of writing MPI processes. As of the ADIOS2 2.7 release (Jan 2021), the default aggregation mode is one sub-file per compute node (previously one sub-file per process), so it is recommended not to specify NumAggregators anymore; the following is for fine tuning only. Choosing the optimum number of aggregators is not straightforward: it depends on the degree of parallelization and on the file system. The rule of thumb is to use 1, 2, or 4 aggregators per node when running a large-scale simulation (e.g., more than 256 nodes on Summit). For example, for a 1024-node Summit job with 6 MPI processes per node, try setting NumAggregators to 1024 or 2048.
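
As an illustration, a hypothetical io block combining these parameters might look as follows (the values are examples only; replace USERNAME with your user account name and adapt the numbers to your machine and job size):

    <io name="restart">
        <engine type="BP4">
            <parameter key="Profile" value="Off"/>
            <parameter key="BurstBufferPath" value="/mnt/bb/USERNAME"/>
            <parameter key="BurstBufferDrain" value="On"/>
            <parameter key="NumAggregators" value="1024"/>
        </engine>
    </io>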

PETSc configuration

XGC uses the PETSc library for solving the matrix equations arising from the Poisson and Ampere equations. For more details, see the PETSc homepage.

An example petsc.rc configuration file is found in the XGC_core source subdirectory.
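
The petsc.rc file lists one PETSc option per line. The sketch below is purely illustrative (the solver and tolerance choices are placeholders, not XGC's defaults); start from the file shipped in XGC_core and adjust as needed:

    -ksp_type cg
    -pc_type gamg
    -ksp_rtol 1.0e-9
    -ksp_max_it 200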

Examples of batch scripts

Different clusters will generally require different batch scripts to run the simulation. Below are example scripts for several HPC systems, including Cori and Perlmutter (NERSC), Frontier and Summit (ORNL), Theta, Polaris, and Sunspot (ANL), and Stellar and Traverse (Princeton U.).

Cori KNL

To submit the job on Cori KNL, copy the following batch script file to a file named batch.sh, and modify it according to your job size, memory requirement, etc. This is straightforward to adjust for the Haswell partition.

#!/bin/bash
#SBATCH -A m499
#SBATCH -C knl
#SBATCH --qos=regular
#SBATCH --nodes=6
#SBATCH --ntasks=96
#SBATCH --cpus-per-task=16
#SBATCH --time=4:00:00
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --mail-user=EMAIL_ADDRESS@pppl.gov
#SBATCH --job-name=JOB_NAME
#SBATCH -o OUTPUT_NAME.%j.out
#SBATCH -e OUTPUT_NAME.%j.err

# load modules used to build XGC
source /project/projectdirs/m499/Software/bin/nersc_config
source /project/projectdirs/m499/Software/bin/haswell2knl

# For avoiding time-out errors with large node counts
export PMI_MMAP_SYNC_WAIT_TIME=1800

export OMP_NUM_THREADS=16
export OMP_MAX_ACTIVE_LEVELS=2
export OMP_STACKSIZE=2G
export OMP_PLACES=threads
export OMP_PROC_BIND=spread

# path of XGC executable
export xgc_bin_path=XGC_executable_path

export n_mpi_ranks_per_node=16
export n_mpi_ranks=$((${SLURM_JOB_NUM_NODES} * ${n_mpi_ranks_per_node}))
echo 'Number of nodes: '                  ${SLURM_JOB_NUM_NODES}
echo 'MPI ranks (total): '                $n_mpi_ranks
echo 'MPI ranks per node: '               $n_mpi_ranks_per_node
echo 'Number of OMP threads: '            ${OMP_NUM_THREADS}
echo 'XGC executable: '                   ${xgc_bin_path}

srun --wait 200 --bcast=/tmp/xgc-tmp-exec --cpu_bind=cores $xgc_bin_path >& LOG_${SLURM_JOB_ID}.log

The following items are to be modified as necessary:

  • #SBATCH -A m499: job charging account name.

  • #SBATCH --nodes=6: number of nodes used; since the conventional delta-f ITG XGC1 test case is used as the example here, only a small number of Cori KNL nodes is needed.

  • #SBATCH --ntasks=96: total number of tasks, i.e. number of MPI ranks.

  • #SBATCH --cpus-per-task=16: number of CPU cores assigned to a task (one MPI rank). Usually this is set in accordance with --nodes and --ntasks to accommodate the memory requirement of a job. If the memory requirement for a specific job is large, increase --cpus-per-task; to keep the total number of tasks (MPI ranks) specified with --ntasks, use more nodes by increasing --nodes. Alternatively, one can specify #SBATCH --ntasks-per-node=8, the number of tasks (MPI ranks) per compute node. To avoid oversubscribing the physical hardware, choose OMP_NUM_THREADS \(\leq\) cpus-per-task; see the sanity-check sketch after this list.

  • #SBATCH --time=4:00:00: requested wall time; here we requested 4 hours. The maximum time limit is 48 hours on the regular queue. Details on queue limits can be found on the NERSC queue policy page.

  • #SBATCH --mail-user=EMAIL_ADDRESS@pppl.gov: user email address to send the job begin, end and fail notification; replace EMAIL_ADDRESS@pppl.gov with your email address.

  • #SBATCH --job-name=JOB_NAME: job name specification; replace JOB_NAME with the name you want.

  • #SBATCH -o OUTPUT_NAME.%j.out: job history output file name; replace OUTPUT_NAME with desired name.

  • export xgc_bin_path=XGC_executable_path: path of XGC executable file; replace XGC_executable_path with the actual path of the executable file.
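
As a quick sanity check of these settings, the following sketch (using the hypothetical numbers from the script above) verifies that the hardware threads requested per node stay within the 272 hyperthreads of a Cori KNL node:

    # hypothetical numbers matching the example script above
    nodes=6; ntasks=96; cpus_per_task=16
    ranks_per_node=$(( ntasks / nodes ))                                     # 16
    echo "hardware threads per node: $(( ranks_per_node * cpus_per_task ))"  # 256 <= 272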

Submit the job using the following sbatch command:

sbatch batch.sh

More information on running jobs at NERSC can be found in the NERSC documentation.

Frontier

An example job script, where modules.sh contains the environment used while building:

#!/bin/bash
#SBATCH -A phy122-ecp
#SBATCH -J XGC_ES_2_planes
#SBATCH -t 01:00:00
#SBATCH -N 16
#SBATCH -o %x_%j.out
#SBATCH -e %x_%j.err
#SBATCH --threads-per-core=2

source modules.sh
export FI_CXI_RX_MATCH_MODE=software
export OMP_PROC_BIND=true
export OMP_NUM_THREADS=14

# 8 MPI processes per compute node
srun -n128 -c14 --gpus-per-task=1 --gpu-bind=closest ../xgc-es-cpp-gpu

Perlmutter GCC

April 23, 2023 update: the GNU/GCC compilers are the recommended compilers for running on Perlmutter GPU compute nodes.

The assumption is that one is starting with the default list of modules before the following module load/unload commands, i.e. no module load/unload commands in the user’s .bashrc/.bash_profile/etc.

#!/bin/bash
#SBATCH -A m499_g
#SBATCH -C gpu
#SBATCH -q regular
#SBATCH -t 1:30:00
#SBATCH -N 2
#SBATCH --job-name=JOB_NAME

# load modules used to build XGC
module load cray-fftw
module unload darshan
module unload cray-libsci

export OMP_STACKSIZE=2G   # required for GNU build to prevent a segfault
export OMP_PLACES=cores
export OMP_PROC_BIND=spread
export OMP_NUM_THREADS=32
export xgc_bin_path=XGC_executable_path
# disable GPU-aware MPI for PETSc
export PETSC_OPTIONS='-use_gpu_aware_mpi 0'
# enable GPU-aware MPI for PETSc
# export PETSC_OPTIONS='-use_gpu_aware_mpi 1'

export n_mpi_ranks_per_node=4
export n_mpi_ranks=$((${SLURM_JOB_NUM_NODES} * ${n_mpi_ranks_per_node}))
echo 'Number of nodes: '                  ${SLURM_JOB_NUM_NODES}
echo 'MPI ranks (total): '                $n_mpi_ranks
echo 'MPI ranks per node: '               $n_mpi_ranks_per_node
echo 'Number of OMP threads: '            ${OMP_NUM_THREADS}
echo 'XGC executable: '                   ${xgc_bin_path}
echo ''

srun -N ${SLURM_JOB_NUM_NODES} -n ${n_mpi_ranks} -c ${OMP_NUM_THREADS} --cpu-bind=cores --ntasks-per-node=${n_mpi_ranks_per_node} --gpus-per-task=1 --gpu-bind=single:1 $xgc_bin_path

To use the Ginkgo GPU solver rather than the LAPACK CPU solver for the collision operator, add the following three lines to the input file:

&performance_param
collisions_solver='ginkgo'
/

Perlmutter CPU GCC

April 23, 2023 update: the GNU/GCC compilers are the recommended compilers for running on Perlmutter CPU-only compute nodes.

The assumption is that one is starting with the default list of modules before the following module load/unload commands, i.e. no module load/unload commands in the user’s .bashrc/.bash_profile/etc.

#!/bin/bash
#SBATCH -A m499
#SBATCH -C cpu
#SBATCH -q regular
#SBATCH -t 1:30:00
#SBATCH -N 2
#SBATCH --job-name=JOB_NAME

# load modules used to build XGC
module unload gpu
module load cray-fftw
module unload darshan
module unload cray-libsci

export FI_CXI_RX_MATCH_MODE=hybrid  # prevents crash for large number of MPI processes, e.g. > 4096

export OMP_STACKSIZE=2G   # required for GNU build to prevent a segfault

# Perlmutter CPU-only nodes have dual-socket AMD EPYC, each with 64 cores (128 HT)
# For each CPU-only node, want (MPI ranks)*${OMP_NUM_THREADS}=256
# Recommend OMP_NUM_THREADS=8 or 16
export OMP_PLACES=cores
export OMP_PROC_BIND=close
export OMP_NUM_THREADS=8
export xgc_bin_path=XGC_executable_path

export n_mpi_ranks_per_node=32
export n_mpi_ranks=$((${SLURM_JOB_NUM_NODES} * ${n_mpi_ranks_per_node}))
echo 'Number of nodes: '                  ${SLURM_JOB_NUM_NODES}
echo 'MPI ranks (total): '                $n_mpi_ranks
echo 'MPI ranks per node: '               $n_mpi_ranks_per_node
echo 'Number of OMP threads: '            ${OMP_NUM_THREADS}
echo 'XGC executable: '                   ${xgc_bin_path}
echo ''

srun -N ${SLURM_JOB_NUM_NODES} -n ${n_mpi_ranks} -c ${OMP_NUM_THREADS} --cpu-bind=cores --ntasks-per-node=${n_mpi_ranks_per_node} $xgc_bin_path

Perlmutter Nvidia

April 23, 2023 update: the recommendation is to use the GCC compilers for building XGC on Perlmutter. Still sorting out some issues with nvidia 23.1 + cuda 12.0.

#!/bin/bash
#SBATCH -A m499_g
#SBATCH -C gpu
#SBATCH -q regular
#SBATCH -t 1:30:00
#SBATCH -N 2
#SBATCH --job-name=JOB_NAME

# load modules used to build XGC
module unload gpu
module load PrgEnv-nvidia
module swap nvidia nvidia/23.1
module load cudatoolkit/12.0
module load cray-fftw
module unload darshan
module unload cray-libsci

export OMP_PLACES=cores
export OMP_PROC_BIND=spread
export OMP_NUM_THREADS=16
export xgc_bin_path=XGC_executable_path
# turn off GPU aware MPI for now
export PETSC_OPTIONS='-use_gpu_aware_mpi 0'

export n_mpi_ranks_per_node=4
export n_mpi_ranks=$((${SLURM_JOB_NUM_NODES} * ${n_mpi_ranks_per_node}))
echo 'Number of nodes: '                  ${SLURM_JOB_NUM_NODES}
echo 'MPI ranks (total): '                $n_mpi_ranks
echo 'MPI ranks per node: '               $n_mpi_ranks_per_node
echo 'Number of OMP threads: '            ${OMP_NUM_THREADS}
echo 'XGC executable: '                   ${xgc_bin_path}
echo ''

srun -N ${SLURM_JOB_NUM_NODES} -n ${n_mpi_ranks} -c 32 --cpu-bind=cores --ntasks-per-node=${n_mpi_ranks_per_node} --gpus-per-task=1 --gpu-bind=single:1 $xgc_bin_path

Perlmutter CPU Nvidia

For Perlmutter CPU-only compute nodes.

April 23, 2023 update: the recommendation is to use the GCC compilers for building XGC on Perlmutter.

#!/bin/bash
#SBATCH -A m499
#SBATCH -C cpu
#SBATCH -q regular
#SBATCH -t 1:30:00
#SBATCH -N 2
#SBATCH --job-name=JOB_NAME

# load modules used to build XGC
module unload gpu
module load PrgEnv-nvidia
module swap nvidia nvidia/23.1
module load cray-fftw
module unload cray-libsci
module unload darshan

export FI_CXI_RX_MATCH_MODE=hybrid  # prevents crash for large number of MPI processes, e.g. > 4096

# Perlmutter CPU-only nodes have dual-socket AMD EPYC, each with 64 cores (128 HT)
# For each CPU-only node, want (MPI ranks)*${OMP_NUM_THREADS}=256
# Recommend OMP_NUM_THREADS=8 or 16
export OMP_PLACES=cores
export OMP_PROC_BIND=close
export OMP_NUM_THREADS=8
export xgc_bin_path=XGC_executable_path

export n_mpi_ranks_per_node=32
export n_mpi_ranks=$((${SLURM_JOB_NUM_NODES} * ${n_mpi_ranks_per_node}))
echo 'Number of nodes: '                  ${SLURM_JOB_NUM_NODES}
echo 'MPI ranks (total): '                $n_mpi_ranks
echo 'MPI ranks per node: '               $n_mpi_ranks_per_node
echo 'Number of OMP threads: '            ${OMP_NUM_THREADS}
echo 'XGC executable: '                   ${xgc_bin_path}
echo ''

srun -N ${SLURM_JOB_NUM_NODES} -n ${n_mpi_ranks} -c ${OMP_NUM_THREADS} --cpu-bind=cores --ntasks-per-node=${n_mpi_ranks_per_node} $xgc_bin_path

Polaris

#!/bin/bash
#PBS -A TokamakITER
#PBS -l nodes=32
#PBS -l walltime=02:00:00
#PBS -N xgcBench32
#PBS -k doe
#PBS -o ./em2_planes.stdout
#PBS -e ./em2_planes.stderr
#PBS -q prod
#PBS -l filesystems=home:eagle

# This is important: go to working directory
cd ${PBS_O_WORKDIR}

# load modules used to build XGC
module use /soft/modulefiles
module load cmake kokkos cabana cray-fftw

export OMP_NUM_THREADS=16
export OMP_PROC_BIND=spread
export OMP_PLACES=threads

# Path of XGC executable
export xgc_bin_path=[path to xgc binary]

# For GPU-Aware MPI, when XGC supports it
#export MPICH_GPU_SUPPORT_ENABLED=1

NNODES=`wc -l < $PBS_NODEFILE`
NRANKS_PER_NODE=4  # Number of MPI ranks per node
NDEPTH=16          # Number of hardware threads per rank, spacing between MPI ranks on a node

NTOTRANKS=$(( NNODES * NRANKS_PER_NODE ))

mpiexec --np ${NTOTRANKS} -ppn ${NRANKS_PER_NODE} -d ${NDEPTH} --cpu-bind numa -envall set_affinity_gpu_polaris.sh ${xgc_bin_path}

Set the file set_affinity_gpu_polaris.sh to contain the following:

#!/bin/bash
num_gpus=4
gpu=$((${PMI_LOCAL_RANK} % ${num_gpus}))
export CUDA_VISIBLE_DEVICES=$gpu
echo "RANK= ${PMI_RANK} LOCAL_RANK= ${PMI_LOCAL_RANK} gpu= ${gpu}"
exec "$@"
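
The helper script must be executable before mpiexec can launch it:

    chmod +x set_affinity_gpu_polaris.sh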

Stellar

Stellar is a CPU cluster available to the Princeton University community upon approval. See https://researchcomputing.princeton.edu/systems/stellar for more details.

#!/bin/bash
#SBATCH -A pppl
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=96
#SBATCH --ntasks-per-socket=24
#SBATCH --cpus-per-task=1
#SBATCH -t 00:30:00
#SBATCH --qos=stellar-debug
#SBATCH --mail-type=BEGIN,END,FAIL

source /scratch/gpfs/xgc/STELLAR/Software/bin/set_up_xgc.stellar

export OMP_PROC_BIND=false
export OMP_NUM_THREADS=1

export n_mpi_ranks_per_node=96
export n_mpi_ranks=$((${SLURM_JOB_NUM_NODES} * ${n_mpi_ranks_per_node}))
export XGC_EXEC=${HOME}/path_to_executable

echo 'Number of nodes: '                  ${SLURM_JOB_NUM_NODES}
echo 'MPI ranks (total): '                $n_mpi_ranks
echo 'MPI ranks per node: '               $n_mpi_ranks_per_node
echo 'Number of OMP threads: '            ${OMP_NUM_THREADS}
echo 'XGC executable: '                   ${XGC_EXEC}

OUTFILE=xgc_${SLURM_JOB_ID}.log
srun ${XGC_EXEC} &> ${OUTFILE}

Submit the job using the following submission command:

sbatch batch.sh

Summit

Similarly, to submit a job on the ORNL computer Summit, copy the following batch script to a file named batch.sh and modify it accordingly.

#!/bin/bash
#BSUB -P PHY122
#BSUB -W 2:00
#BSUB -nnodes 8
#BSUB -J JOB_NAME
#BSUB -o OUTPUT_NAME.%J
#BSUB -e ERROR_OUTPUT_NAME.%J
#BSUB -N EMAIL_ADDRESS@pppl.gov
#BSUB -B EMAIL_ADDRESS@pppl.gov

# load modules used to build XGC
module load nvhpc/22.5
module load spectrum-mpi
module load python
module load netlib-lapack
module load hypre
module load fftw
module load hdf5
module load cmake/3.20.2
module load libfabric/1.12.1-sysrdma
module swap cuda/nvhpc cuda/11.7.1

# create restart file directory
mkdir -p restart_dir

export OMP_NUM_THREADS=14
export xgc_bin_path=XGC_Executable_PATH
export OMPI_MCA_coll_ibm_collselect_mode_barrier=failsafe
jsrun -n 48 -r 6 -a 1 -g 1 -c 7 -b rs $xgc_bin_path

Modify the following options as necessary:

  • #BSUB -P PHY122: computing job charge account name.

  • #BSUB -W 2:00: wall time; here we are requesting 2 hours.

  • #BSUB -nnodes 8: resources to request; here we are requesting 8 Summit nodes.

  • #BSUB -J JOB_NAME: job name; replace JOB_NAME with the name of your choice.

  • #BSUB -o OUTPUT_NAME.%J: job history output file name; replace OUTPUT_NAME with the desired name; %J appends the job number to the file name for ease of identification.

  • #BSUB -e ERROR_OUTPUT_NAME.%J: error message output file name; replace ERROR_OUTPUT_NAME with the name you want.

  • #BSUB -N EMAIL_ADDRESS@pppl.gov: set the email address for notification; replace EMAIL_ADDRESS@pppl.gov with your email address.

  • export xgc_bin_path=XGC_Executable_PATH: set the XGC executable file path; replace XGC_Executable_PATH with your executable file path.

  • jsrun -n 48 -r 6 -a 1 -g 1 -c 7 -b rs $xgc_bin_path: here we are running with 48 total resource sets on 8 nodes, with 6 resource sets per compute node, 1 task per resource set, 1 GPU per resource set, 7 CPU cores per resource set, and binding to cores within the resource set. Details of the submission script can be found on the Summit user guide webpage, under the batch-scripts and common-jsrun-options sections; a sketch for scaling -n with the node count follows this list.

  • For shorter delays in output to stdout, you may insert stdbuf -oL -eL between rs and $xgc_bin_path. Refer to the man-page of the stdbuf command for more details.
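
As a sketch for scaling the jsrun line to other node counts, the total number of resource sets (-n) is the node count times the resource sets per node, while the other flags stay unchanged (this assumes you keep 6 resource sets, one per GPU, per Summit node):

    # NNODES must match the value given to '#BSUB -nnodes'
    NNODES=8
    RS_PER_NODE=6   # one resource set per GPU on a Summit node
    jsrun -n $(( NNODES * RS_PER_NODE )) -r ${RS_PER_NODE} -a 1 -g 1 -c 7 -b rs stdbuf -oL -eL $xgc_bin_path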

Submit the job using the following submission command:

bsub batch.sh

Sunspot

#!/bin/bash
#PBS -l select=16:system=sunspot,place=scatter
#PBS -l filesystems=home
#PBS -A XGC_aesp_CNDA
#PBS -l walltime=00:20:00
#PBS -N xgcESITER16n
#PBS -k doe
#PBS -o ./es2_planes.stdout
#PBS -e ./es2_planes.stderr

# load modules used to build XGC
module load spack tmux cmake oneapi/eng-compiler/2023.05.15.006 kokkos/2023.05.15.006/eng-compiler/sycl_intel_aot cabana/2023.05.15.006/eng-compiler/sycl_intel

export TZ='/usr/share/zoneinfo/US/Central'
export OMP_PROC_BIND=spread
export OMP_NUM_THREADS=16
export CPU_BIND=depth
unset OMP_PLACES

# path of XGC executable (used by the mpiexec line below)
export xgc_bin_path=XGC_executable_path

export MPIR_CVAR_ENABLE_GPU=0 # disable gpu-aware mpich
export FI_MR_CACHE_MONITOR=memhooks
export MPIR_CVAR_ALLREDUCE_INTRA_ALGORITHM=recursive_doubling

unset MPIR_CVAR_CH4_COLL_SELECTION_TUNING_JSON_FILE
unset MPIR_CVAR_COLL_SELECTION_TUNING_JSON_FILE
unset MPIR_CVAR_CH4_POSIX_COLL_SELECTION_TUNING_JSON_FILE

export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=0 # avoids a segfault at the end of the run

NNODES=`wc -l < $PBS_NODEFILE`
RANKS_PER_NODE=12
NDEPTH=$(( 208 / RANKS_PER_NODE ))
NTOTRANKS=$(( NNODES * RANKS_PER_NODE )) # assumes fully-populated nodes (using all GPUs)

mpiexec --np ${NTOTRANKS} -ppn ${RANKS_PER_NODE} -d ${NDEPTH} --cpu-bind ${CPU_BIND} -envall gpu_tile_compact.sh $xgc_bin_path

Theta

This is an example of a batch job script using two files. This setup gives you the freedom to modify the runscript after submitting the job, while it waits in the queue (e.g., to point your run to a different run directory), or to reuse the same script for different configurations.

#!/bin/bash
#COBALT -t 6:00:00
#COBALT -n 1024
#COBALT --attrs mcdram=cache:numa=quad
#COBALT -A TokamakITER

./runscript

Optional arguments include:

  • #COBALT -q debug-flat-quad

  • #COBALT -M [YOUR_EMAIL_ADDRESS]

  • #COBALT -q debug-cache-quad

The following ‘runscript’ file actually launches XGC:

#!/bin/bash

echo "Starting Cobalt job script"

export OMP_NUM_THREADS=32
export OMP_STACKSIZE=8000000
export OMP_MAX_ACTIVE_LEVELS=2

export n_nodes=$COBALT_JOBSIZE
export n_mpi_ranks_per_node=8
export n_mpi_ranks=$(($n_nodes * $n_mpi_ranks_per_node))
export n_openmp_threads_per_rank=${OMP_NUM_THREADS}
export n_hyperthreads_per_core=4
export n_hyperthreads_skipped_between_ranks=32

OUTFILE=xgc_${COBALT_JOBID}.out
SWDIR=/projects/TokamakITER/Software/camtimers/DEFAULT/DEFAULT

echo 'Number of nodes: '                  ${COBALT_JOBSIZE}
echo 'MPI ranks (total): '                $n_mpi_ranks
echo 'MPI ranks per node: '               $n_mpi_ranks_per_node
echo 'Number of OMP threads: '            ${n_openmp_threads_per_rank}
echo 'Number of hyperthreads/core: '      $n_hyperthreads_per_core
echo 'Number of hyperthreads btw ranks: ' $n_hyperthreads_skipped_between_ranks

mkdir -p restart_dir

aprun -n $n_mpi_ranks -N $n_mpi_ranks_per_node \
      --env OMP_STACKSIZE=${OMP_STACKSIZE} \
      --env OMP_NUM_THREADS=$n_openmp_threads_per_rank -cc depth \
      -d $n_hyperthreads_skipped_between_ranks \
      -j $n_hyperthreads_per_core \
      ../XGC-Devel/xgc_build/xgc-es-cpp >& ${OUTFILE}

Options:

  • n_mpi_ranks_per_node: Number of MPI ranks per compute node

  • n_openmp_threads_per_rank: Number of OpenMP threads per MPI rank (should be equal to OMP_NUM_THREADS)

  • n_hyperthreads_per_core: Number of hyperthreads per physical compute core (max. 4)

  • n_hyperthreads_skipped_between_ranks: Number of hyperthreads per MPI rank (should be equal to OMP_NUM_THREADS)

Submit the job using the following submission command:

qsub batch.sh

Traverse

Traverse is a GPU-enabled cluster available to the Princeton University community upon approval. See https://researchcomputing.princeton.edu/traverse for more details.

#!/bin/bash
#SBATCH -A pppl
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --ntasks-per-socket=2
#SBATCH --cpus-per-task=32
#SBATCH --gpu-bind=map_gpu:0,1,2,3
#SBATCH --gpus-per-task=1
#SBATCH -t 8:00:00
#SBATCH --mail-type=all

source /home/rhager/Software/bin/set_up_xgc.traverse

export OMP_NUM_THREADS=32
export OMP_MAX_ACTIVE_LEVELS=2
export OMP_STACKSIZE=2G

export n_mpi_ranks_per_node=4
export n_mpi_ranks=$((${SLURM_JOB_NUM_NODES} * ${n_mpi_ranks_per_node}))
export XGC_EXEC=${HOME}/path_to_executable

echo 'Number of nodes: '                  ${SLURM_JOB_NUM_NODES}
echo 'MPI ranks (total): '                $n_mpi_ranks
echo 'MPI ranks per node: '               $n_mpi_ranks_per_node
echo 'Number of OMP threads: '            ${OMP_NUM_THREADS}
echo 'XGC executable: '                   ${XGC_EXEC}

OUTFILE=xgc_${SLURM_JOB_ID}.log
srun ${XGC_EXEC} &> ${OUTFILE}

Optionally, for smaller debugging runs (4 nodes or less for less than 1 hour), add:

#SBATCH --reservation=test

Submit the job using the following submission command:

sbatch batch.sh

XGC Examples

XGC examples (https://github.com/PrincetonUniversity/XGC-Examples.git) are maintained in a separate repository to keep large data files out of the code repository.
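
To obtain the examples, clone the repository:

    git clone https://github.com/PrincetonUniversity/XGC-Examples.git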

For a list of examples, see the XGC-Examples repository.