Running XGC

Overview

Prepare three files (input, adios2cfg.xml, and petsc.rc) and place them in a run directory. Additional data files can be located in a separate directory, which is given by the input Fortran namelist parameter sml_input_file_dir=[PATH_TO_INPUT_DIR]. Create a subdirectory restart_dir in the run directory for the checkpoint-restart functionality.

Preparation of a run directory

Three files are required in the run directory: input, adios2cfg.xml, and petsc.rc.

Additional files are located in the directory specified by the input parameter sml_input_file_dir:

  • .eqd file: Magnetic equilibrium file generated from an EFIT geqdsk file. It stores the toroidal and poloidal magnetic field (I=R*Bt and psi=poloidal flux / 2pi) and the locations of the magnetic axis and the X-point. This file can be generated from an EFIT geqdsk file with utils/efit2eqd.

  • Mesh files: Triangle mesh files (*.node and *.ele). The file formats are described here. To generate the mesh files for XGC, you can use RPI’s XGC mesh generator. You also need a *.flx.aif file, which describes flux surfaces information. RPI’s XGC mesh code generates the .flx.aif file together with .node and .ele files.

  • Profile files: Initial equilibrium profiles (density and temperature) can be specified by an analytic formula or read from text files. The file format is plain ASCII text: the number of data points, then two-column data (normalized poloidal flux and profile value), then an end flag (-1). The normalized poloidal flux is zero on the magnetic axis and unity at the magnetic separatrix. The units of the profiles are $m^{-3}$ for density and eV for temperature. When a toroidal flow (toroidal angular velocity) profile is required, its unit is rad/s. Below is sample Fortran code that reads the profile data in the expected format.

    read(funit,*) num
    allocate(psi(num),var(num))
    do i=1, num
       read(funit,*) psi(i),var(i)
    enddo
    read(funit,*) flag
    if(flag/=-1) then
       print *, 'error in profile ftn init : invalid number of data or ending flag -1 is not set correctly', flag, ftn%filename
       stop
    endif
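
The layout expected by the reader above can also be produced from a script. The following is a minimal sketch, not part of XGC: the function name write_profile is hypothetical, and the density values are made-up placeholders chosen only to show the format (point count, one "psi value" row per point, then the -1 end flag).

```shell
#!/bin/bash
# Hypothetical helper (not part of XGC): write a profile file in the layout
# read by the Fortran sample: point count, "psi value" rows, then -1.
write_profile() {
    local outfile="$1"; shift
    local npoints=$#          # each remaining argument is one "psi value" row
    local row
    {
        echo "${npoints}"
        for row in "$@"; do
            echo "${row}"
        done
        echo "-1"             # end flag checked by the Fortran reader
    } > "${outfile}"
}

# Illustrative density profile (m^-3) at normalized psi = 0, 0.5, 1.
# The numbers are placeholders, not a recommended equilibrium.
profile_file=$(mktemp)
write_profile "${profile_file}" "0.0 5.0e19" "0.5 3.0e19" "1.0 1.0e19"
cat "${profile_file}"
```

If the first line does not match the number of data rows, the Fortran reader will hit the end-flag check and stop with the error message shown above.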
    

For the checkpoint-restart functionality, a subdirectory called restart_dir must be present in your run directory. You can create it manually or have your execution script take care of this.
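
As an illustration of letting the execution script handle this, here is a minimal pre-flight sketch. It assumes the three file names given in the overview (input, adios2cfg.xml, petsc.rc). The function name prepare_run_dir is hypothetical and not part of XGC.

```shell
#!/bin/bash
# Hypothetical helper (not part of XGC): check that the required files exist
# in a run directory and create restart_dir if it is missing.
prepare_run_dir() {
    local run_dir="$1"
    local missing=0
    local f
    for f in input adios2cfg.xml petsc.rc; do
        if [ ! -f "${run_dir}/${f}" ]; then
            echo "missing required file: ${f}" >&2
            missing=1
        fi
    done
    # Stop before creating anything if a required file is absent.
    [ "${missing}" -eq 0 ] || return 1
    mkdir -p "${run_dir}/restart_dir"
}

# Demonstration in a throwaway directory with empty placeholder files.
demo_dir=$(mktemp -d)
touch "${demo_dir}/input" "${demo_dir}/adios2cfg.xml" "${demo_dir}/petsc.rc"
prepare_run_dir "${demo_dir}" && echo "run directory ready: ${demo_dir}"
```

In a real batch script the call would typically be prepare_run_dir "$PWD" just before the launch command.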

ADIOS2 configuration

XGC’s outputs are written with ADIOS2 and can be configured and controlled by ADIOS2’s XML config file (adios2cfg.xml). A default ADIOS2 config file is provided for XGC1 and XGCa (XGC1/adios2cfg.xml and XGCa/adios2cfg.xml). For better performance and tuning, users can update the ADIOS2 config file before launching a run.

Each XGC output has a unique ADIOS I/O name (e.g., restart, restartf0, f3d, etc.), and users can specify the ADIOS2 engine and its parameters for each one. An example is shown here:

<?xml version="1.0"?>
<adios-config>
    <io name="restart">
        <engine type="BP4">
            <parameter key="Profile" value="Off"/>
        </engine>
    </io>
</adios-config>

In the ADIOS2 config file, users can specify the following parameters for each io block:

  • I/O name: the ADIOS2 name of XGC’s output (e.g., restart)

  • Engine name: the name of the ADIOS2 I/O engine. ADIOS2 provides various engines for file outputs, data streaming, etc. Examples are BP4 (native ADIOS file format), HDF5, and SST (for in memory data exchange and streaming).

  • Engine parameters: ADIOS2 provides a wide range of options to control the engine. Users can provide the key-value pairs of options in the XML. The full list of engines and options can be found in ADIOS2 user manual.

Here we list a few commonly used parameters for the BP4 engine:

  1. NumAggregators: Controls the number of sub-files to be written. It ranges from 1 to the number of writing MPI processes. Choosing the right number of aggregators is not straightforward, because it depends on the degree of parallelization and on the filesystem. As a rule of thumb, use 1, 2, or 4 aggregators per node when running a large-scale simulation (e.g., more than 256 nodes on Summit). For example, for a 1024-node Summit job using 6 MPI processes per node, try setting NumAggregators to 1024 or 2048.

  2. Profile: Turn ON or OFF the writing of ADIOS profiling information.

  3. BurstBufferPath: Redirect output file to another location. This feature can be used on machines that have local NVMe/SSDs. On Summit at OLCF, use “/mnt/bb/<username>” for the path where <username> is your user account name. Temporary files on the accelerated storage will be automatically deleted after the application closes the output and ADIOS drains all data to the file system, unless draining is turned off by BurstBufferDrain parameter.

  4. BurstBufferDrain: Turn ON or OFF the burst buffer draining.
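
The NumAggregators rule of thumb can be sketched as a small calculation. The function name num_aggregators is hypothetical, and the choice of 2 aggregators per node is only one of the values suggested above. The printed io block reuses the restart example and the parameter keys listed in this section.

```shell
#!/bin/bash
# Hypothetical helper (not part of XGC or ADIOS2): apply the rule of thumb of
# a fixed number of aggregators per node, capped at the number of MPI ranks.
num_aggregators() {
    local nodes="$1" ranks_per_node="$2" aggs_per_node="$3"
    local total_ranks=$(( nodes * ranks_per_node ))
    local aggs=$(( nodes * aggs_per_node ))
    if [ "${aggs}" -gt "${total_ranks}" ]; then
        aggs="${total_ranks}"   # cannot exceed the number of writing processes
    fi
    echo "${aggs}"
}

# 1024 Summit nodes, 6 MPI ranks per node, 2 aggregators per node.
aggs=$(num_aggregators 1024 6 2)

# Print an io block that could be pasted into adios2cfg.xml.
cat <<EOF
<io name="restart">
    <engine type="BP4">
        <parameter key="NumAggregators" value="${aggs}"/>
        <parameter key="Profile" value="Off"/>
    </engine>
</io>
EOF
```

For the 1024-node case this yields 2048 aggregators, one of the two values suggested above. The best setting remains filesystem dependent and should be checked by measurement.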

PETSc configuration

XGC uses the PETSc library to solve the matrix equations that arise from Poisson’s and Ampere’s equations. For more details, see the PETSc homepage.

An example petsc.rc configuration file is found in the XGCa source subdirectory.

Examples of batch scripts

Different clusters will generally require a different batch script to run the simulation. Below are example scripts users could use to run on several HPC systems, including: Cori (NERSC), Summit (ORNL), Theta (ANL), and Traverse (Princeton U.).

Cori KNL

To submit the job on Cori KNL, copy the following batch script file to a file named batch.sh, and modify it according to your job size, memory requirement, etc.

#!/bin/bash
#SBATCH -A m499
#SBATCH -C knl
#SBATCH --qos=regular
#SBATCH --nodes=6
#SBATCH --ntasks=96
#SBATCH --cpus-per-task=16
#SBATCH --time=4:00:00
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --mail-user=EMAIL_ADDRESS@pppl.gov
#SBATCH --job-name=JOB_NAME
#SBATCH -o OUTPUT_NAME.%j.out
#SBATCH -e OUTPUT_NAME.%j.err

# load modules used to build XGC
module swap craype-haswell craype-mic-knl
module load cray-hdf5-parallel
module load cray-fftw

export OMP_NUM_THREADS=8
export OMP_MAX_ACTIVE_LEVELS=2
export OMP_STACKSIZE=2G
export OMP_PLACES=threads
export OMP_PROC_BIND=spread

# path of XGC executable
export xgc_bin_path=XGC_executable_path

export n_mpi_ranks_per_node=16
export n_mpi_ranks=$((${SLURM_JOB_NUM_NODES} * ${n_mpi_ranks_per_node}))
echo 'Number of nodes: '                  ${SLURM_JOB_NUM_NODES}
echo 'MPI ranks (total): '                $n_mpi_ranks
echo 'MPI ranks per node: '               $n_mpi_ranks_per_node
echo 'Number of OMP threads: '            ${OMP_NUM_THREADS}
echo 'XGC executable: '                   ${xgc_bin_path}

srun --cpu_bind=cores $xgc_bin_path >& LOG_${SLURM_JOB_ID}.log

The following items are to be modified as necessary:

  • #SBATCH -A m499: job charging account name.

  • #SBATCH --nodes=6: number of nodes used. The example here is the conventional delta-f ITG simulation test case for XGC1, so only a small number of Cori KNL nodes is needed.

  • #SBATCH --ntasks=96: total number of tasks, i.e. number of MPI ranks.

  • #SBATCH --cpus-per-task=16: number of CPU cores assigned to a task (one MPI rank). Usually this is set in accordance with --nodes and --ntasks to accommodate the memory requirement of a job. If the memory requirement of a specific job is large, we may increase --cpus-per-task. To keep the same number of tasks (MPI ranks) specified with --ntasks, we may then use more nodes by increasing --nodes. Alternatively, we could specify #SBATCH --ntasks-per-node=8: the number of tasks (MPI ranks) per compute node. To avoid oversubscribing physical hardware, choose OMP_NUM_THREADS $\leq$ cpus-per-task.

  • #SBATCH --time=4:00:00: how long the job may run. Here we request 4 hours of wall time. The maximum job time limit is 48 hours on the regular queue. Details on queue limits can be found on the NERSC queue policy page.

  • #SBATCH --mail-user=EMAIL_ADDRESS@pppl.gov: user email address to send the job begin, end and fail notification; replace EMAIL_ADDRESS@pppl.gov with your email address.

  • #SBATCH --job-name=JOB_NAME: job name specification; replace JOB_NAME with the name you want.

  • #SBATCH -o OUTPUT_NAME.%j.out: job history output file name; replace OUTPUT_NAME with desired name.

  • export xgc_bin_path=XGC_executable_path: path of XGC executable file; replace XGC_executable_path with the actual path of the executable file.
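
When changing --nodes, --ntasks, --cpus-per-task, or OMP_NUM_THREADS, it helps to check that the values remain consistent with each other. The following is a minimal sketch of such a check. The function name check_layout is hypothetical, and the rules it applies are only the two stated above: ranks divide evenly over nodes, and OMP_NUM_THREADS does not exceed cpus-per-task.

```shell
#!/bin/bash
# Hypothetical helper (not part of XGC or Slurm): sanity-check a job layout
# and print the resulting number of MPI ranks per node.
check_layout() {
    local nodes="$1" ntasks="$2" cpus_per_task="$3" omp_threads="$4"
    if [ $(( ntasks % nodes )) -ne 0 ]; then
        echo "ntasks (${ntasks}) is not a multiple of nodes (${nodes})" >&2
        return 1
    fi
    if [ "${omp_threads}" -gt "${cpus_per_task}" ]; then
        echo "OMP_NUM_THREADS (${omp_threads}) exceeds cpus-per-task (${cpus_per_task})" >&2
        return 1
    fi
    echo $(( ntasks / nodes ))
}

# Values from the Cori KNL script: 6 nodes, 96 tasks, 16 cpus per task,
# OMP_NUM_THREADS=8.
ranks_per_node=$(check_layout 6 96 16 8)
echo "MPI ranks per node: ${ranks_per_node}"
```

With the values from the script, the result agrees with n_mpi_ranks_per_node=16.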

Submit the job using the following sbatch command:

sbatch batch.sh

More info here.

Summit

Similarly, to submit the job on the ORNL computer Summit, copy the following batch script to a file named batch.sh, and modify it accordingly.

#!/bin/bash
#BSUB -P PHY122
#BSUB -W 2:00
#BSUB -nnodes 8
#BSUB -J JOB_NAME
#BSUB -o OUTPUT_NAME.%J
#BSUB -e ERROR_OUTPUT_NAME.%J
#BSUB -N EMAIL_ADDRESS@pppl.gov
#BSUB -B EMAIL_ADDRESS@pppl.gov

# load modules used to build XGC
module load pgi/19.10
module load spectrum-mpi/10.3.1.2-20200121
module load cuda/10.1.105
module load netlib-lapack/3.8.0
module load hypre/2.13.0
module load fftw/3.3.8
module load hdf5/1.10.4
module load pgi-cxx14

# create restart file directory
mkdir -p restart_dir

export OMP_NUM_THREADS=14
export xgc_bin_path=XGC_Executable_PATH
jsrun -n 48 -r 6 -a 1 -g 1 -c 7 -b rs $xgc_bin_path

Modify the following options as necessary:

  • #BSUB -P PHY122: computing job charge account name.

  • #BSUB -W 2:00: wall time; here we are requesting 2 hours.

  • #BSUB -nnodes 8: resources to request; here we are requesting 8 Summit nodes.

  • #BSUB -J JOB_NAME: job name; replace JOB_NAME with the name of your choice.

  • #BSUB -o OUTPUT_NAME.%J: job history output file name; replace OUTPUT_NAME with the desired name; %J appends the job number to the file name for ease of identification.

  • #BSUB -e ERROR_OUTPUT_NAME.%J: error message output file name; replace ERROR_OUTPUT_NAME with the name you want.

  • #BSUB -N EMAIL_ADDRESS@pppl.gov: set the email address for notification; replace EMAIL_ADDRESS@pppl.gov with your email address.

  • export xgc_bin_path=XGC_Executable_PATH: set the XGC executable file path; replace XGC_Executable_PATH with your executable file path.

  • jsrun -n 48 -r 6 -a 1 -g 1 -c 7 -b rs $xgc_bin_path: here we are running with 48 resource sets in total on 8 nodes, with 6 resource sets per compute node, 1 task per resource set, 1 GPU per resource set, and 7 CPUs (cores) per resource set, binding tasks to the cores of their resource set. Details can be found in the Summit user guide, under the batch-scripts and common-jsrun-options sections.

  • For shorter delays in output to stdout, you may insert stdbuf -oL -eL between rs and $xgc_bin_path. Refer to the man-page of the stdbuf command for more details.
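
The relation between -nnodes and the jsrun options can be sketched as follows: the total number of resource sets (-n) is the node count times the resource sets per node (-r). The function name jsrun_command is hypothetical. It only prints a command line for inspection and does not launch anything.

```shell
#!/bin/bash
# Hypothetical helper (not part of XGC or LSF): build a jsrun command line
# with one task and one GPU per resource set, as in the script above.
jsrun_command() {
    local nodes="$1" rs_per_node="$2" cores_per_rs="$3" exe="$4"
    local total_rs=$(( nodes * rs_per_node ))   # value passed to -n
    echo "jsrun -n ${total_rs} -r ${rs_per_node} -a 1 -g 1 -c ${cores_per_rs} -b rs ${exe}"
}

# 8 nodes, 6 resource sets per node, 7 cores per resource set.
# The single quotes keep $xgc_bin_path literal in the printed command.
jsrun_command 8 6 7 '$xgc_bin_path'
```

For 8 nodes this reproduces the jsrun line in the script above. Doubling -nnodes to 16 would change -n to 96 while leaving the per-node options unchanged.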

Submit the job using the following submission command:

bsub batch.sh

Theta

This is an example of a batch job that uses two files: a batch script and a separate runscript. Splitting them lets you modify runscript after submitting the job, while it waits in the queue (e.g., to point the run to a different run directory), or reuse the same batch script for different configurations.

#!/bin/bash
#COBALT -t 6:00:00
#COBALT -n 1024
#COBALT --attrs mcdram=cache:numa=quad
#COBALT -A TokamakITER

./runscript

Optional arguments include:

  • #COBALT -q debug-flat-quad

  • #COBALT -M [YOUR_EMAIL_ADDRESS]

  • #COBALT -q debug-cache-quad

The following ‘runscript’ file actually launches XGC:

#!/bin/bash

echo "Starting Cobalt job script"

export OMP_NUM_THREADS=32
export OMP_STACKSIZE=8000000
export OMP_MAX_ACTIVE_LEVELS=2

export n_nodes=$COBALT_JOBSIZE
export n_mpi_ranks_per_node=8
export n_mpi_ranks=$(($n_nodes * $n_mpi_ranks_per_node))
export n_openmp_threads_per_rank=${OMP_NUM_THREADS}
export n_hyperthreads_per_core=4
export n_hyperthreads_skipped_between_ranks=32

OUTFILE=xgc_${COBALT_JOBID}.out
SWDIR=/projects/TokamakITER/Software/camtimers/DEFAULT/DEFAULT

echo 'Number of nodes: '                  ${COBALT_JOBSIZE}
echo 'MPI ranks (total): '                $n_mpi_ranks
echo 'MPI ranks per node: '               $n_mpi_ranks_per_node
echo 'Number of OMP threads: '            ${n_openmp_threads_per_rank}
echo 'Number of hyperthreads/core: '      $n_hyperthreads_per_core
echo 'Number of hyperthreads btw ranks: ' $n_hyperthreads_skipped_between_ranks

mkdir -p restart_dir

aprun -n $n_mpi_ranks -N $n_mpi_ranks_per_node \
      --env OMP_STACKSIZE=${OMP_STACKSIZE} \
      --env OMP_NUM_THREADS=$n_openmp_threads_per_rank -cc depth \
      -d $n_hyperthreads_skipped_between_ranks \
      -j $n_hyperthreads_per_core \
      ../XGC-Devel/xgc_build/xgc-es-cpp >& ${OUTFILE}

Options:

  • n_mpi_ranks_per_node: Number of MPI ranks per compute node

  • n_openmp_threads_per_rank: Number of OpenMP threads per MPI rank (should be equal to OMP_NUM_THREADS)

  • n_hyperthreads_per_core: Number of hyperthreads per physical compute core (max. 4)

  • n_hyperthreads_skipped_between_ranks: Number of hyperthreads per MPI rank (should be equal to OMP_NUM_THREADS)
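
The options above constrain each other: the hyperthreads claimed per node (ranks per node times the aprun depth) should not exceed what a node provides. The following is a minimal sketch of that check. The function name theta_layout is hypothetical, and the figure of 64 cores per node is an assumption about Theta's KNL nodes that is not stated in this document.

```shell
#!/bin/bash
# Hypothetical helper (not part of XGC or Cobalt): check a Theta layout for
# oversubscription and print the total number of MPI ranks.
theta_layout() {
    local nodes="$1" ranks_per_node="$2" depth="$3" ht_per_core="$4"
    local cores_per_node=64   # assumption: cores per Theta KNL node
    local used=$(( ranks_per_node * depth ))
    local available=$(( cores_per_node * ht_per_core ))
    if [ "${used}" -gt "${available}" ]; then
        echo "oversubscribed: ${used} > ${available} hyperthreads per node" >&2
        return 1
    fi
    echo $(( nodes * ranks_per_node ))
}

# Values from the runscript: 1024 nodes, 8 ranks per node, depth 32,
# 4 hyperthreads per core.
theta_layout 1024 8 32 4
```

Under that assumption, the runscript values use exactly the 256 hyperthreads of a node (8 ranks times 32), and the job runs 8192 MPI ranks in total.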

Submit the job using the following submission command:

qsub batch.sh

Traverse

Traverse is a GPU-enabled cluster available to the Princeton University community upon approval. See https://researchcomputing.princeton.edu/traverse for more details.

#!/bin/bash
#SBATCH -A pppl
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --ntasks-per-socket=2
#SBATCH --cpus-per-task=32
#SBATCH --gpu-bind=map_gpu:0,1,2,3
#SBATCH --gpus-per-task=1
#SBATCH -t 8:00:00
#SBATCH --mail-type=all

module purge
module load pgi/19.9/64
module load openmpi/pgi-19.9/4.0.4/64
module load cudatoolkit/10.1
module load hdf5/pgi-19.9/openmpi-4.0.4/1.10.6
module load fftw/gcc/openmpi-4.0.1/3.3.8

export NVCC_WRAPPER_DEFAULT_COMPILER=/opt/pgi/19.9/linuxpower/19.9/bin/pgc++
export CXX=/home/as85/Software/install/kokkos/pgi19.9/bin/nvcc_wrapper
export XGC_PLATFORM=traverse
export OMP_NUM_THREADS=32
export OMP_MAX_ACTIVE_LEVELS=2
export OMP_STACKSIZE=2G

export n_mpi_ranks_per_node=4
export n_mpi_ranks=$((${SLURM_JOB_NUM_NODES} * ${n_mpi_ranks_per_node}))
export XGC_EXEC=${HOME}/path_to_executable

echo 'Number of nodes: '                  ${SLURM_JOB_NUM_NODES}
echo 'MPI ranks (total): '                $n_mpi_ranks
echo 'MPI ranks per node: '               $n_mpi_ranks_per_node
echo 'Number of OMP threads: '            ${OMP_NUM_THREADS}
echo 'XGC executable: '                   ${XGC_EXEC}

# write stdout and stderr to a log file named after the job ID
OUTFILE=xgc_${SLURM_JOB_ID}.log
srun ${XGC_EXEC} &> ${OUTFILE}

Optionally, for smaller debugging runs (4 nodes or fewer, for less than 1 hour), add:

  • #SBATCH --reservation=test

Submit the job using the following submission command:

sbatch batch.sh

XGC Examples

XGC examples (https://github.com/PrincetonUniversity/XGC-Examples.git) are maintained separately from the code repository to avoid large data files in the code repository.

Currently we have three examples: