Running XGC¶
Note: If you are new to XGC or to a particular platform, our Quickstart may be helpful, as it provides a functioning example to work from.
Overview¶
Prepare three files (input, adios2cfg.xml, and petsc.rc) and place them in a run directory. Additional data files can be located in a separate directory, which is given by the input Fortran namelist parameter sml_input_file_dir=[PATH_TO_INPUT_DIR]. Create a subdirectory restart_dir in the run directory for the checkpoint-restart functionality.
Preparation of a run directory¶
Three files are required in the run directory.
input: Fortran namelist file grouped by categories (a minimal sketch of the format follows this list). More details
adios2cfg.xml: ADIOS2 configuration file. More details
petsc.rc: PETSc resource file. More details
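As a purely illustrative sketch, the input file consists of Fortran namelist groups containing parameter assignments. The group name sml_param below is an assumption made for illustration only; consult the input-parameter documentation linked above for the actual groups and parameters.
&sml_param
  sml_input_file_dir='[PATH_TO_INPUT_DIR]'  ! directory holding the additional data files described below
/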
Additional files are located in the directory specified by the input parameter sml_input_file_dir:
.eqd file: Magnetic equilibrium file generated from an EFIT geqdsk file. The toroidal and poloidal magnetic field (I = R*Bt and psi = poloidal flux / 2pi), the magnetic axis, and the X-point location are stored. This file can be generated with utils/efit2eqd from an EFIT geqdsk file.
Mesh files: Triangle mesh files (*.node and *.ele). The file formats are described here. To generate the mesh files for XGC, you can use RPI’s XGC mesh generator. You also need a *.flx.aif file, which describes flux-surface information. RPI’s XGC mesh code generates the .flx.aif file together with the .node and .ele files.
Profile files: Initial equilibrium profiles (density and temperature) can be specified by an analytic formula or read from text files. The file format is simple text (ASCII) with the number of data points, two-column data for normalized poloidal flux and profile value, and an end flag (-1). The normalized poloidal flux is unity at the magnetic separatrix and zero on the magnetic axis. The units of the profiles are \(m^{-3}\) for density and eV for temperature. When a toroidal flow (toroidal angular velocity) profile is required, its unit is rad/s. Below is a sample Fortran code that reads the profile data in the expected format, followed by an illustrative example of such a file.
read(funit,*) num
allocate(psi(num),var(num))
do i=1, num
   read(funit,*) psi(i),var(i)
enddo
read(funit,*) flag
if(flag/=-1) then
   print *, 'error in profile ftn init : invalid number of data or ending flag -1 is not set correctly', flag, ftn%filename
   stop
endif
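For illustration, a temperature profile file with three data points in this format would look like the following (the flux and temperature values are arbitrary placeholders):
3
0.00   2500.0
0.50   1200.0
1.00    150.0
-1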
For the checkpoint-restart functionality, a subdirectory called restart_dir must be present in your run directory. You can create it manually or have your execution script take care of this.
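Putting the above together, a run directory can be prepared with a few shell commands along the lines of the sketch below (all paths are placeholders; the setup_rundir.sh helper used in the job scripts below plays a similar role):
# Minimal sketch: create a run directory with the three required files and restart_dir
mkdir -p my_run_dir/restart_dir
cp /path/to/prepared/input          my_run_dir/
cp /path/to/prepared/adios2cfg.xml  my_run_dir/
cp /path/to/prepared/petsc.rc       my_run_dir/
# In the input namelist, point sml_input_file_dir at the directory holding the
# .eqd, mesh, and profile files, e.g. sml_input_file_dir='[PATH_TO_INPUT_DIR]'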
ADIOS2 configuration¶
XGC’s outputs are written with ADIOS2 and can be configured and controlled through ADIOS2’s XML config file (adios2cfg.xml). A default ADIOS2 config file is provided in the repository (quickstart/inputs/adios2cfg.xml). To tune I/O performance, users can update the ADIOS2 config file before launching a run.
Each XGC output has a unique ADIOS I/O name (e.g., restart, restartf0, f3d), and users can specify the ADIOS2 engine and its parameters for each of them. An example is shown here:
<?xml version="1.0"?>
<adios-config>
<io name="restart">
<engine type="BP4">
<parameter key="Profile" value="Off"/>
</engine>
</io>
</adios-config>
In the ADIOS2 config file, users can specify the following parameters for each io block:
I/O name: the ADIOS2 name of XGC’s output (e.g., restart).
Engine name: the name of the ADIOS2 I/O engine. ADIOS2 provides various engines for file outputs, data streaming, etc. Examples are BP4 (the native ADIOS file format), HDF5, and SST (for in-memory data exchange and streaming).
Engine parameters: ADIOS2 provides a wide range of options to control the engine. Users can provide the key-value pairs of options in the XML. The full list of engines and options can be found in the ADIOS2 user manual.
Here we list a few commonly used parameters for the BP4 engine (an illustrative io block combining them is shown after the list):
Profile: Turn ON or OFF the writing of ADIOS profiling information.
BurstBufferPath: Redirect output files to another location. This feature can be used on machines that have local NVMe/SSDs. On Summit at OLCF, use “/mnt/bb/<username>” for the path, where <username> is your user account name. Temporary files on the accelerated storage are automatically deleted after the application closes the output and ADIOS drains all data to the file system, unless draining is turned off by the BurstBufferDrain parameter.
BurstBufferDrain: Turn ON or OFF the burst-buffer draining.
NumAggregators: Control the number of sub-files to be written, ranging between 1 and the number of writing MPI processes. As of the ADIOS2 2.7 release (Jan 2021), the default aggregation mode is one sub-file per compute node (previously one sub-file per process), so it is recommended not to specify NumAggregators anymore; the following guidance is for fine tuning only. Choosing the optimum number of aggregators is not straightforward, as it depends on the degree of parallelization and the filesystem specification, but a rule of thumb is to use 1, 2, or 4 aggregators per node when running a large-scale simulation (e.g., more than 256 nodes on Summit). For example, for a 1024-node Summit job with 6 MPI processes per node, try setting NumAggregators to 1024 or 2048.
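For example, these parameters could be combined in a single io block as in the hypothetical snippet below (the burst-buffer path and aggregator count are placeholders to be adapted to your machine and job size):
<io name="restart">
    <engine type="BP4">
        <parameter key="Profile" value="Off"/>
        <parameter key="BurstBufferPath" value="/mnt/bb/your_user_name"/>
        <parameter key="NumAggregators" value="1024"/>
    </engine>
</io>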
PETSc configuration¶
XGC uses the PETSc library to solve the matrix equations arising from the Poisson and Ampere equations. For more details, see the PETSc homepage.
An example petsc.rc configuration file is found in the quickstart/inputs source subdirectory.
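The file uses the standard PETSc options-file format, with one runtime option per line. The options in the sketch below are generic placeholders meant only to illustrate the format, not XGC’s recommended solver settings; start from the file shipped in quickstart/inputs.
# PETSc runtime options, one per line (illustrative placeholders only)
-ksp_type cg
-pc_type gamg
-ksp_rtol 1.0e-9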
Examples of batch scripts¶
Different clusters will generally require different batch scripts to run the simulation. Example scripts for our target platforms are found inside the repository in the directory quickstart/job_scripts and are reproduced below.
Note that these example job scripts call three other scripts: modules.sh, which is specific to the platform and is found in quickstart/modules_and_env under the name of the platform; and two convenience scripts, print_jobinfo.sh and setup_rundir.sh, which can both be found in quickstart/job_scripts.
Aurora¶
#!/bin/bash
#PBS -l nodes=2
#PBS -q EarlyAppAccess
#PBS -W run_count=17
#PBS -j oe
#PBS -k doe
#PBS -A XGC_aesp_CNDA
#PBS -l walltime=00:30:00
#PBS -N myCustomJobName
# Executable path and where to execute simulation
EXE_BIN=./xgca-cpp
RUN_DIR=rundir
ABS_EXE_BIN=$(realpath "$EXE_BIN")
# Load modules used to build XGC as well as runtime environment
source modules.sh
# Set number of MPI ranks
NNODES=`wc -l < $PBS_NODEFILE`
export n_mpi_ranks_per_node=12
export n_mpi_ranks=$((${NNODES} * ${n_mpi_ranks_per_node}))
# Print basic job characteristics
sh print_jobinfo.sh ${NNODES} $n_mpi_ranks_per_node $ABS_EXE_BIN
# Copy gpu_tile_compact.sh file from hard-coded path (you may want to do this just once so it is consistent)
cp /soft/tools/mpi_wrapper_utils/gpu_tile_compact.sh .
# Prep run directory inside project directory
sh setup_rundir.sh $RUN_DIR
# Enter run directory and launch simulation
cd $RUN_DIR
mpiexec --np ${n_mpi_ranks} -ppn ${n_mpi_ranks_per_node} --cpu-bind ${CPU_BIND} -envall ../gpu_tile_compact.sh $ABS_EXE_BIN
Frontier¶
#!/bin/bash
#SBATCH -A phy122
#SBATCH -J xgc_kin
#SBATCH -t 01:00:00
#SBATCH -N 2
#SBATCH -o %x_%j.out
#SBATCH -e %x_%j.err
#SBATCH --threads-per-core=2
# Executable path and where to execute simulation
EXE_BIN=./xgca-cpp
RUN_DIR=rundir
ABS_EXE_BIN=$(realpath "$EXE_BIN")
# Load modules used to build XGC as well as runtime environment
source modules.sh
# Set number of MPI ranks
export n_mpi_ranks_per_node=8
export n_mpi_ranks=$((${SLURM_JOB_NUM_NODES} * ${n_mpi_ranks_per_node}))
# Print basic job characteristics
sh print_jobinfo.sh ${SLURM_JOB_NUM_NODES} $n_mpi_ranks_per_node $ABS_EXE_BIN
# Prep run directory inside project directory
sh setup_rundir.sh $RUN_DIR
# Enter run directory and launch simulation
cd $RUN_DIR
srun -N ${SLURM_JOB_NUM_NODES} -n ${n_mpi_ranks} -c ${OMP_NUM_THREADS} --ntasks-per-node=${n_mpi_ranks_per_node} --gpus-per-task=1 --gpu-bind=closest $ABS_EXE_BIN
# To submit job:
# sbatch jobscript.sh
Perlmutter (GPU)¶
#!/bin/bash
#SBATCH -A m499_g
#SBATCH -C gpu
#SBATCH -q regular
#SBATCH -t 1:00:00
#SBATCH -N 2
#SBATCH --job-name=JOB_NAME
# Executable path and where to execute simulation
EXE_BIN=./xgca-cpp
RUN_DIR=rundir
ABS_EXE_BIN=$(realpath "$EXE_BIN")
# Load modules used to build XGC as well as runtime environment
source modules.sh
# Set number of MPI ranks
export n_mpi_ranks_per_node=4
export n_mpi_ranks=$((${SLURM_JOB_NUM_NODES} * ${n_mpi_ranks_per_node}))
# Print basic job characteristics
sh print_jobinfo.sh ${SLURM_JOB_NUM_NODES} $n_mpi_ranks_per_node $ABS_EXE_BIN
# Prep run directory inside project directory
sh setup_rundir.sh $RUN_DIR
# Enter run directory and launch simulation
cd $RUN_DIR
srun -N ${SLURM_JOB_NUM_NODES} -n ${n_mpi_ranks} -c ${OMP_NUM_THREADS} --ntasks-per-node=${n_mpi_ranks_per_node} --cpu-bind=cores --gpus-per-node=${n_mpi_ranks_per_node} bash -c "export CUDA_VISIBLE_DEVICES=\$((3-SLURM_LOCALID)); $ABS_EXE_BIN"
# To submit job:
# sbatch jobscript.sh
Perlmutter (CPU)¶
#!/bin/bash
#SBATCH -A m499
#SBATCH -C cpu
#SBATCH -q regular
#SBATCH -t 1:00:00
#SBATCH -N 2
#SBATCH --job-name=JOB_NAME
# Executable path and where to execute simulation
EXE_BIN=./xgca-cpp
RUN_DIR=rundir
ABS_EXE_BIN=$(realpath "$EXE_BIN")
# Load modules used to build XGC as well as runtime environment
source modules.sh
# Set number of MPI ranks
export n_mpi_ranks_per_node=32
export n_mpi_ranks=$((${SLURM_JOB_NUM_NODES} * ${n_mpi_ranks_per_node}))
# Print basic job characteristics
sh print_jobinfo.sh ${SLURM_JOB_NUM_NODES} $n_mpi_ranks_per_node $ABS_EXE_BIN
# Prep run directory inside project directory
sh setup_rundir.sh $RUN_DIR
# Enter run directory and launch simulation
cd $RUN_DIR
srun -N ${SLURM_JOB_NUM_NODES} -n ${n_mpi_ranks} -c ${OMP_NUM_THREADS} --ntasks-per-node=${n_mpi_ranks_per_node} --cpu-bind=cores $ABS_EXE_BIN
# To submit job:
# sbatch jobscript.sh
Polaris¶
#!/bin/bash
#PBS -l nodes=16
#PBS -l filesystems=home:eagle
#PBS -l walltime=01:00:00
#PBS -A TokamakITER
#PBS -N xgcBench32
#PBS -j oe
#PBS -k doe
#PBS -q prod
# This is important: go to working directory
cd ${PBS_O_WORKDIR}
# Executable path and where to execute simulation
EXE_BIN=./xgca-cpp
RUN_DIR=rundir
ABS_EXE_BIN=$(realpath "$EXE_BIN")
# Load modules used to build XGC as well as runtime environment
source modules.sh
# Set number of MPI ranks
NNODES=`wc -l < $PBS_NODEFILE`
NDEPTH=16 # Number of hardware threads per rank, spacing between MPI ranks on a node
export n_mpi_ranks_per_node=4
export n_mpi_ranks=$((${NNODES} * ${n_mpi_ranks_per_node}))
# Print basic job characteristics
sh print_jobinfo.sh ${NNODES} $n_mpi_ranks_per_node $ABS_EXE_BIN
# Create set_affinity_gpu_polaris.sh file
echo '#!/bin/bash
num_gpus=4
gpu=$((${PMI_LOCAL_RANK} % ${num_gpus}))
export CUDA_VISIBLE_DEVICES=$gpu
echo "RANK= ${PMI_RANK} LOCAL_RANK= ${PMI_LOCAL_RANK} gpu= ${gpu}"
exec "$@"' > set_affinity_gpu_polaris.sh
chmod +x set_affinity_gpu_polaris.sh
# Prep run directory inside project directory
sh setup_rundir.sh $RUN_DIR
# Enter run directory and launch simulation
cd $RUN_DIR
mpiexec --np ${n_mpi_ranks} -ppn ${n_mpi_ranks_per_node} --cpu-bind depth -d ${NDEPTH} -envall ../set_affinity_gpu_polaris.sh $ABS_EXE_BIN
# To submit job:
# qsub jobscript.sh
Stellar¶
Stellar is a CPU cluster available to the Princeton University community upon approval. See https://researchcomputing.princeton.edu/systems/stellar for more details.
#!/bin/bash
#SBATCH -A pppl
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=96
#SBATCH --ntasks-per-socket=24
#SBATCH --cpus-per-task=1
#SBATCH -t 00:30:00
#SBATCH --qos=stellar-debug
#SBATCH --mail-type=BEGIN,END,FAIL
source /projects/XGC/STELLAR/Software/bin/set_up_xgc.stellar
export OMP_PROC_BIND=false
export OMP_NUM_THREADS=1
export n_mpi_ranks_per_node=96
export n_mpi_ranks=$((${SLURM_JOB_NUM_NODES} * ${n_mpi_ranks_per_node}))
export XGC_EXEC=${HOME}/path_to_executable
echo 'Number of nodes: ' ${SLURM_JOB_NUM_NODES}
echo 'MPI ranks (total): ' $n_mpi_ranks
echo 'MPI ranks per node: ' $n_mpi_ranks_per_node
echo 'Number of OMP threads: ' ${OMP_NUM_THREADS}
echo 'XGC executable: ' ${XGC_EXEC}
OUTFILE=xgc_${SLURM_JOB_ID}.log
srun ${XGC_EXEC} &> ${OUTFILE}
Submit the job using the following submission command:
sbatch batch.sh
Traverse¶
Traverse is a GPU-enabled cluster available to the Princeton University community upon approval. See https://researchcomputing.princeton.edu/traverse for more details.
#!/bin/bash
#SBATCH -A pppl
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --ntasks-per-socket=2
#SBATCH --cpus-per-task=32
#SBATCH --gpu-bind=map_gpu:0,1,2,3
#SBATCH --gpus-per-task=1
#SBATCH -t 8:00:00
#SBATCH --mail-type=all
source /projects/XGC/TRAVERSE/Software/bin/set_up_xgc.traverse
export OMP_NUM_THREADS=32
export OMP_MAX_ACTIVE_LEVELS=2
export OMP_STACKSIZE=2G
export n_mpi_ranks_per_node=4
export n_mpi_ranks=$((${SLURM_JOB_NUM_NODES} * ${n_mpi_ranks_per_node}))
export XGC_EXEC=${HOME}/path_to_executable
echo 'Number of nodes: ' ${SLURM_JOB_NUM_NODES}
echo 'MPI ranks (total): ' $n_mpi_ranks
echo 'MPI ranks per node: ' $n_mpi_ranks_per_node
echo 'Number of OMP threads: ' ${OMP_NUM_THREADS}
echo 'XGC executable: ' ${XGC_EXEC}
OUTFILE=xgc_${SLURM_JOB_ID}.log
srun ${XGC_EXEC} &> ${OUTFILE}
Optionally, for smaller debugging runs (4 nodes or less for less than 1 hour), add:
#SBATCH --reservation=test
Submit the job using the following submission command:
sbatch batch.sh
XGC Examples¶
XGC examples (https://github.com/PrincetonUniversity/XGC-Examples.git) are maintained in a separate repository to avoid storing large data files in the code repository.
For a list of examples, click here.
XGC performance¶
XGC weak scaling of electrostatic ITER simulations on Frontier, Perlmutter (GPU), and Summit. XGC shows good weak scaling up to the full number of GPU nodes on each machine. The scaling study was performed by Aaron Scheinberg at Jubilee Development and Paul Lin at Lawrence Berkeley National Laboratory.
Running XGC-S¶
Three folders are required in the run directory (a sketch of the resulting layout is shown after this list):
equilibrium folder: contains a subfolder that includes the equilibrium magnetic field and mesh files:
.dat file: sets the simulation domain \(R_\mathrm{min}, R_\mathrm{max}, Z_\mathrm{min}, Z_\mathrm{max}\) on each plane, as well as the number of grid points (mr, mz, mphi) in \((R,Z,\varphi)\) for bicubic interpolation and the stellarator turn number kturn.
.flux file: 3D bicubic interpolation coefficients for the radial label \(\psi\), which is normalized and set to one at an arbitrary flux surface.
.mag1 file: 3D bicubic interpolation coefficients for \(B_R\).
.mag2 file: 3D bicubic interpolation coefficients for \(B_\varphi\).
.mag3 file: 3D bicubic interpolation coefficients for \(B_Z\).
.node and .ele files: the same mesh files as for XGC1.
.grid file: number of flux surfaces and number of nodes on each surface, similar to the .flx.aif file for XGC1.
.iota file: read but not used.
Except for .grid and .iota, all of these files are defined on 2*mp+1 different planes, i.e., there are 2*mp+1 files of each kind.
profile folder: contains a profile data file, which gives \(n_i, \partial_\psi n_i, T_i, \partial_\psi T_i\) in the region \(\psi\in[0,1]\), where \(\psi\) is the normalized radial label consistent with that defined in the .dat and .flux files.
A third folder with an arbitrary name: the actual run directory containing the batch script (including the path to xgc-s), input, and petsc.rc. An adios2cfg.xml is also needed if ADIOS2 is used.
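For orientation, the resulting layout could look like the sketch below (the top-level directory name, the equilibrium subfolder name, and the name of the third folder are placeholders):
run_directory/
    equilibrium/
        my_geometry/        (.dat, .flux, .mag1, .mag2, .mag3, .node, .ele, .grid, .iota files)
    profile/
        (profile data file)
    my_run/                 (batch script, input, petsc.rc, and adios2cfg.xml if ADIOS2 is used)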
Examples of batch scripts on Perlmutter CPU nodes and Stellar can be found above. Note that currently xgc-s does not fully support OpenMP and export OMP_NUM_THREADS=1 should be used in the batch scripts.