# Kernels and Tests¶

XGC’s major components can be run independently from the full code to aid in testing and development. The kernels can be run in test mode for verification. There is also a unit test suite, UnitTests-cpu (currently CPU only), which tests individual functions throughout the code; as such, it has no corresponding kernel.

## Compiling¶

The kernels are built by default if you are building XGC (see Build Instructions). However, this can be inconvenient since the kernels require only a subset of the XGC dependencies. If you only want the kernels and tests, use the following instructions instead.

All kernels/tests require Kokkos and Cabana. The Collisions kernel requires LAPACK. Tests will only configure if GTest is found. For instructions on installing these dependencies, see 3rd Party Software Installations.

Once Kokkos, Cabana, and optionally GTest are installed, you can configure and build with:

```shell
mkdir build; cd build
cmake \
  -DBUILD_FULL_XGC=Off \
  -DKokkos_ROOT=<path to Kokkos> \
  -DCabana_ROOT=<path to Cabana> \
  -DLAPACK_ROOT=<path to LAPACK> \
  -DGTEST_ROOT=<path to GTest> \
  ..
make -j
```


The executables have the form {component}Kernel-{cpu,gpu}, for example: collisionsKernel-gpu. Commonly used components:

- collisions
- electron_push
- ion_scatter

## Running¶

All component kernels require input files. If security settings permit it, running

```shell
ctest
```


in the build directory will download the directory containing these files from the Kitware website, then run all CPU tests. However, this automatic download fails on many systems. In that case, you must download the data manually, taking the most recent* tarball from here. The tarball contains a directory called SmallExample.

Run the executables from the directory that contains SmallExample (i.e., the executables expect to find files of the form ./SmallExample/example.txt).
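As a sketch of the expected working-directory layout (the file below is a stand-in; the real input files come from the downloaded tarball):

```shell
# Stand-in layout only: real input files come from the SmallExample tarball.
mkdir -p SmallExample
touch SmallExample/example.txt
# Kernels must be launched from this directory (the parent of SmallExample),
# since they open paths of the form ./SmallExample/<input file>:
ls ./SmallExample/example.txt
```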

To test the kernels, use --test, e.g.:

```shell
./electron_pushKernel-cpu --test
```


This mode runs a small, predefined example and compares the results against expected values. For collisions, it is the same calculation as -n_nodes 3; for the other kernels, it is the same calculation as -n_ptl 37. GoogleTest (GTest) is used to confirm correctness of the results.

Outside of test mode, results are not verified. The kernel scale must be user-specified via a command-line input, e.g.:

```shell
./electron_pushKernel-cpu -n_ptl 50000
```


will run with 50000 particles. -n_ptl is the input for all kernels except the collision kernel, since the collision kernel does not involve particles. In that kernel, scaling is done by adding mesh nodes. The syntax is:

```shell
./collisionsKernel-cpu -n_nodes 3000
```


If further customization of the collision kernel inputs is desired, one can supply a file in Fortran namelist format, specifying its location with -file.
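A Fortran namelist file has the following shape. The group and variable names below are purely illustrative (the accepted names are defined by the collision kernel itself; consult its source or input documentation):

```
&collision_param
  ! Illustrative only: real group and variable names are defined by the kernel.
  some_parameter = 1.0
/
```

Such a file would then be passed with, e.g., -file my_inputs.nml alongside -n_nodes.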

Currently, all kernels and tests are single-process (i.e. no MPI).

* More robustly, use the tarball whose sha512 matches the one in your repo’s version of utils/regression_tests/SmallExample.tar.gz.sha512.
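The check can be performed with sha512sum -c. The snippet below demonstrates the mechanics on a stand-in file (in practice, the .sha512 file comes from utils/regression_tests/ in the repo and the tarball from the download page):

```shell
# Stand-in tarball; in practice this is the downloaded SmallExample.tar.gz.
printf 'example data' > SmallExample.tar.gz
# The repo ships a checksum file in the standard "<hash>  <filename>" format;
# recreate one here for demonstration:
sha512sum SmallExample.tar.gz > SmallExample.tar.gz.sha512
# Verification succeeds (exit code 0) only if the hash matches:
sha512sum -c SmallExample.tar.gz.sha512
```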

## Versions and Performance¶

XGC’s performance-critical kernels are under continual development. The following table benchmarks performance on different architectures; the kernel version used for each benchmark is given in the hyperlink. The version number is not related to XGC version numbers. The problem sizes are large enough that the kernels saturate the architecture, so increasing the problem size should increase the time linearly, leaving the time per particle or node constant. The timing data is taken from the Camtimers output file.

| Kernel | Version | Problem size | Architecture | Timing |
|---|---|---|---|---|
| collisions | 2.0 | 3e3 nodes | V100 + POWER9 (Summit) | 2.6 ms/node |
| collisions | 2.0 | 3e3 nodes | A100 + EPYC (Perlmutter) | 1.2 ms/node |
| electron_push | 2.0 | 50e6 ptl | V100 (Summit) | 602 ns/ptl |
| electron_push | 2.0 | 100e6 ptl | A100 (Perlmutter) | 418 ns/ptl |
| ion_scatter | 2.0 | 50e6 ptl | V100 (Summit) | 2.2 ns/ptl |
| ion_scatter | 2.0 | 100e6 ptl | A100 (Perlmutter) | 1.7 ns/ptl |

## Testing XGC¶

To confirm the full code is behaving as expected, one can run a full XGC version with the --test command-line argument, e.g.:

```shell
./xgc-es-cpp --test
```

using one of the following downloadable reference cases. Run XGC from inside the case directory:

| XGC version | Flags | Name (URL) |
|---|---|---|
| xgc-es-cpp | None | XGC1Example |
| xgca-cpp | None | XGCaExample |
| xgc-eem-cpp | None | XGC-EMExample |
| xgc-eem-cpp | -DDELTAF_CONV=On | XGC-EMECBCExample |

The xgc-es-cpp and xgc-eem-cpp tests (and their GPU equivalents) must be run with 4 MPI ranks and with OMP_NUM_THREADS=4 exported. The xgca-cpp test must be run with a single MPI rank (MPI is still required) and with OMP_NUM_THREADS=4 exported.
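A launch sketch follows; the launcher name is system-dependent (mpirun, mpiexec, or srun may apply on your machine), so the launch lines are shown as comments:

```shell
export OMP_NUM_THREADS=4
# From inside the case directory, with a system-appropriate launcher, e.g.:
#   mpirun -np 4 ./xgc-es-cpp --test    # xgc-es-cpp / xgc-eem-cpp cases
#   mpirun -np 1 ./xgca-cpp --test      # xgca-cpp case
```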

If you want to ensure that two XGC binaries behave the same in an XGC case beyond the examples above, you can run any XGC case as follows:

```shell
./xgc-es-cpp_my_reference_binary --update-test
./xgc-es-cpp_my_new_binary --test
```


The first command writes all particle positions to a file at the end of the run. The second (run in the same directory) compares the resulting particle positions against those of the first run and reports the largest difference found for each particle property. Acceptable tolerances vary depending on the case; a hard-coded typical tolerance determines whether the test passes, but one can also judge the reported differences directly.

Note: This test currently only compares particle positions. Diagnostic output and additional aspects of the simulation state (like the 5D grid representation of the distribution function) are not verified here.

Note: Depending on the restart settings in your input file, the second run may try to start from where the first run left off. If present, remove timestep.dat between the two runs to ensure this doesn’t happen.
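Putting the two runs together with the restart-file cleanup, the full comparison workflow (binary names are the same placeholders used in the example above) is a sketch along these lines:

```shell
./xgc-es-cpp_my_reference_binary --update-test   # writes reference particle positions
rm -f timestep.dat                               # prevent the second run from restarting
./xgc-es-cpp_my_new_binary --test                # reruns and compares against the reference
```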