Kernels and Tests
XGC’s major components can be run independently of the full code to aid in testing and development. The kernels can be run in test mode for verification. There is also a unit test suite, UnitTests-cpu (currently CPU only), which tests individual functions throughout the code; as such, it has no corresponding kernel.
Compiling
The kernels are built by default if you are building XGC (see Build Instructions). However, this can be inconvenient since the kernels require only a subset of the XGC dependencies. If you only want the kernels and tests, use the following instructions instead.
All kernels/tests require Kokkos and Cabana. The Collisions kernel requires LAPACK. Tests will only configure if GTest is found. For instructions on installing these dependencies, see 3rd Party Software Installations.
Once Kokkos, Cabana, and optionally GTest are installed, you can configure and build with:
mkdir build; cd build
cmake \
-DBUILD_FULL_XGC=Off \
-DKokkos_ROOT=<path to Kokkos> \
-DCabana_ROOT=<path to Cabana> \
-DLAPACK_ROOT=<path to LAPACK> \
-DGTEST_ROOT=<path to GTest> \
..
make -j
The executables have the form {component}Kernel-{cpu,gpu}, for example: collisionsKernel-gpu. Commonly used components:
collisions
electron_push
ion_scatter
Running
All component kernels require input files. If security settings permit it, running ctest in the build directory will download the directory containing these files from the Kitware website, then run all CPU tests. However, this automatic download fails on many systems. In that case, you must download the data manually, taking the most recent* tarball from here. The tarball contains a directory called SmallExample.
Run the executables from the directory that contains SmallExample (i.e., the executables expect to find files of the form ./SmallExample/example.txt).
To test the kernels, use --test, e.g.:
./electron_pushKernel-cpu --test
This mode runs a small, predefined example and compares the results against expected values. For collisions, it is the same calculation as -n_nodes 3; for the other kernels, it is the same calculation as -n_ptl 37. GoogleTest (GTest) is used to confirm correctness of the results.
Outside of test mode, results are not verified. The kernel scale must be specified via a command-line argument, e.g.:
./electron_pushKernel-cpu -n_ptl 50000
will run with 50000 particles. -n_ptl is the input for all kernels except the collision kernel, which does not involve particles; in that kernel, scaling is done by adding mesh nodes. The syntax is:
./collisionsKernel-cpu -n_nodes 3000
If further customization of the collision kernel inputs is desired, one can supply a file in Fortran namelist format, specifying its location with -file.
Currently, all kernels and tests are single-process (i.e. no MPI).
* More robustly, use the tarball whose sha512 matches your repo’s version of utils/regression_tests/SmallExample.tar.gz.sha512.
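The checksum comparison itself can be scripted with sha512sum. Below is a minimal sketch of the mechanism using a stand-in file (the real tarball and its URL are not reproduced here); in practice, you would run sha512sum -c against the downloaded SmallExample.tar.gz using the .sha512 file from your repo:

```shell
# Stand-in for the downloaded tarball; in practice this is SmallExample.tar.gz.
echo "example payload" > sample.tar.gz

# Stand-in for the repo's utils/regression_tests/SmallExample.tar.gz.sha512.
sha512sum sample.tar.gz > sample.tar.gz.sha512

# Verify: prints "sample.tar.gz: OK" and exits 0 when the hash matches.
sha512sum -c sample.tar.gz.sha512
```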
Versions and Performance
XGC’s performance-critical kernels are under continual development. The following table benchmarks performance on different architectures. The kernel version used for each benchmark is listed in the table; these version numbers are not related to XGC version numbers. The problem sizes are large enough that the kernels saturate the architecture; increasing the problem size should increase the time linearly, leaving the time per particle or node constant. The timing data is taken from the Camtimers output file.
| Kernel | Version | Problem size | Architecture | Timing |
|---|---|---|---|---|
| collisions | | 3e3 nodes | V100 + POWER9 (Summit) | 2.6 ms/node |
| collisions | 2.0 | 3e3 nodes | A100 + EPYC (Perlmutter) | 1.2 ms/node |
| electron_push | 2.0 | 50e6 ptl | V100 (Summit) | 602 ns/ptl |
| electron_push | 2.0 | 100e6 ptl | A100 (Perlmutter) | 418 ns/ptl |
| ion_scatter | 2.0 | 50e6 ptl | V100 (Summit) | 2.2 ns/ptl |
| ion_scatter | 2.0 | 100e6 ptl | A100 (Perlmutter) | 1.7 ns/ptl |
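To reproduce the linear-scaling check described above on your own machine, one can sweep the problem size and compare the per-node times from the Camtimers output. The loop below is only a sketch: by default it echoes the commands instead of running them; clear DRY_RUN to execute them for real, which requires the built collisionsKernel-cpu and the SmallExample data:

```shell
# Sweep -n_nodes for the collision kernel; doubling the node count should
# roughly double the total time, leaving the time per node constant.
# DRY_RUN defaults to "echo", so this loop only prints the commands.
DRY_RUN=${DRY_RUN-echo}
for n in 1000 2000 4000; do
  $DRY_RUN ./collisionsKernel-cpu -n_nodes "$n"
done
```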
Testing XGC
To confirm the full code is behaving as expected, one can run a full XGC version with the --test command-line argument, e.g.:
./xgc-es-cpp --test
in the following downloadable reference cases. Run XGC from inside the downloaded case directory:
| XGC version | Flags | Name (URL) |
|---|---|---|
| xgc-es-cpp | None | |
| xgca-cpp | None | |
| xgc-eem-cpp | None | |
| xgc-eem-cpp | | |
The xgc-es-cpp and xgc-eem-cpp tests (and their GPU equivalents) must be run with 4 MPI ranks and export OMP_NUM_THREADS=4. The xgca-cpp test must be run with 1 MPI rank (MPI is still required) and export OMP_NUM_THREADS=4.
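Concretely, assuming a generic mpirun launcher (an assumption; substitute srun, jsrun, or whatever your system uses), the reference cases would be driven along these lines:

```
export OMP_NUM_THREADS=4

# xgc-es-cpp and xgc-eem-cpp cases (and GPU equivalents): 4 MPI ranks
mpirun -n 4 ./xgc-es-cpp --test

# xgca-cpp case: 1 MPI rank (MPI is still required)
mpirun -n 1 ./xgca-cpp --test
```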
If you want to ensure that two XGC binaries behave the same in an XGC case beyond the examples above, you can run any XGC case as follows:
./xgc-es-cpp_my_reference_binary --update-test
./xgc-es-cpp_my_new_binary --test
The first line will write all particle positions to a file at the end of the run. The second line (run in the same directory) will compare the resulting particle positions against those of the first run, and output the largest difference found between the two runs for each particle property. Acceptable tolerances vary depending on the case; a typical tolerance is hard-coded to determine whether the test passes, but one can also judge the reported differences directly.
Note: This test currently only compares particle positions. Diagnostic output and additional aspects of the simulation state (like the 5D grid representation of the distribution function) are not verified here.
Note: Depending on the restart settings in your input file, the second run may try to start from where the first run left off. If present, remove timestep.dat between the two runs to ensure this doesn’t happen.
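Putting these notes together, one comparison workflow looks like the following sketch (the binary names are the placeholders from the example above):

```
# 1) Reference run: writes all final particle positions to a file.
./xgc-es-cpp_my_reference_binary --update-test

# 2) Remove any restart state so the second run repeats the first run
#    from the beginning rather than continuing it.
rm -f timestep.dat

# 3) Comparison run: checks final particle positions against the reference.
./xgc-es-cpp_my_new_binary --test
```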