Kernels and Tests
=================

XGC's major components can be run independently from the full code to aid in testing and development. The kernels can be run in test mode for verification. There is also a unit test suite, ``UnitTests-cpu`` (currently CPU only), which tests individual functions throughout the code; as such, it has no corresponding kernel.

Compiling
---------

The kernels are built by default when you build XGC (see :doc:`cmake_build_instructions`). However, this can be inconvenient, since the kernels require only a subset of the XGC dependencies. If you only want the kernels and tests, use the following instructions instead.

All kernels/tests require ``Kokkos`` and ``Cabana``. The collisions kernel additionally requires ``LAPACK``. Tests will only configure if ``GTest`` is found. For instructions on installing these dependencies, see :doc:`third_party_software`.

Once Kokkos, Cabana, and optionally GTest are installed, you can configure and build with:

.. code-block:: bash

   mkdir build; cd build
   cmake \
       -DBUILD_FULL_XGC=Off \
       -DKokkos_ROOT= \
       -DCabana_ROOT= \
       -DLAPACK_ROOT= \
       -DGTEST_ROOT= \
       ..
   make -j

The executables have the form ``{component}Kernel-{cpu,gpu}``, for example ``collisionsKernel-gpu``. Commonly used components:

- collisions
- electron_push
- ion_scatter

Running
-------

All component kernels require input files. If security settings permit it, running

.. code-block:: bash

   ctest

in the build directory will download the directory containing these files from the Kitware website, then run all CPU tests.

However, this automatic download fails on many systems. In that case, you must download the data manually, taking the most recent* tarball from `here `_. The tarball contains a directory called ``SmallExample``. Run the executables from the directory that contains ``SmallExample`` (i.e. the executables expect to find files of the form ``./SmallExample/example.txt``).

To test the kernels, use ``--test``, e.g.:

.. code-block:: bash

   ./electron_pushKernel-cpu --test

This mode runs a small, predefined example and compares against expected results. For collisions, it is the same calculation as ``-n_nodes 3``. For the other kernels, it is the same calculation as ``-n_ptl 37``. GoogleTest (GTest) is used to confirm correctness of the results. Outside of test mode, results are not verified.

The kernel scale must be specified via a command-line input, e.g.:

.. code-block:: bash

   ./electron_pushKernel-cpu -n_ptl 50000

will run with 50000 particles. ``-n_ptl`` is the input for all kernels except the collisions kernel, since the collisions kernel does not involve particles. In that kernel, scaling is done by adding mesh nodes. The syntax is:

.. code-block:: bash

   ./collisionsKernel-cpu -n_nodes 3000

If further customization of the collision kernel inputs is desired, you can supply a file in Fortran namelist format, specifying its location with ``-file``.

Currently, all kernels and tests are single-process (i.e. no MPI).

\* More robustly, use the tarball with the sha512 found in your repo's version of ``utils/regression_tests/SmallExample.tar.gz.sha512``.

Versions and Performance
------------------------

XGC's performance-critical kernels are under continual development. The following table benchmarks performance on different architectures. The kernel version used for each benchmark is found in the hyperlink; the version number is not related to XGC version numbers. The problem sizes are large enough that the kernels saturate the architecture; increasing the problem size should increase the time linearly, leaving the time per particle or node constant. The timing data is taken from the Camtimers output file.
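The runs in the table below can be reproduced with invocations of the following form. This is a sketch, not a recorded benchmark script: the GPU binary names are assumed from the ``{component}Kernel-{cpu,gpu}`` naming convention above, and the problem sizes are taken from the table.

.. code-block:: bash

   # Hypothetical benchmark invocations matching the problem sizes in the
   # table; run each from the directory containing SmallExample.
   ./collisionsKernel-gpu -n_nodes 3000          # collisions, 3e3 nodes
   ./electron_pushKernel-gpu -n_ptl 50000000     # electron_push, 50e6 ptl
   ./ion_scatterKernel-gpu -n_ptl 50000000       # ion_scatter, 50e6 ptl

The per-particle (or per-node) timing is then obtained by dividing the Camtimers time by the problem size.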
============= ======== ============ ======================== ===========
Kernel        Version  Problem size Architecture             Timing
============= ======== ============ ======================== ===========
collisions    `2.0 `_  3e3 nodes    V100 + POWER9 (Summit)   2.6 ms/node
collisions    2.0      3e3 nodes    A100 + EPYC (Perlmutter) 1.2 ms/node
electron_push 2.0      50e6 ptl     V100 (Summit)            602 ns/ptl
electron_push 2.0      100e6 ptl    A100 (Perlmutter)        418 ns/ptl
ion_scatter   2.0      50e6 ptl     V100 (Summit)            2.2 ns/ptl
ion_scatter   2.0      100e6 ptl    A100 (Perlmutter)        1.7 ns/ptl
============= ======== ============ ======================== ===========

Testing XGC
-----------

To confirm the full code is behaving as expected, one can run a full XGC version with the ``--test`` command-line argument, e.g.:

.. code-block:: bash

   ./xgc-es-cpp --test

in the following downloadable reference cases. Run XGC from inside the case directory:

=========== ==================== =====================
XGC version Flags                Name (URL)
=========== ==================== =====================
xgc-es-cpp  None                 `XGC1Example `_
xgca-cpp    None                 `XGCaExample `_
xgc-eem-cpp None                 `XGC-EMExample `_
xgc-eem-cpp ``-DDELTAF_CONV=On`` `XGC-EMECBCExample `_
=========== ==================== =====================

The xgc-es-cpp and xgc-eem-cpp tests (and their GPU equivalents) must be run with 4 MPI ranks and ``export OMP_NUM_THREADS=4``.
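For example, a 4-rank launch might look as follows. The ``mpirun`` launcher is an assumption here; substitute your system's launcher (e.g. ``srun`` or ``jsrun``):

.. code-block:: bash

   # Run from inside the downloaded reference-case directory.
   export OMP_NUM_THREADS=4
   mpirun -n 4 ./xgc-es-cpp --test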
The xgca-cpp test must be run with 1 MPI rank (MPI is still required) and ``export OMP_NUM_THREADS=4``.

If you want to check that two XGC binaries behave the same on a case beyond the examples above, you can run any XGC case as follows:

.. code-block:: bash

   ./xgc-es-cpp_my_reference_binary --update-test
   ./xgc-es-cpp_my_new_binary --test

The first command writes all particle positions to a file at the end of the run. The second command (run in the same directory) compares the resulting particle positions against the first run and reports the largest difference found between the two runs for each particle property. Acceptable tolerances vary depending on the case; a typical tolerance is hard-coded to determine whether the test passed, but one can also judge for oneself.

Note: This test currently compares only particle positions. Diagnostic output and other aspects of the simulation state (such as the 5D grid representation of the distribution function) are not verified here.

Note: Depending on the restart settings in your input file, the second run may try to start from where the first run left off. If present, remove ``timestep.dat`` between the two runs to ensure this does not happen.
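Putting the notes above together, a minimal sketch of the two-run comparison workflow (the binary names are placeholders from the example above):

.. code-block:: bash

   # 1. Run the reference binary; --update-test records all particle
   #    positions to a file at the end of the run.
   ./xgc-es-cpp_my_reference_binary --update-test

   # 2. Guard against an unintended restart of the second run: remove
   #    timestep.dat if the first run wrote one.
   rm -f timestep.dat

   # 3. Run the new binary in the same directory; --test compares its
   #    final particle positions against the recorded reference.
   ./xgc-es-cpp_my_new_binary --test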