CI Pipeline
===========

Sassena has a CI pipeline which runs on the GitLab instance when a new commit
is pushed. It compiles the project, runs unit and integration tests, and can
launch deployments for the documentation and binaries.

Building the Docker Image
-------------------------
Sassena uses the same docker container to build the program and run the unit
tests on the CI server as well as for VSCode dev containers. Over time,
packages in the container may need updating, which can be done using the
following steps.

The container which builds Sassena is defined in ``Dockerfile``, in the root
of the project directory. Additionally, there is a second container which
supports CUDA and lives in ``.devcontainer/cuda/Dockerfile``. These files are
docker-specific shell scripts describing how to build the container. Both are
based on Ubuntu, but they download and install additional dependencies such as
Boost and Sphinx, which are needed either for compiling Sassena or during the
CI pipeline.

Updating the docker containers is relatively simple. Firstly, you may need to
create a "Personal Access Token". This can be done in the GitLab settings
under your personal profile settings. Once this is done, log in with your
credentials by running:
.. code-block:: bash

   docker login registry.hzdr.de
Docker images are built on a local machine using the command:

.. code-block:: bash

   docker build -t registry.hzdr.de/daphne4nfdi/sassena .
This will build the ``Dockerfile`` in the current directory and store the
image under the name ``registry.hzdr.de/daphne4nfdi/sassena``. If you have
made changes to the docker container and want to push it to the GitLab
container registry, and therefore the CI, simply use the command:

.. code-block:: bash

   docker push registry.hzdr.de/daphne4nfdi/sassena
GitLab CI Configuration
-----------------------
The CI pipeline is defined by the file ``.gitlab-ci.yml``, which resides in the
root of the project. It is a YAML file and further information on its structure
can be found on the
`GitLab CI Documentation <https://docs.gitlab.com/ee/ci/>`_.
The file defines three main steps in the pipeline. Firstly, the project is
compiled at the pushed commit. This is done in exactly the same way as specified
in the README or User Guide section of the docs. Docs can also be built if the
user has Doxygen and Sphinx installed and passes the ``SASS_BUILD_DOCS=ON``
argument to CMake (see user guide for further information).
The test section of the pipeline runs unit and integration tests,
which could also be run locally if developer mode is enabled and the user runs
``ninja test`` (or similar if using ``make``) to run the ``test`` target. The
integration tests are simply ``scatter.xml`` files which live in the
``tests/waterdyn4/ci_configs`` directory and are compared to a known good
``signal.h5`` file on the CI using a Python script.
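The core of such a comparison is an element-wise tolerance check between the
reference and the freshly computed datasets. A minimal Python sketch of that
idea (``signals_match`` is a hypothetical helper; reading the datasets out of
the two ``signal.h5`` files with an HDF5 library is omitted here, and the real
script's tolerances may differ):

```python
import math


def signals_match(reference, candidate, rel_tol=1e-6, abs_tol=1e-9):
    """Compare two flat signal arrays element-wise within a tolerance.

    Stand-in for the HDF5 comparison: the real CI script would first
    extract the datasets from the reference and candidate signal.h5
    files, which is omitted in this sketch.
    """
    if len(reference) != len(candidate):
        return False
    return all(
        math.isclose(r, c, rel_tol=rel_tol, abs_tol=abs_tol)
        for r, c in zip(reference, candidate)
    )
```

A tolerance-based comparison (rather than exact equality) is what makes such a
check robust against harmless floating point differences between runs.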
The final stage of the pipeline is the deployment step. For documentation, this
corresponds to copying the output of the docs from the build stage into a
specific directory for GitLab Pages. The deployment step also contains
a job which builds an AppImage of Sassena and uploads it to the "Releases"
section of the project using ``curl``.
AppImages are made using the utility `linuxdeploy
<https://github.com/meefik/linuxdeploy>`_, which is included in the Sassena
container. If you want to create your own AppImages locally, we recommend you
download and install linuxdeploy locally or use the existing one provided (e.g.
through VSCode's Dev Containers). AppImages are created by following the
`AppImage user guide
<https://docs.appimage.org/packaging-guide/from-source/native-binaries.html#cmake>`_
and then running linuxdeploy in that directory.
Profiling with VTune
====================

Introduction
------------
This documentation is intended to help you configure ``Sassena`` and to
profile its performance using a profiling tool.

Goals
-----
#. To help with installing the Intel oneAPI Base Toolkit
#. To help with using VTune for profiling *Sassena*
Virtual Machine
---------------
Virtual Machines can be easily created through
``Oracle VM Virtual Box Manager``, which can be downloaded and installed
as follows:
1) Download the correct ``Virtual Box`` for your operating system
`here <https://www.virtualbox.org/wiki/Downloads>`__ and follow the
installation instructions.
2) Open ``Virtual Box`` and follow the steps below to create a
virtual machine running Windows (similar steps apply for any other
operating system):
+----------------+----------------+------------------+----------------+
| Step 01 -      | Step 02 -      | Step 03 -        | Step 04 - Hard |
| Click on “New” | Create VM      | Allocate Memory  | Drive Setup    |
| to create new  |                |                  |                |
| VM             |                |                  |                |
+================+================+==================+================+
| |image1|       | |image2|       | |image3|         | |image4|       |
+----------------+----------------+------------------+----------------+
+----------------+----------------+-----------------+-----------------+
| Step 05 -      | Step 06 -      | Step 07 -       | Step 08 -       |
| Select HD File | Select Storage | Location and    | Install OS      |
| Type           | on HD          | Size Setup      |                 |
+================+================+=================+=================+
| |image5|       | |image6|       | |image7|        | |image8|        |
+----------------+----------------+-----------------+-----------------+
Intel oneAPI Base Toolkit
-------------------------
For further details, please check
`here <https://www.intel.com/content/www/us/en/docs/oneapi/installation-guide-linux/2024-1/overview.html>`__.
Quick Instructions for Installation
------------------------------------
**Prerequisites:**
#. CPU REQUIREMENTS:
* Intel® Core™ processor family or higher
* Intel® Xeon® processor family
* Intel® Xeon® Scalable processor family
* Intel® Core Ultra processors
#. GPU REQUIREMENTS:
* Intel® UHD Graphics for 11th generation Intel processors or newer
* Intel® Iris® Xe graphics
* Intel® Arc™ graphics
* Intel® Server GPU
* Intel® Data Center GPU Flex Series
* Intel® Data Center GPU Max Series
* NVIDIA or AMD GPUs using plug-ins from Codeplay
#. DISK SPACE:
* ~3 GB of disk space (minimum) if only installing compiler and its
libraries: Intel oneAPI DPC++/C++ Compiler, Intel® DPC++
Compatibility Tool, Intel® oneAPI DPC++ Library and Intel® Threading
Building Block
* Maximum of ~24 GB of disk space if installing all components
#. MEMORY:
* 8 GB RAM recommended
#. LINUX OPERATING SYSTEM REQUIREMENTS (with oneAPI 2024.1):
* Red Hat Enterprise Linux 8.x, 9.x
* Ubuntu 20.04, 22.04
* Fedora 38, 39
* SuSE LINUX Enterprise Server 15 SP3, SP4, SP5
* Debian 11
* Amazon Linux 2022
* Rocky Linux 9
* WSL 2 (except oneCCL)
#. SET UP YOUR SYSTEM FOR GPU:
* Install GPU drivers
* Check that you have fulfilled the requirements of Intel® Graphics
Compute Runtime for oneAPI Level Zero and OpenCL™ Driver. Make sure
that you have permissions to access the /dev/dri/renderD and
/dev/dri/card files.
* For HPC use cases, adjust driver defaults by setting udev rules as
described in Set Up User Permissions for Using the Device files for
Intel GPUs.
* If you plan to use the Intel® Distribution for GDB on Linux OS, make
sure to configure debugger access.
**Installation:**
#. Intel® oneAPI Base Toolkit
- Binary Installer Online/Offline: Click on `this
link <https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit-download.html>`__
and follow the instructions.
- Package Manager:
`YUM <https://www.intel.com/content/www/us/en/docs/oneapi/installation-guide-linux/2024-1/yum-dnf-zypper.html#GUID-9209A521-A17E-4253-B6AD-73666AB9E90C>`__,
`DNF <https://www.intel.com/content/www/us/en/docs/oneapi/installation-guide-linux/2024-1/yum-dnf-zypper.html#GUID-9209A521-A17E-4253-B6AD-73666AB9E90C>`__,
`Zypper <https://www.intel.com/content/www/us/en/docs/oneapi/installation-guide-linux/2024-1/yum-dnf-zypper.html#GUID-9209A521-A17E-4253-B6AD-73666AB9E90C>`__,
`APT <https://www.intel.com/content/www/us/en/docs/oneapi/installation-guide-linux/2024-1/yum-dnf-zypper.html#GUID-9209A521-A17E-4253-B6AD-73666AB9E90C>`__
- Docker Container: `Docker
Hub <https://hub.docker.com/r/intel/oneapi-basekit>`__, Docker Pull
Command ``docker pull intel/oneapi-basekit``
#. Install with GUI:
- Download the Intel® oneAPI Base Toolkit installation package using
the offline installer option from
`here <https://www.intel.com/content/www/us/en/developer/tools/oneapi/toolkits.html#base-kit>`__
- Launch the installer with one of the following commands:
1. root: ``sudo sh ./l_[Toolkit Name]Kit_[version].sh``
2. user: ``sh ./l_[Toolkit Name]Kit_[version].sh``
- Follow the installer instructions
- Verify that the toolkit is installed to the correct installation
directory:
1. root: ``/opt/intel/oneapi``
2. user: ``~/intel/oneapi``
#. Install with Command Line:
- Download the Intel® oneAPI Base Toolkit installation package using
the offline installer option from
`here <https://www.intel.com/content/www/us/en/developer/tools/oneapi/toolkits.html#base-kit>`__
- Run the non-interactive installation:
1. root:
``sudo sh ./l_[Toolkit Name]Kit_[version].sh -a --silent --eula accept``
2. user:
``sh ./l_[Toolkit Name]Kit_[version].sh -a --silent --eula accept``
- If you are using a GPU, you need to install the Intel GPU drivers
`here <https://www.intel.com/content/www/us/en/docs/oneapi/installation-guide-linux/2024-1/install-gpu-drivers.html#INSTALL-INTEL-GPU-DRIVERS>`__.
Installing Sassena
-------------------
* Once the virtual machine is created, you can install ``Sassena`` in it.
* The Sassena repository can be accessed via `this
link <https://codebase.helmholtz.cloud/DAPHNE4NFDI/sassena>`__.
* More information about the software can be found in the repository's
``README`` and ``./docs/config-doc-v1.4.0-rev4.pdf`` files.
Compiling Sassena
------------------
To compile Sassena, it is necessary to install the Boost library
(*boost\_[Version_Number]*, e.g. *boost_1.82.0*). There is an *apt
repository* problem on Ubuntu, with an issue raised
`here <https://codebase.helmholtz.cloud/DAPHNE4NFDI/sassena/-/issues/1>`__.
It is recommended to follow the history of that issue for a correct
installation.
* For Linux systems, download the *.tar* file of the Boost library from
*www.boost.org/users/history/version\_[Version_Number]*. You can also find
a history of Boost versions
`here <https://www.boost.org/users/history/>`__.
* Unzip the *.tar* file into the same directory as Sassena.
* In the command line, type:

#. ``cd boost_[Version_Number]``
#. ``mkdir build``
#. ``./bootstrap.sh``
Install additional packages before compiling:

1. ``sudo apt install build-essential libfftw3-dev openmpi-bin libblas-dev
liblapack-dev gfortran libxml2-dev zlib1g-dev libhdf5-dev``
2. Edit the *project-config.jam* file and insert ``using mpi ;`` as the
last line.
3. ``./b2 --build-dir=./build --with-regex --with-mpi --with-thread
--with-serialization --with-system --with-filesystem
--with-program_options``
4. ``cd ../sassena/``
5. ``mkdir compile_gcc``
Sassena Help
------------
1. Type ``sassena --help`` on the terminal for more information on
Sassena’s commands and flags.
**Install with debug mode off:**

1. ``cd compile_gcc/``
2. ``cmake .. -D BOOST_ROOT=/home/boost_1_82_0/stage/``

**Install with debug mode on:**

1. ``cd compile_debug/``
2. ``cmake .. -D BOOST_ROOT=/home/boost_1_82_0/stage/ -D DEBUG=ON``

Main option:

3. ``make -j``

Alternative option:

4. ``cmake --build .``
Profiling with VTune GUI
------------------------
- ``mpirun -np 8 /opt/intel/oneapi/vtune/2024.0/bin64/vtune -r /intel/vtune/projects/sassenaVtune/res cohe2 -collect threading -target-duration-type=medium -data-limit=10000 -trace-mpi --app-working-dir=/home/newcode/intel/vtune/projects/sassenaVtune --/home/newcode/projects/helmholtz/sassena/compile_debug/sassena --limits.computation.threads 2 --config/home/newcode/projects/helmholtz/sassena/coherent/n str coh.xml``
Working Machine
---------------
A computer with the following parameters was used for profiling Sassena
with VTune:
.. raw:: html

   <center>
===================== ==============
Characteristic        ID
===================== ==============
Architecture          *x86_64*
Number of Cores (CPU) *8*
Vendor                *GenuineIntel*
Virtualization Type   Full
===================== ==============
.. raw:: html

   </center>
- Other features of the machine used can be seen in the picture
below:

|image12|

Results
-------
- Coherent Case:
+-----------+-----------+-----------+-----------+-----------+-------------+
|           | Ranks 0   | Ranks 1   | Ranks 3   | Ranks 5   | Effective   |
|           | and 6     | and 2     | and 4     | and 7     | CPU         |
|           |           |           |           |           | Utilization |
+===========+===========+===========+===========+===========+=============+
| MPI alone | |image13| | |image14| | |image15| | |image16| | |image17|   |
| with 8    |           |           |           |           |             |
| processors|           |           |           |           |             |
+-----------+-----------+-----------+-----------+-----------+-------------+

+-----------+-----------+-----------+-----------+-----------+-------------+
|           | Ranks 0   | Ranks 1   | Ranks 2   | Ranks 5   | Effective   |
|           | and 4     | and 3     | and 6     | and 7     | CPU         |
|           |           |           |           |           | Utilization |
+===========+===========+===========+===========+===========+=============+
| MPI +     | |image18| | |image19| | |image20| | |image21| | |image22|   |
| Threading |           |           |           |           |             |
| with 8    |           |           |           |           |             |
| processors|           |           |           |           |             |
+-----------+-----------+-----------+-----------+-----------+-------------+
- Incoherent Case:
+-----------+-----------+-----------+-----------+-----------+-------------+
|           | Ranks 0   | Ranks 1   | Ranks 3   | Ranks 5   | Effective   |
|           | and 6     | and 2     | and 4     | and 7     | CPU         |
|           |           |           |           |           | Utilization |
+===========+===========+===========+===========+===========+=============+
| MPI alone | |image23| | |image24| | |image25| | |image26| | |image27|   |
| with 8    |           |           |           |           |             |
| processors|           |           |           |           |             |
+-----------+-----------+-----------+-----------+-----------+-------------+

+-----------+-----------+-----------+-----------+-----------+-------------+
|           | Ranks 0   | Ranks 2   | Ranks 3   | Ranks 3   | Effective   |
|           | and 4     | and 5     | and 6     | and 7     | CPU         |
|           |           |           |           |           | Utilization |
+===========+===========+===========+===========+===========+=============+
| MPI +     | |image28| | |image29| | |image30| | |image31| | |image32|   |
| Threading |           |           |           |           |             |
| with 8    |           |           |           |           |             |
| processors|           |           |           |           |             |
+-----------+-----------+-----------+-----------+-----------+-------------+
.. |image1| image:: img/virtual_box/virtual_box_01.png
.. |image2| image:: img/virtual_box/virtual_box_02.png
.. |image3| image:: img/virtual_box/virtual_box_03.png
========
File Map
========
The main directory contains a few subdirectories which provide the structural
framework for code maintenance and compilation. It usually contains:
* build.env
* build-dev
* cmake
* CMakeLists.txt
* tools
* vendor
build.env
--------------
build.env contains sample bash scripts which can be used to initialize the
cmake build environment. They are named after the machine for which they are
written. These files are highly system dependent and not necessarily
accurate, since systems are continuously upgraded. However, they can be used
to bootstrap the compilation environment more efficiently.
build-dev
--------------
cmake
-----

The cmake directory contains the instruction set for
cmake. The main entry point for cmake (CMakeLists.txt) resides in the root
directory and refers to files within the cmake subdirectory for further
instructions.
include and src
---------------
- scatter_devices
The main directory in the ``src/core`` folder is
dedicated to hold the single file entry points for any binary executables. They
should only take charge of initializing the software environment and executing
the various sections of the modules. The entry point for the software sassena is
``sassena.cpp`` in the ``src/app`` folder. Additionally every module includes the
single file set ``src/core/common.hpp`` and ``src/core/common.cpp``, which provides a mechanism
to set type information across all modules.
* MATH. The MATH module provides functions to implement mathematical routines (element-wise squaring of an array,
**************
For Developers
**************
.. toctree::
   :maxdepth: 2
   :caption: Contents

   using
   configuring
   file_map
   ci
Using Sassena
=============

Here we explain how to use Sassena.
Command line options
--------------------
Sassena is configured by an XML configuration file.
By default, Sassena will look for a file ``scatter.xml``
in the current directory. Some settings of the
configuration file can be overridden by command line
options.
``--help``
   show help message
``--config=FILENAME``
   xml configuration file name (default: ./scatter.xml)
``--sample.structure.file=FILENAME``
   Structure file name (default: ./sample.pdb)
``--sample.structure.format=FORMAT``
   Structure file format (default: pdb)
``--stager.file=FILENAME``
   Trajectory file name (default: ./dump.dcd)
``--stager.format=FORMAT``
   Trajectory file format (default: dcd)
``--scattering.signal.file=FILENAME``
   Output file name (default: ./signal.h5)
``--stager.target=SELECTION``
   Atom selection producing the signal (default: system)
``--stager.dump=BOOL``
   Do/Don't dump the postprocessed coordinates to a file (default: false)
``--limits.computation.threads=NUMBER``
   Number of threads per process (default: 1)
stager
------
.. note::

   ====== ====== ========= ======== =========================
   stager
   ====== ====== ========= ======== =========================
   name   type   instances default  allowed
   target string 0..1      system   any valid selection
   mode   string 0..1      frames   frames, atoms
   dump   bool   0..1      false    true, false
   file   string 0..1      dump.dcd any valid filename string
   format string 0..1      dcd      dcd
   ====== ====== ========= ======== =========================
This section mainly affects the staging procedure. However, the
scattering type (all/self) may enforce a specific staging mode. The
separation of staging into its own section allows future
extension of the software towards other uses and allows the use of the
data staging modes without analysis for distinct purposes (the tool
s_stager simply stages data without analysis), e.g. parallel reading and
processing of the trajectory data with a subsequent parallel write, or
the inversion of the trajectory data layout (atoms <-> frames).
stager.target
-------------
This section allows the definition of a target selection. This enables
the user to compute the scattering or perform an analysis on a subgroup
of atoms without the need of producing a reduced trajectory. The default
is to include all atoms (system). The selection has to be defined within
the section sample.selections.
.. code-block:: xml

   <target>SelectedAtoms</target>
declares that only atoms of the selection with name “SelectedAtoms” are
considered when computing the scattering diagram.
stager.mode
-----------
Mode can be either “atoms” or “frames”. The mode is usually enforced by
the specific analysis (thus overwritten). However, data processing which
only operates on the trajectory data (like the tool s_stager), requires
the specification of the proper staging mode. Mode “frames” distributes
complete frames among the available nodes, while “atoms” assigns atoms
to nodes.
.. code-block:: xml

   <mode>frames</mode>
declares that the data will be staged by frames, unless the analysis
enforces a specific staging mode.
stager.dump
-----------
The in-memory trajectory data can be written to a new file. This allows
the use of the sassena package to extract and post-process trajectory
data efficiently. If the target only specifies a sub-selection of atoms
in the system, then the written trajectory only contains those atoms.
.. code-block:: xml

   <dump>true</dump>
   <file>dump.dcd</file>
   <format>dcd</format>
writes the in-memory trajectory data to the file dump.dcd in DCD format.
If the staging mode is “atoms” the trajectory is in effect transposed,
which means that the first frame of the new trajectory contains the
positions of the first atom, the second frame contains all positions of
the second atom, and so on.
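The layout inversion described above can be illustrated with a short Python
sketch (hypothetical code, not Sassena's implementation):

```python
def transpose_trajectory(frames):
    """Swap the trajectory layout between frames and atoms.

    `frames` is a list of frames, each a list of per-atom positions.
    The result is a list of per-atom tracks: entry i holds all positions
    of atom i over time, mirroring what a dump in staging mode "atoms"
    produces.
    """
    return [list(atom_track) for atom_track in zip(*frames)]
```

Applying the function twice restores the original frame-wise layout, which is
exactly the sense in which the trajectory is "transposed".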
database
--------
.. note::

   ======== ====== ========= ======= =========================
   database
   ======== ====== ========= ======= =========================
   name     type   instances default allowed
   type     string 0..1      file    file
   file     string 0..1      db.xml  any valid filename string
   format   string 0..1      xml     xml
   ======== ====== ========= ======= =========================
This section contains parameters affecting database selection. A
database selection has 3 attributes: type, type-specific locator,
type-specific format. Currently only the file type is supported.
.. code-block:: xml

   <database>
       <type>file</type>
       <file>db-special.xml</file>
       <format>xml</format>
   </database>
selects database parameters from the XML file db-special.xml.
scattering
----------
.. note::

   ========== ===================== ========= ======= =========
   scattering
   ========== ===================== ========= ======= =========
   name       type                  instances default allowed
   type       string                0..1      all     all, self
   dsp        scattering.dsp        0..1      -       -
   average    scattering.average    0..1      -       -
   vectors    scattering.vectors    0..1      -       -
   background scattering.background 0..1      -       -
   signal     scattering.signal     0..1      -       -
   ========== ===================== ========= ======= =========
The section contains parameters which affect the resulting scattering
signal.
scattering.target
-----------------
Obsolete: this option has been moved to the stager section.
scattering.type
---------------
Two types of scattering functions are currently supported: Coherent
(all) and Incoherent (self) scattering. The calculation schemes for the
two types of scattering are fundamentally different.
.. code-block:: xml

   <type>all</type>
declares that the computed scattering diagram represents coherent
scattering.
scattering.vectors
------------------
.. note::

   scattering.vectors

   ====== ======================== ========= ============= =======================
   name   type                     instances default       allowed
   ====== ======================== ========= ============= =======================
   type   string                   0..1      single        single, scans, file
   single vector                   0..1      x=0, y=0, z=1 valid vector definition
   scans  scattering.vectors.scans 0..1      -             -
   file   string                   0..1      qvectors.txt  any valid filename
   ====== ======================== ========= ============= =======================
.. note::

   scattering.vectors.scans

   ==== ============================= ========= ======= =======
   name type                          instances default allowed
   ==== ============================= ========= ======= =======
   scan scattering.vectors.scans.scan 0..3      -       -
   ==== ============================= ========= ======= =======
.. note::

   scattering.vectors.scans.scan

   ======== ====== ========= ============= =========================
   name     type   instances default       allowed
   ======== ====== ========= ============= =========================
   from     double 0..1      0             any floating point number
   to       double 0..1      1             any floating point number
   points   int    0..1      100           positive int
   exponent double 0..1      1.0           any floating point number
   base     vector 0..1      x=1, y=0, z=0 valid vector definition
   ======== ====== ========= ============= =========================
The scattering diagram contains scattering intensities as a function of
the direction of observation (q vector). This section allows the
definition of q vectors which are used to compute the scattering diagram.
The vectors type defines how the q vectors are generated/supplied.
scattering.vectors.single
~~~~~~~~~~~~~~~~~~~~~~~~~
When using the type single, the scattering diagram contains only one q
vector.
.. code-block:: xml

   <vectors>
       <type>single</type>
       <x>1</x>
       <y>0</y>
       <z>0</z>
   </vectors>
defines the q vector to be
:math:`\vec{q}=(\begin{array}{ccc} 1 & 0 & 0\end{array})`.
scattering.vectors.scans
~~~~~~~~~~~~~~~~~~~~~~~~
The scans type currently allows the definition of up to three scan
elements, which provide ranges of q vectors. The different scan elements
are combined to yield multi-dimensional scans. This may yield a large
number of q vectors. For instance, when defining three scan elements
along the x, y and z axis, respectively, with 100 points each, the total
number of q vectors will be 1000000. Additionally, the exponent element
allows the generation of non-uniform q vector ranges. This is helpful in
cases where some q regions have to be more densely sampled than others.
Q vectors are generated from a range by setting the first and last point
to the supplied from and to values. The other q vector values are
determined by their point assignment :math:`i`:
:math:`q_{i}=q_{from}+(\frac{i}{N})^{E}\cdot(q_{to}-q_{from})`, where
:math:`N` is the number of steps (points minus one) and :math:`E` is the
exponent. The direction of each q vector is determined by the supplied
base vector (normalized).
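As a sketch of how these values might be generated (a hypothetical Python
helper, not Sassena code; it assumes the endpoints land exactly on the
supplied from and to values and that the divisor is the number of steps,
i.e. points minus one):

```python
def scan_qvalues(q_from, q_to, points, exponent=1.0):
    """Generate the scalar q values of one scan element.

    The first and last values land exactly on q_from and q_to; the
    exponent warps the spacing, so that e.g. small q regions can be
    sampled more densely than large ones.
    """
    steps = points - 1
    return [
        q_from + (i / steps) ** exponent * (q_to - q_from)
        for i in range(points)
    ]
```

With 50 points from -2 to 2 this gives 49 uniform steps of 4/49; with an
exponent of 3 the spacing becomes strongly non-uniform towards small q.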
.. code-block:: xml

   <vectors>
       <type>scans</type>
       <scans>
           <scan>
               <x>1</x><y>0</y><z>0</z>
               <from>-2</from>
               <to>2</to>
               <points>50</points>
           </scan>
           <scan>
               <x>0</x><y>1</y><z>0</z>
               <from>-2</from>
               <to>2</to>
               <points>50</points>
           </scan>
       </scans>
   </vectors>
creates a list of q vectors, corresponding to a scattering diagram in
the xy plane. The resulting diagram has a 50 pixel resolution in each
direction with 49 steps of 4/49 from -2 to 2.
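The combination of scan elements into a grid can be sketched as follows (a
hypothetical Python helper illustrating the Cartesian product; Sassena's
internal generation code is not shown):

```python
from itertools import product


def combine_scans(*scans):
    """Combine scan elements into one list of q vectors.

    Each scan is a (base, values) pair: a 3-component base direction and
    the scalar q values along it. The combined list holds one q vector
    per element of the Cartesian product of all scans, so two 50-point
    scans yield 2500 q vectors.
    """
    per_scan = [
        [(v * base[0], v * base[1], v * base[2]) for v in values]
        for base, values in scans
    ]
    return [
        tuple(sum(vec[k] for vec in combo) for k in range(3))
        for combo in product(*per_scan)
    ]
```

This multiplicative growth is why three 100-point scans already produce a
million q vectors.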
.. code-block:: xml

   <vectors>
       <type>scans</type>
       <scans>
           <scan>
               <x>1</x><y>0</y><z>0</z>
               <from>0.001</from>
               <to>1</to>
               <points>100</points>
               <exponent>3</exponent>
           </scan>
       </scans>
   </vectors>
creates a list of q vectors, corresponding to a scattering diagram
along the x axis. The resulting diagram has a 100 pixel resolution with
99 steps. The step size is non-linear, with an exponent of 3.
scattering.vectors.file
~~~~~~~~~~~~~~~~~~~~~~~
The file type allows q vectors to be read from a source text file
(line-by-line, whitespace delimited). This way the user can use their
own algorithms to generate complex sets of q vectors.
.. code-block:: xml

   <vectors>
       <type>file</type>
       <file>qvectors.txt</file>
   </vectors>
creates 5 q vectors based on the contents of the file "qvectors.txt":

::

   1 0 0
   1 1 1
   1 0 1
   0 1 0
   0.3 0.3 0.1
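Reading such a file is straightforward; a minimal Python sketch (a
hypothetical illustration, not the parser Sassena actually uses):

```python
def read_qvectors(path):
    """Read q vectors from a whitespace-delimited text file.

    Each non-empty line holds the three components of one q vector.
    """
    vectors = []
    with open(path) as handle:
        for line in handle:
            parts = line.split()
            if parts:  # skip blank lines
                vectors.append(tuple(float(x) for x in parts))
    return vectors
```
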
scattering.average
------------------
.. note::

   scattering.average

   =========== ============================== ========= ======= =======
   name        type                           instances default allowed
   =========== ============================== ========= ======= =======
   orientation scattering.average.orientation 0..1      -       -
   =========== ============================== ========= ======= =======
.. note::

   scattering.average.orientation

   ========= ======================================== ========= ======= ==================
   name      type                                     instances default allowed
   ========= ======================================== ========= ======= ==================
   type      string                                   0..1      vectors vectors, multipole
   vectors   scattering.average.orientation.vectors   0..1      -       -
   multipole scattering.average.orientation.multipole 0..1      -       -
   ========= ======================================== ========= ======= ==================
.. note::

   scattering.average.orientation.vectors

   ========== ====== ========= ======================== ========================================================
   name       type   instances default                  allowed
   ========== ====== ========= ======================== ========================================================
   type       string 0..1      sphere                   sphere, cylinder, file
   algorithm  string 0..1      boost_uniform_on_sphere  if type=sphere: boost_uniform_on_sphere
   algorithm  string 0..1      boost_uniform_on_sphere  if type=cylinder: boost_uniform_on_sphere, raster_linear
   file       string 0..1      qvector-orientations.txt any valid filename
   seed       int    0..1      0                        positive int
   resolution int    0..1      100                      positive int
   axis       vector 0..1      x=0, y=0, z=1            valid vector definition
   ========== ====== ========= ======================== ========================================================
.. note::

   scattering.average.orientation.multipole

   ========== ====== ========= ============= =======================
   name       type   instances default       allowed
   ========== ====== ========= ============= =======================
   type       string 0..1      sphere        sphere, cylinder
   resolution int    0..1      20            positive int
   axis       vector 0..1      x=0, y=0, z=1 valid vector definition
   ========== ====== ========= ============= =======================
This section defines the type of averaging procedures which are applied
in-place. Currently only orientational averaging is supported. There are
two types of orientational averaging procedures (Monte Carlo,
Multipole). The Monte Carlo scheme performs orientational averaging by
recomputing and integrating the scattering signal for a set of random
directions. This corresponds to stochastic integration, which is
generally good when the intensities do not vary significantly. For highly
crystalline samples, which feature strong Bragg peaks, a large number of
vectors may be necessary to reach convergence ( O(5)-O(6) ). The
Multipole scheme employs a multipole expansion of the exponential terms
involved in the scattering calculation. It performs better for low q
values ( q<<1 ). At large q values, multipole moments of high order
become dominant, which requires the resolution to be increased incrementally.
scattering.average.vectors
~~~~~~~~~~~~~~~~~~~~~~~~~~
The vectors type triggers the Monte Carlo scheme for orientational
averaging and the parsing of the vectors section in
scattering.average.orientation. The q vector orientations are
determined in one of three ways. When using file type, the orientations
are determined from a file, similar to “qvectors.txt” in section
scattering.vectors.file. The other two methods allow spherical or
cylindrical averaging using an internal algorithm to generate random
orientations. The resolution specifies the number of directions which
contribute to the integral of the orientationally averaged scattering
intensity. In each case, the q vector length is taken from the original
q vector. When computing orientationally averaged scattering diagrams
with more than one q vector, the same set of random orientations is used
in each case.
.. code-block:: XML
<average>
<orientation>
<type>vectors</type>
<vectors>
<type>sphere</type>
<algorithm>boost_uniform_on_sphere</algorithm>
<resolution>1000</resolution>
<seed>5</seed>
</vectors>
</orientation>
</average>
triggers isotropic (sphere) orientational averaging, using 1000 random
directions and a seed value of 5 for the random number generator.
.. code-block:: XML
<average>
<orientation>
<type>vectors</type>
<vectors>
<type>cylinder</type>
<algorithm>boost_uniform_on_sphere</algorithm>
<resolution>1000</resolution>
<seed>5</seed>
<axis>
<x>1</x><y>1</y><z>1</z>
</axis>
</vectors>
</orientation>
</average>
triggers anisotropic (cylindrical) orientational averaging, using 1000
random directions and a seed value of 5 for the random number generator.
The cylinder axis points towards :math:`(1, 1, 1)`.
scattering.average.multipole
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
TODO
::
multipole
type = "sphere"
moments
type
resolution = 20
file = "moments.txt"
axis
x = 0
y = 0
z = 1.0
exact
type = "sphere"
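Based on the parameter table at the top of this section and the draft
above, a multipole configuration might look as follows. This is a
hypothetical sketch only: the element names and nesting are inferred,
since this section is still marked TODO.

.. code-block:: XML

   <average>
     <orientation>
       <type>multipole</type>
       <multipole>
         <type>sphere</type>
         <resolution>20</resolution>
       </multipole>
     </orientation>
   </average>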
scattering.dsp
--------------
.. note::
scattering.dsp
====== ====== ========= ============= ============================
name type instances default allowed
====== ====== ========= ============= ============================
type string 0..1 autocorrelate autocorrelate, square, plain
method string 0..1 fftw direct, fftw
====== ====== ========= ============= ============================
In the first stage the software computes complex scattering amplitudes.
For coherent scattering (all) the results on each node are then
communicated and gathered on a selected node. For incoherent scattering
(self) this is not necessary. The aggregated data corresponds to the
full time series of the complex scattering amplitude for the system and
the individual atoms, for coherent and incoherent scattering,
respectively. At this stage, the user may apply a time series analysis
or manipulation to the data. Currently two types of routines are
implemented: one computes the autocorrelation and the other performs
an element-wise complex conjugate multiplication.
scattering.dsp.type.autocorrelate
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
When using the dsp type autocorrelate, the signal is replaced by its
autocorrelation. The autocorrelation can be computed either with a
direct algorithm or with fftw routines. The fftw routines usually
feature superior scaling for large numbers of time steps.
.. code-block:: XML
<dsp>
<type>autocorrelate</type>
<method>fftw</method>
</dsp>
will trigger the autocorrelation of the scattering signal. The fftw
method is used.
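The relationship between the two methods can be illustrated with a short
Python sketch (illustrative only, not Sassena code): a direct O(N²)
autocorrelation and an FFT-based O(N log N) variant with zero padding
produce the same result.

.. code-block:: python

   import numpy as np

   def autocorrelate_direct(z):
       # Direct method: C[tau] = (1/N) sum_t z(t + tau) * conj(z(t))
       n = len(z)
       return np.array([np.sum(z[tau:] * np.conj(z[:n - tau])) / n
                        for tau in range(n)])

   def autocorrelate_fft(z):
       # FFT method: zero padding to length 2N avoids circular wrap-around
       n = len(z)
       f = np.fft.fft(z, 2 * n)
       return np.fft.ifft(f * np.conj(f))[:n] / n

   rng = np.random.default_rng(5)
   z = rng.normal(size=64) + 1j * rng.normal(size=64)
   assert np.allclose(autocorrelate_direct(z), autocorrelate_fft(z))

For long time series the FFT variant dominates, which is why ``fftw`` is
the default method.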
scattering.dsp.type.square
~~~~~~~~~~~~~~~~~~~~~~~~~~
When using the dsp type square, each element of the signal is multiplied
with its complex conjugate. The resulting signal is purely real and
can be regarded as the time series of the zero time delay value of the
signal autocorrelation.
.. code-block:: XML
<dsp>
<type>square</type>
</dsp>
triggers the squaring of the scattering signal. The resulting signal
will be purely real valued.
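A minimal Python sketch of this operation (illustrative, not Sassena
code): element-wise multiplication with the complex conjugate yields
|z|², a purely real, non-negative series.

.. code-block:: python

   import numpy as np

   z = np.array([1 + 2j, -0.5 + 0.5j, 3 - 1j])  # example complex signal
   squared = z * np.conj(z)                      # dsp type "square"

   # Imaginary parts vanish: the result equals |z|^2, purely real.
   assert np.allclose(squared.imag, 0.0)
   assert np.allclose(squared.real, np.abs(z) ** 2)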
scattering.signal
-----------------
.. note::
scattering.signal
==== ====== ========= ========= ===========
name type instances default allowed
==== ====== ========= ========= ===========
file string 0..1 signal.h5 any valid filename
fqt bool 0..1 true true, false
fq0 bool 0..1 true true, false
fq bool 0..1 true true, false
fq2 bool 0..1 true true, false
==== ====== ========= ========= ===========
Each q vector yields a complete time series of the scattering intensity,
with the exact form depending on the settings in the dsp section. For
some scattering calculations the time-dependent information can be
eliminated and replaced by a mean value, i.e. the scattering diagram
becomes a function of only the q vector. In that case the total
time-dependent signal “fqt” may be discarded and only some aspects of
this function preserved. Currently 3 additional values are computed
for each “fqt”: “fq”, which is the total time integral of “fqt”; “fq0”,
which is the zero-time element (for correlation, the zero time delay
element) of “fqt”; and “fq2”, which corresponds to the complex conjugate
multiplication of “fq”. Whether or not these data elements are written
to the final signal file is controlled by activating their
corresponding values. By default each value is written to the
output signal file. The output file is written in the hdf5 format.
.. code-block:: XML
<signal>
<file>mysignal.h5</file>
<fqt>false</fqt>
<fq0>true</fq0>
<fq>true</fq>
<fq2>false</fq2>
</signal>
will write dataset entries for fq0 and fq, but not for fqt and fq2. The
output is stored in hdf5 format in the file with name “mysignal.h5”.
scattering.background
---------------------
.. note::
scattering.background
====== ============================ ========= ======= =========================
name type instances default allowed
====== ============================ ========= ======= =========================
factor double 0..1 0 any floating point number
kappas scattering.background.kappas 0..1 - -
====== ============================ ========= ======= =========================
.. note::
scattering.background.kappas
===== ================================== ========= ======= =======
name type instances default allowed
===== ================================== ========= ======= =======
kappa scattering.background.kappas.kappa 0..1 - -
===== ================================== ========= ======= =======
.. note::
scattering.background.kappas.kappa
========= ====== ========= ======= =========================
name type instances default allowed
========= ====== ========= ======= =========================
selection string 0..1 system any predefined selection
value double 0..1 1.0 any floating point number
========= ====== ========= ======= =========================
Each atom has an assigned atomic scattering length. In the case of x-ray
scattering, the scattering length depends on the q vector length. Since
the scattering from molecular structure data only incorporates atoms
which are explicitly modeled, the final scattering diagram is missing
scattering from the surrounding and the solvent. For small values of q,
the surrounding can be approximated by subtracting an effective
scattering length density of the typical system from the individual
atomic scattering lengths. The correction requires approximating the
excluded volume effect of the particular atom. One of the two major
contributions comes from the size of the particular atom, the other from
the molecular phase it is incorporated in. The database defines excluded
volumes for common atom types. Additionally, the user may scale these
volumes depending on the particular material the atoms are incorporated
in by specifying a kappa value (scaling coefficient) and the respective
atom selection. For instance, an oxygen atom within water may displace
more volume than an average oxygen atom within a protein.
A major idea of this type of correction is that the scattering of a
disordered system, e.g. water, should not produce a scattering intensity
for low q values (q=0). However, the scattering calculation of a finite
box of water will result in a non-zero scattering intensity at low q
values, which is an artefact due to the missing surrounding. The
surrounding can be approximated by offsetting the individual atomic
scattering lengths so that the overall scattering becomes zero at low q
values.
.. code-block:: XML
<background>
<factor>0.005</factor>
<kappas>
<kappa>
<selection>Water</selection>
<value>1.42</value>
</kappa>
</kappas>
</background>
sets the background scattering length density to 0.005 and scales the
volumes for atoms incorporated in the selection “Water” by a factor of
1.42.
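As a rough illustration, assuming the correction takes the form
b_eff = b - factor * kappa * V (the exact expression is defined by the
implementation, and the excluded volume below is a made-up value):

.. code-block:: python

   factor = 0.005   # background scattering length density (<factor>)
   kappa = 1.42     # volume scaling for the "Water" selection (<kappa>)
   b = 5.803        # coherent neutron scattering length of oxygen, in fm
   volume = 9.13    # hypothetical excluded volume from the database

   b_eff = b - factor * kappa * volume
   print(round(b_eff, 4))  # 5.7382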
limits
------
.. note::
limits
============= ==================== ========= ======= =======
name type instances default allowed
============= ==================== ========= ======= =======
stage limits.stage 0..1 - -
signal limits.signal 0..1 - -
computation limits.computation 0..1 - -
services limits.services 0..1 - -
decomposition limits.decomposition 0..1 - -
============= ==================== ========= ======= =======
The parameters specified in the limits section allow the user to adjust
threshold values and performance figures. Threshold values exist to
guarantee that the software does not crash due to resource starvation.
They also protect the compute nodes from abusive configurations.
However, the threshold values might not fit every possible use case, in
which case the user may want to override these values. Some values have
a limited lifetime within the application and/or a limited scope. The
limits section is organized into contexts. Each context has a certain
lifetime during the application, see Figure X for details. Parameters
are usually declared in the context in which they are instantiated.
However, the lifetime of a particular parameter may exceed the lifetime
of the context (e.g. setting the buffer size for coordinates during
staging, which will remain in effect during the computation). The
default values are tuned to allow for a wide range of use cases and
applicability on state-of-the-art cluster designs. When changing the
default values, the user should take care to guarantee that the
available hardware resources match the computational requirements.
limits.stage
------------
.. note::
limits.stage
====== =================== ========= ======= =======
name type instances default allowed
====== =================== ========= ======= =======
memory limits.stage.memory 0..1 - -
====== =================== ========= ======= =======
.. note::
limits.stage.memory
====== ==== ========= ================= ============
name type instances default allowed
====== ==== ========= ================= ============
data int 0..1 524288000 = 500MB positive int
buffer int 0..1 104857600 = 100MB positive int
====== ==== ========= ================= ============
Before any computation is performed, the cartesian coordinates are read
into local memory. This staging of the data is split into two phases.
In the first phase, the first partition reads the trajectory data from
the storage device (disk, network) and stores it in the internal
buffer for the coordinate data (limits.stage.memory.data). When
computing coherent scattering, the data alignment in the trajectory
files coincides with the partitioning scheme, allowing each node to
read the coordinates directly into the local buffer. For incoherent
scattering, the data has to be aligned by atoms, thus requiring a
transposition of the data during the initial read. This requires
additional buffers (limits.stage.memory.buffer). The transposition of
the data is carried out through a collective MPI all-to-all, which
results in a synchronization point. To minimize the number of
synchronization points, the internal buffer is filled completely before
the data gets transposed. The internal buffer has to be large enough
to hold at least one frame of the data. The size of the partition
determines the number of nodes which access the trajectory data in
parallel; increasing the partition size thus results in more
aggressive IO behavior.
In the second phase, the coordinates stored in the first partition are
cloned to all other partitions. This is implemented through the MPI
collective broadcast.
.. code-block:: XML
<stage>
<memory>
<data>600000000</data>
<buffer>50000000</buffer>
</memory>
</stage>
The available memory for the local storage of coordinates is set to
about 600MB. The buffer during data exchange is about 50MB.
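A quick back-of-the-envelope check of these limits (a sketch, assuming
Cartesian coordinates are stored as three 64-bit doubles per atom; the
system size is a made-up example):

.. code-block:: python

   n_atoms = 100_000                      # hypothetical system size
   frame_size = n_atoms * 3 * 8           # x, y, z as doubles: 2.4 MB/frame

   data_limit = 600_000_000               # limits.stage.memory.data above
   frames_in_buffer = data_limit // frame_size
   print(frame_size, frames_in_buffer)    # 2400000 250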
limits.signal
-------------
.. note::
limits.signal
========= ==== ========= ======= ============
name type instances default allowed
========= ==== ========= ======= ============
chunksize int 0..1 10000 positive int
========= ==== ========= ======= ============
The output file is written in hdf5 format. Parameters which are related
to the content of the signal file are given in scattering.signal. The
parameters in this section (limits.signal) determine “how” the signal
file is written. This can affect overall performance. The default
parameters should yield good performance in most cases. Currently, only
the chunksize parameter is adjustable. It determines the minimum size of
a data element existing on the disk. Please refer to the HDF5 manual
for details on chunks. The default value is 10000 (this corresponds to
complex value entries, i.e. 16 bytes each). For the “fqt” signal, the
chunks are aligned along the time dimension. If the time dimension has
significantly less than 10000 entries, the chunks contain more than one
q vector. In general, large chunksizes are preferred for large datasets,
because each chunk element has to be managed within the HDF5 file.
Millions of chunks may slow down the reading and writing of the data
considerably. For acceptable performance, the number of chunk elements
should be kept smaller than 50000. With a default chunksize of
10000 (160 kB) this corresponds to a file size of 8GB. If larger
datasets have to be stored, the chunksize should be increased.
.. code-block:: XML
<signal>
<chunksize>20000</chunksize>
</signal>
doubles the chunksize.
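The sizing guidance above can be verified with a quick calculation (16
bytes per complex-valued entry, as stated):

.. code-block:: python

   chunksize = 10_000            # default chunk size, in complex entries
   chunk_bytes = chunksize * 16  # 160 kB per chunk
   max_chunks = 50_000           # recommended upper bound on chunk count

   max_file_size = chunk_bytes * max_chunks
   print(max_file_size)          # 8000000000 bytes, i.e. the 8GB above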
limits.computation
------------------
.. note::
limits.computation
======= ========================= ========= ======= ============
name type instances default allowed
======= ========================= ========= ======= ============
memory limits.computation.memory 0..1 - -
threads int 0..1 1 positive int
======= ========================= ========= ======= ============
.. note::
limits.computation.memory
=============== ==== ========= ================= ============
name type instances default allowed
=============== ==== ========= ================= ============
signal_buffer int 0..1 104857600 = 100MB positive int
result_buffer int 0..1 104857600 = 100MB positive int
exchange_buffer int 0..1 104857600 = 100MB positive int
alignpad_buffer int 0..1 209715200 = 200MB positive int
scale int 0..1 1 positive int
=============== ==== ========= ================= ============
During the calculation of the incoherent (self) and coherent (all)
scattering, the total time signal for each q vector orientation has to
be aggregated on one node. Also, each node has to keep a local cache of
the total time signal to avoid unnecessary communication. This can
consume a considerable amount of computer memory, which might lead to
resource starvation. The parameters in section computation.memory
protect the user from accidental memory overconsumption. The current
defaults allow for storage of up to 4 million time steps (frames). If
longer trajectories have to be examined and the necessary hardware
requirements are met, adjusting the parameters under computation.memory
allows for an arbitrarily long time signal. The software also has
experimental support for threads. By default only one worker thread per
MPI node is active. Setting computation.threads to higher values allows
the use of multiple threads to utilize local parallelism. To avoid
synchronization between the threads, each thread has its own memory
space. Thus the use of threads may be memory limited. The utility of
threads is scoped to averaging procedures, i.e. it enables parallelism
for the computation of orientational averages. Not all buffers are used
at the same time and by all modes. Sassena has a memory check routine
which anticipates the memory use and provides guidance on the
recommended limits. It is thus recommended to only increase the limits
if the software asks for it. The convenience parameter “scale” provides
a means to simply increase the memory limits by the specified factor,
which is useful when underallocating compute nodes for the sake of
providing more memory per MPI process (e.g. a 12-core compute node may
have 12GB RAM. Allocating 12 MPI processes would provide a maximum of
1GB memory to each process. Allocating only 3 MPI processes allows for
4GB per process; the sassena software memory limits are then simply
increased by setting “scale” to 4.)
.. code-block:: XML
<computation>
<memory>
<signal_buffer>200000000</signal_buffer>
<result_buffer>200000000</result_buffer>
<exchange_buffer>200000000</exchange_buffer>
<alignpad_buffer>200000000</alignpad_buffer>
<scale>4</scale>
</memory>
<threads>4</threads>
</computation>
increases the internal memory threshold for each buffer to ~800MB. Also
the number of worker threads is set to 4.
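The underallocation example from the paragraph above, as arithmetic:

.. code-block:: python

   node_ram = 12      # GB per 12-core compute node (example from the text)
   cores = 12

   ram_full = node_ram / cores   # 12 MPI processes: 1 GB each
   ram_under = node_ram / 3      # only 3 MPI processes: 4 GB each
   scale = int(ram_under / ram_full)
   print(scale)                  # 4, the value to set for <scale>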
limits.services
---------------
.. note::
limits.services
====== ====================== ========= ======= =======
name type instances default allowed
====== ====================== ========= ======= =======
signal limits.services.signal 0..1 - -
====== ====================== ========= ======= =======
.. note::
limits.services.signal
====== ============================= ========= ======= =======
name type instances default allowed
====== ============================= ========= ======= =======
memory limits.services.signal.memory 0..1 - -
times limits.services.signal.times 0..1 - -
====== ============================= ========= ======= =======
.. note::
limits.services.signal.memory
====== ==== ========= ========= ============
name type instances default allowed
====== ==== ========= ========= ============
server int 0..1 104857600 positive int
client int 0..1 10485760 positive int
====== ==== ========= ========= ============
.. note::
limits.services.signal.times
=========== ==== ========= ======= ============
name type instances default allowed
=========== ==== ========= ======= ============
serverflush int 0..1 600 positive int
clientflush int 0..1 600 positive int
=========== ==== ========= ======= ============
To avoid synchronization points between the partitions when writing data
to the output file and when reporting progress to the console, the
necessary services have been implemented as separate network protocols.
When the software starts up, it initializes the services and starts the
corresponding threads. The server threads are located on MPI node rank
0. Each MPI node then sends signal output and progress information via
these interfaces. The interfaces bind to TCP over Ethernet. This is
acceptable, since the amount of progress information and the final
signal data fits well within the capacity of gigabit Ethernet. The
effect of network latencies is reduced by the implementation of signal
output buffers, which can be adjusted by the parameters in
limits.services.signal.memory. The parameters in
limits.services.signal.times allow specifying time intervals in
seconds at which the signal data should be communicated and written to
disk. Using the MPI layer for progress and signal output data would
allow better performance; however, it would require dedicating MPI
nodes (threading support for MPI is still not available on all
machines).
.. code-block:: XML
<services>
<signal>
<memory>
<server>30000000</server>
<client>2000000</client>
</memory>
<times>
<serverflush>300</serverflush>
<clientflush>300</clientflush>
</times>
</signal>
</services>
sets the signal output buffer sizes to about 30MB for the server and 2MB
for the clients. The timeouts for flushing data to disk or to the server
are set to 300 seconds for the server and the client, respectively.
limits.decomposition
--------------------
The efficient utilization of the parallel environment requires
partitioning the problem based on some metric. For coherent (all)
scattering the best partitioning strategy is frame based, for incoherent
(self) scattering it is atom based. However, the number of frames and
atoms may limit the scalability. In that case more than one q vector may
be processed in parallel. The software uses a deterministic heuristic
partitioning scheme to calculate absolute utilization factors. It will
find the partitioning with the globally best utilization. When more than
one best solution (utilization) is available, the partitioning algorithm
favors large partitions. The user may want to manually specify the
partition size by setting limits.decomposition.partitions.automatic to
false and setting the partition size with
limits.decomposition.partitions.size. This may be necessary when a
particular partitioning is favored, e.g. to match the number of cores
per machine with the partition size, thus eliminating inter-node
communication. The algorithm does not take this into account. Another
reason to fix the partition size is to achieve a specific IO
performance, since the parallel bandwidth (number of nodes which read
the trajectory) during the staging is determined by the partition size.
.. code-block:: XML
<decomposition>
<partitions>
<automatic>false</automatic>
<size>8</size>
</partitions>
</decomposition>
disables automatic decomposition and sets the partition size to 8. Given
that at least as many q vectors are to be computed, this yields a total
of 5 active partitions when using 40 nodes.
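The utilization idea can be sketched as follows (a hypothetical
illustration, not the actual partitioning algorithm):

.. code-block:: python

   def utilization(n_nodes, partition_size):
       # Fraction of nodes kept busy when grouped into equal partitions
       return (n_nodes // partition_size) * partition_size / n_nodes

   # 40 nodes with a fixed partition size of 8: all nodes used,
   # 5 partitions working on q vectors in parallel.
   assert 40 // 8 == 5
   assert utilization(40, 8) == 1.0
   # A partition size of 12 would leave 4 of the 40 nodes idle.
   assert utilization(40, 12) == 0.9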
debug
-----
::
debug.timer = false // this adds a log message when a timer is started/stopped
debug.barriers = false // this de-/activates collective barriers before each collective operation, this way all nodes are synchronized before the communication takes place. This is an important step towards analysis of timing.
debug.monitor.update = true
debug.iowrite.write = true
debug.iowrite.buffer = true
debug.print.orientations = false
*******************
Config file options
*******************
.. note::
root
========== ========== ========= ======= =======
name type instances default allowed
========== ========== ========= ======= =======
sample sample 0..1 - -
stager stager 0..1 - -
scattering scattering 0..1 - -
limits limits 0..1 - -
debug debug 0..1 - -
========== ========== ========= ======= =======
Some sections may require an element of type vector:
.. note::
vector
===== ====== ========= ======= =========================
name type instances default allowed
===== ====== ========= ======= =========================
x double 0..1 0 any floating point number
y double 0..1 0 any floating point number
z double 0..1 0 any floating point number
===== ====== ========= ======= =========================
The Sassena configuration file enables the user to set mandatory and
optional parameters which are used during execution of the software.
Each optional configuration parameter is supplied with a default value.
The configuration file is based on the XML format. The configuration
parameters are organized into a tree hierarchy which maps a class
inheritance diagram.
The configuration file is parsed for appearances of certain key
sections. Sections which do not map to a valid entry are currently
ignored. *The user should be aware that misspelling a section which
is not mandatory may result in that section exercising its default
behavior.*
Some entries trigger the parsing of other sections. If that is the case,
the user has to make sure that these sections are properly defined.
The configuration file is structured into 6 main sections.
Some sections may contain multiple sections with identical names. In
this case the order by which they appear in the file is preserved.
The six sections have the following distinct features:
sample
======
contains sections which modify the available data. This includes
coordinates and selection names.
.. toctree::
:maxdepth: 1
config-1-sample
stager
======
contains sections which determine the staging mode of the trajectory
data.
.. toctree::
:maxdepth: 1
config-2-stager
database
========
contains sections relevant for selecting the correct database.
.. toctree::
:maxdepth: 1
config-3-database
scattering
==========
contains sections which modify the retrieved signal, but don’t affect
the available data.
.. toctree::
:maxdepth: 1
config-4-scattering
limits
======
contains sections which neither modify the retrieved signal nor the
available data, but impact the performance and computational aspects
of how the calculation is carried out.
.. toctree::
:maxdepth: 1
config-5-limits
debug
=====
contains sections which provide control switches and debug
information for different aspects of the software. This section is
intended to allow debugging without the need for recompilation. It
may affect the available data, the retrieved signal and the achieved
performance.
.. toctree::
:maxdepth: 1
config-6-debug
For Users
============
**********
User Guide
**********
This section is designed to serve as a quick guide for users (who aren't
developers) to run Sassena.
.. toctree::
:maxdepth: 2
:caption: Contents
:glob:
using
installation
running
cmd-options
config-options
Installation
------------
AppImage
~~~~~~~~
An AppImage is provided that can run on Linux (x64, without CUDA support) out of
the box when executed (no installation needed). To use it, download the latest AppImage
from the `releases page
<https://codebase.helmholtz.cloud/DAPHNE4NFDI/sassena/-/releases>`_, mark it as
an executable and run it from the command-line:
.. code-block:: bash
chmod +x Sassena.AppImage
./Sassena.AppImage # --config scatter.xml etc.
# or to use mpi:
mpirun -n 16 ./Sassena.AppImage --config scatter.xml
Compiling from Source
~~~~~~~~~~~~~~~~~~~~~
If you want to target a specific instruction set (e.g. ARM) or want to use the
CUDA implementation, compiling from source is needed.
VSCode
""""""
The easiest way to compile Sassena is by using VSCode's `Dev Containers
<https://code.visualstudio.com/docs/devcontainers/containers>`_. This will
automatically create a Docker container, install dependencies, configure the build and compile Sassena
inside that container. Sassena must then be run from inside this container. This
means that no additional dependencies (e.g. Boost) need to be installed on the
system as they are all encapsulated inside the container.
To use VSCode's Dev Containers, you need to have *Docker* installed. For Linux,
this means Docker Engine and Docker Compose. Please follow the installation
guide for Docker `here <https://docs.docker.com/engine/install/>`_ and note that
you must add your user to the `docker` group so that VSCode can create and run
containers without sudo.
If you want to also build with CUDA support, you must first make sure that your system has the
NVIDIA drivers installed. Additionally, running CUDA applications inside of a
Dev Container requires the installation of the `NVIDIA Container Toolkit
<https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html>`_.
Docker must be configured to use the NVIDIA Container toolkit, so make sure to
follow the `configuration guide
<https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html#configuring-docker>`_
as well.
You can then clone the project. Make sure to update the submodules as Sassena
relies upon the header-only library `taskflow <https://taskflow.github.io/>`_, which is distributed as a
submodule in the repo.
.. code-block:: bash
git clone https://codebase.helmholtz.cloud/DAPHNE4NFDI/sassena.git
cd sassena
git submodule update --init --recursive
Once you have the required dependencies, you can open the folder in VSCode and
the editor will then ask whether you want to open the project in a container.
You can select either the CPU or the CPU+CUDA container depending on your needs.
If you don't get any prompt, you can follow the additional instructions in the
`Dev Container Documentation
<https://code.visualstudio.com/docs/devcontainers/attach-container>`_.
Once VSCode has built the container, it should ask you to configure the project,
which corresponds to the CMake configure step. If you only intend to run Sassena
and not develop further, pick either "Release (CPU)" or "Release (CPU+CUDA)".
You can then build the project as usual using the build button or by pressing F7. Once it has
been built, you can open a terminal in VSCode (Terminal > New Terminal) and a
shell will appear pointing to the current build directory, where the sassena
binaries should be visible.
Other Editors
"""""""""""""
For optimal performance (e.g. using target-specific registers, CUDA), Sassena
can be compiled on the target machine, e.g. on Ubuntu 24.04 without using any
containers.
.. note::
If you want to use the CUDA backend, you will need the NVIDIA drivers (not
the open-source `nouveau`) as well as the `CUDA Toolkit
<https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html>`_ as
it contains not only the runtime libraries, but also libraries such as
`cuFFT`, which the Sassena CUDA backend relies upon.
The process is in essence the same as building with VSCode, though it requires
manually configuring with CMake and compiling:
.. code-block:: bash
sudo apt install build-essential cmake ninja-build git \
openmpi-bin libblas-dev liblapack-dev \
libxml2-dev libhdf5-dev zlib1g-dev libfftw3-dev \
libboost-regex-dev libboost-mpi-dev \
libboost-thread-dev libboost-serialization-dev libboost-system-dev \
libboost-filesystem-dev libboost-program-options-dev
git clone https://codebase.helmholtz.cloud/DAPHNE4NFDI/sassena.git
cd sassena
git submodule update --init --recursive
cmake --preset=rel-cpu        # or rel-cuda if you want to build with CUDA
cmake --build build-rel-cpu   # the preset's build directory
Clusters
""""""""
If you are compiling Sassena on a cluster and are not able to install packages
such as Boost, you can still compile them locally and include them in the
Sassena build.
During the CMake configuration process, Sassena will look for the following
libraries:
* LibXml2
* ZLIB
* HDF5
* LAPACK
* spdlog
* FFTW3
* CudaToolkit (if using CUDA)
* Boost
* MPI
You can compile these separately and pass their locations to CMake. For example,
for Boost, this means setting the ``BOOST_ROOT`` location:
.. code-block:: bash
cmake --preset=dev -DBOOST_ROOT=/opt/boost .
Please note that there is no standardised naming convention for these CMake
options. For each library, you must consult the relevant ``FindX`` page in the
CMake documentation, where ``X`` is the name of the library. For example, for
``HDF5``, the relevant CMake module is called `FindHDF5
<https://cmake.org/cmake/help/latest/module/FindHDF5.html>`_.
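As a more general alternative to per-library variables, CMake's
``CMAKE_PREFIX_PATH`` accepts a semicolon-separated list of installation
prefixes, which most ``find_package`` modules search. The paths below
are placeholders for your cluster setup:

.. code-block:: bash

   cmake --preset=dev -DCMAKE_PREFIX_PATH="/opt/boost;/opt/hdf5;/opt/fftw3" .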
CMake Presets
"""""""""""""
Sassena defines several `CMake presets
<https://cmake.org/cmake/help/latest/manual/cmake-presets.7.html>`_ which save
time over configuring the CMake build manually, as they set default values for the
most common options. All of these values can be viewed or changed in
``CMakePresets.json``, though you can still override them using the command-line
as usual:
.. code-block:: bash
cmake --preset=rel-cpu -DSASS_BUILD_DOCS=ON .
This would use the Release (CPU) preset but also build documentation, which is
usually disabled for a release build.
Depending on your use-case, you may wish to use a different preset, but we
recommend you stick with ``Release (CPU)`` for most cases:
.. list-table:: Sassena's CMake Presets
:widths: 25 25 50
:header-rows: 1
* - Preset Display Name
- Preset CMake Parameter
- Explanation
* - Dev (CPU)
- ``--preset=dev``
- This is the debug build for Sassena with CPU support only. It will have
significantly worse runtime performance than a release build but will
allow you to debug the program with a debugger like ``gdb``. It also
enables the developer mode, so tests will be built.
* - Dev (CUDA)
- ``--preset=dev-cuda``
- This is also a debug build for Sassena except it also builds the project
with CUDA support. This means you must have a CUDA-compatible GPU, the
NVIDIA Drivers and the CUDA Toolkit installed.
* - Release (CPU)
- ``--preset=rel-cpu``
- This is a release build of Sassena (i.e. optimisations are enabled) and
supports CPU only.
* - Release (CUDA)
- ``--preset=rel-cuda``
- This is a release build of Sassena (i.e. optimisations are enabled) and
supports CUDA, so the same precautions must be taken as with ``dev-cuda``.
Building Unit Tests
"""""""""""""""""""
Sassena defines a CMake option ``USE_DEVELOPER_MODE``, which is automatically
enabled if you use either of the ``dev`` presets. If this option is enabled, then unit
tests are built.
The unit tests run as a testing step on the CI, but they are also exposed as a
CMake target if you wish to run them locally. Depending on the
generator (e.g. Ninja, Make), you can execute
.. code-block:: bash
ninja test
to run the tests. If you are using a release build and still want to run the
tests, you can pass the option ``-DUSE_DEVELOPER_MODE=ON`` to CMake in the
command-line as described in the CMake Presets section.
Building Documentation
""""""""""""""""""""""
If you want to build the documentation locally, then you must have Sphinx and
Doxygen installed, as well as ``doxysphinx`` and ``sphinx_rtd_theme`` (both of
which are written in Python and are available on ``pip``). All of these
dependencies are already included in the VSCode Dev Container.
You must then enable the ``SASS_BUILD_DOCS`` option in CMake, for example by
passing ``-DSASS_BUILD_DOCS=ON`` when configuring the project. If this
succeeds, the ``docs-sass`` target will be added to the build, which can be
built by running
.. code-block:: bash
ninja docs-sass
if you are using the Ninja build generator (which is the default).
The ``.html`` files will be built in the ``docs/sphinx`` directory inside the
build folder.
Other Build Generators
""""""""""""""""""""""
CMake does not build the project itself; instead, it generates build files for
a build system such as Make, Visual Studio or Xcode. Sassena uses Ninja by
default as it is simple and performant. However, if you don't want to use
Ninja, you can override the default generator, e.g.
.. code-block:: bash
cmake --preset=rel-cpu -G "Unix Makefiles" .
will tell CMake to output Makefiles instead. You can then run ``make`` in the
build directory as usual.
Running
-------
Test Project
~~~~~~~~~~~~
Once you have built Sassena, you may wish to try it on a sample project:
.. code-block:: bash
cd tests/waterdyn4
../../build-rel-cpu/bin/sassena --config scatter.xml
The program should terminate with output like:
.. code-block:: bash
[11:38:49 Rank 0] [info] Total runtime (s): 77.552289999999999
[11:38:49 Rank 0] [info] Successfully finished... Have a nice day!
and will output to ``signal.h5`` by default.
Running with CUDA
~~~~~~~~~~~~~~~~~
Sassena has a CUDA backend for self scattering, though you can only use it if
you have built the project from source with the ``Release (CUDA)`` or
``Dev (CUDA)`` CMake preset. You must direct Sassena to use it in the
``scatter.xml`` file by adding the following lines inside the ``<scattering>``
tag:
.. code-block:: xml
<device>
<type>cuda</type>
<atoms>100</atoms>
</device>
The value in the ``<type>`` tag can be either ``cpu`` or ``cuda``, depending on
the desired backend. When using CUDA, the ``<atoms>`` tag controls how many
atoms are calculated in parallel. Note that this is limited by the memory
capacity of the GPU, and the run will fail if the value is too high. We
recommend experimenting to find the highest value that runs without exhausting
GPU memory.
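If you script your runs, the ``<device>`` block can also be added to a
configuration programmatically. The following is a minimal sketch using
Python's standard-library ``xml.etree.ElementTree``; the stand-in document
here is hypothetical, and a real ``scatter.xml`` contains many more tags:

.. code-block:: python

    import xml.etree.ElementTree as ET

    # Hypothetical stand-in for an existing scatter.xml.
    config = ET.fromstring("<root><scattering></scattering></root>")

    # Build the <device> block and attach it inside <scattering>.
    scattering = config.find("scattering")
    device = ET.SubElement(scattering, "device")
    ET.SubElement(device, "type").text = "cuda"
    ET.SubElement(device, "atoms").text = "100"  # atoms computed in parallel

    xml_text = ET.tostring(config, encoding="unicode")
    print(xml_text)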
Sassena can still be run with ``mpirun`` in the usual way when using the CUDA
backend as only one process will claim the GPU.
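When choosing a value for ``<atoms>``, a rough back-of-the-envelope estimate
can give you a starting point. The sketch below is purely illustrative: the
bytes-per-value figure and the safety margin are assumptions, not Sassena's
actual CUDA memory layout, so treat the result only as a first guess before
experimenting:

.. code-block:: python

    def estimate_atom_batch(gpu_mem_bytes, n_frames,
                            bytes_per_value=16, safety=0.8):
        """Guess how many atoms fit in GPU memory at once.

        Assumes (illustratively) that each atom stores one double-precision
        complex value (16 bytes) per trajectory frame, and reserves a safety
        margin for other allocations on the device.
        """
        per_atom = n_frames * bytes_per_value
        return int(gpu_mem_bytes * safety // per_atom)

    # Example: an 8 GiB GPU and a 100,000-frame trajectory.
    batch = estimate_atom_batch(8 * 1024**3, 100_000)
    print(batch)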
Using Sassena
=============
Here we explain how to use Sassena.