Feature: Addition of other compilers, vectorization, OpenMP and hybrid parallelization
Summary:
- Sassena provides MPI for distributed-memory parallelization and threads for shared-memory parallelization. However, significant scalability is not achieved through thread parallelism alone, so it is useful to add another layer of shared-memory parallelism to enable hybrid parallelism within Sassena. This branch uses OpenMP for that purpose.
- Sassena does not take the memory architecture into account when it vectorizes.
- New build options are added so that the user can choose the compiler; the current Sassena always chooses the default MPI compiler.
Problem 1:
Detailed description: Current state of parallelism (n MPI processes, each with n threads):
- MPI Process 1
- Thread 1
- Thread 2 ...
- Thread n
- MPI Process 2
- Thread 1
- Thread 2 ...
- Thread n ...
- MPI Process n
- Thread 1
- Thread 2 ...
- Thread n
Problem with the current state: It is expected to be n*n (n^2) times faster. However, it is only n times faster.
Reason: If we use n MPI processes with 1 thread each, as shown below,
- MPI Process 1
- Thread 1
- MPI Process 2
- Thread 1 ...
- MPI Process n
- Thread 1
then it is n times faster.
However, if we use 1 MPI process and n threads, as shown below,
- MPI Process 1
- Thread 1
- Thread 2 ...
- Thread n
then it does not speed up at all, apart from special cases such as a single trajectory.
Conclusion: Only the MPI parallelization is effective.
Solution: This feature branch adds OpenMP as another layer of shared-memory threading to solve this problem:
Expected final state: (n MPI processes, each with 1 worker thread and n OpenMP threads)
- MPI Process 1
- OpenMP thread 1
- OpenMP thread 2 ...
- OpenMP thread n
- MPI Process 2
- OpenMP thread 1
- OpenMP thread 2 ...
- OpenMP thread n ...
- MPI Process n
- OpenMP thread 1
- OpenMP thread 2 ...
- OpenMP thread n
With this configuration, the calculation is expected to be n*n times faster. A minimal sketch of the hybrid setup is shown below.
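The following is a minimal sketch of such a hybrid MPI + OpenMP layout, assuming an MPI implementation with MPI_THREAD_FUNNELED support; the problem size, loop body, and variable names are placeholders and do not correspond to Sassena's actual code.

```cpp
// Hypothetical illustration of the hybrid MPI + OpenMP layout; not Sassena code.
#include <mpi.h>
#include <omp.h>
#include <cstdio>
#include <vector>

int main(int argc, char** argv) {
    // Request funneled threading: only the main thread makes MPI calls,
    // while OpenMP threads do the work inside each MPI process.
    int provided = 0;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

    int rank = 0, size = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // Each MPI process owns a slice of the work (e.g. a subset of q-vectors);
    // inside that slice, OpenMP threads split the per-frame/per-atom loop.
    const int local_work = 1000;                 // placeholder problem size
    std::vector<double> partial(local_work, 1.0);

    double local_sum = 0.0;
    #pragma omp parallel for reduction(+:local_sum)
    for (int i = 0; i < local_work; ++i) {
        local_sum += partial[i];                 // placeholder computation
    }

    // Combine the per-process results across MPI ranks.
    double global_sum = 0.0;
    MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        std::printf("ranks=%d threads=%d sum=%f\n",
                    size, omp_get_max_threads(), global_sum);

    MPI_Finalize();
    return 0;
}
```

Assuming a typical toolchain, such a program would be built with the MPI compiler wrapper plus the compiler's OpenMP flag (e.g. mpicxx -fopenmp, or -qopenmp with the Intel compilers) and run with n MPI ranks and OMP_NUM_THREADS=n.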
Implementation steps:
- Identified the hotspots using the Intel performance tools.
- For the "all" (coherent) type of calculation, the hotspot loop was parallelized with OpenMP (see the sketch after this list). For the "self" (incoherent) type of calculation, no good parallelization strategy was found.
- Added build options for the Intel compilers and compiler flags to make vectorization more efficient.
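As an illustration of the kind of loop parallelization applied in the coherent case, here is a sketch that computes a scattering amplitude A(q) = sum_j b_j * exp(i q.r_j) over atoms; the function, data layout, and names are hypothetical and do not reflect Sassena's internal API. The combined parallel for simd construct also hints at the vectorization aspect; with the Intel compilers this would typically be paired with optimization flags such as -O3 and -xHost.

```cpp
// Illustrative sketch only: parallelizing the coherent scattering amplitude
//   A(q) = sum_j b_j * exp(i * q . r_j)
// over atoms with OpenMP. Names and data layout are hypothetical.
#include <omp.h>
#include <complex>
#include <vector>
#include <cmath>

struct Vec3 { double x, y, z; };

std::complex<double> coherent_amplitude(const std::vector<Vec3>& positions,
                                        const std::vector<double>& b,   // scattering lengths
                                        const Vec3& q)
{
    double re = 0.0, im = 0.0;
    const std::size_t n = positions.size();

    // Outer parallelism over atoms; the reduction keeps the sum thread-safe.
    // The simd part of the construct encourages the compiler to vectorize the loop.
    #pragma omp parallel for simd reduction(+:re, im)
    for (std::size_t j = 0; j < n; ++j) {
        const double phase = q.x * positions[j].x
                           + q.y * positions[j].y
                           + q.z * positions[j].z;
        re += b[j] * std::cos(phase);
        im += b[j] * std::sin(phase);
    }
    return {re, im};
}
```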