Use CPUs simultaneously with GPU

The CPU and GPU implementations are not mutually exclusive; indeed, it is already possible to run the GPU implementation alongside the CPU one. The problem with this approach is that all MPI nodes are currently staged with an equal amount of work, so the faster nodes finish early and sit idle while the slower ones complete their share. Assuming a GPU node is as fast as 24 CPU nodes, the GPU node should receive 24 times as much work as a single CPU node to achieve the highest overall performance. This is currently not possible because of how the DataStagerByAtom class is implemented, although with some work the class could be rewritten to assign an arbitrary weight factor to each MPI node.
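As a rough illustration of what such weight factors could look like, the sketch below splits a number of work items (for example atoms) into contiguous index ranges whose sizes are proportional to a per-node weight. It is not based on the actual DataStagerByAtom interface: the function names weighted_counts and weighted_ranges, the choice of Python, and the largest-remainder rounding are all assumptions made for this example.

```python
from fractions import Fraction
from itertools import accumulate


def weighted_counts(n_items, weights):
    """Split n_items into integer shares proportional to weights.

    Hypothetical helper: uses the largest-remainder rule so that the
    shares always sum to exactly n_items.
    """
    if n_items < 0:
        raise ValueError("n_items must be non-negative")
    if not weights or any(w < 0 for w in weights) or sum(weights) == 0:
        raise ValueError("weights must be non-negative and not all zero")

    total = sum(Fraction(w) for w in weights)
    # Exact (fractional) share of each node, computed without rounding error.
    exact = [n_items * Fraction(w) / total for w in weights]
    # Round every share down; int() truncates, which is floor for values >= 0.
    counts = [int(x) for x in exact]
    leftover = n_items - sum(counts)

    # Give the remaining items to the nodes with the largest fractional parts.
    order = sorted(range(len(weights)),
                   key=lambda i: exact[i] - counts[i],
                   reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts


def weighted_ranges(n_items, weights):
    """Return one half-open (start, stop) index range per node."""
    counts = weighted_counts(n_items, weights)
    stops = list(accumulate(counts))
    return [(stop - count, stop) for count, stop in zip(counts, stops)]


if __name__ == "__main__":
    # One GPU node assumed 24x as fast as each of two CPU nodes.
    print(weighted_counts(100, [24, 1, 1]))   # [92, 4, 4]
    print(weighted_ranges(2600, [24, 1, 1]))
```

With equal weights this reduces to the current behaviour of staging every node with the same amount of work. In practice the weights would have to be measured rather than assumed, since the relative speed of a GPU node depends on the hardware and the problem size.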