Parallelization
Parallelization is available using either distributed memory based on MPI or multithreading using OpenMP.
OpenMP
It is straightforward to enable multithreading support. Note that this speeds up assembly and evaluation of the spatial operators but, in general, not the linear algebra backend, so the overall speedup of the code might be lower than expected. Since we rely mostly on external packages for the linear algebra, the speedup in this step depends on the multithreading support available in the chosen linear algebra backend - see the discussion on how to switch between linear solver backends to, for example, use the thread-parallel solvers from scipy.
By default only a single thread is used. To enable multithreading simply add
[1]:
from dune.fem import threading
print("Using",threading.use,"threads")
threading.use = 4 # use 4 threads
print("Using",threading.use,"threads")
Using 1 threads
Using 4 threads
At startup a maximum number of threads is selected based on the hardware concurrency. This number can be changed by setting the environment variable DUNE_NUM_THREADS, which sets both the maximum number of threads and the number of threads to use. To query this maximum or to set the number of threads to use to the maximum, use
[2]:
print("Maximum number of threads available:",threading.max)
threading.useMax()
print("Using",threading.use,"threads")
Maximum number of threads available: 12
Using 12 threads
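The environment variable can also be set from within Python. The following is a minimal sketch, assuming DUNE_NUM_THREADS is read when dune.fem is first imported, so it has to be set before that import (or in the shell that starts the script):
import os
# assumption: DUNE_NUM_THREADS is read when dune.fem is first imported,
# so set it before the import (or export it in the shell instead)
os.environ["DUNE_NUM_THREADS"] = "4"
from dune.fem import threading
print("Maximum number of threads available:", threading.max)
print("Using", threading.use, "threads")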
MPI
It is straightforward to use MPI for parallelization. This requires a parallel grid; in fact, most of the DUNE grids work in parallel except albertaGrid and polyGrid. Most iterative solvers in DUNE work for parallel runs. Some of the preconditioning methods also work in parallel; a complete list is found at the bottom of the solver discussion.
Running a parallel job can be done by using mpirun
mpirun -np 4 python script.py
in order to use 4 MPI processes. Example scripts that run in parallel include the Re-entrant Corner Problem.
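Within the script itself the rank and the total number of processes can be queried, for example via mpi4py. This is only a minimal sketch and an assumption: mpi4py is a separate package, not something dune-fem requires, but it must be built against the same MPI library that DUNE uses.
from mpi4py import MPI  # assumption: mpi4py installed against the MPI used by DUNE
rank = MPI.COMM_WORLD.Get_rank()  # id of this process (0,...,size-1)
size = MPI.COMM_WORLD.Get_size()  # total number of MPI processes
if rank == 0:
    print("running on", size, "processes")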
Note
On a cluster where multiple parallel jobs are run simultaneously, it is advisable to use a separate cache per job. This can easily be done by copying an existing cache with pre-compiled modules. See this SLURM batch script for an example of how to do this.
Load balancing
When running distributed memory jobs, load balancing is an issue. The specific load balancing method used depends on the grid implementation. There are two ways to ensure a balanced work load. For computations without dynamic adaptation this only has to be done once at the beginning of the run; in fact, most grids do this automatically. For adaptation and load balancing with data see the crystal growth or Re-entrant Corner Problem examples.
[3]:
from dune.alugrid import aluCubeGrid
from dune.grid import cartesianDomain
from dune.fem.function import partitionFunction
domain = cartesianDomain([0,0], [1,1], [40,40], overlap=1)
gridView = aluCubeGrid( domain, lbMethod=9 )
# distribute work load, no user data is adjusted
gridView.hierarchicalGrid.loadBalance()
# print possible load balancing methods
help( aluCubeGrid )
Help on function aluCubeGrid in module dune.alugrid._grids:
aluCubeGrid(*args, **kwargs)
Create an ALUGrid instance.
Note: This functions has to be called on all cores and the parameters passed should be the same.
Otherwise unexpected behavior will occur.
Parameters:
-----------
constructor means of constructing the grid, i.e. a grid reader or a
dictionary holding macro grid information
dimgrid dimension of grid, i.e. 2 or 3
dimworld dimension of world, i.e. 2 or 3 and >= dimension
comm MPI communication (not yet implemented)
serial creates a grid without MPI support (default False)
verbose adds some verbosity output (default False)
lbMethod load balancing algorithm. Possible choices are (default is 9):
0 None
1 Collect (to rank 0)
4 ALUGRID_SpaceFillingCurveLinkage (assuming the macro
elements are ordering along a space filling curve)
5 ALUGRID_SpaceFillingCurveSerialLinkage (serial version
of 4 which requires the entire graph to fit to one core)
9 ALUGRID_SpaceFillingCurve (like 4 without linkage
storage), this is the default option.
10 ALUGRID_SpaceFillingCurveSerial (serial version
of 10 which requires the entire graph to fit to one core)
11 METIS_PartGraphKway, METIS method PartGraphKway, see
http://glaros.dtc.umn.edu/gkhome/metis/metis/overview
12 METIS_PartGraphRecursive, METIS method
PartGraphRecursive, see
http://glaros.dtc.umn.edu/gkhome/metis/metis/overview
13 ZOLTAN_LB_HSFC, Zoltan's geometric load balancing based
on a Hilbert space filling curve, see https://sandialabs.github.io/Zoltan/
14 ZOLTAN_LB_GraphPartitioning, Zoltan's load balancing
method based on graph partitioning, see https://sandialabs.github.io/Zoltan/
15 ZOLTAN_LB_PARMETIS, using ParMETIS through Zoltan, see
https://sandialabs.github.io/Zoltan/
lbUnder value between 0.0 and 1.0 (default 0.0)
lbOver value between 1.0 and 2.0 (default 1.2)
Returns:
--------
An ALUGrid instance with given refinement (conforming or nonconforming) and element type (simplex or cube).
Run this code snippet in parallel and then inspect the different domain decompositions. For plotting parallel data, VTK output has to be used. In this case we display the rank information for each partition.
[4]:
vtk = gridView.sequencedVTK("data", celldata=[partitionFunction(gridView)])
vtk()
For load balancing with redistribution of user data the function dune.fem.loadBalance should be used; see the crystal growth or Re-entrant Corner Problem examples, or the sketch below.
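The following is a minimal sketch of how this might look for the grid created above, assuming dune.fem.loadBalance accepts a list of discrete functions whose data should travel with the redistributed elements; the discrete function uh is hypothetical user data introduced only for illustration.
from dune.fem import loadBalance
from dune.fem.space import lagrange
# hypothetical user data attached to the grid constructed above
space = lagrange(gridView, order=1)
uh = space.interpolate(0, name="uh")
# redistribute the grid and move the data stored in uh along with the elements
# (assumption: loadBalance takes a list of discrete functions)
loadBalance([uh])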