Welcome to SWIFT
Welcome to SWIFT, a joint project of the Institute for Computational Cosmology (ICC)
and the Institute for Advanced Research
Computing (IARC) at the University of Durham.
The ICC is a world-leading research institute in the field of
cosmology which focuses on the simulation of the formation of
structures in the Universe and the evolution of galaxies. Via a
collaboration with the IARC, the ICC started the development of the
SPH With Inter-dependent Fine-grained Tasking (SWIFT) code to provide
astrophysicists with a state-of-the-art framework to perform particle-based
simulations. The long-term goal of the project is to create a
single framework that allows astrophysicists to run simulations
efficiently on all types of architectures, ranging from desktop machines
to the largest supercomputers.
The entirely open-source code uses the concept of task-based
parallelism to distribute the work on the different computing units of
modern clusters. The library used to this end, itself also open
source, is named QuickSched and provides users with an
alternative to standard parallelisation strategies.
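As a toy sketch of the idea behind task-based parallelism (hypothetical task names, not QuickSched's actual API): tasks are expressed as a dependency graph, and each task is handed to a worker as soon as its prerequisites complete, rather than being tied to a fixed loop order. A minimal version using Python's standard library:

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter

# Toy task graph (hypothetical task names): a cell's density loop must
# finish before its force loop, which must finish before the kick.
graph = {"density": set(), "force": {"density"}, "kick": {"force"}}

log = []

def run(task):
    log.append(task)  # stand-in for the real work on a cell of particles

ts = TopologicalSorter(graph)
ts.prepare()
with ThreadPoolExecutor(max_workers=4) as pool:
    while ts.is_active():
        batch = {pool.submit(run, t): t for t in ts.get_ready()}
        for fut, task in batch.items():
            fut.result()   # wait for the task to finish...
            ts.done(task)  # ...then unlock its dependents

print(log)  # ['density', 'force', 'kick']
```

In a real task-based code, many independent cells generate many ready tasks at once, which is what keeps all cores busy.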
The collaboration between IARC and the ICC that is behind this project
is partially supported by Intel through the establishment of an Intel Parallel Computing Centre (IPCC) at the University of Durham.
Movies & Animations
Simulation results from standard hydrodynamical tests can be found
here. These two examples have been run using the code with the "Gadget-2
SPH" hydrodynamics scheme switched on. The initial conditions for these cases
can be found alongside the source code.
The aim of the code is to tackle the challenge of running particle
simulations with a very large dynamic range - arising for example in
problems of compressible hydrodynamics or galaxy formation -
efficiently on modern computer architectures. Such architectures
combine many levels of parallelism, using shared memory nodes of many
cores, some of which may have additionally an accelerator. An example
density field of a galaxy formation simulations is shown below: the
density in the hot regions (gas in haloes of galaxies) is many orders
of magnitude higher than in the dark regions (voids), and consequently
the time-steps over which the particles march in time as the system
evolves, also differ by many orders of magnitude.
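The scale of this spread can be sketched with a back-of-the-envelope example (illustrative numbers and a hypothetical `timestep_bin` helper, not SWIFT's actual scheme): in hierarchical time-stepping, each particle's time-step is rounded down to a power-of-two fraction of the longest step, so particles in dense haloes land many bins below those in voids:

```python
import math

def timestep_bin(dt, dt_max=1.0):
    """Index of the power-of-two fraction of dt_max that dt falls into,
    as used in hierarchical time-stepping schemes (bin b => dt_max / 2**b)."""
    return max(0, math.ceil(math.log2(dt_max / dt)))

# Hypothetical time-steps: a void particle vs. a particle in a dense halo.
dt_void, dt_halo = 0.5, 1.0e-4
print(timestep_bin(dt_void))  # bin 1
print(timestep_bin(dt_halo))  # bin 14: updated 2**13 = 8192x more often
```

A code that advances every particle on the shortest step would waste almost all of that factor; exploiting the bin hierarchy is what makes such a dynamic range tractable.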
Figure 1: A visual impression of the virtual universe from
the EAGLE project,
run with a heavily modified version of the Gadget code. Cosmic gas is
coloured according to temperature, from cold (dark) to very hot
(red). Such simulations often take months to run on thousands of
cores. Speeding up such calculations by an order of magnitude would
represent a step-change in the way cosmologists can understand how
galaxies form and evolve.
The main bottleneck of such simulations is load imbalance, arising
when calculations on one core depend on those performed on another
core. Such interdependency severely limits strong-scaling behaviour, yet
good scaling is a vital requirement as computers become ever more
parallel. SWIFT also tackles the issue of how to distribute work when
not all cores are equal - as is the case when nodes contain
accelerators. Finally, the speed with which cores do work is often
limited by the rate at which data gets fed to them: cache-efficiency of
the code is crucial.
The main design specifications of SWIFT are:
- Task-based parallelism to exploit shared-memory parallelism. This
provides fine-grained load balancing enabling strong scaling, combined
with mixing communication and computation, both on each node's cores
as well as on external devices.
- SIMD vectorisation and mixed-precision computation, using a
gather-scatter paradigm and the use of single-precision values where
excessive accuracy is unwarranted. This is supported by the underlying
algorithms, which attempt to maximise data locality such that
vectorisation is even possible, and to maximise cache throughput.
- Hybrid shared/distributed memory parallelism, using the task-based
schemes. Parts of the computation are scheduled only once the
asynchronous transfers of the required data have
completed. Communication latencies are thus hidden by computation,
providing for strong scaling across multi-core nodes.
- Graph-based domain decomposition, which uses information from the task
graph to decompose the simulation domain such that the work, as
opposed to just the data (as in other space-filling curve schemes), is
equally distributed amongst all nodes.
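The difference between work-based and data-based decomposition can be illustrated with a toy example (hypothetical cost numbers, not SWIFT's actual partitioner, which relies on graph-partitioning techniques): splitting cells evenly along a space-filling curve can still leave the work badly unbalanced when per-cell cost varies.

```python
# Eight cells along a space-filling curve; equal particle counts but
# very unequal task cost (hypothetical units - e.g. two dense haloes).
work = [1, 1, 1, 1, 10, 10, 1, 1]

# Data-based split: equal number of cells per node.
data_split = [sum(work[:4]), sum(work[4:])]   # [4, 22] -> imbalanced

# Work-based split: choose the cut that best balances summed task cost.
total = sum(work)
cut = min(range(1, len(work)),
          key=lambda i: abs(sum(work[:i]) - total / 2))
work_split = [sum(work[:cut]), sum(work[cut:])]

print(data_split, work_split)  # [4, 22] [14, 12]
```

The work-based cut leaves both nodes with nearly equal cost, which is the point of weighting the decomposition by the task graph rather than by the data alone.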
Figure 2: Strong scaling test of the SWIFT and Gadget-2 codes on a
cosmological problem with 51×10⁶ particles. The left panel shows the speed-up from one to
1024 cores (linear scale). Perfect scaling is indicated by the dotted
line. Gadget-2 stops scaling when more than 400 cores are used, while
SWIFT still speeds up. The numbers indicate the wall-clock time per
time-step for both codes. SWIFT is 40x faster than Gadget-2 on that
problem. On one core, SWIFT is 7x faster than Gadget-2.
The right panel shows the corresponding parallel
efficiency. SWIFT achieves an efficiency of more than 80% up to 256
cores and of 60% at 1024 cores.
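For reference, the speed-up and parallel-efficiency figures quoted above follow directly from wall-clock times (the timings below are illustrative stand-ins, not the measured data):

```python
def speedup(t1, tn):
    """Speed-up on n cores: serial time over parallel wall-clock time."""
    return t1 / tn

def efficiency(t1, tn, n):
    """Parallel efficiency: speed-up divided by the number of cores."""
    return speedup(t1, tn) / n

# Hypothetical timings: 100 s on 1 core, 0.65 s on 256 cores.
print(round(speedup(100, 0.65), 1))         # 153.8x speed-up
print(round(efficiency(100, 0.65, 256), 2)) # 0.6, i.e. ~60% efficiency
```

An efficiency near 1.0 means the dotted "perfect scaling" line in the figure; falling efficiency at high core counts reflects the load-imbalance and communication costs discussed above.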
A technical presentation of the SWIFT code at the conference on
Exascale Computing held in Ascona in 2013 can be found here.