https://www.dagstuhl.de/20101
March 1–6, 2020, Dagstuhl Seminar 20101
Resiliency in Numerical Algorithm Design for Extreme Scale Simulations
Organizers
Luc Giraud (INRIA – Bordeaux, FR)
Ulrich Rüde (Universität Erlangen-Nürnberg, DE)
Linda Stals (Australian National University – Canberra, AU)
Documents
Dagstuhl Report, Volume 10, Issue 3
Summary
On the path to extreme-scale computing, hardware design must meet stringent requirements to keep the energy consumption of parallel computers at acceptable levels. This technological challenge is being tackled by shrinking electronic devices and reducing their voltage while simultaneously increasing the number of components. Recent studies indicate that such computer systems will become less reliable. Some forecasts suggest that the mean time between failures could become shorter than the time needed to recover from a classical checkpoint. In that case, large-scale calculations might make no progress at all unless robust alternatives are investigated.
The goal of this Dagstuhl Seminar was to bring together a diverse group of scientists with expertise in exascale computing to discuss novel ways to make applications resilient against detected and undetected faults. In particular, participants explored the role that algorithms and applications play in the holistic approach needed to tackle this challenge.
As a major result of the seminar, all participants contributed to a joint white paper.


Classification
- Data Structures / Algorithms / Complexity
- Modelling / Simulation
Keywords
- Parallel computer architecture
- Fault tolerance
- Checkpointing
- Supercomputer applications