Integrating HPC, AI, and Workflows for Scientific Data Analysis
(Aug 27 – Sep 1, 2023)
- Rosa Maria Badia (Barcelona Supercomputing Center, ES)
- Laure Berti-Equille (IRD - Montpellier, FR)
- Rafael Ferreira da Silva (Oak Ridge National Laboratory, US)
- Ulf Leser (HU Berlin, DE)
- Michael Gerke (for scientific matters)
- Susanne Bach-Bernhard (for administrative matters)
Modern scientific Big Data analysis builds on three pillars: (i) workflow technologies to express, steer, and reproduce analyses, (ii) machine learning (ML) and artificial intelligence (AI) as fundamental and versatile analysis steps, and (iii) high-performance computing (HPC) for scalable execution of analytic pipelines over very large data sets. Yet their interplay is complex and under-researched. Scientific workflow systems (SWFs) are today used across virtually all scientific domains to approach large data analysis problems and have underpinned some of the most significant discoveries of the past decades. Many SWFs have significant computational, storage, and communication demands, and thus must execute on a wide range of platforms, including HPC centers and even exascale systems; conversely, for many researchers SWFs are the method of choice for developing HPC applications. In the past 10 years, this interplay of workflow technologies and HPC has been challenged by the fast rise of AI technologies, in particular ML. SWFs are becoming ML-rich: models are trained on and applied to large data sets, leading to resource requirements of a scale that only HPC centers provide. However, ML-heavy tasks bring new requirements to HPC, such as GPUs or neuromorphic chips and the need to support iterative computations. Conversely, ML techniques increasingly permeate workflow steering and HPC optimization, especially scheduling and resource provisioning. The result is a three-way relationship between HPC, workflows, and ML in which each offers important capabilities to the others but must also react to the new requirements the others bring.
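The idea of ML-informed scheduling mentioned above can be illustrated with a deliberately minimal sketch (names such as `fit_linear` and `lpt_schedule` are hypothetical, not taken from any SWF system): a learned model predicts task runtimes from input sizes, and the predictions drive a longest-processing-time-first assignment of tasks to workers.

```python
# Minimal sketch (assumed, illustrative): a learned runtime model feeds
# a greedy longest-processing-time (LPT) scheduler.

def fit_linear(sizes, runtimes):
    """Least-squares fit of runtime = a * size + b from historical runs."""
    n = len(sizes)
    mx, my = sum(sizes) / n, sum(runtimes) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(sizes, runtimes))
    var = sum((x - mx) ** 2 for x in sizes)
    a = cov / var
    return a, my - a * mx

def predict(model, size):
    a, b = model
    return a * size + b

def lpt_schedule(tasks, model, n_workers):
    """Assign (name, input_size) tasks to the least-loaded worker,
    longest predicted runtime first; return the plan and the makespan."""
    loads = [0.0] * n_workers
    plan = {w: [] for w in range(n_workers)}
    for name, size in sorted(tasks, key=lambda t: -predict(model, t[1])):
        w = min(range(n_workers), key=lambda i: loads[i])
        loads[w] += predict(model, size)
        plan[w].append(name)
    return plan, max(loads)
```

Production systems would of course use richer features and models, but even this toy version shows the division of labor: the ML component supplies runtime estimates, while the scheduler remains a classical heuristic consuming them.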
The above three pillars are researched by communities that are largely separated from each other. However, coping with current and upcoming large-scale scientific challenges, such as Earth science, population genetics, or computational material science, requires their close interaction for the benefit of society. Previous attempts to unify the communities were at best bi-directional, ignoring the importance of the interplay of all three factors. For instance, in 2021 some of the organizers of this Dagstuhl Seminar organized or attended a series of virtual events to bring the workflows community together, in an attempt to mitigate the proliferation of newly developed workflow systems and to provide a community roadmap for bringing ML closer to workflows. In this Dagstuhl Seminar, we aim to bring these three communities together to study challenges, opportunities, new research directions, and future pathways at the interplay of SWFs, HPC, and ML. In particular, the seminar will focus on the following research questions:
- How can ML technologies be used to improve SWF and HPC operations, for instance through better scheduling, improved fault tolerance, or energy-efficient resource provisioning?
- How must HPC architectures be adapted to better fit to the requirements of large-scale ML technology, in particular from the field of Deep Learning?
- How must SWF languages and execution systems change to unravel the full power of ML-heavy data analysis on HPC systems?
- What are the most prominent use cases of ML techniques on HPC, and what specific and currently unmet requirements do they yield?
- How does the stochastic nature of ML affect the reproducibility of data analysis on HPC?
To approach these questions, the seminar will follow an innovative “continuous integration” setup where individual contributions of experts are iteratively bundled together to eventually produce a common knowledge framework as a basis for paving a road ahead. We expect the seminar to produce tangible outputs both in terms of joint reports / publications and new international collaborations.
- Liew, C. S., Atkinson, M. P., Galea, M., Ang, T. F., Martin, P., & van Hemert, J. I. (2016). Scientific workflows: Moving across paradigms. ACM Computing Surveys, 49(4).
- Badia Sala, R. M., Ayguadé Parra, E., & Labarta Mancho, J. J. (2017). Workflows for science: A challenge when facing the convergence of HPC and big data. Supercomputing Frontiers and Innovations, 4(1).
- Ramirez-Gargallo, G., Garcia-Gasulla, M., & Mantovani, F. (2019). TensorFlow on state-of-the-art HPC clusters: A machine learning use case. Int. Symp. on Cluster, Cloud and Grid Computing.
- Ferreira da Silva, R., et al. (2021). A community roadmap for scientific workflows research and development. IEEE Workshop on Workflows in Support of Large-Scale Science.
- Computational Engineering / Finance / Science
- Distributed / Parallel / Cluster Computing
- Machine Learning
- Scientific Workflows
- High Performance Computing
- Machine Learning
- Scientific Data Analysis
- Big Data