- Jessica Montgomery (University of Cambridge, GB)
- Machine Learning for Science: Bridging Data-Driven and Mechanistic Modelling (Dagstuhl Seminar 22382). Philipp Berens, Kyle Cranmer, Neil D. Lawrence, Ulrike von Luxburg, and Jessica Montgomery. In Dagstuhl Reports, Volume 12, Issue 9, pp. 150-199, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2023)
- The passive symmetries of machine learning - Villar, Soledad; Hogg, David W.; Yao, Weichi; Kevrekidis, George A.; Schölkopf, Bernhard - Cornell University : arXiv.org, 2023. - 14 pp.
- AI for science : an emerging agenda - Berens, Philipp; Cranmer, Kyle; Lawrence, Neil D.; Luxburg, Ulrike von; Montgomery, Jessica - Cornell University : arXiv.org, 2023. - 44 pp.
Today's scientific challenges are characterised by complexity. Interconnected natural, technological, and human systems are influenced by forces acting across temporal and spatial scales, resulting in complex interactions and emergent behaviours. Understanding these phenomena -- and leveraging scientific advances to deliver innovative solutions to improve society's health, wealth, and well-being -- requires new ways of analysing complex systems.
Artificial intelligence (AI) offers a set of tools to help make sense of this complexity. In an environment where more data is available from more sources than ever before -- and at scales from the atomic to the astronomical -- the analytical tools provided by recent advances in AI could play an important role in unlocking a new wave of research and innovation. The term AI today describes a collection of tools and methods that replicate aspects of intelligence in computer systems. Many recent advances in the field stem from progress in machine learning, an approach to AI in which computer systems learn how to perform a task from data.
Signals of the potential for AI in science can already be seen in many domains. AI has been deployed in climate science to investigate how Earth's systems are responding to climate change; in agricultural science to monitor animal health; in development studies to support communities in managing local resources more effectively; in astrophysics to understand the properties of black holes, dark matter, and exoplanets; and in developmental biology to map pathways of cellular development from genes to organs. These successes illustrate the wider advances that AI could enable in science. In so doing, these applications also offer insights into the science of AI, suggesting pathways to understand the nature of intelligence and the learning strategies that can deliver intelligent behaviour in computer systems.
Further progress will require a new generation of AI models. AI for science calls for modelling approaches that can: facilitate sophisticated simulations of natural, physical, or social systems, enabling researchers to use data to interrogate the forces that shape such systems; untangle complicated cause-effect relationships by combining the ability to learn from data with structured knowledge of the world; and work adaptively with domain experts, assisting them in the lab and connecting data-derived insights to pre-existing domain knowledge. Creating these models will disrupt traditional divides between disciplines and between data-driven and mechanistic modelling.
The roadmap presented here suggests how these different communities can collaborate to deliver a new wave of progress in AI and its application for scientific discovery. By coalescing around the shared challenges for AI in science, the research community can accelerate technical progress, while deploying tools that tackle real-world challenges. By creating user-friendly toolkits, and implementing best practices in software and data engineering, researchers can support wider adoption of effective AI methods. By investing in people working at the interface of AI and science -- through skills-building, convening, and support for interdisciplinary collaborations -- research institutions can encourage talented researchers to develop and adopt new AI for science methods. By contributing to a community of research and practice, individual researchers and institutions can help share insights and expand the pool of researchers working at the interface of AI and science. Together, these actions can drive a paradigm shift in science, enabling progress in AI and unlocking a new wave of AI-enabled innovations.
The transformative potential of AI stems from its widespread applicability across disciplines, and will only be achieved through integration across research domains. AI for science is a rendezvous point. It brings together expertise from AI and application domains; combines modelling knowledge with engineering know-how; and relies on collaboration across disciplines and between humans and machines. Alongside technical advances, the next wave of progress in the field will come from building a community of machine learning researchers, domain experts, citizen scientists, and engineers working together to design and deploy effective AI tools.
Machine learning has the potential to transform research and innovation. Today’s machine learning methods are already being applied to advance the frontiers of science, helping researchers better understand how the world around us works – from interactions between atoms, to the ways that proteins fold, interactions between cells, the dynamics of Earth’s systems and the discovery of exoplanets. These contributions are the foothills of the wider transformation that machine learning could bring for science and the scientific workflow.
Recent successes in the deployment of machine learning for scientific discovery point to the potential of a new generation of machine learning methods for science. These tools would combine data-derived insights with existing domain knowledge or theory, creating more powerful analytical tools. They would enhance researchers’ ability to simulate the systems they study, testing new ideas or identifying new areas for investigation; and they would help researchers understand not only what patterns can be found in data, but why and how such patterns have emerged.
Creating this new generation of machine learning methods requires further efforts to bridge the current gap between data-driven and mechanistic modelling. Recent successes in the field suggest a route to create these hybrid approaches. Through further development of machine learning approaches that encode domain knowledge in data-driven systems, that enable simulation and emulation of complex real-world systems, and that allow causal inference in data-enabled systems, machine learning research could create more powerful tools for scientific discovery.
This Dagstuhl Seminar will seek to articulate a roadmap for bridging the gap between data-driven and mechanistic modelling approaches. It will consider the lessons that recent work at the interface of machine learning and science provides for the future development of the field, and it will review emerging research directions at this interface. In so doing, it will identify a set of common interests where further research could unlock progress in the use of machine learning for scientific discovery.
Machine learning methods have already been successfully adopted in a variety of scientific domains. This seminar will review recent experiences of – and lessons learned from – efforts to deploy machine learning to advance:
- Healthcare and biomedical sciences including neuroscience
- Climatology and environmental sciences
- Theoretical and experimental physics
By reviewing these recent experiences, the seminar will identify emerging research directions and best practices in:
- Encoding domain knowledge in machine learning systems, reviewing methods for leveraging insights from data while embedding the knowledge contained in mechanistic modelling approaches.
- Simulation and emulation, investigating how innovations in the mathematics of emulation and techniques for understanding uncertainty propagation can support more effective machine learning tools.
- Approaches to causality in machine learning, exploring how techniques from statistical inference and uncertainty quantification can be combined to create a new mathematics of causality.
- Bubacarr Bah (AIMS South Africa - Cape Town, ZA)
- Jessica Beasley (Collective Next - Boston, US)
- Philipp Berens (Universität Tübingen, DE)
- Maren Büttner (Helmholtz Zentrum München & Universität Bonn)
- Thomas G. Dietterich (Oregon State University - Corvallis, US)
- Carl Henrik Ek (University of Cambridge, GB)
- Asja Fischer (Ruhr-Universität Bochum, DE)
- Philipp Hennig (Universität Tübingen, DE)
- David W. Hogg (New York University, US)
- Christian Igel (University of Copenhagen, DK)
- Samuel Kaski (Aalto University, FI)
- Ieva Kazlauskaite (University of Cambridge, GB)
- Hans Kersting (INRIA - Paris, FR)
- Niki Kilbertus (TU München, DE & Helmholtz AI München, DE)
- Neil D. Lawrence (University of Cambridge, GB)
- Gilles Louppe (University of Liège, BE)
- Jakob Macke (Universität Tübingen, DE)
- Siddharth Mishra-Sharma (MIT - Cambridge, US)
- Jessica Montgomery (University of Cambridge, GB)
- Jonas Peters (University of Copenhagen, DK)
- Markus Reichstein (MPI für Biogeochemistry - Jena, DE)
- Bernhard Schölkopf (MPI für Intelligente Systeme - Tübingen, DE)
- Soledad Villar (Johns Hopkins University - Baltimore, US)
- Ulrike von Luxburg (Universität Tübingen, DE)
- Verena Wolf (Universität des Saarlandes - Saarbrücken, DE)
- Mauricio A Álvarez (University of Manchester, GB)
- Kyle Cranmer (University of Wisconsin - Madison, US)
- Stuart Feldman (Schmidt Futures - New York, US)
- Vidhi Lalchand (University of Cambridge, GB)
- Dina Machuve (DevData Analytics - A, TZ)
- Eric Meissner (University of Cambridge, GB)
- Aditya Ravuri (University of Cambridge, GB)
- Francisco Vargas (University of Cambridge, GB)
- Artificial Intelligence
- Machine Learning
- Machine learning
- scientific discovery