Dagstuhl Seminar 24461
Rethinking the Role of Bayesianism in the Age of Modern AI
(Nov 10 – Nov 15, 2024)
Organizers
- Vincent Fortuin (Helmholtz AI - Neuherberg, DE)
- Zoubin Ghahramani (Google - Mountain View, US)
- Mohammad Emtiyaz Khan (RIKEN - Tokyo, JP)
- Mark van der Wilk (University of Oxford, GB)
Contact
- Marsha Kleinbauer (for scientific matters)
- Susanne Bach-Bernhard (for administrative matters)
Summary
The Dagstuhl Seminar "Rethinking the Role of Bayesianism in the Age of Modern AI" (24461) was convened to explore the contemporary role of Bayesian methods in artificial intelligence, particularly in light of the remarkable advances in large-scale deep learning. While Bayesian Deep Learning (BDL) holds the promise of addressing key limitations of traditional deep learning, such as uncertainty estimation, encoding prior knowledge, and preventing catastrophic failures, it frequently falls short of this potential in practical applications. This discrepancy arises from several fundamental challenges: the difficulty of computing accurate posterior approximations, the scarcity of flexible prior distributions, and the lack of suitable benchmarks for evaluating Bayesian models. Furthermore, misconceptions about the scope of Bayesian methods often lead researchers to harbor unrealistic expectations and to overlook simpler, non-Bayesian alternatives such as bootstrap methods, post-hoc uncertainty scaling, and conformal prediction. Such over-expectation, followed by under-delivery, can cause researchers to lose faith in Bayesian approaches.
The central question addressed by the seminar was: in an era of AI where scaling seems to solve many problems, what is the unique role of Bayesian methods? The goal was to redefine the promises and challenges of Bayesian approaches, identify areas where they can outperform non-Bayesian methods, and highlight key application domains where their strengths can be best leveraged. By bringing together researchers from diverse backgrounds, the seminar aimed to chart a path for future research to innovate, enhance, and strengthen the real-world impact of BDL. The seminar recognized that, while non-Bayesian methods now appear to solve problems that Bayesians once hoped to solve with Bayesian methods, it remains important to re-examine the value and potential of the Bayesian approach.
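To make concrete what one of these simpler alternatives looks like, the following is a minimal, illustrative sketch of split conformal prediction for regression. It is not material from the seminar: the `model` object, the variable names, and the absolute-residual nonconformity score are assumptions chosen for illustration.

```python
# Illustrative sketch of split conformal prediction for regression.
# Assumes an already-trained `model` with a `.predict(X)` method; all
# names here are placeholders, not part of the seminar material.
import numpy as np

def split_conformal_interval(model, X_cal, y_cal, X_test, alpha=0.1):
    """Prediction intervals with roughly (1 - alpha) marginal coverage."""
    # Nonconformity scores on a held-out calibration set.
    scores = np.abs(y_cal - model.predict(X_cal))
    # Finite-sample-corrected quantile of the calibration scores
    # (the `method` keyword requires NumPy >= 1.22).
    n = len(scores)
    q_level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    q_hat = np.quantile(scores, q_level, method="higher")
    preds = model.predict(X_test)
    return preds - q_hat, preds + q_hat
```

Such intervals come with a distribution-free coverage guarantee under exchangeability, but unlike a Bayesian posterior they do not separate aleatoric from epistemic uncertainty or support model updating.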
Structure of the Seminar
The seminar was designed to foster an interactive and collaborative environment, incorporating three distinct types of events: workgroup sessions, guided discussions, and final plenary discussions.
Workgroup Sessions. These sessions revolved around overarching questions pertaining to the main advantages of Bayesian methods, the challenges hindering their adoption, and the practical areas where they can make the most difference. Each workgroup session featured one to three short input talks from different participants, which then informed the subsequent discussion. The workgroups were structured around three main questions:
- What are the main benefits of Bayes that are hard to achieve otherwise?
- What are the most pressing challenges in its adoption?
- What are the most impactful ways for Bayes to make a difference in practice?
Guided Discussions. The guided discussions were designed to examine contentious issues and encourage debate, focusing on three key motions:
- "Bayes' theorem is broken for making predictions with large models"
- "We can build subjective Bayesian priors for NNs that we actually believe in"
- "Bayes is useless if we cannot scale to LLMs"
Final Plenary Discussions. These discussions focused on the bigger picture and the next steps for researchers in the field. They centered on three main themes:
- What can we do to encourage researchers to join the BDL community and how can we support and uplift each other within the community?
- How can we measure progress in the field and find promising application areas that would convince practitioners to use Bayesian methods?
- What are some grand long-term challenges for which we could hope Bayesian methods to make a difference and potentially outperform standard deep learning?
Insights from the Working Groups
Talks. The seminar featured a series of presentations covering a wide range of topics related to Bayesian methods. Participants contributed these talks based on a pre-seminar poll of the group's interests, and the talks informed the working-group discussions. Topics included:
- The distinction between aleatoric and epistemic uncertainty. This included a detailed look at how these terms are often used inconsistently, leading to issues in the literature. The discussion also covered how to estimate these uncertainties in practice and how to best decompose total uncertainty (a common decomposition is sketched after this list).
- The difference between predictive and parameter uncertainty. The discussion here considered how to search the space of predictions and how to judge explanations without relying on predictions.
- Developing benchmarks for Bayesian methods. This included a discussion on whether current uncertainty measures are useful for model comparison and selection, and whether new benchmarks are needed.
- The roles of prediction and explanation in science. The discussion focused on how machine learning has changed the landscape of prediction and explanation, and the role of Bayesian approaches in these areas.
- Bayesian foundation models. This discussion considered how probabilistic thinking can help us understand foundation models and whether deep learning technologies can help advance probabilistic methods.
- Bayesian Neural Network (BNN) architectures. This included a look at model selection using the marginal likelihood, and whether uncertainty helps to avoid overfitting.
- Pseudo-posteriors. This session explored methods like likelihood tempering and robust loss functions to address model misspecification (a tempered-posterior sketch follows this list).
- Bayesian methods for sequential learning. This included discussions of new algorithms for deep learning and how to apply them in dynamic settings.
- The geometry of BNN posteriors. The discussion focused on the challenges for Bayesian inference in deep learning, such as the intractability of posterior distributions and the existence of multiple minima.
- Partial stochasticity in BNNs. This talk explored scalable variational approximations based on subnetworks and whether a fully Bayesian treatment of NNs is necessary.
- Teaching Bayesian ML. This session covered the decisions academics make when teaching Bayesian ML, what to include and what to omit, and the value of diversity in teaching approaches.
- The relationship between Bayesian theory and practice. This presentation explored non-Bayesian justifications for Bayesian updating, the challenges in modeling complex data, and the value of trying out models to see which ones work best.
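For the uncertainty talk referenced above, the following is a minimal sketch of the commonly used entropy-based decomposition of total predictive uncertainty, assuming Monte Carlo samples of class probabilities from a posterior or an ensemble; the function names are illustrative and do not come from the seminar.

```python
# Illustrative sketch: entropy-based decomposition of predictive
# uncertainty. `probs` holds p(y | x, theta_s) for posterior/ensemble
# samples s, with shape (num_samples, num_classes).
import numpy as np

def entropy(p, axis=-1, eps=1e-12):
    return -np.sum(p * np.log(p + eps), axis=axis)

def decompose_uncertainty(probs):
    mean_pred = probs.mean(axis=0)            # posterior predictive p(y | x)
    total = entropy(mean_pred)                # total uncertainty H[p(y | x)]
    aleatoric = entropy(probs).mean(axis=0)   # expected entropy E_theta H[p(y | x, theta)]
    epistemic = total - aleatoric             # mutual information I(y; theta | x)
    return total, aleatoric, epistemic
```

Here the epistemic term is the mutual information between the prediction and the model parameters, which vanishes when all posterior samples agree.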
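For the pseudo-posterior talk, a minimal sketch of likelihood tempering is given below, under the assumption of user-supplied `log_prior` and `log_likelihood` functions (the names are placeholders): the likelihood is raised to a fractional power, flattening the posterior to hedge against model misspecification.

```python
# Minimal sketch of a tempered (pseudo-) posterior; `log_prior` and
# `log_likelihood` are assumed user-supplied functions, not seminar code.
def log_tempered_posterior(theta, data, log_prior, log_likelihood, T=2.0):
    """Unnormalised log p(theta) + (1/T) * sum_i log p(x_i | theta), with T >= 1."""
    loglik = sum(log_likelihood(x, theta) for x in data)
    return log_prior(theta) + loglik / T
```

This unnormalised log density can be plugged into a standard MCMC sampler or a variational objective in place of the exact log posterior; T = 1 recovers ordinary Bayes.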
Together these workgroup sessions yielded important insights into the potential and challenges associated with Bayesian methods along the three main themes of the seminar:
Benefits of Bayes. Participants highlighted several core benefits, including the ability to quantify uncertainty, update models, perform model selection, and obtain improved point estimates. Uncertainty quantification was noted as a key advantage, although it was acknowledged that it can sometimes be achieved by other means. Model updating, by contrast, was seen as a uniquely Bayesian benefit, allowing models to adapt to new data without complete retraining.
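As a minimal sketch of this updating benefit (an illustration, not a seminar result), consider Bayesian linear regression with known noise variance, where yesterday's posterior simply becomes today's prior; for deep networks this recursion is only available approximately.

```python
# Illustrative conjugate update for Bayesian linear regression with a
# Gaussian posterior N(mean, cov) over the weights; names are placeholders.
import numpy as np

def update_posterior(mean, cov, X_new, y_new, noise_var=1.0):
    """Fold a new batch (X_new, y_new) into the current Gaussian posterior."""
    prior_precision = np.linalg.inv(cov)
    precision = prior_precision + X_new.T @ X_new / noise_var
    cov_new = np.linalg.inv(precision)
    mean_new = cov_new @ (prior_precision @ mean + X_new.T @ y_new / noise_var)
    return mean_new, cov_new

# Streaming use: start from the prior and fold in each new batch.
# mean, cov = np.zeros(d), np.eye(d)
# for X_batch, y_batch in stream:
#     mean, cov = update_posterior(mean, cov, X_batch, y_batch)
```

The same prior-becomes-posterior recursion is what makes Bayesian methods attractive for the sequential-learning applications discussed below.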
Challenges in Adoption. Significant challenges were identified, particularly scalability and prior and model misspecification, which pose barriers to the wider adoption of Bayesian methods. Scalability is a major concern, as many Bayesian methods are computationally expensive. Prior misspecification was identified as another serious issue, since it can bias results and undermine many of the benefits of the Bayesian approach. Finally, model misspecification presents problems of its own, as few models fit real-world data perfectly.
Impactful Applications. Sequential learning was emphasized as an area where Bayesian methods have the potential to make a substantial impact. The ability of Bayesian methods to update beliefs over time and adapt to new data makes them well-suited to sequential learning tasks.
Insights from the Guided Discussions
The guided discussions brought to light differing opinions and perspectives on critical issues within the Bayesian community.
Bayes' Theorem and Large Models. A central debate revolved around the applicability of Bayes' theorem to large models. The "pro" side contended that, while the theorem is mathematically sound, its epistemological assumptions do not translate well to complex neural networks (NNs): NNs lack clearly defined priors, and simpler, more direct methods such as point estimates or conformal prediction are often more cost-effective and practical. The "con" side argued that any limitations are due to implementation issues rather than the theorem itself, and noted the value of Bayesian methods when fine-tuning models on small datasets, emphasizing that Bayes provides a flexible framework. The core of the debate was whether the practical constraints of large models should limit the application of Bayesian methods, or whether the flexibility of Bayesian approaches can be adapted to such models. The discussion highlighted the need for a nuanced understanding of the strengths and limitations of Bayesian methods in different contexts.
Subjective Bayesian Priors. The discussion on subjective priors for NNs explored the significance of priors, particularly for out-of-distribution data, and the difficulties in defining them effectively. Some participants emphasized that priors should be based on domain expertise, while others questioned the mathematical basis for using subjective priors on neural networks. The discussion highlighted the challenge of balancing subjective knowledge with the need for mathematical rigor. It was also noted that priors on function spaces might be easier to specify than priors on model parameters, and that designing priors to bias solutions toward the data was an area worth exploring.
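To illustrate the function-space point with a hypothetical example (not material from the seminar), the sketch below draws functions from a Gaussian process prior, where domain knowledge enters through interpretable kernel hyperparameters such as the lengthscale; all names are placeholders.

```python
# Illustrative sketch: sampling functions from a GP prior with an RBF
# kernel, as one way to specify and inspect a function-space prior.
import numpy as np

def rbf_kernel(x1, x2, lengthscale=1.0, variance=1.0):
    d = x1[:, None] - x2[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def sample_gp_prior(x, num_samples=5, lengthscale=1.0):
    K = rbf_kernel(x, x, lengthscale) + 1e-8 * np.eye(len(x))  # jitter for stability
    L = np.linalg.cholesky(K)
    return L @ np.random.randn(len(x), num_samples)  # draws of f ~ GP(0, K)
```

Inspecting such prior draws, or the prior predictive of a BNN, is one practical way to check whether a chosen prior is actually believable.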
Scaling to LLMs. A significant point of contention was whether Bayesian methods are still relevant if they cannot scale to LLMs. The "pro" side argued that scalability to LLMs is paramount for Bayes to stay relevant in the field. The "con" side countered that Bayes should not be limited to large models; it also plays a crucial role in small-data problems and scientific experiments. It was suggested that LLMs themselves could be used as priors and diffusion models as inference algorithms, highlighting the possibility of using modern AI tools within a Bayesian framework. This debate emphasized the need to re-evaluate the role of Bayesian methods in the context of rapidly advancing AI technologies, and to ask whether the Bayesian approach can be adapted to these new tools.
Insights from the Final Discussions
The concluding discussions synthesized the key findings from the seminar and outlined future directions for the community.
Community Building. There was a strong consensus on the need to foster inclusivity within the Bayesian community, across all levels of seniority and spanning both industry and academia. It was also stressed that a constructive attitude toward the Bayesian toolkit in peer review is crucial. The community should view Bayesian methods as a set of useful tools rather than a rigid ideology. The importance of mentorship and support for junior researchers was noted, as was the value of bringing in people who may be implicitly Bayesian without realizing it.
Benchmarks and Applications. Participants emphasized the importance of moving beyond traditional vision-based benchmarks to include decision-making and sequential learning tasks. The community needs to focus on identifying applications that highlight the unique advantages of Bayesian methods and on creating tools that can be used in impactful applications. The use of scoring rules was also suggested, since they make the value of improvements explicit and emphasize utility in downstream decisions as the key metric for the success of predictive systems. The discussion also highlighted the need to consider applications relevant to the current state of AI and other sciences, rather than relying on past applications.
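As a small illustration of evaluating probabilistic predictions with proper scoring rules (an assumed sketch, not a benchmark from the seminar):

```python
# Illustrative proper scoring rules for classifiers; `probs` has shape
# (num_examples, num_classes) and `labels` holds integer class indices.
import numpy as np

def log_score(probs, labels, eps=1e-12):
    """Negative log-likelihood of the true class (lower is better)."""
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + eps))

def brier_score(probs, labels):
    """Mean squared error against one-hot labels (lower is better)."""
    onehot = np.eye(probs.shape[1])[labels]
    return np.mean(np.sum((probs - onehot) ** 2, axis=1))
```

Both are proper scoring rules, so a model cannot improve its expected score by reporting probabilities it does not believe; in decision-making settings they can be replaced by the actual downstream utility.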
Grand Challenges. Discussions on grand challenges included developing a Bayesian equivalent of AlphaFold, addressing the ARC challenge, and incorporating LLMs as priors. Data efficiency was highlighted as a key strength of Bayesian methods, with the potential to significantly reduce the amount of data required for training. The group also raised important questions about the nature of reasoning and compositionality in models, as well as the challenge of building robust and trustworthy AI systems. The need for causal inference was also noted as critical for many real-world applications. The discussion also covered the possibility of using LLMs to learn structured models and programs.
Next Steps
The seminar concluded with the identification of several concrete steps to advance the field of Bayesian deep learning.
Benchmarks. The community should develop benchmarks that are challenging for deep learning but can be addressed using Bayesian methods, with a focus on sequential and active learning. Data efficiency should also be a focus when creating benchmarks. Furthermore, existing benchmarks should be evaluated for adaptation, especially those that move beyond vision-based tasks.
Research. Future research should move beyond traditional likelihood metrics and instead prioritize posterior predictive checks to ensure that models are making good predictions. Researchers should also seek to communicate the importance of decision outcomes and ensure that the metrics align with practical goals. There was also a call to focus on the principles behind Bayesian methods rather than just scaling, and to allow for alternative Bayesian inference frameworks (such as the martingale posterior).
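A minimal sketch of a posterior predictive check is given below, under the assumption of a user-supplied `sample_predictive` function that draws one replicated dataset from the posterior predictive per call (all names are illustrative, not from the seminar).

```python
# Illustrative posterior predictive check: compare a test statistic on
# the observed data with its distribution under replicated data drawn
# from the posterior predictive.
import numpy as np

def posterior_predictive_pvalue(y_obs, sample_predictive, statistic=np.mean,
                                num_replicates=1000):
    t_obs = statistic(y_obs)
    t_rep = np.array([statistic(sample_predictive()) for _ in range(num_replicates)])
    # Fraction of replicates at least as extreme as the observation.
    return np.mean(t_rep >= t_obs)
```

A p-value near 0 or 1 signals that the model fails to reproduce the chosen aspect of the data, complementing likelihood-based metrics.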
Organization. There is a need to establish a benchmark track at the yearly Symposium on Advances in Approximate Bayesian Inference (AABI), as well as to continue fostering connections within the Bayesian deep learning community through communication tools (e.g., Slack, Notion). The community should also explore the possibility of creating a Bayesian summer school and a virtual seminar, and seek integration with the International Society for Bayesian Analysis (ISBA), possibly through the foundation of a Bayesian deep learning chapter. There is also a desire to create a yearly Bayesian AI event to foster community. Furthermore, there was a call to share teaching resources to help standardize and improve international higher education in Bayesian machine learning.

Motivation
Despite the recent success of large-scale deep learning, these systems still fall short in terms of their reliability and trustworthiness. They often lack the ability to estimate their own uncertainty in a calibrated way, encode meaningful prior knowledge, avoid catastrophic failures, and reason about their environments to avoid such failures. Since its inception, Bayesian deep learning (BDL) has harbored the promise of achieving these desiderata by combining the solid statistical foundations of Bayesian inference with the practically successful engineering solutions of deep learning methods. This was intended to provide a principled mechanism to add the benefits of Bayesian learning to the framework of deep neural networks.
However, BDL methods often do not live up to this promise and underdeliver in terms of real-world impact. This is due to several fundamental challenges, for instance the computation of accurate posterior approximations, the unavailability of flexible priors, and the lack of appropriate testbeds and benchmarks. To make things worse, there are numerous misconceptions about the scope of Bayesian methods, and researchers often end up expecting more than Bayes can deliver, while ignoring simpler and cheaper non-Bayesian alternatives such as the bootstrap, post-hoc uncertainty scaling, and conformal prediction. Such over-expectation followed by under-delivery can lead researchers to lose faith in the Bayesian approach, something we ourselves have witnessed in the past.
So what exactly is the role of Bayes in a modern age of AI in which many of its original promises are being (or at least seem to be) unlocked simply by scaling? Non-Bayesian approaches appear to solve many problems that Bayesians once dreamt of solving with Bayesian methods. We therefore believe it is timely and important to rethink and redefine the promises and challenges of Bayesian approaches, to elucidate which Bayesian methods might prevail against their non-Bayesian competitors, and to identify key application areas where Bayes can shine.
By bringing together researchers from diverse communities, such as machine learning, statistics, and deep learning practice, in a personal and interactive seminar environment featuring debates, round tables, and brainstorming sessions, we hope to discuss and answer these questions from a variety of angles and chart a path for future research to innovate, enhance, and strengthen meaningful real-world impact of Bayesian deep learning.

Participants
- Laurence Aitchison (University of Bristol, GB) [dblp]
- Alexander A. Alemi (Kissimmee, US) [dblp]
- Pierre Alquier (ESSEC Business School - Singapore, SG) [dblp]
- Julyan Arbel (INRIA - Grenoble, FR) [dblp]
- Thang Bui (Australian National University - Acton, AU)
- Kamélia Daudel (ESSEC Business School - Cergy Pontoise, FR)
- Gintare Karolina Dziugaite (Google DeepMind - Toronto, CA) [dblp]
- Carl Henrik Ek (University of Cambridge, GB) [dblp]
- Maurizio Filippone (EURECOM - Biot, FR) [dblp]
- Katharine Fisher (MIT - Cambridge, US)
- Vincent Fortuin (Helmholtz AI - Neuherberg, DE) [dblp]
- Pablo García Arce (Institute of Mathematical Sciences - Madrid, ES)
- Erin Grant (University College London, GB) [dblp]
- Philipp Hennig (Universität Tübingen, DE) [dblp]
- Alexander Immer (Bioptimus - Zürich, CH) [dblp]
- Desi Ivanova (University of Oxford, GB)
- Theofanis Karaletsos (Paramid, US) [dblp]
- Mohammad Emtiyaz Khan (RIKEN - Tokyo, JP) [dblp]
- Jeremias Knoblauch (University College London, GB) [dblp]
- Yingzhen Li (Imperial College London, GB) [dblp]
- Thomas Möllenhoff (RIKEN - Tokyo, JP) [dblp]
- Kevin Murphy (Google DeepMind - Mountain View, US) [dblp]
- Eric Nalisnick (Johns Hopkins University - Baltimore, US) [dblp]
- Roi Naveiro Flores (CUNEF University - Madrid, ES)
- Theodore Papamarkou (Zhejiang Normal University - Jinhua, CN)
- Guiomar Pescador Barrios (Imperial College London, GB)
- Tom Rainforth (University of Oxford, GB) [dblp]
- Daniel Roy (University of Toronto, CA) [dblp]
- Tim Rudner (New York University, US) [dblp]
- Maja Rudolph (University of Wisconsin - Madison, US) [dblp]
- David Rügamer (LMU München, DE)
- Jan-Willem van de Meent (University of Amsterdam, NL) [dblp]
- Tycho van der Ouderaa (University of Oxford, GB)
- Mark van der Wilk (University of Oxford, GB) [dblp]
- Mariia Vladimirova (Criteo - Paris, FR) [dblp]
- Florian Wenzel (Mirelo AI - Tübingen, DE) [dblp]
- Sinead Williamson (Apple - Seattle, US) [dblp]
- Andrew G. Wilson (New York University, US) [dblp]
Classification
- Artificial Intelligence
- Machine Learning
Keywords
- Bayesian machine learning
- Deep learning
- Foundation models
- Uncertainty estimation
- Model selection