Dagstuhl Seminar 26152
Causality and Large Language Models: Opportunities to Advance Causal Reasoning
(Apr 07 – Apr 10, 2026)
Organizers
- Dominik Janzing (Amazon Web Services - Tübingen, DE)
- Zhijing Jin (MPI für Intelligente Systeme - Tübingen, DE)
- Amit Sharma (Microsoft Research India - Bangalore, IN)
- Kun Zhang (Carnegie Mellon University - Pittsburgh, US)
Contact
- Michael Gerke (for scientific matters)
- Jutka Gasiorowski (for administrative matters)
In addition to code and language reasoning, recent work shows that large language models (LLMs) can achieve promising results on causal reasoning tasks, e.g., determining cause-effect relationships. This points to a new paradigm for addressing real-world causal tasks by integrating LLMs into the causal inference workflow, but it also raises fundamental questions about the causal reasoning capabilities of LLMs. Given that LLMs are trained on observational data, do their capabilities correspond to genuine causal reasoning, or are they simply a result of dataset memorization, and how reliable are the reasoning outputs? If they do not correspond to genuine causal reasoning, how can we use causal techniques to improve LLMs' reasoning capabilities?
To address these questions, this Dagstuhl Seminar will bring together experts from causal machine learning and language models to foster collaboration and identify the key research questions at the intersection of the two fields. Specifically, we will focus on two objectives: 1) how to use LLMs' world knowledge to assist causal inference workflows; 2) how to use causal principles to improve the reasoning of LLMs. Recent work provides motivating evidence for both, but a critical question is how to reconcile the flexibility, and also the unreliability, of LLMs with the rigor of causal reasoning algorithms. We hope that addressing these challenges will enable wider adoption of causal reasoning and, in turn, more reliable answers from AI systems.
Topics to be discussed:
I. Integrating LLMs into causal ML workflows. A longstanding problem in causal ML is how to obtain the domain knowledge needed to guide formal specification (e.g., a causal graph) and estimate quantities of interest. For example, in effect inference, obtaining the underlying causal graph or determining special identifying variables (such as instrumental variables) is a key step before causal inference techniques can be applied in practice. Moreover, in unstructured data settings such as text or images, extracting the key factors that characterize high-level causal relationships is an important representation learning problem. Recent work shows that LLMs can help provide such domain knowledge. The first part of the seminar will discuss how LLMs may be incorporated into causal ML algorithms for four key tasks: effect inference, representation learning, causal discovery, and causal attribution.
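To make this objective concrete, here is a minimal sketch of how an LLM-elicited causal graph could feed into a standard effect-inference pipeline. The `ask_llm_for_graph` helper, the variable names, and the synthetic data are hypothetical placeholders; the estimation step assumes the DoWhy library's `CausalModel` API with a DOT-format graph string.

```python
# Minimal sketch (assumptions flagged): an LLM supplies the causal graph that a
# standard effect-inference library (DoWhy) then uses for identification and
# estimation. `ask_llm_for_graph` is a hypothetical stand-in for any prompting
# strategy; the synthetic data is only for illustration.
import numpy as np
import pandas as pd
from dowhy import CausalModel

def ask_llm_for_graph(variables):
    """Placeholder: prompt an LLM to propose directed edges among `variables`
    and return them as a DOT string (here a fixed answer for illustration)."""
    return "digraph { genetics -> smoking; genetics -> cancer; smoking -> cancer; }"

# Toy observational data consistent with the graph above.
rng = np.random.default_rng(0)
genetics = rng.normal(size=1000)
smoking = 0.8 * genetics + rng.normal(size=1000)
cancer = 0.5 * smoking + 0.3 * genetics + rng.normal(size=1000)
df = pd.DataFrame({"genetics": genetics, "smoking": smoking, "cancer": cancer})

graph = ask_llm_for_graph(list(df.columns))  # domain knowledge elicited from the LLM
model = CausalModel(data=df, treatment="smoking", outcome="cancer", graph=graph)
estimand = model.identify_effect()           # graph-based identification (backdoor via genetics)
estimate = model.estimate_effect(estimand, method_name="backdoor.linear_regression")
print(estimate.value)                        # should be close to the true effect of 0.5
```

The design point is the division of labor: the LLM contributes only qualitative domain knowledge (the graph), while identification and estimation remain with a conventional, auditable causal inference procedure.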
II. Evaluating and improving causal reasoning of LLMs. For language models, a key challenge is to improve their reasoning and reliability, including causal reasoning. Recent work shows that incorporating causality, either by providing training data based on causal principles or by combining causal tools with LLM generation, can improve reasoning capabilities and reduce hallucinations. Thus, an important direction is to develop methods for evaluating causal reasoning and to build training algorithms that significantly improve the reasoning reliability of LLMs. We will discuss the following topics: evaluating (causal) reasoning of LLMs, training algorithms for improving causal reasoning, verification of causal statements made by LLMs, and how causal reasoning may improve the reliability of LLMs more broadly.
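As one concrete form such an evaluation might take, the sketch below probes an LLM's pairwise cause-effect judgments, scoring them against a known ground-truth graph and flagging logically contradictory answers. The `query_llm` function, the variable names, and the ground-truth edges are hypothetical placeholders, not an established benchmark.

```python
# Hypothetical sketch of evaluating LLM causal judgments: query each variable
# pair in both directions, then report (a) accuracy against a known ground-truth
# graph and (b) contradictory answers (both directions affirmed, which cannot
# hold in an acyclic ground truth). `query_llm` is a placeholder for any LLM API.
from itertools import combinations

def query_llm(question: str) -> bool:
    """Placeholder for a real LLM call; replace with your model's API.
    Returns a trivial default so the sketch runs end to end."""
    return False

# Hypothetical ground truth: the set of true directed causal edges.
ground_truth = {("altitude", "temperature"), ("smoking", "cancer")}
variables = ["altitude", "temperature", "smoking", "cancer"]

correct, contradictory, total = 0, 0, 0
for a, b in combinations(variables, 2):
    says_ab = query_llm(f"Does {a} causally influence {b}? Answer yes or no.")
    says_ba = query_llm(f"Does {b} causally influence {a}? Answer yes or no.")
    if says_ab and says_ba:
        contradictory += 1  # both directions affirmed: inconsistent with an acyclic graph
    correct += int(says_ab == ((a, b) in ground_truth))
    correct += int(says_ba == ((b, a) in ground_truth))
    total += 2

print(f"accuracy = {correct / total:.2f}, contradictory pairs = {contradictory}")
```

Probing both directions of each pair, rather than only the direction stated in common benchmarks, is one simple way to separate memorized associations from consistent causal judgments.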

Classification
- Artificial Intelligence
- Machine Learning
Keywords
- causal inference
- large language model
- reasoning
- trustworthy AI