Dagstuhl Seminar 25231: Certifying Algorithms for Automated Reasoning

Dagstuhl Seminar 25231

Certifying Algorithms for Automated Reasoning

( Jun 01 – Jun 06, 2025 )

(Click in the middle of the image to enlarge)

Permalink

Please use the following short url to reference this page: https://www.dagstuhl.de/25231

Organizers

Nikolaj S. Bjørner (Microsoft - Redmond, US)
Marijn J. H. Heule (Carnegie Mellon University - Pittsburgh, US)
Daniela Kaufmann (TU Wien, AT)
Jakob Nordström (University of Copenhagen, DK & Lund University, SE)

Contact

Michael Gerke (for scientific matters)
Jutka Gasiorowski (for administrative matters)

Summary

Show Summary

Automated reasoning has been widely adopted over the last decades for developing formally verified software and also in the context of combinatorial optimization. The foundation is built on automated deduction algorithms that are used for determining the satisfiability of propositional logic, first-order logic, and arithmetic formulas.

Algorithms for determining the validity of Boolean satisfiability (SAT), mixed integer programming (MIP), satisfiability modulo theories (SMT), and first-order automated theorem proving (ATP) formulas are integral components in verification tools. The question of how to certify correctness of the conclusions reached by such reasoning algorithms has received long-running attention, since in industrial formal verification they form the trusted base for correctness of safety-critical applications such as control systems for trains and airplanes.

Another application of automated reasoning is in combinatorial optimization, which studies problems where solutions are constructed by combining objects, but where the supply of objects is limited and there are constraints how they can be utilized together. Combinatorial optimization problems are encountered in a multitude of practical scenarios including logistics, scheduling, and disaster management. Here reasoning algorithms are employed not only to establish when a feasible solution is separated from non-feasible bounds, but they are also deeply integrated within combinatorial optimization solvers that can provide guaranteed optimal answers. The computational complexity of solving combinatorial optimization problems, which are typically harder than NP-complete, as well as the complexity of implementing sophisticated combinatorial solvers in practice, provide major challenges.

Algorithms used in symbolic solvers are often stunningly powerful in practice, and are today used routinely to solve large-scale real-world problems in a wide range of application areas. But the “dirty little secret” is that the solvers are sometimes wrong. It is well documented that even the best constraint programming (CP) and mixed integer programming (MIP) solvers sometimes return “solutions” that do not satisfy the constraints, or erroneously claim optimality, and that verification tools can erroneously claim a set of constraints is infeasible when, on the contrary, it has a solution. Also, in more complex scenarios, where solvers are used to solve subproblems, even seemingly innocuous off-by-one mistakes can snowball into huge overall errors.

Dealing with errors in software is, of course, not a new problem. The traditional method to discover and eliminate bugs is software testing. However, while substantial progress has been made recently on powerful so-called fuzzing-based tools applied to symbolic solvers, they cannot offer any guarantees that results produced by a solver are correct. It is an inherent limitation that testing can only reveal the presence of bugs, but never prove their absence.

Another very appealing approach is formal verification. This means that one writes down a formal, mathematical, specification of how the solver should work, and then provides a proof that the solver adheres to this specification. The main obstacle for this method is that the advanced techniques used in state-of-the-art solvers go far beyond what formal verification can currently handle. And even a fully verified solver cannot deal with the problem of incorrect results arising from hardware failures, which are unavoidable in large-scale computations.

Thus, the state of the art when it comes to verifiably correct automated reasoning is that this is a well-recognized problem that has remained without convincing solutions.

The Main Focus of This Seminar

This seminar focused on what currently appears to be the most promising way to eliminate errors in automated reasoning algorithms, namely proof logging. This means turning solvers into certifying algorithms in the sense of [1, 12] by having them output not only an answer but also a simple, machine-verifiable proof that this answer is correct. With such a solver, the workflow becomes as follows (see also Figure 1):

Run the solver on a problem to obtain an answer together with a proof.
Feed the problem, the answer, and the proof to a special computer program, called a proof checker.
Accept the answer if the proof checker says that the proof is valid.

Figure 1 Schematic workflow for solver with proof logging.

For this to be feasible, the proof format needs to be sufficiently powerful, so that the solver can generate concise proofs even for sophisticated reasoning without incurring any serious overhead in running time. But the proof format should also be very simple, so that checking correctness becomes almost trivial -- the point is that the proof checker, in contrast to the solver, should be so easy to code up that we can be confident that it is correct. Clearly, there is a conflict between expressivity and simplicity here. Perhaps asking for both at the same time is a little bit too good to be true? This tension between succinct proofs and easy verification goes to the heart of proof logging, and discussions of different ways of managing this trade-off was one of the key topics of the seminar.

One example of an approach that has so far been found to be unsatisfactory are methods in constraint programming using explanations [13, 15, 5], which essentially boils down to writing out reasons for solver conclusions trusting that these reasons are correct. The problem is that this means that proofs are so expressive that they cannot be efficiently verified by simple proof checking algorithms.

A much more successful approach is the DRAT proof system [9, 8, 16], which has become standard in SAT solving. This proof system is simple enough that the proof checker can even be implemented as a formally verified piece of software [4, 11], meaning that the full power of formal methods can be harnessed to guarantee correctness of the result produced by the solving algorithm. The crucial change of perspective making this possible is that the guarantee is not that the algorithm is correct, but that the answer found by the algorithm is. And, in some sense, this even goes further than formally verified software in two important ways. Firstly, formally verified proof checking makes it possible to detect errors even if they are not due to faults in the solver but are caused by a buggy compiler, faulty hardware, or even cosmic rays during solver execution. Secondly, formally verified proof checking can allow us to fully trust the results even from buggy solvers. If the proof generated by a concrete algorithm execution checks out, then we can be fully confident that this particular computation reached the correct result, regardless of what bugs might be triggered for other inputs. It seems fair to argue that for reasons like this, SAT proof logging is perhaps the most successful showcase of certifying algorithms for computationally challenging problems to date, and for this reason it was natural to survey this area and discuss how similar advances could be made in other areas of automated reasoning.

However, when one tries to extend proof logging to stronger optimization paradigms such as maximum satisfiability (MaxSAT), pseudo-Boolean optimization, CP, and MIP, or other areas of automated reasoning such as automated planning and SMT solving, the conflict between expressivity and simplicity reasserts itself. The clean and efficient reasoning in terms of disjunctive clauses used in DRAT seems poorly suited to capture reasoning about more complex objects like constraint programming propagators. Some of the other reasoning performed in more advanced solvers also seems hard to express in terms of the disjunctive clauses used in DRAT, and this limitation also makes it hard argue about, e.g., values of objective functions in optimization problems. At a high-level, the reason that this is a highly nontrivial challenge is that the stronger the solving techniques used, the harder it becomes to design simple proof logging methods that can efficiently certify the correctness of these techniques. At the same time, the fact that CP, MIP, and SMT solvers are so fiendishly complex only makes the need for proof logging methods in these settings even more urgent.

In SMT solving, the most popular approach to date seems to be to design very expressive proof systems that can capture all the different theories considered and their combinations. One downside with this that the proof systems become extremely complex, meaning that not only does the proof checking algorithm have to be very involved, but it is even a highly nontrivial task to even decide whether the proof system itself is consistent. Another direction, which has recently been pursued in the context of CP proof logging, is to compile all information about the input problem down to a simpler format in a trusted (or formally verified) way, and then mirror the reasoning performed in the solver by a proof in this simpler format. During the seminar, different subcommunities in automated reasoning were able to exchange experiences and best practices for these and other proof logging approaches for automated reasoning paradigms beyond SAT.

Further Topics Discussed During the Seminar

While the initial motivation for proof logging techniques is that they provide a way of ensuring the validity of outputs from complex algorithms, discussions at the seminar ranged over a number of topics that went beyond just providing formal certificates of correctness for answers produced by automated reasoning algorithms. Several participants of the seminar highlighted that proof logging can also be used as a tool for debugging during software development. Bugs that only very rarely affect the final result can be next to impossible to discover, but with proof logging switched on, it is easy to detect that the algorithm is performing unsound reasoning even when the output happens to be correct (as shown in, e.g., [6, 7, 10, 3]). It is also worth noting that it simplifies test case generation during debugging. There is no need to know what the correct output is, and instead testing can be done by checking the proof log.

Designing proof logging for a concrete algorithm typically involves describing in a formal proof system how the algorithm works, so that different reasoning steps can be written down as rule applications in the proof system. This type of analysis can also be quite helpful for algorithm design in that it can identify limitations in the current implementation of an algorithm and uncover potential for further improvements (if the proof system suggest that more powerful reasoning steps could be applied than what the algorithm actually does). Furthermore, since proof logging allows us to “peek inside” the algorithm, as it were, to get detailed information about what reasoning steps were performed, this provides a new tool for in-depth, scientifically rigorous, performance analysis.

Going further, it can be noted that proof logging by its very nature enables auditability, since once an algorithm execution has finished we can save the problem, answer, and proof for posterity so that it can be verified at any time by a third party, even if this third party has no access to the original algorithm used to solve the problem Also, the fact that a proof for an optimization problem shows in a formal, mathematical, sense why a solution is correct, and/or why no better solution is possible, means that proof logging can serve as a stepping stone towards explainability, which is a topic of growing importance in the context of artificial intelligence (AI). These and other aspects were discussed in presentation by participants and also during a dedicated panel discussion towards the end of the seminar week.

The topics outlined above are well-represented in the research pursued by the researchers who participated in the seminar. In the SAT community, Armin Biere, Katalin Fazekas, Mathias Fleury, Marijn Heule, Adrián Rebola-Pardo and others have developed proof systems based on DRAT and extensions. Efficient and formally verified proof checkers for such proof systems have been built by Magnus Myreen and Yong Kiam Tan. Proof logging techniques for combinatorial optimization paradigms beyond SAT have been successfully pursued by Jeremias Berg, Bart Bogaerts, Wietze Koops, Ciaran McCreesh, Matthew McIlree, Jakob Nordström, Andy Oertel, and their collaborators. For mixed integer programming, Ambros Gleixner has done important exploratory work on proof logging for perhaps the most wellknown open-source MIP solver SCIP together with collaborators. Certification of SMT solvers has received long-running attention, including work on Z3 by Nikolaj Bjørner and cvc5 by Haniel Barbosa, Bruno Dutertre, Hanna Lachnitt, and Andrew Reynolds together with collaborators. Many of these researchers gave presentations of their work where they also identified important future challenges.

In the context of automated theorem proving (ATP) for first-order logic, there are natural connections between formats for certifying deductions of ATP systems on the one hand and proof logging and model verification on the other hand, but also formidable technical (and organizational) obstacles to wider adoption of such techniques. During the seminar, Geoff Sutcliffe and Michael Rawson presented new results on ATP proof logging and checking. The participants also provided coverage of research on algebraic algorithms (Daniela Kaufmann), quantified Boolean formula solving (Martina Seidl), automated planning (Malte Helmert and Tanja Schindler), hardware verification (Dirk Beyer and Randal Bryant), hybrid systems (Erika Ábrahám), and several other topics related to automated reasoning.

Outcomes

The seminar aimed to advance the state of the art in the integration of proof logging with symbolic solvers, and to establish deeper contacts between different research communities working on certifying algorithms where interaction has previously been quite limited or non-existent. The intention was to achieve this broad goal by assembling stakeholders in Boolean satisfiability (SAT) solving, constraint programming (CP), Mixed integer programming (MIP), satisfiability modulo theories (SMT) solving, automated theorem proving (ATP), and other closely related communities, including leading researchers in the areas of solver development, deployment of solver tools in applications, and design of proof logging techniques.

Concretely, the seminar aimed to:

Connect automated-reasoning experts from the different domains around proof logging techniques.
Infuse the communities with new insights into the practical integration of proof logging and methods to develop formally verified proof checkers.
Facilitate technology transfer between different research areas in automated reasoning, in particular, concerning techniques for certifying correctness.

Going by the evaluations, the seminar was very successful in reaching these goals. It is our hope that this seminar will turn out to be only the first in a series of seminars dedicated to the important topic of certifying algorithms for automated reasoning. In the longer perspective, our vision is that such a series of seminars would contribute to a fundamental shift in how the computer science community thinks about algorithms, so that in the future algorithms will be expected to not just produce output but to prove that this output is in fact correct.

Seminar Structure

The scientific program of the seminar consisted of 30~presentations. Among these there were eleven 50-minute surveys of different core topics of the seminar. These talks occupied most of the morning schedule Monday-Wednesday, and were intended to make sure that the diverse audience would have a bit of a common background for the more technical talks reporting on recent progress and/or ongoing research. The list of survey talks and speakers were as follows:

Certified SAT solving (Katalin Fazekas)
Certified subgraph solving (Ciaran McCreesh)
Certified constraint programming (Matthew McIlree)
Proof logging for algebraic algorithms (Daniela Kaufmann)
Proof logging for MIP (Ambros Gleixner)
Certified automated planning (Malte Helmert)
Certified SMT solving (Haniel Barbosa)
Certified model counting and knowledge compilation (Randal Bryant)
Certified QBF solving (Martina Seidl)
Certified first-order theorem proving (Michael Rawson)
Formally verified proof checking (Magnus O. Myreen & Yong Kiam Tan)

The rest of the talks were 25-minute presentations on recent research of the participants. The time after lunch each day was left for self-organized collaborations and discussions, and there was no schedule on Wednesday afternoon.

Based on polling of participants during the seminar, it was decided to have a panel discussion on Thursday afternoon. The poll also asked whether an open-problem session should be organized, but the support for this idea was weaker, and several participants emphasized that the seminar program should not be too dense and that the evenings should be left free of any program. Therefore, the organizers decided not to have an open-problem session.

References

Eyad Alkassar, Sascha Böhme, Kurt Mehlhorn, Christine Rizkallah, and Pascal Schweitzer. An introduction to certifying algorithms. it - Information Technology Methoden und innovative Anwendungen der Informatik und Informationstechnik, 53(6):287–293, December 2011.
Haniel Barbosa, Clark W. Barrett, Byron Cook, Bruno Dutertre, Gereon Kremer, Hanna Lachnitt, Aina Niemetz, Andres Nötzli, Alex Ozdemir, Mathias Preiner, Andrew Reynolds, Cesare Tinelli, and Yoni Zohar. Generating and exploiting automated reasoning proof certificates. Commun. ACM, 66(10):86–95, 2023.
Jeremias Berg, Bart Bogaerts, Jakob Nordström, Andy Oertel, and Dieter Vandesande. Certified core-guided MaxSAT solving. In Proceedings of the 29th International Conference on Automated Deduction (CADE-29), volume 14132 of Lecture Notes in Computer Science, pages 1–22. Springer, July 2023.
Luís Cruz-Filipe, Marijn J. H. Heule, Warren A. Hunt Jr., Matt Kaufmann, and Peter Schneider-Kamp. Efficient certified RAT verification. In Proceedings of the 26th International Conference on Automated Deduction (CADE-26), volume 10395 of Lecture Notes in Computer Science, pages 220–236. Springer, August 2017.
Nicholas Downing, Thibaut Feydy, and Peter J. Stuckey. Explaining alldifferent. In Proceedings of the 35th Australasian Computer Science Conference (ACSC ’12), pages 115–124, January 2012.
Leon Eifler and Ambros Gleixner. A computational status update for exact rational mixed integer programming. In Proceedings of the 22nd International Conference on Integer Programming and Combinatorial Optimization (IPCO ’21), volume 12707 of Lecture Notes in Computer Science, pages 163–177. Springer, May 2021.
Stephan Gocht, Ross McBride, Ciaran McCreesh, Jakob Nordström, Patrick Prosser, and James Trimble. Certifying solvers for clique and maximum common (connected) subgraph problems. In Proceedings of the 26th International Conference on Principles and Practice of Constraint Programming (CP ’20), volume 12333 of Lecture Notes in Computer Science, pages 338–357. Springer, September 2020.
Marijn J. H. Heule, Warren A. Hunt Jr., and Nathan Wetzler. Trimming while checking clausal proofs. In Proceedings of the 13th International Conference on Formal Methods in Computer-Aided Design (FMCAD ’13), pages 181–188, October 2013.
Marijn J. H. Heule, Warren A. Hunt Jr., and Nathan Wetzler. Verifying refutations with extended resolution. In Proceedings of the 24th International Conference on Automated Deduction (CADE-24), volume 7898 of Lecture Notes in Computer Science, pages 345–359. Springer, June 2013.
Sonja Kraiczy and Ciaran McCreesh. Solving graph homomorphism and subgraph isomorphism problems faster through clique neighbourhood constraints. In Proceedings of the 30th International Joint Conference on Artificial Intelligence (IJCAI ’21), pages 1396–1402, August 2021.
Peter Lammich. Efficient verified (UN)SAT certificate checking. Journal of Automated Reasoning, 64(3):513–532, March 2020. Extended version of paper in CADE 2017.
Ross M. McConnell, Kurt Mehlhorn, Stefan Näher, and Pascal Schweitzer. Certifying algorithms. Computer Science Review, 5(2):119–161, May 2011.
Olga Ohrimenko, Peter J. Stuckey, and Michael Codish. Propagation via lazy clause generation. Constraints, 14(3):357–391, January 2009.
Alexander Steen, Geoff Sutcliffe, Pascal Fontaine, and Jack McKeown. Representation, verification, and visualization of Tarskian interpretations for typed first-order logic. In LPAR 2023: Proceedings of 24th International Conference on Logic for Programming, Artificial Intelligence and Reasoning, volume 94 of EPiC Series in Computing, pages 369–385. EasyChair, June 2023.
Michael Veksler and Ofer Strichman. A proof-producing CSP solver. In Proceedings of the 24th AAAI Conference on Artificial Intelligence (AAAI ’10), pages 204–209, July 2010.
Nathan Wetzler, Marijn J. H. Heule, and Warren A. Hunt Jr. DRAT-trim: Efficient checking and trimming using expressive clausal proofs. In Proceedings of the 17th International Conference on Theory and Applications of Satisfiability Testing (SAT ’14), volume 8561 of Lecture Notes in Computer Science, pages 422–429. Springer, July 2014.

Creative Commons BY 4.0

Nikolaj S. Bjørner, Marijn J. H. Heule, Daniela Kaufmann, and Jakob Nordström

Motivation

Show Motivation

The Dagstuhl Seminar aims to advance the state of the art in the integration of proof logging with symbolic solvers, and to establish deeper contacts between different research communities working on certifying algorithms where interaction has previously been quite limited or non-existent. The intention is to achieve this broad goal by assembling stakeholders in the SAT, CP, MIP, SMT, ATP, and other closely related communities, including leading researchers in the areas of solver development, deployment of solver tools in applications, and design of proof logging techniques. Concretely, the seminar should:

Connect automated-reasoning experts from the different domains around proof-logging techniques.
Infuse the communities with new insights into the practical integration of proof logging and methods to develop formally verified proof checkers.
Facilitate technology transfer between different research areas in automated reasoning; in particular, concerning techniques for certifying correctness.

A backdrop for the seminar is the wide adoption of automated reasoning within the past decades for developing formally verified software and in the context of combinatorial optimization. The foundation is built on automated deduction algorithms that are used for determining the satisfiability of propositional logic (SAT), first-order logic (ATP), and arithmetic formulas (MIP), constraint programming (CP) and satisfiability modulo theories (SMT). Certification of automated reasoning systems, whether for verification or optimization, is vital to form a trusted base for correctness of safety-critical systems such as control systems for trains and airplanes and to ensure guaranteed optimal answers where the cost of low-quality answers can carry a high monetary or societal impact.

Algorithms used in symbolic solvers are often stunningly powerful in practice and are today used routinely to solve large-scale real-world problems in a wide range of application areas. But the “dirty little secret” is that they are sometimes wrong. It is well documented that even the best CP and MIP solvers sometimes return “solutions” that do not satisfy the constraints, or erroneously claim optimality, and that verification tools can erroneously claim a set of constraints is infeasible when, on the contrary, it has a solution. Also, in more complex scenarios, where solvers are used to solve subproblems, even seemingly innocuous off-by-one mistakes can snowball into huge overall errors.

The goal of the seminar is to contribute to a fundamental shift in how the computer science community thinks about algorithms, so that in the future algorithms will be expected to not just produce an output but to prove that this output is in fact correct.

Creative Commons BY 4.0

Nikolaj S. Bjørner, Marijn J. H. Heule, Daniela Kaufmann, and Jakob Nordström

Participants

Show Participants

Erika Abraham (RWTH Aachen University, DE) [dblp]
Markus Anders (RPTU Kaiserslautern-Landau, DE) [dblp]
Franz Baader (TU Dresden, DE) [dblp]
Haniel Barbosa (Federal University of Minas Gerais-Belo Horizonte, BR) [dblp]
Jeremias Berg (University of Helsinki, FI) [dblp]
Dirk Beyer (LMU München, DE) [dblp]
Armin Biere (Universität Freiburg, DE) [dblp]
Nikolaj S. Bjørner (Microsoft - Redmond, US) [dblp]
Bart Bogaerts (KU Leuven, BE) [dblp]
Benjamin Bogø (University of Copenhagen, DK)
Randal E. Bryant (Carnegie Mellon University - Pittsburgh, US) [dblp]
Sam Buss (University of California - San Diego, US) [dblp]
Simon Dold (Universität Basel, CH)
Bruno Dutertre (Amazon Web Services - Santa Clara, US) [dblp]
Katalin Fazekas (TU Wien, AT) [dblp]
Mathias Fleury (Universität Freiburg, DE) [dblp]
Ambros Gleixner (HTW - Berlin, DE) [dblp]
Arie Gurfinkel (University of Waterloo, CA) [dblp]
Malte Helmert (Universität Basel, CH) [dblp]
Marijn J. H. Heule (Carnegie Mellon University - Pittsburgh, US) [dblp]
Mikoláš Janota (Czech Technical University - Prague, CZ) [dblp]
Matti Järvisalo (University of Helsinki, FI) [dblp]
Daniela Kaufmann (TU Wien, AT) [dblp]
Wietze Koops (Lund University, SE & University of Copenhagen, DK)
Konstantin Korovin (University of Manchester, GB) [dblp]
Hanna Lachnitt (Stanford University, US) [dblp]
Ciaran McCreesh (University of Glasgow, GB) [dblp]
Matthew McIlree (University of Glasgow, GB)
Magnus Myreen (Chalmers University of Technology - Göteborg, SE) [dblp]
Jakob Nordström (University of Copenhagen, DK & Lund University, SE) [dblp]
Andy Oertel (Lund University, SE) [dblp]
Albert Oliveras (UPC Barcelona Tech, ES) [dblp]
Michael Rawson (University of Southampton, GB) [dblp]
Joseph Reeves (Carnegie Mellon University - Pittsburgh, US) [dblp]
Andrew Reynolds (University of Iowa - Iowa City, US) [dblp]
Tanja Schindler (Universität Basel, CH) [dblp]
Martina Seidl (Johannes Kepler Universität Linz, AT) [dblp]
Monika Seisenberger (Swansea University, GB) [dblp]
Mate Soos (Ethereum - Berlin, DE) [dblp]
Geoff Sutcliffe (University of Miami, US) [dblp]
Stefan Szeider (TU Wien, AT) [dblp]
Yong Kiam Tan (Institute for Infocomm Research (I2R), A*STAR, Singapore, SG & Nanyang Technological University, Singapore, SG) [dblp]
Dieter Vandesande (VU - Brussels, BE)
Christoph Weidenbach (MPI für Informatik - Saarbrücken, DE) [dblp]

Classification

Artificial Intelligence
Data Structures and Algorithms
Logic in Computer Science

Keywords

Automated reasoning
Combinatorial optimization
Certifying algorithms
Proof logging
Verified computing

Seminar 25231

Search the Dagstuhl Website

Schloss Dagstuhl Services

Seminars

Within this website:

External resources:

Publishing

Within this website:

External resources:

dblp

Within this website:

External resources:

Dagstuhl Seminar 25231

Certifying Algorithms for Automated Reasoning

( Jun 01 – Jun 06, 2025 )

Permalink

Organizers

Contact

Publications

Impacts

Schedule

Summary

The Main Focus of This Seminar

Further Topics Discussed During the Seminar

Outcomes

Seminar Structure

References

Motivation

Participants

Classification

Keywords