Search the Dagstuhl Website
Looking for information on the websites of the individual seminars? - Then please:
Not found what you are looking for? - Some of our services have separate websites, each with its own search option. Please check the following list:
Schloss Dagstuhl - LZI - Logo
Schloss Dagstuhl Services
Within this website:
External resources:
  • DOOR (for registering your stay at Dagstuhl)
  • DOSA (for proposing future Dagstuhl Seminars or Dagstuhl Perspectives Workshops)
Within this website:
External resources:
Within this website:
External resources:
  • the dblp Computer Science Bibliography

Dagstuhl Seminar 23251

Challenges in Benchmarking Optimization Heuristics

( Jun 18 – Jun 23, 2023 )

(Click in the middle of the image to enlarge)

Please use the following short url to reference this page:






The overall objective of the seminar was to explore the possibilities of defining how one can ensure that benchmarking is used to fundamentally advance the field of computational heuristics.

More often than not, in current practice, benchmarks are used to suit the needs of specific authors. That means that benchmark problems, including specific settings for benchmarks - such as how long to run the heuristics or what target performance(s) to achieve - are cherry-picked by authors to make the tested algorithm look good. Moreover, the algorithms used for comparison are often cherry-picked too, and are not always considered state-of-the-art in the field. On the outskirts of our fields, as a consequence, one finds a proliferation of algorithms that have little basis in assumptions on problem structure that may be exploited but rather are vaguely based on biological or physical phenomena.

While benchmarking cannot stop this from happening, it can contribute to what is considered good practice at the core of the field, creating a stable basis for algorithmic advances and ensuring sensible comparisons.

Seminar Structure

The organization of the seminar consisted of a mix of talks based on proposals from participants, discussions organized along breakout sessions, together with presentations that were encouraged by the organizers of the seminar in order to define common grounds among participants from different research fields.


In the various breakout sessions, ways to ensure good practices - in both theory and practice - were discussed for different (types of) problems that one often encounters. For some scenarios, such as the classic single-objective, non-expensive optimization case, advanced discussions led to definitions of ground rules for experimental studies on what is sometimes called ``horse racing'' algorithms. In other scenarios, where arguably comparisons are more difficult, such as multi-objective expensive optimization, discussions were more exploratory, yet key takeaways were formulated to be expanded upon in the future.

Related to this, in various talks, the related concept of landscape analysis has been discussed, potentially providing insights into why certain algorithms work well on certain problems, another hallmark of what we try to achieve through benchmarking. Whereas many of the former aspects are related more to the engineering aspect of algorithmic design, these aspects are more closely related to the scientific aspect of algorithmic design, increasing our understanding of what can and cannot be computed in a certain amount of time. On both sides of the coin, advances were made during the seminar, and bridges were built.

Overall, the seminar brought people closer together, advancing efforts on benchmarking from the fundamental (how) and the importance (why) perspective. The audience's interdisciplinary nature helped define the palette of problems to create benchmarks for and understand different views on the same problem. From the various sessions, it became clear that there are lessons learned already that can inform the future creators of benchmarks to ensure that new benchmarks have added value and help to truly advance the field of optimization heuristics.

Copyright Pascal Kerschke, Anne Auger, Peter A. N. Bosman, and Darrell Whitley


Benchmarking, i.e., determining how well and/or fast algorithms solve particular classes of problems, is a cornerstone of research on optimization algorithms. This is crucial in helping practitioners choose the best algorithm for a real-world application. Although these applications (e.g., from energy, healthcare, or logistics) pose many challenging optimization problems, algorithmic design research typically evaluates performance on sets of artificial problems. Such problems are often designed from a theoretical perspective to study specific properties of algorithm and problem classes. However, some benchmarks lack connections to real-world problems. Consequently, extrapolating meaningful implications for the performance of these algorithms to real-world applications can be difficult. It is therefore critical to ensure that well-known, easily-accessible and frequently-used benchmarks are unbiased (w.r.t. structural properties), but also to ensure that benchmarks are aligned with real-world applications.

On the other hand, literally thousands of papers are published each year making some claim of having developed a better optimization tool for some application domain. And yet, the overwhelming majority of these enhancements are never adequately tested. Indeed, in many cases, it is highly likely that the only individuals who will execute and evaluate a new algorithm are the authors who created it. This is a source of inefficiency when it comes to how research is executed and evaluated in the field of optimization. This problem can only be resolved by using proper benchmarking.

In this context, the present seminar is centered on benchmarking. It brings together a selected list of international experts with different backgrounds and from various application areas (e.g., computer science, machine learning, engineering, statistics, mathematics, operations research, medicine, as well as industrial applications) with the overall objective of

  1. analyzing, comparing and improving the general understanding of the status quo and caveats of benchmarking in different subdomains of optimization and related research fields, and
  2. developing principles for improving our current benchmarks, as well as for designing new, real-world relevant, benchmark problems.

The seminar proposes to address challenges that the current state of benchmarking still faces throughout various areas of optimization-related research. The following (non-exhaustive) list of topics will be at the core of the seminar:

  • the integration and/or inclusion of (ideally, scalable) real-world problems,
  • a benchmark’s capability to expose the generality vs. specificity trade-off of optimization heuristics,
  • approaches to measure and demonstrate algorithmic performance,
  • ways to measure problem characteristics by means of (problem-specific) features, and
  • novel visualization methods, which improve our understanding of the optimization problem itself and/or the behavior of the optimization heuristic that is operating on the problem.

In order to achieve a fruitful and productive collaboration of all participants, this Dagstuhl Seminar will follow a very interactive format (complemented by a very limited number of invited presentations). The main focus will be on spotlight discussions and breakout sessions, while providing sufficient time for informal discussions.

Copyright Anne Auger, Peter A. N. Bosman, Pascal Kerschke, and Darrell Whitley

  • David L. Applegate (Google - New York, US)
  • Anne Auger (INRIA Saclay - Palaiseau, FR) [dblp]
  • Thomas Bäck (Leiden University, NL) [dblp]
  • Carolin Benjamins (Leibniz Universität Hannover, DE)
  • Peter A. N. Bosman (CWI - Amsterdam, NL) [dblp]
  • Carola Doerr (Sorbonne University - Paris, FR) [dblp]
  • Katharina Eggensperger (Universität Tübingen, DE) [dblp]
  • Tobias Glasmachers (Ruhr-Universität Bochum, DE) [dblp]
  • Nikolaus Hansen (INRIA Saclay - Palaiseau, FR) [dblp]
  • Emma Hart (Edinburgh Napier University, GB) [dblp]
  • Holger H. Hoos (RWTH Aachen, DE) [dblp]
  • Pascal Kerschke (TU Dresden, DE) [dblp]
  • Lars Kotthoff (University of Wyoming - Laramie, US) [dblp]
  • Manuel López-Ibáñez (University of Manchester, GB) [dblp]
  • Mariapia Marchi (ESTECO SpA - Trieste, IT) [dblp]
  • Olaf Mersmann (TH Köln, DE) [dblp]
  • Boris Naujoks (TH Köln, DE) [dblp]
  • Gabriela Ochoa (University of Stirling, GB) [dblp]
  • Gorjan Popovski (Jozef Stefan Institute - Ljubljana, SI)
  • Mike Preuß (Leiden University, NL) [dblp]
  • Lennart Schäpermeier (TU Dresden, DE)
  • Ofer M. Shir (Tel-Hai College - Upper Galilee, IL) [dblp]
  • Kate Smith-Miles (The University of Melbourne, AU) [dblp]
  • Olivier Teytaud (Meta AI Research - Tournon-sur-Rhone, FR) [dblp]
  • Dirk Thierens (Utrecht University, NL) [dblp]
  • Tea Tusar (Jozef Stefan Institute - Ljubljana, SI) [dblp]
  • Konstantinos Varelas (Athens, GR) [dblp]
  • Sebastien Verel (Calais University, FR) [dblp]
  • Diederick Vermetten (Leiden University, NL) [dblp]
  • Vanessa Volz ( - Copenhagen, DK) [dblp]
  • Hao Wang (Leiden University, NL) [dblp]
  • Darrell Whitley (Colorado State University - Fort Collins, US) [dblp]
  • Stefan Wild (Lawrence Berkeley National Laboratory, US) [dblp]

  • Data Structures and Algorithms
  • Neural and Evolutionary Computing
  • Performance

  • benchmarking
  • optimization
  • real-world applications
  • design of search heuristics
  • understanding problem complexity