Dagstuhl Seminar 25181
Learned Predictions for Data Structures and Running Time
(Apr 27 – May 02, 2025)
Organizers
- Inge Li Gørtz (Technical University of Denmark - Lyngby, DK)
- Benjamin J. Moseley (Carnegie Mellon University - Pittsburgh, US)
- Shikha Singh (Williams College - Williamstown, US)
- Sergei Vassilvitskii (Google - New York, US)
Contact
- Andreas Dolzmann (for scientific matters)
- Susanne Bach-Bernhard (for administrative matters)
Shared Documents
- Dagstuhl Materials Page (Use personal credentials as created in DOOR to log in)
Summary
Traditionally, the performance of algorithms and data structures is measured via worst-case analysis of their cost, e.g., running time, approximation ratio, or memory usage. Algorithms designed in the worst-case model are optimized to perform well on all instances, even carefully contrived adversarial ones. While the worst-case model yields extremely strong guarantees, it can be a pessimistic predictor of an algorithm's performance on the instances typically seen in practice. In particular, the theoretically best worst-case algorithm is frequently not the fastest algorithm in experimental evaluations. An emerging line of work, called algorithms with predictions or learning-augmented algorithms, uses machine learning to design improved algorithms that circumvent worst-case lower bounds. The majority of prior work in this area has focused on improving the quality of the solution (that is, the competitive ratio) of optimization algorithms. How to leverage learned predictions to speed up the running time of optimization algorithms and data structures has received less attention.
Seminar Overview
The goal of this Dagstuhl Seminar was to develop the theoretical foundations of using predictions for faster data structures and optimization. It brought together researchers from different but complementary areas: data structures, combinatorial optimization, and learning theory.
The initial days of the seminar featured a small number of survey-style talks (45 minutes each) on the key areas of learning-augmented and worst-case data structures, machine learning for NP-hard problems, and learning-augmented streaming algorithms. The rest of the schedule consisted of shorter technical presentations, unstructured collaboration time, open-problem working groups, and group discussions. Overall, the seminar succeeded in bridging the gap between the different communities and in creating new technical collaborations among the participants.
Technical Themes
Overall, the seminar converged on the following key themes.
Beyond-Worst-Case Data Structures. A major theme was how predictions can be used to improve the running time of fundamental data structures such as sorted arrays, priority queues, and search trees. The talks covered several main techniques: using prediction decomposition to design learned data structures, the surprising success of history independence in improving worst-case data structure performance, and leveraging compression, sketching, and adaptation to workload patterns for faster practical performance.
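To give a flavor of such results, here is a minimal hypothetical sketch in Python (an illustration of the general idea, not code from any seminar talk) of prediction-guided search in a sorted array. The query starts at a position supplied by a learned model and expands outward by doubling steps, so a lookup costs O(log η) comparisons, where η is the absolute error of the predicted position: a perfect prediction gives constant-time lookups, and an arbitrarily bad one degrades gracefully to the usual O(log n).

```python
import bisect

def predicted_search(arr, target, predicted_idx):
    """Look up `target` in sorted list `arr`, starting from a predicted index.

    An exponential (doubling) scan outward from the prediction locates a
    window guaranteed to contain the target's position, and a binary search
    finishes inside it.  If the prediction is off by eta positions, the
    total cost is O(log eta) comparisons instead of O(log n).
    """
    n = len(arr)
    if n == 0:
        return -1
    i = max(0, min(predicted_idx, n - 1))
    step = 1
    if arr[i] <= target:
        lo = i                                  # invariant: arr[lo] <= target
        while i + step < n and arr[i + step] <= target:
            lo = i + step
            step *= 2
        hi = min(i + step, n - 1)
    else:
        hi = i                                  # invariant: arr[hi] > target
        while i - step >= 0 and arr[i - step] > target:
            hi = i - step
            step *= 2
        lo = max(i - step, 0)
    # Binary search inside the O(eta)-sized window [lo, hi].
    j = bisect.bisect_left(arr, target, lo, hi + 1)
    return j if j < n and arr[j] == target else -1
```

The decomposition visible here, a cheap learned starting point combined with a worst-case fallback, is one generic way such predictions can be made robust.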
Warm-starting offline optimization. Traditional optimization algorithms assume that problems are solved from scratch each time. However, algorithms are often run on different, correlated data sets over time, and these prior runs encode information that can be used to improve future run times. An intuitive approach to speeding up an algorithm is to start it from a previously computed solution, referred to as a warm start. Warm starts are commonly used by heuristics to improve empirical performance. The seminar focused on several recent results that place warm-start heuristics on a theoretical foundation within the algorithms-with-predictions model.
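As a minimal illustration of the warm-start idea (a hypothetical Python sketch of a standard technique, not a specific result presented at the seminar), the code below warm-starts Kuhn's augmenting-path algorithm for unweighted bipartite matching from a predicted matching. Predicted edges that are still valid are kept, and one augmenting-path search is spent only on each left vertex the prediction failed to match, so the extra work scales with the prediction's error rather than with the size of an optimal matching.

```python
def warm_start_matching(n_left, n_right, adj, predicted):
    """Maximum bipartite matching, warm-started from a predicted matching.

    adj[u] lists the right-side neighbors of left vertex u; `predicted`
    maps left vertices to predicted right-side partners.  Returns match_l,
    where match_l[u] is u's partner (or -1 if unmatched).
    """
    match_l = [-1] * n_left
    match_r = [-1] * n_right

    # Repair step: keep each predicted edge that exists in the current
    # graph and does not conflict with an edge kept earlier.
    for u, v in predicted.items():
        if 0 <= v < n_right and v in adj[u] and match_l[u] == -1 and match_r[v] == -1:
            match_l[u], match_r[v] = v, u

    def try_augment(u, seen):
        # Standard Kuhn-style search for an augmenting path from u.
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                if match_r[v] == -1 or try_augment(match_r[v], seen):
                    match_l[u], match_r[v] = v, u
                    return True
        return False

    # One O(m) augmenting-path search per left vertex the prediction
    # failed to match, so the extra work scales with the prediction error.
    for u in range(n_left):
        if match_l[u] == -1:
            try_augment(u, set())
    return match_l

# Example: the prediction already covers vertices 0 and 1; only vertex 2
# triggers an augmenting-path search.
adj = [[0, 1], [1], [1, 2]]
print(warm_start_matching(3, 3, adj, {0: 0, 1: 1}))  # -> [0, 1, 2]
```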
Dynamic and streaming algorithms with predictions. Predictions can be especially useful in dynamic and streaming algorithms, where they help cope with the uncertainty about future inputs. The seminar focused on how predictions of future updates can be used to efficiently "lift" fully offline algorithms to the online setting. This technique proved especially effective for dynamic graph problems, where existing worst-case algorithms have slow update times (polynomial in the input size). Similarly, in the streaming setting, predictions about future updates have enabled efficient sketching algorithms. Finally, the seminar also covered how dynamic algebraic algorithms have leveraged predictions to bypass worst-case lower bounds.
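One well-known flavor of prediction-augmented sketching is sketched below in simplified, hypothetical Python (the predictor `is_heavy` is an assumed input, and this is not code from any talk): items that a learned classifier predicts to be heavy hitters are routed to exact counters, while the long tail shares a standard count-min sketch. Keeping the heavy items out of the sketch removes most of the collision mass, so the light items are estimated far more accurately.

```python
import random

class LearnedCountMin:
    """Count-min sketch augmented with a learned heavy-hitter predictor.

    `is_heavy` is an assumed, possibly imperfect classifier (item -> bool).
    Predicted-heavy items get exact counters; everything else shares the
    sketch, so light items suffer far less collision error.
    """

    def __init__(self, width, depth, is_heavy, seed=0):
        rng = random.Random(seed)
        self.width, self.depth = width, depth
        self.is_heavy = is_heavy
        self.salts = [rng.getrandbits(64) for _ in range(depth)]
        self.table = [[0] * width for _ in range(depth)]
        self.exact = {}          # exact counts for predicted-heavy items

    def _cell(self, row, item):
        return hash((self.salts[row], item)) % self.width

    def update(self, item, count=1):
        if self.is_heavy(item):
            self.exact[item] = self.exact.get(item, 0) + count
        else:
            for r in range(self.depth):
                self.table[r][self._cell(r, item)] += count

    def estimate(self, item):
        if item in self.exact:
            return self.exact[item]            # exact answer for heavy items
        # Usual count-min guarantee: an overestimate, minimized over rows.
        return min(self.table[r][self._cell(r, item)] for r in range(self.depth))
```

A wrong prediction is harmless here: an item misclassified as light simply falls back to the sketch's usual overestimate guarantee, so robustness is preserved while a good predictor sharply improves accuracy on the tail.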
Learned combinatorial optimization. Finally, the seminar focused on how correlations in input distributions can be exploited to design more efficient data-driven algorithms for optimization problems. These techniques have been applied in a variety of domains, such as configuring integer programming algorithms, using heat maps and neural network techniques to improve algorithms for NP-hard problems such as the TSP, and using predicted distributions for portfolio optimization.
Takeaways and Conclusion. We believe the Dagstuhl Seminar was successful in meeting our expectations as well as the participants'. The post-seminar survey was highly positive: attendees rated the scientific quality of the seminar very highly (median 10/11) and said that it inspired new research ideas, interdisciplinary insights, and potential collaborations, particularly due to its successful integration of the communities working on predictions and on data structures.
In the future, we would like to include more PhD students and to start working-group activities earlier in the week. Overall, we were very happy with the seminar. We thank the entire Schloss Dagstuhl team for making the seminar possible and for ensuring that everything, from initial planning to logistics and meals, was impeccably smooth.
Inge Li Gørtz, Benjamin J. Moseley, Shikha Singh, and Sergei Vassilvitskii
Motivation
A long-standing question in theoretical computer science is how to offer analysis tools for improving performance on typical everyday, non-worst-case instances. A recent model addressing this goal is the algorithms-with-predictions model: each problem instance comes with a possibly error-prone prediction, and the worst-case running time is given as a function of the error in that prediction. This model reflects how recent advances in machine learning make it possible to obtain reasonably good predictions on practical, even very complex, datasets.
The algorithms-with-predictions model has been used to give strong approximation-ratio guarantees for fundamental online problems such as scheduling and caching. How to leverage predictions to speed up the running time of offline algorithms and data structures has received less attention.
This Dagstuhl Seminar aims to bring together researchers from the data structures, combinatorial optimization, and learned-predictions communities to address the challenges of adopting learned predictions for improving running-time guarantees. Ideal outcomes include the development of new beyond-worst-case data structures and algorithms, a formal framework for modeling and analyzing predictions, and improved heuristics that bring theory closer to practice.
This seminar will specifically focus on developing foundations for: (1) learned predictions for data structure design, and (2) learned predictions for offline optimization.
Learned Predictions for Data Structure Design. There is strong empirical evidence that augmenting worst-case data structures with machine-learned advice leads to improved running times. These results showcase the potential of learned data structures. This seminar will focus on how to theoretically understand the power and limitations of predictions through the algorithms-with-predictions model. This model has the potential to fundamentally change how the community thinks about designing worst-case data structures: by leveraging predictions that are learnable, incur minimal overhead, and deliver significant speedups on most datasets, while simultaneously remaining robust to adversarial inputs.
Learned Predictions for Offline Optimization. Traditional running-time analysis assumes that problems are solved from scratch. However, algorithms are often run on different data sets over time, and these prior runs can encode information that can be used to improve run times in the future. An intuitive approach to speeding up an algorithm is to start from a previously computed solution, referred to as a warm start.
Warm-start algorithm design is not new and is commonly used in heuristics. However, most prior work on this topic is empirical; a theoretical foundation for warm-start heuristics via the algorithms-with-predictions model is needed.
There has been preliminary work in the community toward this goal that showcases the potential of the area and serves as a proof of concept for other researchers. In particular, recent results show that a warm start can speed up the Hungarian algorithm for computing weighted bipartite matchings, and this approach has since been extended to other graph algorithms. Warm-start algorithm design has the potential to grow into an area as deep and interesting as the line of work using predictions online in the competitive-analysis model. This seminar will bring together researchers from this area and organize sessions and discussions that build toward this vision.
Inge Li Gørtz, Benjamin J. Moseley, Shikha Singh, and Sergei Vassilvitskii
Participants
- Antonios Antoniadis (University of Twente, NL) [dblp]
- Martin Aumüller (IT University of Copenhagen, DK) [dblp]
- Ioana Oriana Bercea (KTH Royal Institute of Technology - Stockholm, SE) [dblp]
- Philip Bille (Technical University of Denmark - Lyngby, DK) [dblp]
- Gerth Stølting Brodal (Aarhus University, DK) [dblp]
- Christian Coester (University of Oxford, GB) [dblp]
- Alexander Conway (Cornell Tech - New York, US) [dblp]
- Michael Dinitz (Johns Hopkins University - Baltimore, US) [dblp]
- Christoph Dürr (Sorbonne University - Paris, FR) [dblp]
- Martin Farach-Colton (NYU - New York, US) [dblp]
- Paolo Ferragina (Sant'Anna School of Advanced Studies - Pisa, IT) [dblp]
- Inge Li Gørtz (Technical University of Denmark - Lyngby, DK) [dblp]
- John Iacono (ULB - Brussels, BE) [dblp]
- Sungjin Im (University of California - Santa Cruz, US) [dblp]
- Riko Jacob (IT University of Copenhagen, DK) [dblp]
- Rob Johnson (Broadcom - San Jose, US) [dblp]
- Hanna Komlós (NYU - New York, US) [dblp]
- László Kozma (FU Berlin, DE) [dblp]
- William Kuszmaul (Carnegie Mellon University - Pittsburgh, US) [dblp]
- Silvio Lattanzi (Google - Barcelona, ES) [dblp]
- Thomas Lavastida (University of Texas - Dallas, US) [dblp]
- Alexander Lindermayr (Universität Bremen, DE) [dblp]
- Quanquan C. Liu (Yale University - New Haven, US) [dblp]
- Samuel McCauley (Williams College - Williamstown, US) [dblp]
- Ulrich Carsten Meyer (Goethe University - Frankfurt am Main, DE) [dblp]
- Benjamin J. Moseley (Carnegie Mellon University - Pittsburgh, US) [dblp]
- Yasamin Nazari (VU Amsterdam, NL) [dblp]
- Kim Thang Nguyen (University of Grenoble, FR)
- Debmalya Panigrahi (Duke University - Durham, US) [dblp]
- Cynthia A. Phillips (Sandia National Labs - Albuquerque, US) [dblp]
- Adam Polak (Bocconi University - Milan, IT) [dblp]
- Rajeev Raman (University of Leicester, GB) [dblp]
- Golnoosh Shahkarami (MPI für Informatik - Saarbrücken, DE)
- Shikha Singh (Williams College - Williamstown, US) [dblp]
- Clifford Stein (Columbia University - New York, US) [dblp]
- Ola Svensson (EPFL - Lausanne, CH) [dblp]
- Ali Vakilian (TTIC - Chicago, US) [dblp]
- Jan van den Brand (Georgia Institute of Technology - Atlanta, US) [dblp]
- Sergei Vassilvitskii (Google - New York, US) [dblp]
- Ellen Vitercik (Stanford University, US) [dblp]
- Sebastian Wild (University of Liverpool, GB) [dblp]
- Chenyang Xu (East China Normal University - Shanghai, CN) [dblp]
- Helen Xu (Georgia Institute of Technology - Atlanta, US) [dblp]
Classification
- Data Structures and Algorithms
Keywords
- Algorithms with predictions
- learning-augmented algorithms
- beyond-worst-case analysis
- data structures
- approximation algorithms

Creative Commons BY 4.0
