- A generalized join algorithm : article in BTW 2011 - Graefe, Goetz - Bonn : Gesellschaft für Informatik e.V., 2011. - pp. 267-286 - (Lecture notes in informatics / P ; 180 : article).
- Metrics for Measuring the Performance of the Mixed Workload CH-benCHmark : article in LNCS 7144: Topics in Performance Evaluation, Measurement and Characterization - Funke, Florian; Kemper, Alfons; Krompass, Stefan; Kuno, Harumi Anne; Nambiar, Raghunath; Neumann, Thomas; Nica, Anisoara; Poess, Meikel; Seibold, Michael - Berlin : Springer, 2012. - pp. 10-30 - (Lecture notes in computer science ; 7144 : pp. 10-30).
- New algorithms for join and grouping operations : article pp. 3-27 - Graefe, Goetz - Berlin : Springer, 2012. - pp. 3-27 - (Computer Science - Research and Development ; 27. 2012, 1).
- Plan space analysis : an early warning system to detect plan regressions in cost-based optimizers : article : DBTest '11 - Waas, Florian M.; Giakoumakis, Leo; Zhang, Shin - New York : ACM, 2011 - (Proceedings of the Fourth International Workshop on Testing Database Systems : DBTest 2011 ; Article No. 2).
- Stochastic Database Cracking : Towards Robust Adaptive Indexing in Main-Memory Column-Stores : article pp. 502-513 - Halim, Felix; Idreos, Stratos; Karras, Panagiotis; Yap, Roland H. C. - New York : ACM, 2012. - pp. 502-513 - (Proceedings of the VLDB Endowment ; 5. 2012, 6).
- The mixed workload CH-benCHmark - Cole, Richard; Funke, Florian; Giakoumakis, Leo; Guy, Wey; Kemper, Alfons; Krompass, Stefan; Kuno, Harumi; Nambiar, Raghunath; Neumann, Thomas; Poess, Meikel; Sattler, Kai-Uwe; Seibold, Michael; Simon, Eric; Waas, Florian M. - New York : ACM, 2011 - (Proceedings of the Fourth International Workshop on Testing Database Systems : DBTest 2011 ; Article No. 8).
- Tractor pulling on data warehouses : article : DBTest '11 - Kersten, Martin L.; Kemper, Alfons; Markl, Volker; Nica, Anisoara; Poess, Meikel; Sattler, Kai-Uwe - New York : ACM, 2011 - (Proceedings of the Fourth International Workshop on Testing Database Systems : DBTest 2011 ; Article No. 7).
- Visualizing the robustness of query execution : article in 4th Biennial Conference on Innovative Data Systems Research (CIDR) January 4 - 7, 2009, Asilomar, California, USA - Graefe, Goetz; Kuno, Harumi Anne; Wiener, Janet Lynn - www.cidrdb.org, 2009 - (Biennial Conference on Innovative Data Systems Research 2009 ; article).
In the context of data management, robustness is usually associated with recovery from failure, redundancy, disaster preparedness, etc. Robust query processing, on the other hand, is about robustness of performance and scalability. It is more than progress reporting or predictability. A system that predictably fails or obviously performs poorly is somewhat more useful than an unpredictable one, but it is not robust. This is comparable to an automobile that only starts in dry weather: it is predictable but not nearly as useful or robust as a car that starts in any weather.
Robust query processing performance has been a known problem for a long time. It also seems common to most or all database management systems and most or all installations. All experienced database administrators know of sudden disruptions of data center processing due to database queries performing poorly, including queries that had performed flawlessly or at least acceptably for days or weeks.
We believe that a fundamental cause of lack of robustness is that the various stages of database query processing are performed by loosely coupled system components developed, maintained, and studied by largely disjoint cliques of developers and researchers. Only a handful of researchers have established expertise in more than one, or possibly two, areas of query processing. In many industrial database development groups, the query optimizer and executor teams report to different management chains.
Some techniques are meant to alleviate problems of poor performance, e.g., automatic index tuning or statistics gathered and refreshed on-demand. However, they sometimes exacerbate the problem. For example, insertion of a few new rows into a large table might trigger an automatic update of statistics, which uses a different sample than the prior one, which leads to slightly different histograms, which results in slightly different cardinality or cost estimates, which leads to an entirely different query execution plan, which might actually perform much worse than the prior one due to estimation errors. Such occasional "automatic disasters" are difficult to spot and usually require lengthy and expensive root cause analysis, often at an inconvenient time.
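The chain of events above can be sketched with a toy cost model (all functions and numbers here are hypothetical assumptions, not any real optimizer's): a tiny shift in an estimated row count crosses a cost crossover point, flips the plan choice, and the newly chosen plan can be the worse one on the actual data.

```python
# Toy illustration (not any real optimizer): a cost model picks between an
# index nested-loop join and a hash join based on an estimated row count.
# All cost constants are made up for the sketch.

def cost_index_nlj(rows):
    return 50 * rows          # cheap for few rows, linear per probe

def cost_hash_join(rows):
    return 10_000 + 2 * rows  # high fixed build cost, cheap per row

def choose_plan(estimated_rows):
    nlj = cost_index_nlj(estimated_rows)
    hj = cost_hash_join(estimated_rows)
    return "index-nlj" if nlj <= hj else "hash"

# Refreshed statistics nudge the estimate from 200 to 210 rows:
print(choose_plan(200))   # index-nlj  (50*200 = 10,000 <= 10,400)
print(choose_plan(210))   # hash       (50*210 = 10,500 >  10,420)

# If the table actually holds only 150 rows, the newly chosen hash join
# costs 10,300 units versus 7,500 for the old index plan -- a regression
# caused purely by a slightly different estimate.
```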
A frequent cause of unpredictable performance is that compile-time query optimization is liable to suffer from inaccuracy in cardinality estimation or in cost calculations. Such errors are common in queries with dozens of tables or views, typically generated by software for business intelligence or for mapping objects to relational databases. Estimation errors do not necessarily lead to poor query execution plans, but they do so often and at unpredictable times.
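The scale of such estimation errors can be illustrated with hypothetical numbers (the selectivities below are invented for the sketch): optimizers commonly assume predicates are independent and multiply their selectivities, so with correlated columns the combined estimate can be off by orders of magnitude, and the error compounds with each additional join.

```python
# Hypothetical numbers illustrating how selectivity estimates compound
# under the common independence assumption.

table_rows = 1_000_000
est_selectivities = [0.01, 0.01, 0.01]   # per-predicate estimates

est_rows = table_rows
for s in est_selectivities:
    est_rows *= s                        # independence assumption

true_selectivity = 0.008                 # the predicates are correlated
actual_rows = table_rows * true_selectivity

print(f"estimated rows: {est_rows:.0f}")                 # ~1
print(f"actual rows:    {actual_rows:.0f}")              # ~8000
print(f"error factor:   {actual_rows / est_rows:.0f}x")  # ~8000x
```

An intermediate result estimated at one row but actually holding thousands invites exactly the kind of plan choice (say, a nested-loop join over the "single" row) that collapses at run time.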
Other sources of surprising query performance are widely fluctuating workloads, conflicts in concurrency control, changes in physical database design, rigid resource management such as a fixed-size in-memory workspace for sorting, and, of course, automatic tuning of physical database design or of server parameters such as memory allocation for specific purposes such as sorting or index creation.
Numerous approaches and partial solutions have been proposed over the decades, including automatic index tuning, automatic database statistics, self-correcting cardinality estimation in query optimization, dynamic resource management, adaptive workload management, and many more. Many of them are indeed practical and promising, but there is no way of comparing the value of competing techniques (and they all compete at least for implementation engineers!) until a useful metric for query processing robustness has been defined. Thus, defining robustness, as well as a metric for it, is a crucial step towards making progress.
Such a metric can serve multiple purposes. The most mundane purpose might be regression testing, i.e., to ensure that progress, once achieved in a code base, is not lost in subsequent maintenance or improvement of seemingly unrelated code or functionality. The most public purpose might be to compare competing software packages in terms of their robustness in query processing performance and scalability as a complement to existing benchmarks that measure raw performance and scalability without regard to robustness.
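As a sketch of what such a metric might measure (the "worst-case regret" formulation and all costs below are illustrative assumptions, not a proposal from the seminar): evaluate each candidate plan's cost across a range of parameter values, compare it to the best achievable cost at each point, and score the plan by its largest slowdown ratio. A robust plan keeps that ratio close to 1 over the whole range.

```python
# Illustrative sketch of one possible robustness metric ("worst-case
# regret"). All plan costs are made-up numbers.

def max_regret(plan_costs, best_costs):
    """Worst-case ratio of a plan's cost to the best cost at any point."""
    return max(p / b for p, b in zip(plan_costs, best_costs))

# Two plans evaluated at selectivities 0.001, 0.01, 0.1, 0.5:
nlj  = [10, 20, 200, 1000]   # index nested-loop: great at low selectivity
scan = [30, 30, 35, 50]      # full table scan: flat cost profile
best = [min(a, b) for a, b in zip(nlj, scan)]

print(max_regret(nlj, best))    # 20.0 -- fast sometimes, fragile
print(max_regret(scan, best))   # 3.0  -- never optimal, never disastrous
```

Under such a metric, the scan-based plan scores better despite never being the fastest choice, which is precisely the trade-off a robustness benchmark (as opposed to a raw-performance benchmark) would reward.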
- Parag Agrawal (Stanford University, US)
- Anastasia Ailamaki (EPFL - Lausanne, CH) [dblp]
- Awny Al-Omari (HP Labs - Austin, US) [dblp]
- Nicolas Bruno (Microsoft Corporation - Redmond, US) [dblp]
- Surajit Chaudhuri (Microsoft Corporation - Redmond, US) [dblp]
- Richard L. Cole (ParAccel Inc. - Cupertino, US) [dblp]
- Amol Deshpande (University of Maryland - College Park, US) [dblp]
- Jens Dittrich (Universität des Saarlandes, DE) [dblp]
- Stephan Ewen (TU Berlin, DE)
- Leo Giakoumakis (Microsoft Corporation - Redmond, US)
- Goetz Graefe (HP Labs - Madison, US) [dblp]
- Wey Guy (Microsoft Corporation - Redmond, US) [dblp]
- Jayant R. Haritsa (Indian Institute of Science, IN) [dblp]
- Stratos Idreos (CWI - Amsterdam, NL) [dblp]
- Ihab Francis Ilyas (University of Waterloo, CA) [dblp]
- Alfons Kemper (TU München, DE) [dblp]
- Martin L. Kersten (CWI - Amsterdam, NL) [dblp]
- Arnd Christian König (Microsoft Corporation - Redmond, US) [dblp]
- Stefan Krompaß (TU München, DE)
- Harumi Anne Kuno (HP Labs - Palo Alto, US) [dblp]
- Wolfgang Lehner (TU Dresden, DE) [dblp]
- Guy Lohman (IBM Almaden Center, US) [dblp]
- Stefan Manegold (CWI - Amsterdam, NL) [dblp]
- Volker Markl (TU Berlin, DE) [dblp]
- Bernhard Mitschang (Universität Stuttgart, DE) [dblp]
- Thomas Neumann (TU München, DE) [dblp]
- Anisoara Nica (Sybase - Waterloo, CA) [dblp]
- Glenn Paulley (Sybase - Waterloo, CA) [dblp]
- Meikel Poess (Oracle Labs., US) [dblp]
- Alkis Polyzotis (University of California - Santa Cruz, US)
- Ken Salem (University of Waterloo, CA) [dblp]
- Kai-Uwe Sattler (TU Ilmenau, DE) [dblp]
- Harald Schöning (Software AG - Darmstadt, DE) [dblp]
- Eric Simon (SAP BusinessObjects Division - Levallois-Perret, FR) [dblp]
- Florian M. Waas (EMC Greenplum Inc. - San Mateo, US) [dblp]
- Robert Wrembel (Poznan University of Technology, PL) [dblp]
- Dagstuhl Seminar 12321: Robust Query Processing (2012-08-05 - 2012-08-10) (Details)
- Dagstuhl Seminar 17222: Robust Performance in Database Query Processing (2017-05-28 - 2017-06-02) (Details)
- Dagstuhl Seminar 22111: Database Indexing and Query Processing (2022-03-13 - 2022-03-18) (Details)
- Dagstuhl Seminar 24101: Robust Query Processing in the Cloud (2024-03-03 - 2024-03-08) (Details)
- data bases / information retrieval
- data structures / algorithms / complexity
- optimization / scheduling
- robust query processing
- adaptive query optimization
- query execution