Dagstuhl Seminar 10381

Robust Query Processing

(Sep 19 – Sep 24, 2010)


Permalink
Please use the following short url to reference this page: https://www.dagstuhl.de/10381

Summary

In the context of data management, robustness is usually associated with recovery from failure, redundancy, disaster preparedness, etc. Robust query processing, on the other hand, is about robustness of performance and scalability. It is more than progress reporting or predictability. A system that predictably fails or obviously performs poorly is somewhat more useful than an unpredictable one, but it is not robust. This is comparable to an automobile that only starts in dry weather: it is predictable but not nearly as useful or robust as a car that starts in any weather.

The lack of robust query processing performance has been a known problem for a long time, and it appears to be common to most, if not all, database management systems and installations. Every experienced database administrator knows of sudden disruptions of data center processing caused by database queries performing poorly, including queries that had performed flawlessly, or at least acceptably, for days or weeks.

We believe that a fundamental cause of lack of robustness is that the various stages of database query processing are performed by loosely coupled system components developed, maintained, and studied by largely disjoint cliques of developers and researchers. Only a handful of researchers have established expertise in more than one, or possibly two, areas of query processing. In many industrial database development groups, the query optimizer and executor teams report to different management chains.

Some techniques are meant to alleviate problems of poor performance, e.g., automatic index tuning or statistics gathered and refreshed on demand. However, they sometimes exacerbate the problem. For example, insertion of a few new rows into a large table might trigger an automatic update of statistics, which uses a different sample than the prior one, which leads to slightly different histograms, which results in slightly different cardinality or cost estimates, which leads to an entirely different query execution plan, which might actually perform much worse than the prior one due to estimation errors. Such occasional "automatic disasters" are difficult to spot and usually require lengthy and expensive root-cause analysis, often at an inconvenient time.
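A minimal sketch of this plan-flip mechanism, using an invented cost model (the constants, thresholds, and function names below are hypothetical, not taken from any real optimizer):

```python
def choose_plan(est_selectivity, table_rows,
                index_probe_cost=50.0, scan_cost_per_row=1.0):
    """Toy cost model: an index probe pays a fixed cost per matching row
    (random I/O), while a sequential scan pays a small cost for every row."""
    index_cost = est_selectivity * table_rows * index_probe_cost
    scan_cost = table_rows * scan_cost_per_row
    return "index-lookup" if index_cost < scan_cost else "full-scan"

TABLE_ROWS = 1_000_000
# Suppose the predicate's true selectivity is 2% -- exactly the tipping
# point of this cost model. Two statistics refreshes, drawing different
# random samples, land on either side of it:
estimate_before = 0.019   # 19 matches in a 1000-row sample
estimate_after  = 0.021   # 21 matches in the next sample

plan_before = choose_plan(estimate_before, TABLE_ROWS)  # "index-lookup"
plan_after  = choose_plan(estimate_after, TABLE_ROWS)   # "full-scan"
```

A shift of 0.2 percentage points in the estimate, well within sampling noise, selects a different plan; whether the new plan is better or worse depends on the true selectivity, which neither estimate knows.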

A frequent cause of unpredictable performance is that compile-time query optimization is liable to suffer from inaccuracy in cardinality estimation or in cost calculations. Such errors are common in queries with dozens of tables or views, typically generated by software for business intelligence or for mapping objects to relational databases. Estimation errors do not necessarily lead to poor query execution plans, but they do so often and at unpredictable times.
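One reason such errors are so common in many-table queries: under the textbook independence assumption, per-predicate estimation errors compound multiplicatively with join depth. A simplified sketch (the row counts and selectivities are invented for illustration):

```python
def estimated_cardinality(base_rows, per_join_selectivity, n_joins):
    """Textbook estimate: multiply per-join selectivities, assuming the
    join predicates are statistically independent."""
    card = float(base_rows)
    for _ in range(n_joins):
        card *= per_join_selectivity
    return card

base = 10**9
est_sel, true_sel = 0.001, 0.002   # each estimate off by only a factor of 2

error_factor = {}
for depth in (2, 5, 10):
    est = estimated_cardinality(base, est_sel, depth)
    true = estimated_cardinality(base, true_sel, depth)
    error_factor[depth] = true / est   # grows as 2**depth: 4, 32, 1024
```

A per-predicate error of only 2x, easily caused by correlated columns, becomes a roughly 1000x cardinality error ten joins deep — typically enough to push the optimizer to a very different, and possibly much worse, plan.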

Other sources of surprising query performance are widely fluctuating workloads, conflicts in concurrency control, changes in physical database design, rigid resource management such as a fixed-size in-memory workspace for sorting, and, of course, automatic tuning of physical database design or of server parameters such as memory allocation for specific purposes such as sorting or index creation.
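The fixed-size sort workspace illustrates how rigid resource management creates performance cliffs. In the toy cost model below (invented constants; real systems differ in the details), a single extra input row forces a spill to disk:

```python
import math

def sort_cost(n_rows, workspace_rows, io_penalty=20.0):
    """Toy model: an in-memory sort costs n*log2(n) comparison units; input
    exceeding the fixed workspace forces an external sort that additionally
    writes and re-reads every row once, at io_penalty units per row each way."""
    cost = n_rows * math.log2(max(n_rows, 2))
    if n_rows > workspace_rows:
        cost += 2 * n_rows * io_penalty   # one spill-and-merge pass
    return cost

WORKSPACE = 1_000_000
just_fits    = sort_cost(WORKSPACE, WORKSPACE)
one_row_more = sort_cost(WORKSPACE + 1, WORKSPACE)
# in this model, one extra input row roughly triples the cost
```

The discontinuity is the point: performance does not degrade gracefully with input size, so a workload that grows past the threshold sees a sudden cliff rather than a slope.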

Numerous approaches and partial solutions have been proposed over the decades, including automatic index tuning, automatic database statistics, self-correcting cardinality estimation in query optimization, dynamic resource management, adaptive workload management, and many more. Many of them are indeed practical and promising, but there is no way of comparing the value of competing techniques (and they all compete, at least for implementation engineers!) until a useful metric for query processing robustness has been defined. Thus, defining robustness, as well as a metric for it, is a crucial step towards making progress.

Such a metric can serve multiple purposes. The most mundane purpose might be regression testing, i.e., to ensure that progress, once achieved in a code base, is not lost in subsequent maintenance or improvement of seemingly unrelated code or functionality. The most public purpose might be to compare competing software packages in terms of their robustness in query processing performance and scalability as a complement to existing benchmarks that measure raw performance and scalability without regard to robustness.
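Purely to make the idea concrete — no such metric was agreed upon at the seminar, and the definition below is an assumption of this sketch — one candidate is the ratio of worst-case to best-case runtime of a query across a sweep of workload variants:

```python
def robustness_score(runtimes):
    """Candidate metric (hypothetical): worst-case over best-case runtime
    across a sweep of workload variants. 1.0 is perfectly robust; larger
    values indicate bigger performance cliffs."""
    return max(runtimes) / min(runtimes)

# Hypothetical runtimes (seconds) of the same query on two systems as a
# predicate's selectivity is swept from 0.1% to 50%:
system_a = [1.0, 1.1, 1.2, 1.4, 1.5]    # slower, but stable
system_b = [0.3, 0.4, 0.5, 9.0, 12.0]   # faster at first, then a plan flips

score_a = robustness_score(system_a)    # 1.5
score_b = robustness_score(system_b)    # 40.0 (up to float rounding)
```

System B wins a raw-performance comparison at the low-selectivity points yet scores far worse on robustness — exactly the distinction a benchmark complement would need to capture. A regression test could simply assert that a code change does not increase the score on a fixed workload sweep.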


Participants
  • Parag Agrawal (Stanford University, US)
  • Anastasia Ailamaki (EPFL - Lausanne, CH) [dblp]
  • Awny Al-Omari (HP Labs - Austin, US) [dblp]
  • Nicolas Bruno (Microsoft Corporation - Redmond, US) [dblp]
  • Surajit Chaudhuri (Microsoft Corporation - Redmond, US) [dblp]
  • Richard L. Cole (ParAccel Inc. - Cupertino, US) [dblp]
  • Amol Deshpande (University of Maryland - College Park, US) [dblp]
  • Jens Dittrich (Universität des Saarlandes, DE) [dblp]
  • Stephan Ewen (TU Berlin, DE)
  • Leo Giakoumakis (Microsoft Corporation - Redmond, US)
  • Goetz Graefe (HP Labs - Madison, US) [dblp]
  • Wey Guy (Microsoft Corporation - Redmond, US) [dblp]
  • Jayant R. Haritsa (Indian Institute of Science, IN) [dblp]
  • Stratos Idreos (CWI - Amsterdam, NL) [dblp]
  • Ihab Francis Ilyas (University of Waterloo, CA) [dblp]
  • Alfons Kemper (TU München, DE) [dblp]
  • Martin L. Kersten (CWI - Amsterdam, NL) [dblp]
  • Arnd Christian König (Microsoft Corporation - Redmond, US) [dblp]
  • Stefan Krompaß (TU München, DE)
  • Harumi Anne Kuno (HP Labs - Palo Alto, US) [dblp]
  • Wolfgang Lehner (TU Dresden, DE) [dblp]
  • Guy Lohman (IBM Almaden Center, US) [dblp]
  • Stefan Manegold (CWI - Amsterdam, NL) [dblp]
  • Volker Markl (TU Berlin, DE) [dblp]
  • Bernhard Mitschang (Universität Stuttgart, DE) [dblp]
  • Thomas Neumann (TU München, DE) [dblp]
  • Anisoara Nica (Sybase - Waterloo, CA) [dblp]
  • Glenn Paulley (Sybase - Waterloo, CA) [dblp]
  • Meikel Poess (Oracle Labs., US) [dblp]
  • Alkis Polyzotis (University of California - Santa Cruz, US)
  • Ken Salem (University of Waterloo, CA) [dblp]
  • Kai-Uwe Sattler (TU Ilmenau, DE) [dblp]
  • Harald Schöning (Software AG - Darmstadt, DE) [dblp]
  • Eric Simon (SAP BusinessObjects Division - Levallois-Perret, FR) [dblp]
  • Mike Waas (EMC Greenplum Inc. - San Mateo, US)
  • Robert Wrembel (Poznan University of Technology, PL) [dblp]

Related Seminars
  • Dagstuhl Seminar 12321: Robust Query Processing (2012-08-05 - 2012-08-10) (Details)
  • Dagstuhl Seminar 17222: Robust Performance in Database Query Processing (2017-05-28 - 2017-06-02) (Details)
  • Dagstuhl Seminar 22111: Database Indexing and Query Processing (2022-03-13 - 2022-03-18) (Details)
  • Dagstuhl Seminar 24101: Robust Query Processing in the Cloud (2024-03-03 - 2024-03-08) (Details)

Classification
  • data bases / information retrieval
  • data structures / algorithms / complexity
  • optimization / scheduling
  • interdisciplinary

Keywords
  • robust query processing
  • adaptive query optimization
  • query execution