Dagstuhl Seminar 24101
Robust Query Processing in the Cloud
(Mar 03 – Mar 08, 2024)
Organizers
- Anastasia Ailamaki (EPFL - Lausanne, CH)
- Goetz Graefe (Google - Madison, US)
- Allison Lee (Snowflake - San Mateo, US)
- Caetano Sauer (Salesforce - München, DE)
Contact
- Michael Gerke (for scientific matters)
- Susanne Bach-Bernhard (for administrative matters)
Following up on earlier Dagstuhl Seminars on robust performance of database query processing, this Dagstuhl Seminar aims to discuss and advance multiple topics in database query processing, with a new focus on cloud-computing environments. Query performance, and in particular robust, predictable, reliable query performance (“good performance every time”), remains an open issue in research and in products. The problem is simple to describe, expensive to investigate and to alleviate, and has been well known for decades. It is therefore a hard research problem, not a simple industrial development problem, and it must be continuously investigated and reframed as technology develops. The ever-increasing popularity of cloud computing and the massive migration of database applications and services to this new environment add a new dimension of complexity to this long-standing problem.
As in our previous seminars, we hope to achieve mutual education as well as concrete solutions for specific hard problems, and possibly publications based on the seminar and collaboration initiated during the seminar. To guide the seminar discussion, we propose five technical topics that will potentially seed discussions in disjoint groups of participants:
- Robust query execution of complex join sequences: As a pragmatic solution to the join sequence problem in query optimization, previous instances of this seminar (in particular 22111 and 17222) investigated robust query execution techniques that are able to adaptively choose among different join sequence alternatives during runtime. In this new seminar, we aim to further investigate techniques like multiplexing plans and dynamic reoptimization to alleviate the burden of choosing “the best plan” upfront. In the context of the cloud, the solution space can be taken even further, since the elastic compute resources and disaggregated storage make it more feasible to run alternative query plans in parallel (i.e., “race plans”).
- Robust database maintenance: database system performance is often impacted by large maintenance operations that run concurrently with user queries and transactions (e.g., indexing, schema changes, physical reorganization, and the computation of samples or statistics). Related prior techniques such as adaptive indexing and log-structured merge trees alleviate this problem to some extent, but the overarching issue of providing robust performance while such large maintenance operations are in flight remains an open problem. In a cloud environment, efficient and robust maintenance operations are crucial, because they directly impact vendors’ profits: since users only pay for a given service and attached service-level agreements, well-maintained internal operation can save substantial costs for the vendor.
- Modern hardware: cloud vendors face the challenge of dealing with a heterogeneous set of hardware devices for different ends (e.g., fast SSDs and non-volatile memory, GPUs, FPGAs, and co-processing units), while offering proper abstractions to those resources with higher-level services. Recent database systems research has focused on optimizing software architectures for different hardware combinations, but we believe that little attention has been paid to how these combinations affect cloud services, or, in other words, how cloud providers should exploit and capitalize on modern hardware, e.g., as premium add-ons to existing services. As with other topics proposed in this seminar series, robustness remains an open challenge: how do we design cloud services that adapt to different hardware combinations, without suffering from performance cliffs or instabilities of service?
- Indexing for data analytics: during the previous instance of this seminar (22111), we investigated for the first time the issue of indexing in data warehouses, particularly cloud-based solutions. There are multiple new avenues of exploration to further advance this topic, including partial and adaptive indexing techniques, automatic and workload-driven creation and disposal of index structures, learned indexes, maintenance of partial replicas that accelerate certain scan predicates, new cost models for index access and maintenance, robust tuple layout reorganization, and robust caching of intermediate results. These topics all touch on the issue of how to keep a low ingest-to-query time – which is critical in data warehouses – while still being able to accelerate queries in a robust manner.
- Scheduling and workload management: The problem of delivering robust performance not to a single query in isolation but to a set of concurrent queries or a given dynamic workload becomes especially challenging in the cloud, with its multiple layers of virtualization, disaggregated storage, service-level agreements, and different profit models (e.g., bill customers for added guarantees vs. save internal costs). There are issues of scheduling special hardware widgets (if they exist, in limited quantity, for temporary exclusive usage), of scheduling processors, memory, and I/O resources, of multi-tenancy and resource isolation among tenants, of foreground and background processing, and perhaps more. In this new seminar, we plan to investigate more robust ways of scheduling workloads and managing resources in cloud-based data services.
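The idea of racing plans mentioned under the first topic can be illustrated with a small sketch. It is only a toy under stated assumptions, not a technique taken from the seminar: the names `race_plans`, `hash_join`, and `nested_loop_join` are hypothetical, local threads stand in for elastic cloud compute, and an artificial delay makes one plan slow. Every alternative plan starts at once, the first to finish supplies the result, and the others are asked to stop.

```python
import threading
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def hash_join(left, right, cancel):
    """Toy hash join on the first tuple field; returns None if cancelled."""
    index = {}
    for key, value in right:
        index.setdefault(key, []).append(value)
    out = []
    for key, value in left:
        if cancel.is_set():
            return None
        for right_value in index.get(key, []):
            out.append((key, value, right_value))
    return out


def nested_loop_join(left, right, cancel, delay=0.01):
    """Toy nested-loop join; the sleep simulates an expensive probe (assumption)."""
    out = []
    for left_key, left_value in left:
        for right_key, right_value in right:
            if cancel.is_set():
                return None
            time.sleep(delay)
            if left_key == right_key:
                out.append((left_key, left_value, right_value))
    return out


def race_plans(plans):
    """Start all plans concurrently; return (name, result) of the first to finish."""
    cancel = threading.Event()
    pool = ThreadPoolExecutor(max_workers=len(plans))
    futures = {pool.submit(plan, cancel): name for name, plan in plans.items()}
    done, _ = wait(futures, return_when=FIRST_COMPLETED)
    cancel.set()              # ask the losing plans to stop early
    pool.shutdown(wait=True)  # losers return promptly once they see the flag
    winner = next(iter(done))
    return futures[winner], winner.result()


left = [(i, f"l{i}") for i in range(20)]
right = [(i % 5, f"r{i}") for i in range(20)]

name, rows = race_plans({
    "hash_join": lambda cancel: hash_join(left, right, cancel),
    "nested_loop_join": lambda cancel: nested_loop_join(left, right, cancel),
})
print(name, len(rows))
```

Cooperative cancellation through a shared flag keeps the sketch simple. A real system would also have to account for the resources consumed by the losing plans, which is part of what makes the approach a cost trade-off rather than a free win.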
Participants
- Angelos Christos Anadiotis (Oracle Switzerland - Zürich, CH)
- Manos Athanassoulis (Boston University, US)
- Carsten Binnig (TU Darmstadt, DE)
- Thomas Bodner (Hasso-Plattner-Institut, Universität Potsdam, DE)
- Matthias Böhm (TU Berlin, DE)
- Peter A. Boncz (CWI - Amsterdam, NL)
- Nicolas Bruno (Microsoft - Redmond, US)
- Yannis Chronis (Google - Sunnyvale, US)
- Periklis Chrysogelos (Oracle Switzerland - Zürich, CH)
- John Cieslewicz (Google - Mountain View, US)
- Sudipto Das (Amazon Web Services - Seattle, US)
- Thanh Do (Celonis Inc. - New York, US)
- Kira Isabel Duwe (EPFL - Lausanne, CH)
- Jan Finis (Salesforce - München, DE)
- Campbell Fraser (Google - Mountain View, US)
- Goetz Graefe (Google - Madison, US)
- Stefan Halfpap (Technische Universität Berlin, DE)
- Alfons Kemper (TU München - Garching, DE)
- Kyoungmin Kim (EPFL - Lausanne, CH)
- Andrew Lamb (InfluxData - Boston, US)
- Allison Lee (Snowflake - San Mateo, US)
- Viktor Leis (TU München - Garching, DE)
- Lucas Lersch (Amazon Web Services - East Palo Alto, US)
- Boaz Leskes (MotherDuck - Amsterdam, NL)
- Thomas Neumann (TU München - Garching, DE)
- Anisoara Nica (SAP SE - Waterloo, CA)
- Danica Porobic (Oracle Switzerland - Zürich, CH)
- Daniel Ritter (SAP SE - Walldorf, DE)
- Kai-Uwe Sattler (TU Ilmenau, DE)
- Caetano Sauer (Salesforce - München, DE)
- Bernhard Seeger (Universität Marburg, DE)
- Knut Stolze (Ocient - Jena, DE)
- Pinar Tözün (IT University of Copenhagen, DK)
- Nga Tran (InfluxData - Boston, US)
- Immanuel Trummer (Cornell University - Ithaca, US)
- Juliane Waack (Snowflake - Berlin, DE)
- Marcin Zukowski (Snowflake - San Mateo, US)
Related Seminars
- Dagstuhl Seminar 10381: Robust Query Processing (2010-09-19 - 2010-09-24)
- Dagstuhl Seminar 12321: Robust Query Processing (2012-08-05 - 2012-08-10)
- Dagstuhl Seminar 17222: Robust Performance in Database Query Processing (2017-05-28 - 2017-06-02)
- Dagstuhl Seminar 22111: Database Indexing and Query Processing (2022-03-13 - 2022-03-18)
Classification
- Databases
- Distributed / Parallel / and Cluster Computing
- Performance
Keywords
- databases
- cloud computing
- query processing
- indexing
- scheduling