https://www.dagstuhl.de/20391

September 20 – 25, 2020, Dagstuhl Seminar 20391

Database Indexing and Query Processing

Organizers

Renata Borovica-Gajic (The University of Melbourne, AU)
Goetz Graefe (Google – Madison, US)
Allison Lee (Snowflake – San Mateo, US)
Caetano Sauer (Tableau – München, DE)
Pinar Tözün (IT University of Copenhagen, DK)

For support, please contact

Annette Beyer for administrative matters

Michael Gerke for scientific matters

Motivation

Following up on earlier Dagstuhl Seminars on robust performance of database query processing, a new Dagstuhl Seminar in 2020 will discuss and advance multiple topics in database query processing. We hope to achieve mutual education as well as concrete solutions for specific hard problems, and possibly publications based on the seminar and on collaborations initiated during the seminar.

In our selection of topics, we focus on problems that are hard, relevant, and unsolved throughout academic research and industrial development. Technical topics of particular interest include:

  1. Robust query performance: resource policies, algorithms, data structures, query execution plans, and query optimization – with specific focus on dynamic sequences of multiple joins and on skew and load balancing in parallel systems.
  2. Sort-based versus hash-based query processing – a question that many consider settled, but there are new techniques to consider. For example, pause-and-resume and restart-after-failure are important in highly parallel systems, and waste-free designs may require sorted intermediate results. As another example, storage structures and intermediate results sorted on hash values could combine the advantages of traditional indexes and of traditional hash-based query processing.
  3. Columns, rows, or clusters as storage formats and as intermediate results – row storage is widely favored for transaction processing and traditional line-of-business applications. Column storage may or may not hold up to critical inspection when compared against deeply optimized row storage, including compressed indexes as well as advanced sorting and merging of index contents.
  4. Modern hardware: accelerators, memory & storage hierarchies – two mostly independent topics. Hardware accelerators remain a promising opportunity that is mostly unused in industrial practice, and deep hierarchies of memory and storage hardware are a practical reality that most database research and products neither fully address nor exploit. Dedicated instructions already speed up compression, encryption, transactional memory, and sorting, e.g., priority queues and string comparisons.
  5. Compilation, vectorization, or normalized keys – despite diverging and strongly held opinions and beliefs, we should design and optimize hybrid systems that combine the techniques’ advantages. Some initial architectures already exist, e.g., deep compilation of a query execution plan that proceeds while interpreted execution of the same plan has already begun. With luck, we can integrate all three of these promising techniques for high-bandwidth query execution over large databases.
  6. Stream processing, stream indexing – deferred maintenance and incremental optimization of derived data, as in Vertica’s write-optimized storage, can be taken much further. Log-structured merge-forests and stepped merging are widely used in industrial key-value stores but are still far from optimal in three crucial dimensions: insertion (information capture) bandwidth, query efficiency and query performance, and insertion-to-query latency.
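
To make the second topic more concrete, the following is a minimal sketch of a merge join over inputs sorted on hash values. It is an illustration under stated assumptions, not a design proposed by the organizers: the function names (hash_of, sort_key, sort_on_hash, merge_join), the tuple-based row layout with the join key in the first field, and the choice of CRC-32 as the hash function are all hypothetical.

```python
import zlib


def hash_of(key: str) -> int:
    """Deterministic 32-bit hash; CRC-32 is an assumption chosen for reproducibility."""
    return zlib.crc32(key.encode("utf-8"))


def sort_key(row: tuple) -> tuple:
    # Order by hash value first; the key itself breaks ties on hash collisions,
    # so rows with equal join keys are always adjacent after sorting.
    return (hash_of(row[0]), row[0])


def sort_on_hash(rows: list) -> list:
    """Sort rows on the hash of the join key (field 0), as an index might store them."""
    return sorted(rows, key=sort_key)


def merge_join(left: list, right: list) -> list:
    """Inner join of two inputs that are already sorted with sort_on_hash."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = sort_key(left[i]), sort_key(right[j])
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the end of the matching group on the right side.
            j_end = j
            while j_end < len(right) and sort_key(right[j_end]) == lk:
                j_end += 1
            # Pair every matching left row with every row of that group.
            while i < len(left) and sort_key(left[i]) == lk:
                for r in right[j:j_end]:
                    out.append(left[i] + r[1:])
                i += 1
            j = j_end
    return out


customers = [("alice", "AU"), ("bob", "US"), ("carol", "DE")]
orders = [("bob", 1), ("alice", 2), ("bob", 3), ("dave", 4)]
result = merge_join(sort_on_hash(customers), sort_on_hash(orders))
print(sorted(result))
```

In this sketch the join needs no in-memory hash table and consumes both inputs in a single ordered pass, while the stored order still depends only on hash values. Whether such a layout pays off in a real system is exactly the kind of question the topic raises.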

As seminar outcomes, we hope to advance the state of the art as well as educate all seminar participants on both current technologies and new ideas. We will structure our activities at Dagstuhl in such a way that each group and individual leaves with the possibility of publishing their results.

Motivation text license
  Creative Commons BY 3.0 DE
  Renata Borovica-Gajic, Goetz Graefe, Allison Lee, Caetano Sauer, and Pinar Tözün

Classification

  • Data Bases / Information Retrieval

Keywords

  • Database
  • Query
  • Optimization
  • Execution
  • Hardware
  • Performance

Documentation

Each Dagstuhl Seminar and Dagstuhl Perspectives Workshop is documented in the series Dagstuhl Reports. The seminar organizers, in cooperation with the collector, prepare a report that includes contributions from the participants' talks together with a summary of the seminar.

Publications

A comprehensive peer-reviewed collection of research papers can also be published in the series Dagstuhl Follow-Ups.

Dagstuhl's Impact

Please inform us when a publication appears as a result of your seminar. These publications are listed in the category Dagstuhl's Impact and are presented on a special shelf on the ground floor of the library.