Following up on earlier Dagstuhl Seminars on robust performance of database query processing, a new Dagstuhl Seminar in 2020 will discuss and advance multiple topics in database query processing. We hope to achieve mutual education as well as concrete solutions for specific hard problems, and possibly publications based on the seminar and on collaborations initiated during it.
In our selection of topics, we focus on problems that are hard, relevant, and unsolved in both academic research and industrial development. Technical topics of particular interest include:
- Robust query performance: resource policies, algorithms, data structures, query execution plans, and query optimization – with specific focus on dynamic sequences of multiple joins and on skew and load balancing in parallel systems.
- Sort-based versus hash-based query processing – a question long settled in many minds, yet there are new techniques to consider. For example, pause-and-resume and restart-after-failure are important in highly parallel systems, and waste-free designs may require sorted intermediate results. As another example, storage structures and intermediate results sorted on hash values could combine the advantages of traditional indexes with those of traditional hash-based query processing.
- Columns, rows, or clusters as storage formats and as intermediate results – row storage is widely favored for transaction processing and traditional line-of-business applications. Column storage may or may not hold up against critical inspection and against deeply optimized row storage, including compressed indexes as well as advanced sorting and merging of index contents.
- Modern hardware: accelerators, memory & storage hierarchies – two mostly independent topics. Hardware accelerators remain a promising opportunity mostly unused in industrial practice, while deep hierarchies of memory and storage hardware are a practical reality not fully addressed or exploited in most database research or products. Dedicated instructions already speed up compression, encryption, transactional memory, and sorting, e.g., priority queues and string comparisons.
- Compilation, vectorization, or normalized keys – strong and diverging opinions and beliefs notwithstanding, we should design and optimize hybrid systems that combine the techniques’ advantages. Some initial architectures already exist, e.g., deep compilation of query execution plans that proceeds while interpreted query execution has already begun. With luck, we can integrate all three of these promising techniques for high-bandwidth query execution over large databases.
- Stream processing, stream indexing – deferred maintenance and incremental optimization of derived data, as in Vertica’s write-optimized storage, can be taken much further. Log-structured merge-forests and stepped merging are widely used in industrial key-value stores but remain far from optimal in three crucial dimensions: insertion (information capture) bandwidth, query efficiency and performance, and insertion-to-query latency.
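To make one of the topics above concrete, here is a minimal sketch of joining two inputs that are sorted on hash values of the join key. It is an illustration only, not a design proposed by the seminar: the CRC32 hash, the function names, and the sample tables are all assumptions chosen for brevity. Each input is ordered by (hash, key), so a join can proceed as a merge over sorted runs, as with a traditional index, while comparisons mostly touch fixed-size hash values, as in hash-based processing.

```python
import zlib


def hash_of(key):
    """Deterministic hash of a key (CRC32 of its string form); an arbitrary illustrative choice."""
    return zlib.crc32(str(key).encode("utf-8"))


def sort_on_hash(rows, key_of):
    """Return (hash, key, row) triples ordered by hash value, then by key to separate collisions."""
    triples = [(hash_of(key_of(r)), key_of(r), r) for r in rows]
    return sorted(triples, key=lambda t: (t[0], t[1]))


def merge_join_on_hash(left, right):
    """Equi-join two inputs that are already sorted on (hash, key)."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][:2], right[j][:2]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal (hash, key) pairs on each side.
            i_end = i
            while i_end < len(left) and left[i_end][:2] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][:2] == rk:
                j_end += 1
            # Emit the cross product of the two runs.
            for a in range(i, i_end):
                for b in range(j, j_end):
                    out.append((left[a][2], right[b][2]))
            i, j = i_end, j_end
    return out


# Hypothetical sample data: customers(id, name) and orders(order_id, customer_id).
customers = [(1, "Ada"), (2, "Bob"), (3, "Cy")]
orders = [(10, 1), (11, 3), (12, 1), (13, 4)]

joined = merge_join_on_hash(
    sort_on_hash(customers, key_of=lambda r: r[0]),
    sort_on_hash(orders, key_of=lambda r: r[1]),
)
```

The output order follows the hash order rather than the key order, which is the price of this layout; sorting `joined` afterwards restores key order if a consumer needs it.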
As seminar outcomes, we hope to advance the state of the art as well as educate all seminar participants on both current technologies and new ideas. We will structure our activities at Dagstuhl in such a way that each group and individual leaves with the possibility of publishing their results.
- data bases / information retrieval