Database Management Systems (DBMSs) are used ubiquitously. Due to ever-growing performance demands and the virtually unlimited hardware resources provided by public cloud infrastructure providers, increasingly sophisticated systems and optimizations are being developed. This is a major challenge for developers of DBMSs, who have to ensure that their systems are both correct and efficient. Recent work on automatic testing of DBMSs found a large number of bugs in widely used DBMSs, indicating that this issue deserves more attention. The goal of this Dagstuhl Seminar is to identify challenges in the domain of DBMS reliability and robustness, find new ways to tackle existing problems, and connect practitioners and researchers working in this domain.
The seminar takes an interdisciplinary stance: leading academic and industrial researchers as well as test practitioners from multiple domains have been invited. Specifically, the seminar aims to bring together experts in databases, automatic testing, and formal methods. We will focus the discussion on four distinct but connected themes:
- Practices and challenges of ensuring the reliability of DBMSs: It is important that the seminar’s discussion is grounded and tackles the actual challenges faced by the community. Understanding the reliability challenges of DBMSs as well as the practices and innovations in their implementation and optimization is crucial for devising new techniques to ensure their reliability, performance, and scalability.
- Test oracles to validate DBMSs: Automatically testing a DBMS requires a test oracle, which determines whether the DBMS or a specific component functions as expected. The goal is to discuss test oracles and testing approaches for validating correctness, performance, security, and other properties.
- Automatic generation of queries and databases to test DBMSs: An effective test case is crucial to expose potential bugs through automatic and manual testing. In the context of testing DBMSs, a test case typically refers to a database and a query. The goal is to investigate different approaches to generating test cases, and how they can be effectively combined with test oracles.
- Formal methods and verification as applied to DBMSs: Formal methods and verification give high confidence in the correctness of a DBMS. The goal is to discuss and advance the formalization of various aspects and components of DBMSs in order to verify their critical properties.
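To make the interplay between test-case generation and test oracles more concrete, the following is a minimal sketch and not a technique prescribed by the seminar. It assumes Python's built-in `sqlite3` module as the system under test and uses one well-known metamorphic oracle as an illustration: for any predicate `p`, the rows satisfying `p`, `NOT p`, and `p IS NULL` should together make up the full table. The table `t`, its columns, and all function names are hypothetical choices made for this example.

```python
import random
import sqlite3
from collections import Counter


def make_database(rng, n_rows=50):
    """Create an in-memory SQLite database with one small random table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (a INTEGER, b INTEGER)")
    values = [None, -1, 0, 1, 2, 10]  # include NULL to exercise three-valued logic
    rows = [(rng.choice(values), rng.choice(values)) for _ in range(n_rows)]
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    return conn


def random_predicate(rng, depth=2):
    """Generate a random, fully parenthesized boolean expression over a and b."""
    if depth == 0 or rng.random() < 0.3:
        left = rng.choice(["a", "b"])
        right = rng.choice(["a", "b", "0", "1", "NULL"])
        op = rng.choice(["=", "<", ">", "<=", "<>"])
        return f"({left} {op} {right})"
    connective = rng.choice(["AND", "OR"])
    lhs = random_predicate(rng, depth - 1)
    rhs = random_predicate(rng, depth - 1)
    return f"({lhs} {connective} {rhs})"


def partition_oracle(conn, predicate):
    """Return True if the three partitions add up to the full table (as multisets)."""
    full = Counter(conn.execute("SELECT a, b FROM t").fetchall())
    parts = Counter()
    for cond in (predicate, f"NOT {predicate}", f"{predicate} IS NULL"):
        parts.update(conn.execute(f"SELECT a, b FROM t WHERE {cond}").fetchall())
    return full == parts


def run_campaign(seed=0, n_queries=200):
    """Generate random predicates and collect those that violate the oracle."""
    rng = random.Random(seed)
    conn = make_database(rng)
    predicates = [random_predicate(rng) for _ in range(n_queries)]
    failures = [p for p in predicates if not partition_oracle(conn, p)]
    conn.close()
    return failures
```

Calling `run_campaign()` returns the list of predicates for which the oracle was violated; an empty list means that no discrepancy was observed for the generated queries, not that the DBMS is correct.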
Database Management Systems (DBMSs) are used ubiquitously. Due to the ever-growing number and size of data sets, increasing performance demands, and the virtually unlimited hardware resources provided by public cloud infrastructure, sophisticated systems and optimizations are developed continuously. This dynamic and demanding environment is a major challenge for developers of DBMSs, who have to ensure that their systems are both correct and efficient.
Database management systems are a well-established field with several decades of research and engineering attention. These efforts have resulted in a multitude of both open-source and commercial systems that are widely deployed in production today and provide the backbone of a vast range of mission-critical applications. Still, surprisingly, recent work on automatic testing of DBMSs found a large number of bugs in widely used DBMSs. This clearly indicates that the topic of ensuring the reliability and robustness of DBMSs deserves more attention, and that key insights from neighboring domains such as automatic testing and formal methods could help to advance the state of the art in DBMS engineering.
Goals and Outcomes
One of the central goals and outcomes of the seminar was to build a common foundation and understanding of the key challenges of DBMS engineering, and of how they can potentially be addressed. To this end, the seminar focused on:
- Best practices and challenges in building open-source and commercial database engines.
Here, the key objectives include high developer efficiency, which requires quick feedback from tests and verification tools during feature development, as well as systematic (stress) testing of the software under high load and error conditions.
- The applicability of formal methods and verification tools to DBMSs.
Formal methods can be of great help to prove the correctness of key database system components such as query compilers, distributed consensus protocols, data replication components, or modules dealing with high availability. Still, an important question is how to systematically identify those components that can benefit from formal verification with reasonable implementation effort, and how to best integrate these methods into existing systems.
- Advanced testing techniques such as fuzzers, query synthesis, and workload generators.
These methods can significantly increase the test coverage of a DBMS by systematically exploring uncovered code paths and putting stress on individual, important subsystems, such as input verification and error handling, that are a frequent source of software defects.
- Methods for the automatic generation of test data and test-case reduction.
Occasionally, defects in database software are only found by customers running very complex queries operating on confidential data sets. Thus, to allow for problem reproduction, developers benefit from a minimal data set and a simplified query specification that does not disclose confidential data or exhibit unnecessary complexity.
- Security aspects such as ensuring confidentiality and data integrity in the presence of different classes of attackers.
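The test-case reduction idea mentioned above can be illustrated with a small sketch. It assumes a simplified, greedy variant of delta debugging and a stand-in "failure" check built on Python's `sqlite3` module; the table `t`, the query, and the function names `reduce_rows` and `triggers_symptom` are hypothetical. The sketch addresses only minimality of the data set, not the anonymization of confidential values.

```python
import sqlite3


def reduce_rows(rows, still_fails):
    """Greedily shrink `rows` while `still_fails(rows)` remains True.

    Chunks of decreasing size are removed; the result is 1-minimal, i.e.
    removing any single remaining row makes the failure disappear.
    """
    rows = list(rows)
    if not still_fails(rows):
        raise ValueError("the original rows must reproduce the failure")
    chunk = max(len(rows) // 2, 1)
    while True:
        i = 0
        progress = False
        while i < len(rows):
            candidate = rows[:i] + rows[i + chunk:]
            if still_fails(candidate):
                rows = candidate  # keep the smaller input, retry at same position
                progress = True
            else:
                i += chunk  # this chunk is needed, move on
        if chunk > 1:
            chunk //= 2
        elif not progress:
            return rows


def triggers_symptom(rows):
    """Stand-in for 'the bug reproduces': two rows with opposite amounts exist."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER, amount INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM t AS x JOIN t AS y "
        "ON x.amount = -y.amount WHERE x.amount > 0"
    ).fetchone()
    conn.close()
    return count > 0


# Example input: 19 irrelevant rows plus one row that pairs with (7, 7).
example_rows = [(i, i) for i in range(1, 20)] + [(100, -7)]
minimal_rows = reduce_rows(example_rows, triggers_symptom)
```

In practice, `still_fails` would rerun the failing query against the real system; the reducer itself only needs a yes/no answer for each candidate input.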
Attendee Mix and Seminar Structure
The seminar lasted 2.5 days. Its format and attendee mix were significantly influenced by the ongoing pandemic. Of the 34 attendees, 13 attended in person and 21 remotely. All but one of the in-person attendees were based in Europe. Overall, we received the highest response rate from Europe (20 attendees), and a lower one from Asia (8 attendees) and the US (6 attendees). We are grateful to the two Video Conference Assistants (VCAs), Jack Clark and Mark Raasveldt, who managed the equipment to ensure a smooth experience for all attendees.
We started the seminar with an introduction round in which all attendees introduced themselves. We held another such session in the late afternoon to accommodate the US attendees. Prior to the seminar, we asked attendees to give overview talks that would establish a common basis for discussion, which was useful given that the attendees came from different scientific communities. We had such overview talks on the first and second day. On the second and third day, we had in-depth talks. While we had planned breakout sessions, many of the talks were followed by fruitful, unplanned discussions. On the last day, we had a group discussion on the takeaways and future plans.
One major result of the seminar was the identification of open problems and areas of future work that the group wants to address in an interdisciplinary manner. Among others, these include the creation of a reference manual for database engineering groups to avoid redundant work and the re-invention of techniques already established (or discarded) by other teams; the identification of database modules (e.g., the query compiler and the transaction processing system) that can benefit from formal verification; the design of new test oracles to test various data-centric systems for different kinds of bugs; and the establishment of a common test-case specification format and a test corpus that can be shared between DBMS engineering teams. We discussed proposing another instance of the Dagstuhl Seminar to build on the established discussion basis and work on addressing these specific challenges.
- Alexander Böhm (SAP SE - Walldorf, DE)
- Cristian Cadar (Imperial College London, GB)
- Alastair F. Donaldson (Imperial College London, GB)
- Stefania Dumbrava (ENSIIE - Paris & SAMOVAR - Evry, FR)
- Marco Guarnieri (IMDEA Software - Madrid, ES)
- Marcel Kost (Salesforce - München, DE)
- Burcu Kulahcioglu Ozkan (TU Delft, NL)
- Hannes Mühleisen (CWI - Amsterdam, NL)
- Danica Porobic (Oracle Labs Switzerland - Zürich, CH)
- Mark Raasveldt (CWI - Amsterdam, NL)
- Manuel Rigger (ETH Zürich, CH)
- Anupam Sanghi (Indian Institute of Science - Bangalore, IN)
- Artur Andrzejak (Universität Heidelberg, DE)
- Chee-Yong Chan (National University of Singapore, SG)
- Yongheng Chen (Georgia Institute of Technology - Atlanta, US)
- Maria Christakis (MPI-SWS - Kaiserslautern, DE)
- Jack Clark (ETH Zürich, CH)
- Jens Dittrich (Universität des Saarlandes - Saarbrücken, DE)
- Paolo Guagliardo (University of Edinburgh, GB)
- Jayant R. Haritsa (Indian Institute of Science - Bangalore, IN)
- Miryung Kim (UCLA, US)
- Kyle Kingsbury (San Francisco, US)
- Greg Law (Undo - Cambridge, GB)
- Si Liu (ETH Zürich, CH)
- Eric Lo (The Chinese University of Hong Kong, HK)
- Muhammad Numair Mansur (MPI-SWS - Kaiserslautern, DE)
- Zhou Qiang (PingCAP - Hangzhou, CN)
- Tilmann Rabl (Hasso-Plattner-Institut, Universität Potsdam, DE)
- Abhik Roychoudhury (National University of Singapore, SG)
- Zhendong Su (ETH Zürich, CH)
- S. Sudarshan (Indian Institute of Technology - Mumbai, IN)
- Tao Xie (Peking University, CN)
- Tianyin Xu (University of Illinois - Urbana-Champaign, US)
- Mai Zheng (Iowa State University - Ames, US)
- Dagstuhl Seminar 23441: Ensuring the Reliability and Robustness of Database Management Systems (2023-10-29 - 2023-11-03)
- Software Engineering
- Automatic Testing
- Formal Methods
- Database Management Systems