Dagstuhl Seminar 17101
Databases on Future Hardware
( Mar 05 – Mar 10, 2017 )
- Gustavo Alonso (ETH Zürich, CH)
- Michaela Blott (Xilinx - Dublin, IE)
- Jens Teubner (TU Dortmund, DE)
- Annette Beyer (for administrative matters)
- Adopting OpenCAPI for High Bandwidth Database Accelerators : article in Third International Workshop on Heterogeneous High-performance Reconfigurable Computing (H2RC'17), At Denver, USA - Fang, Jian; Mulder, Yvo T. B.; Huang, Kangli; Qiao, Yang; Zeng, Xianwei; Lee, Jinho; Hidders, Jan; Hofstee, H. Peter - ResearchGate, 2017. - 1 p..
- doppioDB : A hardware accelerated database : article in 2017 27th International Conference on Field Programmable Logic and Applications (FPL) - Sidler, David; Owaida, Muhsen; Istvan, Zsolt; Kara, Kaan; Alonso, - Los Alamitos : IEEE, 2017. - 1 p..
- Reproducible Floating-Point Aggregation in RDBMSs - Müller, Ingo; Arteaga, Andrea; Hoefler, Torsten; Alonso, Gustavo - Cornell University : arXiv.org, 2018. - 16 pp..
- Scalable inference of decision tree ensembles : Flexible design for CPU-FPGA platforms : article in 2017 27th International Conference on Field Programmable Logic and Applications (FPL) - Owaida, Muhsen; Zhang, Hantian; Zhang, Ce; Alonso, Gustavo - Los Alamitos : IEEE, 2018. - 8 pp..
In the late 1990s, some researchers realized that database management systems were particularly affected by the ongoing hardware evolution. More than any other class of software, databases suffered from the widening gap between memory and CPU performance - just when main memories began to grow large enough to keep meaningful data sets entirely in RAM. Yet it took another 15 years until that insight found its way into actual systems, such as SAP HANA, Microsoft's Apollo/Hekaton, IBM DB2 BLU, or Oracle Database In-Memory.
Right now, hardware technology is again at a crossroads that will disrupt the way we build and use database systems. Heterogeneous system architectures will replace the prevalent multi-core designs and leverage the dark silicon principle to combat power limitations. Non-volatile memories bring persistence at the speed of current DRAM chips. High-speed interconnects allow for parallelism at unprecedented scales - but also force software to deal with distributed systems characteristics (e.g., locality, unreliability).
It is not yet clear precisely what the new systems are going to look like: hardware makers are still figuring out which configurations will yield the best price/performance trade-offs. Nor is it clear what redesigned software could look like to take advantage of the new hardware.
The hard- and software disciplines have evolved as mostly independent communities in recent years. The key goal of this Dagstuhl seminar is to bring them back together. We plan to have intensive and open-minded discussions with representatives from the relevant areas. During the 5-day workshop, hardware architects, system designers, experts in query processing and transaction management, as well as experts in operating systems and networking, will discuss the challenges and opportunities involved, so that hard- and software can evolve together rather than only individually and independently.
The seminar will follow Dagstuhl's open discussion format. Possible topics for discussion include:
- Shared, cache-coherent memory vs. “distributed system in a box”?
- What are the sweet spots to balance the characteristics of novel hardware components (e.g., latency, capacity/cost, reliability for non-volatile memories; bandwidth and latency for interconnect networks)?
- Co-Processors (“accelerators”) - Which role will they play in tomorrow's data-intensive systems? What is the best way to integrate them into the rest of the hard- and software architecture?
- The characteristics of modern storage technologies are often far away from the classical assumptions of database systems: non-volatile memories offer persistent, yet byte-addressable memory; network storage technologies might allow for new system architectures; etc. What are the consequences on indexing, data access, or recovery mechanisms?
- Networks have evolved from a very slow communication medium to powerful, very high-speed, and often even intelligent interconnects (e.g., InfiniBand, RDMA). How can we embrace these technologies in the end-to-end system?
Computing hardware is undergoing radical changes. Forced by physical limitations (mainly heat dissipation problems), systems trend toward massively parallel and heterogeneous designs. New technologies, e.g., for high-speed networking or persistent storage, emerge and open up new opportunities for the design of database systems. This push by technology was the main motivation to bring top researchers from different communities - particularly hard- and software - together at a Dagstuhl seminar and have them discuss "Databases on Future Hardware." This report briefly summarizes the discussions that took place during the seminar.
With regard to this technology push, bandwidth, memory and storage technologies, and accelerators (or other forms of specialized computing functionality or instruction sets) were considered the most pressing topic areas in database design during the seminar.
But it turned out that the field is also influenced by a strong push from the economy and the market. New types of applications - in particular Machine Learning - as well as the emergence of "compute" as an independent type of resource - e.g., in the form of cloud computing or appliances - can have a strong impact on the viability of a given system design.
Bandwidth; Memory and Storage Technologies
During the seminar, probably the most frequently stated issue in the field was bandwidth - at various places in the overall system stack, such as CPU <--> memory; machine <--> machine (network); access to secondary storage (e.g., disk, SSD, NVM). Interestingly, the issue was not only brought up as a key limitation to database performance by the seminar attendees with a software background. Rather, it also became clear that the hardware side, too, is very actively looking at bandwidth. The networking community is working on ways to provide more bandwidth, but also to provide hooks that allow the software side to make better use of the available bandwidth. On the system architecture side, new interface technologies (e.g., NVLink, available in IBM's POWER8) aim to ease the bandwidth bottleneck.
Bandwidth is usually a problem only between system components. To illustrate, HMC ("hybrid memory cube") memories provide only 320 GB/s of external bandwidth, but internally run at 512 GB/s per vault; in a 16-vault configuration, this corresponds to 8 TB/s of internal bandwidth. This may open up opportunities to build heterogeneous system designs with near-data processing capabilities. HMC memory units could, for instance, contain (limited) processing functionality associated with every storage vault. This way, simple tasks, such as data movement, re-organization, or scanning, could be off-loaded and performed right where the data resides. Similar concepts have been used, e.g., to filter data in the network or to pre-process data near secondary storage.
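The arithmetic behind these figures can be checked with a short sketch. The three bandwidth constants are the ones quoted above; the 64 GB column size and the helper names are assumptions made purely for illustration, and the scan times are idealized (pure streaming, no other overheads).

```python
# Figures quoted in the text for HMC ("hybrid memory cube") memories.
EXTERNAL_GB_PER_S = 320       # external bandwidth
INTERNAL_GB_PER_VAULT = 512   # internal bandwidth per vault
VAULTS = 16                   # vault count in the configuration discussed


def internal_bandwidth_gb_per_s(vaults=VAULTS, per_vault=INTERNAL_GB_PER_VAULT):
    """Aggregate internal bandwidth across all vaults, in GB/s."""
    return vaults * per_vault


def scan_seconds(data_gb, bandwidth_gb_per_s):
    """Idealized time to stream data_gb once at the given bandwidth."""
    return data_gb / bandwidth_gb_per_s


internal = internal_bandwidth_gb_per_s()   # 16 * 512 GB/s, the "8 TB/s" in the text
ratio = internal / EXTERNAL_GB_PER_S       # how much faster the inside is

# Assumed example: scanning a 64 GB column from outside vs. inside the memory.
COLUMN_GB = 64
host_scan = scan_seconds(COLUMN_GB, EXTERNAL_GB_PER_S)
near_data_scan = scan_seconds(COLUMN_GB, internal)

print(f"internal bandwidth: {internal} GB/s ({ratio:.1f}x external)")
print(f"host-side scan: {host_scan:.4f} s, near-data scan: {near_data_scan:.4f} s")
```

Under these assumptions the gap between internal and external bandwidth is more than an order of magnitude, which is the headroom a near-data scan could exploit in the best case.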
In breakout sessions during the seminar, attendees discussed the implications that such system designs may have. Most importantly, the designs will require re-thinking the existing (programming) interfaces. How does the programmer express the off-loaded task? Which types of tasks can be off-loaded? What are the limitations of the near-data processing unit (e.g., which memory areas can it access)? How do host processor and processing unit exchange tasks, data, and results? Clearly, a much closer collaboration will be needed between the hard- and software sides to make this route viable.
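To make these interface questions concrete, here is a minimal toy sketch of one possible shape such an interface could take. It is not a design proposed at the seminar: the names `ScanTask`, `NearDataUnit`, and `offloaded_scan` are hypothetical, and the model assumes the simplest case, in which a task is a filter predicate and each unit can access only its own vault.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ScanTask:
    """A task the host hands to a near-data unit (hypothetical format)."""
    predicate: Callable[[int], bool]  # how the programmer expresses the work


class NearDataUnit:
    """Toy model of a processing unit attached to one storage vault."""

    def __init__(self, vault_data: List[int]):
        # Limitation modeled here: the unit sees only its own vault.
        self._vault = vault_data

    def run(self, task: ScanTask) -> List[int]:
        # Only matching values travel back to the host.
        return [v for v in self._vault if task.predicate(v)]


def offloaded_scan(units: List[NearDataUnit], task: ScanTask) -> List[int]:
    """Host side: send the same task to every unit and merge the results."""
    results: List[int] = []
    for unit in units:
        results.extend(unit.run(task))
    return results


# Three vaults with assumed sample data; the host asks for values above 40.
units = [NearDataUnit([1, 50, 7]), NearDataUnit([99, 3]), NearDataUnit([42])]
matches = offloaded_scan(units, ScanTask(predicate=lambda v: v > 40))
print(matches)
```

Even this toy version forces answers to the questions above: the task format is fixed to a predicate, the unit's memory access is restricted to its vault, and results return to the host as a filtered list.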
But new designs may also shake up the commercial market. The traditional hardware market is strongly separated between the memory and logic worlds, with different manufacturers and processes. Breaking up the separation may be a challenge both from a technological and from a business/market point of view.
The group found little time during the seminar to discuss another potential game-changer in the memory/storage space. Companies are about to bring their first non-volatile memory (NVM) components to the market (and, in fact, Intel released its first round of "3D XPoint" products shortly after the seminar). The availability of cheap, high-capacity, byte-addressable, persistent storage technologies will have a profound impact on database software. Discussions during the seminar revolved around the question of whether classical persistent (disk-based) mechanisms or in-memory mechanisms are more appropriate to deal with the new technology.
One way of dealing with the technology trend toward heterogeneity is to enrich general-purpose systems with more specialized processing units (accelerators). Popular incarnations of this idea are graphics processors (GPUs) or field-programmable gate arrays (FPGAs); but there are also co-processing units for floating-point arithmetic, multimedia processing, or network acceleration.
Accelerators may fit well with what was said above. For example, they could be used as near-data processing units. But the challenges mentioned above also apply to many accelerator integration strategies. Specifically, the proper programming interface, but also the role of an accelerator in the software system stack - e.g., sharing it between processes - seem to be as yet unsolved challenges.
The role of accelerators specifically for database systems was also discussed during the seminar. It was mentioned, on the one hand, that accelerators should be used to accelerate functionality outside the database's core tasks, because existing hard- and software is actually quite good at handling typical database tasks. On the other hand, attendees reported that many of the non-core-database tasks, Machine Learning in particular, demand a very high degree of flexibility that is very hard to provide with specialized hardware.
New Applications / Machine Learning
Databases are the classical tool for dealing with high volumes of data. With the success of Machine Learning in many fields of computing, the question arises how databases and Machine Learning applications should relate to one another, and to what extent the database community should embrace ML functionality in their system designs.
Some of the seminar attendees have, in fact, given examples of very impressive and successful systems that apply ideas from database co-processing to Machine Learning scenarios. Nevertheless, a breakout session on the topic concluded that the two worlds should continue to be treated separately in the future.
A key challenge around Machine Learning seems to be the very high expectations with regard to the flexibility of the system. ML tasks are often described in high-level languages (such as R or Python) and demand expressiveness that goes far beyond the capabilities of efficient database execution engines. Attempts to extend these engines with tailor-made ML operators were not very well received, because even the new operators were too restrictive for ML users.
Somewhat unexpectedly, during the seminar it became clear that the interplay of databases and hardware is not only a question of technology. Rather, examples from the past and present demonstrate that even a technologically superior database solution cannot survive today without a clear business case.
The concept of cloud computing plays a particularly important role in these considerations. From a business perspective, compute resources - including database functionality - have become a commodity. Companies increasingly move their workloads toward cloud-based systems, raising the question of whether the future of databases is also in the cloud.
A similar line of argument leads to the concept of database appliances. Appliances package database functionality in a closed box, making it possible (a) to treat the service as a commodity (business aspect) and (b) to tailor hard- and software of the appliance specifically to the task at hand, with the promise of maximum performance (technology aspect).
And, in fact, both concepts - cloud computing and appliances - may go well together. Cloud setups make it possible to control the entire hard- and software stack; large installations may provide the critical mass to include tailor-made (database) functionality within the cloud as well.
- Anastasia Ailamaki (EPFL - Lausanne, CH)
- Gustavo Alonso (ETH Zürich, CH)
- Carsten Binnig (Brown University - Providence, US)
- Spyros Blanas (Ohio State University - Columbus, US)
- Michaela Blott (Xilinx - Dublin, IE)
- Alexander Böhm (SAP SE - Walldorf, DE)
- Peter A. Boncz (CWI - Amsterdam, NL)
- Sebastian Breß (DFKI - Berlin, DE)
- Markus Dreseler (Hasso-Plattner-Institut - Potsdam, DE)
- Ken Eguro (Microsoft Research - Redmond, US)
- Babak Falsafi (EPFL - Lausanne, CH)
- Henning Funke (TU Dortmund, DE)
- Goetz Graefe (Google - Madison, US)
- Christoph Hagleitner (IBM Research Zurich, CH)
- Peter Hofstee (IBM Research Lab. - Austin & TU Delft)
- Stratos Idreos (Harvard University - Cambridge, US)
- Zsolt Istvan (ETH Zürich, CH)
- Viktor Leis (TU München, DE)
- Eliezer Levy (Huawei Tel Aviv Research Center - Hod Hasharon, IL)
- Stefan Manegold (CWI - Amsterdam, NL)
- Andrew W. Moore (University of Cambridge, GB)
- Ingo Müller (ETH Zürich, CH)
- Onur Mutlu (Carnegie Mellon University - Pittsburgh, US)
- Thomas Neumann (TU München, DE)
- Gilles Pokam (Intel - Santa Clara, US)
- Kenneth Ross (Columbia University - New York, US)
- Kai-Uwe Sattler (TU Ilmenau, DE)
- Eric Sedlar (Oracle Labs - Redwood Shores, US)
- Margo Seltzer (Harvard University - Cambridge, US)
- Jürgen Teich (Friedrich-Alexander-Universität Erlangen-Nürnberg, DE)
- Jens Teubner (TU Dortmund, DE)
- Pinar Tözün (IBM Almaden Center - San Jose, US)
- Annett Ungethüm (TU Dresden, DE)
- Stratis D. Viglas (Google - Madison, US)
- Thomas Willhalm (Intel Deutschland GmbH - Feldkirchen, DE)
- Ce Zhang (ETH Zürich, CH)
- Daniel Ziener (TU Hamburg-Harburg, DE)
- data bases / information retrieval
- Computer Architecture
- Hardware Support for Databases