http://www.dagstuhl.de/17101

March 5 – 10, 2017, Dagstuhl Seminar 17101

Databases on Future Hardware

Organizers

Gustavo Alonso (ETH Zürich, CH)
Michaela Blott (Xilinx – Dublin, IE)
Jens Teubner (TU Dortmund, DE)

Documents

Dagstuhl Report, Volume 7, Issue 3
Motivation text
List of participants
Shared documents
Dagstuhl Seminar schedule [pdf]

Summary

Computing hardware is undergoing radical changes. Forced by physical limitations (mainly heat dissipation problems), systems trend toward massively parallel and heterogeneous designs. New technologies, e.g., for high-speed networking or persistent storage, emerge and open up new opportunities for the design of database systems. This technology push was the main motivation to bring top researchers from different communities - particularly hardware and software - together at a Dagstuhl seminar and have them discuss "Databases on Future Hardware." This report briefly summarizes the discussions that took place during the seminar.

With regard to this technology push, bandwidth; memory and storage technologies; and accelerators (or other forms of specialized computing functionality or instruction sets) were considered during the seminar to be the most pressing topic areas in database design.

But it turned out that the field is also shaped by a strong push from the economy and the market. New types of applications - Machine Learning in particular - as well as the emergence of "compute" as an independent type of resource - e.g., in the form of cloud computing or appliances - can have a strong impact on the viability of a given system design.

Bandwidth; Memory and Storage Technologies

During the seminar, probably the most frequently stated issue in the field was bandwidth - at various places in the overall system stack, such as CPU <--> memory, machine <--> machine (network), and access to secondary storage (e.g., disk, SSD, NVM). Very interestingly, the issue was not only brought up as a key limitation to database performance by the seminar attendees with a software background. Rather, it also became clear that the hardware side, too, is very actively looking at bandwidth. The networking community is working on ways to provide more bandwidth, but also on hooks that allow the software side to make better use of the available bandwidth. On the system architecture side, new interface technologies (e.g., NVLink, available in IBM's POWER8) aim to ease the bandwidth bottleneck.

Bandwidth is usually a problem only between system components; within a component, considerably more bandwidth is available. To illustrate, HMC memories ("hybrid memory cube") provide only 320 GB/s of external bandwidth, but internally run at 512 GB/s per storage partition ("vault"); in a 16-vault configuration, this corresponds to 8 TB/s of aggregate internal bandwidth. This may open up opportunities to build heterogeneous system designs with near-data processing capabilities. HMC memory units could, for instance, contain (limited) processing functionality associated with every storage vault. This way, simple tasks, such as data movement, re-organization, or scanning, could be off-loaded and performed right where the data resides. Similar concepts have been used, e.g., to filter data in the network or to pre-process data near secondary storage.
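
To make the gap concrete, the following back-of-envelope sketch in C uses the figures quoted above (320 GB/s external, 512 GB/s per vault, 16 vaults) to compare how long a full scan would take through the external link versus the aggregate internal bandwidth; the 64 GB table size is a made-up example.

    /* Back-of-envelope comparison of external vs. aggregate internal HMC
       bandwidth, using the figures quoted in the text. Purely illustrative;
       the 64 GB table is hypothetical. */
    #include <stdio.h>

    int main(void)
    {
        const double external_gb_s  = 320.0;   /* link bandwidth toward the host  */
        const double per_vault_gb_s = 512.0;   /* internal bandwidth of one vault */
        const int    vaults         = 16;
        const double table_gb       = 64.0;    /* hypothetical table size         */

        double internal_gb_s = per_vault_gb_s * vaults;   /* 8192 GB/s = 8 TB/s */

        printf("scan through external link: %6.1f ms\n", table_gb / external_gb_s * 1e3);
        printf("scan inside the vaults:     %6.1f ms\n", table_gb / internal_gb_s * 1e3);
        return 0;
    }

The factor of roughly 25 between the two numbers is the headroom that near-data processing tries to exploit.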

In breakout sessions during the seminar, attendees discussed the implications that such system designs may have. Most importantly, the designs will require re-thinking the existing (programming) interfaces. How does the programmer express the off-loaded task? Which types of tasks can be off-loaded? What are the limitations of the near-data processing unit (e.g., which memory areas can it access)? How do the host processor and the processing unit exchange tasks, data, and results? Clearly, a much closer collaboration between the hardware and software sides will be needed to make this route viable.
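
As a thought experiment, the following C sketch shows what a minimal host-side interface for off-loading a predicate scan might look like. All ndp_* names are invented for illustration; no such API is standardized, and the "implementation" simply runs on the host CPU so that the example compiles and runs, standing in for logic that would really execute inside a vault.

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Hypothetical descriptor of an off-loaded task: which memory region the
       near-data unit may access and which predicate it should evaluate. */
    typedef struct {
        const uint32_t *base;    /* column assumed to reside in one vault */
        size_t          count;   /* number of values to scan              */
        uint32_t        bound;   /* predicate: value > bound              */
    } ndp_scan_task;

    /* Stand-in for dispatching the task to a near-data processing unit and
       waiting for its result; here it just runs on the host. */
    static size_t ndp_scan_gt(const ndp_scan_task *t)
    {
        size_t matches = 0;
        for (size_t i = 0; i < t->count; i++)
            if (t->base[i] > t->bound)
                matches++;
        return matches;          /* only the (small) result crosses the link */
    }

    int main(void)
    {
        enum { N = 1 << 20 };
        uint32_t *column = malloc((size_t)N * sizeof *column);
        if (column == NULL)
            return 1;
        for (size_t i = 0; i < N; i++)
            column[i] = (uint32_t)(i * 2654435761u);    /* arbitrary test data */

        ndp_scan_task task = { column, N, UINT32_MAX / 2 };
        printf("qualifying values: %zu of %d\n", ndp_scan_gt(&task), N);

        free(column);
        return 0;
    }

Even such a toy interface makes the open questions tangible: the host must know which memory region the unit may touch, how to describe the task, and how results travel back.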

But new designs may also shake up the commercial market. The traditional hardware market is strongly separated between the memory and logic worlds, with different manufacturers and processes. Breaking up the separation may be a challenge both from a technological and from a business/market point of view.

The group found little time during the seminar to discuss another potential game-changer in the memory/storage space. Companies are about to bring their first non-volatile memory (NVM) components to the market (and, in fact, Intel released its first round of "3D XPoint" products shortly after the seminar). The availability of cheap, high-capacity, byte-addressable, persistent storage technologies will have a profound impact on database software. Discussions during the seminar revolved around the question of whether classical persistent (disk-based) mechanisms or in-memory mechanisms are more appropriate for dealing with the new technology.
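
To illustrate what "byte-addressable, persistent" means for the programming model, the following minimal C sketch assumes an NVM device exposed as a file on a DAX-capable filesystem (the path /mnt/pmem0/counter is made up). A plain store updates persistent state, and durability is enforced by flushing the cache line and fencing; a production system would additionally need MAP_SYNC or msync() semantics depending on the platform.

    #include <fcntl.h>
    #include <stdint.h>
    #include <sys/mman.h>
    #include <unistd.h>
    #include <emmintrin.h>   /* _mm_clflush, _mm_sfence (x86) */

    int main(void)
    {
        /* Hypothetical file on a DAX-mounted NVM device. */
        int fd = open("/mnt/pmem0/counter", O_RDWR | O_CREAT, 0600);
        if (fd < 0 || ftruncate(fd, 4096) != 0)
            return 1;

        uint64_t *counter = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                                 MAP_SHARED, fd, 0);
        if (counter == MAP_FAILED)
            return 1;

        *counter += 1;          /* update persistent state with an ordinary store */
        _mm_clflush(counter);   /* push the dirty cache line toward the NVM       */
        _mm_sfence();           /* order the flush before subsequent stores       */

        munmap(counter, 4096);
        close(fd);
        return 0;
    }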

Accelerators

One way of dealing with the technology trend toward heterogeneity is to enrich general-purpose systems with more specialized processing units, so-called accelerators. Popular incarnations of this idea are graphics processors (GPUs) and field-programmable gate arrays (FPGAs); but there are also co-processing units for floating-point arithmetic, multimedia processing, or network acceleration.

Accelerators may fit well with what was said above; for example, they could be used as near-data processing units. But the challenges mentioned above also apply to many accelerator integration strategies. Specifically, the proper programming interface, but also the role of an accelerator in the software system stack - e.g., sharing it between processes - seem to be yet-unsolved challenges.

The role of accelerators specifically for database systems was also discussed during the seminar. On the one hand, it was mentioned that accelerators should be used to accelerate functionality outside the database's core tasks, because existing hardware and software are actually quite good at handling typical database workloads. On the other hand, attendees reported that many of the non-core-database tasks, Machine Learning in particular, demand very high flexibility that is hard to provide with specialized hardware.

New Applications / Machine Learning

Databases are the classical tool for dealing with high volumes of data. With the success of Machine Learning in many fields of computing, the question arises how databases and Machine Learning applications should relate to one another, and to what extent the database community should embrace ML functionality in its system designs.

Some seminar attendees did, in fact, give examples of very impressive and successful systems that apply ideas from database co-processing to Machine Learning scenarios. In a breakout session on the topic, it was nevertheless concluded that the two worlds should continue to be treated separately in the future.

A key challenge around Machine Learning seems to be the very high expectations with regard to the flexibility of the system. ML tasks are often described in high-level languages (such as R or Python) and demand expressiveness that goes far beyond the capabilities of efficient database execution engines. Attempts to extend these engines with tailor-made ML operators were not very well received, because even the new operators were too restrictive for ML users.

Economic/Market Considerations

Somewhat unexpectedly, it became clear during the seminar that the interplay of databases and hardware is not only a question of technology. Rather, examples from the past and present demonstrate that even a technologically superior database solution cannot survive today without a clear business case.

The concept of cloud computing plays a particularly important role in these considerations. From a business perspective, compute resources - including database functionality - have become a commodity. Companies move their workloads increasingly toward cloud-based systems, raising the question whether the future of databases is also in the cloud.

A similar line of argument leads to the concept of database appliances. Appliances package database functionality in a closed box, which allows (a) treating the service as a commodity (business aspect) and (b) tailoring the appliance's hardware and software specifically to the task at hand, with the promise of maximum performance (technology aspect).

And, in fact, both concepts - cloud computing and appliances - may go well together. Cloud setups make it possible to control the entire hardware and software stack; large installations may provide the critical mass to include tailor-made (database) functionality within the cloud as well.

License
  Creative Commons BY 3.0 Unported license
  Jens Teubner

Classification

  • Data Bases / Information Retrieval
  • Hardware
  • Networks

Keywords

  • Computer Architecture
  • Hardware Support for Databases
  • Non-Volatile Memory
