https://www.dagstuhl.de/19282
July 7 – 12, 2019, Dagstuhl Seminar 19282
Data Series Management
Organizers
Anthony Bagnall (University of East Anglia – Norwich, GB)
Richard L. Cole (Tableau Software – Palo Alto, US)
Themis Palpanas (Paris Descartes University, FR)
Konstantinos Zoumpatianos (Harvard University – Cambridge, US)
Documents
Dagstuhl Report, Volume 9, Issue 7
Motivation text
List of participants
Dagstuhl Seminar program [pdf]
Press Room
- What’s like going at Schloss Dagstuhl?
Blog entry by Michele Dallachiesa on Stratosphere LABS, July 24, 2019
Summary
Users across many domains are now showing strong interest in data series (a.k.a. time series) management systems. It is not unusual for industrial applications that produce data series to involve numbers of sequences (or subsequences) in the order of billions. As a result, analysts are unable to handle the vast amounts of data series that they have to filter and process. Consider, for instance, that in the health industry, neuroscientists reduce each of their 3,000-point-long sequences to just the global average for several of their analysis tasks, because they cannot handle the size of the full sequences. Moreover, in the quest towards personalized medicine, scientists are expected to collect around 2-40 ExaBytes of DNA sequence data by 2025. In engineering, there is an abundance of sequential data. Consider, for example, that each engine of a Boeing jet generates 10 TeraBytes of data every 30 minutes, while domains such as energy (e.g., wind turbine monitoring), data center and network monitoring continuously produce measurements, forcing organizations to develop their own custom solutions (e.g., Facebook Gorilla).
The goal of this seminar was to enable researchers and practitioners to exchange ideas on the topic of data series management, working towards a definition of the principles necessary for the design of a big sequence management system and of the corresponding open research directions.
The seminar focused on the following key topics related to data series management:
Applications in multiple domains: We examined applications and requirements originating from various fields, including astrophysics, neuroscience, engineering, and operations management. The goal was to allow scientists and practitioners to exchange ideas, foster collaborations, and develop a common terminology.
Data series storage and access patterns: We described some of the existing (academic and commercial) systems for managing data series, examined their differences, and commented on their evolution over time. We identified their shortcomings, debated the best ways to lay out data series on disk and in memory in order to optimize data series queries, and examined how to integrate domain-specific summarizations/indexes and compression schemes into existing systems.
Query optimization: One of the most important open problems in data series management is query optimization. In particular, there has been no work on estimating the hardness/selectivity of data series similarity search queries, even though such estimates are of paramount importance for effective access path selection. During the seminar, we discussed the current work on the topic and identified promising future research directions.
Machine learning and data mining for data series: Recent developments in deep neural network architectures have also sparked intense interest in examining the interactions between machine learning algorithms and data series management. We discussed machine learning from two perspectives: first, how machine learning techniques can be applied to data series analysis tasks, as well as to tuning data series management systems; second, how data series management systems can contribute towards the scalability of machine learning pipelines.
Visualization for data series exploration: There are several research problems at the intersection of visualization and data series management. Existing data series visualization and human interaction techniques only consider very small datasets, yet they can play a significant role in the tasks of similarity search, analysis, and exploration of very large data series collections. We discussed open research problems along these directions, related to both the frontend and the backend.


Classification
- Data Bases / Information Retrieval
- Data Structures / Algorithms / Complexity
Keywords
- Sequences
- Time series
- Data series analytics
- Machine learning
- Data systems