Dagstuhl Seminar 19282: Data Series Management

(Jul 07 – Jul 12, 2019)

Permalink

Please use the following short url to reference this page: https://www.dagstuhl.de/19282

Organizers

Anthony Bagnall (University of East Anglia - Norwich, GB)
Richard L. Cole (Tableau Software - Palo Alto, US)
Themis Palpanas (Paris Descartes University, FR)
Konstantinos Zoumpatianos (Harvard University - Cambridge, US)

Contact

Shida Kunz (for scientific matters)
Susanne Bach-Bernhard (for administrative matters)

Publications

Data Series Management (Dagstuhl Seminar 19282). Anthony Bagnall, Richard L. Cole, Themis Palpanas, and Kostas Zoumpatianos. In Dagstuhl Reports, Volume 9, Issue 7, pp. 24-39, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2019)

Motivation

We now witness very strong interest from users across different domains in data series (a.k.a. time series) management. It is not unusual for industrial applications that produce data series to involve numbers of sequences (or subsequences) in the order of billions (i.e., multiple TBs). As a result, analysts are unable to handle the vast amounts of data series that they have to manage and process. The goal of this Dagstuhl Seminar is to enable researchers and practitioners to exchange ideas and foster collaborations on the topic of data series management, and to identify the corresponding open research directions. The main questions to be addressed are the following: i) What are the data series management needs across various domains, and what are the shortcomings of current systems? ii) How can we use machine learning to optimize our current data systems, and how can these systems help in machine learning pipelines? iii) How can visual analytics assist the process of analyzing big data series collections?

The seminar will focus on the following key topics related to data series management:

Data series storage and access patterns: We will describe some of the existing (academic and commercial) systems for managing data series, examine their differences, and comment on their evolution over time. We will try to answer the following questions: What are their shortcomings? What are the best ways to lay out data series on disk and in memory to optimize data series queries? How can we integrate domain-specific summarizations/indexes and compression schemes in existing systems?
Query optimization: One of the most important open problems in data series management is that of query optimization. In particular, there has been no work on estimating the hardness/selectivity of data series similarity search queries, even though this is of paramount importance for effective access path selection. During the seminar we will discuss the current work on the topic, and examine future research directions.
Machine learning and data mining for data series: Recent developments in deep neural network architectures have also sparked intense interest in examining the interactions between machine learning algorithms and data series management. We will discuss machine learning from two perspectives. First, we will discuss machine learning techniques for data series analysis tasks, as well as for tuning data series management systems. Second, we will discuss how data series management systems can help in the scalability of machine learning pipelines.
Visualization for data series exploration: There are several research problems at the intersection of visualization and data series management. Existing data series visualization and human interaction techniques only consider very small datasets, yet they can play a significant role in the tasks of similarity search, analysis, and exploration of very large data series collections. We will discuss promising directions for addressing these problems, related to both the frontend and the backend.
Applications in multiple domains: We will discuss applications and requirements originating from various fields, including astrophysics, neuroscience, engineering, and operations management. The goal will be to allow scientists and practitioners to exchange ideas, foster collaborations, and develop a common terminology.

Creative Commons BY 3.0 DE

Anthony Bagnall, Richard L. Cole, Themis Palpanas, and Konstantinos Zoumpatianos

Press Room

Press Reviews

What’s like going at Schloss Dagstuhl?
Blog entry by Michele Dallachiesa on Stratosphere LABS, July 24, 2019

Summary

We now witness very strong interest from users across different domains in data series (a.k.a. time series) management systems. It is not unusual for industrial applications that produce data series to involve numbers of sequences (or subsequences) in the order of billions. As a result, analysts are unable to handle the vast amounts of data series that they have to filter and process. Consider, for instance, that in the health industry, for several of their analysis tasks, neuroscientists are reducing each of their 3,000-point-long sequences to just the global average, because they cannot handle the size of the full sequences. Moreover, in the quest towards personalized medicine, scientists are expected to collect around 2-40 ExaBytes of DNA sequence data by 2025. In engineering, there is an abundance of sequential data. Consider, for example, that each engine of a Boeing jet generates 10 TeraBytes of data every 30 minutes, while domains such as energy (e.g., wind turbine monitoring), data center monitoring, and network monitoring continuously produce measurements, forcing organizations to develop their own custom solutions (e.g., Facebook Gorilla).

The goal of this seminar was to enable researchers and practitioners to exchange ideas on the topic of data series management, working towards the definition of the principles necessary for the design of a big sequence management system, and to identify the corresponding open research directions.

The seminar focused on the following key topics related to data series management:

Applications in multiple domains: We examined applications and requirements originating from various fields, including astrophysics, neuroscience, engineering, and operations management. The goal was to allow scientists and practitioners to exchange ideas, foster collaborations, and develop a common terminology.

Data series storage and access patterns: We described some of the existing (academic and commercial) systems for managing data series, examined their differences, and commented on their evolution over time. We identified their shortcomings, debated the best ways to lay out data series on disk and in memory in order to optimize data series queries, and examined how to integrate domain-specific summarizations/indexes and compression schemes in existing systems.

Query optimization: One of the most important open problems in data series management is that of query optimization. In particular, there has been no work on estimating the hardness/selectivity of data series similarity search queries, even though this is of paramount importance for effective access path selection. During the seminar we discussed the current work on the topic, and identified promising future research directions.

Machine learning and data mining for data series: Recent developments in deep neural network architectures have also sparked intense interest in examining the interactions between machine learning algorithms and data series management. We discussed machine learning from two perspectives. First, how machine learning techniques can be applied to data series analysis tasks, as well as to tuning data series management systems. Second, how data series management systems can contribute towards the scalability of machine learning pipelines.

Visualization for data series exploration: There are several research problems at the intersection of visualization and data series management. Existing data series visualization and human interaction techniques only consider very small datasets, yet they can play a significant role in the tasks of similarity search, analysis, and exploration of very large data series collections. We discussed open research problems along these directions, related to both the frontend and the backend.

Creative Commons BY 3.0 Unported license

Anthony Bagnall, Richard L. Cole, Themis Palpanas, and Konstantinos Zoumpatianos

Participants

Azza Abouzied (New York University - Abu Dhabi, AE)
Anthony Bagnall (University of East Anglia - Norwich, GB)
Anastasia Bezerianos (INRIA Saclay - Orsay, FR)
Paul Boniol (Paris Descartes University, FR)
Richard L. Cole (Tableau Software - Palo Alto, US)
Michele Dallachiesa (Minodes GmbH - Berlin, DE)
Karima Echihabi (ENSIAS-Mohammed V University - Rabat, MA)
Jean-Daniel Fekete (INRIA Saclay - Orsay, FR)
Germain Forestier (University of Mulhouse, FR)
Pierre Gaillard (CEA de Saclay - Gif-sur-Yvette, FR)
Anna Gogolou (INRIA Saclay - Orsay, FR)
Søren Kejser Jensen (Aalborg University, DK)
Mourad Khayati (University of Fribourg, CH)
Alessandro Longo (University of Rome III, IT)
Ammar Mechouche (Airbus Helicopters - Marignane, FR)
Abdullah Mueen (University of New Mexico, US)
Rodica Neamtu (Worcester Polytechnic Institute, US)
Themis Palpanas (Paris Descartes University, FR)
John Paparrizos (University of Chicago, US)
Patrick Schäfer (HU Berlin, DE)
Dennis Shasha (New York University, US)
Nesime Tatbul (Intel Labs & MIT - Cambridge, US)
Peng Wang (Fudan University - Shanghai, CN)
Richard Wesley (Tableau Software - Seattle, US)
Konstantinos Zoumpatianos (Harvard University - Cambridge, US)

Classification

data bases / information retrieval
data structures / algorithms / complexity

Keywords

sequences
time series
data series analytics
machine learning
data systems
