12.11.17 - 17.11.17, Seminar 17461

Connecting Visualization and Data Management Research

The following text appeared on our web pages prior to the seminar, and was included as part of the invitation.


What prevents analysts from acquiring wisdom from data sources? To use data to better understand the world and act upon it, we need to understand both the computational and the human-centric aspects of data-intensive work. In this Dagstuhl Seminar, we will establish the foundations for the next generation of data management and visualization systems by bringing together these two largely independent communities. While exploratory data analysis (EDA) has been a pillar of data science for decades, maintaining interactivity during EDA has become difficult as data size and complexity continue to grow. Modern-day statistical systems assume that all data fit into memory in order to support interactivity. However, when faced with large amounts of data, few techniques can support EDA fluidly. During this process, interactivity is critical: if each operation takes hours or even minutes to finish, analysts lose track of their thought process. Bad analyses cause bad interpretations, bad actions, and bad policies.

As data scale and complexity increase, the novel solutions that will ultimately enable interactive, large-scale EDA will have to come from truly interdisciplinary and international work. Today, database researchers can store and query massive amounts of data, using methods for distributed, streaming, and approximate computation. Data mining techniques provide ways to discover unexpected patterns and to automate and scale well-defined analysis procedures. Recent systems research has looked at how to develop novel database system architectures that support the iterative, optimization-oriented workloads of data-intensive algorithms. Of course, both the inputs and the outputs of these systems are ultimately driven by people, in support of analysis tasks. The life cycle of data involves an iterative, interactive process of determining which questions to ask, which data to analyze, which features and models are appropriate, and how to interpret the results. To achieve better analysis outcomes, data processing systems require improved interfaces that account for the strengths and limitations of human perception and cognition. Meanwhile, to keep up with the rising tide of data, interactive visualization tools need to integrate more techniques from databases and machine learning.

By bringing together the two disparate communities, we will lay the foundations for the next generation of data (management, mining, retrieval) and interactive visualization systems. Isolated computational breakthroughs will remain locked behind inadequate interfaces, while improvements in how users experience data analysis will never scale to the volume of present-day datasets. Only by working together can these two communities realize their vision of empowering people to use data to understand and improve the world. The main goal of this seminar is to bring together researchers from the data management community and the interactive visualization community to address the challenge of envisioning and developing the next generation of data systems that can support the cognitive, perceptual, and analytical needs of human analysts. Few existing systems can truly do so at scale, and with the explosive growth in data size and complexity, it is more important than ever to gather researchers from these different disciplines to design a research agenda that can meet the demands of the future. Specifically, we aim to:

  1. Formulate a research agenda around the challenge of reducing latency in interactive data systems. For example, develop novel pre-aggregation strategies that take into account the particular constraints and strengths of human perceptual systems; this will enable at-scale human-centric database indices, human-centric statistical analysis environments, and so on.
  2. Focus on specific theoretical and practical problems that need to be solved in order to enable human-centric, large-scale data exploration.
  3. Organize special issues in leading journals such as IEEE CG&A and ACM TiiS to disseminate the research agenda and the research outcomes from this community.

Creative Commons BY 3.0 DE
Remco Chang, Jean-Daniel Fekete, Juliana Freire, and Carlos E. Scheidegger