November 11 – 16, 2018, Dagstuhl Seminar 18462

Provenance and Logging for Sense Making


Jean-Daniel Fekete (INRIA Saclay – Orsay, FR)
T. J. Jankun-Kelly (Mississippi State University, US)
Melanie Tory (Tableau Software – Palo Alto, US)
Kai Xu (Middlesex University – London, GB)



Dagstuhl Report, Volume 8, Issue 11


Sense making is one of the biggest challenges in data analysis faced by both industry and the research community. It involves understanding the data and uncovering its model, generating hypotheses, selecting analysis methods, creating novel solutions, designing evaluations, and applying critical thinking and learning wherever needed. Recently, many techniques and software tools have become available to address the challenges of so-called "Big Data". However, these mostly target lower-level sense making tasks such as storage and search. There is limited support for the higher-level sense making tasks mentioned earlier. As a result, these tasks are often performed manually, and limited human cognitive capacity becomes the bottleneck, negatively impacting data analysis and decision making. This applies to both industry and academia. Scientific research is a sense making process as well: it includes all the sense making tasks mentioned earlier, with an emphasis on the generation of novel solutions. As in data analysis, most of these tasks are conducted manually, which considerably limits the progress of scientific discovery.

Visual Analytics is a fast-growing field that specifically targets sense making [6]. It achieves this by integrating interactive visualization with data analytics such as Machine Learning. It follows a human-centered principle: instead of replacing human thinking and expertise with algorithms and models, it enables the two to work together to achieve the best sense making result. Fast progress has been made in the last decade or so, which is evidenced by the publications in the Visual Analytics conferences such as IEEE VAST (part of IEEE VIS) and the increasing popularity of visual approaches in many other fields such as Machine Learning, Information Retrieval, and Databases.

One recent advance in Visual Analytics research is the capture, visualization, and analysis of provenance information. Provenance is the history and context of sense making, including the “7W” (Who, When, What, Why, Where, Which, and HoW) of the data used and the users’ critical thinking process. The concept of provenance is not entirely new. In 1996, Shneiderman recognized the importance of provenance by classifying history as one of the seven fundamental tasks in data visualization [4]. History allows users to review previous actions during visual exploration, which is typically long and complex. Provenance can provide an overview of what has been examined and reveal gaps, such as data or solutions that have not yet been explored. Provenance can also support collaborative sense making and communication by sharing the rich context of what others have accomplished [7].

The topic of provenance has been studied in many other fields, such as Human-Computer Interaction (HCI), WWW, Database, and Reproducible Science. The HCI research community relies heavily on user information, gathered through logging and observation, in its studies. These closely relate to provenance and share the common goal of making sense of user behavior and thinking. Collaboration between the two fields can potentially create novel solutions for some long-standing research challenges. For instance, it has been shown that provenance information can be used to semi-automate part of the qualitative analysis of user evaluation data [3], which is notoriously time-consuming.

The WWW and Database research communities have been actively working on provenance for the last decade or so, with a particular focus on tracking data collection and processing. This has led to the recent publication of the W3C reference model on provenance. An important part of these efforts is to make sense of the source and quality of the data and the analyses based on them, which has a significant impact on their uncertainty and trustworthiness [1]. Similarly, there is a fast-growing Reproducible Science community, whose interest in provenance is “improving the reliability and efficiency of scientific research ... increase the credibility of the published scientific literature and accelerate discovery” [2].

There is a trend of cross-community collaboration on provenance-related research, which has led to some exciting outcomes such as the work integrating visualization with reproducible science [5, 8]. However, there are still many challenging research questions and many provenance-related research efforts remain disconnected. This seminar brought together researchers from the diverse fields that relate to provenance. Shared challenges were identified and progress has been made towards developing novel solutions.

The main research question that this seminar aimed to address was: How can provenance information be collected, analyzed, and summarized to support the design and evaluation of novel techniques for sense making across related fields? The week-long seminar started with a day of self-introductions, lightning talks, and research topic brainstorming. The self-introductions allowed attendees to get to know each other better, and the lightning talks covered the latest work in the research fields related to provenance. Each participant proposed several research questions, which were then collated and voted on to form the breakout groups. The following are the research areas chosen by the participants:

  • Storytelling and narrative;
  • Provenance standard and system integration;
  • Task abstraction for provenance analysis;
  • Machine learning and provenance;
  • User modeling and intent.

The rest of the week was devoted to breakout sessions, and each participant had the option to change groups halfway through. The seminar finished with a presentation from each group and discussions on the next steps to continue the collaboration. Many interesting problems were identified, and progress was made towards new solutions. Please refer to the rest of the report for details on the identified research questions and the progress made by the end of the week.


  1. Melanie Herschel and Marcel Hlawatsch. Provenance: On and Behind the Screens. In Proceedings of the 2016 International Conference on Management of Data, SIGMOD ’16, pages 2213–2217, New York, NY, USA, 2016. ACM.
  2. Marcus R. Munafò, Brian A. Nosek, Dorothy V. M. Bishop, Katherine S. Button, Christopher D. Chambers, Nathalie Percie du Sert, Uri Simonsohn, Eric-Jan Wagenmakers, Jennifer J. Ware, and John P. A. Ioannidis. A manifesto for reproducible science. Nature Human Behaviour, 1(1):0021, January 2017.
  3. P. H. Nguyen, K. Xu, A. Wheat, B. L. W. Wong, S. Attfield, and B. Fields. SensePath: Understanding the Sensemaking Process Through Analytic Provenance. IEEE Transactions on Visualization and Computer Graphics, 22(1):41–50, January 2016.
  4. Ben Shneiderman. The eyes have it: A task by data type taxonomy for information visualizations. In Proceedings of the 1996 IEEE Symposium on Visual Languages, pages 336–, Washington, DC, USA, 1996. IEEE Computer Society.
  5. C. T. Silva, J. Freire, and S. P. Callahan. Provenance for Visualizations: Reproducibility and Beyond. Computing in Science & Engineering, 9(5):82–89, September 2007.
  6. James J. Thomas and Kristin A. Cook, editors. Illuminating the Path: The Research and Development Agenda for Visual Analytics. National Visualization and Analytics Centre, 2005.
  7. Kai Xu, Simon Attfield, T. J. Jankun-Kelly, Ashley Wheat, Phong H. Nguyen, and Nallini Selvaraj. Analytic provenance for sensemaking: A research agenda. IEEE Computer Graphics and Applications, 35(3):56–64, 2015.
  8. Zheguang Zhao, Lorenzo De Stefani, Emanuel Zgraggen, Carsten Binnig, Eli Upfal, and Tim Kraska. Controlling false discoveries during interactive data exploration. In Proceedings of the 2017 ACM International Conference on Management of Data, SIGMOD’17, pages 527–540, New York, NY, USA, 2017. ACM.
Summary text license
  Creative Commons BY 3.0 Unported license
  Jean-Daniel Fekete, T. J. Jankun-Kelly, Melanie Tory, and Kai Xu


  • Computer Graphics / Computer Vision
  • Data Bases / Information Retrieval
  • Society / Human-computer Interaction


  • Visual Analytics
  • Provenance
  • Logging
  • Sensemaking
  • Reproducible Science


The Dagstuhl Reports series documents all Dagstuhl Seminars and Dagstuhl Perspectives Workshops. Together with the seminar's collector, the organizers compile a report that summarizes the authors' contributions and adds an overall summary.

