Dagstuhl Seminar 18462
Provenance and Logging for Sense Making
(Nov 11 – Nov 16, 2018)
- Jean-Daniel Fekete (INRIA Saclay - Orsay, FR)
- T. J. Jankun-Kelly (Mississippi State University, US)
- Melanie Tory (Tableau Software - Palo Alto, US)
- Kai Xu (Middlesex University - London, GB)
- Michael Gerke (for scientific matters)
- Susanne Bach-Bernhard (for administrative matters)
Sense making is one of the biggest challenges in data analysis faced by both industry and the research community. It involves understanding the data and uncovering its model, generating hypotheses, selecting analysis methods, creating novel solutions, designing evaluations, and critical thinking and learning wherever needed. Research and development for such sense making tasks lag far behind fast-changing user needs, such as those that emerged recently as a result of so-called “Big Data”. As a result, sense making is often performed manually, and the limited capacity of human cognition becomes the bottleneck in data analysis and decision making.
One of the recent advances in sense making research is the capture, visualization, and analysis of provenance information. Provenance is the history and context of sense making, including the data and analyses used and the users’ critical thinking process. It has been shown that provenance can effectively support many sense making tasks. For instance, provenance can provide an overview of what has been examined and reveal gaps such as unexplored information or solution possibilities. Provenance can also support collaborative sense making and communication by sharing the rich context of the sense making process.
Besides data analysis and decision making, provenance has been studied in many other fields, sometimes under different names, for different types of sense making. For example, the Human-Computer Interaction community relies on the analysis of logging to understand user behaviors and intentions; the WWW and database community has been working on data lineage to understand uncertainty and trustworthiness; and finally, reproducible science heavily relies on provenance to improve the reliability and efficiency of scientific research.
This Dagstuhl Seminar aims to bring together researchers from the diverse fields that relate to provenance and sense making to foster cross-community collaboration and develop novel solutions for shared challenges. More specifically, it aims to:
- articulate the state of the art in provenance research and software development;
- provide guidelines on how best to use provenance information for different scenarios;
- encourage cross-community collaboration on novel solutions based on provenance; and
- identify open research challenges and provide directions for further provenance research.
Sense making is one of the biggest challenges in data analysis faced by both industry and the research community. It involves understanding the data and uncovering its model, generating hypotheses, selecting analysis methods, creating novel solutions, designing evaluations, and critical thinking and learning wherever needed. Recently, many techniques and software tools have become available to address the challenges of so-called “Big Data”. However, these mostly target lower-level sense making tasks such as storage and search. There is limited support for the higher-level sense making tasks mentioned earlier. As a result, these tasks are often performed manually, and the limited capacity of human cognition becomes the bottleneck, negatively impacting data analysis and decision making. This applies to both industry and academia. Scientific research is a sense making process as well: it includes all the sense making tasks mentioned earlier, with an emphasis on the generation of novel solutions. As in data analysis, most of these tasks are conducted manually, which considerably limits the progress of scientific discovery.
Visual Analytics is a fast-growing field that specifically targets sense making. It achieves this by integrating interactive visualization with data analytics such as Machine Learning. It follows a human-centered principle: instead of replacing human thinking and expertise with algorithms and models, it enables the two to work together to achieve the best sense making result. Progress over the last decade or so has been rapid, as evidenced by publications at Visual Analytics conferences such as IEEE VAST (part of IEEE VIS) and by the increasing popularity of visual approaches in many other fields such as Machine Learning, Information Retrieval, and Databases.
One recent advance in Visual Analytics research is the capture, visualization, and analysis of provenance information. Provenance is the history and context of sense making, including the “7W” (Who, When, What, Why, Where, Which, and HoW) of the data used and the users’ critical thinking process. The concept of provenance is not entirely new. In 1996, Shneiderman recognized the importance of provenance by classifying history as one of the seven fundamental tasks in data visualization. History allows users to review previous actions during visual exploration, which is typically long and complex. Provenance can provide an overview of what has been examined and reveal gaps such as unexplored data or solutions. Provenance can also support collaborative sense making and communication by sharing the rich context of what others have accomplished.
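As a rough illustration of the “7W” idea and of using provenance to reveal unexplored data, the following minimal Python sketch stores each logged step as a record and lists the data items that no step has touched. The class name `ProvenanceRecord`, the function `unexplored`, the meaning assigned to each field, and the sample values are all assumptions made for this example; they are not taken from the seminar report or from any existing provenance system.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ProvenanceRecord:
    """One logged sense making step (hypothetical structure)."""
    who: str        # analyst who performed the step
    when: datetime  # time at which the step happened
    what: str       # data item that was examined
    why: str        # analyst's stated rationale
    where: str      # tool or view in which the step took place
    which: str      # version or variant of the data that was used
    how: str        # action performed, e.g. filter, zoom, annotate


def unexplored(all_items, history):
    """Return the data items that never appear in the recorded history."""
    examined = {record.what for record in history}
    return sorted(set(all_items) - examined)


# Hypothetical history of two exploration steps.
history = [
    ProvenanceRecord("alice", datetime(2018, 11, 12, 9, 0), "sales",
                     "check seasonality", "line chart", "v1", "zoom"),
    ProvenanceRecord("bob", datetime(2018, 11, 12, 9, 30), "returns",
                     "look for outliers", "scatter plot", "v1", "filter"),
]

gaps = unexplored(["sales", "returns", "inventory"], history)
print(gaps)
```

Under these assumptions, `gaps` contains only the items absent from the history, which is the kind of overview of unexamined data described above. A real system would likely record far richer context than a single string per field.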
The topic of provenance has been studied in many other fields, such as Human-Computer Interaction (HCI), WWW, Database, and Reproducible Science. The HCI research community relies heavily on user information, such as logging and observation, in its studies. These closely relate to provenance and share the common goal of making sense of user behavior and thinking. Collaboration between the two fields can potentially create novel solutions for some long-standing research challenges. For instance, it has been shown that provenance information can be used to semi-automate part of the qualitative analysis of user evaluation data, which is notoriously time-consuming.
The WWW and Database research community has been actively working on provenance for the last decade or so, with a particular focus on tracking data collection and processing. This has led to the recent publication of the W3C reference model on provenance. An important part of these efforts is to make sense of the source and quality of the data and the analyses based on them, which has a significant impact on their uncertainty and trustworthiness. Similarly, there is a fast-growing Reproducible Science community, whose interest in provenance lies in “improving the reliability and efficiency of scientific research ... increase the credibility of the published scientific literature and accelerate discovery”.
There is a trend of cross-community collaboration on provenance-related research, which has led to some exciting outcomes such as the work integrating visualization with reproducible science [5, 8]. However, many challenging research questions remain open, and many provenance-related research efforts remain disconnected. This seminar brought together researchers from the diverse fields that relate to provenance. Shared challenges were identified and progress was made towards developing novel solutions.
The main research question that this seminar aimed to address is how to collect, analyze, and summarize provenance information to support the design and evaluation of novel techniques for sense making across related fields. The week-long seminar started with a day of self-introductions, lightning talks, and research topic brainstorming. The self-introductions allowed attendees to get to know each other better, and the lightning talks covered the latest work in the research fields related to provenance. Each participant proposed several research questions, which were then collated and voted on to form the breakout groups. The following are the research areas chosen by the participants:
- Storytelling and narrative;
- Provenance standard and system integration;
- Task abstraction for provenance analysis;
- Machine learning and provenance;
- User modeling and intent.
The rest of the week was devoted to breakout sessions, and each participant had the option to change groups halfway through. The seminar finished with a presentation from each group and discussions on the next steps to continue the collaboration. Many interesting problems were identified, and progress was made towards new solutions. Please refer to the rest of the report for details on the identified research questions and the progress made by the end of the week.
[1] Melanie Herschel and Marcel Hlawatsch. Provenance: On and Behind the Screens. In Proceedings of the 2016 International Conference on Management of Data, SIGMOD ’16, pages 2213–2217, New York, NY, USA, 2016. ACM.
[2] Marcus R. Munafò, Brian A. Nosek, Dorothy V. M. Bishop, Katherine S. Button, Christopher D. Chambers, Nathalie Percie du Sert, Uri Simonsohn, Eric-Jan Wagenmakers, Jennifer J. Ware, and John P. A. Ioannidis. A manifesto for reproducible science. Nature Human Behaviour, 1(1):0021, January 2017.
[3] P. H. Nguyen, K. Xu, A. Wheat, B. L. W. Wong, S. Attfield, and B. Fields. SensePath: Understanding the Sensemaking Process Through Analytic Provenance. IEEE Transactions on Visualization and Computer Graphics, 22(1):41–50, January 2016.
[4] Ben Shneiderman. The eyes have it: A task by data type taxonomy for information visualizations. In Proceedings of the 1996 IEEE Symposium on Visual Languages, pages 336–, Washington, DC, USA, 1996. IEEE Computer Society.
[5] C. T. Silva, J. Freire, and S. P. Callahan. Provenance for Visualizations: Reproducibility and Beyond. Computing in Science & Engineering, 9(5):82–89, September 2007.
[6] James J. Thomas and Kristin A. Cook, editors. Illuminating the Path: The Research and Development Agenda for Visual Analytics. National Visualization and Analytics Centre, 2005.
[7] Kai Xu, Simon Attfield, T. J. Jankun-Kelly, Ashley Wheat, Phong H. Nguyen, and Nallini Selvaraj. Analytic provenance for sensemaking: A research agenda. IEEE Computer Graphics and Applications, 35(3):56–64, 2015.
[8] Zheguang Zhao, Lorenzo De Stefani, Emanuel Zgraggen, Carsten Binnig, Eli Upfal, and Tim Kraska. Controlling false discoveries during interactive data exploration. In Proceedings of the 2017 ACM International Conference on Management of Data, SIGMOD ’17, pages 527–540, New York, NY, USA, 2017. ACM.
- Sara Alspaugh (Splunk Inc. - San Francisco, US)
- Daniel Archambault (Swansea University, GB)
- Simon Attfield (Middlesex University - London, GB)
- Leilani Battle (University of Maryland - College Park, US)
- Christian Bors (Technische Universität Wien, AT)
- Remco Chang (Tufts University - Medford, US)
- Christopher Collins (UOIT - Oshawa, CA)
- Michelle Dowling (Virginia Polytechnic Institute - Blacksburg, US)
- Alex Endert (Georgia Institute of Technology - Atlanta, US)
- Jean-Daniel Fekete (INRIA Saclay - Orsay, FR)
- Melanie Herschel (Universität Stuttgart, DE)
- T. J. Jankun-Kelly (Mississippi State University, US)
- Andreas Kerren (Linnaeus University - Växjö, SE)
- Steffen Koch (Universität Stuttgart, DE)
- Robert Kosara (Tableau Software - Seattle, US)
- Olga A. Kulyk (DEMCON - Enschede, NL)
- Robert S. Laramee (Swansea University, GB)
- Sérgio Lifschitz (PUC - Rio de Janeiro, BR)
- Aran Lunzer (OS Vision - Los Angeles, US)
- Phong H. Nguyen (City - University of London, GB)
- William Pike (Pacific Northwest National Lab. - Richland, US)
- Ali Sarvghad (University of Massachusetts - Amherst, US)
- Claudio T. Silva (New York University, US)
- Holger Stitz (Johannes Kepler Universität Linz, AT)
- Melanie Tory (Tableau Software - Palo Alto, US)
- John Wenskovitch (Virginia Polytechnic Institute - Blacksburg, US)
- William Wong (Middlesex University - London, GB)
- Kai Xu (Middlesex University - London, GB)
- Michelle X. Zhou (Juji Inc. - Saratoga, US)
- computer graphics / computer vision
- data bases / information retrieval
- society / human-computer interaction
- Visual Analytics
- Reproducible Science