Dagstuhl Seminar 19182: Multi-Document Information Consolidation

( Apr 28 – May 03, 2019 )

Permalink

Please use the following short url to reference this page: https://www.dagstuhl.de/19182

Organizers

Ido Dagan (Bar-Ilan University - Ramat Gan, IL)
Iryna Gurevych (TU Darmstadt, DE)
Dan Roth (University of Pennsylvania - Philadelphia, US)
Amanda Stent (Bloomberg - New York, US)

Contact

Shida Kunz (for scientific matters)
Susanne Bach-Bernhard (for administrative matters)

Publications

Multi-Document Information Consolidation (Dagstuhl Seminar 19182). Ido Dagan, Iryna Gurevych, Dan Roth, and Amanda Stent. In Dagstuhl Reports, Volume 9, Issue 4, pp. 124-139, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2019)

Motivation

At this 5-day Dagstuhl Seminar, an interdisciplinary collection of leading researchers will discuss and develop research ideas that will lead to advanced multi-document information consolidation systems.

The seminar is centered around four major themes: 1) how to represent information in multi-document repositories; 2) how to support inference over multi-document repositories; 3) how to summarize and visualize multi-document repositories for decision support; and 4) how to do information validation on multi-document repositories. Questions of semantics, pragmatics (author perspectives, argumentation), representation, and reasoning (including spatio-temporal reasoning and entailment) arise across these themes.

Seminar participants, including established experts and promising young researchers from academia and industry, will have the opportunity to present research ideas, to outline their vision regarding the future of multi-document information consolidation technologies, and to collaborate in discussion groups led by the seminar organizers.

This is an action-oriented seminar. Outcomes will include: schemas for representing information from multi-document repositories, algorithms for validating that information and for doing inference over the representations, new use cases for multi-document information consolidation, and ideas for evaluating approaches to multi-document information consolidation and assessing them within higher-level applications. Most importantly, this seminar represents a unique, first-of-its-kind opportunity to bring together researchers from many fields that may contribute to the development of advanced multi-document information consolidation systems. Thus, an important outcome of the meeting will be a roadmap for establishing a corresponding research community.

Creative Commons BY 3.0 DE

Ido Dagan, Iryna Gurevych, Dan Roth, and Amanda Stent

Summary

Today's natural language processing (NLP) systems mainly work on individual pieces of text, such as single sentences, paragraphs, or documents. For example, most question answering systems require that the answer to a user's question be provided in a single document, ideally in a single sentence. If the information is scattered across documents, most systems will fail. The capability of current systems to link information across multiple documents is often limited.

This is in strong contrast to how humans answer difficult questions or make complex decisions. We usually read multiple documents on a topic and then infer the answer to the question, or we make a decision based on the evidence we found. In most cases, we consolidate the information across multiple sources. Further, considering only one document can create a biased or incomplete view of a topic. Many aspects of our lives are open to multiple interpretations, and each author must choose which information to present in a document and how to present it. By reading multiple documents, we are able to identify overlaps, differences, and opposing views between authors. Considering and merging these possibly opposing views can be a crucial step in everyday decision making. For example, when booking a hotel, one might read multiple user reviews and form an internal understanding of the positive and negative aspects of the hotel.

At this 5-day Dagstuhl Seminar, an interdisciplinary collection of leading researchers discussed and developed research ideas that will lead to advanced multi-document information consolidation systems and enable modern NLP systems to profit from a multi-document perspective.

The seminar was centered around four major themes: 1) how to represent information in multi-document repositories; 2) how to support inference over multi-document repositories; 3) how to summarize and visualize multi-document repositories for decision support; and 4) how to do information validation on multi-document repositories. Questions of semantics, pragmatics (author perspectives, argumentation), representation, and reasoning (including spatio-temporal reasoning and entailment) arose across these themes.

Information Representations and Inference are the theoretical foundation that allows systems to extract information from multiple documents and to infer new knowledge. The challenge is to find a representation that can be used broadly. Multiple documents are likely to bring up multiple perspectives, and identifying the relations between them is at the heart of multi-document inference.

A connection to real applications, used in actual user scenarios, is critical for the advancement of the multi-document information consolidation field. Multi-document systems are especially useful in situations where users must make complex decisions. In such situations, users often search for sources that provide information or arguments for or against certain decisions. Hence, one working group focused on Multi-Document Systems in User Decision Scenarios. In order to provide value to users, the systems must return true statements (accurate syntheses) given all the available context. Otherwise, users lose their trust in the system. However, the internet is full of statements that are intentionally or unintentionally misleading. So how do we identify these misleading statements and prevent them from being presented to a user without the necessary context? This research question was addressed by a working group focusing on Information Validation for Multi-Document Scenarios.

Seminar participants, including established experts and promising young researchers from academia and industry, had the opportunity to present research ideas, to outline their vision regarding the future of multi-document information consolidation technologies, and to collaborate in discussion groups led by the seminar organizers.

Each seminar participant joined two themes, with regular cross-theme meetings. As the topics are quite novel in the research community, no established terminology or task definitions exist. Hence, participants discussed how these tasks can be defined so that they can be studied scientifically. For example, what does it mean to validate a claim? The participants discussed issues with existing approaches and proposed new research topics that could be the content of a Ph.D. thesis.

The last day of the seminar was used to summarize results and to create collaborations for future research projects. In total, 12 joint research ideas were proposed. Most of these ideas involve new collaborations.

Creative Commons BY 3.0 Unported license

Ido Dagan, Iryna Gurevych, Dan Roth, and Amanda Stent

Participants

Omri Abend (The Hebrew University of Jerusalem, IL) [dblp]
Sebastian Arnold (Beuth Hochschule für Technik Berlin, DE) [dblp]
Timothy Baldwin (The University of Melbourne, AU) [dblp]
Jonathan Berant (Tel Aviv University, IL) [dblp]
Giuseppe Carenini (University of British Columbia - Vancouver, CA) [dblp]
Ido Dagan (Bar-Ilan University - Ramat Gan, IL) [dblp]
Dipanjan Das (Google - New York, US) [dblp]
Daniel Deutsch (University of Pennsylvania, US) [dblp]
Laura Dietz (University of New Hampshire - Durham, US) [dblp]
Yoav Goldberg (Bar-Ilan University - Ramat Gan, IL) [dblp]
Dan Goldwasser (Purdue University - West Lafayette, US) [dblp]
Iryna Gurevych (TU Darmstadt, DE) [dblp]
Heng Ji (Rensselaer Polytechnic Institute - Troy, US) [dblp]
Ayal Klein (Bar-Ilan University - Ramat Gan, IL) [dblp]
Alexander Koller (Universität des Saarlandes, DE) [dblp]
Chin-Yew Lin (Microsoft Research - Beijing, CN) [dblp]
Fei Liu (University of Central Florida - Orlando, US) [dblp]
Nafise Sadat Moosavi (TU Darmstadt, DE) [dblp]
Barbara Plank (IT University of Copenhagen, DK) [dblp]
Nils Reimers (TU Darmstadt, DE) [dblp]
Dan Roth (University of Pennsylvania - Philadelphia, US) [dblp]
Steve S. Skiena (Stony Brook University, US)
Gabriel Stanovsky (University of Washington - Seattle, US) [dblp]
Amanda Stent (Bloomberg - New York, US) [dblp]
Ivan Titov (University of Edinburgh, GB) [dblp]
Kentaro Torisawa (NICT - Kyoto, JP) [dblp]
Gisela Vallejo (TU Darmstadt, DE) [dblp]
Andreas Vlachos (University of Cambridge, GB) [dblp]
Yue Zhang (Westlake University - Hangzhou, CN) [dblp]

Classification

artificial intelligence / robotics
data bases / information retrieval
semantics / formal methods

Keywords

cross-document representations
cross-document inference
information validation
decision support systems
