https://www.dagstuhl.de/19182

April 28 – May 3, 2019, Dagstuhl Seminar 19182

Multi-Document Information Consolidation

Organizers

Ido Dagan (Bar-Ilan University – Ramat Gan, IL)
Iryna Gurevych (TU Darmstadt, DE)
Dan Roth (University of Pennsylvania – Philadelphia, US)
Amanda Stent (Bloomberg – New York, US)

Documents

Dagstuhl Report, Volume 9, Issue 4

Summary

Today's natural language processing (NLP) systems mainly work on individual pieces of text, such as single sentences, paragraphs, or documents. For example, most question answering systems require that the answer to a user's question be contained in a single document, ideally in a single sentence. If the information is scattered across documents, most systems will fail, because their capability to link information across multiple documents is limited.

This is in strong contrast to how humans answer difficult questions or make complex decisions. We usually read multiple documents on a topic and then infer the answer to the question, or we make a decision based on the evidence we found. In most cases, we consolidate the information across multiple sources. Further, considering only one document can create a biased or incomplete view of a topic. Many aspects of our lives are open to multiple interpretations, and each author must choose which information to present in a document and how to present it. By reading multiple documents, we are able to identify overlaps, differences, and opposing views between authors. Considering and merging these possibly opposing views can be a crucial step in everyday decision making. For example, when booking a hotel, one might read multiple user reviews and form an internal understanding of the positive and negative aspects of the hotel.

At this 5-day Dagstuhl Seminar, an interdisciplinary collection of leading researchers discussed and developed research ideas that will lead to advanced multi-document information consolidation systems and enable modern NLP systems to profit from a multi-document perspective.

The seminar was centered around four major themes: 1) how to represent information in multi-document repositories; 2) how to support inference over multi-document repositories; 3) how to summarize and visualize multi-document repositories for decision support; and 4) how to do information validation on multi-document repositories. Questions of semantics, pragmatics (author perspectives, argumentation), representation, and reasoning (including spatio-temporal reasoning and entailment) arose across these themes.

Information Representations and Inference are the theoretical foundation that allows systems to extract information from multiple documents and to infer new knowledge. The challenge is to find a representation that can be used broadly. Multiple documents are likely to bring up multiple perspectives, and identifying the relations between them is at the heart of multi-document inference.

A connection to real applications, used in actual user scenarios, is critical for the advancement of the multi-document information consolidation field. Multi-document systems are especially useful in situations where users must make complex decisions. In such situations, users often search for sources that provide information or arguments for or against certain decisions. Hence, one working group focused on Multi-Document Systems in User Decision Scenarios. In order to provide value to users, the systems must return true statements (accurate syntheses) given all the available context. Otherwise, users lose their trust in the system. However, the internet is full of statements that are intentionally or unintentionally misleading. So how do we identify these misleading statements and prevent them from being presented to a user without the necessary context? This research question was addressed by a working group focusing on Information Validation for Multi-Document Scenarios.

Seminar participants, including established experts and promising young researchers from academia and industry, had the opportunity to present research ideas, to outline their vision regarding the future of multi-document information consolidation technologies, and to collaborate in discussion groups led by the seminar organizers.

Each seminar participant joined two themes, with regular cross-theme meetings. As the topics are quite novel in the research community, no established terminology or task definitions exist. Hence, participants discussed how these tasks can be defined so that they can be studied scientifically. For example, what does it mean to validate a claim? The participants discussed issues with existing approaches and proposed new research topics that could be the content of a Ph.D. thesis.

The last day of the seminar was used to summarize results and to create collaborations for future research projects. In total, 12 joint research ideas were proposed. Most of these ideas involve new collaborations.

Summary text license
  Creative Commons BY 3.0 Unported license
  Ido Dagan, Iryna Gurevych, Dan Roth, and Amanda Stent

Classification

  • Artificial Intelligence / Robotics
  • Data Bases / Information Retrieval
  • Semantics / Formal Methods

Keywords

  • Cross-document representations
  • Cross-document inference
  • Information validation
  • Decision support systems

Documentation

Each Dagstuhl Seminar and Dagstuhl Perspectives Workshop is documented in the series Dagstuhl Reports. The seminar organizers, in cooperation with the collector, prepare a report that includes contributions from the participants' talks together with a summary of the seminar.

 

Publications

Furthermore, a comprehensive peer-reviewed collection of research papers can be published in the series Dagstuhl Follow-Ups.

Dagstuhl's Impact

Please inform us when a publication results from your seminar. These publications are listed in the category Dagstuhl's Impact and are presented on a special shelf on the ground floor of the library.