April 10 – 15, 2005, Dagstuhl Seminar 05151
Annotating, Extracting and Reasoning about Time and Events
Newspaper articles and other natural-language texts describe actions, events, and states of affairs. A crucial first step toward the automatic extraction of information from these texts, for use in applications such as automatic question answering or summarization, is the capacity to identify what events are being described and to make explicit when these events occurred and which temporal relations hold among them. There has recently been renewed interest in making use of this kind of temporal and event-based information, with a wide variety of proposals and applications presented at recent conferences and workshops. The central goal of the seminar was to consolidate the insights gained in recent years and to identify and address unresolved issues concerning annotation, temporal reasoning and event identification.
Much of the temporal information conveyed in a natural-language text is left implicit. Significant recent work has focused on developing schemes for making this information explicit, typically via annotation. An important result of contemporary research has been the adoption of a de facto standard for time and event annotation: TimeML. This XML-based markup language is specifically designed for annotating texts with tags that make explicit the temporal and event-based information conveyed by the text, and it has been adopted by a number of researchers in this domain. Much of our seminar was concerned with issues specific to this annotation scheme.
There are three basic types of tags used by the TimeML language: TIMEX tags are used to annotate temporal expressions and provide them with a normalized value (e.g. (TIMEX tid="t1" val="2005-04-21") April 21st, 2005 (/TIMEX)); EVENT tags are used to annotate event expressions, providing "hooks" to relate them to other events and times introduced in the text (e.g. (EVENT eid="e1") opened (/EVENT)); and so-called TLINK tags indicate the temporal relations that hold between times and events (e.g. for the stock market opened on April 21st, 2005 at 10:00pm: (TLINK event="e1" relatedTime="t1" relation="INCLUDED-BY")). Other tags are used to capture more subtle semantic relations. SLINK tags, for example, are used to indicate various kinds of subordination relations, such as the negation in The stock market did not open on April 21st, 2005 at 10:00pm or the merely potential event in Investors hoped that the stock market would open on April 21st, 2005 at 10:00pm. A small corpus of TimeML-annotated documents (TimeBank) has been generated and can be browsed at timeml.org.
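To make the interplay of the three basic tags concrete, here is a minimal sketch in Python that reads an annotated sentence and resolves the TLINK against the event and time it points to. It reuses the simplified tag and attribute names from the examples above (TIMEX, val, relatedTime, relation), which are not the full TimeML specification; the wrapping s element and the function name read_annotations are assumptions made only for this illustration.

```python
import xml.etree.ElementTree as ET

# Simplified tag and attribute names follow the examples in the text,
# not the complete TimeML specification. The <s> wrapper is an assumption.
SAMPLE = (
    '<s>The stock market '
    '<EVENT eid="e1">opened</EVENT> on '
    '<TIMEX tid="t1" val="2005-04-21">April 21st, 2005</TIMEX> at 10:00pm.'
    '<TLINK event="e1" relatedTime="t1" relation="INCLUDED-BY"/>'
    '</s>'
)


def read_annotations(xml_text):
    """Collect events, normalized times, and temporal links (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    # Map each event id to the annotated event expression.
    events = {e.get("eid"): e.text for e in root.iter("EVENT")}
    # Map each time id to its normalized value.
    times = {t.get("tid"): t.get("val") for t in root.iter("TIMEX")}
    # Each link is (event id, relation, time id).
    links = [
        (l.get("event"), l.get("relation"), l.get("relatedTime"))
        for l in root.iter("TLINK")
    ]
    return events, times, links


events, times, links = read_annotations(SAMPLE)
for event_id, relation, time_id in links:
    # Resolve the ids through the "hooks" provided by the EVENT and TIMEX tags.
    print(f"{events[event_id]} {relation} {times[time_id]}")
```

The ids are what allow a link to be stated separately from the text span it concerns: the TLINK carries no text of its own and only refers to e1 and t1.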
The main focus of the seminar was on TimeML-based temporal annotation and reasoning. We were concerned with three main points: determining how effectively one can use the TimeML language for consistent annotation, determining how useful such annotation is for further processing, and determining what modifications should be applied to the standard to improve its usefulness in applications such as question-answering and information retrieval.
Highlights of the Seminar
One of the highlights of the seminar was an annotation exercise which was carried out by all participants in groups. This served both as a touchstone for discussing issues that came up in the course of the seminar and as a source of examples of difficulties to be addressed. As the "target text" we chose a newspaper article from the Seattle Times describing the wedding of Prince Charles and Camilla Parker Bowles, an event that had just occurred.
The seminar participants were split into groups of four or five researchers, and each group carried out the annotation in two parts. In the first part, we attempted to identify, making use of the TimeML guidelines, the events and times described by the article and the relations that hold among them. We found very clear agreement about which events were described; the foremost problems concerned event identity (is the waving the same event as the greeting?). The temporal relations were also fairly well agreed upon, with the major problems being those of differentiating among simultaneity, overlap and immediate precedence. What was striking, however, was that far more events were described (and required annotation under the TimeML guidelines) than participants judged likely to be useful for any application.
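The difficulty of separating simultaneity, overlap and immediate precedence can be illustrated with a small sketch. Given exact start and end points, the distinction is mechanical; the annotators' problem is that a newspaper text rarely supplies such endpoints. The function name classify, the relation labels, and the example times (in seconds) are all hypothetical and are not TimeML relation names or data from the exercise.

```python
def classify(a_start, a_end, b_start, b_end):
    """Return a coarse temporal relation between intervals A and B.

    Labels are illustrative only, not TimeML relation types.
    """
    if a_start >= a_end or b_start >= b_end:
        raise ValueError("each interval must have start < end")
    if (a_start, a_end) == (b_start, b_end):
        return "SIMULTANEOUS"
    if a_end == b_start:
        return "IMMEDIATELY_BEFORE"
    if a_end < b_start:
        return "BEFORE"
    if b_end == a_start:
        return "IMMEDIATELY_AFTER"
    if b_end < a_start:
        return "AFTER"
    # Any remaining case shares some stretch of time without being identical.
    return "OVERLAP"


# Hypothetical endpoints for two events such as a waving and a greeting.
print(classify(0, 30, 0, 30))   # identical endpoints
print(classify(0, 30, 20, 50))  # partly shared time
print(classify(0, 30, 30, 60))  # B starts exactly when A ends
```

Shifting a single endpoint by a few seconds moves a pair of events from one category to another, which is one way to see why annotators who agree on the events themselves can still disagree on these three relations.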
In the second part of the annotation exercise, the same groups attempted metrical annotation of the type described by Hobbs. Here we tried to specify how long each of the events lasted and how long the intervals between events were. In contrast to the first part, there was wide variation in some cases (how long does the state of the couple being newly married hold?), but fairly close agreement in others. The highlight of this exercise came when we compared our consensus interpretation of the text to the BBC video of the event described. The very low correlation between our estimated durations for events (the waving, the walking to the car) and their actual durations as shown on the video raised questions less about the value of annotation than about the veracity of newspaper texts.