http://www.dagstuhl.de/06491

December 3 – 8, 2006, Dagstuhl Seminar 06491

Digital Historical Corpora - Architecture, Annotation, and Retrieval

Organizers

Lou Burnard (University of Oxford, GB)
Milena Dobreva (Bulgarian Academy of Sciences, BG)
Norbert Fuhr (Universität Duisburg-Essen, DE)
Anke Lüdeling (HU Berlin, DE)


For information about this Dagstuhl Seminar, contact

Dagstuhl Service Team

Documents

Dagstuhl Seminar Proceedings DROPS
List of participants

Press Release

"Digitalisierung von historischen Texten" 27.11.06 (German only)

Art Exhibition Till Neu

Vernissage on December 6, 19:30h.

Summary

The seminar brought together scholars from (historical) linguistics, (historical) philology, computational linguistics, and computer science who work with collections of historical texts. These collections, whether they are called texts, digital libraries, or corpora, are assembled for a number of different purposes, such as lexicography, history, linguistics, and philology. This naturally leads to different decisions in their design and architecture.

The purpose of this seminar was twofold. First, we wanted to inform each other about the decisions each of us had taken in building a historical corpus and to discuss the options. Second, we wanted to build an international network of people working with historical corpora and to explore the options for further partnerships or projects. We think that both goals were reached.

The seminar was very interesting and stimulating. In the final discussion of the workshop, a ‘grand picture’ of the research issues in the area of digital historical corpora was developed (see Figure 1). In this picture, the arcs represent enabling or supporting methods. As the picture shows, the major goal is research on large historical corpora, which requires work in the areas pointing to it directly or indirectly. A researcher’s workbench should support personalization, collaboration, and problem solving. It must be complemented by tools for the annotation and analysis of corpora, and by functions for visualization, browsing, and retrieval (especially retrieval of spelling variants). These methods should first be applied to and tested on small corpora before they are used for large corpora. In this context, evaluation also plays a major role. For large corpora (stored in digital libraries), the choice of an appropriate architecture is a crucial issue.

Quality control and standardization were further issues of interest to all participants.

Classification

  • Interdisciplinary (Computer Science, Computational Linguistics, Corpus Linguistics, Literacy, Bioinformatics)
  • Own Categories: Corpus Architecture, Processing And Representing Multilingual And Multimodal Parallel Text Corpora, Annotation Standards, Retrieval Facilities In Multilevel Hypertext

Keywords

  • Corpus architecture
  • Annotation standards
  • Multilingual
  • Multimodal corpora
  • Fuzzy search
  • Multilevel hypertext

Book Exhibition

Books by the participants

Book exhibition on the ground floor of the library

(only during the week of the event).

Documentation

All Dagstuhl Seminars and Dagstuhl Perspectives Workshops are documented in the series Dagstuhl Reports. Together with the seminar's collector, the organizers compile a report that summarizes the authors' contributions and adds an overall summary.

Download overview flyer (PDF).

Publications

It is still possible to publish a comprehensive collection of peer-reviewed papers in the series Dagstuhl Follow-Ups.

Dagstuhl's Impact

Please let us know when a publication results from your seminar. Such publications are listed separately in the Dagstuhl's Impact category and displayed on the ground floor of the library.