Dagstuhl Seminar 20401: Computational Approaches for Digitized Historical Newspapers

Computational Approaches for Digitized Historical Newspapers (Cancelled)

(27 Sep – 2 Oct 2020)

Permalink

Please use the following short URL to link to this page: https://www.dagstuhl.de/20401

Replaced by

Dagstuhl Seminar 22292: Computational Approaches for Digitized Historical Newspapers (2022-07-17 - 2022-07-22)

Organizers

Antoine Doucet (University of La Rochelle, FR)
Marten Düring (University of Luxembourg, LU)
Maud Ehrmann (EPFL - Lausanne, CH)
Clemens Neudecker (Staatsbibliothek zu Berlin, DE)

Contact

Shida Kunz (for scientific matters)
Susanne Bach-Bernhard (for administrative matters)

Motivation

Historical newspapers are mirrors of past societies. Published on a regular basis over centuries, they record history both great and small and reflect the political, moral, and economic environments in which they were produced. They also hold dense, continuous, and multimodal information which, coupled with their inherent contextualization, makes them invaluable primary sources for the humanities. They are in high demand by scholars and the general public, have been digitized in huge numbers, and pose timely challenges for computer scientists and humanities scholars.

Following the decisive efforts led by libraries around the world to improve optical character recognition (OCR) technology and generalize full-text digitization and access, recent years have seen a notable increase in academic research initiatives around historical newspaper processing. This momentum can be attributed not only to the long-standing interest of humanities scholars in newspapers, coupled with their recent digitization, but also to the fact that these digital sources concentrate many challenges for computer science, especially computational linguistics and computer vision. These challenges are all the more difficult, and interesting, because tackling them requires taking the needs and knowledge of digital (humanities) scholarship into account. Within interdisciplinary frameworks, various complementary approaches spanning natural language processing, computer vision, large-scale computing, and visualization are currently being developed, evaluated, and deployed. Overall, these efforts are contributing a pioneering set of tools, system architectures, technical infrastructures, and interfaces covering several aspects of historical newspaper processing and exploitation. In this context, this Dagstuhl Seminar will gather researchers and practitioners involved in this endeavour in order to share experiences, analyze successes and shortcomings, deepen our understanding of the interplay between computational aspects and digital scholarship, and design a roadmap for future challenges.

Three closely intertwined challenges stand out and will be considered. First, historical newspapers pose great challenges in terms of document and text processing. Recognition of their complex and varying layout and structure remains beyond the reach of current algorithms, and noisy OCR, language change, and a lack of domain-specific resources undermine traditional information extraction approaches. Second, system architecture and knowledge representation must address the growing need for standardized and modular information flows between systems. Open questions include the scalable integration of various processing components and the accommodation of scholarly research requirements. Third, historians and other user groups require content discovery and management tools that reflect their iterative, exploratory research workflows. Here, the potential of personalized recommendation systems and visualizations still awaits full exploitation. To generate trust in systems, users also require more transparency regarding algorithmic outputs and the exploitation of inherently imperfect, uncertain data.

Solutions to these challenges require the close collaboration of experts in computer science, digital history, and library and information science. The proposed seminar will be organized around a set of abstracted problems derived from the above-mentioned challenges and will propose strategies to resolve them.

Creative Commons BY 3.0 DE

Antoine Doucet, Marten Düring, Maud Ehrmann, and Clemens Neudecker

Participants

Antoine Doucet (University of La Rochelle, FR)
Marten Düring (University of Luxembourg, LU)
Maud Ehrmann (EPFL - Lausanne, CH)
Clemens Neudecker (Staatsbibliothek zu Berlin, DE)

Classification

data bases / information retrieval
society / human-computer interaction
software engineering

Keywords

natural language processing
document structure and layout analysis
information extraction
digital history
digital scholarship
