Dagstuhl Seminar 14302
Digital Palaeography: New Machines and Old Texts
(Jul 20 – Jul 24, 2014)
- Tal Hassner (The Open University of Israel - Raanana, IL)
- Robert Sablatnig (TU Wien, AT)
- Dominique Stutzmann (CNRS - Paris, FR)
- Ségolène Tarte (University of Oxford, GB)
- Annette Beyer (for administrative matters)
Digital Palaeography emerged as a research community in the late 2000s. Following a successful Dagstuhl Perspectives Workshop on Computation and Palaeography (12382), this seminar focuses on the interaction of Palaeography and computerized tools developed in Computer Vision for the analysis of digital images.
Given the techniques now available for enhancing damaged documents, for optical text recognition or computer-assisted transcription, and for identifying and categorising scripts and scribes, the current technical challenge is to develop “new machines”, i.e. efficient solutions for palaeographic tasks, and to provide scholars with quantitative evidence towards palaeographical arguments. This goes beyond the reading of “old texts” (ancient, mediaeval and early modern documents), which is itself of interest to industry, to the wider public and to the broad community of genealogists.
The core issue is to create the conditions for fluid and seamless communication between Humanities and Computer Sciences in order to advance research in Palaeography, Manuscript Studies and History, on the one hand, and in Computer Vision, Semantic Technologies, Image Processing, and Human-Computer Interaction (HCI) systems on the other hand. Researchers must articulate their respective systems of proof in order to produce efficient systems that present palaeographical data quickly and easily, and in a way that scholars can understand, evaluate, and trust. The aims are to optimize collaboration, to prevent the implementation of “black boxes”, to make use of the outreach potential offered by computerized technologies to enrich palaeographical knowledge, and to facilitate the sharing of methodologies.
The primary outcome will be the sharing of insights based on the state of the art in Digital Palaeography, interdisciplinary discussions on the potential and limitations of future research in this field, and the establishment of a community of practice in Digital Palaeography. Further prospective outcomes include the dissemination of methodologies and current research within the community, a better understanding of how to conduct interdisciplinary research across all the fields of expertise involved in Digital Palaeography, and new research directions in the Computer Sciences and new research strategies in Palaeography.
On the technical side, the following key issues have been identified:
- Ontologies to discuss and qualify the variability of scripts
- Semi-automatic and interactive image processing and classification methods
- Approaches to alignment (establishing a correspondence between images of texts, textual transcriptions and printed editions)
- Search methods across modalities and datasets (texts/images/shapes)
- Methods to define “mid-level features”, which are key to productive communication between scholars and computational systems
- Data sources, data collection, and their use and management
- Use of input devices (e.g., sensitive digital pens, touch-surfaces) to interact with the images of texts and to collect data on letter formation and on the ergonomics of writing
- Explorations of the cognitive underpinnings of palaeographical research and how digital technologies might assist (e.g., kinaesthetic engagement in reading)
- Methods of visualization
- Dissemination of results, ideas, and developments outside of expert communities and to the general public
- Going beyond Palaeography: applying the expertise and capabilities developed through the joint efforts of researchers from both fields to new domains, including forensics, recognition of handwriting in business (e.g., postal services), and indexing of large-scale Cultural Heritage datasets with a large audience (e.g., genealogy)
Digital Palaeography emerged as a research community in the late 2000s. Following a successful Dagstuhl Perspectives Workshop on Computation and Palaeography (12382), this seminar focused on the interaction of Palaeography and computerized tools developed in Computer Vision for the analysis of digital images. Given the techniques now available for enhancing damaged documents, for optical text recognition or computer-assisted transcription, and for identifying and categorising scripts and scribes, the current technical challenge is to develop "new machines", i.e. efficient solutions for palaeographic tasks, and to provide scholars with quantitative evidence towards palaeographical arguments. This goes beyond the reading of "old texts" (ancient, medieval and early modern documents), which is itself of interest to industry, to the wider public, and to the broad community of genealogists.
The core issue identified was to create the conditions for fluid and seamless communication between Humanities and Computer Sciences scholars in order to advance research in Palaeography, Manuscript Studies and History, on the one hand, and in Computer Vision, Semantic Technologies, Image Processing, and Human-Computer Interaction (HCI) systems on the other hand. Indeed, researchers must articulate their respective systems of proof in order to produce efficient systems that present palaeographical data quickly and easily, and in a way that scholars can understand, evaluate, and trust. To establish fruitful collaborations, it is thus essential to address the "black box" issue, to make better use of the outreach potential offered by computerized technologies to enrich palaeographical knowledge, and to facilitate the sharing of both CS and palaeographical methodologies.
This seminar shed light on two major evolutions between 2012 and 2014; these notable shifts concern interdisciplinary communication and access to "black box" expertise. On the one hand, the notion of "communication" or "bridging the gap" (as expressed by seminar 14301, which took place in conjunction with our own seminar) has become more specific, in that issues and problems are now better identified, understood, and expressed. While the two-fold expression "digital palaeography" might lead one to believe that the communication involves only two sorts of actors, it was expressed more clearly than ever that Digital Palaeography as a field is much more complex than a simple juxtaposition of Computer Sciences and Palaeography; indeed, CS research, engineering and software development, support and service, linguistics, palaeography, art history, and cultural heritage institutions (Galleries, Libraries, Archives, and Museums, or GLAM) all form part of the Digital Palaeography research arena. Good communication requires correct identification of the roles and competence of each actor, and a well-balanced project has to anticipate and include the participation of the other actors. It is, for example, important to clarify that palaeographers are not responsible for the copyright or quality of images provided by GLAM institutions, in the same way that CS researchers are not responsible for designing interfaces. Within each community, a better understanding of the methods and interests of the actors of the other communities is needed to find the right partners (e.g., keyword spotting is not alignment; writer identification is not script classification).
On the other hand, the "black box" issue seems to have been addressed by most teams through the introduction or increase of interactivity in the software tools they presented; interactivity was used not only as a means to produce clear and convincing results, but also to overcome the shortcomings of strictly automatic approaches. In this sense, the reintroduction of "the human into the loop" (or "the use of the users") is part of a process allowing a better understanding on both sides. The "human in the loop" can and should be integrated at all stages, and, even if this need is not always perceived, it is crucial that substantial efforts be dedicated to making implicit assumptions or knowledge explicit. Special attention should be given to avoiding the development of tools that rely on tautological approaches, where tools or datasets incorporate expectations as an underlying (and often implicit) model. In this regard, it cannot be overstated that an unclear result is as important for historians as a clear-cut clustering. In the middle of the process, the "human" gives feedback on preliminary results, enables the enhancement and improvement of the model, and creates ground truth. The display of intermediary results and the integration of user feedback within the process are a welcome solution offered by the latest developments. Likewise, palaeographers have developed new strategies for formulating tool requirements, or for expressing requirements whose results they can evaluate themselves, even when the software is an opaque black box (P. Stokes, D. Stutzmann, M. Lawo with B. Gottfried).
Overall, this seminar seems to have brought about a paradigm shift from black-box issues to trust issues, in the sense that when we first identified black-box issues, we focussed on "computational black boxes", whereas "human black boxes" are in fact just as problematic. Instead of focussing on computational black boxes as an issue, we were able to formulate that the important endeavour is that of establishing trust in the respective methodological approaches to the research questions of the research domains. This trust in methodologies is usually mediated by human interactions ("humans in the loop" again!), and by the ways in which scholars are able to share an intuitive understanding of their respective expertise with non-experts.
It follows that a new (technical) challenge arises: the creation and implementation of an integrated software tool, web service suite, or environment that would allow users to access and work with extant datasets and tools. The impetus to take up this challenge resides as much in the Humanities as it does in the Computer Sciences. By aggregating the multiple, isolated, specific tools developed by CS researchers through a common access point, digital humanists would support the development of better evaluation metrics and promote a wider use of CS technologies among more traditional Humanities scholars, who could thus become more aware of the existing tools, more autonomous (i.e. less dependent on CS researchers) and thereby empowered. As a reciprocal positive effect, CS researchers could more easily validate their results and gain access to a wider range of annotated datasets. This challenge is also naturally related to trending key concepts such as "interoperability" and "open access". It furthermore engages with the question of the nature of success metrics in the Humanities, where a successful tool is not only the one giving the best results but also one enjoying wide acceptance and a large number of users. Improving ergonomics is essential, both to put the user at the centre and to accumulate a consistent critical mass of annotations (as feedback and as ground truth).
- Orna Almogi (Universität Hamburg, DE)
- Vincent Christlein (Universität Erlangen - Nürnberg, DE)
- Nachum Dershowitz (Tel Aviv University, IL)
- Véronique Eglin (INRIA / INSA - Lyon, FR)
- Jihad El-Sana (Ben Gurion University - Beer Sheva, IL)
- Gernot Fink (TU Dortmund, DE)
- Björn Gottfried (Universität Bremen, DE)
- Anna Gutgarts-Weinberger (The Hebrew University of Jerusalem, IL)
- Tal Hassner (The Open University of Israel - Raanana, IL)
- Rolf Ingold (University of Fribourg, CH)
- Noga Levy (Tel Aviv University, IL)
- Marcus Liwicki (DFKI - Kaiserslautern, DE)
- Josep Lladós (Autonomous University of Barcelona, ES)
- Frederike Neuber (Karl-Franzens-Universität Graz, AT)
- Jean-Marc Ogier (University of La Rochelle, FR)
- Robert Sablatnig (TU Wien, AT)
- Joan Andreu Sanchez Peiro (Technical University of Valencia, ES)
- Wendy Scase (University of Birmingham, GB)
- Iris Shagrir (The Open University of Israel - Raanana, IL)
- Peter A. Stokes (King's College - London, GB)
- Dominique Stutzmann (CNRS - Paris, FR)
- Ségolène Tarte (University of Oxford, GB)
- Nicole Vincent (Paris Descartes University, FR)
- Georg Vogeler (Karl-Franzens-Universität Graz, AT)
- Dagstuhl Perspectives Workshop 12382: Computation and Palaeography: Potentials and Limits (2012-09-18 - 2012-09-21)
- computer graphics / computer vision
- data bases / information retrieval
- society / human-computer interaction
- Digital Palaeography
- Cultural Heritage
- Interdisciplinary Studies