https://www.dagstuhl.de/15122

### March 15–20, 2015, Dagstuhl Seminar 15122

# Formal Models of Graph Transformation in Natural Language Processing

## Organizers

Frank Drewes (University of Umeå, SE)

Kevin Knight (USC – Marina del Rey, US)

Marco Kuhlmann (Linköping University, SE)


## Documents

Dagstuhl Report, Volume 5, Issue 3

Motivation text

List of participants

Shared documents

Dagstuhl-Seminar Wiki

(Please use the seminar number and access code to log in.)

## Summary

Strings are fundamental data structures in natural language processing (NLP). Weighted finite-state string acceptors and transducers, first introduced as theoretical constructs, have proven their worth in speech recognition, part-of-speech tagging, transliteration, and many other applications. The string automaton framework provides efficient generic algorithms for composition, bidirectional application, k-best extraction, determinization, minimization, parameter tuning, etc. These algorithms have been packaged in software toolkits that form the core of many state-of-the-art systems.
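To make the string automaton framework concrete, here is a minimal sketch of a weighted finite-state acceptor; the automaton, its transitions, and its weights are illustrative assumptions, not taken from the seminar or any particular toolkit.

```python
# Sketch of a weighted finite-state acceptor (WFSA).
# States, transitions, and weights below are made-up examples.

# transitions: (state, symbol) -> list of (next_state, weight)
TRANSITIONS = {
    (0, "a"): [(0, 0.5), (1, 0.5)],
    (0, "b"): [(0, 1.0)],
    (1, "b"): [(1, 1.0)],
}
FINAL = {1: 1.0}  # final-state weights


def accept_weight(symbols):
    """Total weight of all accepting paths for the input string."""
    current = {0: 1.0}  # weight mass per state after the consumed prefix
    for sym in symbols:
        nxt = {}
        for state, w in current.items():
            for tgt, tw in TRANSITIONS.get((state, sym), []):
                nxt[tgt] = nxt.get(tgt, 0.0) + w * tw
        current = nxt
    return sum(w * FINAL.get(s, 0.0) for s, w in current.items())


print(accept_weight("ab"))  # → 0.5 (two paths, one of which accepts)
```

Toolkits in this family additionally implement the generic operations mentioned above (composition, determinization, k-best extraction) over such weighted automata.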

Tree automata go further in permitting large-scale, syntactically-motivated re-ordering of subtrees. They were originally devised to help formalize Chomsky's linguistic theories, but their subsequent development was largely disconnected from NLP practice. In 2005, tree automata theorists and machine translation (MT) practitioners began working together to come up with a new kind of statistical MT system based on tree automata. This led to some of the best practical results in common evaluations of MT quality, and syntactic methods are now used in industrial MT systems. This work at the intersection of tree automata and NLP created vibrant new research directions for both areas.

Nowadays, *graphs* are becoming an even more general fundamental data structure in practical NLP. Classic feature structures can be seen as rooted, directed, edge- and leaf-labeled graphs. Recent work in dependency parsing produces graphs rather than trees. New work in deep semantic annotation organizes logical meanings into directed graph structures, and several ongoing efforts will soon yield large amounts of linguistic data annotated with such representations. Formal models of *graph transformation* are therefore of fundamental importance for the development of practical systems for these tasks. The situation is familiar: there exists a formal theory of graph transformation, but this theory is largely disconnected from research and practice in NLP.
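As an illustration of why such meaning representations are graphs rather than trees, the following sketch encodes a small AMR-style structure; the concrete encoding (node dictionary, edge triples) and the node labels are assumptions chosen for the example.

```python
# Sketch of a rooted, directed, edge-labeled graph, in the style of
# deep semantic annotations. Labels and role names are illustrative.

graph = {
    "root": "w",
    "nodes": {"w": "want-01", "b": "boy", "g": "go-01"},
    # edges: (source, role label, target)
    "edges": [("w", "ARG0", "b"), ("w", "ARG1", "g"), ("g", "ARG0", "b")],
}


def in_degree(g, node):
    """Number of edges pointing at `node`."""
    return sum(1 for _, _, tgt in g["edges"] if tgt == node)


# Node "b" is shared by two predicates (a reentrancy), so the
# structure is a genuine graph, not a tree.
print(in_degree(graph, "b"))  # → 2
```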

The theory of graph transformation studies rule-based mechanisms for the manipulation of graphs. A particularly well-studied subject within the area of graph transformation, and one that has recently received considerable attention within the NLP community, is that of *context-free graph grammars*. These grammars share many nice properties with context-free phrase structure grammars, but are considerably more powerful and versatile; in particular, they can be used to generate context-sensitive string languages (when strings are represented as chain graphs). The price of this expressiveness is a higher computational complexity; in particular, there are context-free graph languages for which parsing is NP-complete. This has triggered research on specialized, more efficient algorithms for restricted classes of graphs. A well-known result in this area is that many problems on graphs that are intractable in general become solvable in polynomial time when restricted to graphs of bounded tree-width.
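To illustrate the chain-graph representation of strings mentioned above, here is a small sketch; the concrete encoding (positions as nodes, symbols as edge labels in source/label/target triples) is an assumption made for the example.

```python
# Sketch: a string as a chain graph. Nodes are string positions,
# and each symbol becomes one labeled edge between adjacent positions.


def string_to_chain_graph(s):
    nodes = list(range(len(s) + 1))
    edges = [(i, sym, i + 1) for i, sym in enumerate(s)]
    return nodes, edges


def chain_graph_to_string(nodes, edges):
    """Recover the string by reading edge labels along the chain."""
    return "".join(sym for _, sym, _ in sorted(edges))


nodes, edges = string_to_chain_graph("abc")
print(edges)  # → [(0, 'a', 1), (1, 'b', 2), (2, 'c', 3)]
```

Under this encoding, a graph language over chain graphs corresponds to a string language, which is how a context-free graph grammar can generate context-sensitive string languages.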

With the number of interesting applications and the amount of available data quickly increasing, there is a clear need for the NLP community to acquire knowledge about formal models of graph processing, as such models can greatly simplify practical systems by providing a uniform knowledge representation and efficient, generic algorithms for inference. Unfortunately, most NLP researchers are unaware of the rich literature on graph transformation, and even those who are aware find it hard to connect that literature to their own work. Conversely, few researchers in graph transformation are aware of the new applications of their work in natural language processing, the characteristic properties of the available data, the specific desiderata of these applications, and the research problems they pose.

The overall goal of the seminar was to bring these research communities together to assess the state of the art, identify areas of common interest, and pave the way for future collaborations. We believe that this goal was achieved to a very high degree, and that the seminar will be a major factor in the creation of a new interdisciplinary research community.

### Organization of the Seminar

The seminar was attended by 29 participants from 9 countries in North America, Europe, and Africa. It was held from March 15 to March 20, 2015. Since the intention was to foster the creation of a new research community, it was decided to organize the seminar in the form of a self-organized workshop with many informal discussion meetings on topics suggested by the participants themselves. For this, the seminar roughly followed the idea of Open Space Technology. This worked very well and gave rise to many insightful discussions. (See Section 5 for the list of topics discussed.)

**Summary text license**

Creative Commons BY 3.0 Unported license

Frank Drewes, Kevin Knight, and Marco Kuhlmann

## Classification

- Artificial Intelligence / Robotics
- Data Structures / Algorithms / Complexity

## Keywords

- Automata theory
- Graph transformation
- Natural language processing