Dagstuhl Seminar 17042

From Characters to Understanding Natural Language (C2NLU): Robust End-to-End Deep Learning for NLP

(Jan 22 – Jan 27, 2017)


Permalink
Please use the following short URL to reference this page: https://www.dagstuhl.de/17042

Organizers
  • Phil Blunsom
  • Kyunghyun Cho
  • Chris Dyer
  • Hinrich Schütze

Motivation

Deep learning is currently one of the most active areas of research in machine learning and its applications, including natural language processing (NLP). One hallmark of deep learning is end-to-end learning: all parameters of a deep learning model are optimized directly for the learning objective, e.g., for accuracy on the binary classification task "is the input image the image of a cat?". Crucially, the set of parameters that are optimized includes "first-layer" parameters that connect the raw input representation (e.g., pixels) to the first layer of internal representations of the network (e.g., edge detectors). In contrast, many other machine learning models employ hand-engineered features to take the role of these first-layer parameters.
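
Below is a minimal sketch of what this means in practice, assuming PyTorch; the model, vocabulary, and task are illustrative rather than from the seminar. The character embedding table plays the role of the "first-layer" parameters: it receives gradients from the task loss just like every other layer, instead of being replaced by hand-engineered features.

    import torch
    import torch.nn as nn

    # Illustrative character vocabulary (an assumption for this sketch).
    VOCAB = {ch: i for i, ch in enumerate("abcdefghijklmnopqrstuvwxyz ")}

    class CharClassifier(nn.Module):
        def __init__(self, n_chars=len(VOCAB), dim=16):
            super().__init__()
            self.embed = nn.Embedding(n_chars, dim)  # "first-layer" parameters
            self.out = nn.Linear(dim, 1)             # task-specific output layer

        def forward(self, char_ids):
            # Mean-pool character embeddings, then score the binary task.
            return self.out(self.embed(char_ids).mean(dim=0))

    model = CharClassifier()
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    x = torch.tensor([VOCAB[c] for c in "a cat"])
    y = torch.tensor([1.0])  # binary label, e.g., "is the input about a cat?"

    loss = nn.functional.binary_cross_entropy_with_logits(model(x), y)
    loss.backward()  # gradients flow all the way into self.embed
    opt.step()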

Even though deep learning has had a number of successes in NLP, research on true end-to-end learning is just beginning to emerge. Most NLP deep learning models still start with a hand-engineered layer of representation, the level of tokens or words, i.e., the input is broken up into units by manually designed tokenization rules. Such rules often fail to capture structure both within tokens (e.g., morphology) and across multiple tokens (e.g., multi-word expressions).
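
A toy illustration of this point, in pure Python (the regular expression below is a deliberately simple stand-in for real tokenization rules): once text is broken into tokens this way, the structure inside and across the units is simply not visible to the model.

    import re

    def tokenize(text):
        # A typical hand-engineered rule: keep maximal runs of word characters.
        return re.findall(r"\w+", text.lower())

    print(tokenize("unhappiness"))     # ['unhappiness']
    # Structure *within* the token (un- + happy + -ness) is invisible.

    print(tokenize("New York-based"))  # ['new', 'york', 'based']
    # Structure *across* tokens ("New York" as one unit) is lost.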

Another problem of token-based end-to-end systems is that they currently have no principled and general way to generate tokens that are not part of the training vocabulary. Since a token is represented as a vocabulary index, and the parameters that govern system behavior for this token refer to that index, a token without a vocabulary index cannot easily be generated in end-to-end systems. In contrast, character-based end-to-end systems can generate new vocabulary items, so that, at least in theory, they do not have an out-of-vocabulary problem.
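
The contrast can be made concrete with a small pure-Python sketch (the vocabularies and helper functions are hypothetical): a token-level model can only emit words that have a training-vocabulary index, while a character-level model composes any word from a small, closed alphabet.

    # Token-level output space: one fixed index per known word.
    train_vocab = {"the": 0, "cat": 1, "sat": 2}

    def token_id(word):
        if word not in train_vocab:
            # No index exists, so an end-to-end token model cannot emit it.
            raise KeyError(f"'{word}' is out of vocabulary")
        return train_vocab[word]

    # Character-level output space: a closed alphabet covers every word.
    alphabet = {ch: i for i, ch in enumerate("abcdefghijklmnopqrstuvwxyz")}

    def char_ids(word):
        return [alphabet[ch] for ch in word]

    print(char_ids("catnip"))  # works: [2, 0, 19, 13, 8, 15]
    token_id("catnip")         # raises KeyError: out of vocabulary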

Character-based processing is also interesting from a theoretical point of view for linguistics and computational linguistics. We generally assume that the relationship between signifiers (tokens) and the signified (meaning) is arbitrary. There are well-known cases of non-arbitrariness, including onomatopoeia and regularities in names (e.g., female vs. male first names), but these are usually considered to be exceptions. Character-based approaches can deal much better with such non-arbitrariness than token-based approaches. Thus, if non-arbitrariness is more pervasive than generally assumed, then character-based approaches would have an additional advantage.

Given the success of end-to-end learning in other domains, it is likely that it will also be widely used in NLP to alleviate these issues and lead to great advances. This workshop will bring together an interdisciplinary group of researchers from deep learning, machine learning and computational linguistics to develop a research agenda for end-to-end deep learning applied to natural language.

Copyright Hinrich Schütze

Summary

Deep learning is currently one of the most active areas of research in machine learning and its applications, including natural language processing (NLP). One hallmark of deep learning is end-to-end learning: all parameters of a deep learning model are optimized directly for the learning objective, e.g., for accuracy on the binary classification task "is the input image the image of a cat?". Crucially, the set of parameters that are optimized includes "first-layer" parameters that connect the raw input representation (e.g., pixels) to the first layer of internal representations of the network (e.g., edge detectors). In contrast, many other machine learning models employ hand-engineered features to take the role of these first-layer parameters.

Even though deep learning has had a number of successes in NLP, research on true end-to-end learning is just beginning to emerge. Most NLP deep learning models still start with a hand-engineered layer of representation, the level of tokens or words, i.e., the input is broken up into units by manually designed tokenization rules. Such rules often fail to capture structure both within tokens (e.g., morphology) and across multiple tokens (e.g., multi-word expressions). Given the success of end-to-end learning in other domains, it is likely that it will also be widely used in NLP to alleviate these issues and lead to great advances.

The seminar brought together researchers from deep learning, general machine learning, natural language processing and computational linguistics to develop a research agenda for the coming years. The goal was to combine recent advances in deep learning architectures and algorithms with extensive domain knowledge about language to make true end-to-end learning for NLP possible.

Our goal was to make progress on the following research questions and objectives.

  • So far, C2NLU approaches fall short of the state of the art set by word-level approaches in cases where token structures can easily be exploited (e.g., in well-edited newspaper text). What are promising avenues for developing C2NLU to match the state of the art even in these cases of text with well-defined token structures?
  • Character-level models are computationally more expensive than word-level models because detecting syntactic and semantic relationships at the character level is more expensive (even though potentially more robust) than at the word level. How can we address the resulting scalability challenges for character-level models? (A back-of-the-envelope sketch follows this list.)
  • Part of the mantra of deep learning is that domain expertise is no longer necessary. Is this really true, or is knowledge about the fundamental properties of language necessary for C2NLU? Even if that expertise is not needed for feature engineering, is it needed to design model architectures, tasks and training regimes?
  • NLP tasks are diverse, ranging from part-of-speech tagging through sentiment analysis to question answering. For which of these problems is C2NLU a promising approach, and for which is it not?
  • More generally, what characteristics make an NLP problem amenable to tokenization-based approaches vs. C2NLU approaches?
  • What specifically can each of the two communities involved, natural language processing and deep learning, contribute to C2NLU?
  • Create an NLP/deep learning roadmap for research in C2NLU over the next 5-10 years.
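
As referenced in the scalability question above, here is a back-of-the-envelope sketch in Python (the sentence and the quadratic-cost assumption are purely illustrative) of why character-level processing is more expensive:

    # The same sentence is several times longer in characters than in
    # tokens; per-step costs grow with sequence length, and pairwise
    # mechanisms such as attention grow roughly quadratically with it.
    sentence = "the quick brown fox jumps over the lazy dog"
    tokens = sentence.split()

    n_tok, n_chr = len(tokens), len(sentence)
    print(n_tok, n_chr)          # 9 tokens vs. 43 characters
    print((n_chr / n_tok) ** 2)  # ~23x more pairwise comparisons
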
Copyright Phil Blunsom, Kyunghyun Cho, Chris Dyer, and Hinrich Schütze

Participants
  • Heike Adel (LMU München, DE) [dblp]
  • Parnia Bahar (RWTH Aachen, DE) [dblp]
  • Phil Blunsom (University of Oxford, GB) [dblp]
  • Ondrej Bojar (Charles University - Prague, CZ) [dblp]
  • Fabienne Cap (Uppsala University, SE) [dblp]
  • Ryan Cotterell (Johns Hopkins University - Baltimore, US) [dblp]
  • Vera Demberg (Universität des Saarlandes, DE) [dblp]
  • Kevin Duh (Johns Hopkins University - Baltimore, US) [dblp]
  • Chris Dyer (Carnegie Mellon University - Pittsburgh, US) [dblp]
  • Desmond Elliott (University of Amsterdam, NL) [dblp]
  • Manaal Faruqui (Carnegie Mellon University - Pittsburgh, US) [dblp]
  • Orhan Firat (Middle East Technical University - Ankara, TR) [dblp]
  • Alexander M. Fraser (LMU München, DE) [dblp]
  • Vladimir Golkov (TU München, DE) [dblp]
  • Jan Hajic (Charles University - Prague, CZ) [dblp]
  • Georg Heigold (DFKI - Kaiserslautern, DE) [dblp]
  • Karl Moritz Hermann (Google DeepMind - London, GB) [dblp]
  • Thomas Hofmann (ETH Zürich, CH) [dblp]
  • Hang Li (Huawei Technologies - Hong Kong, HK) [dblp]
  • Adam Lopez (University of Edinburgh, GB) [dblp]
  • Marie-Francine Moens (KU Leuven, BE) [dblp]
  • Hermann Ney (RWTH Aachen, DE) [dblp]
  • Jan Niehues (KIT - Karlsruher Institut für Technologie, DE) [dblp]
  • Laura Rimell (University of Cambridge, GB) [dblp]
  • Helmut Schmid (LMU München, DE) [dblp]
  • Martin Schmitt (LMU München, DE) [dblp]
  • Hinrich Schütze (LMU München, DE) [dblp]
  • Kristina Toutanova (Google - Seattle, US) [dblp]
  • Thang Vu (Universität Stuttgart, DE) [dblp]
  • Yadollah Yaghoobzadeh (LMU München, DE) [dblp]
  • Francois Yvon (LIMSI - Orsay, FR) [dblp]

Classification
  • artificial intelligence / robotics

Keywords
  • Natural language processing
  • computational linguistics
  • deep learning
  • robustness in learning
  • end-to-end learning
  • machine learning