Dagstuhl Seminar 15201

Cross-Lingual Cross-Media Content Linking: Annotations and Joint Representations

(May 10 – May 13, 2015)

Various dimensions of content, namely channels (TV, news, social media), modalities (text, audio, video) and languages (often the native language plus English) are increasingly combined. A single user often consumes content from different dimensions in parallel (e.g., tweeting while watching TV). This new media consumption behavior demands new approaches to content search and selection. Nowadays, search is mostly reduced to keyword search for text documents in the same language. Selection of related content is either based on collaborative filtering or is also restricted to the textual modality of the media item. This way, the semantic similarity across the different content dimensions (channels, modalities, languages) cannot be assessed.

Despite considerable progress in research areas such as natural language processing and computer vision, cross-lingual and cross-media content retrieval has remained an unsolved research issue. This results from the difficulty of defining a joint space for such heterogeneous data representations and of linking content to it. This seminar is intended to discuss recent progress in under-connected research areas, all of which tackle the problem of content diversity, such as multiple languages, multiple modalities, or social vs. mainstream media. The focus will be on finding general representations and computing approaches that can be adapted to bridge any dimension, instead of finding an isolated solution for every source of content variation. Two popular classes of approaches can be identified that try to bridge the gap between modalities or languages:

  1. Through linking to a shared conceptual space (like WordNet or DBpedia): Here, the problem becomes one of semantic annotation, entity linking or object recognition, where the task is to detect shared entities, concepts or events expressed in any channel, media type or language. To establish this, state-of-the-art approaches learn an intermediate latent representation of the media item, which is then used for the actual classification task.
  2. Through aligned media collections (like parallel text corpora): Again, the cross-media cross-lingual relatedness of content is inferred based on a mapping to a learned joint latent space.

Thus, various learned latent representations are used in the process but there is a need for a joint effort across research disciplines to relate the latent representations and to use them to enable content linking across languages and media. With this seminar, we build the basis by:

  • preparing a white paper, which surveys the most pressing research questions and tasks, establishes the state of the art in related research areas and identifies common ground and potential for synergies.
  • planning a series of follow-up workshops, based on the white paper, with open challenges dedicated to establishing a standard test bed and benchmark for cross-lingual cross-media content linking.


Different types of content belonging to multiple modalities (text, audio, video) and languages are generated from various sources. These sources either broadcast information on channels like TV and news or allow collaboration in social media forums. Often multiple sources are consumed in parallel, for example when users watching TV tweet their opinions about a show. This kind of consumption poses new challenges and requires innovative approaches to content search and recommendation.

Currently, most search and content-based recommendation is limited to monolingual text. To find semantically similar content across different languages and modalities, considerable research contributions are required from various computer science communities working on natural language processing, computer vision and knowledge representation. Despite success in individual research areas, cross-lingual or cross-media content retrieval has remained an unsolved research issue.

To tackle this research challenge, this seminar provided a common platform for researchers from different disciplines to collaborate and identify approaches for finding similar content across languages and modalities. After group discussions among the seminar participants, two possible solutions were taken into consideration:

  1. Building a joint space from heterogeneous data generated from different modalities, in order to generate missing modalities or to retrieve existing ones. This is achieved through aligned media collections (like parallel text corpora). Once content is mapped to a joint latent space, similarity measures can be used to find its cross-media cross-lingual relatedness.
  2. Building a shared conceptual space using knowledge bases (KBs) like DBpedia for semantic annotation of concepts or events shared across modalities and languages. Entities expressed in any channel, media type or language can be mapped to a concept space in the KB. Commonalities between the resulting annotations can then be used to find cross-media cross-lingual relatedness.
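
The two ways of scoring relatedness described above can be illustrated with a minimal sketch. It is not a method presented at the seminar: it assumes that media items have already been mapped to vectors in a joint latent space (solution 1) or annotated with KB concept identifiers (solution 2), and it uses cosine similarity and Jaccard overlap as stand-in similarity measures. The function names, vectors and DBpedia-style identifiers are hypothetical and chosen only for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors in a joint latent space."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an all-zero vector carries no direction to compare
    return dot / (norm_u * norm_v)


def jaccard_overlap(concepts_a, concepts_b):
    """Share of KB concepts common to two annotation sets (0.0 to 1.0)."""
    a, b = set(concepts_a), set(concepts_b)
    if not a and not b:
        return 0.0  # two unannotated items are treated as unrelated
    return len(a & b) / len(a | b)


# Solution 1 (assumed input): items already embedded in a joint latent space.
# The three-dimensional vectors are made up for illustration.
german_article_vec = [0.9, 0.1, 0.3]
english_video_vec = [0.8, 0.2, 0.4]
cooking_clip_vec = [0.0, 1.0, 0.1]

# Solution 2 (assumed input): items already annotated with KB concepts.
# The identifiers imitate DBpedia resource names and are illustrative only.
german_article_concepts = {"dbr:Angela_Merkel", "dbr:Berlin", "dbr:European_Union"}
english_video_concepts = {"dbr:Angela_Merkel", "dbr:European_Union", "dbr:Brussels"}

latent_related = cosine_similarity(german_article_vec, english_video_vec)
latent_unrelated = cosine_similarity(german_article_vec, cooking_clip_vec)
concept_related = jaccard_overlap(german_article_concepts, english_video_concepts)
```

In this toy setting the German article scores higher against the English video than against the cooking clip in the latent space, and the two annotation sets share half of their concepts. The hard part in practice is producing the embeddings and annotations in the first place, which is the research problem the seminar addresses.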

Thus, implementing these solutions requires a joint effort across research disciplines to relate the representations and to use them for linking languages and modalities. This seminar also aimed to build datasets that can be used as a standard test bed and benchmark for cross-lingual cross-media content linking. The seminar was very well received by all participants. There was common agreement that the areas of text, vision and knowledge graphs should work more closely together and that each discipline would benefit from the others. The participants agreed to continue to work on two cross-modal challenges and to discuss progress and future steps in a follow-up meeting in September in Berlin.

Copyright Alexander G. Hauptmann and James Hodson and Juanzi Li and Nicu Sebe and Achim Rettinger

  • Xavier Carreras (Xerox Research Centre Europe - Grenoble, FR) [dblp]
  • Dubravko Culibrk (University of Trento, IT) [dblp]
  • John Davies (BT Research - Ipswich, GB) [dblp]
  • Tiansi Dong (Universität Bonn, DE) [dblp]
  • Anastassia Fedyk (Harvard University, US) [dblp]
  • Blaz Fortuna (Ghent University, BE) [dblp]
  • Marko Grobelnik (Jozef Stefan Institute - Ljubljana, SI) [dblp]
  • Alexander G. Hauptmann (Carnegie Mellon University, US) [dblp]
  • James Hodson (Bloomberg - New York, US) [dblp]
  • Estevam R. Hruschka (University of Sao Carlos, BR) [dblp]
  • Bea Knecht (Zattoo - Zürich, CH)
  • Juanzi Li (Tsinghua University - Beijing, CN) [dblp]
  • Dunja Mladenic (Jozef Stefan Institute - Ljubljana, SI) [dblp]
  • Aditya Mogadala (KIT - Karlsruher Institut für Technologie, DE) [dblp]
  • Chong-Wah Ngo (City University - Hong Kong, HK) [dblp]
  • Blaz Novak (Jozef Stefan Institute - Ljubljana, SI) [dblp]
  • Stefano Pacifico (Bloomberg - New York, US) [dblp]
  • Achim Rettinger (KIT - Karlsruher Institut für Technologie, DE) [dblp]
  • Evan Sandhaus (The New York Times, US) [dblp]
  • Nicu Sebe (University of Trento, IT) [dblp]
  • Alan Smeaton (Dublin City University, IE) [dblp]
  • Rudi Studer (KIT - Karlsruher Institut für Technologie, DE) [dblp]
  • Jie Tang (Tsinghua University - Beijing, CN) [dblp]
  • Andreas Thalhammer (KIT - Karlsruher Institut für Technologie, DE) [dblp]
  • Eduardo Torres Schumann (VICO - Leinfelden-Echterdingen, DE) [dblp]
  • Volker Tresp (Siemens AG - München, DE) [dblp]
  • Christina Unger (Universität Bielefeld, DE) [dblp]
  • Michael Witbrock (Cycorp - Austin, US) [dblp]
  • Lexing Xie (Australian National University - Canberra, AU) [dblp]
  • Lei Zhang (KIT - Karlsruher Institut für Technologie, DE) [dblp]

  • artificial intelligence / robotics
  • data bases / information retrieval
  • multimedia

  • cross-lingual
  • cross-media
  • cross-modal
  • natural language processing
  • computer vision
  • multimedia
  • knowledge representation
  • machine learning
  • information extraction
  • information retrieval