10.05.15 - 13.05.15, Seminar 15201

Cross-Lingual Cross-Media Content Linking: Annotations and Joint Representations

This seminar description was published on our website before the seminar took place and was used in the invitation to the seminar.

Motivation

Content is increasingly combined across several dimensions, namely channels (TV, news, social media), modalities (text, audio, video), and languages (often the native language plus English). A single user often consumes content from several of these dimensions in parallel (e.g., tweeting while watching TV). This new media consumption behavior demands new approaches to content search and selection. Today, search is mostly reduced to keyword search over text documents in a single language. Selection of related content is either based on collaborative filtering or likewise restricted to the textual modality of the media item. As a result, semantic similarity across the different content dimensions (channels, modalities, languages) cannot be assessed.

Despite considerable progress in research areas such as natural language processing and computer vision, cross-lingual and cross-media content retrieval remains an unsolved research problem. This stems from the difficulty of defining a joint representation space and linking such heterogeneous data to it. This seminar is intended to discuss recent progress in under-connected research areas, all of which tackle the diversity of content, such as multiple languages, multiple modalities, or social vs. mainstream media. The focus will be on finding general representation and computing approaches that can be adapted to bridge any of these dimensions, rather than isolated solutions for each source of content variation. Two popular classes of approaches try to bridge the gap between modalities or languages:

  1. Through linking to a shared conceptual space (such as WordNet or DBpedia): Here, the problem becomes one of semantic annotation, entity linking, or object recognition, where the task is to detect shared entities, concepts, or events expressed in any channel, media type, or language. To achieve this, state-of-the-art approaches learn an intermediate latent representation of the media item, which is then used for the actual classification task.
  2. Through aligned media collections (such as parallel text corpora): Again, the cross-media, cross-lingual relatedness of content is inferred from a mapping into a learned joint latent space.
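The second class of approaches can be illustrated with a minimal toy sketch: given an aligned collection of paired "text" and "image" feature vectors, a linear least-squares map projects one modality into the other's space, and related items are then retrieved by cosine similarity. All data, dimensions, and the linear map below are illustrative assumptions (real systems use learned neural encoders and richer alignment objectives), not a method proposed in this seminar description.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic paired collection: each item has a "text" and an "image"
# feature vector derived from a shared latent content vector Z.
n_items, d_latent, d_text, d_img = 50, 4, 8, 12
Z = rng.normal(size=(n_items, d_latent))
text_feats = Z @ rng.normal(size=(d_latent, d_text)) \
    + 0.05 * rng.normal(size=(n_items, d_text))
img_feats = Z @ rng.normal(size=(d_latent, d_img)) \
    + 0.05 * rng.normal(size=(n_items, d_img))

# Learn a linear map from image space into text space via least squares,
# using the aligned (paired) collection as supervision.
W, *_ = np.linalg.lstsq(img_feats, text_feats, rcond=None)

def retrieve(img_vec, text_matrix):
    """Project an image vector into text space and return text-item
    indices ranked by cosine similarity."""
    q = img_vec @ W
    q = q / np.linalg.norm(q)
    T = text_matrix / np.linalg.norm(text_matrix, axis=1, keepdims=True)
    return np.argsort(-(T @ q))

# For most image queries, the paired text item should rank first.
ranks = [int(np.where(retrieve(img_feats[i], text_feats) == i)[0][0])
         for i in range(n_items)]
top1_accuracy = sum(r == 0 for r in ranks) / n_items
print(top1_accuracy)
```

The same template applies to any pair of content dimensions (languages, channels, modalities) for which an aligned collection exists, which is why the seminar emphasizes general representation approaches over dimension-specific solutions.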

Thus, various learned latent representations are used in the process, but a joint effort across research disciplines is needed to relate these latent representations and to use them for content linking across languages and media. With this seminar, we lay the groundwork by:

  • preparing a white paper, which surveys the most pressing research questions and tasks, establishes the state of the art in related research areas and identifies common ground and potential for synergies.
  • planning a series of follow-up workshops, based on the white paper, with open challenges dedicated to establishing a standard test bed and benchmark for cross-lingual cross-media content linking.