07.04.15 - 10.04.15, Seminar 15152

Machine Learning with Interdependent and Non-identically Distributed Data

This seminar description was published on our web pages before the seminar and was used in the invitation to the seminar.

Motivation

One of the most common assumptions in many machine learning and data analysis tasks is that given data points are realizations of independent and identically distributed (IID) random variables. However, this assumption is often violated, e.g., when training and test data come from different distributions (dataset bias or domain shift) or the data points are highly interdependent (e.g., when the data exhibits temporal or spatial correlations).
In general, there are three major reasons why the assumption of independent and identically distributed data can be violated:

  1. The draw of a data point influences the outcome of a subsequent draw (inter-dependencies).
  2. The distribution changes at some point (non-stationarity).
  3. The data is not generated by a distribution at all (adversarial).

The seminar will address (1) and (2) in the context of several subfields of machine learning, which we would like to analyze and reconcile: transfer and multi-task learning, learning with interdependent data, and two application fields, namely visual recognition and computational biology. These are not only two of the main application areas for machine learning algorithms in general; their recognition tasks are also often characterized by multiple related learning problems that require transfer and multi-task learning approaches. For instance, computer vision models can be learned from object-centric internet resources but are often applied to real-world scenes. In computational biology and personalized medicine, training data may be recorded at a particular hospital, while the model is applied to make predictions on data from other hospitals, where the patient population has a different structure.
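
As a purely illustrative sketch, not taken from the seminar material, the two violations in focus, inter-dependencies and non-stationarity, can be simulated with synthetic data. Everything in the block is an assumption made for illustration: the AR(1) process used as a stand-in for interdependent draws, the single mean shift used as a stand-in for a changing distribution, and all function names and parameter values.

```python
import random
import statistics


def ar1_series(n, phi, rng):
    """Interdependent draws: each value depends on the previous one (AR(1))."""
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out


def shifted_series(n, change_point, shift, rng):
    """Independent draws whose mean jumps by `shift` at `change_point`."""
    return [rng.gauss(shift if t >= change_point else 0.0, 1.0) for t in range(n)]


def lag1_autocorr(xs):
    """Sample lag-1 autocorrelation; close to 0 for independent draws."""
    mean = statistics.fmean(xs)
    num = sum((a - mean) * (b - mean) for a, b in zip(xs, xs[1:]))
    den = sum((a - mean) ** 2 for a in xs)
    return num / den


rng = random.Random(0)  # fixed seed so the run is reproducible
n = 5000

iid = [rng.gauss(0.0, 1.0) for _ in range(n)]           # baseline: IID draws
dependent = ar1_series(n, phi=0.9, rng=rng)              # violation (1)
nonstationary = shifted_series(n, n // 2, 3.0, rng)      # violation (2)

mean_before = statistics.fmean(nonstationary[: n // 2])
mean_after = statistics.fmean(nonstationary[n // 2:])

print("lag-1 autocorrelation, IID:  ", round(lag1_autocorr(iid), 3))
print("lag-1 autocorrelation, AR(1):", round(lag1_autocorr(dependent), 3))
print("mean before / after change:  ", round(mean_before, 3), round(mean_after, 3))
```

In this toy setting, the autocorrelation of the AR(1) series is far from zero, and the two halves of the shifted series have clearly different means. A model fitted on the first half would thus see differently distributed data on the second half, which is a simplified analogue of the hospital example above.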

Discussing, presenting, and exploring new machine learning methods that can deal with non-i.i.d. data as well as new application scenarios are the goals of this seminar. The main topics will be:

  1. transfer learning
  2. multi-task learning
  3. learning with interdependent data
  4. visual transfer and adaptation
  5. application scenarios in computational biology

The main goals of the seminar are to define the current state of the art of learning in non-i.i.d. scenarios, categorize the underlying assumptions of existing solutions, and finally advance the field by directly pointing out current limitations, important research directions, and future application areas.

Bringing together researchers from the fields of machine learning, computer vision, and computational biology will be a unique opportunity and is key to accomplishing the aforementioned goals and milestones.