April 7–10, 2015, Dagstuhl Seminar 15152

Machine Learning with Interdependent and Non-identically Distributed Data


Trevor Darrell (University of California – Berkeley, US)
Marius Kloft (HU Berlin, DE)
Massimiliano Pontil (University College London, GB)
Gunnar Rätsch (Memorial Sloan-Kettering Cancer Center – New York, US)


Erik Rodner (Universität Jena, DE)



Dagstuhl Report, Volume 5, Issue 4


The seminar broadly dealt with machine learning, the area of computer science concerned with developing computational methods that use data to make accurate predictions. Classical machine learning theory is built on the assumption that the data are independent and identically distributed (i.i.d.) random variables. In practical applications, however, this assumption is often violated, for instance, when training and test data come from different distributions (dataset bias or domain shift) or when the data exhibits temporal or spatial correlations. In general, there are three major reasons why the i.i.d. assumption can be violated:

  1. The draw of a data point influences the outcome of a subsequent draw (inter-dependencies).
  2. The distribution changes at some point (non-stationarity).
  3. The data is not generated by a distribution at all (adversarial).

The seminar focused on the first two scenarios (inter-dependencies and non-stationarity). This general research direction comprises several subfields of machine learning, namely transfer and multi-task learning and learning with interdependent data, as well as two application fields: visual recognition and computational biology. These two fields are not only among the main application areas of machine learning in general; their recognition tasks are also often characterized by multiple related learning problems that call for transfer and multi-task learning approaches. For example, in visual recognition, object categories are often visually related or hierarchically organized, and tasks in computational biology often involve different but related organisms and phenotypes. The problems and techniques discussed during the seminar are also important for other, more general application areas, such as scientific data analysis and data-oriented decision making.

Results of the Seminar and Topics Discussed

In the following, we introduce the main research fields related to the seminar topic and give a short list of the corresponding research questions discussed at the seminar. In contrast to workshops and seminars that are often held alongside larger conferences, the aim of this Dagstuhl seminar was to reflect on open issues in each of the individual research areas.

Foundations of Transfer Learning

Transfer Learning (TL) [2, 18] refers to the problem of retaining the knowledge available for one or more source tasks and applying it in order to efficiently develop a hypothesis for a new target task. Source and target tasks may share the same label set (domain adaptation [25, 10]) or have different label sets (across-category transfer). Most of the effort has been devoted to binary classification [23], while interesting practical transfer problems are often intrinsically multi-class, and the number of classes can grow over time [17, 22]. Accordingly, the following research questions arise:

  • How can knowledge transfer across multi-class tasks be formalized, and can theoretical guarantees be provided in this setting?
  • Moreover, can inter-class transfer and incremental class learning be properly integrated?
  • Can learning guarantees be provided when the adaptation relies only on pre-trained source hypotheses without explicit access to the source samples, as is often the case in real-world scenarios?
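The last question concerns adaptation from a pre-trained source hypothesis alone. One simple way to do this, sketched here as an illustration rather than as a method proposed at the seminar, is biased regularization: fit the target model while shrinking its weights toward the source weights instead of toward zero. The function name `biased_ridge`, the linear model with squared loss, and the synthetic data are all assumptions made for this example.

```python
import numpy as np


def biased_ridge(X, y, w_source, lam):
    """Ridge regression shrunk toward a pre-trained source weight vector.

    Solves  min_w ||X w - y||^2 + lam * ||w - w_source||^2,
    whose closed-form solution is
        w = (X^T X + lam * I)^{-1} (X^T y + lam * w_source).
    Only w_source is required; the source training samples are never accessed.
    """
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y + lam * w_source)


rng = np.random.default_rng(0)
d = 5
w_source = rng.normal(size=d)                    # stand-in for a pre-trained source model
w_target = w_source + 0.1 * rng.normal(size=d)   # related, but different, target task
X = rng.normal(size=(4, d))                      # fewer target samples than features
y = X @ w_target + 0.05 * rng.normal(size=4)

w_no_transfer = biased_ridge(X, y, np.zeros(d), lam=1.0)  # ordinary ridge, no transfer
w_transfer = biased_ridge(X, y, w_source, lam=1.0)
print(np.linalg.norm(w_no_transfer - w_target), np.linalg.norm(w_transfer - w_target))
```

With fewer target samples than features, the transferred estimate should typically land closer to `w_target` than plain ridge regression, because the penalty fills in the directions the target data cannot determine. If the source model is unrelated to the target task, the same bias can cause negative transfer, which is why the choice of `lam` matters.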

Foundations of Multi-task Learning

Learning over multiple related tasks can outperform learning each task in isolation. This is the principal assertion of multi-task learning (MTL) [3, 7, 1], and it implies that the learning process may benefit from information shared across the tasks. In the simplest case, the transfer process is symmetric and all tasks are considered equally related and equally appropriate for joint training. Open questions in this area are:

  • What happens when the condition of equally related tasks does not hold, e.g., how to avoid negative transfer?
  • Moreover, can non-parametric statistics [27] be adequately integrated into the learning process to estimate and compare the distributions underlying the multiple tasks in order to learn the task similarity measure?
  • Can recent semi-automatic methods, like deep learning [9] or multiple kernel learning [13, 12, 11, 4], bring us a step closer to fully automated multi-task learning, e.g., by learning the task similarity measure?
  • How can insights and views of researchers be shared across domains (e.g., regarding the notion of source task selection in reinforcement learning)?
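The symmetric case described above, where all tasks are treated as equally related, can be made concrete with a decomposition of the kind used in regularized multi-task learning [7]: each task's weight vector is written as w_t = w_0 + v_t, a shared part plus a task-specific offset. The sketch below uses squared loss and plain linear models rather than the SVM formulation of [7]; the function `regularized_mtl`, its penalty names, and the synthetic data are illustrative assumptions.

```python
import numpy as np


def regularized_mtl(Xs, ys, lam_shared, lam_task):
    """Multi-task ridge regression with task weights w_t = w_0 + v_t.

    Minimizes
        sum_t ||X_t (w_0 + v_t) - y_t||^2 + lam_shared * ||w_0||^2 + lam_task * sum_t ||v_t||^2
    by stacking all tasks into one ridge problem over theta = [w_0, v_1, ..., v_T].
    Both penalties must be positive so that the linear system is well posed.
    """
    T, d = len(Xs), Xs[0].shape[1]
    blocks = []
    for t, X in enumerate(Xs):
        Z = np.zeros((X.shape[0], d * (T + 1)))
        Z[:, :d] = X                           # columns for the shared vector w_0
        Z[:, d * (t + 1):d * (t + 2)] = X      # columns for the task-specific offset v_t
        blocks.append(Z)
    Z = np.vstack(blocks)
    y = np.concatenate(ys)
    penalty = np.concatenate([np.full(d, lam_shared), np.full(d * T, lam_task)])
    theta = np.linalg.solve(Z.T @ Z + np.diag(penalty), Z.T @ y)
    w0, V = theta[:d], theta[d:].reshape(T, d)
    return w0 + V                              # row t is the weight vector of task t


rng = np.random.default_rng(0)
w_common = rng.normal(size=4)
Xs = [rng.normal(size=(6, 4)) for _ in range(3)]              # three small, related tasks
ys = [X @ (w_common + 0.1 * rng.normal(size=4)) for X in Xs]
W = regularized_mtl(Xs, ys, lam_shared=0.1, lam_task=10.0)
print(W.shape)  # (3, 4)
```

Increasing `lam_task` ties every task to the shared `w_0`, the equally-related extreme, while making `lam_shared` large relative to `lam_task` pushes the solution toward independent per-task ridge regressions. Learning a task similarity measure, as asked above, would replace this single uniform coupling.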

Foundations of Learning with Inter-dependent Data

Dependent data arises whenever there are inherent correlations between observations. This is to be expected, for example, for time series, where instances with similar time stamps intuitively have stronger dependencies than instances far apart in time. Another domain where dependent data occurs is spatially indexed sequences, such as windows taken from DNA sequences. Most work in machine learning theory concerns learning with i.i.d. data. Even the few analyses that allow for "slight" violations of this assumption (mixing processes, e.g., [28]) study the same algorithms as in the i.i.d. case, although it should be clear that novel algorithms are also needed to adapt most effectively to rich dependency structures in the data. The following aspects were discussed during the seminar:

  • Can we develop algorithms that exploit rich dependency structures in the data?
  • Do such algorithms enjoy theoretical generalization guarantees?
  • Can such algorithms be phrased in a general framework in order to jointly analyze them?
  • How can we appropriately measure the degree of inter-dependency (theoretically) such that it can also be estimated empirically from data (overcoming the so-called mixing assumption)?
  • Can theoretical bounds be obtained for more practical dependency measures than mixing?
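As a rough illustration of estimating the degree of dependence from data, the sketch below computes the lag-1 sample autocorrelation of a series and the effective sample size it implies under an AR(1) approximation. This is a simple, commonly used proxy and not one of the mixing coefficients used in theoretical analyses such as [28]; the AR(1) model, the function names, and the simulated data are assumptions for illustration.

```python
import numpy as np


def lag_autocorrelation(x, lag=1):
    """Sample autocorrelation of a 1-D series at the given lag (lag >= 1)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))


def effective_sample_size_ar1(x):
    """Effective sample size n * (1 - rho) / (1 + rho) under an AR(1) approximation.

    A crude proxy for how much the dependence inflates the variance of the
    sample mean; it is not an estimate of a mixing coefficient.
    """
    rho = lag_autocorrelation(x, lag=1)
    return len(x) * (1.0 - rho) / (1.0 + rho)


def simulate_ar1(n, phi, rng):
    """Simulate a stationary AR(1) series x_t = phi * x_{t-1} + e_t, e_t ~ N(0, 1), |phi| < 1."""
    x = np.empty(n)
    x[0] = rng.normal(scale=1.0 / np.sqrt(1.0 - phi**2))  # draw from the stationary law
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.normal()
    return x


rng = np.random.default_rng(0)
iid = rng.normal(size=5000)
dependent = simulate_ar1(5000, phi=0.9, rng=rng)
print(effective_sample_size_ar1(iid), effective_sample_size_ar1(dependent))
```

For the dependent series, the effective sample size should come out at a few hundred rather than close to 5000 as for the i.i.d. series, which illustrates why guarantees stated in terms of the raw sample size can be optimistic for dependent data.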

Visual Transfer and Adaptation

Visual recognition tasks are one of the main applications of knowledge transfer and adaptation techniques. For instance, transfer learning can be put to good use for visual categories with only a few labeled examples, while across-category transfer can exploit training data available for related categories to improve recognition performance [14, 21, 20, 22]. Multi-task learning can be applied to learning multiple object detectors [30] or binary image classifiers [19] jointly, which is beneficial because visual features can be shared among categories and tasks. Another important topic is domain adaptation, which is very effective in object recognition applications [24], where the image distribution used for training (source domain) differs from the image distribution encountered during testing (target domain). This distribution shift is typically caused by a data collection bias. Sophisticated methods are needed because, in general, visual domains can differ in a combination of (often unknown) factors, including scene, object location and pose, viewing angle, resolution, motion blur, scene illumination, background clutter, and camera characteristics. Recent studies have demonstrated a significant degradation in the performance of state-of-the-art image classifiers due to domain shift from pose changes [8], a shift from commercial to consumer video [5, 6, 10], and, more generally, training datasets biased by the way in which they were collected [29].
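A distribution shift of this kind can be quantified non-parametrically by comparing kernel mean embeddings of the source and target feature distributions [27]. The sketch below computes the standard unbiased estimate of the squared maximum mean discrepancy (MMD) with a Gaussian kernel; the fixed bandwidth, the function names, and the synthetic "source" and "target" features (standing in for image descriptors) are assumptions for this example.

```python
import numpy as np


def gaussian_kernel(A, B, bandwidth):
    """Gaussian kernel matrix k(a, b) = exp(-||a - b||^2 / (2 * bandwidth^2))."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))


def mmd2_unbiased(X, Y, bandwidth=1.0):
    """Unbiased estimate of the squared MMD between samples X (m x d) and Y (n x d)."""
    m, n = len(X), len(Y)
    Kxx = gaussian_kernel(X, X, bandwidth)
    Kyy = gaussian_kernel(Y, Y, bandwidth)
    Kxy = gaussian_kernel(X, Y, bandwidth)
    # Drop the diagonal terms k(x_i, x_i) so the within-sample averages are unbiased.
    term_xx = (Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
    term_yy = (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
    return float(term_xx + term_yy - 2.0 * Kxy.mean())


rng = np.random.default_rng(0)
source = rng.normal(size=(300, 5))                  # hypothetical source-domain features
target_same = rng.normal(size=(300, 5))             # same distribution as the source
target_shifted = rng.normal(size=(300, 5)) + 1.0    # hypothetical shifted target domain
print(mmd2_unbiased(source, target_same), mmd2_unbiased(source, target_shifted))
```

In practice the bandwidth is often set with a heuristic such as the median pairwise distance. A value near zero suggests little measurable shift between the two samples, while a clearly positive value indicates a distribution difference that adaptation methods may need to address.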

The following open questions have been discussed during the seminar:

  • Which types of representations are suitable for transfer learning?
  • How can we extend and update representations to avoid negative transfer?
  • Are current adaptation and transfer learning methods efficient enough to allow for large-scale continuous visual learning and recognition?
  • How can we exploit huge amounts of unlabeled data with certain dependencies to minimize supervision during learning and adaptation?
  • Are deep learning methods already compensating for common domain changes in visual recognition applications?

Application Scenarios in Computational Biology

Non-i.i.d. data arises in biology, e.g., when transferring information from one organism to another or when learning from multiple organisms simultaneously [31]. Dependent data also occurs when local features are extracted from genomic DNA by running a sliding window over a DNA sequence, which is a common approach to detecting transcription start sites (TSS) [26]. Windows that are close by on the DNA strand, or even overlapping, show stronger dependencies than those far apart. Another application scenario comes from statistical genetics: in recent years, much effort has focused on models that correct for population structure [16], which can arise from interdependencies in the population under investigation. Correcting for such rich dependency structures is also a challenge in machine learning prediction problems [15]. The seminar brought together ideas from the different fields of machine learning, statistical genetics, Bayesian probabilistic modeling, and frequentist statistics. In particular, we discussed the following open research questions:

  • How can we empirically measure the degree of inter-dependencies, e.g., from a kinship matrix of patients?
  • Do theoretical guarantees of algorithms (see above) break down for realistic values of “the degree of dependency”?
  • What are effective prediction and learning algorithms correcting for population structure and inter-dependencies in general and can they be phrased in a general framework?
  • What are adequate benchmarks to evaluate learning with non-i.i.d. data?
  • How can information be transferred between organisms, taking into account the varying noise level and experimental conditions from which data are derived?
  • How can non-stationarity be exploited in biological applications?
  • What are promising applications of non-i.i.d. learning in the domains of bioinformatics and personalized medicine?
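The sliding-window setting described above can be sketched as follows: windows are extracted from a DNA string and one-hot encoded, and the data are split into training and test sets by genomic position so that no training window overlaps a test window. Overlapping windows share bases, so a random split would place nearly identical windows on both sides and could give an optimistic estimate. The blocked split is one common precaution, not a method from the cited work; the function names, the four-letter alphabet (no ambiguous bases such as N), and the random sequence are assumptions.

```python
import numpy as np

ALPHABET = "ACGT"  # assumption: the sequence contains only these four bases


def one_hot_windows(sequence, width, step=1):
    """Extract (possibly overlapping) windows from a DNA string and one-hot encode them.

    Returns an array of shape (num_windows, width, 4) and the start position of each window.
    """
    index = {base: i for i, base in enumerate(ALPHABET)}
    starts = np.arange(0, len(sequence) - width + 1, step)
    windows = np.zeros((len(starts), width, len(ALPHABET)))
    for row, start in enumerate(starts):
        for offset, base in enumerate(sequence[start:start + width]):
            windows[row, offset, index[base]] = 1.0
    return windows, starts


def blocked_split(starts, width, split_position):
    """Split windows by position so that no training window overlaps a test window.

    Windows ending at or before split_position go to training, windows starting at or
    after split_position go to testing, and windows straddling the boundary are dropped.
    """
    train = np.flatnonzero(starts + width <= split_position)
    test = np.flatnonzero(starts >= split_position)
    return train, test


rng = np.random.default_rng(0)
sequence = "".join(rng.choice(list(ALPHABET), size=100))  # hypothetical random DNA
windows, starts = one_hot_windows(sequence, width=10)
train_idx, test_idx = blocked_split(starts, width=10, split_position=60)
print(windows.shape, len(train_idx), len(test_idx))  # (91, 10, 4) 51 31
```

The nine windows that straddle position 60 are discarded; in practice one might also leave a wider gap between the two blocks if dependencies extend beyond the window width.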


The idea of the seminar, bringing together people from theory, algorithms, computer vision, and computational biology, was very successful: many discussions and joint research questions came up that had not been anticipated at the outset. These were not limited to non-i.i.d. learning and also touched on ubiquitous topics such as learning with deeper architectures. All participants agreed that the seminar should be the beginning of an ongoing series of longer Dagstuhl seminars focused on non-i.i.d. learning.


  1. Andreas Argyriou, Theodoros Evgeniou, and Massimiliano Pontil. Convex multi-task feature learning. Machine Learning, 73(3):243–272, 2008.
  2. Jonathan Baxter. A model of inductive bias learning. Journal of Artificial Intelligence Research, 12:149–198, 2000.
  3. Rich Caruana. Multitask learning. Machine Learning, 28:41–75, July 1997.
  4. C. Cortes, Marius Kloft, and M. Mohri. Learning kernels using local Rademacher complexity. In Advances in Neural Information Processing Systems 26 (NIPS 2013), 2013.
  5. L. Duan, I. W. Tsang, D. Xu, and S. J. Maybank. Domain transfer SVM for video concept detection. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009.
  6. L. Duan, D. Xu, I. Tsang, and J. Luo. Visual event recognition in videos by learning from web data. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010.
  7. Theodoros Evgeniou and Massimiliano Pontil. Regularized multi–task learning. In Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 109–117. ACM, 2004.
  8. Ali Farhadi and Mostafa Kamali Tabrizi. Learning to recognize activities from the wrong view point. In Proceedings of the European Conference on Computer Vision (ECCV), 2008.
  9. Geoffrey E. Hinton, Simon Osindero, and Yee-Whye Teh. A fast learning algorithm for deep belief nets. Neural Computation, 18(7):1527–1554, July 2006.
  10. Judy Hoffman, Erik Rodner, Jeff Donahue, Trevor Darrell, and Kate Saenko. Efficient learning of domain-invariant image representations. In International Conference on Learning Representations (ICLR), 2013.
  11. Marius Kloft and Gilles Blanchard. On the convergence rate of ℓp-norm multiple kernel learning. Journal of Machine Learning Research, 13:2465–2502, Aug 2012.
  12. Marius Kloft, U. Brefeld, S. Sonnenburg, and A. Zien. Lp-norm multiple kernel learning. Journal of Machine Learning Research, 12:953–997, Mar 2011.
  13. G. Lanckriet, N. Cristianini, L. El Ghaoui, P. Bartlett, and M. I. Jordan. Learning the kernel matrix with semi-definite programming. Journal of Machine Learning Research, 5:27–72, 2004.
  14. Fei-Fei Li, Rob Fergus, and Pietro Perona. One-shot learning of object categories. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(4):594–611, 2006.
  15. Limin Li, Barbara Rakitsch, and Karsten M. Borgwardt. ccSVM: correcting support vector machines for confounding factors in biological data classification. Bioinformatics [ISMB/ECCB], 27(13):342–348, 2011.
  16. Christoph Lippert, Jennifer Listgarten, Ying Liu, Carl M. Kadie, Robert I. Davidson, and David Heckerman. FaST linear mixed models for genome-wide association studies. Nat Meth, 8(10):833–835, October 2011.
  17. Jie Luo, Tatiana Tommasi, and Barbara Caputo. Multiclass transfer learning from unconstrained priors. In Proceedings of the International Conference on Computer Vision (ICCV), pages 1863–1870, 2011.
  18. Sinno Jialin Pan and Qiang Yang. A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10):1345–1359, 2010.
  19. Ariadna Quattoni, Michael Collins, and Trevor Darrell. Transfer learning for image classification with sparse prototype representations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1–8. IEEE, 2008.
  20. Erik Rodner and Joachim Denzler. Learning with few examples by transferring feature relevance. In Proceedings of the 31st Annual Symposium of the German Association for Pattern Recognition (DAGM), pages 252–261, 2009.
  21. Erik Rodner and Joachim Denzler. One-shot learning of object categories using dependent gaussian processes. In Proceedings of the 32nd Annual Symposium of the German Association for Pattern Recognition (DAGM), pages 232–241, 2010.
  22. Erik Rodner and Joachim Denzler. Learning with few examples for binary and multiclass classification using regularization of randomized trees. Pattern Recognition Letters, 32(2):244–251, 2011.
  23. Ulrich Rückert and Marius Kloft. Transfer learning with adaptive regularizers. In ECML/PKDD (3), pages 65–80, 2011.
  24. Kate Saenko, Brian Kulis, Mario Fritz, and Trevor Darrell. Adapting visual category models to new domains. In European Conference on Computer Vision (ECCV), pages 213–226, 2010.
  25. Gabriele Schweikert, Christian Widmer, Bernhard Schölkopf, and Gunnar Rätsch. An empirical analysis of domain adaptation algorithms for genomic sequence analysis. In Advances in Neural Information Processing Systems 21, pages 1433–1440, 2009.
  26. S. Sonnenburg, A. Zien, and G. Rätsch. ARTS: Accurate recognition of transcription starts in human. Bioinformatics, 22(14):e472–e480, 2006.
  27. Bharath K. Sriperumbudur, Arthur Gretton, Kenji Fukumizu, Bernhard Schölkopf, and Gert R. G. Lanckriet. Hilbert space embeddings and metrics on probability measures. Journal of Machine Learning Research, 11:1517–1561, 2010.
  28. Ingo Steinwart, Don R. Hush, and Clint Scovel. Learning from dependent observations. J. Multivariate Analysis, 100(1):175–194, 2009.
  29. Antonio Torralba and Alyosha Efros. Unbiased look at dataset bias. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011.
  30. Antonio Torralba, Kevin P Murphy, and William T Freeman. Sharing features: efficient boosting procedures for multiclass object detection. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages II–762. IEEE, 2004.
  31. C. Widmer, M. Kloft, and G. Rätsch. Multi-task learning for computational biology: Overview and outlook. In B. Schölkopf, Z. Luo, and V. Vovk, editors, Empirical Inference – Festschrift in Honor of Vladimir N. Vapnik, 2013.
  Creative Commons BY 3.0 Unported license
  Trevor Darrell and Marius Kloft and Massimiliano Pontil and Gunnar Rätsch and Erik Rodner

Classification

  • Artificial Intelligence / Robotics
  • Bioinformatics
  • Computer Graphics / Computer Vision

Keywords


  • Machine learning
  • Computer vision
  • Computational biology
  • Domain adaptation
  • Multitask learning
  • Transfer learning
  • Dataset selection bias
  • Dependent and non-i.i.d. data
