Dagstuhl Seminar 15152
Machine Learning with Interdependent and Non-identically Distributed Data
(Apr 07 – Apr 10, 2015)
- Trevor Darrell (University of California - Berkeley, US)
- Marius Kloft (HU Berlin, DE)
- Massimiliano Pontil (University College London, GB)
- Gunnar Rätsch (Memorial Sloan-Kettering Cancer Center - New York, US)
- Erik Rodner (Universität Jena, DE)
- Dagmar Glaser (for administrative matters)
One of the most common assumptions in many machine learning and data analysis tasks is that given data points are realizations of independent and identically distributed (IID) random variables. However, this assumption is often violated, e.g., when training and test data come from different distributions (dataset bias or domain shift) or the data points are highly interdependent (e.g., when the data exhibits temporal or spatial correlations).
In general, there are three major reasons why the assumption of independent and identically distributed data can be violated:
- The draw of a data point influences the outcome of a subsequent draw (inter-dependencies).
- The distribution changes at some point (non-stationarity).
- The data is not generated by a distribution at all (adversarial).
The seminar will deal with the first two of these violations (inter-dependencies and non-stationarity) as they relate to several subfields of machine learning, which we would like to analyze and reconcile: transfer and multi-task learning, learning with interdependent data, and two application fields, namely visual recognition and computational biology. These two fields are not only among the main application areas for machine learning algorithms in general; their recognition tasks are also often characterized by multiple related learning problems that require transfer and multi-task learning approaches. For instance, computer vision models can be learned from object-centric internet resources, but are often applied to real-world scenes. In computational biology and personalized medicine, training data may be recorded at a particular hospital, but the model is applied to make predictions on data from different hospitals, where patients exhibit a different population structure.
Discussing, presenting, and exploring new machine learning methods that can deal with non-i.i.d. data as well as new application scenarios are the goals of this seminar. The main topics will be:
- transfer learning
- multi-task learning
- learning with inter-dependent data
- visual transfer and adaptation
- application scenarios in computational biology
The main goals of the seminar are to define the current state of the art of learning in non-i.i.d. scenarios, categorize the underlying assumptions of existing solutions, and finally advance the field by directly pointing out current limitations, important research directions, and future application areas.
Bringing together researchers from the fields of machine learning, computer vision, and computational biology will be a unique opportunity and is key to accomplishing the aforementioned goals and milestones.
The seminar broadly dealt with machine learning, the area of computer science that concerns developing computational methods using data to make accurate predictions. The classical machine learning theory is built upon the assumption of independent and identically distributed random variables. In practical applications, however, this assumption is often violated, for instance, when training and test data come from different distributions (dataset bias or domain shift) or when the data exhibits temporal or spatial correlations. In general, there are three major reasons why the assumption of independent and identically distributed data can be violated:
- The draw of a data point influences the outcome of a subsequent draw (inter-dependencies).
- The distribution changes at some point (non-stationarity).
- The data is not generated by a distribution at all (adversarial).
The seminar focused on the first two scenarios (inter-dependencies and non-stationarity). This general research direction comprises several subfields of machine learning: transfer and multi-task learning, learning with interdependent data, and two application fields, namely visual recognition and computational biology. These two fields are not only among the main application areas for machine learning algorithms in general; their recognition tasks are also often characterized by multiple related learning problems that require transfer and multi-task learning approaches. For example, in visual recognition tasks, object categories are often visually related or hierarchically organized, and tasks in computational biology are often characterized by different but related organisms and phenotypes. The problems and techniques discussed during the seminar are also important for other, more general application areas, such as scientific data analysis or data-oriented decision making.
Results of the Seminar and Topics Discussed
In the following, we introduce the important research fields related to the seminar topic and give a short list of the corresponding research questions discussed at the seminar. In contrast to other workshops and seminars, which are often associated with larger conferences, the aim of the Dagstuhl seminar was to reflect on open issues in each of the individual research areas.
Foundations of Transfer Learning
Transfer Learning (TL) [2, 18] refers to the problem of retaining and applying the knowledge available for one or more source tasks in order to efficiently develop a hypothesis for a new target task. The tasks may share a common label set (domain adaptation [25, 10]) or have different label sets (across-category transfer). Most of the effort has been devoted to binary classification, while interesting practical transfer problems are often intrinsically multi-class and the number of classes can increase over time [17, 22]. Accordingly, the following research questions arise:
- How to formalize knowledge transfer across multi-class tasks and provide theoretical guarantees on this setting?
- Moreover, can inter-class transfer and incremental class learning be properly integrated?
- Can learning guarantees be provided when the adaptation relies only on pre-trained source hypotheses without explicit access to the source samples, as is often the case in real-world scenarios?
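The last question, adapting from a pre-trained source hypothesis without access to the source samples, can be illustrated with a minimal sketch. It assumes a linear regression setting and uses ridge regression that shrinks towards the source weights instead of towards zero; this is one simple textbook construction, not a method attributed to the seminar. The names `biased_ridge`, `w_source`, and the synthetic data are hypothetical.

```python
import numpy as np

def biased_ridge(X, y, w_source, lam=1.0):
    """Minimise ||X w - y||^2 + lam * ||w - w_source||^2 in closed form."""
    d = X.shape[1]
    A = X.T @ X + lam * np.eye(d)
    b = X.T @ y + lam * w_source
    return np.linalg.solve(A, b)

rng = np.random.default_rng(0)
d, n_target = 20, 5                              # few target samples, many features
w_true = rng.normal(size=d)                      # hypothetical target model
w_source = w_true + 0.1 * rng.normal(size=d)     # related pre-trained source hypothesis

# Small synthetic target sample; no source samples are used anywhere.
X = rng.normal(size=(n_target, d))
y = X @ w_true + 0.1 * rng.normal(size=n_target)

w_plain = biased_ridge(X, y, np.zeros(d))        # ordinary ridge (shrinks towards 0)
w_transfer = biased_ridge(X, y, w_source)        # shrinks towards the source hypothesis

err_plain = np.linalg.norm(w_plain - w_true)
err_transfer = np.linalg.norm(w_transfer - w_true)
print(f"plain ridge error:    {err_plain:.3f}")
print(f"transfer ridge error: {err_transfer:.3f}")
```

In this toy setting the source hypothesis is close to the target model by construction, so transfer helps; if `w_source` were unrelated, the same regularizer could hurt, which is the negative-transfer issue raised elsewhere in this summary.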
Foundations of Multi-task Learning
Learning over multiple related tasks can outperform learning each task in isolation. This is the principal assertion of Multi-task learning (MTL) [3, 7, 1] and implies that the learning process may benefit from common information shared across the tasks. In the simplest case, the transfer process is symmetric and all the tasks are considered as equally related and appropriate for joint training. Open questions in this area are:
- What happens when the condition of equally related tasks does not hold, and how can negative transfer be avoided?
- Moreover, can non-parametric statistics be adequately integrated into the learning process to estimate and compare the distributions underlying the multiple tasks in order to learn the task similarity measure?
- Can recent semi-automatic methods, like deep learning or multiple kernel learning [13, 12, 11, 4], help to get a step closer towards the complete automation of multi-task learning, e.g., by learning the task similarity measure?
- How can insights and views of researchers be shared across domains (e.g., regarding the notion of source task selection in reinforcement learning)?
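As a rough illustration of the question about comparing task distributions non-parametrically, the following sketch estimates the squared maximum mean discrepancy (MMD) between samples from different tasks, using the biased V-statistic estimator with a Gaussian RBF kernel. The kernel bandwidth `gamma`, the function names, and the synthetic task data are assumptions made for this example; how such a score should enter a multi-task learner is exactly the open question.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """Gaussian RBF kernel matrix k(a, b) = exp(-gamma * ||a - b||^2)."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))   # clip tiny negative round-off

def mmd2_biased(X, Y, gamma=0.5):
    """Biased (V-statistic) estimate of the squared MMD between samples X and Y."""
    return (rbf_kernel(X, X, gamma).mean()
            + rbf_kernel(Y, Y, gamma).mean()
            - 2 * rbf_kernel(X, Y, gamma).mean())

rng = np.random.default_rng(0)
task_a = rng.normal(loc=0.0, size=(200, 2))   # hypothetical task A inputs
task_b = rng.normal(loc=0.0, size=(200, 2))   # same distribution as task A
task_c = rng.normal(loc=2.0, size=(200, 2))   # shifted distribution

mmd_ab = mmd2_biased(task_a, task_b)
mmd_ac = mmd2_biased(task_a, task_c)
print(f"MMD^2(A, B) = {mmd_ab:.4f}  (same distribution)")
print(f"MMD^2(A, C) = {mmd_ac:.4f}  (shifted distribution)")
```

A smaller value suggests more similar input distributions. Note that this compares only the inputs, not the labelling functions, so it is at best a partial proxy for task relatedness.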
Foundations of Learning with Inter-dependent Data
Dependent data arises whenever there are inherent correlations between observations. For example, this is to be expected for time series, where we would intuitively expect that instances with similar time stamps have stronger dependencies than ones that are far apart in time. Another domain where dependent data occurs is spatially indexed sequences, such as windows taken from DNA sequences. Most of the body of work on machine learning theory concerns learning with i.i.d. data. Even the few analyses (e.g., ) allowing for "slight" violations of the assumption (mixing processes) analyze the same algorithms as in the i.i.d. case, while it should be clear that novel algorithms are also needed to adapt most effectively to rich dependency structures in the data. The following aspects were discussed during the seminar:
- Can we develop algorithms that exploit rich dependency structures in the data?
- Do such algorithms enjoy theoretical generalization guarantees?
- Can such algorithms be phrased in a general framework in order to jointly analyze them?
- How can we appropriately measure the degree of inter-dependencies (theoretically) such that it can be also empirically estimated from data (overcoming the so-called mixing assumption)?
- Can theoretical bounds be obtained for more practical dependency measures than mixing?
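To make the idea of an empirically estimable dependency measure concrete, here is a minimal sketch that uses the lag-1 autocorrelation as a crude proxy. It is not a mixing coefficient, and the effective-sample-size formula used is only valid for the sample mean of an AR(1) process; the simulated series and all function names are assumptions for illustration.

```python
import numpy as np

def lag1_autocorrelation(x):
    """Empirical lag-1 autocorrelation of a one-dimensional series."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))

def effective_sample_size(x):
    """n * (1 - rho) / (1 + rho): AR(1) approximation for the sample mean."""
    rho = lag1_autocorrelation(x)
    return len(x) * (1.0 - rho) / (1.0 + rho)

def simulate_ar1(n, phi, rng):
    """Simulate x[t] = phi * x[t-1] + noise (start is not exactly stationary)."""
    x = np.empty(n)
    x[0] = rng.normal()
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.normal()
    return x

rng = np.random.default_rng(0)
n = 5000
iid = rng.normal(size=n)                     # independent draws
dependent = simulate_ar1(n, phi=0.8, rng=rng)  # temporally correlated draws

for name, series in [("i.i.d.", iid), ("AR(1)", dependent)]:
    print(name,
          f"rho={lag1_autocorrelation(series):.3f}",
          f"effective n={effective_sample_size(series):.0f}")
```

The dependent series carries far fewer "effective" observations than its nominal length, which gives an intuition for why generalization guarantees derived for i.i.d. data can be too optimistic under dependence.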
Visual Transfer and Adaptation
Visual recognition tasks are one of the main applications for knowledge transfer and adaptation techniques. For instance, transfer learning can be put to good use for visual categories with only a small number of labeled examples, while across-category transfer can help to exploit training data available for related categories to improve the recognition performance [14, 21, 20, 22]. Multi-task learning can be applied for learning multiple object detectors or binary image classifiers jointly, which is beneficial because visual features can be shared among categories and tasks. Another important topic is domain adaptation, which is very effective in object recognition applications where the image distribution used for training (source domain) is different from the image distribution encountered during testing (target domain). This distribution shift is typically caused by a data collection bias. Sophisticated methods are needed because, in general, the visual domains can differ in a combination of (often unknown) factors including scene, object location and pose, viewing angle, resolution, motion blur, scene illumination, background clutter, camera characteristics, etc. Recent studies have demonstrated a significant degradation in the performance of state-of-the-art image classifiers due to domain shift from pose changes, a shift from commercial to consumer video [5, 6, 10], and, more generally, training datasets biased by the way in which they were collected.
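The performance degradation under domain shift described above can be reproduced in a toy setting. The sketch below assumes two-dimensional synthetic "features", a constant offset between source and target domain (loosely standing in for something like an illumination change), and a nearest-centroid classifier. The adaptation step, centring each domain on its own unlabelled mean, is a deliberately naive baseline that relies on balanced classes and is not one of the methods cited in this section.

```python
import numpy as np

def sample_domain(n_per_class, offset, rng):
    """Two Gaussian classes; `offset` shifts the whole domain."""
    means = np.array([[-1.0, 0.0], [1.0, 0.0]]) + offset
    X = np.vstack([m + 0.5 * rng.normal(size=(n_per_class, 2)) for m in means])
    y = np.repeat([0, 1], n_per_class)
    return X, y

def fit_centroids(X, y):
    """Nearest-centroid 'training': one mean vector per class."""
    return np.stack([X[y == c].mean(axis=0) for c in (0, 1)])

def predict(centroids, X):
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def accuracy(centroids, X, y):
    return float(np.mean(predict(centroids, X) == y))

rng = np.random.default_rng(0)
Xs, ys = sample_domain(500, offset=np.array([0.0, 0.0]), rng=rng)  # source domain
Xt, yt = sample_domain(500, offset=np.array([1.5, 0.0]), rng=rng)  # shifted target

centroids = fit_centroids(Xs, ys)
acc_source = accuracy(centroids, Xs, ys)
acc_target = accuracy(centroids, Xt, yt)

# Naive unsupervised adaptation: centre each domain on its own mean
# (uses no target labels; assumes equal class proportions in both domains).
centroids_c = fit_centroids(Xs - Xs.mean(axis=0), ys)
acc_adapted = accuracy(centroids_c, Xt - Xt.mean(axis=0), yt)

print(f"source accuracy:            {acc_source:.3f}")
print(f"target accuracy (no adapt): {acc_target:.3f}")
print(f"target accuracy (centred):  {acc_adapted:.3f}")
```

Real visual domains differ in many unknown factors at once, so a single global offset is far too simple a model; the example only shows why a classifier tuned to the source distribution can fail on the target.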
The following open questions have been discussed during the seminar:
- Which types of representations are suitable for transfer learning?
- How can we extend and update representations to avoid negative transfer?
- Are current adaptation and transfer learning methods efficient enough to allow for large-scale continuous visual learning and recognition?
- How can we exploit huge amounts of unlabeled data with certain dependencies to minimize supervision during learning and adaptation?
- Are deep learning methods already compensating for common domain changes in visual recognition applications?
Application Scenarios in Computational Biology
Non-i.i.d. data arises in biology, e.g., when transferring information from one organism to another or when learning from multiple organisms simultaneously. A scenario where dependent data occurs is when extracting local features from genomic DNA by running a sliding window over a DNA sequence, which is a common approach to detect transcription start sites (TSS). Windows that are close together on the DNA strand, or even overlapping, show stronger dependencies than those far apart. Another application scenario comes from statistical genetics. Many efforts in recent years have focused on models to correct for population structure, which can arise from inter-dependencies in the population under investigation. Correcting for such rich dependency structures is also a challenge in prediction problems in machine learning. The seminar brought together ideas from the different fields of machine learning, statistical genetics, Bayesian probabilistic modeling, and frequentist statistics. In particular, we discussed the following open research questions:
- How can we empirically measure the degree of inter-dependencies, e.g., from a kinship matrix of patients?
- Do theoretical guarantees of algorithms (see above) break down for realistic values of “the degree of dependency”?
- What are effective prediction and learning algorithms correcting for population structure and inter-dependencies in general and can they be phrased in a general framework?
- What are adequate benchmarks to evaluate learning with non-i.i.d. data?
- How can information be transferred between organisms, taking into account the varying noise level and experimental conditions from which data are derived?
- How can non-stationarity be exploited in biological applications?
- What are promising applications of non-i.i.d. learning in the domains of bioinformatics and personalized medicine?
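The sliding-window scenario described at the start of this section can be sketched in a few lines. The toy sequence, the window width, and the stride are hypothetical values chosen for illustration; the overlap fraction is only a crude indicator of why nearby windows cannot be treated as independent examples.

```python
def sliding_windows(sequence, width, stride=1):
    """Return (start, window) pairs for all full-length windows."""
    return [(s, sequence[s:s + width])
            for s in range(0, len(sequence) - width + 1, stride)]

def overlap_fraction(start_a, start_b, width):
    """Fraction of positions shared by two windows of equal width."""
    shared = max(0, width - abs(start_a - start_b))
    return shared / width

dna = "ACGTTGCAAGCTTAGC"   # hypothetical toy sequence, not real genomic data
width = 8
windows = sliding_windows(dna, width=width, stride=2)

first_start = windows[0][0]
for start, window in windows:
    # Overlap with the first window shrinks as the windows move apart.
    print(start, window, overlap_fraction(first_start, start, width))
```

Windows that share most of their positions yield nearly identical feature vectors, so randomly splitting them into training and test sets would leak information between the two.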
The idea of bringing together people from theory, algorithms, computer vision, and computational biology proved very successful: many discussions and joint research questions came up that had not been anticipated at the beginning. These aspects were not limited to non-i.i.d. learning and also touched on ubiquitous topics such as learning with deeper architectures. All participants agreed that the seminar should be the beginning of an ongoing series of longer Dagstuhl seminars focused on non-i.i.d. learning.
1. Andreas Argyriou, Theodoros Evgeniou, and Massimiliano Pontil. Convex multi-task feature learning. Machine Learning, 73(3):243–272, 2008.
2. Jonathan Baxter. A model of inductive bias learning. Journal of Artificial Intelligence Research, 12:149–198, 2000.
3. Rich Caruana. Multitask learning. Machine Learning, 28:41–75, July 1997.
4. C. Cortes, Marius Kloft, and M. Mohri. Learning kernels using local Rademacher complexity. In Advances in Neural Information Processing Systems 26 (NIPS 2013), 2013. in press.
5. L. Duan, I. W. Tsang, D. Xu, and S. J. Maybank. Domain transfer SVM for video concept detection. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009.
6. L. Duan, D. Xu, I. Tsang, and J. Luo. Visual event recognition in videos by learning from web data. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010.
7. Theodoros Evgeniou and Massimiliano Pontil. Regularized multi-task learning. In Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 109–117. ACM, 2004.
8. Ali Farhadi and Mostafa Kamali Tabrizi. Learning to recognize activities from the wrong view point. In Proceedings of the European Conference on Computer Vision (ECCV), 2008.
9. Geoffrey E. Hinton, Simon Osindero, and Yee-Whye Teh. A fast learning algorithm for deep belief nets. Neural Computation, 18(7):1527–1554, July 2006.
10. Judy Hoffman, Erik Rodner, Jeff Donahue, Trevor Darrell, and Kate Saenko. Efficient learning of domain-invariant image representations. In International Conference on Learning Representations (ICLR), 2013.
11. Marius Kloft and Gilles Blanchard. On the convergence rate of ℓp-norm multiple kernel learning. Journal of Machine Learning Research, 13:2465–2502, Aug 2012.
12. Marius Kloft, U. Brefeld, S. Sonnenburg, and A. Zien. Lp-norm multiple kernel learning. Journal of Machine Learning Research, 12:953–997, Mar 2011.
13. G. Lanckriet, N. Cristianini, L. E. Ghaoui, P. Bartlett, and M. I. Jordan. Learning the kernel matrix with semi-definite programming. Journal of Machine Learning Research, 5:27–72, 2004.
14. Fei-Fei Li, Rob Fergus, and Pietro Perona. One-shot learning of object categories. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(4):594–611, 2006.
15. Limin Li, Barbara Rakitsch, and Karsten M. Borgwardt. ccSVM: correcting support vector machines for confounding factors in biological data classification. Bioinformatics [ISMB/ECCB], 27(13):342–348, 2011.
16. Christoph Lippert, Jennifer Listgarten, Ying Liu, Carl M. Kadie, Robert I. Davidson, and David Heckerman. FaST linear mixed models for genome-wide association studies. Nat Meth, 8(10):833–835, October 2011.
17. Jie Luo, Tatiana Tommasi, and Barbara Caputo. Multiclass transfer learning from unconstrained priors. In Proceedings of the International Conference on Computer Vision (ICCV), pages 1863–1870, 2011.
18. Sinno Jialin Pan and Qiang Yang. A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10):1345–1359, 2010.
19. Ariadna Quattoni, Michael Collins, and Trevor Darrell. Transfer learning for image classification with sparse prototype representations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1–8. IEEE, 2008.
20. Erik Rodner and Joachim Denzler. Learning with few examples by transferring feature relevance. In Proceedings of the 31st Annual Symposium of the German Association for Pattern Recognition (DAGM), pages 252–261, 2009.
21. Erik Rodner and Joachim Denzler. One-shot learning of object categories using dependent Gaussian processes. In Proceedings of the 32nd Annual Symposium of the German Association for Pattern Recognition (DAGM), pages 232–241, 2010.
22. Erik Rodner and Joachim Denzler. Learning with few examples for binary and multiclass classification using regularization of randomized trees. Pattern Recognition Letters, 32(2):244–251, 2011.
23. Ulrich Rückert and Marius Kloft. Transfer learning with adaptive regularizers. In ECML/PKDD (3), pages 65–80, 2011.
24. Kate Saenko, Brian Kulis, Mario Fritz, and Trevor Darrell. Adapting visual category models to new domains. In European Conference on Computer Vision (ECCV), pages 213–226, 2010.
25. Gabriele Schweikert, Christian Widmer, Bernhard Schölkopf, and Gunnar Rätsch. An empirical analysis of domain adaptation algorithms for genomic sequence analysis. In Advances in Neural Information Processing Systems 21, pages 1433–1440, 2009.
26. S. Sonnenburg, A. Zien, and G. Rätsch. ARTS: Accurate recognition of transcription starts in human. Bioinformatics, 22(14):e472–e480, 2006.
27. Bharath K. Sriperumbudur, Arthur Gretton, Kenji Fukumizu, Bernhard Schölkopf, and Gert R. G. Lanckriet. Hilbert space embeddings and metrics on probability measures. Journal of Machine Learning Research, 11:1517–1561, 2010.
28. Ingo Steinwart, Don R. Hush, and Clint Scovel. Learning from dependent observations. J. Multivariate Analysis, 100(1):175–194, 2009.
29. Antonio Torralba and Alyosha Efros. Unbiased look at dataset bias. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011.
30. Antonio Torralba, Kevin P. Murphy, and William T. Freeman. Sharing features: efficient boosting procedures for multiclass object detection. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages II–762. IEEE, 2004.
31. C. Widmer, M. Kloft, and G. Rätsch. Multi-task learning for computational biology: Overview and outlook. In B. Schoelkopf, Z. Luo, and V. Vovk, editors, Empirical Inference – Festschrift in Honor of Vladimir N. Vapnik, 2013.
- Shai Ben-David (University of Waterloo, CA)
- Gilles Blanchard (Universität Potsdam, DE)
- Trevor Darrell (University of California - Berkeley, US)
- Joachim Denzler (Universität Jena, DE)
- Philipp Drewe (Max-Delbrück-Centrum, DE)
- Mario Fritz (MPI für Informatik - Saarbrücken, DE)
- Judy Hoffman (University of California - Berkeley, US)
- Josef Kittler (University of Surrey, GB)
- Marius Kloft (HU Berlin, DE)
- Brian Kulis (Ohio State University - Columbus, US)
- Christoph H. Lampert (IST Austria - Klosterneuburg, AT)
- Soeren Laue (Universität Jena, DE)
- Alessandro Lazaric (INRIA - University of Lille 1, FR)
- Victor Lempitsky (Skoltech - Scolkovo, RU)
- Christoph Lippert (Los Angeles, US)
- Stephan Mandt (Columbia University, US)
- Shin Nakajima (TU Berlin, DE)
- Francesco Orabona (Yahoo! Labs - New York, US)
- Massimiliano Pontil (University College London, GB)
- Gunnar Rätsch (Memorial Sloan-Kettering Cancer Center - New York, US)
- Erik Rodner (Universität Jena, DE)
- Kate Saenko (University of Massachusetts - Lowell, US)
- Tobias Scheffer (Universität Potsdam, DE)
- Dino Sejdinovic (University of Oxford, GB)
- Fei Sha (University of Southern California - Los Angeles, US)
- Oliver Stegle (European Bioinformatics Institute - Cambridge, GB)
- Ingo Steinwart (Universität Stuttgart, DE)
- Ilya Tolstikhin (MPI für Intelligente Systeme - Tübingen, DE)
- Ruth Urner (MPI für Intelligente Systeme - Tübingen, DE)
- Alexander Zimin (IST Austria - Klosterneuburg, AT)
- Dagstuhl Seminar 18291: Extreme Classification (2018-07-15 - 2018-07-20)
- artificial intelligence / robotics
- computer graphics / computer vision
- machine learning
- computer vision
- computational biology
- domain adaptation
- multitask learning
- transfer learning
- dataset selection bias
- dependent and non-i.i.d. data