November 27 – December 2, 2016, Dagstuhl Seminar 16481

New Directions for Learning with Kernels and Gaussian Processes


Arthur Gretton (University College London, GB)
Philipp Hennig (MPI für Intelligente Systeme – Tübingen, DE)
Carl Edward Rasmussen (University of Cambridge, GB)
Bernhard Schölkopf (MPI für Intelligente Systeme – Tübingen, DE)

For information about this Dagstuhl Seminar, please contact

Dagstuhl Service Team




Machine learning is a young field that currently enjoys rapid, almost dizzying advancement on both the theoretical and the practical side. On account of both, the discipline, obscure until quite recently, is increasingly turning into a central area of computer science. Dagstuhl seminar 16481 on "New Directions for Learning with Kernels and Gaussian Processes" attempted to allow a key community within machine learning to get its bearings at this crucial moment in time.

Positive definite kernels are a concept that dominated machine learning research in the first decade of the millennium. They provide infinite-dimensional hypothesis classes that deliver expressive power in an elegant analytical framework. In their probabilistic interpretation as Gaussian process models, they are also a fundamental concept of Bayesian inference:

A positive definite kernel $k: \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ on some input domain $\mathcal{X}$ is a function with the property that, for all finite sets $\{x_1, \dots, x_N\} \subset \mathcal{X}$, the matrix $K \in \mathbb{R}^{N \times N}$ with elements $k_{ij} = k(x_i, x_j)$ is positive semidefinite. According to a theorem by Mercer, given certain regularity assumptions, such kernels can be expressed as a potentially infinite expansion

$$k(x,x') = \sum_{i=1}^{\infty} \lambda_i \phi_i(x) \phi_i^*(x'), \qquad \text{with} \qquad \sum_{i=1}^{\infty} \lambda_i < \infty,$$

where $*$ denotes the conjugate transpose, $\lambda_i \in \mathbb{R}_+$ is a non-negative eigenvalue, and $\phi_i$ is an eigenfunction with respect to some measure $u(x)$, that is, a function satisfying

$$\int k(x,x') \phi_i(x) \, du(x) = \lambda_i \phi_i(x').$$
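As a minimal numerical sketch of the two statements above, the following checks that a Gram matrix is positive semidefinite and rebuilds it from its eigendecomposition, which is the finite-sample analogue of the Mercer expansion. The squared-exponential kernel, the length scale, the sample size, and the tolerance are illustrative assumptions, not choices made in the seminar summary.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0):
    """Squared-exponential kernel k(x, x') = exp(-(x - x')^2 / (2 * lengthscale^2)).

    The kernel choice is an assumption made for illustration only.
    """
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / lengthscale) ** 2)

rng = np.random.default_rng(0)
x = rng.uniform(-3.0, 3.0, size=50)   # a finite set {x_1, ..., x_N}
K = rbf_kernel(x, x)                  # Gram matrix, K[i, j] = k(x_i, x_j)

# eigh is meant for symmetric matrices and returns real eigenvalues.
eigvals, eigvecs = np.linalg.eigh(K)

# Positive semidefinite: no eigenvalue below zero, up to round-off.
min_eig = eigvals.min()
is_psd = min_eig > -1e-8

# Finite analogue of the Mercer expansion: K = sum_i lambda_i v_i v_i^T.
K_rebuilt = (eigvecs * eigvals) @ eigvecs.T
max_err = np.abs(K - K_rebuilt).max()
```

The eigenvectors here play the role of the eigenfunctions evaluated on the sample, with the empirical distribution of the points standing in for the measure.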

Random functions $f(x)$ drawn by independently sampling Gaussian weights for each eigenfunction,

$$f(x) = \sum_{j=1}^{\infty} f_j \phi_j(x) \qquad \text{where} \qquad f_j \sim \mathcal{N}(0, \lambda_j),$$

are draws from the centered Gaussian process (GP) $p(f) = \mathcal{GP}(f; 0, k)$ with covariance function $k$. The logarithm of this Gaussian process measure is, up to constants and some technicalities, the square of the norm $\|f\|_k^2$ associated with the reproducing kernel Hilbert space (RKHS) of functions reproduced by $k$.
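The sampling construction above can be sketched numerically. The cosine eigenfunctions on $[0,1]$, the eigenvalues $\lambda_j = 1/j^2$, and the truncation of the infinite sum at 20 terms are all assumptions chosen for illustration. The check is that the empirical covariance of the draws approaches the kernel.

```python
import numpy as np

M = 20                                   # truncation level (assumption; the text sums to infinity)
j = np.arange(1, M + 1)
lam = 1.0 / j**2                         # summable eigenvalues lambda_j (assumed)

def features(x):
    """phi_j(x) = sqrt(2) cos(j pi x), orthonormal on [0, 1] under the uniform measure."""
    return np.sqrt(2.0) * np.cos(np.pi * np.outer(x, j))   # shape (len(x), M)

def kernel(x, x2):
    """Truncated Mercer sum k(x, x') = sum_j lambda_j phi_j(x) phi_j(x')."""
    return (features(x) * lam) @ features(x2).T

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 5)
n_draws = 50_000

# Independent Gaussian weights f_j ~ N(0, lambda_j); the scale argument is a standard deviation.
weights = rng.normal(0.0, np.sqrt(lam), size=(n_draws, M))
draws = weights @ features(x).T          # each row is one function f evaluated on x

emp_cov = draws.T @ draws / n_draws      # Monte Carlo estimate of E[f(x) f(x')]
cov_err = np.abs(emp_cov - kernel(x, x)).max()
```

With more draws, `cov_err` should shrink at the usual Monte Carlo rate.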

Supervised machine learning methods that infer an unknown function $f$ from a data set of input-output pairs $(X,Y) := \{(x_i, y_i)\}_{i=1,\dots,N}$ can be constructed by minimizing an empirical risk $\ell(f(X); Y)$ regularized by $\|\cdot\|_k^2$. An algorithmically equivalent route, with a different philosophical interpretation, is to compute the posterior Gaussian process measure that arises from conditioning $\mathcal{GP}(f; 0, k)$ on the observed data points under a likelihood proportional to the exponential of the negative empirical risk.
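The equivalence described above can be illustrated for one special case, assumed here purely for concreteness: squared loss paired with Gaussian observation noise, and a finite-rank kernel built from ten cosine features. The sketch computes the regularized risk minimizer in weight space and the GP posterior mean in function space, then compares the two predictions.

```python
import numpy as np

M = 10                                   # number of features (assumption)
j = np.arange(1, M + 1)
lam = 1.0 / j**2                         # eigenvalues lambda_j (assumed)
noise_var = 0.1                          # Gaussian noise variance (assumed)

def features(x):
    """phi_j(x) = sqrt(2) cos(j pi x) on [0, 1]; an illustrative choice."""
    return np.sqrt(2.0) * np.cos(np.pi * np.outer(x, j))

rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=30)       # synthetic inputs
Y = np.sin(2.0 * np.pi * X) + np.sqrt(noise_var) * rng.standard_normal(30)
X_test = np.linspace(0.0, 1.0, 7)

Phi, Phi_test = features(X), features(X_test)

# (a) Regularized risk: min_w ||Phi w - Y||^2 / (2 noise_var) + sum_j w_j^2 / (2 lam_j),
#     where sum_j w_j^2 / lam_j is the squared RKHS norm of f = sum_j w_j phi_j.
A = Phi.T @ Phi / noise_var + np.diag(1.0 / lam)
w_hat = np.linalg.solve(A, Phi.T @ Y / noise_var)
f_risk = Phi_test @ w_hat

# (b) GP posterior mean under Gaussian noise: k(x*, X) (K + noise_var I)^{-1} Y.
K = (Phi * lam) @ Phi.T
K_test = (Phi_test * lam) @ Phi.T
f_gp = K_test @ np.linalg.solve(K + noise_var * np.eye(len(X)), Y)

max_gap = np.abs(f_risk - f_gp).max()    # agreement up to round-off
```

Only the point predictions coincide. The GP view additionally yields a posterior covariance, which the risk minimization view does not provide.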

The prominence of kernel/GP models was founded on this conceptually and algorithmically compact yet statistically powerful description of inference and learning of nonlinear functions. In the past years, however, hierarchical ('deep') parametric models have bounced back and delivered a series of impressive empirical successes. In areas like speech recognition and image classification, deep networks now far surpass the predictive performance previously achieved with nonparametric models. One central goal of the seminar was to discuss how the superior adaptability of deep models can be transferred to the kernel framework while retaining at least some analytical clarity. Among the central lessons from the 'deep resurgence' identified by the seminar participants is that the kernel community has been too reliant on theoretical notions of universality. Instead, representations must be learned on a more general level than previously accepted. This process is often associated with an 'engineering' approach to machine learning, in contrast to the supposedly more 'scientific' air surrounding kernel methods, but its importance must not be dismissed. At the same time, participants also pointed out that deep learning is often misrepresented, in particular in popular expositions, as an almost magical process, when in reality the concept is closely related to kernel methods and can be understood to some degree through this connection: deep models provide a hierarchical parametrization of the feature functions $\phi_i(x)$ in terms of a finite-dimensional family. The continued relevance of the established theory for kernel/GP models hinges on how much of the power of deep models can be understood from within the RKHS view, and how many new concepts are required to understand the expressivity of a deep learning machine.
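The connection between deep models and feature functions mentioned above can be made concrete with a toy sketch. The two-layer tanh network, its layer widths, and its random untrained weights are hypothetical and serve only to show that a finite-dimensional, hierarchically parametrized feature map induces a positive semidefinite kernel of finite rank.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden, d_feat = 1, 16, 8        # hypothetical layer widths

# Hypothetical, untrained weights; in practice they would be learned from data.
W1 = rng.standard_normal((d_in, d_hidden))
W2 = rng.standard_normal((d_hidden, d_feat)) / np.sqrt(d_hidden)

def deep_features(x):
    """Two-layer feature map phi(x) in R^{d_feat}, parametrized by (W1, W2)."""
    h = np.tanh(x @ W1)
    return np.tanh(h @ W2)

def deep_kernel(x, x2):
    """k(x, x') = phi(x)^T phi(x'), positive semidefinite by construction."""
    return deep_features(x) @ deep_features(x2).T

x = np.linspace(-2.0, 2.0, 40)[:, None]  # 40 inputs, shape (40, 1)
K = deep_kernel(x, x)

min_eig = np.linalg.eigvalsh(K).min()    # should be non-negative up to round-off
rank = np.linalg.matrix_rank(K, tol=1e-8)  # cannot exceed the feature dimension
```

The rank bound is where such a kernel differs from the infinite expansions discussed earlier: expressivity comes from adapting the features, not from their number.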

There is also unconditionally good news: in a separate but related development, kernels have had their own renaissance lately, in the young areas of probabilistic programming ('computing of probability measures') and probabilistic numerics ('probabilistic descriptions of computing'). In both areas, kernels and Gaussian processes have been used as a descriptive language. And, similar to the situation in general machine learning, only a handful of comparatively simple kernels have so far been used. The central question here, too, is thus how kernels can be designed for challenging, in particular high-dimensional, regression problems. In contrast to the wider situation in ML, though, kernel design here should take place at compile-time, and be a structured algebraic process mapping source code describing a graphical model into a kernel. This gives rise to new fundamental questions for the theoretical computer science of machine learning.

A third thread running through the seminar concerned the internal conceptual schism between the probabilistic (Gaussian process) view and the statistical learning theoretical (RKHS) view on the model class. Although the algorithms and algebraic ideas used on both sides overlap almost to the point of equivalence, their philosophical interpretations, and thus also the required theoretical properties, differ strongly. Participants in the seminar were deliberately invited from both "denominations" in roughly equal number. Several informal discussions in the evenings, and in particular a lively break-out discussion on Thursday, helped clear up the mathematical connections (while also airing key conceptual points of contention from either side). Thursday's group is planning to write a publication based on the results of the discussion; this would be a highly valuable concrete contribution arising from the seminar that may help draw this community closer together.

Despite the challenges to some of the long-standing paradigms of this community, the seminar was infused with an air of excitement. The participants seemed to share the sensation that machine learning is still only just beginning to show its full potential. The mathematical concepts and insights that have emerged from the study of kernel/GP models may have to evolve and be adapted to recent developments, but their fundamental nature means they are quite likely to stay relevant for the understanding of current and future model classes. Far from going out of fashion, mathematical analysis of the statistical and numerical properties of machine learning model classes seems slated for a revival in coming years. And much of it will be leveraging the notions discussed at the seminar.

Summary text license
  Creative Commons BY 3.0 Unported license
  Arthur Gretton, Philipp Hennig, Carl Edward Rasmussen, and Bernhard Schölkopf


  • Artificial Intelligence / Robotics
  • Data Structures / Algorithms / Complexity
  • Modelling / Simulation


  • Machine Learning
  • Kernel Methods
  • Gaussian Processes
  • Probabilistic Programming
  • Probabilistic Numerics


All Dagstuhl Seminars and Dagstuhl Perspectives Workshops are documented in the Dagstuhl Reports series. Together with the seminar's collector, the organizers compile a report that summarizes the authors' contributions and adds an overall summary.



Dagstuhl's Impact

Please inform us when a publication arises from your seminar. Such publications are listed separately in the Dagstuhl's Impact section and displayed on the ground floor of the library.


It is also possible to publish a comprehensive collection of peer-reviewed papers in the Dagstuhl Follow-Ups series.