### 27.11.16 - 02.12.16, Seminar 16481

# New Directions for Learning with Kernels and Gaussian Processes

### The following text appeared on our web pages prior to the seminar, and was included as part of the invitation.

## Motivation

Positive definite kernels dominated machine learning research in the first decade of the millennium. They provide infinite-dimensional hypothesis classes that deliver expressive power in an elegant analytical framework. In their probabilistic interpretation as Gaussian process models, they are also a fundamental concept of Bayesian inference.

In recent years, hierarchical (‘deep’) parametric models have bounced back and delivered a series of impressive empirical successes. In areas like speech recognition and image classification, deep networks now far surpass the predictive performance previously achieved with nonparametric models. The central lesson from the ‘deep resurgence’ is that the kernel community has been too reliant on theoretical notions of universality. Instead, representations must be learned on a more general level than previously accepted. This process is often associated with an ‘engineering’ approach to machine learning, in contrast to the supposedly more ‘scientific’ air surrounding kernel methods. One central goal of this seminar is to discuss how the superior adaptability of deep models can be transferred to the kernel framework while retaining at least some analytical clarity.

In a separate but related development, kernels have had their own renaissance lately, in the young areas of probabilistic programming (‘computing of probability measures’) and probabilistic numerics (‘probabilistic descriptions of computing’). In both areas, kernels and Gaussian processes have been used as a descriptive language. And, similar to the situation in general machine learning, only a handful of comparatively simple kernels have so far been used. The central question here, too, is thus how kernels can be designed for challenging, in particular high-dimensional, regression problems. In contrast to the wider situation in ML, though, kernel design here should take place at compile-time, and be a structured algebraic process mapping source code describing a graphical model into a kernel. This gives rise to new fundamental questions for the theoretical computer science part of machine learning. With the goal of spurring research progress in these two related areas, some of the questions to be discussed at the seminar are:

- Are there ‘deep’ kernel methods? What are they? Are they necessary?
- How can nonparametric kernel models be scaled to Big Data? Is nonparametricity actually necessary, or does ‘big’ suffice in some clear sense?
- If a computational task is defined by source code, is it possible to parse this code and map it to a Gaussian process hypothesis class? What are the theoretical limits of this parsing process, and can it be practically useful? Is it possible for kernel models to solve the fundamental problems of high-dimensional integration (i.e. marginalization in graphical models)? If so, how?

We believe that this is a crucial point in time for these discussions to take place.