https://www.dagstuhl.de/15181

# Challenges and Trends in Probabilistic Programming

## Organisatoren

Gilles Barthe (IMDEA Software – Madrid, ES)
Andrew D. Gordon (Microsoft Research UK – Cambridge, GB)
Joost-Pieter Katoen (RWTH Aachen, DE)
Annabelle McIver (Macquarie University – Sydney, AU)

## Summary

### Probabilistic programming languages

Probabilistic programs are programs, written in languages like C, Java, LISP, or ML, with two added constructs: (1) the ability to draw values at random from probability distributions, and (2) the ability to condition values of variables in a program through observations. A variety of probabilistic programming languages have been defined such as Church, Infer.NET, and IBAL. Church is based on the Lisp model of the lambda calculus, containing pure Lisp as its deterministic subset, whereas Infer.NET is a Microsoft developed language akin to C# and compiles probabilistic programs into inference code. Probabilistic programs can be used for modelling complex phenomena from biology and social sciences. By doing so, we get the benefits of programming languages (rigorous semantics, execution, testing and verification) to these problem domains. More than a decade ago, McIver and Morgan defined a probabilistic programming language in the style of Dijkstra's guarded command language, referred to as pGCL. Besides the usual language constructs in Dijkstra's GCL such as non-deterministic choice, it features a probabilistic choice where the probability distribution may be parametric. For instance, the assignment x+= 1 [p] skip increments the variable x by one with probability p, and keeps the value of x unchanged with probability 1-p, where p is an unknown real value from the range [0,1]. Quantum programming languages such as qGCL and a quantum extension of C++ are also related, as their operational semantics is typically a probabilistic model so as to model the effect of measurements on the quantum state.

### The importance of probabilistic programming

The applications of probabilistic programs mainly lie in four domains: (1) machine learning, (2) security, (3) randomised algorithms, and - though to a somewhat lesser extent - (4) quantum computing. Whereas the application in the field of randomised algorithms is evident, let us briefly describe the importance for the other three fields.

#### Machine learning

A Bayesian generative model consists of a prior distribution over some parameters, together with a sampling distribution (or likelihood) that predicts outputs of the model given its inputs and parameters. Bayesian inference in machine learning consists of training such a model to infer the posterior distribution of the parameters and hence to make predictions. In the probabilistic programming approach to Bayesian inference, the user simply writes the prior and sampling distributions as probabilistic programs, and relies on a compiler to generate code to perform inference and make predictions. Such compilers often operate by considering the program as defining a probabilistic graphical model. Graphical models were pioneered by Judea Pearl and others, and are extensively described in the comprehensive text by Koller and Friedman (2009). They are widely used in statistics and machine learning, with diverse application areas including speech recognition, computer vision, biology, and reliability analysis. Probabilistic graphical models allow specification of dependences between random variables via generative models, as well as conditioning of random variables using phenomena or data observed in the real world. A variety of inference algorithms have been developed to analyse and query such models, e.g., Gibbs sampling methods, Metropolis-Hastings and belief propagation. The probabilistic programming approach has seen growing interest within machine learning over the last 10 years and it is believed -- see http://probabilistic-programming.org/wiki/Home -- that this approach within AI has the potential to fundamentally change the way that community understands, designs, builds, tests and deploys probabilistic systems.

#### Security

Ever since Goldwasser and Micali - recipients of the ACM Turing Award in 2013 - introduced probabilistic encryption, probability has played a central role in cryptography: virtually all cryptographic algorithms are randomized, and have probabilistic security guarantees. Similarly, perturbing outputs with probabilistic noise is a standard tool for achieving privacy in computations; for instance, differential privacy achieves privacy-preserving data-mining using probabilistic noise. Cryptographic algorithms and differentially private algorithms are implemented as probabilistic programs; more singularly, one common approach for reasoning about these algorithms is using the code-based game-based approach, proposed by Bellare and Rogaway, in which not only the algorithms, but also their security properties and the hardness properties upon which their security relies, are expressed as probabilistic programs , and can be verified using (a relational variant of) Hoare logic. This code-based approach is key to recent developments in verified cryptography. Quantitative information flow is another important field in security where probabilistic programs and models play an important role. Here, the key question is to obtain quantitative statements about the leakage of certain information from a given program.

#### Quantum computing

Quantum programs are used to describe quantum algorithms and typically are quantum extensions of classical while-programs. Whereas in classical computation, we use a type to denote the domain of a variable, in quantum computation, a type is the state space of a quantum system denoted by some quantum variable. The state space of a quantum variable is the Hilbert space denoted by its type. According to a basic postulate of quantum mechanics, the unique way to acquire information about a quantum system is to measure it. Therefore, the essential ingredient in a quantum program is the ability to perform measurements of quantum registers, i.e., finite sequences of distinct quantum variables. The state space of a quantum register is the tensor product of the state spaces of its quantum variables. In executing the statement measure M[q]; S, quantum measurement M will first be performed on quantum register q, and then a sub-program S will be selected to be executed next according to the outcome of the measurement. The essential difference between a measurement statement and a classical conditional statement is that the state of program variables is changed after performing the measurement. As the outcome of a measurement is probabilistic, quantum programs are thus inherently probabilistic.

#### Program analysis

On the other hand, there is a recent rapidly growing trend in research on probabilistic programs which is more in line with traditional programming languages. This focuses on aspects such as efficient compilation, static analysis, program transformations, and program verification. To mention a few, Cousot et al. recently extended the framework of abstract interpretation to probabilistic programs (2012), Gordon et al. introduced Tabular, a new probabilistic programming language (2014), Di Pierro et al. apply probabilistic static analysis (2010), Rajamani, Gordon et al. have used symbolic execution to perform Bayesian reasoning on probabilistic programs with loops (2013), Katoen, McIver et al. have developed invariant synthesis technique for linear probabilistic programs (2010), and Geldenhuys et al. considered probabilistic symbolic execution (2012).

### Achievements of this seminar

The objective of the seminar was a to bring together researchers from four separate (but related) communities to learn from each other, with the expectation that a better understanding between these communities would open up new opportunities for research and collaboration.

Participants attending the seminar represented all four themes of the original proposal: machine learning, quantitative security, (probabilistic) program analysis and quantum computing. The programme consisted of both tutorials and presentations on any topic within these themes. The tutorials provided a common ground for discussion, and the presentations gave insight into the current state of an area, and summarised the challenges that still remain. The tutorial topics were determined by consulting the participants prior to the seminar by means of a questionnaire.

Although the programme was primarily constructed around the tutorials and standard-length presentations (each around 30 minutes), the organisers made sure that time was always available for short, impromptu talks (sometimes of only 5 minutes) where participants were able to outline a relevant challenge problem or to draw attention to a new research direction or connection that had become apparent during the meeting.

This open forum for exploring links between the communities has led to the following specific achievements:

1. An increased understanding between the disciplines, especially between program verification and probabilistic programming.
2. A demonstration that the mathematical models for reasoning about machine learning algorithms and quantitative security are very similar, but that their objectives are very different. This close relationship at a foundational level suggests theoretical methods to tackle the important challenge of understanding privacy in a data mining context.
3. Evidence that probabilistic programming, analysis and verification of probabilistic programs, can have a broad impact in the design of emerging infrastructures, such as software-defined networks.

The feedback by the participants was very positive, and it was encouraged to organise a workshop or similar event in the future to foster the communication between the different communities, in particular between program verification and probabilistic programming.

We were aware of many new conversations between researchers inspired by the formal talks as well as the mealtime discussions. Already at least one paper (see below) with content inspired by the meeting is accepted for publication, and we are aware of several other new lines of work.

#### Acknowledgement

The organisers thank Benjamin Kaminski for his support in compiling this report and in several organisational issues.

#### References

1. A. Scibior, Z. Ghahramani and A. D. Gordon. Practical Probabilistic Programming with Monads. ACM SIGPLAN Haskell Symposium 2015, Vancouver, Canada, 3–4 September 2015.
Creative Commons BY 3.0 Unported license
Gilles Barthe, Andrew D. Gordon, Joost-Pieter Katoen, and Annabelle McIver

## Classification

• Artificial Intelligence / Robotics
• Security / Cryptology
• Verification / Logic

## Keywords

• Bayesian networks
• Differential privacy
• Machine learning
• Probabilistic programs
• Security
• Semantics
• Static analysis
• Verification

## Dokumentation

In der Reihe Dagstuhl Reports werden alle Dagstuhl-Seminare und Dagstuhl-Perspektiven-Workshops dokumentiert. Die Organisatoren stellen zusammen mit dem Collector des Seminars einen Bericht zusammen, der die Beiträge der Autoren zusammenfasst und um eine Zusammenfassung ergänzt.