##### Dagstuhl Seminar 15181

### Challenges and Trends in Probabilistic Programming

##### ( Apr 26 – Apr 30, 2015 )

##### Permalink

##### Organizers

- Gilles Barthe (IMDEA Software - Madrid, ES)
- Andrew D. Gordon (Microsoft Research UK - Cambridge, GB)
- Joost-Pieter Katoen (RWTH Aachen, DE)
- Annabelle McIver (Macquarie University - Sydney, AU)

##### Contact

- Dagmar Glaser (for administrative matters)

##### Schedule

In this Dagstuhl Seminar, we bring together researchers working in the fields of machine learning, quantitative security, quantum computing, and (probabilistic) program analysis.

Probabilistic programming is at the heart of machine learning for describing distribution functions; Bayesian inference is pivotal in their analysis. Based on probabilistic programs describing the prior and sampling distributions, a compiler generates code to perform inference and make predictions. Programs are often given as a probabilistic graphical model allowing for the specification of dependencies between random variables via generative models and the conditioning of random variables using phenomena or data observed in the real world. A variety of inference algorithms have been developed to analyse and query such models, e.g., Gibbs sampling methods, Metropolis-Hastings and belief propagation. The probabilistic programming approach within AI has the potential to fundamentally change how the community understands, designs, builds, tests and deploys probabilistic systems.

Virtually all cryptographic algorithms are randomized and have probabilistic security guarantees. Similarly, perturbing outputs with probabilistic noise is a standard tool for achieving privacy in computations; e.g., differential privacy achieves privacy-preserving data-mining using probabilistic noise. Cryptographic algorithms and differentially private algorithms are implemented as probabilistic programs. More singularly, one common approach for reasoning about these algorithms is to use the code- and game-based approach proposed by Bellare and Rogaway, in which not only the algorithms, but also their security properties and the hardness properties upon which their security relies, are expressed as probabilistic programs. Quantitative information flow is another important field in security where probabilistic programs and models play an important role. Here, the key question is to obtain quantitative statements about the leakage of certain information from a given program.

Quantum programs are used to describe quantum algorithms and typically are quantum extensions of classical while-programs. According to a basic postulate of quantum mechanics, the unique way to acquire information about a quantum system is to measure it. Therefore, the essential ingredient in a quantum program is the ability to perform measurements of quantum registers, i.e., finite sequences of distinct quantum variables. As the outcome of a measurement is probabilistic, quantum programs are inherently probabilistic.

On the other hand, one recent rapidly growing trend in research on probabilistic programs is more in line with traditional programming languages. It focuses on such topics as efficient compilation, static analysis, program transformations, and program verification using model checking and theorem proving.

Given the diversity of the different communities currently working on probabilistic programs, there is an urgent need to bring these communities together in order to exploit synergies and cross-fertilize. This Dagstuhl Seminar aims to do that. The seminar will offer technically-focused tutorial talks on each of the different topics, but first and foremost will focus on discussions of prime research questions.

### Probabilistic programming languages

Probabilistic programs are programs, written in languages like C, Java, LISP, or ML, with two added constructs: (1) the ability to draw values at random from probability distributions, and (2) the ability to condition values of variables in a program through observations. A variety of probabilistic programming languages have been defined such as **Church**, **Infer.NET**, and **IBAL**. **Church** is based on the Lisp model of the lambda calculus, containing pure Lisp as its deterministic subset, whereas **Infer.NET** is a Microsoft developed language akin to **C#** and compiles probabilistic programs into inference code. Probabilistic programs can be used for modelling complex phenomena from biology and social sciences. By doing so, we get the benefits of programming languages (rigorous semantics, execution, testing and verification) to these problem domains. More than a decade ago, McIver and Morgan defined a probabilistic programming language in the style of Dijkstra's guarded command language, referred to as **pGCL**. Besides the usual language constructs in Dijkstra's **GCL** such as non-deterministic choice, it features a probabilistic choice where the probability distribution may be parametric.
For instance, the assignment x+= 1 [p] **skip** increments the variable **x** by one with probability p, and keeps the value of **x** unchanged with probability 1-p, where p is an unknown real value from the range [0,1].
Quantum programming languages such as **qGCL** and a quantum extension of **C++** are also related, as their operational semantics is typically a probabilistic model so as to model the effect of measurements on the quantum state.

### The importance of probabilistic programming

The applications of probabilistic programs mainly lie in four domains: (1) machine learning, (2) security, (3) randomised algorithms, and - though to a somewhat lesser extent - (4) quantum computing. Whereas the application in the field of randomised algorithms is evident, let us briefly describe the importance for the other three fields.

#### Machine learning

A Bayesian generative model consists of a prior distribution over some parameters, together with a sampling distribution (or likelihood) that predicts outputs of the model given its inputs and parameters. Bayesian inference in machine learning consists of training such a model to infer the posterior distribution of the parameters and hence to make predictions. In the probabilistic programming approach to Bayesian inference, the user simply writes the prior and sampling distributions as probabilistic programs, and relies on a compiler to generate code to perform inference and make predictions. Such compilers often operate by considering the program as defining a probabilistic graphical model. Graphical models were pioneered by Judea Pearl and others, and are extensively described in the comprehensive text by Koller and Friedman (2009). They are widely used in statistics and machine learning, with diverse application areas including speech recognition, computer vision, biology, and reliability analysis. Probabilistic graphical models allow specification of dependences between random variables via generative models, as well as conditioning of random variables using phenomena or data observed in the real world. A variety of inference algorithms have been developed to analyse and query such models, e.g., Gibbs sampling methods, Metropolis-Hastings and belief propagation. The probabilistic programming approach has seen growing interest within machine learning over the last 10 years and it is believed -- see http://probabilistic-programming.org/wiki/Home -- that this approach within AI has the potential to fundamentally change the way that community understands, designs, builds, tests and deploys probabilistic systems.

#### Security

Ever since Goldwasser and Micali - recipients of the ACM Turing Award in 2013 - introduced probabilistic encryption, probability has played a central role in cryptography: virtually all cryptographic algorithms are randomized, and have probabilistic security guarantees. Similarly, perturbing outputs with probabilistic noise is a standard tool for achieving privacy in computations; for instance, differential privacy achieves privacy-preserving data-mining using probabilistic noise. Cryptographic algorithms and differentially private algorithms are implemented as probabilistic programs; more singularly, one common approach for reasoning about these algorithms is using the code-based game-based approach, proposed by Bellare and Rogaway, in which not only the algorithms, but also their security properties and the hardness properties upon which their security relies, are expressed as probabilistic programs , and can be verified using (a relational variant of) Hoare logic. This code-based approach is key to recent developments in verified cryptography. Quantitative information flow is another important field in security where probabilistic programs and models play an important role. Here, the key question is to obtain quantitative statements about the leakage of certain information from a given program.

#### Quantum computing

Quantum programs are used to describe quantum algorithms and typically are quantum extensions of classical while-programs. Whereas in classical computation, we use a type to denote the domain of a variable, in quantum computation, a type is the state space of a quantum system denoted by some quantum variable. The state space of a quantum variable is the Hilbert space denoted by its type. According to a basic postulate of quantum mechanics, the unique way to acquire information about a quantum system is to measure it.
Therefore, the essential ingredient in a quantum program is the ability to perform measurements of quantum registers, i.e., finite sequences of distinct quantum variables. The state space of a quantum register is the tensor product of the state spaces of its quantum variables. In executing the statement **measure M[q]; S**, quantum measurement M will first be performed on quantum register q, and then a sub-program S will be selected to be executed next according to the outcome of the measurement. The essential difference between a measurement statement and a classical conditional statement is that the state of program variables is changed after performing the measurement.
As the outcome of a measurement is probabilistic, quantum programs are thus inherently probabilistic.

#### Program analysis

On the other hand, there is a recent rapidly growing trend in research on probabilistic programs which is more in line with traditional programming languages. This focuses on aspects such as efficient compilation, static analysis, program transformations, and program verification.
To mention a few, Cousot *et al.* recently extended the framework of abstract interpretation to probabilistic programs (2012), Gordon *et al.* introduced **Tabular**, a new probabilistic programming language (2014), Di Pierro *et al.* apply probabilistic static analysis (2010), Rajamani, Gordon *et al.* have used symbolic execution to perform Bayesian reasoning on probabilistic programs with loops (2013), Katoen, McIver *et al.* have developed invariant synthesis technique for linear probabilistic programs (2010), and Geldenhuys *et al.* considered probabilistic symbolic execution (2012).

### Achievements of this seminar

The objective of the seminar was a to bring together researchers from four separate (but related) communities to learn from each other, with the expectation that a better understanding between these communities would open up new opportunities for research and collaboration.

Participants attending the seminar represented all four themes of the original proposal: machine learning, quantitative security, (probabilistic) program analysis and quantum computing. The programme consisted of both tutorials and presentations on any topic within these themes. The tutorials provided a common ground for discussion, and the presentations gave insight into the current state of an area, and summarised the challenges that still remain. The tutorial topics were determined by consulting the participants prior to the seminar by means of a questionnaire.

Although the programme was primarily constructed around the tutorials and standard-length presentations (each around 30 minutes), the organisers made sure that time was always available for short, impromptu talks (sometimes of only 5 minutes) where participants were able to outline a relevant challenge problem or to draw attention to a new research direction or connection that had become apparent during the meeting.

This open forum for exploring links between the communities has led to the following specific achievements:

- An increased understanding between the disciplines, especially between program verification and probabilistic programming.
- A demonstration that the mathematical models for reasoning about machine learning algorithms and quantitative security are very similar, but that their objectives are very different. This close relationship at a foundational level suggests theoretical methods to tackle the important challenge of understanding privacy in a data mining context.
- Evidence that probabilistic programming, analysis and verification of probabilistic programs, can have a broad impact in the design of emerging infrastructures, such as software-defined networks.

The feedback by the participants was very positive, and it was encouraged to organise a workshop or similar event in the future to foster the communication between the different communities, in particular between program verification and probabilistic programming.

We were aware of many new conversations between researchers inspired by the formal talks as well as the mealtime discussions. Already at least one paper (see below) with content inspired by the meeting is accepted for publication, and we are aware of several other new lines of work.

#### Acknowledgement

The organisers thank Benjamin Kaminski for his support in compiling this report and in several organisational issues.

#### References

- A. Scibior, Z. Ghahramani and A. D. Gordon. Practical Probabilistic Programming with Monads. ACM SIGPLAN Haskell Symposium 2015, Vancouver, Canada, 3–4 September 2015.

- Christel Baier (TU Dresden, DE) [dblp]
- Gilles Barthe (IMDEA Software - Madrid, ES) [dblp]
- Johannes Borgström (Uppsala University, SE) [dblp]
- Michael Carbin (Microsoft Research - Redmond, US) [dblp]
- Aleksandar Chakarov (University of Colorado - Boulder, US) [dblp]
- Ugo Dal Lago (University of Bologna, IT) [dblp]
- Alessandra Di Pierro (University of Verona, IT) [dblp]
- Jason Eisner (Johns Hopkins University - Baltimore, US) [dblp]
- Yuan Feng (University of Technology - Sydney, AU) [dblp]
- Luis Maria Ferrer Fioriti (Universität des Saarlandes, DE) [dblp]
- Cédric Fournet (Microsoft Research UK - Cambridge, GB) [dblp]
- Cameron Freer (MIT - Cambridge, US) [dblp]
- Marco Gaboardi (University of Dundee, GB) [dblp]
- Andrew D. Gordon (Microsoft Research UK - Cambridge, GB) [dblp]
- Friedrich Gretz (RWTH Aachen, DE) [dblp]
- Johannes Hölzl (TU München, DE) [dblp]
- Chung-Kil Hur (Seoul National University, KR) [dblp]
- Benjamin Kaminski (RWTH Aachen, DE) [dblp]
- Joost-Pieter Katoen (RWTH Aachen, DE) [dblp]
- Stefan Kiefer (University of Oxford, GB) [dblp]
- Angelika Kimmig (KU Leuven, BE) [dblp]
- Boris Köpf (IMDEA Software - Madrid, ES) [dblp]
- Pasquale Malacaria (Queen Mary University of London, GB) [dblp]
- Vikash Mansinghka (MIT - Cambridge, US) [dblp]
- Piotr Mardziel (University of Maryland - College Park, US) [dblp]
- Annabelle McIver (Macquarie University - Sydney, AU) [dblp]
- Joel Ouaknine (University of Oxford, GB) [dblp]
- Catuscia Palamidessi (INRIA Saclay - Île-de-France, FR) [dblp]
- David Parker (University of Birmingham, GB) [dblp]
- Avi Pfeffer (Charles River Analytics - Cambridge, US) [dblp]
- Tahiry Rabehaja (Macquarie University - Sydney, AU) [dblp]
- Sriram K. Rajamani (Microsoft Research India - Bangalore, IN) [dblp]
- Norman Ramsey (Tufts University - Medford, US) [dblp]
- Chung-chieh Shan (Indiana University - Bloomington, US) [dblp]
- Alexandra Silva (Radboud University Nijmegen, NL) [dblp]
- Sameer Singh (University of Washington - Seattle, US) [dblp]
- Geoffrey Smith (Florida International University - Miami, US) [dblp]
- Andreas Stuhlmüller (MIT - Cambridge, US & Stanford University, US) [dblp]
- Frank Wood (University of Oxford, GB) [dblp]
- Mingsheng Ying (University of Technology - Sydney, AU) [dblp]
- Lijun Zhang (Chinese Academy of Sciences, CN) [dblp]

##### Classification

- artificial intelligence / robotics
- security / cryptology
- verification / logic

##### Keywords

- Bayesian networks
- differential privacy
- machine learning
- probabilistic programs
- security
- semantics
- static analysis
- verification