May 16 – 20 , 2016, Dagstuhl Seminar 16201

Synergies among Testing, Verification, and Repair for Concurrent Programs


Julian Dolby (IBM TJ Watson Research Center – Yorktown Heights, US)
Orna Grumberg (Technion – Haifa, IL)
Peter Müller (ETH Zürich, CH)
Omer Tripp (IBM TJ Watson Research Center – Yorktown Heights, US)

For support, please contact

Dagstuhl Service Team


Dagstuhl Report, Volume 6, Issue 5 Dagstuhl Report
Aims & Scope
List of Participants


Context and Motivation

Major trends in computing infrastructure, such as multicore processors and data centers, increase the demand for concurrent software that utilizes the available resources. However, concurrent programs are notoriously difficult to develop. They are susceptible to a number of specific errors that do not occur in sequential code, such as data races, deadlock, atomicity violations, starvation, and violations of consistency models. These errors typically manifest themselves only in certain executions (for instance, under certain thread schedules), which makes them extremely difficult to detect, reproduce, localize, and repair. Established techniques for testing, verifying and repairing sequential programs are insufficient to handle concurrent software. In particular, they do not address the following challenges:

  • State space explosion: The execution of a concurrent program depends not only on the inputs but also on the thread schedule and optimizations, such as memory reordering. This results in an state space that is orders of magnitude larger than for sequential programs. Bug-finding techniques, such as testing and bounded model checking, require effective ways of pruning the state space. Static verification techniques, such as deductive verification and abstract interpretation, require suitable abstractions that allow one to reason about all possible program behaviors. Finally, program repair requires techniques to predict the impact of a program change on the set of possible executions.
  • Modularity: Modular techniques, such as unit testing or compositional verification, scale to large applications. However, for many properties of concurrent programs there are no modular techniques, or they require a large annotation overhead, for instance to denote the locations protected by a lock or to specify a locking order (or discipline) that ensures deadlock freedom. It is crucial to develop techniques that allow programs to be checked and repaired modularly, for instance to fix an atomicity violation by adding more thread synchronization, but without introducing a deadlock globally.
  • Specifications: Testing, verification and repair may rely on specifications that express the intended program behavior, for instance in the form of test oracles or program invariants. In addition to functional properties, specifications for concurrent programs also have to express how threads cooperate, for instance via a global locking strategy. While various specification approaches exist for concurrent programs, there is no uniform formalism that handles the full range of concurrency idioms and that supports testing, verification and repair.
  • Error reporting: Testing, verification and repair techniques need to disambiguate true problems from spurious defects, which is often difficult in concurrent programs. For instance, a data race is not necessarily a bug. If a race occurs within a lock-free data structure, then it may be admissible as part of some higher-level transactional behavior enforced by the data-structure operation. Moreover, it is important to present bugs in an understandable manner, for instance by providing reports with only a small number of threads and by determining whether a bug is inherently concurrent or may also arise in a sequential context.
  • Liveness: Whereas for most sequential programs, termination is the only relevant liveness property, liveness (such as fairness or the absence of livelocks) is often more prevalent in concurrent programs. It is, therefore, important to develop techniques to check and enforce progress.

Program testing, verification, and repair each offer partial solutions to these challenges. This seminar was conceived with the goal of bringing together these three communities in order to develop a common understanding of the issues as well as to enable collaboration at the level of techniques and tools.

Main Themes

The first step toward exposing, and enabling, synergies between the three main threads of research on correctness and reliability of concurrent programs -- verification, testing and repair -- is to analyze the challenges and contributions pertaining to each of these areas in isolation. We survey work that has been done in each of these communities, based on the available literature and presentations given in the seminar, to summarize the current state of the three communities.


A main challenge in verification of concurrency properties is the prohibitive state space unfolded by thread interleavings. A hybrid solution to this problem is to specialize the static abstraction according to necessary proof conditions, arising during dynamic runs, such that the verification algorithm can scale with fine-grained abstractions (Naik, Yang). Another approach is to retain correlations among local thread states as well as the shared program state (Sagiv, Segalov). In this way, useful invariants can be proved and exploited by the verifier even if an unbounded number of threads is assumed. Refinement techniques are useful when little information is required about the environment to prove a property (Gupta). A useful idea in error reporting is to pinpoint concurrency-specific bugs (differentiating them from sequential bugs) by also running a sequential verifier and performing delta analysis (Joshi). Much like other techniques, verification greatly benefits from user specifications. For example, a parallelizing compiler is more likely to prove disjointness between loop iterations if relevant data structures (or operations) are specified as linearizable (Rinard, Diniz). This also provides a measure of modularity, enabling the separation between library linearizability checking and client verification. Modern program logics (O’Hearn, Parkinson, Gardner) provide a way of constructing correctness proofs for concurrent programs, though in general modular verification of concurrent software remains a hard problem.


Similarly to verification, testing techniques are also challenged by the state-space problem. Several ideas have been proposed in response to this problem. Open-world testing, whereby data structures or libraries referenced by an application are tested in isolation for concurrency bugs (e.g., atomicity violations), reduces the scope of testing considerably (Shacham). Interestingly, even open-world issues that cannot be recreated within the client application are often fixed by developers, which encourages further research into modular consistency properties (e.g., linearizability) (Shacham). Predictive analysis is a recent form of testing that holds the promise of high coverage at an affordable cost (Smaragdakis). Starting from a concrete trace, predictive analysis applies feasibility-preserving transformations (reordering trace events, typically through constraint solving) to detect concurrency bugs, such that soundness is guaranteed (Dolby, Huang). Another source of state-space reduction is to exploit high-level semantic guarantees, like atomicity, to abstract away intermediate trace transitions (Shacham, Tripp). This also relates to error reporting, where certain read/write conflicts give rise to spurious conflicts that can be eliminated with a higher-level view of conflict as lack of commutativity between atomic operations (Koskinen, Kulkarni). Contrary to memory-level conflict detection, commutativity-based testing requires a specification (Shacham, Tripp). Another form of specification refers to consistency relaxations, e.g. permitting certain types of read/write conflict (Thies) or specifying a computation as nondeterministic (Burnim, Tripp).


In program repair, error reporting (or localization) plays a key role, deciding the effective scope and nature of the fix. Pinpointing the exact conditions that give rise to a concurrency bug is thus critical, emphasizing the need for better testing and verification tools. Importantly, incorrect fixing may introduce concurrency bugs (e.g., a deadlock resulting from additional synchronization to fix an atomicity violation), which again highlights the need for better synergy between repair and testing/verification (Liu). Incorrect fixing also turns liveness into a concrete concern: Assuming the program previously terminated, does it also terminate after the fix? Existing solutions that ensure termination rely on iterative transformation methods as well as specialized models like Petri nets (Liu, Zhang). A common assumption in the repair community, to hold back the state-space challenge, is that concurrency bugs involve a small number of threads (typically 2) (Liblit, Liu). The hope is that better synergy with testing and verification can work toward relaxing this assumption. Semantic lifting of the concrete code, exploiting e.g. linearizability, has recently been demonstrated as a useful means to apply bottom-up/top-down fixing: First, the code is lifted into an abstract workflow, and then the workflow is concretized into a correct reimplementation (Liu, Tripp). This motivates further exploration of useful specification media for repair of concurrency defects.

Goals of the Seminar

The goal of the seminar was to promote cross fertilization among the verification, testing and repair communities, as they seem to be running into the same challenges, thereby solving increasingly similar problems. At the extreme, verification is about all possible program behaviors, testing is about running the program to see what it does, and repair is about generating new code. However, many techniques in all communities now blur the distinction. Use of dynamic information to guide abstractions in verification is one example; another is how predictive testing looks for bugs in possible executions close to a dynamic one, leading to a form of verification; finally, program repair increasingly uses solvers to synthesize new programs and test them, which overlaps with techniques from the other areas. We intended for the seminar to bring out further areas in which these fields are closely related, and inspire further techniques that fuse these areas, which was fulfilled by some of the discussions throughout the seminar.

Below are concrete examples of connections that we meant to expose, some of which were discussed throughout the seminar:


Each area has a variety of benchmarks and competitions, and many of them ultimately focus on concurrency-specific challenges like interleavings. It seems likely that the different communities could benefit from sharing. For instance, predictive testing and verification could surely share many benchmarks, and a more standard set of benchmarks could make evaluations easier. At the same time, potential users could help ensure that any benchmarks actually measure what they care about.


Much progress in both testing and verification has been made possible by progress in solver technology, and a variety of solvers are now common in both areas. There is room to share the infrastructure itself and the common remaining challenges.

Hybrid Tools

The path-specific focus of testing and the global focus of verification can aid each other, e.g. current work such as CLAP using a control flow from a specific execution to make model checking more scalable.

Though the seminar touched on techniques and approaches that generalize beyond analysis and repair of concurrent software, we feel that the overall focus on challenges posed by concurrency was justified. With this focus, we were able to stir concrete discussion and tightly connected talks.

Summary text license
  Creative Commons BY 3.0 Unported license
  Julian Dolby, Orna Grumberg, Peter Müller, and Omer Tripp


  • Programming Languages / Compiler
  • Semantics / Formal Methods
  • Verification / Logic


  • Concurrency
  • Testing
  • Program analysis
  • Program verification
  • Model checking
  • Program repair
  • Fault localization


In the series Dagstuhl Reports each Dagstuhl Seminar and Dagstuhl Perspectives Workshop is documented. The seminar organizers, in cooperation with the collector, prepare a report that includes contributions from the participants' talks together with a summary of the seminar.


Download overview leaflet (PDF).

Dagstuhl's Impact

Please inform us when a publication was published as a result from your seminar. These publications are listed in the category Dagstuhl's Impact and are presented on a special shelf on the ground floor of the library.


Furthermore, a comprehensive peer-reviewed collection of research papers can be published in the series Dagstuhl Follow-Ups.