16.05.16 - 20.05.16, Seminar 16201

Synergies among Testing, Verification, and Repair for Concurrent Programs

Diese Seminarbeschreibung wurde vor dem Seminar auf unseren Webseiten veröffentlicht und bei der Einladung zum Seminar verwendet.

Motivation

Concurrency bugs are notoriously hard to reproduce, localize and repair. This is because of the explosion in behavior space of concurrent software systems compared to sequential codes. This presents a challenge for bug-finding, verification, and bug-fixing tools, which are all required to disambiguate true problems from spurious defects and represent the defects in an effective and actionable manner.

As an example, a concrete-level data-race situation, whereby two threads access a shared memory location without proper synchronization, and at least one of these threads writes to that location, is not necessarily a bug. If such a race occurs within a lock-free data structure, then it may be admissible as part of some higher-level transactional behavior enforced by the data-structure operation. In other situations, where a race indeed proves to be a bug, there is the challenging question of linking the race back to its cause such as insufficient synchronization or failure to check a precondition.

Other questions that arise are whether the bug can be reproduced with only a small number of threads (or more generally, what the minimal reproduction scenario is for the bug); whether it is inherently concurrent or may also arise in a sequential context; whether it arises due to erroneous application code, library code, or a violation of the contract between the client and library; etc.

The inherent difficulties of reasoning about concurrency bugs complicate testing, verification, and repair. For a testing algorithm, the challenge is not only to come up with effective inputs, but also to induce thread interleavings that are likely to expose concurrency defects, and beyond that, the testing algorithm must also be able to characterize the bugs it observes to output actionable and reproducible bug reports. For a verification technique, there are for instance the challenges to support specifications that permit thread-modular and procedure-modular verification (even if threads share resources), to reason about thread interactions more abstractly than by exploring all possible interleavings to achieve scalability, and to capture the subtle semantics of weak memory models. For a repair algorithm, it is pertinent to introduce the “minimal” amount of synchronization to eliminate a problem, as otherwise the fix becomes a performance bottleneck, obviating the benefit of parallelism. The threat of introducing other bugs through the fix also exists (a classic example being the introduction of a deadlock when fixing an atomicity violation).

The testing, verification and repair communities have largely worked independently on the problems of concurrency, which remain extremely difficult for all three of these communities. What is missing is more sharing of techniques and algorithms between the communities to create “hybrid” approaches for tackling concurrency bugs. One example would be to guide the formulation and refinement of verification models by dynamic testing. Another example is to derive dynamic performance metrics from testing to inform the repair algorithm. Yet another possibility is to instantiate exhaustive testing as a form of bounded model checking or utilize testing within verification to confirm whether an alarm is correct.

A main goal of our proposed seminar is to bring together researchers from the bug-finding, verification and repair communities to think through these and other ideas, present their techniques and algorithms, and examine together hybrid analysis models. We hope to also benefit from industry practitioners, who have developed their own techniques and practices for detecting, isolating and fixing concurrency bugs, and could orient the discussion toward practical challenges and solutions.