21.04.14 - 25.04.14, Seminar 14171

Evaluating Software Verification Systems: Benchmarks and Competitions

The following text appeared on our web pages prior to the seminar, and was included as part of the invitation.

Motivation

In the last decade, technologies for the formal verification of software - such as abstract interpretation, deductive verification, and model checking - have been rapidly maturing and can be expected to complement and partly replace traditional software engineering methods. It has also become accepted wisdom that regular comparative evaluation of systems helps focus research, identify relevant problems, bolster development, and advance the field in general. Benchmark libraries and competitions are two popular evaluation approaches.

The desire to evaluate software verification systems has raised many questions that remain unanswered. What are the proper empirical approaches and criteria for effective comparative evaluation? What are the appropriate hardware and software environments? How can the usability of verification systems be assessed? How should benchmarks and problem collections be designed, acquired, structured, published, and used? For verification systems that require user interaction, an additional dimension of questions arises involving the human factor.

The seminar aims to advance the comparative empirical evaluation of software verification systems by bringing together current and future competition organizers and participants, benchmark maintainers, and practitioners and researchers interested in the topic.

Objectives

  1. To advance the technical state of comparative empirical evaluation of software verification tools.
  2. To achieve cross-fertilization between verification communities on common and related issues, such as the selection of relevant problems, notions of correctness, and questions of programming language semantics.
  3. To explore the joint evaluation of different kinds of verification tools, including its appropriate scope and techniques.
  4. To raise mutual awareness between verification communities concerning terminology, techniques and trends.
  5. To promote comparative empirical evaluation in the larger formal methods community.

Structure

  1. Several half-day hands-on tool evaluation sessions in the style of competitions. We intend to experiment with different formats and setups.
  2. Discussions on the open issues outlined above, further fueled by the results of the evaluation sessions.
  3. Presentations on the latest developments in the field.
  4. Several longer, tutorial-style presentations, each giving participants a better understanding of a particular tool and its capabilities, or providing an overview of a particular class of tools.