http://www.dagstuhl.de/12341

19 – 24 August 2012, Dagstuhl Seminar 12341

Verifying Reliability

Organizers

Görschwin Fey (Universität Bremen, DE)
Masahiro Fujita (University of Tokyo, JP)
Kaushik Roy (Purdue University – West Lafayette, US)
Matteo Sonza Reorda (Polytechnic University of Torino, IT)

Coordinator

Natasa Miskov-Zivanov (University of Pittsburgh, US)

For information about this Dagstuhl Seminar, please contact

Dagstuhl Service Team

Documents

Dagstuhl Report, Volume 2, Issue 8
List of Participants
Shared Documents
Dagstuhl Seminar Schedule [pdf]

Summary

Moore's law has predicted the ever increasing computing power of the past decades from an economic perspective, based on doubling the number of elements in a circuit about every two years, and it is expected to continue to hold for another 10-20 years. On the physical level this integration is enabled by continuously shrinking feature sizes of the basic components. For future technology nodes, however, reliability problems triggered by different sources are expected to increase rapidly. Process variations in the production process are one issue: while production processes become more accurate in absolute terms, the relative inaccuracy compared to the component's size is increasing. One consequence is transistors with a wide range of threshold levels, resulting in slightly faster or slower operating logic circuitry (both die-to-die and within-die). This may, for example, cause delay errors under certain operating conditions of a device.

Increasing sensitivity to the omnipresent environmental radiation is another issue. In the past, errors induced by radiation have been observed only infrequently at ground level, while systems for space missions have long been specified to be radiation resistant. Shrinking feature sizes make circuits sensitive to radiation of lower energy, causing more radiation-induced events such as Single Event Upsets (SEUs) even at sea level. Such effects are summarized as transient faults resulting in soft errors (as opposed to permanent faults, which change the functionality through a modification of the physical structure). Consequently, approaches to design reliable circuits that tolerate such transient faults without causing soft errors have been proposed. These design approaches to mitigate soft errors span all levels of design abstraction, from the system specification down to the layout. Examples include fault tolerant algorithms and operating systems, fault tolerant processors, self-calibrating architectures, block-level redundancy and error checking, synthesis approaches at the gate level, and hardening techniques at the layout level. In practical systems, multiple mitigation techniques are typically combined to guarantee reliability across the full system stack.

Functional verification has been, and still is, a challenge for current designs containing up to hundreds of millions of transistors. Mature techniques for the formal verification and the dynamic verification of large systems exist, and research in verification is ongoing to keep pace with the rapid growth in system size. The verification of reliability is an interdisciplinary topic involving at least testing technology, verification methodology, and design experience, which makes the verification of reliable implementations an even harder problem. The testing community provides the underlying models for transient faults needed to understand their effects at the functional and eventually at the system level. Using these models, the verification community designs efficient analysis tools and verification techniques to handle large systems. As in standard verification of large circuits, a concerted action of formal methods, semi-formal techniques, and simulation-based validation will be required. Knowledge from the design community is still needed to further speed up the verification task: understanding the implemented approach to reliability at the application and system level is required to achieve a high degree of automation in the verification task.
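
To make the mitigation idea above concrete, the following minimal Python sketch (illustrative only, not taken from the seminar material; all function names are hypothetical) shows block-level redundancy in its simplest form, triple modular redundancy with majority voting, which masks a single transient bit-flip in one of three redundant copies:

```python
import random

def majority_vote(a, b, c):
    """Bitwise majority of three redundant results: a fault in any single
    copy is masked as long as the other two copies agree."""
    return (a & b) | (a & c) | (b & c)

def compute(x, y):
    """Some combinational function to be protected (toy 8-bit addition)."""
    return (x + y) & 0xFF

def tmr_compute(x, y, flipped_copy=None, flipped_bit=0):
    """Run the computation three times; optionally inject a single-bit
    transient fault (an SEU-like upset) into one copy before voting."""
    results = [compute(x, y) for _ in range(3)]
    if flipped_copy is not None:
        results[flipped_copy] ^= (1 << flipped_bit)  # single event upset
    return majority_vote(*results)

# A single upset in any one copy never reaches the voted output.
for _ in range(1000):
    x, y = random.randrange(256), random.randrange(256)
    voted = tmr_compute(x, y, flipped_copy=random.randrange(3),
                        flipped_bit=random.randrange(8))
    assert voted == compute(x, y)
```

In a real design the voter itself must be protected or assumed fault-free, and multiple simultaneous upsets defeat the scheme; the sketch only illustrates the masking principle.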

Organization

The seminar was organized in short slots for talks, each followed by extensive discussion. A panel discussion in the afternoon summarized each day and focused on open questions. Each day was devoted to a specific topic:

  • Design -- Techniques to ensure reliability by design.
  • Fault models -- Different types of fault models are required depending on the abstraction level and the type of design considered.
  • Metrics -- Measuring reliability requires some kind of metric. These metrics can be defined with respect to the fault models, but they should also reflect potential inaccuracies (a small fault-injection sketch after this list illustrates one such metric).
  • Engines -- Different types of engines are used in Electronic Design Automation (EDA) for circuits and systems.
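
As a purely illustrative example of how a fault model, a metric, and a simulation engine fit together (all names below are hypothetical, not from the seminar), the following Python sketch injects single-bit-flip faults into the inputs of a small behavioural model and estimates, by Monte Carlo sampling, the fraction of injections that become visible at the output -- a crude reliability metric:

```python
import random

def behavioural_model(samples, threshold=2048):
    """Toy behavioural model: an alarm is raised when the sum of the input
    samples exceeds a threshold. Many input bit flips are logically masked."""
    return sum(samples) > threshold

def run_with_bit_flip(samples, word_index, bit_index):
    """Single-bit-flip fault model: corrupt one 8-bit input word, then re-run."""
    corrupted = list(samples)
    corrupted[word_index] ^= (1 << bit_index)
    return behavioural_model(corrupted)

def estimate_failure_probability(samples, trials=10_000):
    """Monte Carlo fault-injection campaign: the metric is the fraction of
    injected faults whose effect becomes visible at the output."""
    golden = behavioural_model(samples)
    failures = 0
    for _ in range(trials):
        word = random.randrange(len(samples))
        bit = random.randrange(8)
        if run_with_bit_flip(samples, word, bit) != golden:
            failures += 1
    return failures / trials

if __name__ == "__main__":
    workload = [random.randrange(256) for _ in range(16)]
    print("estimated probability that a single bit flip causes an output error:",
          estimate_failure_probability(workload))
```

More refined metrics would weight each injection by its physical likelihood and distinguish silent data corruption from detected errors; where exhaustive guarantees are needed, formal engines can replace the sampling loop.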

Results

Documenting the results of intensive discussions in a compact manner is difficult, but some outcomes can be formulated as crisp statements. Approximate computing is a powerful technique for reliable design where the application permits a certain degree of inaccuracy in its operations. Computing that takes the statistical nature of devices into account may still produce very accurate results, but providing a suitable computing fabric at acceptable cost remains a challenge. No single fault model will cover all aspects of reliability; in particular, fault models must be adapted to the application domain, the level of criticality, and the step in the design process under consideration. Appropriate metrics are then applied to bridge gaps, e.g., between different levels of abstraction. An orchestration of reasoning engines, ranging from formal techniques to simulation and emulation, will always be required to gather the data needed for the different metrics. Design for Reliability will always affect all levels of abstraction. Only through a concerted effort can we expect the same performance gains that we have seen over the past 50 years.
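
To make the approximate-computing statement above concrete, here is a small illustrative Python sketch (not from the seminar material) of a lower-part-OR adder, a well-known class of approximate adder: the low-order bits are combined with a cheap, carry-free OR while only the upper bits are added exactly, trading a bounded error for simpler hardware in applications that tolerate some inaccuracy:

```python
def approximate_add(a, b, approx_bits=4):
    """Lower-part-OR adder: the low `approx_bits` bits are ORed (no carry
    chain), only the upper bits use an exact addition. The result is never
    larger than the exact sum, and the error is bounded by the magnitude
    representable in the approximated lower part."""
    mask = (1 << approx_bits) - 1
    low = (a & mask) | (b & mask)                      # carry-free lower part
    high = ((a >> approx_bits) + (b >> approx_bits)) << approx_bits
    return high + low

# Compare against the exact sum to see the bounded error.
worst = 0
for a in range(256):
    for b in range(256):
        worst = max(worst, abs(approximate_add(a, b) - (a + b)))
print("worst-case absolute error over 8-bit operands:", worst)
```

Such a trade-off is only acceptable where the application-level quality metric, e.g. image quality or signal-to-noise ratio, tolerates the introduced error -- exactly the condition stated above.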

As a follow-up to the Dagstuhl Seminar, an Embedded Tutorial titled "Reliability Analysis Reloaded: How Will We Survive?" was successfully proposed for the DATE 2013 conference. It will include two presentations given by seminar participants or by colleagues from a participant's research group.

Classification

  • Hardware
  • Verification/logic
  • Modeling/simulation

Keywords

  • Reliability
  • Fault modeling
  • Formal methods

Book Exhibition

Books by the participants

Book exhibition on the ground floor of the library

(only during the week of the seminar).

Documentation

All Dagstuhl Seminars and Dagstuhl Perspectives Workshops are documented in the series Dagstuhl Reports. Together with the collector of the seminar, the organizers compile a report that summarizes the contributions of the authors and complements them with an overall summary.


Download the overview flyer (PDF).

Publications

Furthermore, it is possible to publish a comprehensive collection of peer-reviewed papers in the series Dagstuhl Follow-Ups.

Dagstuhl's Impact

Please inform us when a publication results from your seminar. Such publications are listed separately in the Dagstuhl's Impact section and presented on the ground floor of the library.