- Natasa Miskov-Zivanov (University of Pittsburgh, US)
- Susanne Bach-Bernhard (for administrative matters)
Moore's law predicted the ever-increasing computing power of the past decades from an economic perspective, based on doubling the number of elements in a circuit about every two years, and it is expected to continue to hold for another 10-20 years. On the physical level this integration is enabled by continuously shrinking feature sizes of the basic components. But for future technology nodes, reliability problems triggered by different sources are expected to increase rapidly.

Process variations in the production process are one issue. While production processes become more accurate in absolute terms, their relative inaccuracy compared to the component's size is increasing. One consequence is transistors with a wide range of threshold levels, resulting in slightly faster or slower logic circuitry (both die-to-die and within-die). This may, for example, cause delay errors under certain operating conditions of a device.

Increasing sensitivity to the omnipresent environmental radiation is another issue. In the past, radiation-induced errors were observed only infrequently, although systems for space missions have long been specified to be radiation resistant. Shrinking feature sizes make circuits sensitive to radiation of lower energy, causing more radiation-induced events such as Single Event Upsets (SEUs) even at sea level. Such effects are summarized as transient faults resulting in soft errors (as opposed to permanent faults, which change the functionality through a modification of the physical structure). Consequently, approaches to design reliable circuits that tolerate such transient faults without causing soft errors have been proposed. These design approaches to mitigate soft errors span all levels of design abstraction, from the system specification down to the layout.
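To make the transient-fault notion concrete, an SEU is often modeled as a momentary flip of a single signal; whether it becomes a soft error depends on whether the flip propagates to an output. The following minimal Python sketch (function names hypothetical, not from the seminar) illustrates logical masking in a 3-input majority gate:

```python
def majority3(a, b, c):
    """Golden (fault-free) 3-input majority gate on single bits."""
    return (a & b) | (a & c) | (b & c)

def majority3_with_seu(a, b, c, flip_input=None):
    """Same gate, but an SEU transiently flips one input bit."""
    inputs = [a, b, c]
    if flip_input is not None:
        inputs[flip_input] ^= 1  # model the transient bit flip
    a, b, c = inputs
    return (a & b) | (a & c) | (b & c)

# When the two unaffected inputs agree, the upset is logically
# masked and no soft error is observed at the output:
assert majority3_with_seu(1, 1, 0, flip_input=2) == majority3(1, 1, 0)
# With inputs 1, 0, 0 the same single flip propagates to a soft error:
assert majority3_with_seu(1, 0, 0, flip_input=1) != majority3(1, 0, 0)
```

The gap between "fault occurred" and "error observed" is exactly what fault models and metrics later in this report try to capture.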
Examples of such approaches include fault-tolerant algorithms and operating systems, fault-tolerant processors, self-calibrating architectures, block-level redundancy and error checking, synthesis approaches at the gate level, and hardening techniques at the layout level. In practical systems, multiple mitigation techniques are typically combined to guarantee reliability across the full system stack.

Functional verification has been, and still is, a challenge for current designs containing up to hundreds of millions of transistors. Mature techniques for the formal and the dynamic verification of large systems exist, and research in verification is ongoing to match the rapid growth in system size. The verification of reliability is an interdisciplinary topic involving at least testing technology, verification methodology, and design experience; this makes the verification of reliable implementations an even harder problem. The testing community provides the underlying models for transient faults needed to understand their effects at the functional and eventually at the system level. Using these models, the verification community designs efficient analysis tools and verification techniques to handle large systems. As in standard verification of large circuits, a concerted combination of formal methods, semi-formal techniques, and simulation-based validation will be required. Still, knowledge from the design community is needed to further speed up the verification task: understanding the implemented approach to reliability at the application and system levels is required to achieve a high degree of automation.
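As one generic illustration of the block-level redundancy mentioned above (a textbook technique, not a specific seminar contribution), triple modular redundancy (TMR) runs three replicas of a module and majority-votes their outputs, so a transient fault in a single replica is out-voted. A minimal Python sketch, with all names hypothetical:

```python
def tmr(module, *args):
    """Run three replicas of `module` and majority-vote each output bit.

    Assumes `module` returns a non-negative int; a transient fault in
    one replica is out-voted by the two fault-free copies.
    """
    r1, r2, r3 = module(*args), module(*args), module(*args)
    return (r1 & r2) | (r1 & r3) | (r2 & r3)

def faulty_once(f):
    """Wrap `f` so that exactly its first invocation suffers a bit flip."""
    state = {"fired": False}
    def wrapper(*args):
        out = f(*args)
        if not state["fired"]:
            state["fired"] = True
            out ^= 0b100  # single-event upset in bit 2 of the result
        return out
    return wrapper

adder = lambda x, y: x + y
# Without redundancy the transient fault corrupts the result ...
assert faulty_once(adder)(2, 3) != 5
# ... but inside TMR the single faulty replica is out-voted:
assert tmr(faulty_once(adder), 2, 3) == 5
```

Real hardware TMR must also protect the voter itself; this software model sidesteps that, which is part of why mitigation spans multiple abstraction levels in practice.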
The seminar was organized as short talk slots followed by extensive discussions. A panel discussion in the afternoon summarized each day and focused on further questions. Each day was devoted to a dedicated topic:
- Design -- Techniques to ensure reliability by design.
- Fault models -- Different types of fault models are required depending on the abstraction level and the type of design considered.
- Metrics -- Measuring reliability requires some kind of metric. Such metrics can be defined with respect to the fault models, but they should also reflect potential inaccuracies.
- Engines -- Different types of engines are used in Electronic Design Automation (EDA) for circuits and systems.
Documenting the results of intensive discussions in a compact manner is difficult. However, some results can be formulated as crisp statements. Approximate computing is a powerful technique for reliable design where the application tolerates a certain degree of inaccuracy in its operations. Computing that takes the statistical nature of devices into account may be able to produce very accurate results, but providing a compatible computing fabric at acceptable cost is a challenge. No single fault model will cover all aspects of reliability; in particular, fault models must be adapted to the application domain, the level of criticality, and the step in the design process under consideration. Appropriate metrics will then be applied to bridge gaps, e.g., between different levels of abstraction. An orchestration of reasoning engines, ranging from formal techniques to simulation and emulation, will always be required to gather the data needed for the different metrics. Design for reliability will always affect all levels of abstraction, and only by a concerted effort can the same performance gains be expected that we have seen in the past 50 years.
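To give a flavor of the approximate-computing idea stated above, one widely studied arithmetic example is an adder that is exact on the upper bits but replaces the carry chain of the low bits with a cheap carry-free OR, shortening the critical path at the price of small, bounded error. The Python model below is purely illustrative (names and parameter `k` are assumptions, not from the report):

```python
def approx_add(x, y, k=4):
    """Approximate adder: exact addition on the upper bits, bitwise OR
    (no carry chain) on the k least significant bits. Models adders
    that trade small numerical error for lower delay and area."""
    mask = (1 << k) - 1
    upper = ((x >> k) + (y >> k)) << k   # exact upper part
    lower = (x & mask) | (y & mask)      # cheap, carry-free approximation
    return upper | lower

# A carry out of the low bits is dropped: 7 + 9 yields 15 instead of 16.
assert approx_add(7, 9) == 15
# When no low-bit carry occurs, the result is exact:
assert approx_add(100, 200) == 100 + 200
```

The error introduced is confined to the low-order bits, which is why such techniques suit applications, e.g. in signal processing, that tolerate inaccuracy up to a certain extent.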
As a follow-up to the Dagstuhl Seminar, an Embedded Tutorial entitled "Reliability Analysis Reloaded: How Will We Survive?" was successfully proposed for the DATE 2013 conference. It will include two presentations given by participants of the seminar or by colleagues from a participant's research group.
- Jacob A. Abraham (University of Texas - Austin, US)
- Robert Aitken (ARM Inc. - San Jose, US)
- Eli Arbel (IBM - Haifa, IL)
- Bernd Becker (Universität Freiburg, DE) [dblp]
- Shawn Blanton (Carnegie Mellon University, US)
- Cecile Braunstein (UPMC - Paris, FR)
- Mehdi Dehbashi (Universität Bremen, DE)
- Giuseppe Di Guglielmo (University of Verona, IT)
- Rolf Drechsler (Universität Bremen, DE) [dblp]
- Görschwin Fey (Universität Bremen, DE)
- Masahiro Fujita (University of Tokyo, JP) [dblp]
- Carsten Gebauer (Robert Bosch GmbH - Schwieberdingen, DE)
- Jie Han (University of Alberta, CA) [dblp]
- John P. Hayes (University of Michigan - Ann Arbor, US)
- Bodo Hoppe (IBM Deutschland - Böblingen, DE)
- Ravishankar K. Iyer (University of Illinois - Urbana Champaign, US) [dblp]
- Seiji Kajihara (Kyushu Institute of Technology, JP) [dblp]
- Yoshiki Kinoshita (AIST - Hyogo, JP) [dblp]
- Wenchao Li (University of California - Berkeley, US) [dblp]
- Igor L. Markov (University of Michigan - Ann Arbor, US)
- Yusuke Matsunaga (Kyushu University - Fukuoka, JP)
- Subhasish Mitra (Stanford University, US) [dblp]
- Michael Orshansky (University of Texas - Austin, US) [dblp]
- Laurence Pierre (TIMA - Grenoble, FR)
- Ilia Polian (Universität Passau, DE) [dblp]
- Anand Raghunathan (Purdue University - West Lafayette, US) [dblp]
- Sudhakar M. Reddy (University of Iowa - Iowa City, US)
- Kaushik Roy (Purdue University - West Lafayette, US) [dblp]
- Sachin Sapatnekar (University of Minnesota - Minneapolis, US)
- Yasuo Sato (Kyushu Institute of Technology, JP)
- Matthias Sauer (Universität Freiburg, DE) [dblp]
- Ulf Schlichtmann (TU München, DE) [dblp]
- Marcela Simková (Brno University of Technology, CZ)
- Adit Singh (Auburn University, US) [dblp]
- Matteo Sonza Reorda (Polytechnic University of Torino, IT) [dblp]
- Mehdi B. Tahoori (KIT - Karlsruher Institut für Technologie, DE) [dblp]
- Massimo Violante (Polytechnic University of Torino, IT)
- Tomohiro Yoneda (National Institute of Informatics - Tokyo, JP)
- fault modeling
- formal methods