http://www.dagstuhl.de/08371

September 7 – 10, 2008, Dagstuhl Seminar 08371

Fault-Tolerant Distributed Algorithms on VLSI Chips

Organizers

Bernadette Charron-Bost (Ecole Polytechnique – Palaiseau, FR)
Shlomi Dolev (Ben Gurion University – Beer Sheva, IL)
Jo Ebergen (Sun Microsystems – Menlo Park, US)
Ulrich Schmid (TU Wien, AT)


Summary

The Dagstuhl seminar 08371 on Fault-Tolerant Distributed Algorithms on VLSI Chips was devoted to exploring whether the wealth of existing research on fault-tolerant distributed algorithms can be used to meet the challenges of future-generation VLSI chips. Participants from the distributed fault-tolerant algorithms community, interested in this emerging application domain, and from the VLSI systems-on-chip and digital design community, interested in well-founded system-level approaches to fault tolerance, surveyed the current state of the art and tried to identify possibilities to work together. The seminar clearly achieved its purpose: it became apparent that most existing research in distributed algorithms is too heavyweight to be applied immediately in the "core" VLSI design context, where power, area, etc. are scarce resources. At the same time, however, it was recognized that emerging trends like large multicore chips and increasingly critical applications create new and promising application domains for fault-tolerant distributed algorithms. We are convinced that the very fruitful cross-community interactions that took place during the Dagstuhl seminar will contribute to new research activities in those areas.

Description

Shrinking feature sizes and increasing clock speeds are the most visible signs of the tremendous advances in VLSI design, which will accommodate billions of transistors on a single chip in the near future. This comes, however, at the price of increased system-level complexity: in today's deep submicron technology with GHz clock speeds, wiring delays dominate transistor switching delays, and signals can no longer traverse the whole die within a single clock cycle. In fact, a modern VLSI chip can no longer be viewed as a monolithic block of synchronous hardware in which all state transitions occur simultaneously. Rather, VLSI chips are nowadays considered systems of interacting subsystems, as reflected in the advent of Systems-on-Chip (SoC) and Networks-on-Chip (NoC).

In addition, ever-increasing manufacturing variabilities increase the defect ratio, and the reduced voltage swing needed for high clock speeds and low power consumption also increases the adverse effects of α-particle and neutron hits during operation, as well as cross-talk and ground-bouncing sensitivity. The resulting increase of the transient failure rate (soft-error rate), which was negligible in most former-generation chips, has hence raised general concerns about the dependability of future generation VLSI chips. Consequently, suitable fault-tolerance mechanisms with respect to timing errors or value errors are vital for such devices: Fine-grained fault-tolerance like radiation-hardening, fault masking at transistor or gate level, error-correcting codes or error detection and recovery are the primary methods of choice here.

Due to the above trends, however, modern VLSI chips have much in common with the loosely-coupled distributed systems that have been studied by the fault-tolerant distributed algorithms community for decades. System-level fault tolerance based on replication and distributed agreement is the dominant approach here, and a wealth of different computing and failure models, algorithms & protocols, and theoretical results regarding solvability of problems and achievable performance have been established in the past.

The purpose of our Dagstuhl seminar was to explore whether fault-tolerant distributed algorithms research can indeed be utilized for meeting the challenges of future-generation VLSI chips: Just as Temporal Logic, established in the distributed computing scope decades ago, found its way to the VLSI domain, other radically new solutions and methods may also find their way. And indeed, some recent research suggested a positive answer to this question: For example, it has been demonstrated that distributed fault-tolerant clock generation algorithms can be adapted to the very special requirements of VLSI chips, and that self-stabilization is a very promising approach for designing robust VLSI chips.

Fifteen participants from the distributed fault-tolerant algorithms community (and related fields, like verification), interested in the new application domain of VLSI chips, and twelve participants from the VLSI community, interested in system-level approaches to fault-tolerance, joined at Dagstuhl in order to survey the current state-of-the-art and identify possibilities to work together.

The presentations and the unique setting of Dagstuhl, with its relaxed and stimulating atmosphere, fully achieved their purpose: they stimulated long discussions during the official sessions and many fruitful cross-community interactions during free time, more than the available time could accommodate.

Classification

  • Data Structures / Algorithms / Complexity
  • Networks
  • Hardware

Keywords

  • Fault-tolerant distributed algorithms
  • System-level fault tolerance
  • VLSI systems-on-chip
  • Digital logic
  • Formal specification
