http://www.dagstuhl.de/10181

May 2–7, 2010, Dagstuhl Seminar 10181

Program Development for Extreme-Scale Computing

Organizers

Jesus Labarta (Barcelona Supercomputing Center, ES)
Barton P. Miller (University of Wisconsin – Madison, US)
Bernd Mohr (Jülich Supercomputing Centre, DE)
Martin Schulz (LLNL – Livermore, US)

For information about this Dagstuhl Seminar, contact

Dagstuhl Service Team

Documents

Dagstuhl Seminar Proceedings DROPS
List of participants
Dagstuhl Seminar program [pdf]

Summary

The number of processor cores available in high-performance computing systems is steadily increasing. A major factor is the current trend to use multi-core and many-core processor chip architectures. In the November 2009 list of the TOP500 Supercomputer Sites, 98.4% of the systems listed have more than 2048 processor cores and the average is about 9300. While these machines promise ever more compute power and memory capacity to tackle today's complex simulation problems, they force application developers to greatly enhance the scalability of their codes in order to exploit that power. This often requires new algorithms, methods, or parallelization schemes, as many well-known and accepted techniques stop working at such large scales. The problems start with simple things like opening a file per process to save checkpoint information, collecting the simulation results of the whole program via a gather operation on a single process, or previously unimportant O(n²)-type operations that now quickly dominate the execution. Unfortunately, many of these performance problems only show up when executing with very high numbers of processes and cannot be easily diagnosed or predicted from measurements at lower scales. Detecting and diagnosing these performance and scalability bottlenecks requires sophisticated performance instrumentation, measurement, and analysis tools. Simple tools typically scale very well, but the information they provide proves to be less and less useful at these high scales. Clearly, understanding the performance and correctness problems of applications requires running them, analyzing them, and gaining insight into these issues at the largest scale.

Consequently, a strategy for software development tools for extreme-scale systems must address a number of dimensions. First, the strategy must include elements that directly address extremely large task and thread counts. Such a strategy is likely to use mechanisms that reduce the number of tasks or threads that must be monitored. Second, less clear but equally daunting, is the fact that several planned systems will be composed of heterogeneous computing devices. Performance and correctness tools for these systems are very immature. Third, the strategy requires a scalable and modular infrastructure that allows rapid creation of new tools that respond to the unique needs that may arise as extreme-scale systems evolve. Further, a successful tools strategy must enable productive use of systems that are by definition unique. Thus, it must provide the full range of traditional software development tools, from debuggers and other code correctness tools such as memory analyzers, through performance analysis tools, to build environments for complex codes that rely on a diverse and rapidly changing set of support libraries.

Many parallel tools research groups have already started to work on scaling their methods, techniques, and tools to extreme processor counts. In this Dagstuhl seminar, we wanted participants from universities, government laboratories, and industry to report on their successes or failures in scaling their tools, review existing working methods and promising new methods and techniques, and discuss strategies for solving open issues and problems.

This meeting was the fourth in a series of seminars related to the topic "Performance Analysis of Parallel and Distributed Programs", with previous meetings being Dagstuhl Seminar 02341 on "Performance Analysis and Distributed Computing" held in August 2002, Seminar 05501 on "Automatic Performance Analysis" in December 2005, and Seminar 07341 on "Code Instrumentation and Modeling for Parallel Performance Analysis" in August 2007.

The seminar brought together a total of 46 researchers and developers working in the area of performance from universities, national research laboratories and, especially important, from three major computer vendors. The goals were to increase the exchange of ideas, promote knowledge transfer, and foster a multidisciplinary approach to attacking this very important research problem, which has a direct impact on the way in which we design and utilize parallel systems to achieve high application performance.

Dagstuhl Seminar Series

Classification

  • Modeling/simulation
  • Optimization/scheduling
  • Programming languages/compiler
  • Software engineering

Keywords

  • Program instrumentation
  • Performance analysis
  • Parallel computing

Book Exhibition

Books by the participants

Book exhibition on the ground floor of the library

(only during the week of the event).

Documentation

All Dagstuhl Seminars and Dagstuhl Perspectives Workshops are documented in the Dagstuhl Reports series. Together with the seminar's collector, the organizers compile a report that summarizes the authors' contributions and adds an overall summary.

Download overview flyer (PDF).

Publications

It also remains possible to publish a comprehensive collection of peer-reviewed papers in the Dagstuhl Follow-Ups series.

Dagstuhl's Impact

Please let us know when a publication arises from your seminar. Such publications are listed separately in the Dagstuhl's Impact category and displayed on the ground floor of the library.