Dagstuhl Seminar 13142: Correct and Efficient Accelerator Programming

(April 1 – April 4, 2013)

Permalink

Please use the following short URL to link to this page: https://www.dagstuhl.de/13142

Organizers

Albert Cohen (ENS - Paris, FR)
Alastair F. Donaldson (Imperial College London, GB)
Marieke Huisman (University of Twente, NL)
Joost-Pieter Katoen (RWTH Aachen, DE)

Contact

Annette Beyer (for administrative matters)

Publications

Correct and Efficient Accelerator Programming (Dagstuhl Seminar 13142). Albert Cohen, Alastair F. Donaldson, Marieke Huisman, and Joost-Pieter Katoen. In Dagstuhl Reports, Volume 3, Issue 4, pp. 17-33, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2013)

Motivation

In recent years, massively parallel accelerator processors, primarily GPUs, have become widely available to end-users. They overcome the memory inefficiency of multi-core CPUs by equipping each processor element (PE) with a small amount of on-chip private memory, and by providing local memory that is shared among groups of PEs. As a result, a PE can access private and local memory with little or no contention. An accelerator typically has a special-purpose instruction set architecture and organization, geared towards the application domain that it has been designed to accelerate. Thus, if the right accelerator is applied to the right application, special-purpose hardware support can lead to very high performance. Finally, in massively parallel accelerators it is common to reduce the clock frequency of individual PEs, allowing more PEs to be accommodated. To summarize, as Garland and Kirk stated in their 2010 CACM paper:

"For workloads with abundant parallelism, GPUs deliver higher peak computational throughput than latency-oriented CPUs"

Using accelerators, tasks such as media processing, simulation and eye-tracking can be accelerated to beat CPU performance by orders of magnitude. Performance is gained in energy efficiency and execution speed, allowing intensive media processing software to run in low-power consumer devices.

Accelerators, however, present a serious challenge for software developers. A system may contain one or more of the plethora of accelerators on the market, with many more products anticipated in the immediate future. Software for accelerators is currently written in low-level languages such as OpenCL, CUDA, or architecture-specific assembly code. This leads to increased development costs and complex maintenance across multiple platforms. In addition, performance problems occur because applications optimised for one platform may not perform well on others. Moreover, with the increased usage of GPUs in safety-critical domains (such as medical image processing), correctness of accelerator programs is of vital importance. Applications must thus exhibit portable correctness, operating correctly on any configuration of accelerators, and portable performance, exploiting the processing power and energy efficiency offered by a wide range of devices.

The aim of this Dagstuhl Seminar is to bring together researchers from various sub-disciplines of computer science, such as programming languages for multi-core systems and their compilation, and researchers working on the verification of multi-core programs and their data structures, to brainstorm and discuss design techniques and tools for correct and efficient accelerator programming:

Novel and attractive methods for constructing system-independent accelerator programs;
Advanced code generation techniques to produce highly optimized system-specific code from system-independent programs;
Scalable static techniques for analysing system-independent and system-specific accelerator programs both qualitatively and quantitatively.

Central topics of the seminar are portable performance and portable correctness. Software exhibits portable performance (time and energy-wise) if it performs acceptably well across accelerator devices in general, and near optimally on specific, widely used platforms. Portable correctness, with respect to a programming language specification, is achieved when correctness can be established in a device-independent manner.

Summary

In recent years, massively parallel accelerator processors, primarily GPUs, have become widely available to end-users. Accelerators offer tremendous compute power at a low cost, and tasks such as media processing, simulation and eye-tracking can be accelerated to beat CPU performance by orders of magnitude. Performance is gained in energy efficiency and execution speed, allowing intensive media processing software to run in low-power consumer devices. Accelerators present a serious challenge for software developers. A system may contain one or more of the plethora of accelerators on the market, with many more products anticipated in the immediate future. Applications must exhibit portable correctness, operating correctly on any configuration of accelerators, and portable performance, exploiting processing power and energy efficiency offered by a wide range of devices.

The seminar focussed on the following areas:

Novel and attractive methods for constructing system-independent accelerator programs;
Advanced code generation techniques to produce highly optimised system-specific code from system-independent programs;
Scalable static techniques for analysing system-independent and system-specific accelerator programs both qualitatively and quantitatively.

The seminar featured five tutorials providing an overview of the landscape of accelerator programming:

Architecture -- Anton Lokhmotov, ARM
Programming models -- Lee Howes, AMD
Compilation techniques -- Sebastian Hack, Saarland University
Verification -- Ganesh Gopalakrishnan, University of Utah
Memory models -- Jade Alglave, University College London

In addition, there were short presentations from 12 participants describing recent results or work-in-progress in these areas, and two discussion sessions:

Domain specific languages for accelerators;
Verification techniques for GPU-accelerated software.

Due to the "correctness" aspect of this seminar, there was significant overlap of interest with a full-week seminar on Formal Verification of Distributed Algorithms running in parallel. To take advantage of this overlap, a joint session was organised, featuring a talk on verification of GPU kernels by Alastair Donaldson, Imperial College London (on behalf of the Correct and Efficient Accelerator Programming seminar), and a talk on GPU-accelerated runtime verification by Borzoo Bonakdarpour, University of Waterloo (on behalf of the Formal Verification of Distributed Algorithms seminar).

Creative Commons BY 3.0 Unported license

Albert Cohen, Alastair F. Donaldson, Marieke Huisman, and Joost-Pieter Katoen

Participants

Jade Alglave (University College London, GB)
Adam Betts (Imperial College London, GB)
Albert Cohen (ENS - Paris, FR)
Christian Dehnert (RWTH Aachen, DE)
Dino Distefano (Queen Mary University of London, GB)
Alastair F. Donaldson (Imperial College London, GB)
Jeremy Dubreil (Monoidics Ltd. - London, GB)
Benoit Dupont de Dinechin (Kalray - Orsay, FR)
Ganesh L. Gopalakrishnan (University of Utah - Salt Lake City, US)
Sebastian Hack (Universität des Saarlandes, DE)
Lee Howes (AMD - Sunnyvale, US)
Marieke Huisman (University of Twente, NL)
Christina Jansen (RWTH Aachen, DE)
Joost-Pieter Katoen (RWTH Aachen, DE)
Jeroen Ketema (Imperial College London, GB)
Alexander Knapp (Universität Augsburg, DE)
Georgia Kouveli (ARM Ltd. - Cambridge, GB)
Alexey Kravets (ARM Ltd. - Cambridge, GB)
Anton Lokhmotov (ARM Ltd. - Cambridge, GB)
Roland Meyer (TU Kaiserslautern, DE)
Cedric Nugteren (TU Eindhoven, NL)
Zvonimir Rakamaric (University of Utah - Salt Lake City, US)
Oliver Reiche (Universität Erlangen-Nürnberg, DE)
Philipp Rümmer (Uppsala University, SE)
Ana Lucia Varbanescu (TU Delft, NL)
Sven Verdoolaege (INRIA - Le Chesnay, FR)
Jules Villard (University College London, GB)
Heike Wehrheim (Universität Paderborn, DE)
Anton Wijs (TU Eindhoven, NL)
Marina Zaharieva-Stojanovski (University of Twente, NL)
Dong Ping Zhang (AMD - Sunnyvale, US)

Classification

hardware
programming languages / compiler
semantics / formal methods

Keywords

Intermediate languages
multi-core programming
polyhedral compilation
portability
formal verification
correctness
efficiency
