Dagstuhl Seminar 17202
Challenges and Opportunities of User-Level File Systems for HPC
(May 14 – May 19, 2017)
- André Brinkmann (Universität Mainz, DE)
- Kathryn Mohror (LLNL - Livermore, US)
- Weikuan Yu (Florida State University - Tallahassee, US)
- Susanne Bach-Bernhard (for administrative matters)
- Ad Hoc File Systems for High-Performance Computing : article - Brinkmann, Andre; Pfreundt, Franz Josef; Mohror, Kathryn; Cortes, Toni; Yu, Weikuan; Carns, Philip; Klasky, Scott A.; Miranda, Alberto; Ross, Robert B.; Vef, Marc-André - Berlin : Springer, 2020. - pp. 4-26 - (Journal of Computer Science and Technology ; 35. 2020, 1).
Although the benefits of hierarchical storage have been adequately demonstrated to the point that the newest leadership-class HPC systems will employ burst buffers, critical questions remain for supporting hierarchical storage systems, including: How should we present hierarchical storage systems to user applications, such that they are easy to use and that application code is portable across systems? How should we manage data movement through a storage hierarchy for best performance and resilience of data? How do the particular I/O use cases mandate the way we manage data? There have been many efforts to explore this space in the form of file systems, with an increasing number implemented at the user level. This is because it is relatively easy to swap in new, specialized user-level file systems for use by applications on a case-by-case basis, as opposed to the current mainstream approach of using general-purpose, system-level file systems, which may not be optimized for HPC workloads and must be installed by administrators. In contrast, file systems at the user level can be tailored for specific HPC workloads for high performance and can be used by applications without administrator intervention.
Many such user-level file system developers have found themselves “having to reinvent the wheel” to implement various optimizations in their file systems. Thus, it would benefit the larger community if we could develop a common framework for HPC file system development. The framework we envision would be based on solid software engineering practices and would enable “plug and play” interchange of basic file system components. Having this framework would facilitate both the operational needs of current production systems and the research efforts for novel features in future systems. On the production side, the framework would provide a trusted backbone upon which users and developers could rely for basic services. For research, the backbone of the framework would enable rapid prototyping of file system components so that students and other researchers do not have to implement the entire framework simply to experiment with a specific optimization.
The goal of this Dagstuhl Seminar is to bring together experts in I/O performance, file systems, and storage, and collectively explore the space of current and future problems and solutions for I/O on hierarchical storage systems in order to begin a community effort in designing a user-level file system framework for HPC systems. We expect a lively week of learning about each other’s approaches as well as unique I/O use cases that can influence the design of a community-driven file system framework. Our hope is that the effort initiated during this seminar will result in a long-term collaboration that will benefit the HPC community as a whole.
The primary goal of this Dagstuhl Seminar was to bring together experts in I/O performance, file systems, and storage, and collectively explore the space of current and future problems and solutions for I/O on hierarchical storage systems. We had a lively week of learning about each other’s approaches as well as unique I/O use cases that can influence the design of community-driven file and storage system standards. We also engaged in several informal, in-depth discussions on questions surrounding how we should best move forward in the I/O and storage community.
A portion of the agenda for this meeting was partitioned into sessions containing short talks. The short talk sessions were grouped into high-level topic areas: high performance computing and storage systems today; user needs for I/O; user-level file system implementations; object stores and alternatives; and file system building blocks. The intention behind the short talks was to acquaint the attendees with each other's work and to inspire further discussions. Following each talk topic, we had panel-style discussions with the talk speakers serving as the panel. In these panel-style discussions, the audience had the opportunity to ask questions about the speakers' talks as well as note and discuss commonalities and differences across the presentations.
The remainder of the agenda for the meeting was reserved for open discussions with the entire group. The participants engaged in lengthy discussions on various questions that arose from the talks. Additionally, participants were encouraged to propose and vote for discussion topics on a whiteboard. The proposed topics with the most votes were included in the agenda. The in-depth discussion topics included:
- How are stage-in and stage-out operations actually going to work?
- How can we fairly judge the performance of storage systems (IO500)?
- What is a user-level file system? What do we mean when we say that?
- How can we characterize what users need from storage systems?
- Are we ready to program to a memory hierarchy versus block devices?
- What should we do about POSIX?
The combination of short talks and open discussions resulted in a fruitful meeting. Since the work of the participants was not necessarily familiar to all, the short talks provided a foundation for getting everyone oriented with each other's efforts. Once that was achieved, we were able to productively dive into the informal topic discussions. Overall, several common themes emerged from the talks and discussions. The participants agreed that these themes were important to address to meet the needs of HPC applications on next-generation storage systems. We include these themes in this report in Section 9 to serve as suggestions for further investigations.
Here we present an overview of the topics in this report to guide the reader. Our goal in this report is to capture as much information as possible from the seminar so that those who could not attend can benefit from the talks and discussions.
We detail the short talk sessions in Sections 3-7. First, we provide a summary of the notes from the session note taker and other comments from the talks and panel discussions. Following this summary we provide a listing of each talk in the session and its abstract.
In Sections 8.1-8.6 we give summaries of the informal discussion sessions. The summaries in this case are in outline format in order to capture the conversational and informal nature of the sessions. In many cases, the discussions raised interesting questions rather than clear paths forward, and the outline format captures this well.
Following the summaries of the sessions, in Section 9 we conclude with a discussion of recurring themes that emerged during the meeting, including issues for future discussion and work. We feel that these themes are the true product of this meeting and can serve as a foundation for future meetings or other community efforts.
- Stergios V. Anastasiadis (University of Ioannina, GR)
- Ned Bass (LLNL - Livermore, US)
- John Bent (Seagate Government Solutions - Herndon, US)
- Thomas Bönisch (HLRS - Stuttgart, DE)
- Luc Bougé (INRIA - Rennes, FR)
- André Brinkmann (Universität Mainz, DE)
- Suren Byna (Lawrence Berkeley National Laboratory, US)
- Philip Carns (Argonne National Laboratory, US)
- Andreas Dilger (Intel Corporation - Vancouver, CA)
- Lance Evans (Cray Inc. - Seattle, US)
- Wolfgang Frings (Jülich Supercomputing Centre, DE)
- Hermann Härtig (TU Dresden, DE)
- Dean Hildebrand (IBM Almaden Center - San Jose, US)
- Nathan Hjelm (Los Alamos National Laboratory, US)
- Scott Klasky (Oak Ridge National Laboratory, US)
- Michael Kluge (TU Dresden, DE)
- Michael Kuhn (Universität Hamburg, DE)
- Jay Lofstead (Sandia National Labs - Albuquerque, US)
- Satoshi Matsuoka (Tokyo Institute of Technology, JP)
- Alberto Miranda (Barcelona Supercomputing Center, ES)
- Kathryn Mohror (LLNL - Livermore, US)
- David Montoya (Los Alamos National Lab., US)
- Federico Padua (Universität Mainz, DE)
- Mark Parsons (University of Edinburgh, GB)
- Maria S. Perez (Universidad Politécnica de Madrid, ES)
- Franz Josef Pfreundt (Fraunhofer ITWM - Kaiserslautern, DE)
- Raghunath Raja Chandrasekar (Cray Inc. - Seattle, US)
- Robert B. Ross (Argonne National Laboratory, US)
- Brad Settlemyer (Los Alamos National Laboratory, US)
- Osamu Tatebe (University of Tsukuba, JP)
- Tianqi Xu (Tokyo Institute of Technology, JP)
- Weikuan Yu (Florida State University - Tallahassee, US)
- Yue Zhu (Florida State University - Tallahassee, US)
- Dagstuhl Seminar 21332: Understanding I/O Behavior in Scientific and Data-Intensive Computing (2021-08-15 - 2021-08-20)
- operating systems
- I/O performance
- file systems
- high performance computing workflows