For decades, distributed computing was mainly an academic subject. Today it has become mainstream: our connected world demands applications that are inherently distributed, and the use of shared, distributed, peer-to-peer, or cloud-computing infrastructures is increasingly common. However, writing distributed applications that are both correct and well distributed (e.g., highly available) is extremely challenging.
In fact, there exists a fundamental trade-off between data consistency, availability, and the ability to tolerate failures. This trade-off has implications for the design of the entire distributed computing infrastructure, including storage systems, compilers and runtimes, application development frameworks, and programming languages. Unfortunately, it also has significant implications for the programming model exposed to the designers and developers of applications. We need to enable programmers who are not experts in these subtle aspects to build distributed applications that remain correct in the presence of concurrency, failures, churn, replication, dynamically changing and partial information, high load, the absence of a single global notion of time, etc.
This Dagstuhl Seminar proposes to bring together researchers and practitioners in the areas of distributed systems, programming languages, verification, and databases. We would like to understand the lessons learnt in building scalable and correct distributed systems and the design patterns that have emerged, and to explore opportunities for distilling these into programming methodologies, programming tools, and languages that make distributed computing easier and more accessible.
Possible topics for discussions:
- Application writers are constantly making trade-offs between consistency and availability. What kinds of tools and methodologies can we provide to simplify this decision making? How does one understand the implications of a design choice?
- Available systems are hard to design, test and debug. Do existing testing and debugging tools suffice for identifying and isolating bugs due to weak consistency? How can these problems be identified in production using live monitoring?
- Can we formalize commonly desired (generic) correctness (or performance) properties? How can we teach programmers about these formalisms and make them accessible to a wide audience?
- Can we build verification or testing tools to check that systems have these desired correctness properties?
- How do applications achieve the required properties, while ensuring adequate performance, in practice? What design patterns and idioms work well?
- To what degree can these properties be guaranteed by the platform (programming language, libraries, and runtime system)? What are the responsibilities of the application developer, and what tools and information does she have?
Large-scale distributed systems have become ubiquitous, and there are a variety of options to develop, deploy, and operate such applications. Typically, this type of application is data-centric: it retrieves, stores, modifies, forwards, and processes data from different sources. However, guaranteeing availability, preventing data loss, and providing efficient storage solutions are still major challenges that a growing number of programmers face when developing large-scale distributed systems. In our seminar, we brought together academic and industrial researchers and practitioners to discuss the status quo of data consistency in distributed systems. As a result of the talks and discussions, we identified several topics of interest that can be grouped into the following four areas.
Theoretical foundations: The seminar included a tutorial on specification of consistency guarantees provided by distributed systems and talks on comparing different styles of specification and expressing replicated data type semantics in Datalog. Different specification styles are suitable for different purposes and more work is needed to identify the most appropriate ones. The seminar also included talks on formally reasoning about which consistency levels are enough to satisfy correctness properties of applications. The talks demonstrated that formal verification is a promising approach to cope with the challenge of selecting appropriate consistency levels.
Distributed systems and database technologies: With the growing number of replicated data stores, the fields of distributed systems and databases are moving closer together, and the two communities should be made more aware of each other's results. A common concern is agreement, i.e., ensuring that database copies are updated correctly. Traditionally, the distributed systems community has based many of its approaches on classical consensus algorithms or has explored weaker consistency models. In contrast, the database community focused most of its work on two-phase commit protocols and eager update protocols. At the same time, the database community also considered other ACID aspects, which required combining commit protocols with concurrency control protocols and recovery schemes. In the last decade, however, and in particular with practical implementations of the Paxos consensus algorithm and the use of file replication in storage systems for availability, the work of the two communities has come closer together. A challenge in this context is that work emerging from the different communities still makes slightly different assumptions about failure and correctness models. These differences can be quite subtle, so that they are not obvious even to experts, and they can lead to very different solutions. Bridging this gap, in terms of understanding each other and the implications of correctness and failure models, remains a challenging task. As an example, the separation of the concepts of atomicity, isolation, and durability in the database world offers many opportunities for optimization, but adds extra complexity when analyzing which algorithms are appropriate in which situations.
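To make the classical database notion of agreement concrete, the following is a minimal sketch of the two-phase commit (2PC) protocol mentioned above. All class and function names are illustrative, and the sketch deliberately ignores the hard parts that distinguish the communities' failure models: coordinator crashes, message loss, and recovery.

```python
# A minimal, failure-free sketch of two-phase commit (2PC).
# Names (Participant, two_phase_commit) are illustrative, not from any library.

class Participant:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self):
        # Phase 1 (voting): vote yes only if the local transaction
        # can be made durable; otherwise vote no and abort locally.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def finish(self, commit):
        # Phase 2 (decision): apply the coordinator's global decision.
        self.state = "committed" if commit else "aborted"

def two_phase_commit(participants):
    # Coordinator logic: commit globally only if every participant votes yes.
    votes = [p.prepare() for p in participants]
    decision = all(votes)
    for p in participants:
        p.finish(decision)
    return decision

parts = [Participant("a"), Participant("b"), Participant("c", can_commit=False)]
print(two_phase_commit(parts))  # prints False: one "no" vote forces a global abort
```

The sketch shows why 2PC is blocking: the decision requires a unanimous vote collected by a single coordinator, which is exactly where consensus-based approaches such as Paxos differ by tolerating coordinator failure.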
Conflict handling in highly scalable systems: In recent years, conflict-free replicated data types (CRDTs) have been adopted by an ever-growing number of products and companies to meet high-availability requirements under concurrent modifications of data. Recent advances in related techniques for collaborative editing may make it possible for hundreds of people to work together on a shared document or data item with limited performance impact. Several talks presented programming guidelines, static analyses, and related tools for the safe use of CRDTs in situations where eventual consistency is not enough to maintain application invariants.
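To illustrate the CRDT idea described above, here is a minimal sketch of a state-based grow-only counter (G-Counter), one of the simplest CRDTs; the class name and structure are illustrative, not taken from any particular library. Each replica increments only its own entry, and merging takes the per-replica maximum, so concurrent updates commute and all replicas converge without coordination.

```python
# A minimal sketch of a state-based grow-only counter (G-Counter) CRDT.

class GCounter:
    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}  # replica_id -> number of increments seen at that replica

    def increment(self, n=1):
        # A replica may only update its own entry.
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def value(self):
        # The counter's value is the sum over all replicas.
        return sum(self.counts.values())

    def merge(self, other):
        # Join of two states: pointwise maximum. This is idempotent,
        # commutative, and associative, which guarantees convergence.
        for rid, c in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), c)

a, b = GCounter("a"), GCounter("b")
a.increment(2)   # updates applied concurrently at two replicas
b.increment(3)
a.merge(b)
b.merge(a)
print(a.value(), b.value())  # prints: 5 5 (both replicas converge)
```

The design choice is characteristic of CRDTs: the state forms a join-semilattice and updates are inflationary, so replicas can exchange and merge states in any order, any number of times, and still agree.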
Programming models for distributed systems: Micro-services have become a standard approach for constructing large-scale distributed systems, though micro-service composition and scalability raise many open questions. Some presentations discussed current work on actor-based and data-flow programming. Design for testability and test frameworks are crucial for providing reliable services, but they currently require a great deal of experience. We believe that future progress on programming models, together with new results in theoretical foundations, will help to simplify this challenging task and support programmers in building safe systems.
- Peter Alvaro (University of California - Santa Cruz, US)
- Mahesh Balakrishnan (Yale University - New Haven, US)
- Carlos Baquero (University of Minho - Braga, PT)
- Annette Bieniusa (TU Kaiserslautern, DE)
- Ahmed Bouajjani (University Paris-Diderot, FR)
- Manuel Bravo (INESC-ID - Lisbon, PT)
- Sebastian Burckhardt (Microsoft Research - Redmond, US)
- Andrea Cerone (Imperial College London, GB)
- Gregory Chockler (Royal Holloway, University of London, GB)
- Khuzaima Daudjee (University of Waterloo, CA)
- Diego Didona (EPFL - Lausanne, CH)
- Amr El Abbadi (University of California - Santa Barbara, US)
- Carla Ferreira (New University of Lisbon, PT)
- Alexey Gotsman (IMDEA Software - Madrid, ES)
- Suresh Jagannathan (Purdue University - West Lafayette, US)
- Bettina Kemme (McGill University - Montreal, CA)
- Brad King (Scality - Paris, FR)
- Kyle Kingsbury (San Francisco, US)
- Martin Kleppmann (University of Cambridge, GB)
- Christopher Meiklejohn (UC Louvain, BE)
- Roland Meyer (TU Braunschweig, DE)
- Maged M. Michael (Facebook - New York, US)
- Pascal Molli (University of Nantes, FR)
- Roberto Palmieri (Lehigh University - Bethlehem, US)
- Matthieu Perrin (University of Nantes, FR)
- Gustavo Petri (University Paris-Diderot, FR)
- Nuno Preguica (New University of Lisbon, PT)
- Luis Rodrigues (INESC-ID - Lisbon, PT)
- Rodrigo Rodrigues (INESC-ID - Lisbon, PT)
- Masoud Saeida Ardekani (Samsung Research - Mountain View, US)
- Sebastian Schweizer (TU Braunschweig, DE)
- Marc Shapiro (University Pierre & Marie Curie - Paris, FR)
- Pierre Sutra (Télécom SudParis - Évry, FR)
- Viktor Vafeiadis (MPI-SWS - Kaiserslautern, DE)
- Peter Van Roy (UC Louvain, BE)
- Dagstuhl Seminar 13081: Consistency In Distributed Systems (2013-02-17 - 2013-02-22)
- data bases / information retrieval
- programming languages / compiler
- verification / logic
- Distributed Computing