Dagstuhl Seminar 17441

Big Stream Processing Systems

( Oct 29 – Nov 03, 2017 )


Permalink
Please use the following short url to reference this page: https://www.dagstuhl.de/17441

Organizers

Coordinator
  • Martin Hirzel (IBM TJ Watson Research Center - Yorktown Heights, US)

Motivation

We live in the information age, and the world is steadily becoming a data-driven society in which data is the most valuable asset. The digital transformation is therefore a revolution that cannot be ignored. It is changing many aspects of modern life, including the way we live, socialize, think, work, do business, conduct research and govern society. This transformation is characterized by the enormous amounts of data that are produced and analyzed. Big data is commonly characterized by the 3V's: huge Volume, consisting of terabytes or petabytes of data; high Velocity, being created in or near real time; and diverse Variety, being both structured and unstructured in nature.

As the world becomes more instrumented and connected, we are witnessing a flood of digital data generated at high velocity by hardware (e.g., sensors) and software in the form of data streams. This phenomenon is crucial for several applications and domains, including financial markets, surveillance systems, manufacturing, smart cities and scalable monitoring infrastructure. In these applications and domains, there is a crucial requirement to collect, process, and analyze big streams of data in order to extract valuable information, discover new insights in real time, and detect emerging patterns and outliers.

Stream computing is a new paradigm necessitated by new data-generating scenarios, such as the ubiquity of mobile devices, location services, and pervasive sensors. In general, stream processing systems support a large class of applications (e.g., financial markets, surveillance systems, manufacturing, smart cities and scalable monitoring infrastructure) in which data are generated by multiple sources and pushed asynchronously to the servers responsible for processing them. Recently, several systems (e.g., Apache Storm, Apache Heron, Apache Flink, Spark Streaming, Apache Apex) have been introduced to tackle the real-time processing of big streaming data. However, several challenges and open problems need to be addressed in order to improve the state of the art in this domain and to make big stream processing systems widely used by a large number of users and enterprises. Thus, this application proposes a seminar that brings together researchers, developers and practitioners actively working in this domain to discuss relevant open challenges, with a focus on two main topics: benchmarking and high-level declarative programming abstractions for big streaming jobs.

Copyright Irini Fundulaki, Tilmann Rabl, and Sherif Sakr

Summary

As the world becomes more instrumented and connected, we are witnessing a flood of digital data generated at high velocity by hardware (e.g., sensors) and software in the form of data streams. This phenomenon is crucial for several applications and domains, including financial markets, surveillance systems, manufacturing, smart cities and scalable monitoring infrastructure. In these applications and domains, there is a crucial requirement to collect, process, and analyze big streams of data in order to extract valuable information, discover new insights in real time, and detect emerging patterns and outliers. Recently, several systems (e.g., Apache Apex, Apache Flink, Apache Storm, Heron, Spark Streaming) have been introduced to tackle the real-time processing of big streaming data. However, several challenges and open problems need to be addressed in order to improve the state of the art in this domain and to make big stream processing systems widely used by a large number of users and enterprises. The aim of this seminar was to bring together prominent researchers, developers and practitioners actively working in the domain of big stream processing to discuss relevant open challenges and research directions. The plan was to work on specific challenges, including the trade-offs among the design decisions of big stream processing systems, declarative stream querying and processing languages, and the benchmarking of big stream processing systems.

On Monday morning, the workshop officially kicked off with a round of introductions, during which ad hoc clusters of the participants' interests were defined. The clusters revolved around the topics of systems, query languages, benchmarking, stream mining and semantic stream processing. The program of the seminar included four tutorials, one per day. On Monday, Martin Strohbach from AGT International presented case studies and scenarios for large-scale stream processing in different application domains. On Tuesday, we enjoyed the systems tutorial, which was presented by Paris Carbone from KTH Royal Institute of Technology, Thomas Weise from Data Torrent Inc. and Matthias J. Sax from Confluent Inc. Paris presented an overview of the journey of stream processing systems, Thomas presented recent updates on the Apache Apex system, and Matthias presented an overview of the Apache Kafka and Kafka Streams projects. On Wednesday, Martin Hirzel from IBM TJ Watson Research Center presented a tutorial on the taxonomy and classification of stream processing languages. On Thursday, Tilmann Rabl from TU Berlin presented a tutorial on the challenges of benchmarking big data systems in general, as well as the specific challenges of benchmarking big stream processing systems. All tutorials were informative and interactive, and involved deep technical discussions.

On Thursday evening, we had a lively demo session in which various participants demonstrated their systems to the audience in parallel, interactive round-table discussions. On Wednesday, the participants split into two groups based on common interest in a selected subset of the open challenges and problems. The two selected topics were systems and query languages. Thursday's schedule was dedicated to working group efforts. A summary of the outcomes of these two groups is included in this report. Work from at least one of the groups is expected to be submitted for publication, and we expect further research publications to result directly from the seminar.

We believe that the most interesting aspect of the seminar was the opportunity to engage freely in direct and interactive discussions with experts and researchers from various areas of the field who share a common, focused interest. We believe that this is a unique feature of Dagstuhl Seminars. We received very positive feedback from the participants: most were excited by the scientific atmosphere at the seminar and reported that the program was useful for them. In summary, we consider this seminar a success. We are grateful to the Dagstuhl team for providing the opportunity and full support to organize it. The success of this seminar has motivated us to plan follow-up seminars that continue the discussion of the rapid advancements in the domain, with narrower and more focused discussions and concrete outputs for the community.

Copyright Martin Hirzel, Tilmann Rabl, and Sherif Sakr

Participants
  • Pramod Bhatotia (University of Edinburgh, GB) [dblp]
  • Albert Bifet (Telecom ParisTech, FR) [dblp]
  • Michael H. Böhlen (Universität Zürich, CH) [dblp]
  • Angela Bonifati (University Claude Bernard - Lyon, FR) [dblp]
  • Jean-Paul Calbimonte (HES-SO Valais - Sierre, CH) [dblp]
  • Paris Carbone (KTH Royal Institute of Technology, SE) [dblp]
  • Emanuele Della Valle (Polytechnic University of Milan, IT) [dblp]
  • Javier David Fernández-García (Wirtschaftsuniversität Wien, AT)
  • Ashish Gehani (SRI - Menlo Park, US) [dblp]
  • Manfred Hauswirth (TU Berlin, DE) [dblp]
  • Günter Hesse (Hasso-Plattner-Institut - Potsdam, DE) [dblp]
  • Martin Hirzel (IBM TJ Watson Research Center - Yorktown Heights, US) [dblp]
  • Ali Intizar (National University of Ireland - Galway, IE) [dblp]
  • Asterios Katsifodimos (TU Delft, NL) [dblp]
  • Nikos Katsipoulakis (University of Pittsburgh, US) [dblp]
  • Henning Kropp (Hortonworks - München, DE) [dblp]
  • Danh Le Phuoc (TU Berlin, DE) [dblp]
  • Alessandro Margara (Polytechnic University of Milan, IT) [dblp]
  • Gianmarco Morales (QCRI - Doha, QA) [dblp]
  • Christoph Quix (Fraunhofer FIT - Sankt Augustin, DE) [dblp]
  • Tilmann Rabl (TU Berlin, DE) [dblp]
  • Sherif Sakr (KSAU - Riyadh, SA) [dblp]
  • Kai-Uwe Sattler (TU Ilmenau, DE) [dblp]
  • Matthias J. Sax (Confluent Inc - Palo Alto, US) [dblp]
  • Martin Strohbach (AGT International - Darmstadt, DE) [dblp]
  • Hong-Linh Truong (TU Wien, AT) [dblp]
  • Akrivi Vlachou (University of Thessaly - Lamia, GR) [dblp]
  • Thomas Weise (Mountain View, US) [dblp]
  • Yongluan Zhou (University of Copenhagen, DK) [dblp]

Classification
  • data bases / information retrieval
  • optimization / scheduling
  • programming languages / compiler

Keywords
  • Big Data
  • Big Streams
  • Stream Processing Systems
  • Benchmarking
  • Declarative Programming