- Martin Hirzel (IBM TJ Watson Research Center - Yorktown Heights, US)
- Benchmarking Distributed Stream Data Processing Systems : article in 2018 IEEE 34th International Conference on Data Engineering (ICDE) - Karimov, Jeyhun; Rabl, Tilmann; Katsifodimos, Asterios; Samarev, Roman; Heiskanen, Henri; Markl, Volker - Los Alamitos : IEEE, 2018. - pp. 1507-1518.
- Continuous Queries : article in Encyclopedia of Big Data Technologies - Hirzel, Martin - Berlin : Springer, 2018. - 6 pp.
- Cost-Aware Streaming Data Analysis : Industry Paper : Distributed vs Single-Thread : article in DEBS '18 Proceedings of the 12th ACM International Conference on Distributed and Event-based Systems - Balduini, Marco; Pasupathipillai, Sivam; Della Valle, Emanuele - New York : ACM, 2018. - pp. 160-170.
- Dagstuhl Seminar on Big Stream Processing : article : pp. 36-39 - Sakr, Sherif; Rabl, Tilmann; Hirzel, Martin; Carbone, Paris; Strohbach, Martin - New York : ACM, 2018 - (ACM SIGMOD Record ; 47. 2018, 3).
- Stream Processing Languages and Abstractions : article in Encyclopedia of Big Data Technologies - Hirzel, Martin; Baudart, Guillaume - Berlin : Springer, 2018. - 8 pp.
- Stream Processing Languages in the Big Data Era - Hirzel, Martin; Baudart, Guillaume; Bonifati, Angela; Della Valle, Emanuele; Sakr, Sherif; Vlachou, Akrivi - New York : ACM, 2018. - pp. 29-40 - (SIGMOD record ; 47. 2018, 2).
- Stream Query Optimization : article in Encyclopedia of Big Data Technologies - Hirzel, Martin; Soule, Robert; Gedik, Bugra - Berlin : Springer, 2018. - 9 pp.
- Streams and Tables : Two Sides of the Same Coin : article in BIRTE '18 Proceedings of the International Workshop on Real-Time Business Intelligence and Analytics - Sax, Matthias J.; Wang, Guozhang; Weidlich, Matthias; Freytag, Johann-Christoph - New York : ACM, 2018. - 11 pp.
- Towards Making Distributed RDF Processing FLINKer : article in 2018 4th International Conference on Big Data Innovations and Applications - Azzam, Amr; Kirrane, Sabrina; Polleres, Axel - Los Alamitos : IEEE, 2018. - pp. 9-16.
The world is living in the information age and is steadily becoming a data-driven society in which data is the most valuable asset. The digital transformation is therefore a revolution that cannot be ignored. It is changing many aspects of modern life, including the way we live, socialize, think, work, do business, conduct research and govern society. This transformation is characterized by the enormous amounts of data that are produced and analyzed. Big data is commonly characterized by the 3 V's: huge Volume, consisting of terabytes or petabytes of data; high Velocity, being created in or near real time; and diverse Variety, being both structured and unstructured in nature.
As the world becomes more instrumented and connected, we are witnessing a flood of digital data generated at high velocity by hardware (e.g., sensors) and software in the form of data streams. This phenomenon is crucial for several applications and domains, including financial markets, surveillance systems, manufacturing, smart cities and scalable monitoring infrastructure. In these applications and domains, there is a crucial requirement to collect, process, and analyze big streams of data in order to extract valuable information, discover new insights in real time and detect emerging patterns and outliers.
Stream computing is a new paradigm necessitated by new data-generating scenarios, such as the ubiquity of mobile devices, location services, and pervasive sensors. In general, stream processing systems support a large class of applications (e.g., financial markets, surveillance systems, manufacturing, smart cities and scalable monitoring infrastructure) in which data are generated by multiple sources and pushed asynchronously to servers that are responsible for processing them. Recently, several systems (e.g., Apache Storm, Apache Heron, Apache Flink, Spark Streaming, Apache Apex) have been introduced to tackle the real-time processing of big streaming data. However, several challenges and open problems need to be addressed in order to improve the state of the art in this domain and to make big stream processing systems widely used by a large number of users and enterprises. Thus, this application proposes a seminar that brings together researchers, developers and practitioners actively working in this domain to discuss relevant open challenges, with a focus on two main topics: benchmarking and high-level declarative programming abstractions for big streaming jobs.
As the world becomes more instrumented and connected, we are witnessing a flood of digital data generated at high velocity by hardware (e.g., sensors) and software in the form of data streams. This phenomenon is crucial for several applications and domains, including financial markets, surveillance systems, manufacturing, smart cities and scalable monitoring infrastructure. In these applications and domains, there is a crucial requirement to collect, process, and analyze big streams of data in order to extract valuable information, discover new insights in real time and detect emerging patterns and outliers. Recently, several systems (e.g., Apache Apex, Apache Flink, Apache Storm, Heron, Spark Streaming) have been introduced to tackle the real-time processing of big streaming data. However, several challenges and open problems need to be addressed in order to improve the state of the art in this domain and to make big stream processing systems widely used by a large number of users and enterprises. The aim of this seminar was to bring together prominent researchers, developers and practitioners actively working in the domain of big stream processing to discuss relevant open challenges and research directions. The plan was to work on specific challenges, including the trade-offs of the various design decisions of big stream processing systems, declarative stream querying and processing languages, and the benchmarking of big stream processing systems.
On Monday morning, the workshop officially kicked off with a round of introductions by the participants, during which ad hoc clusters of participant interests were defined. The clusters revolved around the topics of systems, query languages, benchmarking, stream mining and semantic stream processing.
The program of the seminar included four tutorials, one per day. On Monday, Martin Strohbach from AGT International presented case studies and scenarios for large-scale stream processing in different application domains. On Tuesday, the systems tutorial was presented by Paris Carbone from KTH Royal Institute of Technology, Thomas Weise from Data Torrent Inc. and Matthias J. Sax from Confluent Inc. Paris presented an overview of the journey of stream processing systems, Thomas presented recent updates on the Apache Apex system, and Matthias presented an overview of the Apache Kafka and Kafka Streams projects. On Wednesday, Martin Hirzel from IBM TJ Watson Research Center presented a tutorial on the taxonomy and classification of stream processing languages. On Thursday, Tilmann Rabl from TU Berlin presented a tutorial on the challenges of benchmarking big data systems in general, as well as the specific challenges of benchmarking big stream processing systems. All tutorials were informative and interactive, and they involved deep technical discussions.
On Thursday evening, we had a lively demo session in which various participants demonstrated their systems to the audience in parallel, interactive round-table discussions. On Wednesday, the participants split into two groups based on common interest in a selected subset of the open challenges and problems. The two selected topics were systems and query languages. Thursday's schedule was dedicated to working group efforts. A summary of the outcomes of these two groups is included in this report.
We expect work from at least one of the groups to be submitted for publication, and we expect further research publications to result directly from the seminar.
We believe that the most interesting aspect of the seminar was the opportunity to engage freely in direct and interactive discussions with experts and researchers on various topics of the field who share a focused passion and interest. We believe that this is a unique feature of Dagstuhl seminars. We received very positive feedback from the participants: most were excited by the scientific atmosphere at the seminar and reported that the program was useful for them. In summary, we consider the organization of this seminar a success. We are grateful to the Dagstuhl team for providing the opportunity and full support to organize it. The success of this seminar has motivated us to plan follow-up seminars that continue the discussions on the rapid advancements in the domain, with narrower and more focused discussions and concrete outputs for the community.
- Pramod Bhatotia (University of Edinburgh, GB)
- Albert Bifet (Telecom ParisTech, FR)
- Michael H. Böhlen (Universität Zürich, CH)
- Angela Bonifati (University Claude Bernard - Lyon, FR)
- Jean-Paul Calbimonte (HES-SO Valais - Sierre, CH)
- Paris Carbone (KTH Royal Institute of Technology, SE)
- Emanuele Della Valle (Polytechnic University of Milan, IT)
- Javier David Fernández-García (Wirtschaftsuniversität Wien, AT)
- Ashish Gehani (SRI - Menlo Park, US)
- Manfred Hauswirth (TU Berlin, DE)
- Günter Hesse (Hasso-Plattner-Institut - Potsdam, DE)
- Martin Hirzel (IBM TJ Watson Research Center - Yorktown Heights, US)
- Ali Intizar (National University of Ireland - Galway, IE)
- Asterios Katsifodimos (TU Delft, NL)
- Nikos Katsipoulakis (University of Pittsburgh, US)
- Henning Kropp (Hortonworks - München, DE)
- Danh Le Phuoc (TU Berlin, DE)
- Alessandro Margara (Polytechnic University of Milan, IT)
- Gianmarco Morales (QCRI - Doha, QA)
- Christoph Quix (Fraunhofer FIT - Sankt Augustin, DE)
- Tilmann Rabl (TU Berlin, DE)
- Sherif Sakr (KSAU - Riyadh, SA)
- Kai-Uwe Sattler (TU Ilmenau, DE)
- Matthias J. Sax (Confluent Inc - Palo Alto, US)
- Martin Strohbach (AGT International - Darmstadt, DE)
- Hong-Linh Truong (TU Wien, AT)
- Akrivi Vlachou (University of Thessaly - Lamia, GR)
- Thomas Weise (Mountain View, US)
- Yongluan Zhou (University of Copenhagen, DK)
- data bases / information retrieval
- optimization / scheduling
- programming languages / compiler
- Big Data
- Big Streams
- Stream Processing Systems
- Declarative Programming