Dagstuhl Perspectives Workshop 16152
Tensor Computing for Internet of Things
(Apr 10 – Apr 13, 2016)
- Evrim Acar (University of Copenhagen, DK)
- Animashree Anandkumar (University of California - Irvine, US)
- Lenore Mullin (University of Albany - SUNY, US)
- Volker Tresp (Siemens AG - München, DE)
- Sükran Sebnem Rusitschka (Siemens AG - München, DE)
- Simone Schilke (for administrative matters)
- Tensor Computing for Internet of Things (Dagstuhl Perspectives Workshop 16152). Evrim Acar, Animashree Anandkumar, Lenore Mullin, Sebnem Rusitschka, and Volker Tresp. In Dagstuhl Reports, Volume 6, Issue 4, pp. 57-79, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2016)
- Tensor Computing for Internet of Things (Dagstuhl Perspectives Workshop 16152). Evrim Acar, Animashree Anandkumar, Lenore Mullin, Sebnem Rusitschka, and Volker Tresp. In Dagstuhl Manifestos, Volume 7, Issue 1, pp. 52-68, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2018)
How can we assure performance and dependability given the increasing complexity and data of an always-on connected world? Can we exploit the power of tensor algebra to solve the high-dimensional, large-scale machine learning problems that such a world poses?
Cyber-physical systems (CPS) enable the physical world to merge with the virtual, leading to an internet of things, data, services, and people. This workshop will focus on the Internet of Things (IoT), i.e., devices that have the capability to sense, communicate, and control. These devices are part of complex, dynamic, and distributed systems, such as electricity or mobility networks. Their various sensors enable them to capture multiple aspects of their surroundings in real time. For example, phasor measurement units capture transient dynamics and evolving disturbances in the power system at high resolution, in a synchronized manner, and in real time. Another example is traffic networks, where a car today can deliver about 250 GB of data per hour from connected electronics such as in-car weather sensors, parking cameras, and radars. Experts estimate that the IoT will consist of almost 50 billion objects by 2020. Big data computing frameworks such as Hadoop, Spark, or Storm currently form the basis for handling the massive amounts of data in batch and in stream. Advances in hardware, such as many-core and heterogeneous architectures, are also enabling factors. To enhance knowledge discovery, we believe that machine learning methods that can deal with multidimensional data are required to effectively extract information from this high-volume, high-velocity sensor data. Crucial for the extraction of information is the format in which the data is represented.
The goal of the workshop is to explore tensor representations as the basis for machine learning solutions for the IoT. Tensors are algebraic objects that describe linear and multilinear relationships and are represented as multidimensional arrays. They often provide a natural and compact representation for multidimensional data. In recent years, the tensor and machine learning communities, mainly active in data-rich domains such as neuroscience, social network analysis, chemometrics, computer vision, and knowledge graphs, have provided a solid research infrastructure. It ranges from efficient routines for tensor calculus, through methods of multi-way analysis, to tensor decompositions for consistent and efficient estimation of the parameters of probabilistic models, in search of convergence to globally optimal solutions. Big data necessitates optimized performance and space usage if latency and delay are to be minimized. These are also critical concerns in CPS, where real-time execution and security are desired. Great potential may exist for designing more efficient routines that take into account the various computing architectures and resources that will coexist in heterogeneous IoT scenarios: complex processor/memory/network hierarchies and embedded processors.
This Dagstuhl Perspectives Workshop is planned as a catalyst and should attract experts from tensor and large-scale data analysis, practitioners of hardware and software advancement for big data computing, and industrial stakeholders active in IoT/CPS. The resulting Tensor Computing for IoT manifesto will show the path to filling the remaining gaps between the disciplines and outline a small number of concise action items that describe important directions for future research and investment. Topics of interest include:
- Limitations of the currently available models and algorithms for IoT data
- Employing tensor methods in conjunction with other methods, e.g., probabilistic modeling and deep learning
- Distributed data and computing models for multidimensional sensor data across heterogeneous architectures of multi-core cluster and embedded computing
- Universal algebra and calculus of indexing for optimized and verifiable composition of operations in an n-dimensional array/tensor algebra supporting the above models
In April 2016, Dagstuhl hosted a Perspectives Workshop on Tensor Computing for the Internet of Things. The prior year, industrial researchers had formulated the challenges of gaining insights from multi-dimensional sensory data coming from large-scale connected energy, transportation networks or manufacturing systems. The sheer amount of streaming multi-aspect data was prompting us to look for the most suitable techniques from the machine learning community: multi-way data analysis. Hence, we organized a three-day interactive workshop with two separate questions bringing two formerly distinct communities together: (i) How can we assure performance and reliability given the increasing complexity and data of an always-on connected world? (ii) Can we exploit the power of tensor algebra to solve high-dimensional large-scale machine learning problems that such a world poses?
The workshop focused on the Internet of Things (IoT), i.e., devices that have the capability to sense, communicate, and, moreover, control their environments. These devices are increasingly becoming part of complex, dynamic, and distributed systems such as electricity or mobility networks, and hence of our daily lives. Various sensors enable these devices to capture multiple aspects of their surroundings in real time. For example, phasor measurement units capture transient dynamics and evolving disturbances in the power system at high resolution, in a synchronized manner, and in real time. Another example is traffic networks, where a car today can deliver about 250 GB of data per hour from connected electronics such as in-car weather sensors, parking cameras, and radars. Experts estimate that the IoT will consist of almost 50 billion objects by 2020, which will trigger the era of exascale computing. That era will require managing the heat and energy of computing in concert with ever more complex processor/network/memory hierarchies of sensors and embedded computers in distributed systems. Crucial for the extraction of relevant information is the format in which the raw data from such systems is represented. Equally crucial for the efficiency of information extraction in the IoT is the choice of operations, which must guarantee various attributes of resource use and management. Tensors can be viewed as data structures or as multilinear operators.
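The two views of a tensor mentioned above can be illustrated with a minimal NumPy sketch. The axis meanings (sensor, channel, time step), the array sizes, and the helper name `apply_multilinear` are assumptions chosen for illustration and do not come from the workshop material.

```python
import numpy as np

rng = np.random.default_rng(0)

# Data-structure view: a 3-way array, e.g. indexed by
# (sensor, signal channel, time step). Sizes are illustrative assumptions.
T = rng.standard_normal((4, 3, 5))


def apply_multilinear(T, x, y, z):
    """Operator view: f(x, y, z) = sum_ijk T[i, j, k] * x[i] * y[j] * z[k]."""
    return np.einsum("ijk,i,j,k->", T, x, y, z)


x, x2 = rng.standard_normal(4), rng.standard_normal(4)
y = rng.standard_normal(3)
z = rng.standard_normal(5)

# Linearity in the first argument (the same holds for the other two).
lhs = apply_multilinear(T, 2.0 * x + x2, y, z)
rhs = 2.0 * apply_multilinear(T, x, y, z) + apply_multilinear(T, x2, y, z)

# Feeding unit vectors into the operator recovers a single stored entry,
# which ties the operator view back to the data-structure view.
entry = apply_multilinear(T, np.eye(4)[1], np.eye(3)[2], np.eye(5)[3])
```

Here `lhs` and `rhs` agree up to floating-point rounding, and `entry` matches `T[1, 2, 3]`, so the same array serves both as stored data and as the coefficients of a map that is linear in each argument.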
The goal of the workshop was to explore tensor representations and computing as the basis for machine learning solutions for the IoT. Tensors are algebraic objects that describe linear and multilinear relationships and can be represented as multidimensional arrays. They often provide a natural and compact representation for multidimensional data. In recent years, the tensor and machine learning communities, mainly active in data-rich domains such as neuroscience, social network analysis, chemometrics, and knowledge graphs, have provided a solid research infrastructure, ranging from efficient routines for tensor calculus, through methods of multi-way data analysis (i.e., tensor decompositions), to methods for consistent and efficient estimation of the parameters of probabilistic models.
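As one concrete instance of a tensor decomposition, the following is a minimal sketch of a rank-R CP (CANDECOMP/PARAFAC) model fitted by alternating least squares, written in plain NumPy. The function names `cp_reconstruct` and `cp_als`, the synthetic data, and all sizes are assumptions made for illustration; this is not the implementation used by any of the toolboxes discussed at the workshop, and it omits normalization, stopping criteria, and handling of missing entries.

```python
import numpy as np


def cp_reconstruct(A, B, C):
    """Rebuild a 3-way tensor from CP factors: X[i,j,k] = sum_r A[i,r] B[j,r] C[k,r]."""
    return np.einsum("ir,jr,kr->ijk", A, B, C)


def cp_als(X, rank, n_sweeps=30, seed=0):
    """Fit a rank-`rank` CP model to a 3-way array by alternating least squares."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A = rng.standard_normal((I, rank))
    B = rng.standard_normal((J, rank))
    C = rng.standard_normal((K, rank))
    norm_X = np.linalg.norm(X)
    errors = []
    for _ in range(n_sweeps):
        # Each update solves a linear least-squares problem for one factor
        # while the other two are held fixed (normal equations via pinv).
        A = np.einsum("ijk,jr,kr->ir", X, B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = np.einsum("ijk,ir,kr->jr", X, A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = np.einsum("ijk,ir,jr->kr", X, A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
        # Relative reconstruction error after this sweep.
        errors.append(np.linalg.norm(X - cp_reconstruct(A, B, C)) / norm_X)
    return (A, B, C), errors


# Synthetic example: a 6 x 5 x 4 tensor built from two known components.
rng = np.random.default_rng(1)
X = cp_reconstruct(
    rng.standard_normal((6, 2)),
    rng.standard_normal((5, 2)),
    rng.standard_normal((4, 2)),
)
(A, B, C), errors = cp_als(X, rank=2)
```

The list `errors` tracks the relative reconstruction error per sweep. Each factor update is a least-squares step, so in exact arithmetic the error cannot increase from one sweep to the next, although convergence to a global optimum is not guaranteed in general.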
Some tensor-based models have the intriguing characteristic that, if there is a good match between the model and the underlying structure in the data, they are much more interpretable than alternative techniques. This interpretability is essential for machine learning techniques to gain acceptance in the engineering-heavy fields of automation and control of cyber-physical systems. Many of these systems show intrinsically multilinear behavior, which is appropriately modeled by tensor methods, and tools for controller design can use these models. The calibration of the sensors delivering the data and a higher resolution of the measured data will have an additional impact on the interpretability of models.
Various presentations on tensor methods by established researchers from different application domains underscored that tensor methods are reaching a maturity tipping point. However, knowledge of usage characteristics of tensor models is scattered. Discussions of the currently independent perspectives on the usage of tensor methods showed convergence potential which we will detail in the Dagstuhl Manifesto. During our discussions based on the presentations of the IoT industrial researchers, it quickly became clear that we would need benchmark challenges for cyber-physical systems and benchmark data in order to be able to replicate the successes in machine learning for neuroscience, image processing or chemometrics, for example.
The tensor computing community will equally benefit from the new types of data, requirements, and characteristics of the IoT, which can lead to techniques that increase the success rates of previous applications. This was the case with social network data analysis, whose challenges led to better tensor models and algorithms that can analyze data sets with missing entries and that are now used in many fields beyond social network analysis. Additionally, as opposed to standardized machine learning techniques, tensor computing currently lacks a common language and the homogeneity needed to flexibly exchange models. Hence, a hub platform that brings data and domain knowledge of cyber-physical systems together with a variety of practitioners of tensor computing would increase the coherence of terms and of best practices in data acquisition and structuring methods, as well as support model benchmarking, cataloging, and the exchange of methods.
Furthermore, industrial researchers from the IoT, automation, and control domains highlighted their view that tensor computing methods are still inaccessible to the majority of industrial practitioners, even though there has been considerable progress in developing tools for tensor computing. Matlab extensions that enable tensor analysis are quite mature, and Matlab is widely used by control and automation practitioners. The Python ecosystem for machine learning practitioners is very quickly adopting extensions that enable tensor operations. However, both are mainly used for prototyping and ultimately do not fulfill the need for a unified framework for industrial-grade development and deployment of models in highly distributed cyber-physical systems. Interestingly, just five months prior to our workshop, TensorFlow, a numerical computation library that aims at capturing structures in multidimensional data and supports both prototyping and production-level algorithms, was open sourced. TensorFlow can run on server clusters as well as on embedded systems such as smartphones. Another framework, unifying both batch and streaming data analysis, is Apache Spark. Spark provides seamless scalability of software code to run on multiple machines. Recently, there have been deployments of tensor methods on the Spark platform.
As a multidisciplinary community, we believe that we will be able to formulate requirements and provide support in developing improvements for unifying frameworks. The required skill set is quite rare: we need software developers who can create reliable, high-performance code both for server-side distributed training on massive amounts of data and for deployment of trained models in embedded distributed systems. Heterogeneous processor architectures are predominant in cyber-physical systems. Either these software developers should be data scientists who are proficient in tensor computing and very good at communicating with domain experts, or we need tooling that lets data scientists and domain experts collaboratively model data for cyber-physical systems. We will detail these discussions in the Manifesto. We believe that it is feasible to create tooling that automates the generation of reliable, secure code that accounts for the adaptive logic of devices interacting with their dynamic physical environment, and that also provides direct feedback between the data scientist, the domain or control expert, and the adaptive control device.
The Manifesto, which will be published at http://www.dagstuhl.de/16152, will include a roadmap of how we, as a newly formed multidisciplinary community, want to start with a knowledge hub on tensors and iterate through data grand challenges from IoT pilots and results dissemination toward what may one day become a collaborative modeling hub for learning cyber-physical systems.
- Brett W. Bader, Tamara G. Kolda, et al. MATLAB Tensor Toolbox version 2.6. http://www.sandia.gov/~tgkolda/TensorToolbox/index-2.6.html
- Statista. Internet of Things (IoT): number of connected devices worldwide from 2012 to 2020 (in billions). http://www.statista.com/statistics/471264/iot-number-of-connected-devices-worldwide/
- Ivan Oseledets. TT (Tensor Train) Toolbox version 2.2.2. https://github.com/oseledets/TT-Toolbox
- Nico Vervliet, Otto Debals, Laurent Sorber, Marc Van Barel, and Lieven De Lathauwer. Tensorlab user guide, release 3.0. http://www.tensorlab.net/userguide3.pdf
- Ivan Oseledets. Python implementation of the TT-Toolbox. https://github.com/oseledets/ttpy
- Maximilian Nickel. scikit-tensor: Python library for multilinear algebra and tensor factorizations. https://github.com/mnick/scikit-tensor
- Google. TensorFlow: An open source software library for numerical computation using data flow graphs. https://www.tensorflow.org/
- Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, Manjunath Kudlur, Josh Levenberg, Rajat Monga, Sherry Moore, Derek Gordon Murray, Benoit Steiner, Paul A. Tucker, Vijay Vasudevan, Pete Warden, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. TensorFlow: A system for large-scale machine learning. CoRR, abs/1605.08695, 2016
- Apache Spark. Apache Spark: A fast and general engine for large-scale data processing. http://spark.apache.org/
- Evrim Acar (University of Copenhagen, DK)
- Kareem Aggour (General Electric - Niskayuna, US)
- Animashree Anandkumar (University of California - Irvine, US)
- Rasmus Bro (University of Copenhagen, DK)
- Ali Taylan Cemgil (Bogaziçi University - Istanbul, TR)
- Edward Curry (National University of Ireland - Galway, IE)
- Lieven De Lathauwer (KU Leuven, BE)
- Hans Hagen (TU Kaiserslautern, DE)
- Souleiman Hasan (National University of Ireland - Galway, IE)
- Denis Krompaß (Siemens AG - München, DE)
- Gerwald Lichtenberg (HAW - Hamburg, DE)
- Benoit Meister (Reservoir Labs, Inc. - New York, US)
- Morten Mørup (Technical University of Denmark - Lyngby, DK)
- Lenore Mullin (University of Albany - SUNY, US)
- Axel-Cyrille Ngonga Ngomo (Universität Leipzig, DE)
- Ivan Oseledets (Skoltech - Skolkovo, RU)
- Renato Pajarola (Universität Zürich, CH)
- Evangelos Papalexakis (Carnegie Mellon University, US)
- Christine Preisach (SAP SE - Walldorf, DE)
- Achim Rettinger (KIT - Karlsruher Institut für Technologie, DE)
- Sükran Sebnem Rusitschka (Siemens AG - München, DE)
- Volker Tresp (Siemens AG - München, DE)
- Bülent Yener (Rensselaer Polytechnic Institute - Troy, US)
- data bases / information retrieval
- data structures / algorithms / complexity
- Tensor Methods
- Multi-way Data Analysis
- Multi-linear Algebra
- Tensor Software
- Distributed & Parallel Computing
- Big Data Computing & Analytics
- Cyber-physical Systems
- Intelligent Autonomous Systems
- Applications in Smart Grid
- Smart City