http://www.dagstuhl.de/16152

April 10 – 13, 2016, Dagstuhl Perspectives Workshop 16152

Tensor Computing for Internet of Things

Organizers

Evrim Acar (University of Copenhagen, DK)
Animashree Anandkumar (University of California – Irvine, US)
Lenore Mullin (University of Albany – SUNY, US)
Volker Tresp (Siemens AG – München, DE)

Coordinators

Sebnem Rusitschka (Siemens AG – München, DE)

Documents

Dagstuhl Report, Volume 6, Issue 4

Summary

In April 2016, Dagstuhl hosted a Perspectives Workshop on Tensor Computing for the Internet of Things. The year before, industrial researchers had formulated the challenges of gaining insights from multi-dimensional sensory data coming from large-scale connected energy networks, transportation networks, or manufacturing systems. The sheer amount of streaming multi-aspect data prompted us to look for the most suitable techniques from the machine learning community: multi-way data analysis. Hence, we organized a three-day interactive workshop around two questions, bringing two formerly distinct communities together: (i) How can we assure performance and reliability given the increasing complexity and data volume of an always-on connected world? (ii) Can we exploit the power of tensor algebra to solve the high-dimensional, large-scale machine learning problems that such a world poses?

The workshop focused on the Internet of Things (IoT), i.e., devices that have the capability to sense, communicate, and, beyond that, control their environments. These devices are increasingly becoming part of complex, dynamic, and distributed systems such as electricity or mobility networks, and hence of our daily lives. Various sensors enable these devices to capture multiple aspects of their surroundings in real time. For example, phasor measurement units capture transient dynamics and evolving disturbances in the power system at high resolution, in a synchronized manner, and in real time. Another example is traffic networks, where a car today can deliver about 250 GB of data per hour from connected electronics such as weather sensors within the car, parking cameras, and radars. Experts estimate that the IoT will consist of almost 50 billion objects by 2020 [2]. This will trigger the era of exascale computing, which requires managing the heat and energy of computing in concert with increasingly complex processor, network, and memory hierarchies of sensors and embedded computers in distributed systems. The format in which the raw data from such systems is represented is crucial for extracting relevant information. Equally crucial for extracting information efficiently in the IoT is the choice of operations, which should guarantee various attributes of resource use and management. Tensors can be viewed as data structures or as multilinear operators.
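
As a minimal sketch of these two views (not taken from the workshop material), the following plain-Python example stores a small three-way array of made-up sensor readings and then evaluates the same array as a trilinear form. The names `readings` and `trilinear`, the sensor × time step × channel layout, and all numbers are illustrative assumptions.

```python
from itertools import product


def shape(t):
    """Return the (I, J, K) shape of a three-way tensor stored as nested lists."""
    return len(t), len(t[0]), len(t[0][0])


def trilinear(t, u, v, w):
    """Evaluate the trilinear form sum_{i,j,k} t[i][j][k] * u[i] * v[j] * w[k]."""
    n_i, n_j, n_k = shape(t)
    return sum(
        t[i][j][k] * u[i] * v[j] * w[k]
        for i, j, k in product(range(n_i), range(n_j), range(n_k))
    )


# Data-structure view: 2 sensors x 3 time steps x 2 channels (hypothetical values).
readings = [
    [[1, 2], [3, 4], [5, 6]],
    [[7, 8], [9, 10], [11, 12]],
]
value = readings[1][2][0]  # sensor 1, time step 2, channel 0

# Operator view: the same array maps three vectors to a number,
# and the map is linear in each argument separately.
u, u2 = [1, 0], [0, 1]
v = [1, 1, 1]
w = [1, -1]
combined = [2 * a + b for a, b in zip(u, u2)]  # 2*u + u2
lhs = trilinear(readings, combined, v, w)
rhs = 2 * trilinear(readings, u, v, w) + trilinear(readings, u2, v, w)
```

Here `lhs` and `rhs` agree, which is the linearity-in-the-first-argument property; the same check can be repeated for `v` and `w`.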

The goal of the workshop was to explore tensor representations and computing as the basis for machine learning solutions for the IoT. Tensors are algebraic objects that describe linear and multilinear relationships and can be represented as multidimensional arrays. They often provide a natural and compact representation for multidimensional data. In recent years, the tensor and machine learning communities (mainly active in data-rich domains such as neuroscience, social network analysis, chemometrics, and knowledge graphs) have provided a solid research infrastructure, ranging from efficient routines for tensor calculus, to methods of multi-way data analysis, i.e., tensor decompositions, to methods for consistent and efficient estimation of the parameters of probabilistic models.
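
To make the idea of a compact representation concrete, one widely used tensor decomposition is the CP (CANDECOMP/PARAFAC) model, which writes a three-way array as a sum of R rank-one terms. The sketch below is an illustration under stated assumptions, not a method described at the workshop: the factor matrices `a`, `b`, `c`, the helper names, and the sizes are all hypothetical, and the code only reconstructs a tensor from given factors rather than fitting them to data.

```python
def cp_reconstruct(a, b, c):
    """Return X with X[i][j][k] = sum_r a[i][r] * b[j][r] * c[k][r]."""
    rank = len(a[0])
    return [
        [
            [sum(a[i][r] * b[j][r] * c[k][r] for r in range(rank))
             for k in range(len(c))]
            for j in range(len(b))
        ]
        for i in range(len(a))
    ]


def dense_size(i, j, k):
    """Number of entries in a dense I x J x K array."""
    return i * j * k


def cp_size(i, j, k, rank):
    """Number of parameters in a rank-R CP model of an I x J x K array."""
    return rank * (i + j + k)


# Hypothetical rank-2 factors for a 2 x 3 x 2 tensor (one row per index, one column per component).
a = [[1, 0], [2, 1]]
b = [[1, 1], [0, 2], [3, 0]]
c = [[1, 2], [1, 0]]
x = cp_reconstruct(a, b, c)
```

For a 100 × 100 × 100 array, `dense_size` gives 1,000,000 entries while `cp_size` with rank 3 gives 900 parameters. Whether such a low rank describes real IoT data well is an empirical question that this sketch does not address.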

Some tensor-based models have the intriguing characteristic that, if there is a good match between the model and the underlying structure in the data, the models are much more interpretable than alternative techniques. This interpretability is essential for machine learning techniques to gain acceptance in the rather engineering-heavy fields of automation and control of cyber-physical systems. Many of these systems show intrinsically multilinear behavior, which is appropriately modeled by tensor methods, and tools for controller design can use these models. The calibration of the sensors delivering data and the higher resolution of measured data will have an additional impact on the interpretability of models.

Various presentations on tensor methods by established researchers from different application domains underscored that tensor methods are reaching a tipping point in maturity. However, knowledge of the usage characteristics of tensor models is scattered. Discussions of the currently independent perspectives on the usage of tensor methods showed potential for convergence, which we will detail in the Dagstuhl Manifesto. During our discussions of the presentations by the IoT industrial researchers, it quickly became clear that we would need benchmark challenges and benchmark data for cyber-physical systems in order to replicate the successes of machine learning in, for example, neuroscience, image processing, or chemometrics.

The tensor computing community will equally benefit from the new types of data, requirements, and characteristics of the IoT, which can lead to techniques that increase the success rates of previous applications. This was the case with social network data analysis, whose challenges led to better tensor models and algorithms that can analyze data sets with missing entries and are now used in many other fields. Additionally, as opposed to standardized machine learning techniques, tensor computing currently lacks a common language and the homogeneity needed to flexibly exchange models. Hence, a hub platform bringing data and domain knowledge of cyber-physical systems together with a variety of practitioners of tensor computing would increase the coherence of terms and promote best practices in data acquisition and structuring methods as well as model benchmarking, cataloging, and exchange of methods.

Furthermore, industrial researchers from the IoT, automation, and control domains highlighted their view that tensor computing methods are still inaccessible to the majority of industrial practitioners, even though there has been considerable progress in developing tools for tensor computing. Matlab extensions that enable tensor analysis are quite mature [1] [3] [4], and Matlab is widely used by control and automation practitioners. The Python ecosystem for machine learning practitioners is very quickly adopting extensions that enable tensor operations [5] [6]. However, both are mainly for prototyping and ultimately do not fulfill the need for a unified framework for industrial-grade development and deployment of models in highly distributed cyber-physical systems. Interestingly, just five months prior to our workshop, TensorFlow [7], a numerical computation library that aims to capture structures in multidimensional data and supports both prototyping and production-level algorithms, was open sourced. TensorFlow can run on server clusters as well as on embedded systems such as smartphones [8]. Another framework, unifying batch and streaming data analysis, is Apache Spark [9]. Spark provides seamless scalability of software code to run on multiple machines. Recently there have been deployments of tensor methods on the Spark platform.

As a multidisciplinary community, we believe that we will be able to formulate requirements and provide support in developing improvements for unifying frameworks. The required skill set is quite rare: we need software developers who can create reliable, high-performance code both for server-side distributed training on massive amounts of data and for deployment of trained models in embedded distributed systems. Heterogeneous processor architectures are predominant in cyber-physical systems. Either these software developers should be data scientists proficient in tensor computing and very good at communicating with domain experts, or we need tooling that lets data scientists and domain experts collaboratively model data for cyber-physical systems. We will detail these discussions in the Manifesto. We believe that it is feasible to create tooling that automates the generation of reliable, secure code that accounts for the adaptive logic of devices interacting with their dynamic physical environment, and that also provides direct feedback between the data scientist, the domain or control expert, and the adaptive control device.

The Manifesto, which will be published on http://www.dagstuhl.de/16152, will include a roadmap of how we, as a newly formed multidisciplinary community, want to start with a knowledge hub on tensors and iterate through data grand challenges from IoT pilots and results dissemination toward what may one day become a collaborative modeling hub for learning cyber-physical systems.

References:

  1. Brett W. Bader, Tamara G. Kolda, et al. Matlab Tensor Toolbox, version 2.6. http://www.sandia.gov/~tgkolda/TensorToolbox/index-2.6.html
  2. Statista. Internet of Things (IoT): number of connected devices worldwide from 2012 to 2020 (in billions). http://www.statista.com/statistics/471264/iot-number-of-connected-devices-worldwide/
  3. Ivan Oseledets. TT (tensor train) Toolbox, version 2.2.2. https://github.com/oseledets/TT-Toolbox
  4. Nico Vervliet, Otto Debals, Laurent Sorber, Marc Van Barel, and Lieven De Lathauwer. Tensorlab user guide, release 3.0. http://www.tensorlab.net/userguide3.pdf
  5. Ivan Oseledets. Python implementation of the TT-Toolbox. https://github.com/oseledets/ttpy
  6. Maximilian Nickel. scikit-tensor: Python library for multilinear algebra and tensor factorizations. https://github.com/mnick/scikit-tensor
  7. Google. TensorFlow: An open source software library for numerical computation using data flow graphs. https://www.tensorflow.org/
  8. Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, Manjunath Kudlur, Josh Levenberg, Rajat Monga, Sherry Moore, Derek Gordon Murray, Benoit Steiner, Paul A. Tucker, Vijay Vasudevan, Pete Warden, Martin Wicke, Yuan Yu, and Xiaoqiang Zhang. TensorFlow: A system for large-scale machine learning. CoRR, abs/1605.08695, 2016.
  9. Apache Spark. Apache Spark: A fast and general engine for large-scale data processing. http://spark.apache.org/
License
  Creative Commons BY 3.0 Unported license
  Evrim Acar, Animashree Anandkumar, Lenore Mullin, and Volker Tresp

Classification

  • Data Bases / Information Retrieval
  • Data Structures / Algorithms / Complexity
  • Networks

Keywords

  • Tensor Methods
  • Multi-way Data Analysis
  • Multi-linear Algebra
  • Tensor Software
  • Distributed & Parallel Computing
  • Big Data Computing & Analytics
  • Cyber-physical Systems
  • Intelligent Autonomous Systems
  • Applications in Smart Grid
  • Mobility
  • Smart City
