http://www.dagstuhl.de/18401

September 30 – October 5, 2018, Dagstuhl Seminar 18401

Automating Data Science

Organizers

Tijl De Bie (Ghent University, BE)
Luc De Raedt (KU Leuven, BE)
Holger H. Hoos (LIACS – Leiden, NL)
Padhraic Smyth (University of California – Irvine, US)

For information about this Dagstuhl Seminar, please contact

Annette Beyer for administrative matters

Andreas Dolzmann for scientific matters

Motivation

Data science is concerned with the extraction of knowledge and insight, and ultimately societal or economic value, from data. It complements traditional statistics in that its object is data as it presents itself in the wild (often complex and heterogeneous, noisy, loosely structured, biased, etc.), rather than well-structured data sampled in carefully designed studies. It also has a strong computer science focus, and is related to popular areas such as big data, machine learning, data mining and knowledge discovery.

Data science is becoming increasingly important with the abundance of big data, while the number of skilled data scientists is lagging. This has raised the question of whether data science can be automated, a question that arises in several contexts. First, from an artificial intelligence perspective, it is interesting to investigate whether (data) science (or portions of it) can be automated, as it is an activity currently requiring high levels of human expertise. Second, the field of machine learning has a long-standing interest in applying machine learning at the meta-level in order to obtain better machine learning algorithms, which has yielded recent successes in automated parameter tuning, algorithm configuration and algorithm selection. Third, there is an interest in automating not only the model-building process itself (cf. the Automated Statistician) but also the preprocessing steps (data wrangling).

This Dagstuhl seminar will bring together researchers from all areas concerned with data science in order to study whether, to what extent, and how data science can be automated. It will focus on the following Data Science topics:

  • Data Wrangling
  • Predictive Modeling
  • Exploratory Data Analysis
  • Inductive querying
  • Probabilistic Programming
  • Visual Analytics

and will aim at answering the following questions:

  • How can we automatically tune the parameters or configure algorithms? How can we apply this to machine learning and data science algorithms? This is related to expert / rule-based systems, information criteria, statistical learning theory, learning to learn, meta-learning, etc.
  • How can we assist users in their exploratory data mining tasks? Can we automate these tasks? What type of interactivity is needed? How can we obtain models of the user and of interestingness?
  • How can we support the data-wrangling process? How can inductive programming techniques help? Can it be realized fully automatically? What are the limitations and opportunities?
  • How can one automate data-driven story-telling? How can we explain learned models to the user? To what extent can natural language be used?
  • Can we (partially) automate Visual Analytics? Can we automatically visualize what is of interest to the user?
  • What is the trade-off between automation and interaction? To what extent is automation (un)desirable?
  • How can probabilistic programming and inductive querying techniques be used to facilitate data science?
  • How can automation be married with the increasing tendency for personalization? With the impact on privacy and society of data science, are there any additional ethical issues to be taken into account?
  • Data science for the expert versus for the layperson: do the optimal trade-offs differ?

License
  Creative Commons BY 3.0 DE
  Tijl De Bie, Luc De Raedt, Holger H. Hoos, and Padhraic Smyth

Classification

  • Artificial Intelligence / Robotics
  • Data Bases / Information Retrieval
  • Programming Languages / Compiler

Keywords

  • Data science
  • Artificial intelligence
  • Automated machine learning
  • Automated scientific discovery
  • Inductive programming

Book Exhibition

Books by the participants

Book exhibition on the ground floor of the library

(only during the week of the seminar).

Documentation

All Dagstuhl Seminars and Dagstuhl Perspectives Workshops are documented in the Dagstuhl Reports series. Together with the seminar's collector, the organizers compile a report that summarizes the authors' contributions and adds an overall summary.

 

Download overview flyer (PDF).

Publications

It also remains possible to publish a comprehensive collection of peer-reviewed papers in the Dagstuhl Follow-Ups series.

Dagstuhl's Impact

Please let us know if a publication results from your seminar. Such publications are listed separately in the Dagstuhl's Impact section and displayed on the ground floor of the library.