- Siegfried Nijssen (KU Leuven, BE)
- Susanne Bach-Bernhard (for administrative matters)
Constraint programming and constrained optimization (CPO) as well as machine learning and data mining (MLDM) are well-established research fields within computer science. They have contributed techniques that are routinely applied in real-life scientific and industrial applications.
In recent years, awareness has grown that MLDM and CPO are closely related and need to be studied in relation to each other. An important driving force here is the emergence of large amounts of data in industry and science. Both MLDM and CPO face the challenge of how to exploit data maximally to improve production processes, direct customer behavior, and gain scientific understanding. Only a well-balanced combination of data analysis and constraint optimization can be expected to succeed in this.
Data offers opportunities for constraint optimization in several ways. Currently, practitioners of constraint programming have to formulate explicitly the constraints that underlie their application. Data may help them make modeling decisions and build better models. This raises the question of whether it is possible to learn constraints, optimization criteria, and their formulations (semi-)automatically from data and experience.
At the same time, awareness has also grown that constraints and optimization are essential in mining and learning. The MLDM community has been using constraints and optimization to formalize mining and learning problems. Examples include the specification of desirable properties of the patterns to be mined, or of the clusters to be learned, in constraint-based mining and learning.
Both the MLDM and the CPO community are now faced with the challenge of creating optimization technologies that apply to a wider range of tasks, while also taking into account large amounts of data.
To address this challenge, the CPO community (and the artificial intelligence community in general) could contribute solvers for a broad range of constraint satisfaction and optimization tasks. Machine learning and data mining could benefit from these developments, as the goals of CPO and of constraint-based mining and learning overlap: the difference is that CPO targets any type of constraint satisfaction and optimization problem, whereas MLDM targets particular types of such problems.
The MLDM community, on the other hand, could contribute its experience in handling large amounts of data and in making sense of data. Insights into how to model data, using either probabilistic models or rule-based systems, and into how today's effective search algorithms work, could be useful to the CPO community as it aims to incorporate such models into optimization tasks.
This Dagstuhl seminar will bring the MLDM and CPO communities together to study these challenges. It will investigate, on the one hand, how standard CPO techniques can be used in MLDM and, on the other hand, how MLDM can contribute to CPO.
In 2011, a successful Dagstuhl seminar ("Constraint Programming meets Machine Learning and Data Mining") already brought these two communities together. It gathered key researchers in the field and raised awareness of the potential for integrating the two areas. This seminar aims to consolidate these interests and to investigate this potential further. It focuses on two new dimensions. First, whereas the 2011 seminar focused on constraint satisfaction, this follow-on seminar will focus more strongly on constrained optimization. Second, it focuses on data. How can we use data effectively in CPO? How can we integrate data into CPO when CPO is used for data mining and machine learning? This seminar aims to further our understanding of how to integrate data, constraints, and optimization.
- Learning constraints from data
- Optimizing solvers based on data
- Using data in CPO solvers for data mining
- Using data in CPO solvers for machine learning
- Integrating data, solvers, mining and learning
Constraint programming and optimization (CPO) have recently received considerable attention from the fields of machine learning and data mining (MLDM). On the one hand, the hypotheses and patterns that one seeks to discover in MLDM can be specified in terms of constraints (e.g. labels in supervised learning, preferences in learning to rank, must-link and cannot-link constraints in unsupervised learning, and coverage and lift in data mining). On the other hand, powerful constraint programming solvers have been developed. If MLDM users express their requirements in terms of constraints, they can delegate the MLDM process to such highly efficient solvers.
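To make the idea of stating MLDM requirements as constraints more concrete, the following minimal Python sketch checks two of the constraint types mentioned above: pairwise must-link/cannot-link constraints on a clustering, and the coverage of an itemset pattern (the number of transactions containing it), which a minimum-coverage constraint would bound from below. The function names and data layout are illustrative assumptions; a constraint solver would use such constraints to prune its search rather than merely check a finished solution.

```python
from typing import Sequence

# Hypothetical representation: a pair of example indices.
Pair = tuple[int, int]


def violated_constraints(
    labels: Sequence[int],
    must_link: list[Pair],
    cannot_link: list[Pair],
) -> list[tuple[str, Pair]]:
    """Return every pairwise constraint that a clustering violates.

    labels[i] is the cluster assigned to example i.
    A must-link pair (i, j) requires labels[i] == labels[j];
    a cannot-link pair (i, j) requires labels[i] != labels[j].
    """
    violations = []
    for i, j in must_link:
        if labels[i] != labels[j]:
            violations.append(("must-link", (i, j)))
    for i, j in cannot_link:
        if labels[i] == labels[j]:
            violations.append(("cannot-link", (i, j)))
    return violations


def coverage(pattern: frozenset[str], transactions: list[set[str]]) -> int:
    """Number of transactions that contain every item of the pattern."""
    return sum(1 for t in transactions if pattern <= t)
```

For example, the clustering `[0, 0, 1, 1]` violates the must-link pair `(1, 2)` and the cannot-link pair `(2, 3)`; a pattern satisfies a minimum-coverage constraint `k` when `coverage(pattern, transactions) >= k`.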
Conversely, CPO can benefit from integrating learning and mining functionalities in a number of ways. For example, formulating a real-world problem in terms of constraints requires significant expertise in the problem domain. Also, selecting the most appropriate constraints, in terms of constraint solving efficiency, requires considerable expertise in the CPO domain. In other words, experience plays a major role in successfully applying CPO technology.
In addition, CPO and MLDM share a common challenge in tuning their respective methods, namely determining the best parameters to choose for an algorithm depending on the task at hand. A typical performance metric in machine learning is the predictive accuracy of a hypothesis, while in CPO it might be search cost or solution quality.
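As a minimal illustration of this tuning problem, the Python sketch below performs exhaustive grid search: it evaluates every combination in a small parameter grid on a set of training instances and keeps the combination with the lowest mean cost. The `run` callback, the parameter names, and the toy cost are hypothetical assumptions; dedicated configurators such as SMAC or ParamILS search much larger configuration spaces far more efficiently than this brute-force loop.

```python
from itertools import product
from statistics import mean
from typing import Any, Callable, Mapping, Optional, Sequence

Config = dict[str, Any]


def grid_search_configure(
    run: Callable[[Config, Any], float],
    param_grid: Mapping[str, Sequence[Any]],
    instances: Sequence[Any],
) -> tuple[Optional[Config], float]:
    """Return the configuration with the lowest mean cost over the instances.

    run(config, instance) returns a cost to minimise: for a solver this
    could be search cost, for a learner it could be 1 - accuracy.
    """
    names = list(param_grid)
    best_config: Optional[Config] = None
    best_cost = float("inf")
    # Enumerate the Cartesian product of all parameter value lists.
    for values in product(*(param_grid[n] for n in names)):
        config = dict(zip(names, values))
        cost = mean(run(config, inst) for inst in instances)
        if cost < best_cost:
            best_config, best_cost = config, cost
    return best_config, best_cost


if __name__ == "__main__":
    # Toy, hypothetical cost: each "instance" is an ideal noise level,
    # and every extra restart adds a fixed overhead.
    def toy_run(config: Config, instance: float) -> float:
        return abs(config["noise"] - instance) + 0.1 * config["restarts"]

    grid = {"noise": [0.1, 0.3, 0.5], "restarts": [1, 2]}
    print(grid_search_configure(toy_run, grid, [0.2, 0.4]))
```

Because the cost is averaged over instances, the selected configuration depends on the instance set, which mirrors the point that the best parameters depend on the task at hand.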
This seminar built upon the 2011 seminar "Constraint Programming meets Machine Learning and Data Mining" and the 2014 seminar on preference learning. Its goal was to identify the key challenges and opportunities at the crossroads of CPO and MLDM. The interests of the participants included the following:
- Problem formulation and modelling: constraint-based modelling; preference formalisms; loss functions in ML; modelling and exploiting background knowledge; structured properties (e.g. preserving spatio-temporal structures).
- Improvement of algorithms / platforms in the areas of algorithm selection, algorithm configuration, and/or algorithm scheduling, particularly with respect to parallel execution.
- Specification of, and reasoning about, goals and optimization criteria: modelling preferences and integrating human expertise (exploiting the "human in the loop") to converge on high-quality outcomes.
- Additional functionalities such as the use of visualization and explanation.
- Algorithmic scalability.
- Approximate reasoning, reasoning under uncertainty, and incorporating probability.
The seminar was organized into seven sessions: frameworks and languages; algorithm configuration; constraints in pattern mining; learning constraints; machine learning with constraints; applications; and demonstrations. The demonstrations presented at the seminar were by:
- Guido Tack - MiniZinc (see http://minizinc.org);
- Joaquin Vanschoren - OpenML (see http://openml.org);
- Tias Guns - MiningZinc (see http://dtai.cs.kuleuven.be/CP4IM/miningzinc);
- Bruno Crémilleux - software for the calculation of Sky Pattern Cubes;
- Marc Denecker - IDP (see http://dtai.cs.kuleuven.be/krr/software/idp);
- Holger Hoos - algorithm selection and portfolio software;
- Luc De Raedt - ProbLog (see http://dtai.cs.kuleuven.be/problog/).
The seminar also had five working groups on:
- Declarative Languages for Machine Learning and Data Mining;
- Learning and Optimization with the Human in the Loop;
- Meta-Algorithmic Techniques;
- Big Data;
- Towards Killer Applications.
- Hendrik Blockeel (KU Leuven, BE)
- Jean-François Boulicaut (INSA - Lyon, FR)
- Ken Brown (University College Cork, IE)
- Bruno Crémilleux (Caen University, FR)
- James Cussens (University of York, GB)
- Krzysztof Czarnecki (University of Waterloo, CA)
- Thi-Bich-Hanh Dao (University of Orleans, FR)
- Ian Davidson (University of California - Davis, US)
- Luc De Raedt (KU Leuven, BE)
- Marc Denecker (KU Leuven, BE)
- Yves Deville (University of Louvain, BE)
- Alan Frisch (University of York, GB)
- Randy Goebel (University of Alberta, CA)
- Valerio Grossi (University of Pisa, IT)
- Tias Guns (KU Leuven, BE)
- Holger H. Hoos (University of British Columbia - Vancouver, CA)
- Frank Hutter (Universität Freiburg, DE)
- Kristian Kersting (TU Dortmund, DE)
- Lars Kotthoff (University College Cork, IE)
- Pauli Miettinen (MPI für Informatik - Saarbrücken, DE)
- Mirco Nanni (ISTI-CNR - Pisa, IT)
- Benjamin Negrevergne (KU Leuven, BE)
- Siegfried Nijssen (KU Leuven, BE)
- Barry O'Sullivan (University College Cork, IE)
- Andrea Passerini (University of Trento, IT)
- Francesca Rossi (University of Padova, IT)
- Lakhdar Sais (CNRS - Lens, FR)
- Vijay A. Saraswat (IBM TJ Watson Research Center - Yorktown Heights, US)
- Michele Sebag (University of Paris South XI, FR)
- Arno Siebes (Utrecht University, NL)
- Guido Tack (Monash University - Caulfield, AU)
- Yuzuru Tanaka (Hokkaido University, JP)
- Joaquin Vanschoren (TU Eindhoven, NL)
- Christel Vrain (University of Orleans, FR)
- Toby Walsh (NICTA - Sydney, AU)
- Dagstuhl Seminar 11201: Constraint Programming meets Machine Learning and Data Mining (2011-05-15 - 2011-05-20)
- artificial intelligence / robotics
- data structures / algorithms / complexity
- optimization / scheduling
- constrained optimization
- constraint programming
- machine learning
- data mining
- big data