https://www.dagstuhl.de/19211

### May 19 – 24 , 2019, Dagstuhl Seminar 19211

# Enumeration in Data Management

## Organizers

Endre Boros (Rutgers University – Piscataway, US)

Benny Kimelfeld (Technion – Haifa, IL)

Reinhard Pichler (TU Wien, AT)

Nicole Schweikardt (HU Berlin, DE)

## For support, please contact

Annette Beyer for administrative matters

Andreas Dolzmann for scientific matters

## Motivation

The general task of enumeration is to evaluate a function that maps a given input into a (possibly large) collection of answers. Traditional examples include the maximal independent sets of a graph (or a hypergraph), the simple paths from a source to a destination in a network, the satisfying assignments of a logical formula, and the tuples in the result of a database query. The ability to enumerate with short delays, particularly "polynomial delay" and "incremental polynomial delay," has taken a significant focus by both theory and system-oriented researchers throughout the history of database research. For example, an algorithm for enumerating the minimal keys of a relation with incremental polynomial delay was already presented in the 1970s. Moreover, in recent years, various computational problems at the core of data management have been revisited through the lenses of enumeration algorithms and complexity. One class of problems includes data-mining challenges such as frequent patterns in graphs and communities in social networks. Another class of problems is driven by query optimization and includes the enumeration of tree decompositions and provenance expressions. Additional examples can be found in Information Retrieval from structured data, Information Extraction from text, data integration, and database cleaning.

While different in nature, the analyses of these problems share a few algorithmic techniques for devising solvers, and proof techniques for establishing complexity bounds. Consequently, a recent effort has been initiated to define suitable complexity classes for enumeration problems. This theory requires revisiting concepts such as computational hardness, reductions between computational problems, and complexity assumptions for lower bounds. As another direction, the fields of Logic in Computer Science and Database Theory have seen a number of contributions that deal with a stronger notion of efficiency of enumeration of query results. In this scenario, the objective is as follows: given a logical structure (i.e., a database) and a logical formula (i.e., a query), after a short preprocessing phase, the query results shall be generated one by one, without repetition, with guarantees on the maximum delay time between the output of two tuples. In this vein, the best that one can hope for is constant delay (i.e., the delay may depend on the size of the query but not on that of the database) and linear preprocessing time. Established results include the complete classification of classes of conjunctive queries into the queries that admit a constant delay following a linear preprocessing time, and the ones that do not. The lower bounds are based on algorithmic assumptions on limitation in Polynomial Time, such as the complexity of multiplying Boolean matrices. The upper bounds (constant delay) are based on the construction of auxiliary data structures (indexes). The natural question of whether such structures can be maintained efficiently (sub-linearly) has led to the pursue and significant progress of the "dynamic complexity" of enumeration. Constant-delay enumeration has also been adopted as a central concept in "factorized databases" that have gained recent attention.

This week-long Dagstuhl Seminar targets two main goals. The first goal is to better understand the recent developments, lay out the most important open problems, and join forces towards solutions thereof. To this end, we plan to schedule invited tutorials, and ask tutorial speakers to prepare lists of open problems. Moreover, we plan to select a few of these open problems and run small working groups to focus on each. The second goal is to establish and disseminate a toolkit for data-centric enumeration problems. We will seek tools of various types. One type is algorithmic techniques for enumeration with different guarantees, such as methods for hereditary graph properties, schemes for ranked enumeration, and reductions to query answering with constant delay. Another type of tools consists of proof techniques for lower bounds, including indicator problems such as the hypergraph transversals, Boolean matrix multiplication, and Boolean online matrix-vector multiplication.

**License**

Creative Commons BY 3.0 DE

Endre Boros, Benny Kimelfeld, Reinhard Pichler, and Nicole Schweikardt

## Classification

- Data Bases / Information Retrieval
- Data Structures / Algorithms / Complexity
- Verification / Logic

## Keywords

- Enumeration
- Polynomial delay
- Constant delay
- Dynamic complexity
- Databases
- Query evaluation