TOP
Search the Dagstuhl Website
Looking for information on the websites of the individual seminars? - Then please:
Not found what you are looking for? - Some of our services have separate websites, each with its own search option. Please check the following list:
Schloss Dagstuhl - LZI - Logo
Schloss Dagstuhl Services
Seminars
Within this website:
External resources:
  • DOOR (for registering your stay at Dagstuhl)
  • DOSA (for proposing future Dagstuhl Seminars or Dagstuhl Perspectives Workshops)
Publishing
Within this website:
External resources:
dblp
Within this website:
External resources:
  • the dblp Computer Science Bibliography


Dagstuhl Seminar 24032

Representation, Provenance, and Explanations in Database Theory and Logic

( Jan 14 – Jan 19, 2024 )

(Click in the middle of the image to enlarge)

Permalink
Please use the following short url to reference this page: https://www.dagstuhl.de/24032

Organizers

Contact

Dagstuhl Reports

As part of the mandatory documentation, participants are asked to submit their talk abstracts, working group results, etc. for publication in our series Dagstuhl Reports via the Dagstuhl Reports Submission System.

  • Upload (Use personal credentials as created in DOOR to log in)

Dagstuhl Seminar Wiki

Shared Documents

Schedule

Motivation

The goal of database theory is to formalize the theoretical underpinnings of databases and then analyze them with mathematical tools. One central question in the area is query evaluation, that is the computational problem of computing the answer to a query in a database. Of particular importance in this area is understanding the complexity of query evaluation, i.e., which resources, in particular, how much time and space, are necessary and sufficient to compute the answers to a given query. This Dagstuhl Seminar will focus on three key aspects of query evaluation: representation, provenance, and explanations. One common theme of these three directions is the use of circuits, which have a long history in complexity theory. In recent years, the theory of query processing has witnessed the use of different types of circuits, sometimes over Boolean inputs and operations, but also over more general structures, which often can be formalized as different semirings. We hope to see use of circuits in different representations of inputs for expressing and computing provenance, as well as for computing explanations using scores such as Shapley values. We hope to strengthen our understandings of these topics within database theory and logic. The topics to be discussed in the seminar are the following:

(1)Representation: One of the most important basics of algorithms is the use of good data-structures. There is a general tradeoff between expressivity, compactness, and efficient usability. Finding the right compromise in this tradeoff is often crucial for having efficient algorithms. These general considerations also apply to query evaluation: since in certain situations query results can be very large when materialized extensionally, and therefore, for many efficient query answering algorithms it is important to represent them in a compact fashion. For example, this is often the key underlying elements of enumeration algorithms, sampling, access to query results under access patterns, or algorithms providing so-called direct access to query results.

(2) Provenance: It is known that the query semantics for First Order Logic admits an interpretation in an arbitrary semiring. The most specialized case of Boolean semirings denote how an output tuple has been obtained from the inputs with joint usage (joins – translate to conjunctions "∧"), and alternative usages (projections or unions – translate to disjunctions "∨"). Such semirings can be used to understand compactly how outputs are generated from inputs, and have applications in query evaluation in uncertain data and in deletion propagation – to understand how the outputs change if one or more inputs are deleted, without re-computing the query. There are more advanced semirings like tropical semirings that can capture minimum cost in network flows. Compact representations of provenance semirings for recursive queries using circuits have also been studied, although the complexity of recursive queries under semiring semantics is open (despite its major importance in modern systems for machine learning (ML) where tensor algebra is extended with recursion). Compact and efficient knowledge compilation of provenance circuits into ordered and free binary decision diagrams, and more generally as decomposable deterministic negation normal forms (OBDD, FBDD, DNNF), are also important questions in database theory with applications in probabilistic databases.

(3) Explanations: As of late, explanations based on the widely known Shapley values from co-operative game theory have been used in database theory to measure the relevance of a certain database fact to a query answer, and to measure the relevance of inputs to the outcome of an ML classifier. Since the naive computation of Shapley values is intractable, as it includes a summation over exponentially many subsets, one of the main themes behind this investigation has been the identification of practically relevant classes of database queries for which such explanations can be computed in polynomial time. As it has been recently noted, all such positive results can be seen as particular instances of the fact that for a certain well-behaved class of circuits, namely deterministic and decomposable circuits, Shapley-values can be computed efficiently. The study of the complexity of computing Shapley values on different classes of circuits is an active area of research in AI, and we strongly believe that establishing bridges between this work and the one carried out in database theory can be strongly beneficial for both communities. There are other interesting notions of explanations for databases, ML, and aspects of responsible data science like fairness, which we will also explore in this Dagstuhl Seminar.

Copyright Pablo Barcelo, Pierre Bourhis, Stefan Mengel, and Sudeepa Roy

Participants

Classification
  • Data Structures and Algorithms
  • Databases
  • Logic in Computer Science

Keywords
  • Provenance
  • Factorized Databases
  • Shapley values
  • Database Theory
  • Computational Complexity