LIPIcs, Volume 290

27th International Conference on Database Theory (ICDT 2024)



Event

ICDT 2024, March 25-28, 2024, Paestum, Italy

Editors

Graham Cormode
  • University of Warwick, UK
Michael Shekelyan
  • Queen Mary University of London, UK

Publication Details

  • published at: 2024-03-14
  • Publisher: Schloss Dagstuhl – Leibniz-Zentrum für Informatik
  • ISBN: 978-3-95977-312-6
  • DBLP: db/conf/icdt/icdt2024

Documents

Document
Complete Volume
LIPIcs, Volume 290, ICDT 2024, Complete Volume

Authors: Graham Cormode and Michael Shekelyan


Abstract
LIPIcs, Volume 290, ICDT 2024, Complete Volume

Cite as

27th International Conference on Database Theory (ICDT 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 290, pp. 1-484, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


@Proceedings{cormode_et_al:LIPIcs.ICDT.2024,
  title =	{{LIPIcs, Volume 290, ICDT 2024, Complete Volume}},
  booktitle =	{27th International Conference on Database Theory (ICDT 2024)},
  pages =	{1--484},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-312-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{290},
  editor =	{Cormode, Graham and Shekelyan, Michael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2024},
  URN =		{urn:nbn:de:0030-drops-197819},
  doi =		{10.4230/LIPIcs.ICDT.2024},
  annote =	{Keywords: LIPIcs, Volume 290, ICDT 2024, Complete Volume}
}
Document
Front Matter
Front Matter, Table of Contents, Preface, Conference Organization

Authors: Graham Cormode and Michael Shekelyan


Abstract
Front Matter, Table of Contents, Preface, Conference Organization

Cite as

27th International Conference on Database Theory (ICDT 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 290, pp. 0:i-0:xvi, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


@InProceedings{cormode_et_al:LIPIcs.ICDT.2024.0,
  author =	{Cormode, Graham and Shekelyan, Michael},
  title =	{{Front Matter, Table of Contents, Preface, Conference Organization}},
  booktitle =	{27th International Conference on Database Theory (ICDT 2024)},
  pages =	{0:i--0:xvi},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-312-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{290},
  editor =	{Cormode, Graham and Shekelyan, Michael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2024.0},
  URN =		{urn:nbn:de:0030-drops-197828},
  doi =		{10.4230/LIPIcs.ICDT.2024.0},
  annote =	{Keywords: Front Matter, Table of Contents, Preface, Conference Organization}
}
Document
Invited Talk
Natural Language Data Interfaces: A Data Access Odyssey (Invited Talk)

Authors: Georgia Koutrika


Abstract
Back in the 1970s, E. F. Codd worked on a prototype of a natural language question and answer application that would sit on top of a relational database system. Soon, natural language interfaces for databases (NLIDBs) became the holy grail for the database community. Different approaches have been proposed from the database, machine learning and NLP communities. Interest in the topic has had its peaks and valleys. After a long and adventurous journey of almost 50 years, there is a rekindled interest in NLIDBs in recent years, fueled by the need for democratizing data access and by the recent advances in deep learning and natural language processing in particular. There is a surge of works on natural language interfaces for databases using neural translation, and suddenly it becomes hard to keep up with advancements in the field. Are we close to finding the holy grail of data access? What are the lurking challenges that we need to surpass and what research opportunities arise? Finally, what is the role of the database community?

Cite as

Georgia Koutrika. Natural Language Data Interfaces: A Data Access Odyssey (Invited Talk). In 27th International Conference on Database Theory (ICDT 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 290, pp. 1:1-1:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


@InProceedings{koutrika:LIPIcs.ICDT.2024.1,
  author =	{Koutrika, Georgia},
  title =	{{Natural Language Data Interfaces: A Data Access Odyssey}},
  booktitle =	{27th International Conference on Database Theory (ICDT 2024)},
  pages =	{1:1--1:22},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-312-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{290},
  editor =	{Cormode, Graham and Shekelyan, Michael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2024.1},
  URN =		{urn:nbn:de:0030-drops-197832},
  doi =		{10.4230/LIPIcs.ICDT.2024.1},
  annote =	{Keywords: natural language data interfaces, NLIDBs, NL-to-SQL, text-to-SQL, conversational databases}
}
Document
Invited Talk
How Database Theory Helps Teach Relational Queries in Database Education (Invited Talk)

Authors: Sudeepa Roy, Amir Gilad, Yihao Hu, Hanze Meng, Zhengjie Miao, Kristin Stephens-Martinez, and Jun Yang


Abstract
Data analytics skills have become an indispensable part of any education that seeks to prepare its students for the modern workforce. Essential in this skill set is the ability to work with structured relational data. Relational queries are based on logic and may be declarative in nature, posing new challenges to novices and students. Manual teaching resources being limited and enrollment growing rapidly, automated tools that help students debug queries and explain errors are potential game-changers in database education. We present a suite of tools built on the foundations of database theory that has been used by over 1600 students in database classes at Duke University, showcasing a high-impact application of database theory in database education.

Cite as

Sudeepa Roy, Amir Gilad, Yihao Hu, Hanze Meng, Zhengjie Miao, Kristin Stephens-Martinez, and Jun Yang. How Database Theory Helps Teach Relational Queries in Database Education (Invited Talk). In 27th International Conference on Database Theory (ICDT 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 290, pp. 2:1-2:9, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


@InProceedings{roy_et_al:LIPIcs.ICDT.2024.2,
  author =	{Roy, Sudeepa and Gilad, Amir and Hu, Yihao and Meng, Hanze and Miao, Zhengjie and Stephens-Martinez, Kristin and Yang, Jun},
  title =	{{How Database Theory Helps Teach Relational Queries in Database Education}},
  booktitle =	{27th International Conference on Database Theory (ICDT 2024)},
  pages =	{2:1--2:9},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-312-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{290},
  editor =	{Cormode, Graham and Shekelyan, Michael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2024.2},
  URN =		{urn:nbn:de:0030-drops-197841},
  doi =		{10.4230/LIPIcs.ICDT.2024.2},
  annote =	{Keywords: Query Debugging, SQL, Relational Algebra, Relational Calculus, Database Education, Boolean Provenance}
}
Document
Invited Talk
Rule-Based Ontologies: From Semantics to Syntax (Invited Talk)

Authors: Andreas Pieris


Abstract
An ontology specifies an abstract model of a domain of interest via a formal language that is typically based on logic. Tuple-generating dependencies (tgds) and equality-generating dependencies (egds), originally introduced as a unifying framework for database integrity constraints, and later on used in data exchange and integration, are well suited for modeling ontologies that are intended for data-intensive tasks. The reason is that, unlike other popular formalisms such as description logics, tgds and egds can easily handle higher-arity relations that naturally occur in relational databases. In recent years, there has been an extensive study of tgd- and egd-based ontologies and of their applications to several different data-intensive tasks. In those studies, model theory plays a crucial role and it typically proceeds from syntax to semantics. In other words, the syntax of an ontology language is introduced first and then the properties of the mathematical structures that satisfy ontologies of that language are explored. There is, however, a mature and growing body of research in the reverse direction, i.e., from semantics to syntax. Here, the starting point is a collection of model-theoretic properties and the goal is to determine whether or not these properties characterize some ontology language. Such results are welcome as they pinpoint the expressive power of an ontology language in terms of insightful model-theoretic properties. The main aim of this tutorial is to present a comprehensive overview of model-theoretic characterizations of tgd- and egd-based ontology languages that are encountered in database theory and symbolic artificial intelligence.

Cite as

Andreas Pieris. Rule-Based Ontologies: From Semantics to Syntax (Invited Talk). In 27th International Conference on Database Theory (ICDT 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 290, p. 3:1, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


@InProceedings{pieris:LIPIcs.ICDT.2024.3,
  author =	{Pieris, Andreas},
  title =	{{Rule-Based Ontologies: From Semantics to Syntax}},
  booktitle =	{27th International Conference on Database Theory (ICDT 2024)},
  pages =	{3:1--3:1},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-312-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{290},
  editor =	{Cormode, Graham and Shekelyan, Michael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2024.3},
  URN =		{urn:nbn:de:0030-drops-197857},
  doi =		{10.4230/LIPIcs.ICDT.2024.3},
  annote =	{Keywords: ontologies, tuple-generating dependencies, equality-generating dependencies, model theory, model-theoretic characterizations}
}
Document
Direct Access for Answers to Conjunctive Queries with Aggregation

Authors: Idan Eldar, Nofar Carmeli, and Benny Kimelfeld


Abstract
We study the fine-grained complexity of conjunctive queries with grouping and aggregation. For some common aggregate functions (e.g., min, max, count, sum), such a query can be phrased as an ordinary conjunctive query over a database annotated with a suitable commutative semiring. Specifically, we investigate the ability to evaluate such queries by constructing in log-linear time a data structure that provides logarithmic-time direct access to the answers ordered by a given lexicographic order. This task is nontrivial since the number of answers might be larger than log-linear in the size of the input, and so, the data structure needs to provide a compact representation of the space of answers. In the absence of aggregation and annotation, past research provides a sufficient tractability condition on queries and orders. For queries without self-joins, this condition is not just sufficient, but also necessary (under conventional lower-bound assumptions in fine-grained complexity). We show that all past results continue to hold for annotated databases, assuming that the annotation itself is not part of the lexicographic order. On the other hand, we show infeasibility for the case of count-distinct, which does not have any efficient representation as a commutative semiring. We then investigate the ability to include the aggregate and annotation outcome in the lexicographic order. Among the hardness results, standing out as tractable is the case of a semiring with an idempotent addition, such as those of min and max. Notably, this case also captures count-distinct over a logarithmic-size domain.

Cite as

Idan Eldar, Nofar Carmeli, and Benny Kimelfeld. Direct Access for Answers to Conjunctive Queries with Aggregation. In 27th International Conference on Database Theory (ICDT 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 290, pp. 4:1-4:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


@InProceedings{eldar_et_al:LIPIcs.ICDT.2024.4,
  author =	{Eldar, Idan and Carmeli, Nofar and Kimelfeld, Benny},
  title =	{{Direct Access for Answers to Conjunctive Queries with Aggregation}},
  booktitle =	{27th International Conference on Database Theory (ICDT 2024)},
  pages =	{4:1--4:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-312-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{290},
  editor =	{Cormode, Graham and Shekelyan, Michael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2024.4},
  URN =		{urn:nbn:de:0030-drops-197861},
  doi =		{10.4230/LIPIcs.ICDT.2024.4},
  annote =	{Keywords: aggregate queries, conjunctive queries, provenance semirings, commutative semirings, annotated databases, direct access, ranking function, answer orderings, query classification}
}
Document
Communication Cost of Joins over Federated Data

Authors: Tamara Cucumides and Juan Reutter


Abstract
We study the problem of querying different data sources, which we assume are out of our control and are made available by standard web communication protocols. In this scenario, the time spent communicating data often dominates the time spent processing local queries in each server. Thus, our focus is on algorithms that minimize the communication between the query processing server and the federated servers containing data. However, any federated query can always be answered with linear communication, simply by requesting all the data from the federated sources. Further, one can show that certain queries do require this amount of communication. But sending all the data is definitely not a relevant algorithm from a practical point of view. This worst-case analysis is, therefore, not useful for our needs. There is a growing body of work on designing strategies that minimize communication in query federation, but these strategies are commonly based on heuristics, and we currently lack a formal analysis providing guidelines for the design of such strategies. We focus on the communication complexity of federated joins when the problem is parameterized by a measure commonly referred to as the certificate of the instance: a framework that has been used before in the context of set intersection and local query processing. We show how to process any conjunctive query in time given by the certificate of instances. Our algorithm is an adaptation of Minesweeper, one of the algorithms devised for local query processing, to our federated setting. When certificates are of the size of the instance, this amounts to sending the entire database, but our strategy provides drastic reductions in the communication needed for queries and instances with small certificates. We also show matching communication lower bounds for cases where the certificate is smaller than the size of the active domain of the instances.

Cite as

Tamara Cucumides and Juan Reutter. Communication Cost of Joins over Federated Data. In 27th International Conference on Database Theory (ICDT 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 290, pp. 5:1-5:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


@InProceedings{cucumides_et_al:LIPIcs.ICDT.2024.5,
  author =	{Cucumides, Tamara and Reutter, Juan},
  title =	{{Communication Cost of Joins over Federated Data}},
  booktitle =	{27th International Conference on Database Theory (ICDT 2024)},
  pages =	{5:1--5:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-312-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{290},
  editor =	{Cormode, Graham and Shekelyan, Michael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2024.5},
  URN =		{urn:nbn:de:0030-drops-197879},
  doi =		{10.4230/LIPIcs.ICDT.2024.5},
  annote =	{Keywords: databases, database queries, query federation, communication complexity, adaptive algorithms}
}
Document
Range Entropy Queries and Partitioning

Authors: Sanjay Krishnan and Stavros Sintos


Abstract
Data partitioning that maximizes or minimizes Shannon entropy is a crucial subroutine in data compression, columnar storage, and cardinality estimation algorithms. These partition algorithms can be accelerated if we have a data structure to find the entropy in different subsets of data when the algorithm needs to decide what block to construct. While it is generally known how to compute the entropy of a discrete distribution efficiently, we want to efficiently derive the entropy among the data items that lie in a specific area. We solve this problem in a typical setting when we deal with real data, where data items are geometric points and each requested area is a query (hyper)rectangle. More specifically, we consider a set P of n weighted and colored points in ℝ^d. The goal is to construct a low-space data structure, such that given a query (hyper)rectangle R, it computes the entropy based on the colors of the points in P∩ R, in sublinear time. We show a conditional lower bound for this problem proving that we cannot hope for data structures with near-linear space and near-constant query time. Then, we propose exact data structures for d = 1 and d > 1 with o(n^{2d}) space and o(n) query time. We also provide a tuning parameter t that the user can choose to bound the asymptotic space and query time of the new data structures. Next, we propose near-linear-space data structures for returning either an additive or a multiplicative approximation of the entropy. Finally, we show how we can use the new data structures to efficiently partition time series and histograms with respect to entropy.
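
To make the query in this abstract concrete, the following is a minimal brute-force sketch and not the paper's data structure: it scans all points in linear time per query, which is exactly the cost the proposed structures aim to beat. The function name `range_entropy`, the tuple layout of the points, and the assumption that each color's probability is its share of the total weight inside the rectangle are illustrative choices that do not come from the source.

```python
import math
from collections import defaultdict

def range_entropy(points, rect):
    """Shannon entropy (in bits) of the color distribution inside rect.

    points: list of (coords, color, weight), coords being a tuple of floats.
    rect:   list of (lo, hi) pairs, one closed interval per dimension.
    Assumption: a color's probability is its weight share inside rect.
    """
    weight_per_color = defaultdict(float)
    for coords, color, weight in points:
        # Keep only points that fall inside the query (hyper)rectangle.
        if all(lo <= x <= hi for x, (lo, hi) in zip(coords, rect)):
            weight_per_color[color] += weight
    total = sum(weight_per_color.values())
    if total == 0:
        return 0.0  # empty range: entropy taken to be zero by convention
    return -sum((w / total) * math.log2(w / total)
                for w in weight_per_color.values() if w > 0)

# Hypothetical one-dimensional example with two colors.
example_points = [((0.0,), "a", 1.0), ((1.0,), "b", 1.0), ((5.0,), "a", 1.0)]
print(range_entropy(example_points, [(0.0, 2.0)]))
```

A partitioning routine could call such a function on candidate blocks; the point of the abstract is that answering these calls with a precomputed structure is much faster than rescanning the data each time.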

Cite as

Sanjay Krishnan and Stavros Sintos. Range Entropy Queries and Partitioning. In 27th International Conference on Database Theory (ICDT 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 290, pp. 6:1-6:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


@InProceedings{krishnan_et_al:LIPIcs.ICDT.2024.6,
  author =	{Krishnan, Sanjay and Sintos, Stavros},
  title =	{{Range Entropy Queries and Partitioning}},
  booktitle =	{27th International Conference on Database Theory (ICDT 2024)},
  pages =	{6:1--6:21},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-312-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{290},
  editor =	{Cormode, Graham and Shekelyan, Michael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2024.6},
  URN =		{urn:nbn:de:0030-drops-197883},
  doi =		{10.4230/LIPIcs.ICDT.2024.6},
  annote =	{Keywords: Shannon entropy, range query, data structure, data partitioning}
}
Document
Skyline Operators for Document Spanners

Authors: Antoine Amarilli, Benny Kimelfeld, Sébastien Labbé, and Stefan Mengel


Abstract
When extracting a relation of spans (intervals) from a text document, a common practice is to filter out tuples of the relation that are deemed dominated by others. The domination rule is defined as a partial order that varies along different systems and tasks. For example, we may state that a tuple is dominated by tuples that extend it by assigning additional attributes, or assigning larger intervals. The result of filtering the relation would then be the skyline according to this partial order. As this filtering may remove most of the extracted tuples, we study whether we can improve the performance of the extraction by compiling the domination rule into the extractor. To this aim, we introduce the skyline operator for declarative information extraction tasks expressed as document spanners. We show that this operator can be expressed via regular operations when the domination partial order can itself be expressed as a regular spanner, which covers several natural domination rules. Yet, we show that the skyline operator incurs a computational cost (under combined complexity). First, there are cases where the operator requires an exponential blowup on the number of states needed to represent the spanner as a sequential variable-set automaton. Second, the evaluation may become computationally hard. Our analysis more precisely identifies classes of domination rules for which the combined complexity is tractable or intractable.
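
As an illustration of the filtering step described above, here is a small sketch of a skyline computed after extraction, which is the baseline the paper seeks to improve by compiling the domination rule into the extractor. The names `skyline` and `strictly_contained`, and the choice of span containment as the domination rule, are assumptions made for this example only.

```python
def strictly_contained(a, b):
    """True if span a = (start, end) lies inside span b and differs from it."""
    (s1, e1), (s2, e2) = a, b
    return s2 <= s1 and e1 <= e2 and a != b

def skyline(tuples, dominated_by):
    """Keep the tuples that no other tuple dominates (quadratic-time filter)."""
    return [t for t in tuples
            if not any(dominated_by(t, u) for u in tuples if u != t)]

# Hypothetical extracted spans; shorter spans nested in longer ones are dropped.
spans = [(0, 5), (1, 3), (4, 8), (4, 6)]
print(skyline(spans, strictly_contained))  # [(0, 5), (4, 8)]
```

The filter is applied to an already materialized relation, so its cost grows with the number of extracted tuples even when most of them are discarded.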

Cite as

Antoine Amarilli, Benny Kimelfeld, Sébastien Labbé, and Stefan Mengel. Skyline Operators for Document Spanners. In 27th International Conference on Database Theory (ICDT 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 290, pp. 7:1-7:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


@InProceedings{amarilli_et_al:LIPIcs.ICDT.2024.7,
  author =	{Amarilli, Antoine and Kimelfeld, Benny and Labb\'{e}, S\'{e}bastien and Mengel, Stefan},
  title =	{{Skyline Operators for Document Spanners}},
  booktitle =	{27th International Conference on Database Theory (ICDT 2024)},
  pages =	{7:1--7:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-312-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{290},
  editor =	{Cormode, Graham and Shekelyan, Michael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2024.7},
  URN =		{urn:nbn:de:0030-drops-197898},
  doi =		{10.4230/LIPIcs.ICDT.2024.7},
  annote =	{Keywords: Information Extraction, Document Spanners, Query Evaluation}
}
Document
When Do Homomorphism Counts Help in Query Algorithms?

Authors: Balder ten Cate, Victor Dalmau, Phokion G. Kolaitis, and Wei-Lin Wu


Abstract
A query algorithm based on homomorphism counts is a procedure for determining whether a given instance satisfies a property by counting homomorphisms between the given instance and finitely many predetermined instances. In a left query algorithm, we count homomorphisms from the predetermined instances to the given instance, while in a right query algorithm we count homomorphisms from the given instance to the predetermined instances. Homomorphisms are usually counted over the semiring ℕ of non-negative integers; it is also meaningful, however, to count homomorphisms over the Boolean semiring 𝔹, in which case the homomorphism count indicates whether or not a homomorphism exists. We first characterize the properties that admit a left query algorithm over 𝔹 by showing that these are precisely the properties that are both first-order definable and closed under homomorphic equivalence. After this, we turn attention to a comparison between left query algorithms over 𝔹 and left query algorithms over ℕ. In general, there are properties that admit a left query algorithm over ℕ but not over 𝔹. The main result of this paper asserts that if a property is closed under homomorphic equivalence, then that property admits a left query algorithm over 𝔹 if and only if it admits a left query algorithm over ℕ. In other words and rather surprisingly, homomorphism counts over ℕ do not help as regards properties that are closed under homomorphic equivalence. Finally, we characterize the properties that admit both a left query algorithm over 𝔹 and a right query algorithm over 𝔹.
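
The basic quantity in this abstract, a homomorphism count between two instances, can be sketched by brute force for small directed graphs. This is only an illustration under the assumption that instances are directed graphs given as node lists and edge lists; the names `hom_count` and `hom_exists` are hypothetical. Counting over ℕ returns the number of homomorphisms, while counting over the Boolean semiring 𝔹 only records whether one exists.

```python
from itertools import product

def hom_count(src_nodes, src_edges, dst_nodes, dst_edges):
    """Number of homomorphisms from the source graph to the target graph."""
    dst_edge_set = set(dst_edges)
    count = 0
    # Try every mapping of source nodes to target nodes (exponential time).
    for image in product(dst_nodes, repeat=len(src_nodes)):
        h = dict(zip(src_nodes, image))
        if all((h[u], h[v]) in dst_edge_set for u, v in src_edges):
            count += 1
    return count

def hom_exists(src_nodes, src_edges, dst_nodes, dst_edges):
    """The same count read over the Boolean semiring: does any homomorphism exist?"""
    return hom_count(src_nodes, src_edges, dst_nodes, dst_edges) > 0

# A single directed edge mapped into a directed triangle: one map per target edge.
triangle_nodes = [0, 1, 2]
triangle_edges = [(0, 1), (1, 2), (2, 0)]
print(hom_count(["a", "b"], [("a", "b")], triangle_nodes, triangle_edges))  # 3
```

In a left query algorithm, the first pair of arguments would range over the finitely many predetermined instances and the second pair would be the given instance; a right query algorithm swaps the two roles.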

Cite as

Balder ten Cate, Victor Dalmau, Phokion G. Kolaitis, and Wei-Lin Wu. When Do Homomorphism Counts Help in Query Algorithms? In 27th International Conference on Database Theory (ICDT 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 290, pp. 8:1-8:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


@InProceedings{tencate_et_al:LIPIcs.ICDT.2024.8,
  author =	{ten Cate, Balder and Dalmau, Victor and Kolaitis, Phokion G. and Wu, Wei-Lin},
  title =	{{When Do Homomorphism Counts Help in Query Algorithms?}},
  booktitle =	{27th International Conference on Database Theory (ICDT 2024)},
  pages =	{8:1--8:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-312-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{290},
  editor =	{Cormode, Graham and Shekelyan, Michael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2024.8},
  URN =		{urn:nbn:de:0030-drops-197905},
  doi =		{10.4230/LIPIcs.ICDT.2024.8},
  annote =	{Keywords: query algorithms, homomorphism, homomorphism counts, conjunctive query, constraint satisfaction}
}
Document
Approximating Single-Source Personalized PageRank with Absolute Error Guarantees

Authors: Zhewei Wei, Ji-Rong Wen, and Mingji Yang


Abstract
Personalized PageRank (PPR) is an extensively studied and applied node proximity measure in graphs. For a pair of nodes s and t on a graph G = (V,E), the PPR value π(s,t) is defined as the probability that an α-discounted random walk from s terminates at t, where the walk terminates with probability α at each step. We study the classic Single-Source PPR query, which asks for PPR approximations from a given source node s to all nodes in the graph. Specifically, we aim to provide approximations with absolute error guarantees, ensuring that the resultant PPR estimates π̂(s,t) satisfy max_{t ∈ V} |π̂(s,t)-π(s,t)| ≤ ε for a given error bound ε. We propose an algorithm that achieves this with high probability, with an expected running time of
  • Õ(√m/ε) for directed graphs, where m = |E|;
  • Õ(√{d_max}/ε) for undirected graphs, where d_max is the maximum node degree in the graph;
  • Õ(n^{γ-1/2}/ε) for power-law graphs, where n = |V| and γ ∈ (1/2,1) is the extent of the power law.
These sublinear bounds improve upon existing results. We also study the case when degree-normalized absolute error guarantees are desired, requiring max_{t ∈ V} |π̂(s,t)/d(t)-π(s,t)/d(t)| ≤ ε_d for a given error bound ε_d, where the graph is undirected and d(t) is the degree of node t. We give an algorithm that provides this error guarantee with high probability, achieving an expected complexity of Õ(√{∑_{t ∈ V} π(s,t)/d(t)}/ε_d). This improves over the previously known O(1/ε_d) complexity.
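
The definition of π(s,t) as the termination probability of an α-discounted random walk can be turned directly into a plain Monte Carlo estimator. The sketch below is that textbook baseline, not the algorithm proposed in the paper, and it carries no claim about the stated running times. The function name `monte_carlo_ppr`, the adjacency-list dictionary, and the rule that a walk stops at a node without out-neighbors are assumptions of this example.

```python
import random
from collections import Counter

def monte_carlo_ppr(graph, source, alpha, num_walks, seed=0):
    """Estimate pi(source, t) for all t by simulating alpha-discounted walks.

    graph: dict mapping each node to a list of out-neighbors.
    alpha: termination probability per step, assumed to satisfy 0 < alpha <= 1.
    Assumption: a walk that reaches a node with no out-neighbors stops there.
    """
    rng = random.Random(seed)
    stops = Counter()
    for _ in range(num_walks):
        node = source
        # At each step the walk terminates with probability alpha.
        while rng.random() >= alpha:
            neighbors = graph[node]
            if not neighbors:
                break
            node = rng.choice(neighbors)
        stops[node] += 1
    return {t: c / num_walks for t, c in stops.items()}

# Hypothetical two-node cycle; here pi(0, 0) = alpha / (1 - (1 - alpha)^2).
estimates = monte_carlo_ppr({0: [1], 1: [0]}, source=0, alpha=0.5, num_walks=20000)
print(estimates)
```

On the two-node cycle with α = 0.5 the exact value of π(0,0) is 2/3, which gives a simple sanity check for the estimates.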

Cite as

Zhewei Wei, Ji-Rong Wen, and Mingji Yang. Approximating Single-Source Personalized PageRank with Absolute Error Guarantees. In 27th International Conference on Database Theory (ICDT 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 290, pp. 9:1-9:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


@InProceedings{wei_et_al:LIPIcs.ICDT.2024.9,
  author =	{Wei, Zhewei and Wen, Ji-Rong and Yang, Mingji},
  title =	{{Approximating Single-Source Personalized PageRank with Absolute Error Guarantees}},
  booktitle =	{27th International Conference on Database Theory (ICDT 2024)},
  pages =	{9:1--9:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-312-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{290},
  editor =	{Cormode, Graham and Shekelyan, Michael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2024.9},
  URN =		{urn:nbn:de:0030-drops-197911},
  doi =		{10.4230/LIPIcs.ICDT.2024.9},
  annote =	{Keywords: Graph Algorithms, Sublinear Algorithms, Personalized PageRank}
}
Document
Right-Adjoints for Datalog Programs

Authors: Balder ten Cate, Víctor Dalmau, and Jakub Opršal


Abstract
A Datalog program can be viewed as a syntactic specification of a mapping from database instances over some schema to database instances over another schema. We establish a large class of Datalog programs for which this mapping admits a (generalized) right-adjoint. We employ these results to obtain new insights into the existence of, and methods for constructing, homomorphism dualities within restricted classes of instances. From this, we derive new results regarding the existence of uniquely characterizing data examples for database queries in the presence of integrity constraints.

Cite as

Balder ten Cate, Víctor Dalmau, and Jakub Opršal. Right-Adjoints for Datalog Programs. In 27th International Conference on Database Theory (ICDT 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 290, pp. 10:1-10:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


@InProceedings{tencate_et_al:LIPIcs.ICDT.2024.10,
  author =	{ten Cate, Balder and Dalmau, V{\'\i}ctor and Opr\v{s}al, Jakub},
  title =	{{Right-Adjoints for Datalog Programs}},
  booktitle =	{27th International Conference on Database Theory (ICDT 2024)},
  pages =	{10:1--10:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-312-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{290},
  editor =	{Cormode, Graham and Shekelyan, Michael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2024.10},
  URN =		{urn:nbn:de:0030-drops-197929},
  doi =		{10.4230/LIPIcs.ICDT.2024.10},
  annote =	{Keywords: Datalog, Adjoints, Homomorphism Dualities, Database Constraints, Conjunctive Queries, Data Examples}
}
Document
On the Convergence Rate of Linear Datalog^∘ over Stable Semirings

Authors: Sungjin Im, Benjamin Moseley, Hung Ngo, and Kirk Pruhs


Abstract
Datalog^∘ is an extension of Datalog, where instead of a program being a collection of unions of conjunctive queries over the standard Boolean semiring, a program may now be a collection of sum-product queries over an arbitrary commutative partially ordered pre-semiring. Datalog^∘ is more powerful than Datalog in that its additional algebraic structure allows for supporting recursion with aggregation. At the same time, Datalog^∘ retains the syntactic and semantic simplicity of Datalog: Datalog^∘ has declarative least fixpoint semantics. The least fixpoint can be found via the naïve evaluation algorithm that repeatedly applies the immediate consequence operator until no further change is possible. It was shown in [Mahmoud Abo Khamis et al., 2022] that, when the underlying semiring is p-stable, then the naïve evaluation of any Datalog^∘ program over the semiring converges in a finite number of steps. However, the upper bounds on the rate of convergence were exponential in the number n of ground IDB atoms. This paper establishes polynomial upper bounds on the convergence rate of the naïve algorithm on linear Datalog^∘ programs, which are quite common in practice. In particular, the main result of this paper is that the convergence rate of linear Datalog^∘ programs under any p-stable semiring is O(pn³). Furthermore, we show a matching lower bound by constructing a p-stable semiring and a linear Datalog^∘ program that requires Ω(pn³) iterations for the naïve iteration algorithm to converge. Next, we study the convergence rate in terms of the number of elements in the semiring for linear Datalog^∘ programs. When L is the number of elements, the convergence rate is bounded by O(pn log L). This significantly improves the convergence rate for small L. We show a nearly matching lower bound as well.
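
The naïve evaluation loop mentioned in this abstract can be sketched on one small linear program. The example below assumes the min-plus (tropical) semiring over non-negative weights and the single-source shortest-path rule D(y) = [y = s] ⊕ ⨁_x D(x) ⊗ E(x,y), where ⊕ is min and ⊗ is +. Both the program and the names `naive_eval`, `dist`, and `steps` are illustrative assumptions and are not taken from the paper.

```python
import math

def naive_eval(nodes, edges, source):
    """Naive least-fixpoint evaluation over the min-plus semiring.

    nodes:  iterable of node names (one ground IDB atom D(y) per node).
    edges:  dict mapping (x, y) to a non-negative weight (the EDB relation E).
    Returns the fixpoint and the number of operator applications performed,
    counting the final application that detects no further change.
    """
    nodes = list(nodes)
    # Start from the semiring's additive identity (infinity for min-plus).
    dist = {v: math.inf for v in nodes}
    steps = 0
    while True:
        new = {}
        for y in nodes:
            # Base rule: the source is reachable at cost 0 (the semiring's 1).
            best = 0.0 if y == source else math.inf
            # Linear recursive rule: a single IDB atom D(x) per product term.
            for (x, z), w in edges.items():
                if z == y:
                    best = min(best, dist[x] + w)
            new[y] = best
        steps += 1
        if new == dist:
            return dist, steps
        dist = new

# Hypothetical three-node example.
graph_edges = {("a", "b"): 1.0, ("b", "c"): 2.0, ("a", "c"): 5.0}
print(naive_eval(["a", "b", "c"], graph_edges, "a"))
# ({'a': 0.0, 'b': 1.0, 'c': 3.0}, 4)
```

Each pass applies the immediate consequence operator to all ground atoms at once; the number of passes before stabilization is the convergence rate that the abstract bounds.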

Cite as

Sungjin Im, Benjamin Moseley, Hung Ngo, and Kirk Pruhs. On the Convergence Rate of Linear Datalog ^∘ over Stable Semirings. In 27th International Conference on Database Theory (ICDT 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 290, pp. 11:1-11:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)



@InProceedings{im_et_al:LIPIcs.ICDT.2024.11,
  author =	{Im, Sungjin and Moseley, Benjamin and Ngo, Hung and Pruhs, Kirk},
  title =	{{On the Convergence Rate of Linear Datalog ^∘ over Stable Semirings}},
  booktitle =	{27th International Conference on Database Theory (ICDT 2024)},
  pages =	{11:1--11:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-312-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{290},
  editor =	{Cormode, Graham and Shekelyan, Michael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2024.11},
  URN =		{urn:nbn:de:0030-drops-197939},
  doi =		{10.4230/LIPIcs.ICDT.2024.11},
  annote =	{Keywords: Datalog, convergence rate, semiring}
}
Document
Enumeration and Updates for Conjunctive Linear Algebra Queries Through Expressibility

Authors: Thomas Muñoz Serrano, Cristian Riveros, and Stijn Vansummeren


Abstract
Due to the importance of linear algebra and matrix operations in data analytics, there is significant interest in using relational query optimization and processing techniques for evaluating (sparse) linear algebra programs. In particular, in recent years close connections have been established between linear algebra programs and relational algebra that allow transferring optimization techniques of the latter to the former. In this paper, we ask ourselves which linear algebra programs in MATLANG correspond to the free-connex and q-hierarchical fragments of conjunctive first-order logic. Both fragments have desirable query processing properties: free-connex conjunctive queries support constant-delay enumeration after a linear-time preprocessing phase, and q-hierarchical conjunctive queries further allow constant-time updates. By characterizing the corresponding fragments of MATLANG, we hence identify the fragments of linear algebra programs that one can evaluate with constant-delay enumeration after linear-time preprocessing and with constant-time updates. To derive our results, we improve and generalize previous correspondences between MATLANG and relational algebra evaluated over semiring-annotated relations. In addition, we identify properties of semirings that allow generalizing the complexity bounds for free-connex and q-hierarchical conjunctive queries from Boolean annotations to general semirings.

Cite as

Thomas Muñoz Serrano, Cristian Riveros, and Stijn Vansummeren. Enumeration and Updates for Conjunctive Linear Algebra Queries Through Expressibility. In 27th International Conference on Database Theory (ICDT 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 290, pp. 12:1-12:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)



@InProceedings{munozserrano_et_al:LIPIcs.ICDT.2024.12,
  author =	{Mu\~{n}oz Serrano, Thomas and Riveros, Cristian and Vansummeren, Stijn},
  title =	{{Enumeration and Updates for Conjunctive Linear Algebra Queries Through Expressibility}},
  booktitle =	{27th International Conference on Database Theory (ICDT 2024)},
  pages =	{12:1--12:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-312-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{290},
  editor =	{Cormode, Graham and Shekelyan, Michael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2024.12},
  URN =		{urn:nbn:de:0030-drops-197946},
  doi =		{10.4230/LIPIcs.ICDT.2024.12},
  annote =	{Keywords: Query evaluation, conjunctive queries, linear algebra, enumeration algorithms}
}
Document
Direct Access for Conjunctive Queries with Negations

Authors: Florent Capelli and Oliver Irwin


Abstract
Given a conjunctive query Q and a database 𝐃, a direct access to the answers of Q over 𝐃 is the operation of returning, given an index j, the j-th answer for some order on its answers. While this problem is #P-hard in general with respect to combined complexity, many conjunctive queries have an underlying structure that allows for a direct access to their answers for some lexicographical ordering that takes polylogarithmic time in the size of the database after a polynomial time precomputation. Previous work has precisely characterised the tractable classes and given fine-grained lower bounds on the precomputation time needed depending on the structure of the query. In this paper, we generalise these tractability results to the case of signed conjunctive queries, that is, conjunctive queries that may contain negative atoms. Our technique is based on a class of circuits that can represent relational data. We first show that this class supports tractable direct access after a polynomial time preprocessing. We then give bounds on the size of the circuit needed to represent the answer set of signed conjunctive queries depending on their structure. Both results combined together allow us to prove the tractability of direct access for a large class of conjunctive queries. On the one hand, we recover the known tractable classes from the literature in the case of positive conjunctive queries. On the other hand, we generalise and unify known tractability results about negative conjunctive queries - that is, queries having only negated atoms. In particular, we show that the class of β-acyclic negative conjunctive queries and the class of bounded nest set width negative conjunctive queries admit tractable direct access.

Cite as

Florent Capelli and Oliver Irwin. Direct Access for Conjunctive Queries with Negations. In 27th International Conference on Database Theory (ICDT 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 290, pp. 13:1-13:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)



@InProceedings{capelli_et_al:LIPIcs.ICDT.2024.13,
  author =	{Capelli, Florent and Irwin, Oliver},
  title =	{{Direct Access for Conjunctive Queries with Negations}},
  booktitle =	{27th International Conference on Database Theory (ICDT 2024)},
  pages =	{13:1--13:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-312-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{290},
  editor =	{Cormode, Graham and Shekelyan, Michael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2024.13},
  URN =		{urn:nbn:de:0030-drops-197958},
  doi =		{10.4230/LIPIcs.ICDT.2024.13},
  annote =	{Keywords: Conjunctive queries, factorized databases, direct access, hypertree decomposition}
}
Document
The Importance of Parameters in Database Queries

Authors: Martin Grohe, Benny Kimelfeld, Peter Lindner, and Christoph Standke


Abstract
We propose and study a framework for quantifying the importance of the choices of parameter values to the result of a query over a database. These parameters occur as constants in logical queries, such as conjunctive queries. In our framework, the importance of a parameter is its SHAP score. This score is a popular instantiation of the game-theoretic Shapley value for measuring the importance of feature values in machine learning models. We make the case for the rationale of using this score by explaining the intuition behind SHAP, and by showing that we arrive at this score in two different, apparently opposing, approaches to quantifying the contribution of a parameter. The application of the SHAP score requires two components in addition to the query and the database: (a) a probability distribution over the combinations of parameter values, and (b) a utility function that measures the similarity between the result for the original parameters and the result for hypothetical parameters. The main question addressed in the paper is the complexity of calculating the SHAP score for different distributions and similarity measures. We first address the case of probabilistically independent parameters. The problem is hard if we consider a fragment of queries that is hard to evaluate (as one would expect), and even for the fragment of acyclic conjunctive queries. In some cases, though, one can efficiently list all relevant parameter combinations, and then the SHAP score can be computed in polynomial time under reasonable general conditions. Also tractable is the case of full acyclic conjunctive queries for certain (natural) similarity functions. We extend our results to conjunctive queries with inequalities between variables and parameters. Finally, we discuss a simple approximation technique for the case of correlated parameters.
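
To make the two ingredients named in the abstract concrete (a distribution over parameter values and a utility function), here is a minimal brute-force sketch of a SHAP-style score for independent parameters. It uses the usual Shapley weighting, with the value of a coalition S taken as the expected utility when the parameters in S keep their original values and the rest are resampled. That definition, the function name `shap_scores`, the toy range-count query, and the distributions are all assumptions for illustration; the paper's exact formalization may differ.

```python
from itertools import combinations, product
from math import factorial

def shap_scores(original, dists, utility):
    """Brute-force SHAP-style scores for independent parameters.
    original: tuple of reference parameter values
    dists:    one dict {value: probability} per parameter
    utility:  function from a parameter tuple to a number"""
    k = len(original)

    def v(S):
        # Expected utility when parameters in S keep their original value
        # and all others are drawn independently from their distributions.
        free = [i for i in range(k) if i not in S]
        total = 0.0
        for combo in product(*(dists[i].items() for i in free)):
            params = list(original)
            prob = 1.0
            for i, (val, p) in zip(free, combo):
                params[i] = val
                prob *= p
            total += prob * utility(tuple(params))
        return total

    scores = []
    for i in range(k):
        others = [j for j in range(k) if j != i]
        score = 0.0
        for r in range(k):
            # Shapley weight for coalitions of size r that exclude i.
            weight = factorial(r) * factorial(k - r - 1) / factorial(k)
            for S in combinations(others, r):
                score += weight * (v(set(S) | {i}) - v(set(S)))
        scores.append(score)
    return scores, v

# Hypothetical query: count rows x with lo <= x <= hi; parameters are (lo, hi).
rows = [1, 3, 5, 7, 9]

def count_between(lo, hi):
    return sum(1 for x in rows if lo <= x <= hi)

original = (2, 8)

def utility(params):
    # Similarity to the original result: negative difference in counts.
    return -abs(count_between(*params) - count_between(*original))

dists = [{0: 0.5, 2: 0.5}, {8: 0.5, 10: 0.5}]
scores, v = shap_scores(original, dists, utility)
print(scores)
```

The enumeration is exponential in the number of parameters, so this is only usable for tiny examples. A quick sanity check is that the scores sum to v(all parameters) minus v(no parameters).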

Cite as

Martin Grohe, Benny Kimelfeld, Peter Lindner, and Christoph Standke. The Importance of Parameters in Database Queries. In 27th International Conference on Database Theory (ICDT 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 290, pp. 14:1-14:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)



@InProceedings{grohe_et_al:LIPIcs.ICDT.2024.14,
  author =	{Grohe, Martin and Kimelfeld, Benny and Lindner, Peter and Standke, Christoph},
  title =	{{The Importance of Parameters in Database Queries}},
  booktitle =	{27th International Conference on Database Theory (ICDT 2024)},
  pages =	{14:1--14:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-312-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{290},
  editor =	{Cormode, Graham and Shekelyan, Michael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2024.14},
  URN =		{urn:nbn:de:0030-drops-197966},
  doi =		{10.4230/LIPIcs.ICDT.2024.14},
  annote =	{Keywords: SHAP score, query parameters, Shapley value}
}
Document
Conjunctive Queries on Probabilistic Graphs: The Limits of Approximability

Authors: Antoine Amarilli, Timothy van Bremen, and Kuldeep S. Meel


Abstract
Query evaluation over probabilistic databases is a notoriously intractable problem - not only in combined complexity, but for many natural queries in data complexity as well [Antoine Amarilli et al., 2017; Nilesh N. Dalvi and Dan Suciu, 2012]. This motivates the study of probabilistic query evaluation through the lens of approximation algorithms, and particularly of combined FPRASes, whose runtime is polynomial in both the query and instance size. In this paper, we focus on tuple-independent probabilistic databases over binary signatures, which can be equivalently viewed as probabilistic graphs. We study in which cases we can devise combined FPRASes for probabilistic query evaluation in this setting. We settle the complexity of this problem for a variety of query and instance classes, by proving both approximability and (conditional) inapproximability results. This allows us to deduce many corollaries of possible independent interest. For example, we show how the results of [Marcelo Arenas et al., 2021] on counting fixed-length strings accepted by an NFA imply the existence of an FPRAS for the two-terminal network reliability problem on directed acyclic graphs: this was an open problem until now [Rico Zenklusen and Marco Laumanns, 2011]. We also show that one cannot extend a recent result [Timothy van Bremen and Kuldeep S. Meel, 2023] that gives a combined FPRAS for self-join-free conjunctive queries of bounded hypertree width on probabilistic databases: neither the bounded-hypertree-width condition nor the self-join-freeness hypothesis can be relaxed. Finally, we complement all our inapproximability results with unconditional lower bounds, showing that DNNF provenance circuits must have at least moderately exponential size in combined complexity.

Cite as

Antoine Amarilli, Timothy van Bremen, and Kuldeep S. Meel. Conjunctive Queries on Probabilistic Graphs: The Limits of Approximability. In 27th International Conference on Database Theory (ICDT 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 290, pp. 15:1-15:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)



@InProceedings{amarilli_et_al:LIPIcs.ICDT.2024.15,
  author =	{Amarilli, Antoine and van Bremen, Timothy and Meel, Kuldeep S.},
  title =	{{Conjunctive Queries on Probabilistic Graphs: The Limits of Approximability}},
  booktitle =	{27th International Conference on Database Theory (ICDT 2024)},
  pages =	{15:1--15:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-312-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{290},
  editor =	{Cormode, Graham and Shekelyan, Michael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2024.15},
  URN =		{urn:nbn:de:0030-drops-197978},
  doi =		{10.4230/LIPIcs.ICDT.2024.15},
  annote =	{Keywords: Probabilistic query evaluation, tuple-independent databases, approximation}
}
Document
Optimally Rewriting Formulas and Database Queries: A Confluence of Term Rewriting, Structural Decomposition, and Complexity

Authors: Hubie Chen and Stefan Mengel


Abstract
A central computational task in database theory, finite model theory, and computer science at large is the evaluation of a first-order sentence on a finite structure. In the context of this task, the width of a sentence, defined as the maximum number of free variables over all subformulas, has been established as a crucial measure, where minimizing the width of a sentence (while retaining logical equivalence) is considered highly desirable. An undecidability result rules out the possibility of an algorithm that, given a first-order sentence, returns a logically equivalent sentence of minimum width; this result motivates the study of width minimization via syntactic rewriting rules, which is this article’s focus. For a number of common rewriting rules (which are known to preserve logical equivalence), including rules that allow for the movement of quantifiers, we present an algorithm that, given a positive first-order sentence ϕ, outputs the minimum-width sentence obtainable from ϕ via application of these rules. We thus obtain a complete algorithmic understanding of width minimization up to the studied rules; this result is the first one - of which we are aware - that establishes this type of understanding in such a general setting. Our result builds on the theory of term rewriting and establishes an interface among this theory, query evaluation, and structural decomposition theory.

Cite as

Hubie Chen and Stefan Mengel. Optimally Rewriting Formulas and Database Queries: A Confluence of Term Rewriting, Structural Decomposition, and Complexity. In 27th International Conference on Database Theory (ICDT 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 290, pp. 16:1-16:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)



@InProceedings{chen_et_al:LIPIcs.ICDT.2024.16,
  author =	{Chen, Hubie and Mengel, Stefan},
  title =	{{Optimally Rewriting Formulas and Database Queries: A Confluence of Term Rewriting, Structural Decomposition, and Complexity}},
  booktitle =	{27th International Conference on Database Theory (ICDT 2024)},
  pages =	{16:1--16:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-312-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{290},
  editor =	{Cormode, Graham and Shekelyan, Michael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2024.16},
  URN =		{urn:nbn:de:0030-drops-197984},
  doi =		{10.4230/LIPIcs.ICDT.2024.16},
  annote =	{Keywords: width, query rewriting, structural decomposition, term rewriting}
}
Document
Containment of Regular Path Queries Under Path Constraints

Authors: Sylvain Salvati and Sophie Tison


Abstract
Data integrity is ensured by expressing constraints it should satisfy. One can also view constraints as data properties and take advantage of them for several tasks such as reasoning about data or accelerating query processing. In the context of graph databases, simple constraints can be expressed by means of path constraints while simple queries are modeled as regular path queries (RPQs). In this paper, we investigate the containment of RPQs under path constraints. We focus on word constraints that can be viewed as tuple-generating dependencies (TGDs) of the form ∀x_1,x_2, ∃y⁻, a_1(x_1,y_1) ∧ ... ∧ a_i(y_{i-1},y_i) ∧ ... ∧ a_n(y_{n-1},x_2) ⟶ ∃z⁻, b_1(x_1,z_1) ∧ ... ∧ b_i(z_{i-1},z_i) ∧ ... ∧ b_m(z_{m-1},x_2). Such a constraint means that whenever two nodes in a graph are connected by a path labeled a_1 … a_n, there is also a path labeled b_1 … b_m that connects them. Rewrite systems offer an abstract view of these TGDs: the rewrite rule a_1 … a_n → b_1 … b_m represents the previous constraint. A set of constraints 𝒞 is then represented by a rewrite system R and, when dealing with possibly infinite databases, a path query p is contained in a path query q under the constraints 𝒞 iff p rewrites to q with R. Contrary to what has been claimed in the literature, we show that, when restricting to finite databases only, there are cases where a path query p is contained in a path query q under the constraints 𝒞 while p does not rewrite to q with R. More generally, we study the finite controllability of the containment of RPQs under word constraints, that is, when this containment problem on unrestricted databases does coincide with the finite case. We give an exact characterisation of the cases where this equivalence holds. We then deduce the undecidability of the containment problem in the finite case even when RPQs are restricted to word queries. We prove several properties related to finite controllability, and in particular that it is undecidable. We also exhibit some classes of word constraints that ensure the finite controllability and the decidability of the containment problem.
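
The rewrite-system view in the abstract (a rule a_1 … a_n → b_1 … b_m applied to path labels) can be illustrated with a small bounded search over words. This is only a sketch: the function name `rewrites_to`, the step bound `max_steps`, and the example rules are assumptions introduced here, and a negative answer only means that no derivation was found within the bound.

```python
from collections import deque

def rewrites_to(p, q, rules, max_steps=10):
    """Bounded breadth-first search: can word p be rewritten into word q
    using rules [(lhs, rhs), ...] in at most max_steps rewrite steps?"""
    seen = {p}
    frontier = deque([(p, 0)])
    while frontier:
        word, depth = frontier.popleft()
        if word == q:
            return True
        if depth == max_steps:
            continue
        for lhs, rhs in rules:
            # Try the rule at every position where its left-hand side occurs.
            start = word.find(lhs)
            while start != -1:
                nxt = word[:start] + rhs + word[start + len(lhs):]
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, depth + 1))
                start = word.find(lhs, start + 1)
    return False

# Hypothetical word constraints: ab -> c and c -> ba.
rules = [("ab", "c"), ("c", "ba")]
print(rewrites_to("aab", "aba", rules))   # aab -> ac -> aba
print(rewrites_to("c", "ab", rules))
```

Each letter stands for an edge label, so a word is a path query and a rule is a word constraint in the sense described above.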

Cite as

Sylvain Salvati and Sophie Tison. Containment of Regular Path Queries Under Path Constraints. In 27th International Conference on Database Theory (ICDT 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 290, pp. 17:1-17:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)



@InProceedings{salvati_et_al:LIPIcs.ICDT.2024.17,
  author =	{Salvati, Sylvain and Tison, Sophie},
  title =	{{Containment of Regular Path Queries Under Path Constraints}},
  booktitle =	{27th International Conference on Database Theory (ICDT 2024)},
  pages =	{17:1--17:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-312-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{290},
  editor =	{Cormode, Graham and Shekelyan, Michael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2024.17},
  URN =		{urn:nbn:de:0030-drops-197994},
  doi =		{10.4230/LIPIcs.ICDT.2024.17},
  annote =	{Keywords: Graph databases, rational path queries, query containment, TGDs, word constraints, rewrite systems, finite controllability, decision problems}
}
Document
Computing Data Distribution from Query Selectivities

Authors: Pankaj K. Agarwal, Rahul Raychaudhury, Stavros Sintos, and Jun Yang


Abstract
We are given a set 𝒵 = {(R_1,s_1), …, (R_n,s_n)}, where each R_i is a range in ℝ^d, such as a rectangle or a ball, and s_i ∈ [0,1] denotes its selectivity. The goal is to compute a small-size discrete data distribution 𝒟 = {(q_1,w_1), …, (q_m,w_m)}, where q_j ∈ ℝ^d and w_j ∈ [0,1] for each 1 ≤ j ≤ m, and ∑_{1≤j≤m} w_j = 1, such that 𝒟 is the most consistent with 𝒵, i.e., err_p(𝒟,𝒵) = (1/n) ∑_{i=1}^n |s_i - ∑_{j=1}^m w_j⋅1(q_j ∈ R_i)|^p is minimized. In a database setting, 𝒵 corresponds to a workload of range queries over some table, together with their observed selectivities (i.e., fraction of tuples returned), and 𝒟 can be used as a compact model for approximating the data distribution within the table without accessing the underlying contents. In this paper, we obtain both upper and lower bounds for this problem. In particular, we show that the problem of finding the best data distribution from selectivity queries is NP-complete. On the positive side, we describe a Monte Carlo algorithm that constructs, in time O((n+δ^{-d}) δ^{-2} polylog n), a discrete distribution 𝒟̃ of size O(δ^{-2}), such that err_p(𝒟̃,𝒵) ≤ min_𝒟 err_p(𝒟,𝒵)+δ (for p = 1,2,∞) where the minimum is taken over all discrete distributions. We also establish conditional lower bounds, which strongly indicate the infeasibility of relative approximations as well as removal of the exponential dependency on the dimension for additive approximations. This suggests that significant improvements to our algorithm are unlikely.
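
The error measure err_p defined in the abstract can be evaluated directly for finite p. The sketch below does this for one-dimensional interval ranges; the helper `interval`, the function name `err_p`, and the example workload are assumptions for illustration, and the p = ∞ case mentioned in the abstract is not covered.

```python
def interval(lo, hi):
    """Membership test for the closed 1-D range [lo, hi]."""
    return lambda q: lo <= q <= hi

def err_p(dist, workload, p=1):
    """dist:     list of (point q_j, weight w_j) with weights summing to 1
    workload: list of (membership test for R_i, selectivity s_i)
    Returns (1/n) * sum_i |s_i - sum_j w_j * 1(q_j in R_i)|^p."""
    total = 0.0
    for contains, s in workload:
        # Selectivity that the discrete distribution assigns to this range.
        estimate = sum(w for q, w in dist if contains(q))
        total += abs(s - estimate) ** p
    return total / len(workload)

# Hypothetical workload of two range queries with observed selectivities.
workload = [(interval(0, 5), 0.5), (interval(3, 10), 0.75)]

good = [(2, 0.25), (4, 0.25), (8, 0.5)]   # consistent with both selectivities
poor = [(2, 1.0)]                         # all mass on one point

print(err_p(good, workload, p=1))
print(err_p(poor, workload, p=1))
print(err_p(poor, workload, p=2))
```

Evaluating the error is the easy part. The problem studied in the paper is finding a small distribution that minimizes it.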

Cite as

Pankaj K. Agarwal, Rahul Raychaudhury, Stavros Sintos, and Jun Yang. Computing Data Distribution from Query Selectivities. In 27th International Conference on Database Theory (ICDT 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 290, pp. 18:1-18:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)



@InProceedings{agarwal_et_al:LIPIcs.ICDT.2024.18,
  author =	{Agarwal, Pankaj K. and Raychaudhury, Rahul and Sintos, Stavros and Yang, Jun},
  title =	{{Computing Data Distribution from Query Selectivities}},
  booktitle =	{27th International Conference on Database Theory (ICDT 2024)},
  pages =	{18:1--18:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-312-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{290},
  editor =	{Cormode, Graham and Shekelyan, Michael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2024.18},
  URN =		{urn:nbn:de:0030-drops-198007},
  doi =		{10.4230/LIPIcs.ICDT.2024.18},
  annote =	{Keywords: selectivity queries, discrete distributions, Multiplicative Weights Update, eps-approximation, learnable functions, depth problem, arrangement}
}
Document
Information Inequality Problem over Set Functions

Authors: Miika Hannula


Abstract
Information inequalities appear in many database applications such as query output size bounds, query containment, and implication between data dependencies. Recently Khamis et al. [Mahmoud Abo Khamis et al., 2020] proposed to study the algorithmic aspects of information inequalities, including the information inequality problem: decide whether a linear inequality over entropies of random variables is valid. While the decidability of this problem is a major open question, applications often involve only inequalities that adhere to specific syntactic forms linked to useful semantic invariance properties. This paper studies the information inequality problem in different syntactic and semantic scenarios that arise from database applications. Focusing on the boundary between tractability and intractability, we show that the information inequality problem is coNP-complete if restricted to normal polymatroids, and in polynomial time if relaxed to monotone functions. We also examine syntactic restrictions related to query output size bounds, and provide an alternative proof, through monotone functions, for the polynomial-time computability of the entropic bound over simple sets of degree constraints.

Cite as

Miika Hannula. Information Inequality Problem over Set Functions. In 27th International Conference on Database Theory (ICDT 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 290, pp. 19:1-19:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)



@InProceedings{hannula:LIPIcs.ICDT.2024.19,
  author =	{Hannula, Miika},
  title =	{{Information Inequality Problem over Set Functions}},
  booktitle =	{27th International Conference on Database Theory (ICDT 2024)},
  pages =	{19:1--19:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-312-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{290},
  editor =	{Cormode, Graham and Shekelyan, Michael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2024.19},
  URN =		{urn:nbn:de:0030-drops-198011},
  doi =		{10.4230/LIPIcs.ICDT.2024.19},
  annote =	{Keywords: entropy, information theory, worst-case output size, computational complexity}
}
Document
Conditional Independence on Semiring Relations

Authors: Miika Hannula


Abstract
Conditional independence plays a foundational role in database theory, probability theory, information theory, and graphical models. In databases, a notion similar to conditional independence, known as the (embedded) multivalued dependency, appears in database normalization. Many properties of conditional independence are shared across various domains, and to some extent these commonalities can be studied through a measure-theoretic approach. The present paper proposes an alternative approach via semiring relations, defined by extending database relations with tuple annotations from some commutative semiring. Integrating various interpretations of conditional independence in this context, we investigate how the choice of the underlying semiring impacts the corresponding axiomatic and decomposition properties. We specifically identify positivity and multiplicative cancellativity as the key semiring properties that enable extending results from the relational context to the broader semiring framework. Additionally, we explore the relationships between different conditional independence notions through model theory.

Cite as

Miika Hannula. Conditional Independence on Semiring Relations. In 27th International Conference on Database Theory (ICDT 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 290, pp. 20:1-20:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)



@InProceedings{hannula:LIPIcs.ICDT.2024.20,
  author =	{Hannula, Miika},
  title =	{{Conditional Independence on Semiring Relations}},
  booktitle =	{27th International Conference on Database Theory (ICDT 2024)},
  pages =	{20:1--20:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-312-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{290},
  editor =	{Cormode, Graham and Shekelyan, Michael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2024.20},
  URN =		{urn:nbn:de:0030-drops-198023},
  doi =		{10.4230/LIPIcs.ICDT.2024.20},
  annote =	{Keywords: semiring, conditional independence, functional dependency, decomposition, axiom}
}
Document
Subgraph Enumeration in Optimal I/O Complexity

Authors: Shiyuan Deng and Yufei Tao


Abstract
Given a massive data graph G = (V, E) and a small pattern graph Q, the goal of subgraph enumeration is to list all the subgraphs of G isomorphic to Q. In the external memory (EM) model, it is well-known that every indivisible algorithm must perform Ω(|E|^ρ / (M^{ρ-1} B)) I/Os in the worst case, where M represents the number of words in (internal) memory, B denotes the number of words in a disk block, and ρ is the fractional edge covering number of Q. It has been a longstanding open problem to design an algorithm to match this lower bound. The state of the art is an algorithm in ICDT'23 that achieves an I/O complexity of O((|E|^ρ / (M^{ρ-1} B)) log_{M/B} (|E|/B)) with high probability. In this paper, we remove the log_{M/B} (|E|/B) factor, thereby settling the open problem when randomization is permitted.

Cite as

Shiyuan Deng and Yufei Tao. Subgraph Enumeration in Optimal I/O Complexity. In 27th International Conference on Database Theory (ICDT 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 290, pp. 21:1-21:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)



@InProceedings{deng_et_al:LIPIcs.ICDT.2024.21,
  author =	{Deng, Shiyuan and Tao, Yufei},
  title =	{{Subgraph Enumeration in Optimal I/O Complexity}},
  booktitle =	{27th International Conference on Database Theory (ICDT 2024)},
  pages =	{21:1--21:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-312-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{290},
  editor =	{Cormode, Graham and Shekelyan, Michael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2024.21},
  URN =		{urn:nbn:de:0030-drops-198033},
  doi =		{10.4230/LIPIcs.ICDT.2024.21},
  annote =	{Keywords: Subgraph Enumeration, Conjunctive Queries, External Memory, Algorithms}
}
Document
Evaluating Graph Queries Using Semantic Treewidth

Authors: Cristina Feier, Tomasz Gogacz, and Filip Murlak


Abstract
Unions of conjunctive two-way regular path queries (UC2RPQs) are a common abstraction of query languages for graph databases, much like unions of conjunctive queries (UCQs) in the relational case. As in the case of UCQs, their evaluation is NP-complete in combined complexity. Semantic treewidth, i.e. the minimal treewidth of equivalent queries, has been proposed as a candidate criterion to characterize fixed-parameter tractability of UC2RPQs. It was recently shown how to decide the semantic treewidth of a UC2RPQ, by constructing the best under-approximation of a given treewidth, in the form of a UC2RPQ of size doubly exponential in the size of the original query. This leads to an fpt algorithm for evaluating UC2RPQs of semantic treewidth k which runs in time doubly exponential in the size of the parameter, i.e. in the UC2RPQ. Here we describe a more efficient fpt algorithm for evaluating UC2RPQs of semantic treewidth k which runs in time singly exponential in the size of the parameter. We do this by a careful construction of a witness query which, while still being doubly exponential, can be represented as a Datalog program of bounded width and singly exponential size.

Cite as

Cristina Feier, Tomasz Gogacz, and Filip Murlak. Evaluating Graph Queries Using Semantic Treewidth. In 27th International Conference on Database Theory (ICDT 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 290, pp. 22:1-22:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)



@InProceedings{feier_et_al:LIPIcs.ICDT.2024.22,
  author =	{Feier, Cristina and Gogacz, Tomasz and Murlak, Filip},
  title =	{{Evaluating Graph Queries Using Semantic Treewidth}},
  booktitle =	{27th International Conference on Database Theory (ICDT 2024)},
  pages =	{22:1--22:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-312-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{290},
  editor =	{Cormode, Graham and Shekelyan, Michael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2024.22},
  URN =		{urn:nbn:de:0030-drops-198048},
  doi =		{10.4230/LIPIcs.ICDT.2024.22},
  annote =	{Keywords: conjunctive two-way regular path queries, fixed-parameter tractable evaluation, semantic treewidth, Datalog encoding, optimization}
}
Document
Join Sampling Under Acyclic Degree Constraints and (Cyclic) Subgraph Sampling

Authors: Ru Wang and Yufei Tao


Abstract
Given a (natural) join with an acyclic set of degree constraints (the join itself does not need to be acyclic), we show how to draw a uniformly random sample from the join result in O(polymat/max{1, OUT}) expected time (assuming data complexity) after a preprocessing phase of O(IN) expected time, where IN, OUT, and polymat are the join’s input size, output size, and polymatroid bound, respectively. This compares favorably with the state of the art (Deng et al. and Kim et al., both in PODS'23), which states that, in the absence of degree constraints, a uniformly random sample can be drawn in Õ(AGM/max{1, OUT}) expected time after a preprocessing phase of Õ(IN) expected time, where AGM is the join’s AGM bound and Õ(.) hides a polylog(IN) factor. Our algorithm applies to every join supported by the solutions of Deng et al. and Kim et al. Furthermore, since the polymatroid bound is at most the AGM bound, our performance guarantees are never worse, but can be considerably better, than those of Deng et al. and Kim et al. We then utilize our techniques to tackle directed subgraph sampling, a problem that has extensive database applications and bears close relevance to joins. Let G = (V, E) be a directed data graph where each vertex has an out-degree at most λ, and let P be a directed pattern graph with a constant number of vertices. The objective is to uniformly sample an occurrence of P in G. The problem can be modeled as join sampling with input size IN = Θ(|E|) but, whenever P contains cycles, the converted join has cyclic degree constraints. We show that it is always possible to throw away certain degree constraints such that (i) the remaining constraints are acyclic and (ii) the new join has asymptotically the same polymatroid bound polymat as the old one. 
Combining this finding with our new join sampling solution yields an algorithm to sample from the original (cyclic) join (thereby yielding a uniformly random occurrence of P) in O(polymat/max{1, OUT}) expected time after O(|E|) expected-time preprocessing, where OUT is the number of occurrences.
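
As a rough illustration of what uniform join sampling involves, the following Python sketch uses classic rejection sampling on a two-relation join R(a, b) ⋈ S(b, c) with a known maximum degree on b. It is not the algorithm of the paper; the relation names, the helper functions `build_index` and `sample_join`, and the toy data are assumptions made for this example.

```python
import random
from collections import defaultdict


def build_index(S):
    """Group the tuples of S(b, c) by their join attribute b (preprocessing)."""
    index = defaultdict(list)
    for b, c in S:
        index[b].append((b, c))
    return index


def sample_join(R, index, max_deg, rng):
    """Return one uniformly random tuple of R(a, b) JOIN S(b, c).

    R is a list of (a, b) pairs, index maps b to its matching S tuples,
    and max_deg is an upper bound on the number of S tuples per b value.
    Returns None if the join is empty.
    """
    # Guard against an empty join so the rejection loop always terminates.
    if max_deg == 0 or not any(index.get(b) for _, b in R):
        return None
    while True:
        a, b = rng.choice(R)          # pick a tuple of R uniformly
        matches = index.get(b, [])    # S tuples that join with it
        # Accept with probability deg(b) / max_deg; this cancels the
        # 1 / deg(b) factor of the next choice and makes the output uniform.
        if rng.random() < len(matches) / max_deg:
            _, c = rng.choice(matches)
            return (a, b, c)


# Hypothetical toy data for illustration only.
R = [(1, "x"), (2, "x"), (3, "y"), (4, "z")]
S = [("x", 10), ("x", 11), ("y", 20)]
index = build_index(S)
max_deg = max(len(group) for group in index.values())
print(sample_join(R, index, max_deg, random.Random(0)))
```

Each trial returns any fixed join tuple with probability 1/(|R| · max_deg), so accepted samples are uniform and the expected number of trials is |R| · max_deg / OUT, which has the same bound-over-OUT shape as the guarantees quoted above.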

Cite as

Ru Wang and Yufei Tao. Join Sampling Under Acyclic Degree Constraints and (Cyclic) Subgraph Sampling. In 27th International Conference on Database Theory (ICDT 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 290, pp. 23:1-23:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


@InProceedings{wang_et_al:LIPIcs.ICDT.2024.23,
  author =	{Wang, Ru and Tao, Yufei},
  title =	{{Join Sampling Under Acyclic Degree Constraints and (Cyclic) Subgraph Sampling}},
  booktitle =	{27th International Conference on Database Theory (ICDT 2024)},
  pages =	{23:1--23:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-312-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{290},
  editor =	{Cormode, Graham and Shekelyan, Michael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2024.23},
  URN =		{urn:nbn:de:0030-drops-198054},
  doi =		{10.4230/LIPIcs.ICDT.2024.23},
  annote =	{Keywords: Join Sampling, Subgraph Sampling, Degree Constraints, Polymatroid Bounds}
}
Document
Finding Smallest Witnesses for Conjunctive Queries

Authors: Xiao Hu and Stavros Sintos


Abstract
A witness is a sub-database that preserves the query results of the original database but is of much smaller size. It has wide applications in query rewriting and debugging, query explanation, IoT analytics, multi-layer network routing, etc. In this paper, we study the smallest witness problem (SWP) for the class of conjunctive queries (CQs) without self-joins. We first establish the dichotomy that SWP for a CQ can be solved in polynomial time if and only if it has the head-cluster property, unless P = NP. We next turn to the approximate version by relaxing the requirement that a witness be of minimum size. We surprisingly find that the head-domination property, which has been identified for the deletion propagation problem [Kimelfeld et al., 2012], can also precisely capture the hardness of the approximate smallest witness problem. In polynomial time, SWP for any CQ with the head-domination property can be approximated within a constant factor, while SWP for any CQ without such a property cannot be approximated within a logarithmic factor, unless P = NP. We further explore efficient approximation algorithms for CQs without the head-domination property: (1) we show a trivial algorithm which achieves a polynomially large approximation ratio for general CQs; (2) for any CQ with only one non-output attribute, such as star CQs, we show a greedy algorithm with a logarithmic approximation ratio; (3) for line CQs, which contain at least two non-output attributes, we relate SWP to the directed Steiner forest problem, whose algorithms can be applied to line CQs directly. Meanwhile, we establish a much higher lower bound, exponentially larger than the logarithmic lower bound obtained above. It remains open to close the gap between the lower and upper bounds of the approximate SWP for CQs without the head-domination property.
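
To make the definition of a smallest witness concrete, here is a brute-force Python sketch for the single query Q(a) :- R(a, b), S(b). It is only an illustration of the problem statement, not one of the paper's algorithms: the query, the function names `evaluate` and `smallest_witness`, and the toy database are assumptions, and the search is exponential in the number of facts.

```python
from itertools import combinations


def evaluate(R, S):
    """Evaluate Q(a) :- R(a, b), S(b) over sets of tuples R and S."""
    s_values = {b for (b,) in S}
    return {a for (a, b) in R if b in s_values}


def smallest_witness(R, S):
    """Return a minimum-size sub-database (sub_R, sub_S) with the same answers.

    Subsets of facts are tried in order of increasing size, so the first
    subset that reproduces the full query result is a smallest witness.
    """
    target = evaluate(R, S)
    facts = [("R", t) for t in sorted(R)] + [("S", t) for t in sorted(S)]
    for size in range(len(facts) + 1):
        for subset in combinations(facts, size):
            sub_R = {t for name, t in subset if name == "R"}
            sub_S = {t for name, t in subset if name == "S"}
            if evaluate(sub_R, sub_S) == target:
                return sub_R, sub_S
    # Not reached: the full database is always a witness of itself.
    return set(R), set(S)


# Hypothetical toy database for illustration only.
R = {(1, "x"), (1, "y"), (2, "y")}
S = {("x",), ("y",)}
witness_R, witness_S = smallest_witness(R, S)
print(witness_R, witness_S)
```

On this toy database the full instance has five facts, while three facts already suffice to produce both answers, because the value "y" can serve the two output values at once.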

Cite as

Xiao Hu and Stavros Sintos. Finding Smallest Witnesses for Conjunctive Queries. In 27th International Conference on Database Theory (ICDT 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 290, pp. 24:1-24:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


@InProceedings{hu_et_al:LIPIcs.ICDT.2024.24,
  author =	{Hu, Xiao and Sintos, Stavros},
  title =	{{Finding Smallest Witnesses for Conjunctive Queries}},
  booktitle =	{27th International Conference on Database Theory (ICDT 2024)},
  pages =	{24:1--24:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-312-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{290},
  editor =	{Cormode, Graham and Shekelyan, Michael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2024.24},
  URN =		{urn:nbn:de:0030-drops-198066},
  doi =		{10.4230/LIPIcs.ICDT.2024.24},
  annote =	{Keywords: conjunctive query, smallest witness, head-cluster, head-domination}
}
Document
Ranked Enumeration for MSO on Trees via Knowledge Compilation

Authors: Antoine Amarilli, Pierre Bourhis, Florent Capelli, and Mikaël Monet


Abstract
We study the problem of enumerating the satisfying assignments for certain circuit classes from knowledge compilation, where assignments are ranked in a specific order. In particular, we show how this problem can be used to efficiently perform ranked enumeration of the answers to MSO queries over trees, with the order being given by a ranking function satisfying a subset-monotonicity property. Assuming that the number of variables is constant, we show that we can enumerate the satisfying assignments in ranked order for so-called multivalued circuits that are smooth, decomposable, and in negation normal form (smooth multivalued DNNF). There is no preprocessing and the enumeration delay is linear in the size of the circuit times the number of values, plus a logarithmic term in the number of assignments produced so far. If we further assume that the circuit is deterministic (smooth multivalued d-DNNF), we can achieve linear-time preprocessing in the circuit, and the delay only features the logarithmic term.

Cite as

Antoine Amarilli, Pierre Bourhis, Florent Capelli, and Mikaël Monet. Ranked Enumeration for MSO on Trees via Knowledge Compilation. In 27th International Conference on Database Theory (ICDT 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 290, pp. 25:1-25:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


@InProceedings{amarilli_et_al:LIPIcs.ICDT.2024.25,
  author =	{Amarilli, Antoine and Bourhis, Pierre and Capelli, Florent and Monet, Mika\"{e}l},
  title =	{{Ranked Enumeration for MSO on Trees via Knowledge Compilation}},
  booktitle =	{27th International Conference on Database Theory (ICDT 2024)},
  pages =	{25:1--25:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-312-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{290},
  editor =	{Cormode, Graham and Shekelyan, Michael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2024.25},
  URN =		{urn:nbn:de:0030-drops-198079},
  doi =		{10.4230/LIPIcs.ICDT.2024.25},
  annote =	{Keywords: Enumeration, knowledge compilation, monadic second-order logic}
}
