LIPIcs, Volume 220

25th International Conference on Database Theory (ICDT 2022)




Event

ICDT 2022, March 29 to April 1, 2022, Edinburgh, UK (Virtual Conference)

Editors

Dan Olteanu
  • University of Zurich, Switzerland
Nils Vortmeier
  • University of Zurich, Switzerland

Publication Details

  • published at: 2022-03-19
  • Publisher: Schloss Dagstuhl – Leibniz-Zentrum für Informatik
  • ISBN: 978-3-95977-223-5
  • DBLP: db/conf/icdt/icdt2022


Documents

Document
Complete Volume
LIPIcs, Volume 220, ICDT 2022, Complete Volume

Authors: Dan Olteanu and Nils Vortmeier


Abstract
LIPIcs, Volume 220, ICDT 2022, Complete Volume

Cite as

25th International Conference on Database Theory (ICDT 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 220, pp. 1-354, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)



@Proceedings{olteanu_et_al:LIPIcs.ICDT.2022,
  title =	{{LIPIcs, Volume 220, ICDT 2022, Complete Volume}},
  booktitle =	{25th International Conference on Database Theory (ICDT 2022)},
  pages =	{1--354},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-223-5},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{220},
  editor =	{Olteanu, Dan and Vortmeier, Nils},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2022},
  URN =		{urn:nbn:de:0030-drops-158737},
  doi =		{10.4230/LIPIcs.ICDT.2022},
  annote =	{Keywords: LIPIcs, Volume 220, ICDT 2022, Complete Volume}
}
Document
Front Matter
Front Matter, Table of Contents, Preface, Conference Organization

Authors: Dan Olteanu and Nils Vortmeier


Abstract
Front Matter, Table of Contents, Preface, Conference Organization

Cite as

25th International Conference on Database Theory (ICDT 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 220, pp. 0:i-0:xvi, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)



@InProceedings{olteanu_et_al:LIPIcs.ICDT.2022.0,
  author =	{Olteanu, Dan and Vortmeier, Nils},
  title =	{{Front Matter, Table of Contents, Preface, Conference Organization}},
  booktitle =	{25th International Conference on Database Theory (ICDT 2022)},
  pages =	{0:i--0:xvi},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-223-5},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{220},
  editor =	{Olteanu, Dan and Vortmeier, Nils},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2022.0},
  URN =		{urn:nbn:de:0030-drops-158745},
  doi =		{10.4230/LIPIcs.ICDT.2022.0},
  annote =	{Keywords: Front Matter, Table of Contents, Preface, Conference Organization}
}
Document
Invited Talk
On an Information Theoretic Approach to Cardinality Estimation (Invited Talk)

Authors: Hung Q. Ngo


Abstract
This article is a companion to an invited talk at ICDT'2022 with the same title. Cardinality estimation is among the most important problems in query optimization. It is well-documented that, when query plans go haywire, in most cases one can trace the root cause to the cardinality estimator being far off. In particular, traditional cardinality estimation based on selectivity estimation may sometimes under-estimate cardinalities by orders of magnitude, because the independence or the uniformity assumptions do not typically hold. This talk outlines an approach to cardinality estimation that is "model-free" from a statistical standpoint. Being model-free means the approach tries to avoid making any distributional assumptions. Our approach is information-theoretic, and generalizes recent results on worst-case output size bounds of queries, allowing the estimator to take into account histogram information from the input relations. The estimator turns out to be the objective of a maximization problem subject to concave constraints, over an exponential number of variables. We then explain how the estimator can be computed in polynomial time for some fragment of these constraints. Overall, the talk introduces a new direction to address the classic problem of cardinality estimation that is designed to circumvent some of the pitfalls of selectivity-based estimation. We will also explain connections to other fundamental problems in information theory and database theory regarding information inequalities. The talk is based on (published and unpublished) joint works with Mahmoud Abo Khamis, Sungjin Im, Hossein Keshavarz, Phokion Kolaitis, Ben Moseley, XuanLong Nguyen, Kirk Pruhs, Dan Suciu, and Alireza Samadian Zakaria.
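The abstract refers to worst-case output size bounds of queries. As a hedged illustration of the classical bound of this kind (the AGM bound, which states that the output of a join is at most the product of |R_e|^{u_e} over all atoms for any fractional edge cover u), here is a minimal Python sketch for the triangle query Q(A,B,C) :- R(A,B), S(B,C), T(A,C). The relation sizes, the cover, and the function names are illustrative assumptions; this is not the histogram-aware estimator described in the talk.

```python
import math
from itertools import product

def is_fractional_edge_cover(atoms, cover):
    """Every variable must be covered by atoms of total weight >= 1."""
    variables = set().union(*atoms.values())
    return all(
        sum(cover[name] for name, vs in atoms.items() if v in vs) >= 1
        for v in variables
    )

def agm_bound(sizes, cover):
    """AGM bound: product over atoms of |R_e| ** u_e."""
    return math.prod(sizes[name] ** cover[name] for name in sizes)

# Triangle query Q(A,B,C) :- R(A,B), S(B,C), T(A,C) with illustrative sizes.
atoms = {"R": {"A", "B"}, "S": {"B", "C"}, "T": {"A", "C"}}
sizes = {"R": 100, "S": 100, "T": 100}
cover = {"R": 0.5, "S": 0.5, "T": 0.5}

bound = agm_bound(sizes, cover)

# A complete instance over a 10-element domain (R = S = T = all pairs) attains the bound.
pairs = set(product(range(10), repeat=2))
triangles = sum(
    1 for a, b in pairs for c in range(10) if (b, c) in pairs and (a, c) in pairs
)
print(bound, triangles)  # 1000.0 1000
```

The complete instance shows that the bound can be tight, which is why estimators built on such bounds avoid the under-estimation caused by independence assumptions.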

Cite as

Hung Q. Ngo. On an Information Theoretic Approach to Cardinality Estimation (Invited Talk). In 25th International Conference on Database Theory (ICDT 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 220, pp. 1:1-1:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)



@InProceedings{ngo:LIPIcs.ICDT.2022.1,
  author =	{Ngo, Hung Q.},
  title =	{{On an Information Theoretic Approach to Cardinality Estimation}},
  booktitle =	{25th International Conference on Database Theory (ICDT 2022)},
  pages =	{1:1--1:21},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-223-5},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{220},
  editor =	{Olteanu, Dan and Vortmeier, Nils},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2022.1},
  URN =		{urn:nbn:de:0030-drops-158750},
  doi =		{10.4230/LIPIcs.ICDT.2022.1},
  annote =	{Keywords: Cardinality Estimation, Information Theory, Polymatroid Bound, Worst-case Optimal Join}
}
Document
Invited Talk
Counting the Solutions to a Query (Invited Talk)

Authors: Marcelo Arenas


Abstract
In this talk, we consider the problem of counting the solutions to a query. Our first motivating scenario is the use of regular expressions to extract paths from a graph database. More specifically, given a graph database D, a regular expression r and a natural number n, consider the problem of counting the number of paths p in D such that p conforms to r and the length of p is n. This problem is known to be hard, namely #P-complete. In this talk, we show that this problem admits a fully polynomial-time randomized approximation scheme (FPRAS). Remarkably, the key idea to prove this result is to show that the fundamental problem #NFA admits an FPRAS, where #NFA is the problem of counting the number of strings of length n accepted by a non-deterministic finite automaton (NFA). While this problem is known to be #P-complete and, more precisely, SpanL-complete, it was open whether this problem admits an FPRAS. In this work, we solve this open problem and obtain as a welcome corollary that every function in SpanL admits an FPRAS. As a second motivating scenario, we consider the widely used class of conjunctive queries over relational databases. More specifically, for every class C of conjunctive queries with bounded treewidth, we introduce the first FPRAS for counting the answers to a query in C. In fact, our FPRAS is more general, and also applies to conjunctive queries with bounded hypertree width, as well as unions of such queries. As for the case of graph databases, the key ingredient in our proof is the resolution of a fundamental counting problem from automata theory. Specifically, we show that the problem #TA admits an FPRAS, where #TA is the problem of counting the number of trees of size n accepted by a tree automaton (TA). This talk is based on the results presented in [Marcelo Arenas et al., 2021; Marcelo Arenas et al., 2021].
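To make the #NFA problem concrete, the sketch below counts the strings of length n accepted by an NFA exactly, using an on-the-fly subset construction, and cross-checks the result by brute force. It is an exponential-time baseline under stated assumptions (a transition dict keyed by (state, symbol), a hypothetical example automaton), not the FPRAS from the talk. Counting reachable state sets rather than runs matters: a string with several accepting runs must be counted once.

```python
from collections import defaultdict
from itertools import product

def count_accepted(alphabet, delta, start, accepting, n):
    """Exact #NFA via on-the-fly subset construction (exponential in the worst case).

    Each string of length i determines exactly one set of reachable states,
    so layer[S] counts the strings whose reachable set is S.
    """
    layer = {frozenset(start): 1}
    for _ in range(n):
        nxt = defaultdict(int)
        for states, cnt in layer.items():
            for a in alphabet:
                succ = frozenset(q2 for q in states for q2 in delta.get((q, a), ()))
                nxt[succ] += cnt
        layer = nxt
    return sum(cnt for states, cnt in layer.items() if states & accepting)

def count_bruteforce(alphabet, delta, start, accepting, n):
    """Reference check: simulate the NFA on every string of length n."""
    total = 0
    for word in product(alphabet, repeat=n):
        states = set(start)
        for a in word:
            states = {q2 for q in states for q2 in delta.get((q, a), ())}
        total += bool(states & accepting)
    return total

# Hypothetical NFA over {a, b} accepting the strings that contain "ab".
delta = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}, (2, "a"): {2}, (2, "b"): {2}}
print(count_accepted("ab", delta, {0}, {2}, 4))  # 11
```

Of the 16 strings of length 4, only the five of the form b*a* avoid "ab", which gives 11.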

Cite as

Marcelo Arenas. Counting the Solutions to a Query (Invited Talk). In 25th International Conference on Database Theory (ICDT 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 220, p. 2:1, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)



@InProceedings{arenas:LIPIcs.ICDT.2022.2,
  author =	{Arenas, Marcelo},
  title =	{{Counting the Solutions to a Query}},
  booktitle =	{25th International Conference on Database Theory (ICDT 2022)},
  pages =	{2:1--2:1},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-223-5},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{220},
  editor =	{Olteanu, Dan and Vortmeier, Nils},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2022.2},
  URN =		{urn:nbn:de:0030-drops-158763},
  doi =		{10.4230/LIPIcs.ICDT.2022.2},
  annote =	{Keywords: Counting, query answering, fully polynomial-time randomized approximation scheme}
}
Document
Invited Talk
Answering Unions of Conjunctive Queries with Ideal Time Guarantees (Invited Talk)

Authors: Nofar Carmeli


Abstract
The holy grail we strive for is, given a query, to identify an algorithm that answers it over general databases with optimal time guarantees for the specific query. In this tutorial, we focus on what can be seen as ideal time guarantees: linear preprocessing (needed to read the input) and constant time per answer (needed to print the output). We seek to understand which queries can be solved with these (or almost these) time guarantees and how. We start with the basic building blocks of database queries, joins, and gradually increase the expressivity by introducing projections and unions until we cover positive relational algebra. We first consider the task of enumerating all query answers and then discuss related, more demanding tasks such as ordered enumeration and direct access to query answers. We investigate the challenges in answering such queries and provide algorithms and conditional lower bounds.
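The two-phase guarantee (linear preprocessing, then constant time per answer) can be sketched for the simplest join Q(A,B,C) :- R(A,B), S(B,C). This minimal Python sketch assumes relations given as lists of pairs and constant-time hash lookups; the function names are hypothetical. The semijoin R ⋉ S removes dangling R-tuples during preprocessing, so the enumeration phase never does work that fails to produce an answer.

```python
from collections import defaultdict

def preprocess(R, S):
    """Linear-time preprocessing: hash S on B, then keep only R-tuples with a partner."""
    buckets = defaultdict(list)
    for b, c in S:
        buckets[b].append(c)
    index = dict(buckets)  # plain dict: lookups must not create empty buckets
    reduced = [(a, b) for (a, b) in R if b in index]  # semijoin R ⋉ S
    return reduced, index

def enumerate_answers(reduced, index):
    """Every reduced R-tuple has at least one partner, so each step yields an answer."""
    for a, b in reduced:
        for c in index[b]:
            yield (a, b, c)

R = [(1, "x"), (2, "y"), (3, "z")]
S = [("x", 10), ("x", 11), ("y", 20)]
reduced, index = preprocess(R, S)
print(list(enumerate_answers(reduced, index)))
# [(1, 'x', 10), (1, 'x', 11), (2, 'y', 20)]
```

The tuple (3, "z") is discarded during preprocessing; without that step the enumeration loop could spend unbounded time on tuples that produce no output.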

Cite as

Nofar Carmeli. Answering Unions of Conjunctive Queries with Ideal Time Guarantees (Invited Talk). In 25th International Conference on Database Theory (ICDT 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 220, p. 3:1, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)



@InProceedings{carmeli:LIPIcs.ICDT.2022.3,
  author =	{Carmeli, Nofar},
  title =	{{Answering Unions of Conjunctive Queries with Ideal Time Guarantees}},
  booktitle =	{25th International Conference on Database Theory (ICDT 2022)},
  pages =	{3:1--3:1},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-223-5},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{220},
  editor =	{Olteanu, Dan and Vortmeier, Nils},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2022.3},
  URN =		{urn:nbn:de:0030-drops-158771},
  doi =		{10.4230/LIPIcs.ICDT.2022.3},
  annote =	{Keywords: query evaluation, enumeration, fine-grained complexity, constant delay, union of conjunctive queries}
}
Document
On the Hardness of Category Tree Construction

Authors: Shay Gershtein, Uri Avron, Ido Guy, Tova Milo, and Slava Novgorodov


Abstract
Category trees, or taxonomies, are rooted trees where each node, called a category, corresponds to a set of related items. The construction of taxonomies has been studied in various domains, including e-commerce, document management, and question answering. Multiple algorithms for automating construction have been proposed, employing a variety of clustering approaches and crowdsourcing. However, no formal model to capture such categorization problems has been devised, and their complexity has not been studied. To address this, we propose in this work a combinatorial model that captures many practical settings and show that the aforementioned empirical approach has been warranted, as we prove strong inapproximability bounds for various problem variants and special cases when the goal is to produce a categorization of maximum utility. In our model, the input is a set of n weighted item sets that the tree would ideally contain as categories. Each category, rather than perfectly matching the corresponding input set, is allowed to exceed a given threshold for a given similarity function. The goal is to produce a tree that maximizes the total weight of the sets for which it contains a matching category. A key parameter is an upper bound on the number of categories an item may belong to, which produces the hardness of the problem, as initially each item may be contained in an arbitrary number of input sets. For this model, we prove inapproximability bounds, of order Θ̃(√n) or Θ̃(n), for various problem variants and special cases, loosely justifying the aforementioned heuristic approach. Our work includes reductions based on parameterized randomized constructions that highlight how various problem parameters and properties of the input may affect the hardness.
Moreover, for the special case where the category must be identical to the corresponding input set, we devise an algorithm whose approximation guarantee depends solely on a more granular parameter, allowing improved worst-case guarantees. Finally, we also generalize our results to DAG-based and non-hierarchical categorization.

Cite as

Shay Gershtein, Uri Avron, Ido Guy, Tova Milo, and Slava Novgorodov. On the Hardness of Category Tree Construction. In 25th International Conference on Database Theory (ICDT 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 220, pp. 4:1-4:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)



@InProceedings{gershtein_et_al:LIPIcs.ICDT.2022.4,
  author =	{Gershtein, Shay and Avron, Uri and Guy, Ido and Milo, Tova and Novgorodov, Slava},
  title =	{{On the Hardness of Category Tree Construction}},
  booktitle =	{25th International Conference on Database Theory (ICDT 2022)},
  pages =	{4:1--4:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-223-5},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{220},
  editor =	{Olteanu, Dan and Vortmeier, Nils},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2022.4},
  URN =		{urn:nbn:de:0030-drops-158785},
  doi =		{10.4230/LIPIcs.ICDT.2022.4},
  annote =	{Keywords: maximum independent set, approximation algorithms, approximation hardness bounds, taxonomy construction, category tree construction}
}
Document
Linear Programs with Conjunctive Queries

Authors: Florent Capelli, Nicolas Crosetti, Joachim Niehren, and Jan Ramon


Abstract
In this paper, we study the problem of optimizing a linear program whose variables are the answers to a conjunctive query. For this we propose the language LP(CQ) for specifying linear programs whose constraints and objective functions depend on the answer sets of conjunctive queries. We contribute an efficient algorithm for solving programs in a fragment of LP(CQ). The naive approach constructs a linear program having as many variables as there are elements in the answer set of the queries. Our approach constructs a linear program having the same optimal value but fewer variables. This is done by exploiting the structure of the conjunctive queries using generalized hypertree decompositions of small width to factorize elements of the answer set together. We illustrate the various applications of LP(CQ) programs on three examples: optimizing deliveries of resources, minimizing noise for differential privacy, and computing the s-measure of patterns in graphs as needed for data mining.

Cite as

Florent Capelli, Nicolas Crosetti, Joachim Niehren, and Jan Ramon. Linear Programs with Conjunctive Queries. In 25th International Conference on Database Theory (ICDT 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 220, pp. 5:1-5:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)



@InProceedings{capelli_et_al:LIPIcs.ICDT.2022.5,
  author =	{Capelli, Florent and Crosetti, Nicolas and Niehren, Joachim and Ramon, Jan},
  title =	{{Linear Programs with Conjunctive Queries}},
  booktitle =	{25th International Conference on Database Theory (ICDT 2022)},
  pages =	{5:1--5:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-223-5},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{220},
  editor =	{Olteanu, Dan and Vortmeier, Nils},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2022.5},
  URN =		{urn:nbn:de:0030-drops-158796},
  doi =		{10.4230/LIPIcs.ICDT.2022.5},
  annote =	{Keywords: Database queries, linear programming, hypergraph decomposition}
}
Document
Certifiable Robustness for Nearest Neighbor Classifiers

Authors: Austen Z. Fan and Paraschos Koutris


Abstract
ML models are typically trained using large datasets of high quality. However, training datasets often contain inconsistent or incomplete data. To tackle this issue, one solution is to develop algorithms that can check whether a prediction of a model is certifiably robust. Given a learning algorithm that produces a classifier and given an example at test time, a classification outcome is certifiably robust if it is predicted by every model trained across all possible worlds (repairs) of the uncertain (inconsistent) dataset. This notion of robustness falls naturally under the framework of certain answers. In this paper, we study the complexity of certifying robustness for a simple but widely deployed classification algorithm, k-Nearest Neighbors (k-NN). Our main focus is on inconsistent datasets when the integrity constraints are functional dependencies (FDs). For this setting, we establish a dichotomy in the complexity of certifying robustness w.r.t. the set of FDs: the problem either admits a polynomial time algorithm, or it is coNP-hard. Additionally, we exhibit a similar dichotomy for the counting version of the problem, where the goal is to count the number of possible worlds that predict a certain label. As a byproduct of our study, we also establish the complexity of a problem related to finding an optimal subset repair that may be of independent interest.
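The definition of certifiable robustness can be checked directly, though in exponential time, by enumerating every repair. The Python sketch below does this for 1-NN under the simplest FD case, a key constraint, where every subset repair keeps exactly one tuple per key value. The tuple layout (key, point, label), Euclidean distance, tie-breaking by tuple order, and the data are all illustrative assumptions. The paper's dichotomy concerns when this check can be done in polynomial time, which this brute-force sketch does not attempt.

```python
import math
from collections import Counter, defaultdict
from itertools import product

def repairs_under_key(tuples):
    """Subset repairs for a key FD: pick exactly one tuple per key value."""
    blocks = defaultdict(list)
    for t in tuples:
        blocks[t[0]].append(t)
    yield from product(*blocks.values())

def nn_label(dataset, x):
    """1-NN prediction with Euclidean distance; ties go to the earliest tuple."""
    _, _, label = min(dataset, key=lambda t: math.dist(t[1], x))
    return label

def prediction_counts(tuples, x):
    """Counting version: how many repairs predict each label."""
    return Counter(nn_label(r, x) for r in repairs_under_key(tuples))

def certifiably_robust(tuples, x):
    """Robust iff every repair yields the same prediction."""
    return len(prediction_counts(tuples, x)) == 1

# Hypothetical data (key, point, label); key "p1" has two conflicting tuples.
data = [
    ("p1", (0.0, 0.0), "red"),
    ("p1", (5.0, 5.0), "blue"),
    ("p2", (1.0, 1.0), "red"),
]
print(certifiably_robust(data, (0.2, 0.2)))  # True
print(certifiably_robust(data, (4.0, 4.0)))  # False
```

Near the origin both repairs predict "red"; near (4, 4) the repair that keeps the blue tuple for "p1" flips the prediction.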

Cite as

Austen Z. Fan and Paraschos Koutris. Certifiable Robustness for Nearest Neighbor Classifiers. In 25th International Conference on Database Theory (ICDT 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 220, pp. 6:1-6:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)



@InProceedings{fan_et_al:LIPIcs.ICDT.2022.6,
  author =	{Fan, Austen Z. and Koutris, Paraschos},
  title =	{{Certifiable Robustness for Nearest Neighbor Classifiers}},
  booktitle =	{25th International Conference on Database Theory (ICDT 2022)},
  pages =	{6:1--6:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-223-5},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{220},
  editor =	{Olteanu, Dan and Vortmeier, Nils},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2022.6},
  URN =		{urn:nbn:de:0030-drops-158809},
  doi =		{10.4230/LIPIcs.ICDT.2022.6},
  annote =	{Keywords: Inconsistent databases, k-NN classification, certifiable robustness}
}
Document
Improved Approximation and Scalability for Fair Max-Min Diversification

Authors: Raghavendra Addanki, Andrew McGregor, Alexandra Meliou, and Zafeiria Moumoulidou


Abstract
Given an n-point metric space (𝒳, d) where each point belongs to one of m = O(1) different categories or groups and a set of integers k₁, …, k_m, the fair Max-Min diversification problem is to select k_i points belonging to category i ∈ [m], such that the minimum pairwise distance between selected points is maximized. The problem was introduced by Moumoulidou et al. [ICDT 2021] and is motivated by the need to down-sample large data sets in various applications so that the derived sample achieves a balance between diversity, i.e., the minimum distance between a pair of selected points, and fairness, i.e., ensuring enough points of each category are included. We prove the following results: 1) We first consider general metric spaces. We present a randomized polynomial-time algorithm that returns a factor-2 approximation to the diversity but only satisfies the fairness constraints in expectation. Building upon this result, we present a 6-approximation that is guaranteed to satisfy the fairness constraints up to a factor 1-ε for any constant ε. We also present a linear-time algorithm returning an (m+1)-approximation with exact fairness. The best previous result was a (3m-1)-approximation. 2) We then focus on Euclidean metrics. We first show that the problem can be solved exactly in one dimension. For constant dimensions, categories and any constant ε > 0, we present a (1+ε)-approximation algorithm that runs in O(nk) + 2^{O(k)} time where k = k₁+…+k_m. We can improve the running time to O(nk)+poly(k) at the expense of only picking (1-ε) k_i points from category i ∈ [m]. Finally, we present algorithms suitable for processing massive data sets, including single-pass data stream algorithms and composable coresets for distributed processing.
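To pin down the objective, here is an exhaustive Python solver for tiny instances of fair Max-Min diversification. It is exponential in k and is not any of the paper's algorithms; it is only a reference that such approximations could be compared against. Euclidean distance, the category names, and the points are illustrative assumptions.

```python
import math
from itertools import combinations, product

def diversity(points):
    """Minimum pairwise Euclidean distance of a selection (needs >= 2 points)."""
    return min(math.dist(p, q) for p, q in combinations(points, 2))

def fair_max_min_bruteforce(points_by_cat, k):
    """Try every way to pick exactly k[c] points from each category c."""
    best, best_sel = -math.inf, None
    per_cat = [combinations(points_by_cat[c], k[c]) for c in sorted(k)]
    for choice in product(*per_cat):
        sel = [p for group in choice for p in group]
        d = diversity(sel)
        if d > best:
            best, best_sel = d, sel
    return best, best_sel

# Hypothetical one-dimensional instance with two categories.
points_by_cat = {
    "red": [(0.0,), (1.0,), (4.0,)],
    "blue": [(2.0,), (6.0,)],
}
k = {"red": 2, "blue": 1}
best, sel = fair_max_min_bruteforce(points_by_cat, k)
print(best)  # 2.0
```

Fairness holds by construction, since each candidate takes exactly k_i points from category i; only the diversity is optimized.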

Cite as

Raghavendra Addanki, Andrew McGregor, Alexandra Meliou, and Zafeiria Moumoulidou. Improved Approximation and Scalability for Fair Max-Min Diversification. In 25th International Conference on Database Theory (ICDT 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 220, pp. 7:1-7:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)



@InProceedings{addanki_et_al:LIPIcs.ICDT.2022.7,
  author =	{Addanki, Raghavendra and McGregor, Andrew and Meliou, Alexandra and Moumoulidou, Zafeiria},
  title =	{{Improved Approximation and Scalability for Fair Max-Min Diversification}},
  booktitle =	{25th International Conference on Database Theory (ICDT 2022)},
  pages =	{7:1--7:21},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-223-5},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{220},
  editor =	{Olteanu, Dan and Vortmeier, Nils},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2022.7},
  URN =		{urn:nbn:de:0030-drops-158812},
  doi =		{10.4230/LIPIcs.ICDT.2022.7},
  annote =	{Keywords: algorithmic fairness, diversity maximization, data selection, approximation algorithms}
}
Document
Rewriting with Acyclic Queries: Mind Your Head

Authors: Gaetano Geck, Jens Keppeler, Thomas Schwentick, and Christopher Spinrath


Abstract
The paper studies the rewriting problem, that is, the decision problem whether, for a given conjunctive query Q and a set 𝒱 of views, there is a conjunctive query Q' over 𝒱 that is equivalent to Q, for cases where the query, the views, and/or the desired rewriting are acyclic or even more restricted. It shows that, if Q itself is acyclic, an acyclic rewriting exists if there is any rewriting. An analogous statement also holds for free-connex acyclic, hierarchical, and q-hierarchical queries. Regarding the complexity of the rewriting problem, the paper identifies a border between tractable and (presumably) intractable variants of the rewriting problem: for schemas of bounded arity, the acyclic rewriting problem is NP-hard, even if both Q and the views in 𝒱 are acyclic or hierarchical. However, it becomes tractable if the views are free-connex acyclic (i.e., in a nutshell, their body is (i) acyclic and (ii) remains acyclic if their head is added as an additional atom).
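The acyclicity notions in the abstract can be tested mechanically. The Python sketch below uses the standard GYO reduction for α-acyclicity (repeatedly delete vertices that occur in only one hyperedge and hyperedges contained in another one; the hypergraph is acyclic iff nothing remains), and then applies the parenthetical definition above: a query is free-connex acyclic if its body is acyclic and stays acyclic when the head is added as an extra atom. Representing atoms as sets of variable names is an assumption of this sketch, and it decides only acyclicity, not the rewriting problem itself.

```python
from collections import Counter

def is_acyclic(hyperedges):
    """GYO reduction: the hypergraph is alpha-acyclic iff it reduces to nothing."""
    edges = [set(e) for e in hyperedges]
    changed = True
    while changed:
        changed = False
        # Rule 1: delete vertices that occur in exactly one hyperedge.
        counts = Counter(v for e in edges for v in e)
        for e in edges:
            lonely = {v for v in e if counts[v] == 1}
            if lonely:
                e -= lonely
                changed = True
        # Rule 2: delete one hyperedge that is empty or contained in another one.
        for i, e in enumerate(edges):
            if not e or any(j != i and e <= f for j, f in enumerate(edges)):
                del edges[i]
                changed = True
                break
    return not edges

def is_free_connex(body, head):
    """Body acyclic, and still acyclic with the head added as an extra atom."""
    return is_acyclic(body) and is_acyclic(list(body) + [set(head)])

body = [{"A", "B"}, {"B", "C"}]  # R(A,B), S(B,C)
print(is_free_connex(body, {"A", "B"}))  # True:  Q(A,B) :- R(A,B), S(B,C)
print(is_free_connex(body, {"A", "C"}))  # False: Q(A,C) :- R(A,B), S(B,C)
```

The second query shows why the head matters: its body is acyclic, but adding the head {A, C} closes a triangle.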

Cite as

Gaetano Geck, Jens Keppeler, Thomas Schwentick, and Christopher Spinrath. Rewriting with Acyclic Queries: Mind Your Head. In 25th International Conference on Database Theory (ICDT 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 220, pp. 8:1-8:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)



@InProceedings{geck_et_al:LIPIcs.ICDT.2022.8,
  author =	{Geck, Gaetano and Keppeler, Jens and Schwentick, Thomas and Spinrath, Christopher},
  title =	{{Rewriting with Acyclic Queries: Mind Your Head}},
  booktitle =	{25th International Conference on Database Theory (ICDT 2022)},
  pages =	{8:1--8:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-223-5},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{220},
  editor =	{Olteanu, Dan and Vortmeier, Nils},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2022.8},
  URN =		{urn:nbn:de:0030-drops-158829},
  doi =		{10.4230/LIPIcs.ICDT.2022.8},
  annote =	{Keywords: rewriting, acyclic rewriting, acyclic conjunctive queries, free-connex queries, hierarchical queries, NP-hardness}
}
Document
Parallel Acyclic Joins with Canonical Edge Covers

Authors: Yufei Tao


Abstract
In PODS'21, Hu presented an algorithm in the massively parallel computation (MPC) model that processes any acyclic join with an asymptotically optimal load. In this paper, we present an alternative analysis of her algorithm. The novelty of our analysis is in the revelation of a new mathematical structure - which we name canonical edge cover - for acyclic hypergraphs. We prove non-trivial properties for canonical edge covers that offer us a graph-theoretic perspective about why Hu’s algorithm works.

Cite as

Yufei Tao. Parallel Acyclic Joins with Canonical Edge Covers. In 25th International Conference on Database Theory (ICDT 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 220, pp. 9:1-9:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)



@InProceedings{tao:LIPIcs.ICDT.2022.9,
  author =	{Tao, Yufei},
  title =	{{Parallel Acyclic Joins with Canonical Edge Covers}},
  booktitle =	{25th International Conference on Database Theory (ICDT 2022)},
  pages =	{9:1--9:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-223-5},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{220},
  editor =	{Olteanu, Dan and Vortmeier, Nils},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2022.9},
  URN =		{urn:nbn:de:0030-drops-158838},
  doi =		{10.4230/LIPIcs.ICDT.2022.9},
  annote =	{Keywords: Joins, Conjunctive Queries, MPC Algorithms, Parallel Computing}
}
Document
Splitting Spanner Atoms: A Tool for Acyclic Core Spanners

Authors: Dominik D. Freydenberger and Sam M. Thompson


Abstract
This paper investigates regex CQs with string equalities (SERCQs), a subclass of core spanners. As shown by Freydenberger, Kimelfeld, and Peterfreund (PODS 2018), these queries are intractable, even if restricted to acyclic queries. This previous result defines acyclicity by treating regex formulas as atoms. In contrast to this, we propose an alternative definition by converting SERCQs into FC-CQs, that is, conjunctive queries in FC, a logic based on word equations. We introduce a way to decompose word equations of unbounded arity into a conjunction of binary word equations. If the result of the decomposition is acyclic, then evaluation and enumeration of results become tractable. The main result of this work is an algorithm that decides in polynomial time whether an FC-CQ can be decomposed into an acyclic FC-CQ. We also give an efficient conversion from synchronized SERCQs to FC-CQs with regular constraints. As a consequence, tractability results for acyclic relational CQs directly translate to a large class of SERCQs.

Cite as

Dominik D. Freydenberger and Sam M. Thompson. Splitting Spanner Atoms: A Tool for Acyclic Core Spanners. In 25th International Conference on Database Theory (ICDT 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 220, pp. 10:1-10:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)



@InProceedings{freydenberger_et_al:LIPIcs.ICDT.2022.10,
  author =	{Freydenberger, Dominik D. and Thompson, Sam M.},
  title =	{{Splitting Spanner Atoms: A Tool for Acyclic Core Spanners}},
  booktitle =	{25th International Conference on Database Theory (ICDT 2022)},
  pages =	{10:1--10:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-223-5},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{220},
  editor =	{Olteanu, Dan and Vortmeier, Nils},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2022.10},
  URN =		{urn:nbn:de:0030-drops-158843},
  doi =		{10.4230/LIPIcs.ICDT.2022.10},
  annote =	{Keywords: Document spanners, information extraction, conjunctive queries}
}
Document
Practical Relational Calculus Query Evaluation

Authors: Martin Raszyk, David Basin, Srđan Krstić, and Dmitriy Traytel


Abstract
The relational calculus (RC) is a concise, declarative query language. However, existing RC query evaluation approaches are inefficient and often deviate from established algorithms based on finite tables used in database management systems. We devise a new translation of an arbitrary RC query into two safe-range queries, for which the finiteness of the query’s evaluation result is guaranteed. Assuming an infinite domain, the two queries have the following meaning: The first is closed and characterizes the original query’s relative safety, i.e., whether, given a fixed database, the original query evaluates to a finite relation. The second safe-range query is equivalent to the original query if the latter is relatively safe. We compose our translation with other, more standard ones to ultimately obtain two SQL queries. This allows us to use standard database management systems to evaluate arbitrary RC queries. We show that our translation improves the time complexity over existing approaches, which we also empirically confirm in both realistic and synthetic experiments.

Cite as

Martin Raszyk, David Basin, Srđan Krstić, and Dmitriy Traytel. Practical Relational Calculus Query Evaluation. In 25th International Conference on Database Theory (ICDT 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 220, pp. 11:1-11:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)



@InProceedings{raszyk_et_al:LIPIcs.ICDT.2022.11,
  author =	{Raszyk, Martin and Basin, David and Krsti\'{c}, Sr{\dj}an and Traytel, Dmitriy},
  title =	{{Practical Relational Calculus Query Evaluation}},
  booktitle =	{25th International Conference on Database Theory (ICDT 2022)},
  pages =	{11:1--11:21},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-223-5},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{220},
  editor =	{Olteanu, Dan and Vortmeier, Nils},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2022.11},
  URN =		{urn:nbn:de:0030-drops-158857},
  doi =		{10.4230/LIPIcs.ICDT.2022.11},
  annote =	{Keywords: Relational calculus, relative safety, safe-range, query translation}
}
Document
Characterising Fixed Parameter Tractability for Query Evaluation over Guarded TGDs

Authors: Cristina Feier


Abstract
We consider the parameterized complexity of evaluating Ontology Mediated Queries (OMQ) based on Guarded TGDs (GTGD) and Unions of Conjunctive Queries, in the case where relational symbols have unrestricted arity and where the parameter is the size of the OMQ. We establish exact criteria for fixed-parameter tractable (fpt) evaluation of recursively enumerable (r.e.) classes of such OMQs (under the widely held Exponential Time Hypothesis). One of the main technical tools introduced in the paper is an fpt-reduction from deciding parameterized uniform CSPs to parameterized OMQ evaluation. The reduction preserves measures known to be essential for classifying r.e. classes of parameterized uniform CSPs: submodular width (according to the well-known result of Marx for unrestricted-arity schemas) and treewidth (according to the well-known result of Grohe for bounded-arity schemas). As such, it can be employed to obtain hardness results for evaluation of r.e. classes of parameterized OMQs based on GTGD both in the unrestricted-arity and in the bounded-arity case. Previously, for bounded-arity schemas, this has been tackled using a technique requiring full introspection into the construction employed by Grohe.

Cite as

Cristina Feier. Characterising Fixed Parameter Tractability for Query Evaluation over Guarded TGDs. In 25th International Conference on Database Theory (ICDT 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 220, pp. 12:1-12:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


@InProceedings{feier:LIPIcs.ICDT.2022.12,
  author =	{Feier, Cristina},
  title =	{{Characterising Fixed Parameter Tractability for Query Evaluation over Guarded TGDs}},
  booktitle =	{25th International Conference on Database Theory (ICDT 2022)},
  pages =	{12:1--12:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-223-5},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{220},
  editor =	{Olteanu, Dan and Vortmeier, Nils},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2022.12},
  URN =		{urn:nbn:de:0030-drops-158869},
  doi =		{10.4230/LIPIcs.ICDT.2022.12},
  annote =	{Keywords: omq, fpt evaluation, guarded tgds, unbounded arity, submodular width}
}
Document
Tuple-Generating Dependencies Capture Complex Values

Authors: Maximilian Marx and Markus Krötzsch


Abstract
We formalise a variant of Datalog that allows complex values constructed by nesting elements of the input database in sets and tuples. We study its complexity and give a translation into sets of tuple-generating dependencies (TGDs) for which the standard chase terminates on any input database. We identify a fragment for which reasoning is tractable. As membership is undecidable for this fragment, we develop decidable sufficient conditions.

Cite as

Maximilian Marx and Markus Krötzsch. Tuple-Generating Dependencies Capture Complex Values. In 25th International Conference on Database Theory (ICDT 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 220, pp. 13:1-13:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


@InProceedings{marx_et_al:LIPIcs.ICDT.2022.13,
  author =	{Marx, Maximilian and Kr\"{o}tzsch, Markus},
  title =	{{Tuple-Generating Dependencies Capture Complex Values}},
  booktitle =	{25th International Conference on Database Theory (ICDT 2022)},
  pages =	{13:1--13:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-223-5},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{220},
  editor =	{Olteanu, Dan and Vortmeier, Nils},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2022.13},
  URN =		{urn:nbn:de:0030-drops-158876},
  doi =		{10.4230/LIPIcs.ICDT.2022.13},
  annote =	{Keywords: terminating standard chase, existential rules, Datalog, complexity}
}
Document
Inference of Shape Graphs for Graph Databases

Authors: Benoît Groz, Aurélien Lemay, Sławek Staworko, and Piotr Wieczorek


Abstract
We investigate the problem of constructing a shape graph that describes the structure of a given graph database. We employ the framework of grammatical inference, where the objective is to find an inference algorithm that is both sound, i.e., always producing a schema that validates the input graph, and complete, i.e., able to produce any schema within a given class of schemas, provided that a sufficiently informative input graph is presented. We identify a number of fundamental limitations that preclude feasible inference. We present inference algorithms based on natural approaches that allow one to infer schemas that we argue are of practical importance.

Cite as

Benoît Groz, Aurélien Lemay, Sławek Staworko, and Piotr Wieczorek. Inference of Shape Graphs for Graph Databases. In 25th International Conference on Database Theory (ICDT 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 220, pp. 14:1-14:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


@InProceedings{groz_et_al:LIPIcs.ICDT.2022.14,
  author =	{Groz, Beno\^{i}t and Lemay, Aur\'{e}lien and Staworko, S{\l}awek and Wieczorek, Piotr},
  title =	{{Inference of Shape Graphs for Graph Databases}},
  booktitle =	{25th International Conference on Database Theory (ICDT 2022)},
  pages =	{14:1--14:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-223-5},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{220},
  editor =	{Olteanu, Dan and Vortmeier, Nils},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2022.14},
  URN =		{urn:nbn:de:0030-drops-158889},
  doi =		{10.4230/LIPIcs.ICDT.2022.14},
  annote =	{Keywords: RDF, Schema, Inference, Learning, Fitting, Minimality, Containment}
}
Document
Expressiveness of SHACL Features

Authors: Bart Bogaerts, Maxime Jakubowski, and Jan Van den Bussche


Abstract
SHACL is a W3C-proposed schema language for expressing structural constraints on RDF graphs. Recent work on formalizing this language has revealed a striking relationship to description logics. SHACL expressions can use four fundamental features that are not so common in description logics. These features are zero-or-one path expressions; equality tests; disjointness tests; and closure constraints. Moreover, SHACL is peculiar in allowing only a restricted form of expressions (so-called targets) on the left-hand side of inclusion constraints. The goal of this paper is to obtain a clear picture of the impact and expressiveness of these features and restrictions. We show that each of the four features is primitive: using the feature, one can express Boolean queries that are not expressible without using the feature. We also show that the restriction that SHACL imposes on allowed targets is inessential, as long as closure constraints are not used.
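The four features named above can be made concrete on a toy graph. The following Python sketch is our own illustration, not the paper's formalism and not the API of any SHACL engine: the triple-set encoding of the graph, the ("?", path) encoding of zero-or-one paths, and the function names values, eq, disj and closed are hypothetical. Targets and the rest of the SHACL path language are deliberately left out; the sketch only checks the four features for a single focus node.

```python
from typing import FrozenSet, Set, Tuple, Union

Triple = Tuple[str, str, str]
# Hypothetical path encoding: a property name, or ("?", path) for zero-or-one.
Path = Union[str, Tuple[str, "Path"]]


def values(graph: Set[Triple], node: str, path: Path) -> Set[str]:
    """Nodes reachable from `node` via `path` in `graph`."""
    if isinstance(path, str):
        return {o for (s, p, o) in graph if s == node and p == path}
    op, sub = path
    if op != "?":
        raise ValueError(f"unsupported path operator: {op}")
    # Zero-or-one: the node itself (zero steps) plus the values of one step of sub.
    return {node} | values(graph, node, sub)


def eq(graph: Set[Triple], node: str, path: Path, prop: str) -> bool:
    # Equality test: the path values coincide with the values of property prop.
    return values(graph, node, path) == values(graph, node, prop)


def disj(graph: Set[Triple], node: str, path: Path, prop: str) -> bool:
    # Disjointness test: the path values and the prop values share no node.
    return values(graph, node, path).isdisjoint(values(graph, node, prop))


def closed(graph: Set[Triple], node: str, allowed: FrozenSet[str]) -> bool:
    # Closure constraint: the node has no outgoing property outside `allowed`.
    return all(p in allowed for (s, p, _) in graph if s == node)
```

For example, on a graph where alice has a knows edge and a friend edge to bob, and nothing else, eq(g, "alice", "knows", "friend") holds, while closed(g, "alice", frozenset({"knows"})) fails because of the friend edge.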

Cite as

Bart Bogaerts, Maxime Jakubowski, and Jan Van den Bussche. Expressiveness of SHACL Features. In 25th International Conference on Database Theory (ICDT 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 220, pp. 15:1-15:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


@InProceedings{bogaerts_et_al:LIPIcs.ICDT.2022.15,
  author =	{Bogaerts, Bart and Jakubowski, Maxime and Van den Bussche, Jan},
  title =	{{Expressiveness of SHACL Features}},
  booktitle =	{25th International Conference on Database Theory (ICDT 2022)},
  pages =	{15:1--15:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-223-5},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{220},
  editor =	{Olteanu, Dan and Vortmeier, Nils},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2022.15},
  URN =		{urn:nbn:de:0030-drops-158890},
  doi =		{10.4230/LIPIcs.ICDT.2022.15},
  annote =	{Keywords: Expressive power, schema languages}
}
Document
Robustness Against Read Committed for Transaction Templates with Functional Constraints

Authors: Brecht Vandevoort, Bas Ketsman, Christoph Koch, and Frank Neven


Abstract
The popular isolation level Multiversion Read Committed (RC) trades some of the strong guarantees of serializability for increased transaction throughput. Sometimes, transaction workloads can be safely executed under RC obtaining serializability at the lower cost of RC. Such workloads are said to be robust against RC. Previous work has yielded a tractable procedure for deciding robustness against RC for workloads generated by transaction programs modeled as transaction templates. An important insight of that work is that, by more accurately modeling transaction programs, we are able to recognize larger sets of workloads as robust. In this work, we increase the modeling power of transaction templates by extending them with functional constraints, which are useful for capturing data dependencies like foreign keys. We show that the incorporation of functional constraints can identify more workloads as robust that otherwise would not be. Even though we establish that the robustness problem becomes undecidable in its most general form, we show that various restrictions on functional constraints lead to decidable and even tractable fragments that can be used to model and test for robustness against RC for realistic scenarios.

Cite as

Brecht Vandevoort, Bas Ketsman, Christoph Koch, and Frank Neven. Robustness Against Read Committed for Transaction Templates with Functional Constraints. In 25th International Conference on Database Theory (ICDT 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 220, pp. 16:1-16:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


@InProceedings{vandevoort_et_al:LIPIcs.ICDT.2022.16,
  author =	{Vandevoort, Brecht and Ketsman, Bas and Koch, Christoph and Neven, Frank},
  title =	{{Robustness Against Read Committed for Transaction Templates with Functional Constraints}},
  booktitle =	{25th International Conference on Database Theory (ICDT 2022)},
  pages =	{16:1--16:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-223-5},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{220},
  editor =	{Olteanu, Dan and Vortmeier, Nils},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2022.16},
  URN =		{urn:nbn:de:0030-drops-158905},
  doi =		{10.4230/LIPIcs.ICDT.2022.16},
  annote =	{Keywords: concurrency control, robustness, complexity}
}
Document
A Dyadic Simulation Approach to Efficient Range-Summability

Authors: Jingfan Meng, Huayi Wang, Jun Xu, and Mitsunori Ogihara


Abstract
Efficient range-summability (ERS) of a long list of random variables is a fundamental algorithmic problem with applications in three important database areas, namely data stream processing, space-efficient histogram maintenance (SEHM), and approximate nearest neighbor search (ANNS). In this work, we propose a novel dyadic simulation framework and use it to develop three novel ERS solutions, namely the Gaussian-dyadic simulation tree (DST), Cauchy-DST, and Random Walk-DST. We also propose novel rejection sampling techniques to make these solutions computationally efficient. Furthermore, we develop a novel k-wise independence theory that allows our ERS solutions to have both high computational efficiency and strong provable independence guarantees.

Cite as

Jingfan Meng, Huayi Wang, Jun Xu, and Mitsunori Ogihara. A Dyadic Simulation Approach to Efficient Range-Summability. In 25th International Conference on Database Theory (ICDT 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 220, pp. 17:1-17:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


@InProceedings{meng_et_al:LIPIcs.ICDT.2022.17,
  author =	{Meng, Jingfan and Wang, Huayi and Xu, Jun and Ogihara, Mitsunori},
  title =	{{A Dyadic Simulation Approach to Efficient Range-Summability}},
  booktitle =	{25th International Conference on Database Theory (ICDT 2022)},
  pages =	{17:1--17:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-223-5},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{220},
  editor =	{Olteanu, Dan and Vortmeier, Nils},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2022.17},
  URN =		{urn:nbn:de:0030-drops-158915},
  doi =		{10.4230/LIPIcs.ICDT.2022.17},
  annote =	{Keywords: fast range-summation, locality-sensitive hashing, rejection sampling}
}
Document
Discovering Event Queries from Traces: Laying Foundations for Subsequence-Queries with Wildcards and Gap-Size Constraints

Authors: Sarah Kleest-Meißner, Rebecca Sattler, Markus L. Schmid, Nicole Schweikardt, and Matthias Weidlich


Abstract
We introduce subsequence-queries with wildcards and gap-size constraints (swg-queries, for short) as a tool for querying event traces. An swg-query q is given by a string s over an alphabet of variables and types, a global window size w, and a tuple c = ((c^-_1, c^+_1), (c^-_2, c^+_2), …, (c^-_{|s|-1}, c^+_{|s|-1})) of local gap-size constraints over ℕ × (ℕ ∪ {∞}). The query q matches in a trace t (i.e., a sequence of types) if the variables can uniformly be substituted by types such that the resulting string occurs in t as a subsequence that spans an area of length at most w, and the i-th gap of the subsequence (i.e., the distance between the i-th and (i+1)-th position of the subsequence) has length at least c^-_i and at most c^+_i. We formalise and investigate the task of discovering an swg-query that best describes the traces from a given sample S of traces, and we present an algorithm solving this task. As a central component, our algorithm repeatedly solves the matching problem (i.e., deciding whether a given query q matches in a given trace t), which is an NP-complete problem (in combined complexity). Hence, the matching problem is of special interest in the context of query discovery, and we therefore subject it to a detailed (parameterised) complexity analysis to identify tractable subclasses, which lead to tractable subclasses of the discovery problem as well. We complement this by a reduction proving that any query discovery algorithm also yields an algorithm for the matching problem. Hence, lower bounds on the complexity of the matching problem directly translate into corresponding lower bounds for the query discovery problem. As a proof of concept, we also implemented a prototype of our algorithm and tested it on real-world data.
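The matching problem defined in this abstract can be illustrated with a naive backtracking matcher. This Python sketch follows the stated definition but is not the paper's algorithm, and it runs in exponential time in the worst case, which is consistent with the NP-completeness noted in the abstract. Three interpretation choices are our own assumptions and are repeated in the code: tokens starting with "?" are variables, the area spanned by matched positions p_1 < ... < p_k is p_k - p_1 + 1, and the i-th gap is the number of trace positions strictly between p_i and p_{i+1}. The function name swg_matches is hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple

INF = math.inf


def swg_matches(s: Sequence[str], w: int,
                c: Sequence[Tuple[int, float]], t: Sequence[str]) -> bool:
    """Naive check whether the swg-query (s, w, c) matches the trace t.

    Assumptions (ours): tokens starting with '?' are variables, all other
    tokens are types; positions p_1 < ... < p_k span p_k - p_1 + 1 trace
    positions; the i-th gap is p_{i+1} - p_i - 1 (positions strictly between).
    """
    k = len(s)
    if len(c) != max(k - 1, 0):
        raise ValueError("need one (lower, upper) gap constraint per gap")
    if k == 0:
        return True
    if w < k:  # any k distinct positions span at least k trace positions
        return False

    def extend(i: int, prev: int, start: int, binding: Dict[str, str]) -> bool:
        if i == k:
            return True
        if i == 0:
            candidates = range(len(t))
        else:
            lo, hi = c[i - 1]
            # Gap bounds and the global window both restrict the next position.
            upper = min(len(t) - 1, start + w - 1, prev + 1 + hi)
            candidates = range(prev + 1 + lo, upper + 1)
        for p in candidates:
            tok, typ = s[i], t[p]
            newly_bound = False
            if tok.startswith("?"):
                # Uniform substitution: a variable keeps its first binding.
                if tok not in binding:
                    binding[tok] = typ
                    newly_bound = True
                elif binding[tok] != typ:
                    continue
            elif tok != typ:
                continue
            if extend(i + 1, p, p if i == 0 else start, binding):
                return True
            if newly_bound:
                del binding[tok]  # undo the binding before trying the next position
        return False

    return extend(0, -1, 0, {})
```

For instance, on the trace a b c a d, the query ?x b ?x with unbounded gaps matches for window size 5 (binding ?x to a at positions 0 and 3) but not for window size 3, since the second a lies outside every window of length 3 that contains the b.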

Cite as

Sarah Kleest-Meißner, Rebecca Sattler, Markus L. Schmid, Nicole Schweikardt, and Matthias Weidlich. Discovering Event Queries from Traces: Laying Foundations for Subsequence-Queries with Wildcards and Gap-Size Constraints. In 25th International Conference on Database Theory (ICDT 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 220, pp. 18:1-18:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


@InProceedings{kleestmeiner_et_al:LIPIcs.ICDT.2022.18,
  author =	{Kleest-Mei{\ss}ner, Sarah and Sattler, Rebecca and Schmid, Markus L. and Schweikardt, Nicole and Weidlich, Matthias},
  title =	{{Discovering Event Queries from Traces: Laying Foundations for Subsequence-Queries with Wildcards and Gap-Size Constraints}},
  booktitle =	{25th International Conference on Database Theory (ICDT 2022)},
  pages =	{18:1--18:21},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-223-5},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{220},
  editor =	{Olteanu, Dan and Vortmeier, Nils},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2022.18},
  URN =		{urn:nbn:de:0030-drops-158922},
  doi =		{10.4230/LIPIcs.ICDT.2022.18},
  annote =	{Keywords: event queries on traces, pattern queries on strings, learning descriptive queries, complexity of query evaluation and query learning}
}
Document
Streaming Enumeration on Nested Documents

Authors: Martín Muñoz and Cristian Riveros


Abstract
Some of the most relevant document schemas used online, such as XML and JSON, have a nested format. In the last decade, the task of extracting data from nested documents over streams has become especially relevant. We focus on the streaming evaluation of queries with outputs of varied sizes over nested documents. We model queries of this kind as Visibly Pushdown Transducers (VPT), a computational model that extends visibly pushdown automata with outputs and has the same expressive power as MSO over nested documents. Since processing a document through a VPT can generate a massive number of results, we are interested in reading the input in a streaming fashion and enumerating the outputs one after another as efficiently as possible, namely, with constant delay. This paper presents an algorithm that enumerates these elements with constant delay after processing the document stream in a single pass. Furthermore, we show that this algorithm is worst-case optimal in terms of update-time per symbol and memory usage.

Cite as

Martín Muñoz and Cristian Riveros. Streaming Enumeration on Nested Documents. In 25th International Conference on Database Theory (ICDT 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 220, pp. 19:1-19:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


@InProceedings{munoz_et_al:LIPIcs.ICDT.2022.19,
  author =	{Mu\~{n}oz, Mart{\'\i}n and Riveros, Cristian},
  title =	{{Streaming Enumeration on Nested Documents}},
  booktitle =	{25th International Conference on Database Theory (ICDT 2022)},
  pages =	{19:1--19:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-223-5},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{220},
  editor =	{Olteanu, Dan and Vortmeier, Nils},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2022.19},
  URN =		{urn:nbn:de:0030-drops-158935},
  doi =		{10.4230/LIPIcs.ICDT.2022.19},
  annote =	{Keywords: Streaming, nested documents, query evaluation, enumeration algorithms}
}
