LIPIcs, Volume 98

21st International Conference on Database Theory (ICDT 2018)




Event

ICDT 2018, March 26-29, 2018, Vienna, Austria

Editors

Benny Kimelfeld
Yael Amsterdamer

Publication Details

  • Published: 2018-03-05
  • Publisher: Schloss Dagstuhl – Leibniz-Zentrum für Informatik
  • ISBN: 978-3-95977-063-7
  • DBLP: db/conf/icdt/icdt2018

Documents

Document
Complete Volume
LIPIcs, Volume 98, ICDT'18, Complete Volume

Authors: Benny Kimelfeld and Yael Amsterdamer


Abstract
LIPIcs, Volume 98, ICDT'18, Complete Volume

Cite as

21st International Conference on Database Theory (ICDT 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 98, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


BibTeX

@Proceedings{kimelfeld_et_al:LIPIcs.ICDT.2018,
  title =	{{LIPIcs, Volume 98, ICDT'18, Complete Volume}},
  booktitle =	{21st International Conference on Database Theory (ICDT 2018)},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-063-7},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{98},
  editor =	{Kimelfeld, Benny and Amsterdamer, Yael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2018},
  URN =		{urn:nbn:de:0030-drops-86795},
  doi =		{10.4230/LIPIcs.ICDT.2018},
  annote =	{Keywords: Information systems, Data management systems, Information systems, Database design and models, Information systems, Database query processing}
}
Document
Front Matter
Front Matter, Table of Contents, Preface, Conference Organization

Authors: Benny Kimelfeld and Yael Amsterdamer


Abstract
Front Matter, Table of Contents, Preface, Conference Organization

Cite as

21st International Conference on Database Theory (ICDT 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 98, pp. 0:i-0:xvi, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


BibTeX

@InProceedings{kimelfeld_et_al:LIPIcs.ICDT.2018.0,
  author =	{Kimelfeld, Benny and Amsterdamer, Yael},
  title =	{{Front Matter, Table of Contents, Preface, Conference Organization}},
  booktitle =	{21st International Conference on Database Theory (ICDT 2018)},
  pages =	{0:i--0:xvi},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-063-7},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{98},
  editor =	{Kimelfeld, Benny and Amsterdamer, Yael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2018.0},
  URN =		{urn:nbn:de:0030-drops-85938},
  doi =		{10.4230/LIPIcs.ICDT.2018.0},
  annote =	{Keywords: Front Matter, Table of Contents, Preface, Conference Organization}
}
Document
Fine-grained Algorithms and Complexity

Authors: Virginia Vassilevska Williams


Abstract
A central goal of algorithmic research is to determine how fast computational problems can be solved in the worst case. Theorems from complexity theory state that there are problems that, on inputs of size n, can be solved in t(n) time but not in O(t(n)^{1-epsilon}) time for epsilon > 0. The main challenge is to determine where in this hierarchy various natural and important problems lie. Throughout the years, many ingenious algorithmic techniques have been developed and applied to obtain blazingly fast algorithms for many problems. Nevertheless, for many other central problems, the best known running times are essentially those of their classical algorithms from the 1950s and 1960s. Unconditional lower bounds seem very difficult to obtain, and so practically all known time lower bounds are conditional. For years, the main tool for proving hardness of computational problems has been the NP-hardness reduction, basing hardness on P ≠ NP. However, when we care about the exact running time (as opposed to merely polynomial vs. non-polynomial), NP-hardness is not applicable, especially if the problem is already solvable in polynomial time. In recent years, a new theory has been developed, based on "fine-grained reductions" that focus on exact running times. Mimicking NP-hardness, the approach is to (1) select a key problem X that is conjectured to require essentially t(n) time for some t, and (2) reduce X in a fine-grained way to many important problems. This approach has led to the discovery of many meaningful relationships between problems, and sometimes even to equivalence classes. The main key problems used to base hardness on have been the 3SUM problem, the CNF-SAT problem (via the Strong Exponential Time Hypothesis (SETH)), and the All Pairs Shortest Paths problem. Research on SETH-based lower bounds has flourished in particular in recent years, showing that the classical algorithms are optimal for problems such as Approximate Diameter, Edit Distance, Fréchet Distance, and Longest Common Subsequence. In this talk I will give an overview of the current progress in this area of study, and will highlight some exciting new developments and their relationship to database theory.
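
For reference, the Strong Exponential Time Hypothesis invoked above is commonly stated as follows (a standard formulation, not quoted from the talk):

  SETH: for every epsilon > 0 there is a k >= 3 such that k-SAT on n variables cannot be solved in O(2^{(1-epsilon)n}) time.

A fine-grained reduction from (A, a(n)) to (B, b(n)) then guarantees that any O(b(n)^{1-epsilon})-time algorithm for B, for some epsilon > 0, yields an O(a(n)^{1-delta})-time algorithm for A, for some delta > 0; conjectured hardness for A thus transfers to B.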

Cite as

Virginia Vassilevska Williams. Fine-grained Algorithms and Complexity. In 21st International Conference on Database Theory (ICDT 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 98, p. 1:1, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


BibTeX

@InProceedings{vassilevskawilliams:LIPIcs.ICDT.2018.1,
  author =	{Vassilevska Williams, Virginia},
  title =	{{Fine-grained Algorithms and Complexity}},
  booktitle =	{21st International Conference on Database Theory (ICDT 2018)},
  pages =	{1:1--1:1},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-063-7},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{98},
  editor =	{Kimelfeld, Benny and Amsterdamer, Yael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2018.1},
  URN =		{urn:nbn:de:0030-drops-86135},
  doi =		{10.4230/LIPIcs.ICDT.2018.1},
  annote =	{Keywords: algorithms, complexity, fine-grained}
}
Document
Join Algorithms: From External Memory to the BSP

Authors: Ke Yi


Abstract
Database systems have traditionally been disk-based, which motivated the extensive study of external memory (EM) algorithms. However, as RAM continues to get larger and cheaper, modern distributed data systems are increasingly adopting a main memory based, shared-nothing architecture, exemplified by systems like Spark and Flink. These systems can be abstracted by the BSP model (with variants like the MPC model and the MapReduce model), and there has been a strong revived interest in designing BSP algorithms for handling large amounts of data. With hard disks starting to fade away from the picture, EM algorithms may now seem less relevant. However, we observe that many of the recently developed join algorithms under the BSP model bear a high degree of resemblance to their counterparts in the EM model. In this talk, I will present some recent results on join algorithms in the EM and BSP models, examine their relationships, and discuss a general theoretical framework for converting EM algorithms to the BSP model.
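
To make the single-round BSP paradigm concrete, here is a minimal hash-join sketch (an illustration in Python simulating p servers, not code from the talk): each tuple is routed to the server determined by hashing its join key, after which every server joins its local fragments with no further communication.

from collections import defaultdict

def bsp_hash_join(R, S, p):
    """One-round BSP equi-join sketch: R and S are lists of (key, payload)
    tuples; p is the number of simulated servers."""
    # Communication round: route every tuple to server hash(key) % p.
    servers_R, servers_S = defaultdict(list), defaultdict(list)
    for k, a in R:
        servers_R[hash(k) % p].append((k, a))
    for k, b in S:
        servers_S[hash(k) % p].append((k, b))
    # Local computation: each server joins its own fragments independently.
    out = []
    for i in range(p):
        index = defaultdict(list)
        for k, a in servers_R[i]:
            index[k].append(a)
        for k, b in servers_S[i]:
            out.extend((k, a, b) for a in index[k])
    return out

print(bsp_hash_join([(1, "r1"), (2, "r2")], [(1, "s1"), (1, "s2")], p=4))

Real algorithms in this space must additionally handle skew, since a heavy join key overloads its server; this is exactly where the more refined BSP/MPC join algorithms and their EM counterparts come in.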

Cite as

Ke Yi. Join Algorithms: From External Memory to the BSP. In 21st International Conference on Database Theory (ICDT 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 98, p. 2:1, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


BibTeX

@InProceedings{yi:LIPIcs.ICDT.2018.2,
  author =	{Yi, Ke},
  title =	{{Join Algorithms: From External Memory to the BSP}},
  booktitle =	{21st International Conference on Database Theory (ICDT 2018)},
  pages =	{2:1--2:1},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-063-7},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{98},
  editor =	{Kimelfeld, Benny and Amsterdamer, Yael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2018.2},
  URN =		{urn:nbn:de:0030-drops-86126},
  doi =		{10.4230/LIPIcs.ICDT.2018.2},
  annote =	{Keywords: External memory model, BSP, join algorithms}
}
Document
An Update on Dynamic Complexity Theory

Authors: Thomas Zeume


Abstract
In many modern data management scenarios, data is subject to frequent changes. In order to avoid costly re-computation of query answers from scratch after each small update, one can try to use auxiliary relations that have been computed before. Of course, the auxiliary relations need to be updated dynamically whenever the data changes. Dynamic complexity theory studies which queries and auxiliary relations can be updated in a highly parallel fashion, that is, by constant-depth circuits or, equivalently, by first-order formulas or the relational algebra. After gently introducing dynamic complexity theory, I will discuss recent results in the area, with a focus on the dynamic complexity of the reachability query.
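
For a concrete taste of the formalism (a classical example in the style of Patnaik and Immerman, not specific to this talk): if the auxiliary relation T stores the reflexive-transitive closure of the edge relation E, then after inserting an edge (u,v) it can be refreshed by the first-order update formula

  T'(x,y) := T(x,y) ∨ (T(x,u) ∧ T(v,y)),

which checks whether the new edge completes a path from x to y. Deletions are considerably harder to handle; the result that reachability stays first-order maintainable under both insertions and deletions (Datta et al., 2015) is presumably the kind of recent progress the talk surveys.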

Cite as

Thomas Zeume. An Update on Dynamic Complexity Theory. In 21st International Conference on Database Theory (ICDT 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 98, p. 3:1, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


BibTeX

@InProceedings{zeume:LIPIcs.ICDT.2018.3,
  author =	{Zeume, Thomas},
  title =	{{An Update on Dynamic Complexity Theory}},
  booktitle =	{21st International Conference on Database Theory (ICDT 2018)},
  pages =	{3:1--3:1},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-063-7},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{98},
  editor =	{Kimelfeld, Benny and Amsterdamer, Yael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2018.3},
  URN =		{urn:nbn:de:0030-drops-86117},
  doi =		{10.4230/LIPIcs.ICDT.2018.3},
  annote =	{Keywords: Dynamic descriptive complexity, SQL updates, Reachability}
}
Document
Rewriting Guarded Existential Rules into Small Datalog Programs

Authors: Shqiponja Ahmetaj, Magdalena Ortiz, and Mantas Simkus


Abstract
The goal of this paper is to understand the relative expressiveness of the query language in which queries are specified by a set of guarded (disjunctive) tuple-generating dependencies (TGDs) and an output (or 'answer') predicate. Our main result shows that every such query can be translated into a polynomial-size (disjunctive) Datalog program if the maximal number of variables in the (disjunctive) TGDs is bounded by a constant. To overcome the challenge that Datalog has no direct means to express the existential quantification present in TGDs, we define a two-player game that characterizes the satisfaction of the dependencies, and design a Datalog query that can decide the existence of a winning strategy for the game. For guarded disjunctive TGDs, we obtain Datalog rules with disjunction in the heads. However, the use of disjunction is limited, and the resulting rules fall into a fragment that can be evaluated in deterministic single-exponential time. We proceed quite differently for the case when the TGDs are not disjunctive, showing that we can then obtain a plain Datalog query. Notably, unlike previous translations for related fragments, our translation requires only polynomial time if the maximal number of variables in the (disjunctive) TGDs is bounded by a constant.
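
For orientation on the terminology (not a result from the paper): a TGD is guarded if some body atom, the guard, contains all universally quantified variables of the rule. For instance, the TGD

  ∀x ∀y ( R(x,y) ∧ P(y) → ∃z S(y,z) )

is guarded by the atom R(x,y). The existential quantifier in the head is exactly what plain Datalog cannot express directly, which is why the translation proceeds via a game that characterizes satisfaction of the dependencies.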

Cite as

Shqiponja Ahmetaj, Magdalena Ortiz, and Mantas Simkus. Rewriting Guarded Existential Rules into Small Datalog Programs. In 21st International Conference on Database Theory (ICDT 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 98, pp. 4:1-4:24, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


BibTeX

@InProceedings{ahmetaj_et_al:LIPIcs.ICDT.2018.4,
  author =	{Ahmetaj, Shqiponja and Ortiz, Magdalena and Simkus, Mantas},
  title =	{{Rewriting Guarded Existential Rules into Small Datalog Programs}},
  booktitle =	{21st International Conference on Database Theory (ICDT 2018)},
  pages =	{4:1--4:24},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-063-7},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{98},
  editor =	{Kimelfeld, Benny and Amsterdamer, Yael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2018.4},
  URN =		{urn:nbn:de:0030-drops-85950},
  doi =		{10.4230/LIPIcs.ICDT.2018.4},
  annote =	{Keywords: Existential rules, Expressiveness, Descriptive Complexity, Query Rewriting}
}
Document
Enumeration on Trees under Relabelings

Authors: Antoine Amarilli, Pierre Bourhis, and Stefan Mengel


Abstract
We study how to evaluate MSO queries with free variables on trees, within the framework of enumeration algorithms. Previous work has shown how to enumerate answers with linear-time preprocessing and delay linear in the size of each output, i.e., constant-delay for free first-order variables. We extend this result to support relabelings, a restricted kind of update operations on trees which allows us to change the node labels. Our main result shows that we can enumerate the answers of MSO queries on trees with linear-time preprocessing and delay linear in each answer, while supporting node relabelings in logarithmic time. To prove this, we reuse the circuit-based enumeration structure from our earlier work, and develop techniques to maintain its index under node relabelings. We also show how enumeration under relabelings can be applied to evaluate practical query languages, such as aggregate, group-by, and parameterized queries.

Cite as

Antoine Amarilli, Pierre Bourhis, and Stefan Mengel. Enumeration on Trees under Relabelings. In 21st International Conference on Database Theory (ICDT 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 98, pp. 5:1-5:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


BibTeX

@InProceedings{amarilli_et_al:LIPIcs.ICDT.2018.5,
  author =	{Amarilli, Antoine and Bourhis, Pierre and Mengel, Stefan},
  title =	{{Enumeration on Trees under Relabelings}},
  booktitle =	{21st International Conference on Database Theory (ICDT 2018)},
  pages =	{5:1--5:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-063-7},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{98},
  editor =	{Kimelfeld, Benny and Amsterdamer, Yael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2018.5},
  URN =		{urn:nbn:de:0030-drops-86060},
  doi =		{10.4230/LIPIcs.ICDT.2018.5},
  annote =	{Keywords: enumeration, trees, updates, MSO, circuits, knowledge compilation}
}
Document
Connecting Width and Structure in Knowledge Compilation

Authors: Antoine Amarilli, Mikaël Monet, and Pierre Senellart


Abstract
Several query evaluation tasks can be done via knowledge compilation: the query result is compiled as a lineage circuit from which the answer can be determined. For such tasks, it is important to leverage some width parameters of the circuit, such as bounded treewidth or pathwidth, to convert the circuit to structured classes, e.g., deterministic structured NNFs (d-SDNNFs) or OBDDs. In this work, we show how to connect the width of circuits to the size of their structured representation, through upper and lower bounds. For the upper bound, we show how bounded-treewidth circuits can be converted to a d-SDNNF, in time linear in the circuit size. Our bound, unlike existing results, is constructive and only singly exponential in the treewidth. We show a related lower bound on monotone DNF or CNF formulas, assuming a constant bound on the arity (size of clauses) and degree (number of occurrences of each variable). Specifically, any d-SDNNF (resp., SDNNF) for such a DNF (resp., CNF) must be of exponential size in its treewidth; and the same holds for pathwidth when compiling to OBDDs. Our lower bounds, in contrast with most previous work, apply to any formula of this class, not just a well-chosen family. Hence, for our language of DNF and CNF, pathwidth and treewidth respectively characterize the efficiency of compiling to OBDDs and (d-)SDNNFs, that is, compilation is singly exponential in the width parameter. We conclude by applying our lower bound results to the task of query evaluation.

Cite as

Antoine Amarilli, Mikaël Monet, and Pierre Senellart. Connecting Width and Structure in Knowledge Compilation. In 21st International Conference on Database Theory (ICDT 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 98, pp. 6:1-6:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


BibTeX

@InProceedings{amarilli_et_al:LIPIcs.ICDT.2018.6,
  author =	{Amarilli, Antoine and Monet, Mika\"{e}l and Senellart, Pierre},
  title =	{{Connecting Width and Structure in Knowledge Compilation}},
  booktitle =	{21st International Conference on Database Theory (ICDT 2018)},
  pages =	{6:1--6:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-063-7},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{98},
  editor =	{Kimelfeld, Benny and Amsterdamer, Yael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2018.6},
  URN =		{urn:nbn:de:0030-drops-86083},
  doi =		{10.4230/LIPIcs.ICDT.2018.6},
  annote =	{Keywords: knowledge compilation, probabilistic databases, treewidth, circuits}
}
Document
A More General Theory of Static Approximations for Conjunctive Queries

Authors: Pablo Barceló, Miguel Romero, and Thomas Zeume


Abstract
Conjunctive query (CQ) evaluation is NP-complete, but becomes tractable for fragments of bounded hypertreewidth. If a CQ is hard to evaluate, it is thus useful to evaluate an approximation of it in such fragments. While underapproximations (i.e., those that return correct answers only) are well understood, the dual notion of overapproximations that return complete (but not necessarily sound) answers, and also a more general notion of approximation based on the symmetric difference of query results, are almost unexplored. In fact, the decidability of the basic problems of evaluation, identification, and existence of those approximations is open. We develop a connection with existential pebble game tools that allows the systematic study of such problems. In particular, we show that the evaluation and identification of overapproximations can be solved in polynomial time. We also make progress on the problem of existence of overapproximations, showing it to be decidable in 2EXPTIME over the class of acyclic CQs. Furthermore, we look at when overapproximations do not exist, suggesting that this can be alleviated by using a more liberal notion of overapproximation. We also show how to extend our tools to study symmetric-difference approximations. We observe that such approximations properly extend under- and over-approximations, settle the complexity of the associated identification problem, and provide several results on existence and evaluation.

Cite as

Pablo Barceló, Miguel Romero, and Thomas Zeume. A More General Theory of Static Approximations for Conjunctive Queries. In 21st International Conference on Database Theory (ICDT 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 98, pp. 7:1-7:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


BibTeX

@InProceedings{barcelo_et_al:LIPIcs.ICDT.2018.7,
  author =	{Barcel\'{o}, Pablo and Romero, Miguel and Zeume, Thomas},
  title =	{{A More General Theory of Static Approximations for Conjunctive Queries}},
  booktitle =	{21st International Conference on Database Theory (ICDT 2018)},
  pages =	{7:1--7:22},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-063-7},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{98},
  editor =	{Kimelfeld, Benny and Amsterdamer, Yael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2018.7},
  URN =		{urn:nbn:de:0030-drops-86021},
  doi =		{10.4230/LIPIcs.ICDT.2018.7},
  annote =	{Keywords: conjunctive queries, hypertreewidth, approximations, pebble games}
}
Document
Answering UCQs under Updates and in the Presence of Integrity Constraints

Authors: Christoph Berkholz, Jens Keppeler, and Nicole Schweikardt


Abstract
We investigate the query evaluation problem for fixed queries over fully dynamic databases where tuples can be inserted or deleted. The task is to design a dynamic data structure that can immediately report the new result of a fixed query after every database update. We consider unions of conjunctive queries (UCQs) and focus on the query evaluation tasks testing (decide whether an input tuple belongs to the query result), enumeration (enumerate, without repetition, all tuples in the query result), and counting (output the number of tuples in the query result). We identify three increasingly restrictive classes of UCQs which we call t-hierarchical, q-hierarchical, and exhaustively q-hierarchical UCQs. Our main results provide the following dichotomies: If the query's homomorphic core is t-hierarchical (q-hierarchical, exhaustively q-hierarchical), then the testing (enumeration, counting) problem can be solved with constant update time and constant testing time (delay, counting time). Otherwise, it cannot be solved with sublinear update time and sublinear testing time (delay, counting time), unless the OV-conjecture and/or the OMv-conjecture fails. We also study the complexity of query evaluation in the dynamic setting in the presence of integrity constraints, and we obtain similar dichotomy results for the special case of small domain constraints (i.e., constraints which state that all values in a particular column of a relation belong to a fixed domain of constant size).
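
Two standard examples from this line of work may help calibrate the classes (an illustration simplifying the paper's definitions, not taken from it):

  Q1(x,y) <- R(x), S(x,y)          (q-hierarchical: atoms(y) = {S} is contained in atoms(x) = {R,S})
  Q2()    <- R(x), S(x,y), T(y)    (not hierarchical: atoms(x) = {R,S} and atoms(y) = {S,T} overlap without containment)

Q1 supports constant update time and constant answer time, whereas for Q2 no scheme with both sublinear update time and sublinear answer time exists unless the OMv conjecture fails.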

Cite as

Christoph Berkholz, Jens Keppeler, and Nicole Schweikardt. Answering UCQs under Updates and in the Presence of Integrity Constraints. In 21st International Conference on Database Theory (ICDT 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 98, pp. 8:1-8:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


BibTeX

@InProceedings{berkholz_et_al:LIPIcs.ICDT.2018.8,
  author =	{Berkholz, Christoph and Keppeler, Jens and Schweikardt, Nicole},
  title =	{{Answering UCQs under Updates and in the Presence of Integrity Constraints}},
  booktitle =	{21st International Conference on Database Theory (ICDT 2018)},
  pages =	{8:1--8:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-063-7},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{98},
  editor =	{Kimelfeld, Benny and Amsterdamer, Yael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2018.8},
  URN =		{urn:nbn:de:0030-drops-85990},
  doi =		{10.4230/LIPIcs.ICDT.2018.8},
  annote =	{Keywords: dynamic query evaluation, union of conjunctive queries, constant-delay enumeration, counting problem, testing}
}
Document
Expressivity and Complexity of MongoDB Queries

Authors: Elena Botoeva, Diego Calvanese, Benjamin Cogrel, and Guohui Xiao


Abstract
In this paper, we consider MongoDB, a widely adopted but not formally understood database system managing JSON documents and equipped with a powerful query mechanism, called the aggregation framework. We provide a clean formal abstraction of this query language, which we call MQuery. We study the expressivity of MQuery, showing the equivalence of its well-typed fragment with nested relational algebra. We further investigate the computational complexity of significant fragments of it, obtaining several (tight) bounds in combined complexity, which range from LogSpace to alternating exponential-time with a polynomial number of alternations.
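
For concreteness, here is a small aggregation pipeline of the kind MQuery abstracts (a sketch via pymongo; the database, collection, and field names are hypothetical):

from pymongo import MongoClient

client = MongoClient()                    # assumes a locally running MongoDB
orders = client["shop"]["orders"]         # hypothetical collection

pipeline = [
    {"$match": {"status": "shipped"}},    # select documents by a predicate
    {"$unwind": "$items"},                # flatten a nested array field
    {"$project": {"_id": 0,               # reshape the output documents
                  "customer": 1,
                  "item": "$items.sku"}},
]
for doc in orders.aggregate(pipeline):
    print(doc)

Stages such as $match, $unwind, and $project correspond to operators of MQuery; it is pipelines of this shape whose expressivity and complexity the paper pins down.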

Cite as

Elena Botoeva, Diego Calvanese, Benjamin Cogrel, and Guohui Xiao. Expressivity and Complexity of MongoDB Queries. In 21st International Conference on Database Theory (ICDT 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 98, pp. 9:1-9:23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


BibTeX

@InProceedings{botoeva_et_al:LIPIcs.ICDT.2018.9,
  author =	{Botoeva, Elena and Calvanese, Diego and Cogrel, Benjamin and Xiao, Guohui},
  title =	{{Expressivity and Complexity of MongoDB Queries}},
  booktitle =	{21st International Conference on Database Theory (ICDT 2018)},
  pages =	{9:1--9:23},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-063-7},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{98},
  editor =	{Kimelfeld, Benny and Amsterdamer, Yael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2018.9},
  URN =		{urn:nbn:de:0030-drops-86074},
  doi =		{10.4230/LIPIcs.ICDT.2018.9},
  annote =	{Keywords: MongoDB, NoSQL, aggregation framework, expressivity}
}
Document
On the Expressive Power of Query Languages for Matrices

Authors: Robert Brijder, Floris Geerts, Jan Van den Bussche, and Timmy Weerwag


Abstract
We investigate the expressive power of MATLANG, a formal language for matrix manipulation based on common matrix operations and linear algebra. The language can be extended with the operation inv for inverting a matrix. In MATLANG + inv we can compute the transitive closure of directed graphs, whereas we show that this is not possible without inversion. Indeed, we show that the basic language can be simulated in the relational algebra with arithmetic operations, grouping, and summation. We also consider an operation eigen for diagonalizing a matrix, which is defined so that different eigenvectors returned for the same eigenvalue are orthogonal. We show that inv can be expressed in MATLANG + eigen. We put forward the open question of whether there are boolean queries about matrices, or generic queries about graphs, expressible in MATLANG + eigen but not in MATLANG + inv. The evaluation problem for MATLANG + eigen is shown to be complete for the complexity class ∃R (the existential theory of the reals).
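
The inversion-gives-transitive-closure claim can be made concrete with a small numerical sketch (an illustration, not necessarily the paper's construction). For a 0/1 adjacency matrix A on n nodes, the series I + (A/(n+1)) + (A/(n+1))^2 + ... converges to (I - A/(n+1))^{-1}, and its entry (i,j) exceeds that of the identity exactly when there is a path of length >= 1 from i to j.

import numpy as np

def reachability(A):
    """Reachability from a single matrix inversion. A is an n x n 0/1
    adjacency matrix. Illustrative only: the series terms shrink like
    (n+1)**(-k), so floating point limits this to small n."""
    n = A.shape[0]
    M = np.linalg.inv(np.eye(n) - A / (n + 1))  # sum of all (A/(n+1))^k
    # Entry (i, j) is True iff there is a path of length >= 1 from i to j.
    return M - np.eye(n) > 1e-9

A = np.array([[0, 1, 0],
              [0, 0, 1],
              [0, 0, 0]])  # edges 0 -> 1 -> 2
print(reachability(A))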

Cite as

Robert Brijder, Floris Geerts, Jan Van den Bussche, and Timmy Weerwag. On the Expressive Power of Query Languages for Matrices. In 21st International Conference on Database Theory (ICDT 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 98, pp. 10:1-10:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


BibTeX

@InProceedings{brijder_et_al:LIPIcs.ICDT.2018.10,
  author =	{Brijder, Robert and Geerts, Floris and Van den Bussche, Jan and Weerwag, Timmy},
  title =	{{On the Expressive Power of Query Languages for Matrices}},
  booktitle =	{21st International Conference on Database Theory (ICDT 2018)},
  pages =	{10:1--10:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-063-7},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{98},
  editor =	{Kimelfeld, Benny and Amsterdamer, Yael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2018.10},
  URN =		{urn:nbn:de:0030-drops-86007},
  doi =		{10.4230/LIPIcs.ICDT.2018.10},
  annote =	{Keywords: matrix query languages, relational algebra with aggregates, query evaluation problem, graph queries}
}
Document
Enumeration Complexity of Conjunctive Queries with Functional Dependencies

Authors: Nofar Carmeli and Markus Kröll


Abstract
We study the complexity of enumerating the answers of Conjunctive Queries (CQs) in the presence of Functional Dependencies (FDs). Our focus is on the ability to list output tuples with a constant delay in between, following a linear-time preprocessing. A known dichotomy classifies the acyclic self-join-free CQs into those that admit such enumeration, and those that do not. However, this classification no longer holds in the common case where the database exhibits dependencies among attributes. That is, some queries that are classified as hard are in fact tractable if dependencies are accounted for. We establish a generalization of the dichotomy to accommodate FDs; hence, our classification determines which combination of a CQ and a set of FDs admits constant-delay enumeration with a linear-time preprocessing. In addition, we generalize a hardness result for cyclic CQs to accommodate a common type of FDs. Further conclusions of our development include a dichotomy for enumeration with linear delay, and a dichotomy for CQs with disequalities. Finally, we show that all our results apply to the known class of "cardinality dependencies" that generalize FDs (e.g., by stating an upper bound on the number of genres per movie, or friends per person).
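
A sketch of why dependencies can rescue a "hard" query (a toy example in the spirit of the paper's FD-extension technique, not taken from it): the query

  Q(x,y) <- R(x,z), S(z,y)

projects away the join variable z and is not free-connex, so the known dichotomy classifies it as hard for constant-delay enumeration (under standard complexity assumptions). If, however, R satisfies the FD x -> z, then every answer (x,y) arises from a unique z; enumerating the extended query that keeps z among the output variables, which is free-connex, then yields the answers of Q without duplicates, so the combination of Q with this FD is tractable.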

Cite as

Nofar Carmeli and Markus Kröll. Enumeration Complexity of Conjunctive Queries with Functional Dependencies. In 21st International Conference on Database Theory (ICDT 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 98, pp. 11:1-11:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


BibTeX

@InProceedings{carmeli_et_al:LIPIcs.ICDT.2018.11,
  author =	{Carmeli, Nofar and Kr\"{o}ll, Markus},
  title =	{{Enumeration Complexity of Conjunctive Queries with Functional Dependencies}},
  booktitle =	{21st International Conference on Database Theory (ICDT 2018)},
  pages =	{11:1--11:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-063-7},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{98},
  editor =	{Kimelfeld, Benny and Amsterdamer, Yael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2018.11},
  URN =		{urn:nbn:de:0030-drops-85988},
  doi =		{10.4230/LIPIcs.ICDT.2018.11},
  annote =	{Keywords: Enumeration, Complexity, CQs}
}
Document
Preserving Constraints with the Stable Chase

Authors: David Carral, Markus Krötzsch, Maximilian Marx, Ana Ozaki, and Sebastian Rudolph


Abstract
Conjunctive query answering over databases with constraints – also known as (tuple-generating) dependencies – is considered a central database task. To this end, several versions of a construction called the chase have been described. Given a set Sigma of dependencies, it is interesting to ask which constraints that are not contained in Sigma, but are initially satisfied in a given database instance, are preserved when computing a chase over Sigma. Such constraints are an example of the more general class of incidental constraints, which, when added to Sigma as new dependencies, do not affect certain answers and might even speed up query answering. After formally introducing incidental constraints, we show that deciding incidentality is undecidable for tuple-generating dependencies, even in cases for which query entailment is decidable. For dependency sets with a finite universal model, the core chase can be used to decide incidentality. For the infinite case, we propose the stable chase, which generalises the core chase, and study its relation to incidental constraints.
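
A standard example of the difficulty (an illustration for orientation, not from the paper): over the instance {R(a,b)}, the single TGD

  ∀x ∀y ( R(x,y) → ∃z R(y,z) )

chases into the infinite chain R(a,b), R(b,z1), R(z1,z2), and so on. No finite universal model exists: closing the chain into a loop, say {R(a,b), R(b,b)}, gives a model, but not a universal one, since it admits no homomorphism into the loop-free chain. Dependency sets of this kind are beyond the reach of the core chase and are where the stable chase proposed here comes into play.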

Cite as

David Carral, Markus Krötzsch, Maximilian Marx, Ana Ozaki, and Sebastian Rudolph. Preserving Constraints with the Stable Chase. In 21st International Conference on Database Theory (ICDT 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 98, pp. 12:1-12:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


BibTeX

@InProceedings{carral_et_al:LIPIcs.ICDT.2018.12,
  author =	{Carral, David and Kr\"{o}tzsch, Markus and Marx, Maximilian and Ozaki, Ana and Rudolph, Sebastian},
  title =	{{Preserving Constraints with the Stable Chase}},
  booktitle =	{21st International Conference on Database Theory (ICDT 2018)},
  pages =	{12:1--12:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-063-7},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{98},
  editor =	{Kimelfeld, Benny and Amsterdamer, Yael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2018.12},
  URN =		{urn:nbn:de:0030-drops-86015},
  doi =		{10.4230/LIPIcs.ICDT.2018.12},
  annote =	{Keywords: Incidental constraints, Tuple-generating dependencies, Infinite core chase, Universal Model, BCQ entailment}
}
Document
Fast Sketch-based Recovery of Correlation Outliers

Authors: Graham Cormode and Jacques Dark


Abstract
Many data sources can be interpreted as time series, and a key problem is to identify which pairs out of a large collection of signals are highly correlated. We expect that there will be few, large, interesting correlations, while most signal pairs do not have any strong correlation. We abstract this as the problem of identifying the highly correlated pairs in a collection of n mostly pairwise uncorrelated random variables, where observations of the variables arrive as a stream. Dimensionality reduction can remove the dependence on the number of observations, but further techniques are required to tame the quadratic (in n) cost of a search through all possible pairs. We develop a new algorithm for rapidly finding large correlations based on sketch techniques with an added twist: we quickly generate sketches of random combinations of signals, and use these in concert with ideas from coding theory to decode the identity of correlated pairs. We prove correctness and compare performance and effectiveness with the best LSH (locality-sensitive hashing) based approach.
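
The dimensionality-reduction ingredient can be sketched as follows (an illustration of standard JL/AMS-style sketching, not the paper's decoding algorithm): compress each length-m signal into a length-d sketch by a shared random +/-1 projection; inner products, and hence correlations of standardized signals, are preserved up to roughly 1/sqrt(d).

import numpy as np

rng = np.random.default_rng(0)
m, d = 10_000, 200                        # observations per signal, sketch size

def standardize(v):
    """Zero mean and unit Euclidean norm, so dot products equal correlations."""
    v = v - v.mean()
    return v / np.linalg.norm(v)

P = rng.choice([-1.0, 1.0], size=(d, m)) / np.sqrt(d)  # shared projection

x = rng.normal(size=m)
y = 0.8 * x + 0.6 * rng.normal(size=m)    # correlation with x is about 0.8
z = rng.normal(size=m)                    # essentially uncorrelated with x

sx, sy, sz = (P @ standardize(v) for v in (x, y, z))
print(sx @ sy, sx @ sz)                   # roughly 0.8 and roughly 0.0

This removes the dependence on the stream length m, but comparing all n^2 sketch pairs would still be quadratic; the paper's contribution is to combine sketches of random signal combinations with coding-theoretic decoding to avoid that search.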

Cite as

Graham Cormode and Jacques Dark. Fast Sketch-based Recovery of Correlation Outliers. In 21st International Conference on Database Theory (ICDT 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 98, pp. 13:1-13:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


BibTeX

@InProceedings{cormode_et_al:LIPIcs.ICDT.2018.13,
  author =	{Cormode, Graham and Dark, Jacques},
  title =	{{Fast Sketch-based Recovery of Correlation Outliers}},
  booktitle =	{21st International Conference on Database Theory (ICDT 2018)},
  pages =	{13:1--13:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-063-7},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{98},
  editor =	{Kimelfeld, Benny and Amsterdamer, Yael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2018.13},
  URN =		{urn:nbn:de:0030-drops-86094},
  doi =		{10.4230/LIPIcs.ICDT.2018.13},
  annote =	{Keywords: correlation, sketching, streaming, dimensionality reduction}
}
Document
Satisfiability for SCULPT-Schemas for CSV-Like Data

Authors: Johannes Doleschal, Wim Martens, Frank Neven, and Adam Witkowski


Abstract
SCULPT is a simple schema language inspired by the recent working effort towards a recommendation by the World Wide Web Consortium (W3C) for tabular data and metadata on the Web. At its core, a SCULPT schema consists of a set of rules where the left-hand sides select sets of regions in the tabular data and the right-hand sides describe the contents of these regions. A document (divided into cells by row- and column-delimiters) then satisfies a schema if it satisfies every rule. In this paper, we study the satisfiability problem for SCULPT schemas. As SCULPT describes grid-like structures, satisfiability becomes undecidable rather quickly, even for very restricted schemas. We define a schema language called L-SCULPT (Lego SCULPT) that restricts the walking power of SCULPT by selecting rectangular areas and only considers tables for which selected regions do not intersect. Depending on the axes used by L-SCULPT, we show that satisfiability is PTIME-complete or undecidable. One of the tractable fragments is practically useful, as it extends the structural core of the current W3C proposal for schemas over tabular data. We therefore see how the navigational power of the W3C proposal can be extended while still retaining tractable satisfiability tests.

Cite as

Johannes Doleschal, Wim Martens, Frank Neven, and Adam Witkowski. Satisfiability for SCULPT-Schemas for CSV-Like Data. In 21st International Conference on Database Theory (ICDT 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 98, pp. 14:1-14:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


BibTeX

@InProceedings{doleschal_et_al:LIPIcs.ICDT.2018.14,
  author =	{Doleschal, Johannes and Martens, Wim and Neven, Frank and Witkowski, Adam},
  title =	{{Satisfiability for SCULPT-Schemas for CSV-Like Data}},
  booktitle =	{21st International Conference on Database Theory (ICDT 2018)},
  pages =	{14:1--14:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-063-7},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{98},
  editor =	{Kimelfeld, Benny and Amsterdamer, Yael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2018.14},
  URN =		{urn:nbn:de:0030-drops-85969},
  doi =		{10.4230/LIPIcs.ICDT.2018.14},
  annote =	{Keywords: CSV, schema languages, semi-structured data}
}
Document
Querying the Unary Negation Fragment with Regular Path Expressions

Authors: Jean Christoph Jung, Carsten Lutz, Mauricio Martel, and Thomas Schneider


Abstract
The unary negation fragment of first-order logic (UNFO) has recently been proposed as a generalization of modal logic that shares many of its good computational and model-theoretic properties. It is attractive from the perspective of database theory because it can express conjunctive queries (CQs) and ontologies formulated in many description logics (DLs). Both are relevant for ontology-mediated querying and, in fact, CQ evaluation under UNFO ontologies (and thus also under DL ontologies) can be `expressed' in UNFO as a satisfiability problem. In this paper, we consider the natural extension of UNFO with regular expressions on binary relations. The resulting logic UNFOreg can express (unions of) conjunctive two-way regular path queries (C2RPQs) and ontologies formulated in DLs that include transitive roles and regular expressions on roles. Our main results are that evaluating C2RPQs under UNFOreg ontologies is decidable, 2ExpTime-complete in combined complexity, and coNP-complete in data complexity, and that satisfiability in UNFOreg is 2ExpTime-complete, thus not harder than in UNFO.

Cite as

Jean Christoph Jung, Carsten Lutz, Mauricio Martel, and Thomas Schneider. Querying the Unary Negation Fragment with Regular Path Expressions. In 21st International Conference on Database Theory (ICDT 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 98, pp. 15:1-15:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


BibTeX

@InProceedings{jung_et_al:LIPIcs.ICDT.2018.15,
  author =	{Jung, Jean Christoph and Lutz, Carsten and Martel, Mauricio and Schneider, Thomas},
  title =	{{Querying the Unary Negation Fragment with Regular Path Expressions}},
  booktitle =	{21st International Conference on Database Theory (ICDT 2018)},
  pages =	{15:1--15:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-063-7},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{98},
  editor =	{Kimelfeld, Benny and Amsterdamer, Yael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2018.15},
  URN =		{urn:nbn:de:0030-drops-85971},
  doi =		{10.4230/LIPIcs.ICDT.2018.15},
  annote =	{Keywords: Query Answering, Regular Path Queries, Decidable Fragments of First-Order Logic, Computational Complexity}
}
Document
Covers of Query Results

Authors: Ahmet Kara and Dan Olteanu


Abstract
We introduce succinct lossless representations of query results called covers. They are subsets of the query results that correspond to minimal edge covers in the hypergraphs of these results. We first study covers whose structures are given by fractional hypertree decompositions of join queries. For any decomposition of a query, we give asymptotically tight size bounds for the covers of the query result over that decomposition and show that such covers can be computed in worst-case optimal time up to a logarithmic factor in the database size. For acyclic join queries, we can compute covers compositionally using query plans with a new operator called cover-join. The tuples in the query result can be enumerated from any of its covers with linearithmic pre-computation time and constant delay. We then generalize covers from joins to functional aggregate queries that express a host of computational problems such as aggregate-join queries, in-database optimization, matrix chain multiplication, and inference in probabilistic graphical models.
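
A tiny worked example of a cover (an illustration, not from the paper): for the join Q(A,B,C) = R(A,B) JOIN S(B,C) over R = {(a1,b), (a2,b)} and S = {(b,c1), (b,c2)}, the result has four tuples, of which two already form a cover:

  Q     = { (a1,b,c1), (a1,b,c2), (a2,b,c1), (a2,b,c2) }
  cover = { (a1,b,c1), (a2,b,c2) }

The cover's projections onto {A,B} and {B,C} recover R and S exactly, so re-joining them reconstructs all four result tuples; the cover corresponds to a minimal edge cover of the hypergraph of the result and can be asymptotically much smaller than the result it losslessly represents.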

Cite as

Ahmet Kara and Dan Olteanu. Covers of Query Results. In 21st International Conference on Database Theory (ICDT 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 98, pp. 16:1-16:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


BibTeX

@InProceedings{kara_et_al:LIPIcs.ICDT.2018.16,
  author =	{Kara, Ahmet and Olteanu, Dan},
  title =	{{Covers of Query Results}},
  booktitle =	{21st International Conference on Database Theory (ICDT 2018)},
  pages =	{16:1--16:22},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-063-7},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{98},
  editor =	{Kimelfeld, Benny and Amsterdamer, Yael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2018.16},
  URN =		{urn:nbn:de:0030-drops-86100},
  doi =		{10.4230/LIPIcs.ICDT.2018.16},
  annote =	{Keywords: factorized database, fractional hypertree decomposition, functional aggregate query, minimal edge cover, query plan}
}
Document
Distribution Policies for Datalog

Authors: Bas Ketsman, Aws Albarghouthi, and Paraschos Koutris


Abstract
Modern data management systems extensively use parallelism to speed up query processing over massive volumes of data. This trend has inspired a rich line of research on how to formally reason about the parallel complexity of join computation. In this paper, we go beyond joins and study the parallel evaluation of recursive queries. We introduce a novel framework to reason about multi-round evaluation of Datalog programs, which combines implicit predicate restriction with distribution policies to allow expressing a combination of data-parallel and query-parallel evaluation strategies. Using our framework, we reason about key properties of distributed Datalog evaluation, including parallel-correctness of the evaluation strategy, disjointness of the computation effort, and bounds on the number of communication rounds.
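
A minimal simulation of the kind of evaluation the framework reasons about (a sketch with a hash-based distribution policy; the paper's model is more general): for the transitive-closure program T(x,y) <- E(x,y); T(x,z) <- T(x,y), E(y,z), placing E(y,z) and routing each T(x,y) fact by the join variable y makes every round's rule applications purely local.

from collections import defaultdict

def distributed_tc(E, p):
    """Multi-round transitive closure under a hash distribution policy:
    server hash(y) % p holds E(y, z) and receives T(x, y), so the join
    on y never crosses servers."""
    loc = lambda v: hash(v) % p
    E_at = defaultdict(set)                # E is partitioned once, up front
    for y, z in E:
        E_at[loc(y)].add((y, z))
    T, delta = set(E), set(E)
    while delta:
        # Communication round: route each new fact by its join key y.
        delta_at = defaultdict(set)
        for x, y in delta:
            delta_at[loc(y)].add((x, y))
        # Local round: every server applies the rule to its own data only.
        new = set()
        for s in range(p):
            for x, y in delta_at[s]:
                for y2, z in E_at[s]:
                    if y2 == y:
                        new.add((x, z))
        delta = new - T
        T |= delta
    return T

print(sorted(distributed_tc({(1, 2), (2, 3), (3, 4)}, p=4)))

In this sketch the policy is parallel-correct for the program (every derivable fact is derived at some server), the work is disjoint across servers, and the number of communication rounds grows with the longest path; these are the kinds of properties the framework is designed to reason about.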

Cite as

Bas Ketsman, Aws Albarghouthi, and Paraschos Koutris. Distribution Policies for Datalog. In 21st International Conference on Database Theory (ICDT 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 98, pp. 17:1-17:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


BibTeX

@InProceedings{ketsman_et_al:LIPIcs.ICDT.2018.17,
  author =	{Ketsman, Bas and Albarghouthi, Aws and Koutris, Paraschos},
  title =	{{Distribution Policies for Datalog}},
  booktitle =	{21st International Conference on Database Theory (ICDT 2018)},
  pages =	{17:1--17:22},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-063-7},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{98},
  editor =	{Kimelfeld, Benny and Amsterdamer, Yael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2018.17},
  URN =		{urn:nbn:de:0030-drops-86034},
  doi =		{10.4230/LIPIcs.ICDT.2018.17},
  annote =	{Keywords: Datalog queries, Distributed evaluation, Distribution policies}
}
Document
Parallel-Correctness and Transferability for Conjunctive Queries under Bag Semantics

Authors: Bas Ketsman, Frank Neven, and Brecht Vandevoort


Abstract
Single-round multiway join algorithms first reshuffle data over many servers and then evaluate the query at hand in a parallel and communication-free way. A key question is whether a given distribution policy for the reshuffle is adequate for computing a given query. This property is referred to as parallel-correctness. Another key problem is to detect whether the data reshuffle step can be avoided when evaluating subsequent queries. The latter problem is referred to as transfer of parallel-correctness. This paper extends the study of parallel-correctness and transfer of parallel-correctness of conjunctive queries to incorporate bag semantics. We provide semantical characterizations for both problems, obtain complexity bounds and discuss the relationship with their set semantics counterparts. Finally, we revisit both problems under a modified distribution model that takes advantage of a linear order on compute nodes and obtain tight complexity bounds.

Cite as

Bas Ketsman, Frank Neven, and Brecht Vandevoort. Parallel-Correctness and Transferability for Conjunctive Queries under Bag Semantics. In 21st International Conference on Database Theory (ICDT 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 98, pp. 18:1-18:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


BibTeX

@InProceedings{ketsman_et_al:LIPIcs.ICDT.2018.18,
  author =	{Ketsman, Bas and Neven, Frank and Vandevoort, Brecht},
  title =	{{Parallel-Correctness and Transferability for Conjunctive Queries under Bag Semantics}},
  booktitle =	{21st International Conference on Database Theory (ICDT 2018)},
  pages =	{18:1--18:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-063-7},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{98},
  editor =	{Kimelfeld, Benny and Amsterdamer, Yael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2018.18},
  URN =		{urn:nbn:de:0030-drops-86040},
  doi =		{10.4230/LIPIcs.ICDT.2018.18},
  annote =	{Keywords: Conjunctive queries, distributed evaluation, bag semantics}
}
Document
Evaluation and Enumeration Problems for Regular Path Queries

Authors: Wim Martens and Tina Trautner


Abstract
Regular path queries (RPQs) are a central component of graph databases. We investigate decision and enumeration problems concerning the evaluation of RPQs under several semantics that have recently been considered: arbitrary paths, shortest paths, and simple paths. Whereas arbitrary and shortest paths can be enumerated with polynomial delay, the situation is much more intricate for simple paths. For instance, even the question of whether a given graph contains a simple path of a certain length has cases with highly non-trivial solutions and cases that are long-standing open problems. We study RPQ evaluation for simple paths from a parameterized complexity perspective and define a class of simple transitive expressions that is prominent in practice and for which we can prove a dichotomy for the evaluation problem. We observe that, even though simple-path semantics is intractable for RPQs in general, it is feasible for the vast majority of RPQs that are used in practice. At the heart of our study of simple paths is a result of independent interest: the two disjoint paths problem in directed graphs is W[1]-hard when parameterized by the length of one of the two paths.
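
The gap between the semantics is visible on a single classical expression (a textbook example, e.g., from Mendelzon and Wood, not this paper's new material): the RPQ (aa)* selects node pairs connected by an even-length path.

  Arbitrary-path semantics: run the two-state parity automaton in product with the graph; polynomial time.
  Simple-path semantics: decide whether a directed graph has an even-length simple path between two nodes; NP-complete (LaPaugh and Papadimitriou).

This is the kind of boundary that the paper's dichotomy for simple transitive expressions makes precise.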

Cite as

Wim Martens and Tina Trautner. Evaluation and Enumeration Problems for Regular Path Queries. In 21st International Conference on Database Theory (ICDT 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 98, pp. 19:1-19:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


BibTeX

@InProceedings{martens_et_al:LIPIcs.ICDT.2018.19,
  author =	{Martens, Wim and Trautner, Tina},
  title =	{{Evaluation and Enumeration Problems for Regular Path Queries}},
  booktitle =	{21st International Conference on Database Theory (ICDT 2018)},
  pages =	{19:1--19:21},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-063-7},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{98},
  editor =	{Kimelfeld, Benny and Amsterdamer, Yael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2018.19},
  URN =		{urn:nbn:de:0030-drops-85947},
  doi =		{10.4230/LIPIcs.ICDT.2018.19},
  annote =	{Keywords: graph databases, regular path queries, regular languages, parameterized complexity}
}
Document
Massively Parallel Entity Matching with Linear Classification in Low Dimensional Space

Authors: Yufei Tao


Abstract
In entity matching classification, we are given two sets R and S of objects where whether r and s form a match is known for each pair (r, s) in R x S. If R and S are subsets of domains D(R) and D(S) respectively, the goal is to discover a classifier function f: D(R) x D(S) -> {0, 1} from a certain class satisfying the property that, for every (r, s) in R x S, f(r, s) = 1 if and only if r and s are a match. Past research is accustomed to running a learning algorithm directly on all the labeled (i.e., match or not) pairs in R x S. This, however, suffers from the drawback that even reading through the input incurs a quadratic cost. We pursue a direction towards removing the quadratic barrier. Denote by T the set of matching pairs in R x S. We propose to accept R, S, and T as the input, and aim to solve the problem with cost proportional to |R|+|S|+|T|, thereby achieving a large performance gain in the (typical) scenario where |T|<<|R||S|. This paper provides evidence of the feasibility of the new direction, by showing how to accomplish the aforementioned purpose for entity matching with linear classification, where a classifier is a hyperplane separating the matching and non-matching pairs. We actually do so in the MPC model, echoing the trend of deploying massively parallel computing systems for large-scale learning. As a side product, we obtain new MPC algorithms for three geometric problems: linear programming, batched range counting, and dominance join.
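
The classification target can be phrased as a linear-programming feasibility problem (a minimal single-machine sketch with hypothetical feature vectors; the paper's point is to solve such tasks in the MPC model without materializing all of R x S): find (w, b) with w . x + b >= 1 on feature vectors x of matching pairs and <= -1 on non-matching ones.

import numpy as np
from scipy.optimize import linprog

def separating_hyperplane(X, labels):
    """LP feasibility for strict linear separation: find (w, b) such that
    label * (w @ x + b) >= 1 for every row x of X with its +/-1 label."""
    n, k = X.shape
    A_ub = -labels[:, None] * np.hstack([X, np.ones((n, 1))])
    res = linprog(c=np.zeros(k + 1),              # pure feasibility: c = 0
                  A_ub=A_ub, b_ub=-np.ones(n),
                  bounds=[(None, None)] * (k + 1))
    return (res.x[:k], res.x[k]) if res.success else None

X = np.array([[0.9, 0.8], [0.8, 0.9], [0.1, 0.2], [0.2, 0.1]])
labels = np.array([1.0, 1.0, -1.0, -1.0])         # match vs. non-match pairs
print(separating_hyperplane(X, labels))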

Cite as

Yufei Tao. Massively Parallel Entity Matching with Linear Classification in Low Dimensional Space. In 21st International Conference on Database Theory (ICDT 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 98, pp. 20:1-20:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


BibTeX

@InProceedings{tao:LIPIcs.ICDT.2018.20,
  author =	{Tao, Yufei},
  title =	{{Massively Parallel Entity Matching with Linear Classification in Low Dimensional Space}},
  booktitle =	{21st International Conference on Database Theory (ICDT 2018)},
  pages =	{20:1--20:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-063-7},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{98},
  editor =	{Kimelfeld, Benny and Amsterdamer, Yael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2018.20},
  URN =		{urn:nbn:de:0030-drops-86057},
  doi =		{10.4230/LIPIcs.ICDT.2018.20},
  annote =	{Keywords: Entity Matching, Linear Programming, Range Counting, Dominance Join, Massively Parallel Computation}
}
