LIPIcs, Volume 127

22nd International Conference on Database Theory (ICDT 2019)



Thumbnail PDF

Event

ICDT 2019, March 26-28, 2019, Lisbon, Portugal

Editors

Pablo Barcelo
  • Department of Computer Science, Universidad de Chile, CL
Marco Calautti
  • School of Informatics, University of Edinburgh, UK

Publication Details

  • published at: 2019-03-19
  • Publisher: Schloss Dagstuhl – Leibniz-Zentrum für Informatik
  • ISBN: 978-3-95977-101-6
  • DBLP: db/conf/icdt/icdt2019

Access Numbers

Documents

No documents found matching your filter selection.
Document
Complete Volume
LIPIcs, Volume 127, ICDT'19, Complete Volume

Authors: Pablo Barcelo and Marco Calautti


Abstract
LIPIcs, Volume 127, ICDT'19, Complete Volume

Cite as

22nd International Conference on Database Theory (ICDT 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 127, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@Proceedings{barcelo_et_al:LIPIcs.ICDT.2019,
  title =	{{LIPIcs, Volume 127, ICDT'19, Complete Volume}},
  booktitle =	{22nd International Conference on Database Theory (ICDT 2019)},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-101-6},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{127},
  editor =	{Barcelo, Pablo and Calautti, Marco},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2019},
  URN =		{urn:nbn:de:0030-drops-103630},
  doi =		{10.4230/LIPIcs.ICDT.2019},
  annote =	{Keywords: Computing Methodologies, Knowledge Representation and Reasoning, Theory of computation, Data modeling, Incomplete, inconsistent and uncertain database Information systems, Data management systems, Data streams, Database query processing, Incomplete data, Inconsistent data, Relational database model}
}
Document
Front Matter
Front Matter, Table of Contents, Preface, Conference Organization

Authors: Pablo Barcelo and Marco Calautti


Abstract
Front Matter, Table of Contents, Preface, Conference Organization

Cite as

22nd International Conference on Database Theory (ICDT 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 127, pp. 0:i-0:xvi, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{barcelo_et_al:LIPIcs.ICDT.2019.0,
  author =	{Barcelo, Pablo and Calautti, Marco},
  title =	{{Front Matter, Table of Contents, Preface, Conference Organization}},
  booktitle =	{22nd International Conference on Database Theory (ICDT 2019)},
  pages =	{0:i--0:xvi},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-101-6},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{127},
  editor =	{Barcelo, Pablo and Calautti, Marco},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2019.0},
  URN =		{urn:nbn:de:0030-drops-103020},
  doi =		{10.4230/LIPIcs.ICDT.2019.0},
  annote =	{Keywords: Front Matter, Table of Contents, Preface, Conference Organization}
}
Document
Invited Talk
Learning Models over Relational Databases (Invited Talk)

Authors: Dan Olteanu


Abstract
In this talk, I will make the case for a first-principles approach to machine learning over relational databases that exploits recent development in database systems and theory. The input to learning classification and regression models is defined by feature extraction queries over relational databases. The mainstream approach to learning over relational data is to materialize the training dataset, export it out of the database, and then learn over it using statistical software packages. These three steps are expensive and unnecessary. Instead, one can cast the machine learning problem as a database problem by decomposing the learning task into a batch of aggregates over the feature extraction query and by computing this batch over the input database. The performance of this database-centric approach benefits tremendously from structural properties of the relational data and of the feature extraction query; such properties may be algebraic (semi-ring), combinatorial (hypertree width), or statistical (sampling). It also benefits from database systems techniques such as factorized query evaluation and query compilation. For a variety of models, including factorization machines, decision trees, and support vector machines, this approach may come with lower computational complexity than the materialization of the training dataset used by the mainstream approach. Recent results show that this translates to several orders-of-magnitude speed-up over state-of-the-art systems such as TensorFlow, R, Scikit-learn, and mlpack. While these initial results are promising, there is much more awaiting to be discovered.

Cite as

Dan Olteanu. Learning Models over Relational Databases (Invited Talk). In 22nd International Conference on Database Theory (ICDT 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 127, p. 1:1, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{olteanu:LIPIcs.ICDT.2019.1,
  author =	{Olteanu, Dan},
  title =	{{Learning Models over Relational Databases}},
  booktitle =	{22nd International Conference on Database Theory (ICDT 2019)},
  pages =	{1:1--1:1},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-101-6},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{127},
  editor =	{Barcelo, Pablo and Calautti, Marco},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2019.1},
  URN =		{urn:nbn:de:0030-drops-103034},
  doi =		{10.4230/LIPIcs.ICDT.2019.1},
  annote =	{Keywords: In-database analytics, Data complexity, Feature extraction queries, Database dependencies, Model reparameterization}
}
Document
Invited Talk
The Power of Relational Learning (Invited Talk)

Authors: Lise Getoor


Abstract
We live in a richly interconnected world and, not surprisingly, we generate richly interconnected data. From smart cities to social media to financial networks to biological networks, data is relational. While database theory is built on strong relational foundations, the same is not true for machine learning. The majority of machine learning methods flatten data into a single table before performing any processing. Further, database theory is also built on a bedrock of declarative representations. The same is not true for machine learning, in particular deep learning, which results in black-box, uninterpretable and unexplainable models. In this talk, I will introduce the field of statistical relational learning, an alternative machine learning approach based on declarative relational representations paired with probabilistic models. I’ll describe our work on probabilistic soft logic, a probabilistic programming language that is ideally suited to richly connected, noisy data. Our recent results show that by building on state-of-the-art optimization methods in a distributed implementation, we can solve very large relational learning problems orders of magnitude faster than existing approaches.

Cite as

Lise Getoor. The Power of Relational Learning (Invited Talk). In 22nd International Conference on Database Theory (ICDT 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 127, p. 2:1, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{getoor:LIPIcs.ICDT.2019.2,
  author =	{Getoor, Lise},
  title =	{{The Power of Relational Learning}},
  booktitle =	{22nd International Conference on Database Theory (ICDT 2019)},
  pages =	{2:1--2:1},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-101-6},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{127},
  editor =	{Barcelo, Pablo and Calautti, Marco},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2019.2},
  URN =		{urn:nbn:de:0030-drops-103048},
  doi =		{10.4230/LIPIcs.ICDT.2019.2},
  annote =	{Keywords: Machine learning, Probabilistic soft logic, Relational model}
}
Document
Invited Talk
The Power of the Terminating Chase (Invited Talk)

Authors: Markus Krötzsch, Maximilian Marx, and Sebastian Rudolph


Abstract
The chase has become a staple of modern database theory with applications in data integration, query optimisation, data exchange, ontology-based query answering, and many other areas. Most application scenarios and implementations require the chase to terminate and produce a finite universal model, and a large arsenal of sufficient termination criteria is available to guarantee this (generally undecidable) condition. In this invited tutorial, we therefore ask about the expressive power of logical theories for which the chase terminates. Specifically, which database properties can be recognised by such theories, i.e., which Boolean queries can they realise? For the skolem (semi-oblivious) chase, and almost any known termination criterion, this expressivity is just that of plain Datalog. Surprisingly, this limitation of most prior research does not apply to the chase in general. Indeed, we show that standard - chase terminating theories can realise queries with data complexities ranging from PTime to non-elementary that are out of reach for the terminating skolem chase. A "Datalog-first" standard chase that prioritises applications of rules without existential quantifiers makes modelling simpler - and we conjecture: computationally more efficient. This is one of the many open questions raised by our insights, and we conclude with an outlook on the research opportunities in this area.

Cite as

Markus Krötzsch, Maximilian Marx, and Sebastian Rudolph. The Power of the Terminating Chase (Invited Talk). In 22nd International Conference on Database Theory (ICDT 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 127, pp. 3:1-3:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{krotzsch_et_al:LIPIcs.ICDT.2019.3,
  author =	{Kr\"{o}tzsch, Markus and Marx, Maximilian and Rudolph, Sebastian},
  title =	{{The Power of the Terminating Chase}},
  booktitle =	{22nd International Conference on Database Theory (ICDT 2019)},
  pages =	{3:1--3:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-101-6},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{127},
  editor =	{Barcelo, Pablo and Calautti, Marco},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2019.3},
  URN =		{urn:nbn:de:0030-drops-103057},
  doi =		{10.4230/LIPIcs.ICDT.2019.3},
  annote =	{Keywords: Existential rules, Tuple-generating dependencies, all-instances chase termination, expressive power, data complexity}
}
Document
Counting Triangles under Updates in Worst-Case Optimal Time

Authors: Ahmet Kara, Hung Q. Ngo, Milos Nikolic, Dan Olteanu, and Haozhe Zhang


Abstract
We consider the problem of incrementally maintaining the triangle count query under single-tuple updates to the input relations. We introduce an approach that exhibits a space-time tradeoff such that the space-time product is quadratic in the size of the input database and the update time can be as low as the square root of this size. This lowest update time is worst-case optimal conditioned on the Online Matrix-Vector Multiplication conjecture. The classical and factorized incremental view maintenance approaches are recovered as special cases of our approach within the space-time tradeoff. In particular, they require linear-time maintenance under updates, which is suboptimal. Our approach can also count all triangles in a static database in the worst-case optimal time needed for enumerating them.

Cite as

Ahmet Kara, Hung Q. Ngo, Milos Nikolic, Dan Olteanu, and Haozhe Zhang. Counting Triangles under Updates in Worst-Case Optimal Time. In 22nd International Conference on Database Theory (ICDT 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 127, pp. 4:1-4:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{kara_et_al:LIPIcs.ICDT.2019.4,
  author =	{Kara, Ahmet and Ngo, Hung Q. and Nikolic, Milos and Olteanu, Dan and Zhang, Haozhe},
  title =	{{Counting Triangles under Updates in Worst-Case Optimal Time}},
  booktitle =	{22nd International Conference on Database Theory (ICDT 2019)},
  pages =	{4:1--4:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-101-6},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{127},
  editor =	{Barcelo, Pablo and Calautti, Marco},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2019.4},
  URN =		{urn:nbn:de:0030-drops-103068},
  doi =		{10.4230/LIPIcs.ICDT.2019.4},
  annote =	{Keywords: incremental view maintenance, amortized analysis, data skew}
}
Document
A Formal Framework for Complex Event Processing

Authors: Alejandro Grez, Cristian Riveros, and Martín Ugarte


Abstract
Complex Event Processing (CEP) has emerged as the unifying field for technologies that require processing and correlating distributed data sources in real-time. CEP finds applications in diverse domains, which has resulted in a large number of proposals for expressing and processing complex events. However, existing CEP languages lack from a clear semantics, making them hard to understand and generalize. Moreover, there are no general techniques for evaluating CEP query languages with clear performance guarantees. In this paper we embark on the task of giving a rigorous and efficient framework to CEP. We propose a formal language for specifying complex events, called CEL, that contains the main features used in the literature and has a denotational and compositional semantics. We also formalize the so-called selection strategies, which had only been presented as by-design extensions to existing frameworks. With a well-defined semantics at hand, we discuss how to efficiently process complex events by evaluating CEL formulas with unary filters. We start by studying the syntactical properties of CEL and propose rewriting optimization techniques for simplifying the evaluation of formulas. Then, we introduce a formal computational model for CEP, called complex event automata (CEA), and study how to compile CEL formulas with unary filters into CEA. Furthermore, we provide efficient algorithms for evaluating CEA over event streams using constant time per event followed by constant-delay enumeration of the results. Finally, we gather the main results of this work to present an efficient and declarative framework for CEP.

Cite as

Alejandro Grez, Cristian Riveros, and Martín Ugarte. A Formal Framework for Complex Event Processing. In 22nd International Conference on Database Theory (ICDT 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 127, pp. 5:1-5:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{grez_et_al:LIPIcs.ICDT.2019.5,
  author =	{Grez, Alejandro and Riveros, Cristian and Ugarte, Mart{\'\i}n},
  title =	{{A Formal Framework for Complex Event Processing}},
  booktitle =	{22nd International Conference on Database Theory (ICDT 2019)},
  pages =	{5:1--5:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-101-6},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{127},
  editor =	{Barcelo, Pablo and Calautti, Marco},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2019.5},
  URN =		{urn:nbn:de:0030-drops-103079},
  doi =		{10.4230/LIPIcs.ICDT.2019.5},
  annote =	{Keywords: Complex event processing, streaming evaluation, constant delay enumeration}
}
Document
A Formal Framework for Probabilistic Unclean Databases

Authors: Christopher De Sa, Ihab F. Ilyas, Benny Kimelfeld, Christopher Ré, and Theodoros Rekatsinas


Abstract
Most theoretical frameworks that focus on data errors and inconsistencies follow logic-based reasoning. Yet, practical data cleaning tools need to incorporate statistical reasoning to be effective in real-world data cleaning tasks. Motivated by empirical successes, we propose a formal framework for unclean databases, where two types of statistical knowledge are incorporated: The first represents a belief of how intended (clean) data is generated, and the second represents a belief of how noise is introduced in the actual observed database. To capture this noisy channel model, we introduce the concept of a Probabilistic Unclean Database (PUD), a triple that consists of a probabilistic database that we call the intention, a probabilistic data transformator that we call the realization and captures how noise is introduced, and an observed unclean database that we call the observation. We define three computational problems in the PUD framework: cleaning (infer the most probable intended database, given a PUD), probabilistic query answering (compute the probability of an answer tuple over the unclean observed database), and learning (estimate the most likely intention and realization models of a PUD, given examples as training data). We illustrate the PUD framework on concrete representations of the intention and realization, show that they generalize traditional concepts of repairs such as cardinality and value repairs, draw connections to consistent query answering, and prove tractability results. We further show that parameters can be learned in some practical instantiations, and in fact, prove that under certain conditions we can learn a PUD directly from a single dirty database without any need for clean examples.

Cite as

Christopher De Sa, Ihab F. Ilyas, Benny Kimelfeld, Christopher Ré, and Theodoros Rekatsinas. A Formal Framework for Probabilistic Unclean Databases. In 22nd International Conference on Database Theory (ICDT 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 127, pp. 6:1-6:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{desa_et_al:LIPIcs.ICDT.2019.6,
  author =	{De Sa, Christopher and Ilyas, Ihab F. and Kimelfeld, Benny and R\'{e}, Christopher and Rekatsinas, Theodoros},
  title =	{{A Formal Framework for Probabilistic Unclean Databases}},
  booktitle =	{22nd International Conference on Database Theory (ICDT 2019)},
  pages =	{6:1--6:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-101-6},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{127},
  editor =	{Barcelo, Pablo and Calautti, Marco},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2019.6},
  URN =		{urn:nbn:de:0030-drops-103083},
  doi =		{10.4230/LIPIcs.ICDT.2019.6},
  annote =	{Keywords: Unclean databases, data cleaning, probabilistic databases, noisy channel}
}
Document
On the Expressive Power of Linear Algebra on Graphs

Authors: Floris Geerts


Abstract
Most graph query languages are rooted in logic. By contrast, in this paper we consider graph query languages rooted in linear algebra. More specifically, we consider MATLANG, a matrix query language recently introduced, in which some basic linear algebra functionality is supported. We investigate the problem of characterising equivalence of graphs, represented by their adjacency matrices, for various fragments of MATLANG. A complete picture is painted of the impact of the linear algebra operations in MATLANG on their ability to distinguish graphs.

Cite as

Floris Geerts. On the Expressive Power of Linear Algebra on Graphs. In 22nd International Conference on Database Theory (ICDT 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 127, pp. 7:1-7:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{geerts:LIPIcs.ICDT.2019.7,
  author =	{Geerts, Floris},
  title =	{{On the Expressive Power of Linear Algebra on Graphs}},
  booktitle =	{22nd International Conference on Database Theory (ICDT 2019)},
  pages =	{7:1--7:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-101-6},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{127},
  editor =	{Barcelo, Pablo and Calautti, Marco},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2019.7},
  URN =		{urn:nbn:de:0030-drops-103093},
  doi =		{10.4230/LIPIcs.ICDT.2019.7},
  annote =	{Keywords: matrix query languages, graph queries, graph theory, logics}
}
Document
Fragments of Bag Relational Algebra: Expressiveness and Certain Answers

Authors: Marco Console, Paolo Guagliardo, and Leonid Libkin


Abstract
While all relational database systems are based on the bag data model, much of theoretical research still views relations as sets. Recent attempts to provide theoretical foundations for modern data management problems under the bag semantics concentrated on applications that need to deal with incomplete relations, i.e., relations populated by constants and nulls. Our goal is to provide a complete characterization of the complexity of query answering over such relations in fragments of bag relational algebra. The main challenges that we face are twofold. First, bag relational algebra has more operations than its set analog (e.g., additive union, max-union, min-intersection, duplicate elimination) and the relationship between various fragments is not fully known. Thus we first fill this gap. Second, we look at query answering over incomplete data, which again is more complex than in the set case: rather than certainty and possibility of answers, we now have numerical information about occurrences of tuples. We then fully classify the complexity of finding this information in all the fragments of bag relational algebra.

Cite as

Marco Console, Paolo Guagliardo, and Leonid Libkin. Fragments of Bag Relational Algebra: Expressiveness and Certain Answers. In 22nd International Conference on Database Theory (ICDT 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 127, pp. 8:1-8:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{console_et_al:LIPIcs.ICDT.2019.8,
  author =	{Console, Marco and Guagliardo, Paolo and Libkin, Leonid},
  title =	{{Fragments of Bag Relational Algebra: Expressiveness and Certain Answers}},
  booktitle =	{22nd International Conference on Database Theory (ICDT 2019)},
  pages =	{8:1--8:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-101-6},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{127},
  editor =	{Barcelo, Pablo and Calautti, Marco},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2019.8},
  URN =		{urn:nbn:de:0030-drops-103106},
  doi =		{10.4230/LIPIcs.ICDT.2019.8},
  annote =	{Keywords: bag semantics, relational algebra, expressivity, certain answers, complexity}
}
Document
Categorical Range Reporting with Frequencies

Authors: Arnab Ganguly, J. Ian Munro, Yakov Nekrich, Rahul Shah, and Sharma V. Thankachan


Abstract
In this paper, we consider a variant of the color range reporting problem called color reporting with frequencies. Our goal is to pre-process a set of colored points into a data structure, so that given a query range Q, we can report all colors that appear in Q, along with their respective frequencies. In other words, for each reported color, we also output the number of times it occurs in Q. We describe an external-memory data structure that uses O(N(1+log^2D/log N)) words and answers one-dimensional queries in O(1 +K/B) I/Os, where N is the total number of points in the data structure, D is the total number of colors in the data structure, K is the number of reported colors, and B is the block size. Next we turn to an approximate version of this problem: report all colors sigma that appear in the query range; for every reported color, we provide a constant-factor approximation on its frequency. We consider color reporting with approximate frequencies in two dimensions. Our data structure uses O(N) space and answers two-dimensional queries in O(log_B N +log^*B + K/B) I/Os in the special case when the query range is bounded on two sides. As a corollary, we can also answer one-dimensional approximate queries within the same time and space bounds.

Cite as

Arnab Ganguly, J. Ian Munro, Yakov Nekrich, Rahul Shah, and Sharma V. Thankachan. Categorical Range Reporting with Frequencies. In 22nd International Conference on Database Theory (ICDT 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 127, pp. 9:1-9:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{ganguly_et_al:LIPIcs.ICDT.2019.9,
  author =	{Ganguly, Arnab and Munro, J. Ian and Nekrich, Yakov and Shah, Rahul and Thankachan, Sharma V.},
  title =	{{Categorical Range Reporting with Frequencies}},
  booktitle =	{22nd International Conference on Database Theory (ICDT 2019)},
  pages =	{9:1--9:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-101-6},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{127},
  editor =	{Barcelo, Pablo and Calautti, Marco},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2019.9},
  URN =		{urn:nbn:de:0030-drops-103115},
  doi =		{10.4230/LIPIcs.ICDT.2019.9},
  annote =	{Keywords: Data Structures, Range Reporting, Range Counting, Categorical Range Reporting, Orthogonal Range Query}
}
Document
Approximating Distance Measures for the Skyline

Authors: Nirman Kumar, Benjamin Raichel, Stavros Sintos, and Gregory Van Buskirk


Abstract
In multi-parameter decision making, data is usually modeled as a set of points whose dimension is the number of parameters, and the skyline or Pareto points represent the possible optimal solutions for various optimization problems. The structure and computation of such points have been well studied, particularly in the database community. As the skyline can be quite large in high dimensions, one often seeks a compact summary. In particular, for a given integer parameter k, a subset of k points is desired which best approximates the skyline under some measure. Various measures have been proposed, but they mostly treat the skyline as a discrete object. By viewing the skyline as a continuous geometric hull, we propose a new measure that evaluates the quality of a subset by the Hausdorff distance of its hull to the full hull. We argue that in many ways our measure more naturally captures what it means to approximate the skyline. For our new geometric skyline approximation measure, we provide a plethora of results. Specifically, we provide (1) a near linear time exact algorithm in two dimensions, (2) APX-hardness results for dimensions three and higher, (3) approximation algorithms for related variants of our problem, and (4) a practical and efficient heuristic which uses our geometric insights into the problem, as well as various experimental results to show the efficacy of our approach.

Cite as

Nirman Kumar, Benjamin Raichel, Stavros Sintos, and Gregory Van Buskirk. Approximating Distance Measures for the Skyline. In 22nd International Conference on Database Theory (ICDT 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 127, pp. 10:1-10:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{kumar_et_al:LIPIcs.ICDT.2019.10,
  author =	{Kumar, Nirman and Raichel, Benjamin and Sintos, Stavros and Van Buskirk, Gregory},
  title =	{{Approximating Distance Measures for the Skyline}},
  booktitle =	{22nd International Conference on Database Theory (ICDT 2019)},
  pages =	{10:1--10:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-101-6},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{127},
  editor =	{Barcelo, Pablo and Calautti, Marco},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2019.10},
  URN =		{urn:nbn:de:0030-drops-103125},
  doi =		{10.4230/LIPIcs.ICDT.2019.10},
  annote =	{Keywords: Skyline, Pareto optimal, Approximation, Hardness, Multi-criteria decision making}
}
Document
Index-Based, High-Dimensional, Cosine Threshold Querying with Optimality Guarantees

Authors: Yuliang Li, Jianguo Wang, Benjamin Pullman, Nuno Bandeira, and Yannis Papakonstantinou


Abstract
Given a database of vectors, a cosine threshold query returns all vectors in the database having cosine similarity to a query vector above a given threshold. These queries arise naturally in many applications, such as document retrieval, image search, and mass spectrometry. The present paper considers the efficient evaluation of such queries, providing novel optimality guarantees and exhibiting good performance on real datasets. We take as a starting point Fagin’s well-known Threshold Algorithm (TA), which can be used to answer cosine threshold queries as follows: an inverted index is first built from the database vectors during pre-processing; at query time, the algorithm traverses the index partially to gather a set of candidate vectors to be later verified against the similarity threshold. However, directly applying TA in its raw form misses significant optimization opportunities. Indeed, we first show that one can take advantage of the fact that the vectors can be assumed to be normalized, to obtain an improved, tight stopping condition for index traversal and to efficiently compute it incrementally. Then we show that one can take advantage of data skewness to obtain better traversal strategies. In particular, we show a novel traversal strategy that exploits a common data skewness condition which holds in multiple domains including mass spectrometry, documents, and image databases. We show that under the skewness assumption, the new traversal strategy has a strong, near-optimal performance guarantee. The techniques developed in the paper are quite general since they can be applied to a large class of similarity functions beyond cosine.

Cite as

Yuliang Li, Jianguo Wang, Benjamin Pullman, Nuno Bandeira, and Yannis Papakonstantinou. Index-Based, High-Dimensional, Cosine Threshold Querying with Optimality Guarantees. In 22nd International Conference on Database Theory (ICDT 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 127, pp. 11:1-11:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{li_et_al:LIPIcs.ICDT.2019.11,
  author =	{Li, Yuliang and Wang, Jianguo and Pullman, Benjamin and Bandeira, Nuno and Papakonstantinou, Yannis},
  title =	{{Index-Based, High-Dimensional, Cosine Threshold Querying with Optimality Guarantees}},
  booktitle =	{22nd International Conference on Database Theory (ICDT 2019)},
  pages =	{11:1--11:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-101-6},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{127},
  editor =	{Barcelo, Pablo and Calautti, Marco},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2019.11},
  URN =		{urn:nbn:de:0030-drops-103135},
  doi =		{10.4230/LIPIcs.ICDT.2019.11},
  annote =	{Keywords: Vector databases, Similarity search, Cosine, Threshold Algorithm}
}
Document
An Experimental Study of the Treewidth of Real-World Graph Data

Authors: Silviu Maniu, Pierre Senellart, and Suraj Jog


Abstract
Treewidth is a parameter that measures how tree-like a relational instance is, and whether it can reasonably be decomposed into a tree. Many computation tasks are known to be tractable on databases of small treewidth, but computing the treewidth of a given instance is intractable. This article is the first large-scale experimental study of treewidth and tree decompositions of real-world database instances (25 datasets from 8 different domains, with sizes ranging from a few thousand to a few million vertices). The goal is to determine which data, if any, can benefit of the wealth of algorithms for databases of small treewidth. For each dataset, we obtain upper and lower bound estimations of their treewidth, and study the properties of their tree decompositions. We show in particular that, even when treewidth is high, using partial tree decompositions can result in data structures that can assist algorithms.

Cite as

Silviu Maniu, Pierre Senellart, and Suraj Jog. An Experimental Study of the Treewidth of Real-World Graph Data. In 22nd International Conference on Database Theory (ICDT 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 127, pp. 12:1-12:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{maniu_et_al:LIPIcs.ICDT.2019.12,
  author =	{Maniu, Silviu and Senellart, Pierre and Jog, Suraj},
  title =	{{An Experimental Study of the Treewidth of Real-World Graph Data}},
  booktitle =	{22nd International Conference on Database Theory (ICDT 2019)},
  pages =	{12:1--12:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-101-6},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{127},
  editor =	{Barcelo, Pablo and Calautti, Marco},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2019.12},
  URN =		{urn:nbn:de:0030-drops-103147},
  doi =		{10.4230/LIPIcs.ICDT.2019.12},
  annote =	{Keywords: Treewidth, Graph decompositions, Experiments, Query processing}
}
Document
Recursive Programs for Document Spanners

Authors: Liat Peterfreund, Balder ten Cate, Ronald Fagin, and Benny Kimelfeld


Abstract
A document spanner models a program for Information Extraction (IE) as a function that takes as input a text document (string over a finite alphabet) and produces a relation of spans (intervals in the document) over a predefined schema. A well-studied language for expressing spanners is that of the regular spanners: relational algebra over regex formulas, which are regular expressions with capture variables. Equivalently, the regular spanners are the ones expressible in non-recursive Datalog over regex formulas (which extract relations that constitute the extensional database). This paper explores the expressive power of recursive Datalog over regex formulas. We show that such programs can express precisely the document spanners computable in polynomial time. We compare this expressiveness to known formalisms such as the closure of regex formulas under the relational algebra and string equality. Finally, we extend our study to a recently proposed framework that generalizes both the relational model and the document spanners.

Cite as

Liat Peterfreund, Balder ten Cate, Ronald Fagin, and Benny Kimelfeld. Recursive Programs for Document Spanners. In 22nd International Conference on Database Theory (ICDT 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 127, pp. 13:1-13:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{peterfreund_et_al:LIPIcs.ICDT.2019.13,
  author =	{Peterfreund, Liat and Cate, Balder ten and Fagin, Ronald and Kimelfeld, Benny},
  title =	{{Recursive Programs for Document Spanners}},
  booktitle =	{22nd International Conference on Database Theory (ICDT 2019)},
  pages =	{13:1--13:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-101-6},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{127},
  editor =	{Barcelo, Pablo and Calautti, Marco},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2019.13},
  URN =		{urn:nbn:de:0030-drops-103155},
  doi =		{10.4230/LIPIcs.ICDT.2019.13},
  annote =	{Keywords: Information Extraction, Document Spanners, Polynomial Time, Recursion, Regular Expressions, Datalog}
}
Document
Parallel-Correctness and Parallel-Boundedness for Datalog Programs

Authors: Frank Neven, Thomas Schwentick, Christopher Spinrath, and Brecht Vandevoort


Abstract
Recently, Ketsman et al. started the investigation of the parallel evaluation of recursive queries in the Massively Parallel Communication (MPC) model. Among other things, it was shown that parallel-correctness and parallel-boundedness for general Datalog programs is undecidable, by a reduction from the undecidable containment problem for Datalog. Furthermore, economic policies were introduced as a means to specify data distribution in a recursive setting. In this paper, we extend the latter framework to account for more general distributed evaluation strategies in terms of communication policies. We then show that the undecidability of parallel-correctness runs deeper: it already holds for fragments of Datalog, e.g., monadic and frontier-guarded Datalog, with a decidable containment problem, under relatively simple evaluation strategies. These simple evaluation strategies are defined w.r.t. data-moving distribution constraints. We then investigate restrictions of economic policies that yield decidability. In particular, we show that parallel-correctness is 2EXPTIME-complete for monadic and frontier-guarded Datalog under hash-based economic policies. Next, we consider restrictions of data-moving constraints and show that parallel-correctness and parallel-boundedness are 2EXPTIME-complete for frontier-guarded Datalog. Interestingly, distributed evaluation no longer preserves the usual containment relationships between fragments of Datalog. Indeed, not every monadic Datalog program is equivalent to a frontier-guarded one in the distributed setting. We illustrate the latter by considering two alternative settings where in one of these parallel-correctness is decidable for frontier-guarded Datalog but undecidable for monadic Datalog.

Cite as

Frank Neven, Thomas Schwentick, Christopher Spinrath, and Brecht Vandevoort. Parallel-Correctness and Parallel-Boundedness for Datalog Programs. In 22nd International Conference on Database Theory (ICDT 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 127, pp. 14:1-14:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{neven_et_al:LIPIcs.ICDT.2019.14,
  author =	{Neven, Frank and Schwentick, Thomas and Spinrath, Christopher and Vandevoort, Brecht},
  title =	{{Parallel-Correctness and Parallel-Boundedness for Datalog Programs}},
  booktitle =	{22nd International Conference on Database Theory (ICDT 2019)},
  pages =	{14:1--14:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-101-6},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{127},
  editor =	{Barcelo, Pablo and Calautti, Marco},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2019.14},
  URN =		{urn:nbn:de:0030-drops-103165},
  doi =		{10.4230/LIPIcs.ICDT.2019.14},
  annote =	{Keywords: Datalog, distributed databases, distributed evaluation, decision problems, complexity}
}
Document
The First Order Truth Behind Undecidability of Regular Path Queries Determinacy

Authors: Grzegorz Głuch, Jerzy Marcinkowski, and Piotr Ostropolski-Nalewaja


Abstract
In our paper [Głuch, Marcinkowski, Ostropolski-Nalewaja, LICS ACM, 2018] we have solved an old problem stated in [Calvanese, De Giacomo, Lenzerini, Vardi, SPDS ACM, 2000] showing that query determinacy is undecidable for Regular Path Queries. Here a strong generalisation of this result is shown, and - we think - a very unexpected one. We prove that no regularity is needed: determinacy remains undecidable even for finite unions of conjunctive path queries.

Cite as

Grzegorz Głuch, Jerzy Marcinkowski, and Piotr Ostropolski-Nalewaja. The First Order Truth Behind Undecidability of Regular Path Queries Determinacy. In 22nd International Conference on Database Theory (ICDT 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 127, pp. 15:1-15:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{gluch_et_al:LIPIcs.ICDT.2019.15,
  author =	{G{\l}uch, Grzegorz and Marcinkowski, Jerzy and Ostropolski-Nalewaja, Piotr},
  title =	{{The First Order Truth Behind Undecidability of Regular Path Queries Determinacy}},
  booktitle =	{22nd International Conference on Database Theory (ICDT 2019)},
  pages =	{15:1--15:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-101-6},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{127},
  editor =	{Barcelo, Pablo and Calautti, Marco},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2019.15},
  URN =		{urn:nbn:de:0030-drops-103175},
  doi =		{10.4230/LIPIcs.ICDT.2019.15},
  annote =	{Keywords: database theory, query, view, determinacy, recursive path queries}
}
Document
Datalog: Bag Semantics via Set Semantics

Authors: Leopoldo Bertossi, Georg Gottlob, and Reinhard Pichler


Abstract
Duplicates in data management are common and problematic. In this work, we present a translation of Datalog under bag semantics into a well-behaved extension of Datalog, the so-called warded Datalog^+/-, under set semantics. From a theoretical point of view, this allows us to reason on bag semantics by making use of the well-established theoretical foundations of set semantics. From a practical point of view, this allows us to handle the bag semantics of Datalog by powerful, existing query engines for the required extension of Datalog. This use of Datalog^+/- is extended to give a set semantics to duplicates in Datalog^+/- itself. We investigate the properties of the resulting Datalog^+/- programs, the problem of deciding multiplicities, and expressibility of some bag operations. Moreover, the proposed translation has the potential for interesting applications such as to Multiset Relational Algebra and the semantic web query language SPARQL with bag semantics.

Cite as

Leopoldo Bertossi, Georg Gottlob, and Reinhard Pichler. Datalog: Bag Semantics via Set Semantics. In 22nd International Conference on Database Theory (ICDT 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 127, pp. 16:1-16:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{bertossi_et_al:LIPIcs.ICDT.2019.16,
  author =	{Bertossi, Leopoldo and Gottlob, Georg and Pichler, Reinhard},
  title =	{{Datalog: Bag Semantics via Set Semantics}},
  booktitle =	{22nd International Conference on Database Theory (ICDT 2019)},
  pages =	{16:1--16:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-101-6},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{127},
  editor =	{Barcelo, Pablo and Calautti, Marco},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2019.16},
  URN =		{urn:nbn:de:0030-drops-103188},
  doi =		{10.4230/LIPIcs.ICDT.2019.16},
  annote =	{Keywords: Datalog, duplicates, multisets, query answering, chase, Datalog+/-}
}
Document
Oblivious Chase Termination: The Sticky Case

Authors: Marco Calautti and Andreas Pieris


Abstract
The chase procedure is one of the most fundamental algorithmic tools in database theory. A key algorithmic task is uniform chase termination, i.e., given a set of tuple-generating dependencies (tgds), is it the case that the chase under this set of tgds terminates, for every input database? In view of the fact that this problem is undecidable, no matter which version of the chase we consider, it is natural to ask whether well-behaved classes of tgds, introduced in different contexts such as ontological reasoning, make our problem decidable. In this work, we consider a prominent decidability paradigm for tgds, called stickiness. We show that for sticky sets of tgds, uniform chase termination is decidable if we focus on the (semi-)oblivious chase, and we pinpoint its exact complexity: PSpace-complete in general, and NLogSpace-complete for predicates of bounded arity. These complexity results are obtained via graph-based syntactic characterizations of chase termination that are of independent interest.

Cite as

Marco Calautti and Andreas Pieris. Oblivious Chase Termination: The Sticky Case. In 22nd International Conference on Database Theory (ICDT 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 127, pp. 17:1-17:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{calautti_et_al:LIPIcs.ICDT.2019.17,
  author =	{Calautti, Marco and Pieris, Andreas},
  title =	{{Oblivious Chase Termination: The Sticky Case}},
  booktitle =	{22nd International Conference on Database Theory (ICDT 2019)},
  pages =	{17:1--17:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-101-6},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{127},
  editor =	{Barcelo, Pablo and Calautti, Marco},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2019.17},
  URN =		{urn:nbn:de:0030-drops-103197},
  doi =		{10.4230/LIPIcs.ICDT.2019.17},
  annote =	{Keywords: Chase procedure, tuple-generating dependencies, stickiness, termination, computational complexity}
}
Document
A Single Approach to Decide Chase Termination on Linear Existential Rules

Authors: Michel Leclère, Marie-Laure Mugnier, Michaël Thomazo, and Federico Ulliana


Abstract
Existential rules, long known as tuple-generating dependencies in database theory, have been intensively studied in the last decade as a powerful formalism to represent ontological knowledge in the context of ontology-based query answering. A knowledge base is then composed of an instance that contains incomplete data and a set of existential rules, and answers to queries are logically entailed from the knowledge base. This brought again to light the fundamental chase tool, and its different variants that have been proposed in the literature. It is well-known that the problem of determining, given a chase variant and a set of existential rules, whether the chase will halt on any instance, is undecidable. Hence, a crucial issue is whether it becomes decidable for known subclasses of existential rules. In this work, we consider linear existential rules with atomic head, a simple yet important subclass of existential rules that generalizes inclusion dependencies. We show the decidability of the all-instance chase termination problem on these rules for three main chase variants, namely semi-oblivious, restricted and core chase. To obtain these results, we introduce a novel approach based on so-called derivation trees and a single notion of forbidden pattern. Besides the theoretical interest of a unified approach and new proofs for the semi-oblivious and core chase variants, we provide the first positive decidability results concerning the termination of the restricted chase, proving that chase termination on linear existential rules with atomic head is decidable for both versions of the problem: Does every chase sequence terminate? Does some chase sequence terminate?

Cite as

Michel Leclère, Marie-Laure Mugnier, Michaël Thomazo, and Federico Ulliana. A Single Approach to Decide Chase Termination on Linear Existential Rules. In 22nd International Conference on Database Theory (ICDT 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 127, pp. 18:1-18:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{leclere_et_al:LIPIcs.ICDT.2019.18,
  author =	{Lecl\`{e}re, Michel and Mugnier, Marie-Laure and Thomazo, Micha\"{e}l and Ulliana, Federico},
  title =	{{A Single Approach to Decide Chase Termination on Linear Existential Rules}},
  booktitle =	{22nd International Conference on Database Theory (ICDT 2019)},
  pages =	{18:1--18:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-101-6},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{127},
  editor =	{Barcelo, Pablo and Calautti, Marco},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2019.18},
  URN =		{urn:nbn:de:0030-drops-103200},
  doi =		{10.4230/LIPIcs.ICDT.2019.18},
  annote =	{Keywords: Chase, Tuple Generating Dependencies, Existential rules, Decidability}
}
Document
Additive First-Order Queries

Authors: Gerald Berger, Martin Otto, Andreas Pieris, Dimitri Surinx, and Jan Van den Bussche


Abstract
A database query q is called additive if q(A U B) = q(A) U q(B) for domain-disjoint input databases A and B. Additivity allows the computation of the query result to be parallelised over the connected components of the input database. We define the "connected formulas" as a syntactic fragment of first-order logic, and show that a first-order query is additive if and only if it expressible by a connected formula. This characterisation specializes to the guarded fragment of first-order logic. We also show that additivity is decidable for formulas of the guarded fragment, establish the computational complexity, and do the same for positive-existential formulas. Our results hold when restricting attention to finite structures, as is common in database theory, but also hold in the unrestricted setting.

Cite as

Gerald Berger, Martin Otto, Andreas Pieris, Dimitri Surinx, and Jan Van den Bussche. Additive First-Order Queries. In 22nd International Conference on Database Theory (ICDT 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 127, pp. 19:1-19:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{berger_et_al:LIPIcs.ICDT.2019.19,
  author =	{Berger, Gerald and Otto, Martin and Pieris, Andreas and Surinx, Dimitri and Van den Bussche, Jan},
  title =	{{Additive First-Order Queries}},
  booktitle =	{22nd International Conference on Database Theory (ICDT 2019)},
  pages =	{19:1--19:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-101-6},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{127},
  editor =	{Barcelo, Pablo and Calautti, Marco},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2019.19},
  URN =		{urn:nbn:de:0030-drops-103217},
  doi =		{10.4230/LIPIcs.ICDT.2019.19},
  annote =	{Keywords: Expressive power}
}
Document
Characterizing Tractability of Simple Well-Designed Pattern Trees with Projection

Authors: Stefan Mengel and Sebastian Skritek


Abstract
We study the complexity of evaluating well-designed pattern trees, a query language extending conjunctive queries with the possibility to define parts of the query to be optional. This possibility of optional parts is important for obtaining meaningful results over incomplete data sources as it is common in semantic web settings. Recently, a structural characterization of the classes of well-designed pattern trees that can be evaluated in polynomial time was shown. However, projection - a central feature of many query languages - was not considered in this study. We work towards closing this gap by giving a characterization of all tractable classes of simple well-designed pattern trees with projection (under some common complexity theoretic assumptions). Since well-designed pattern trees correspond to the fragment of well-designed {AND, OPTIONAL}-SPARQL queries this gives a complete description of the tractable classes of queries with projections in this fragment that can be characterized by the underlying graph structures of the queries.

Cite as

Stefan Mengel and Sebastian Skritek. Characterizing Tractability of Simple Well-Designed Pattern Trees with Projection. In 22nd International Conference on Database Theory (ICDT 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 127, pp. 20:1-20:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{mengel_et_al:LIPIcs.ICDT.2019.20,
  author =	{Mengel, Stefan and Skritek, Sebastian},
  title =	{{Characterizing Tractability of Simple Well-Designed Pattern Trees with Projection}},
  booktitle =	{22nd International Conference on Database Theory (ICDT 2019)},
  pages =	{20:1--20:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-101-6},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{127},
  editor =	{Barcelo, Pablo and Calautti, Marco},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2019.20},
  URN =		{urn:nbn:de:0030-drops-103220},
  doi =		{10.4230/LIPIcs.ICDT.2019.20},
  annote =	{Keywords: SPARQL, well-designed pattern trees, query evaluation, FPT, characterizing tractable classes}
}
Document
Boolean Tensor Decomposition for Conjunctive Queries with Negation

Authors: Mahmoud Abo Khamis, Hung Q. Ngo, Dan Olteanu, and Dan Suciu


Abstract
We propose an approach for answering conjunctive queries with negation, where the negated relations have bounded degree. Its data complexity matches that of the InsideOut and PANDA algorithms for the positive subquery of the input query and is expressed in terms of the fractional hypertree width and the submodular width respectively. Its query complexity depends on the structure of the conjunction of negated relations; in general it is exponential in the number of join variables occurring in negated relations yet it becomes polynomial for several classes of queries. This approach relies on several contributions. We show how to rewrite queries with negation on bounded-degree relations into equivalent conjunctive queries with not-all-equal (NAE) predicates, which are a multi-dimensional analog of disequality (!=). We then generalize the known color-coding technique to conjunctions of NAE predicates and explain it via a Boolean tensor decomposition of conjunctions of NAE predicates. This decomposition can be achieved via a probabilistic construction that can be derandomized efficiently.

Cite as

Mahmoud Abo Khamis, Hung Q. Ngo, Dan Olteanu, and Dan Suciu. Boolean Tensor Decomposition for Conjunctive Queries with Negation. In 22nd International Conference on Database Theory (ICDT 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 127, pp. 21:1-21:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{abokhamis_et_al:LIPIcs.ICDT.2019.21,
  author =	{Abo Khamis, Mahmoud and Ngo, Hung Q. and Olteanu, Dan and Suciu, Dan},
  title =	{{Boolean Tensor Decomposition for Conjunctive Queries with Negation}},
  booktitle =	{22nd International Conference on Database Theory (ICDT 2019)},
  pages =	{21:1--21:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-101-6},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{127},
  editor =	{Barcelo, Pablo and Calautti, Marco},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2019.21},
  URN =		{urn:nbn:de:0030-drops-103236},
  doi =		{10.4230/LIPIcs.ICDT.2019.21},
  annote =	{Keywords: color-coding, combined complexity, negation, query evaluation}
}
Document
Constant-Delay Enumeration for Nondeterministic Document Spanners

Authors: Antoine Amarilli, Pierre Bourhis, Stefan Mengel, and Matthias Niewerth


Abstract
We consider the information extraction framework known as document spanners, and study the problem of efficiently computing the results of the extraction from an input document, where the extraction task is described as a sequential variable-set automaton (VA). We pose this problem in the setting of enumeration algorithms, where we can first run a preprocessing phase and must then produce the results with a small delay between any two consecutive results. Our goal is to have an algorithm which is tractable in combined complexity, i.e., in the sizes of the input document and the VA; while ensuring the best possible data complexity bounds in the input document size, i.e., constant delay in the document size. Several recent works at PODS'18 proposed such algorithms but with linear delay in the document size or with an exponential dependency in size of the (generally nondeterministic) input VA. In particular, Florenzano et al. suggest that our desired runtime guarantees cannot be met for general sequential VAs. We refute this and show that, given a nondeterministic sequential VA and an input document, we can enumerate the mappings of the VA on the document with the following bounds: the preprocessing is linear in the document size and polynomial in the size of the VA, and the delay is independent of the document and polynomial in the size of the VA. The resulting algorithm thus achieves tractability in combined complexity and the best possible data complexity bounds. Moreover, it is rather easy to describe, in particular for the restricted case of so-called extended VAs.

Cite as

Antoine Amarilli, Pierre Bourhis, Stefan Mengel, and Matthias Niewerth. Constant-Delay Enumeration for Nondeterministic Document Spanners. In 22nd International Conference on Database Theory (ICDT 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 127, pp. 22:1-22:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{amarilli_et_al:LIPIcs.ICDT.2019.22,
  author =	{Amarilli, Antoine and Bourhis, Pierre and Mengel, Stefan and Niewerth, Matthias},
  title =	{{Constant-Delay Enumeration for Nondeterministic Document Spanners}},
  booktitle =	{22nd International Conference on Database Theory (ICDT 2019)},
  pages =	{22:1--22:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-101-6},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{127},
  editor =	{Barcelo, Pablo and Calautti, Marco},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2019.22},
  URN =		{urn:nbn:de:0030-drops-103246},
  doi =		{10.4230/LIPIcs.ICDT.2019.22},
  annote =	{Keywords: enumeration, spanners, automata}
}
Document
Consistent Query Answering for Primary Keys in Logspace

Authors: Paraschos Koutris and Jef Wijsen


Abstract
We study the complexity of consistent query answering on databases that may violate primary key constraints. A repair of such a database is any consistent database that can be obtained by deleting a minimal set of tuples. For every Boolean query q, CERTAINTY(q) is the problem that takes a database as input and asks whether q evaluates to true on every repair. In [Koutris and Wijsen, ACM TODS, 2017], the authors show that for every self-join-free Boolean conjunctive query q, the problem CERTAINTY(q) is either in P or coNP-complete, and it is decidable which of the two cases applies. In this paper, we sharpen this result by showing that for every self-join-free Boolean conjunctive query q, the problem CERTAINTY(q) is either expressible in symmetric stratified Datalog (with some aggregation operator) or coNP-complete. Since symmetric stratified Datalog is in L, we thus obtain a complexity-theoretic dichotomy between L and coNP-complete. Another new finding of practical importance is that CERTAINTY(q) is on the logspace side of the dichotomy for queries q where all join conditions express foreign-to-primary key matches, which is undoubtedly the most common type of join condition.

Cite as

Paraschos Koutris and Jef Wijsen. Consistent Query Answering for Primary Keys in Logspace. In 22nd International Conference on Database Theory (ICDT 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 127, pp. 23:1-23:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{koutris_et_al:LIPIcs.ICDT.2019.23,
  author =	{Koutris, Paraschos and Wijsen, Jef},
  title =	{{Consistent Query Answering for Primary Keys in Logspace}},
  booktitle =	{22nd International Conference on Database Theory (ICDT 2019)},
  pages =	{23:1--23:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-101-6},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{127},
  editor =	{Barcelo, Pablo and Calautti, Marco},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2019.23},
  URN =		{urn:nbn:de:0030-drops-103252},
  doi =		{10.4230/LIPIcs.ICDT.2019.23},
  annote =	{Keywords: conjunctive queries, consistent query answering, Datalog, primary keys}
}
Document
Learning Definable Hypotheses on Trees

Authors: Emilie Grienenberger and Martin Ritzert


Abstract
We study the problem of learning properties of nodes in tree structures. Those properties are specified by logical formulas, such as formulas from first-order or monadic second-order logic. We think of the tree as a database encoding a large dataset and therefore aim for learning algorithms which depend at most sublinearly on the size of the tree. We present a learning algorithm for quantifier-free formulas where the running time only depends polynomially on the number of training examples, but not on the size of the background structure. By a previous result on strings we know that for general first-order or monadic second-order (MSO) formulas a sublinear running time cannot be achieved. However, we show that by building an index on the tree in a linear time preprocessing phase, we can achieve a learning algorithm for MSO formulas with a logarithmic learning phase.

Cite as

Emilie Grienenberger and Martin Ritzert. Learning Definable Hypotheses on Trees. In 22nd International Conference on Database Theory (ICDT 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 127, pp. 24:1-24:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{grienenberger_et_al:LIPIcs.ICDT.2019.24,
  author =	{Grienenberger, Emilie and Ritzert, Martin},
  title =	{{Learning Definable Hypotheses on Trees}},
  booktitle =	{22nd International Conference on Database Theory (ICDT 2019)},
  pages =	{24:1--24:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-101-6},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{127},
  editor =	{Barcelo, Pablo and Calautti, Marco},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2019.24},
  URN =		{urn:nbn:de:0030-drops-103261},
  doi =		{10.4230/LIPIcs.ICDT.2019.24},
  annote =	{Keywords: monadic second-order logic, trees, query learning}
}

Filters


Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail