Dagstuhl Publishing
http://www.dagstuhl.de
RSS feed announcing the latest publications published by Schloss Dagstuhlen-en2010-06-082018-06-06http://www.dagstuhl.de/dagpub-rss.phpFront Matter, Table of Contents, Preface, Conference Organization
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9463
Front Matter, Table of Contents, Preface, Conference OrganizationWed, 08 Aug 2018 16:13:25 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9463Algorithms for Inverse Optimization Problems
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9464
We study inverse optimization problems, wherein the goal is to map given solutions to an underlying optimization problem to a cost vector for which the given solutions are the (unique) optimal solutions. Inverse optimization problems find diverse applications and have been widely studied. A prominent problem in this field is the inverse shortest path (ISP) problem [D. Burton and Ph.L. Toint, 1992; W. Ben-Ameur and E. Gourdin, 2004; A. Bley, 2007], which finds applications in shortest-path routing protocols used in telecommunications. Here we seek a cost vector that is positive, integral, induces a set of given paths as the unique shortest paths, and has minimum l_infty norm. Despite being extensively studied, very few algorithmic results are known for inverse optimization problems involving integrality constraints on the desired cost vector whose norm has to be minimized.Motivated by ISP, we initiate a systematic study of such integral inverse optimization problems from the perspective of designing polynomial time approximation algorithms. For ISP, our main result is an additive 1-approximation algorithm for multicommodity ISP with node-disjoint commodities, which we show is tight assuming P!=NP. We then consider the integral-cost inverse versions of various other fundamental combinatorial optimization problems, including min-cost flow, max/min-cost bipartite matching, and max/min-cost basis in a matroid, and obtain tight or nearly-tight approximation guarantees for these. Our guarantees for the first two problems are based on results for a broad generalization, namely integral inverse polyhedral optimization, for which we also give approximation guarantees. Our techniques also give similar results for variants, including l_p-norm minimization of the integral cost vector, and distance-minimization from an initial cost vector.Wed, 08 Aug 2018 16:13:25 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9464Two-Dimensional Maximal Repetitions
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9465
Maximal repetitions or runs in strings have a wide array of applications and thus have been extensively studied. In this paper, we extend this notion to 2-dimensions, precisely defining a maximal 2D repetition. We provide initial bounds on the number of maximal 2D repetitions that can occur in a matrix. The main contribution of this paper is the presentation of the first algorithm for locating all maximal 2D repetitions in a matrix. The algorithm is efficient and straightforward, with runtime O(n^2 log n log log n+ rho log n), where n^2 is the size of the input, and rho is the number of 2D repetitions in the output.Wed, 08 Aug 2018 16:13:25 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9465Approximate Convex Intersection Detection with Applications to Width and Minkowski Sums
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9466
Approximation problems involving a single convex body in R^d have received a great deal of attention in the computational geometry community. In contrast, works involving multiple convex bodies are generally limited to dimensions d <= 3 and/or do not consider approximation. In this paper, we consider approximations to two natural problems involving multiple convex bodies: detecting whether two polytopes intersect and computing their Minkowski sum. Given an approximation parameter epsilon > 0, we show how to independently preprocess two polytopes A,B subset R^d into data structures of size O(1/epsilon^{(d-1)/2}) such that we can answer in polylogarithmic time whether A and B intersect approximately. More generally, we can answer this for the images of A and B under affine transformations. Next, we show how to epsilon-approximate the Minkowski sum of two given polytopes defined as the intersection of n halfspaces in O(n log(1/epsilon) + 1/epsilon^{(d-1)/2 + alpha}) time, for any constant alpha > 0. Finally, we present a surprising impact of these results to a well studied problem that considers a single convex body. We show how to epsilon-approximate the width of a set of n points in O(n log(1/epsilon) + 1/epsilon^{(d-1)/2 + alpha}) time, for any constant alpha > 0, a major improvement over the previous bound of roughly O(n + 1/epsilon^{d-1}) time.Wed, 08 Aug 2018 16:13:25 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9466On the Worst-Case Complexity of TimSort
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9467
TimSort is an intriguing sorting algorithm designed in 2002 for Python, whose worst-case complexity was announced, but not proved until our recent preprint. In fact, there are two slightly different versions of TimSort that are currently implemented in Python and in Java respectively. We propose a pedagogical and insightful proof that the Python version runs in O(n log n). The approach we use in the analysis also applies to the Java version, although not without very involved technical details. As a byproduct of our study, we uncover a bug in the Java implementation that can cause the sorting method to fail during the execution. We also give a proof that Python's TimSort running time is in O(n + n log rho), where rho is the number of runs (i.e. maximal monotonic sequences), which is quite a natural parameter here and part of the explanation for the good behavior of TimSort on partially sorted inputs.Wed, 08 Aug 2018 16:13:25 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9467A New and Improved Algorithm for Online Bin Packing
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9468
We revisit the classic online bin packing problem studied in the half-century. In this problem, items of positive sizes no larger than 1 are presented one by one to be packed into subsets called bins of total sizes no larger than 1, such that every item is assigned to a bin before the next item is presented. We use online partitioning of items into classes based on sizes, as in previous work, but we also apply a new method where items of one class can be packed into more than two types of bins, where a bin type is defined according to the number of such items grouped together. Additionally, we allow the smallest class of items to be packed in multiple kinds of bins, and not only into their own bins. We combine this with the approach of packing of sufficiently big items according to their exact sizes. Finally, we simplify the analysis of such algorithms, allowing the analysis to be based on the most standard weight functions. This simplified analysis allows us to study the algorithm which we defined based on all these ideas. This leads us to the design and analysis of the first algorithm of asymptotic competitive ratio strictly below 1.58, specifically, we break this barrier by providing an algorithm AH (Advanced Harmonic) whose asymptotic competitive ratio does not exceed 1.57829.Our main contribution is the introduction of the simple analysis based on weight function to analyze the state of the art online algorithms for the classic online bin packing problem. The previously used analytic tool named weight system was too complicated for the community in this area to adjust it for other problems and other algorithmic tools that are needed in order to improve the current best algorithms. We show that the weight system based analysis is not needed for the analysis of the current algorithms for the classic online bin packing problem. The importance of a simple analysis is demonstrated by analyzing several new features together with all existing techniques, and by proving a better competitive ratio than the previously best one.Wed, 08 Aug 2018 16:13:25 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9468Practical Access to Dynamic Programming on Tree Decompositions
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9469
Parameterized complexity theory has lead to a wide range of algorithmic breakthroughs within the last decades, but the practicability of these methods for real-world problems is still not well understood. We investigate the practicability of one of the fundamental approaches of this field: dynamic programming on tree decompositions. Indisputably, this is a key technique in parameterized algorithms and modern algorithm design. Despite the enormous impact of this approach in theory, it still has very little influence on practical implementations. The reasons for this phenomenon are manifold. One of them is the simple fact that such an implementation requires a long chain of non-trivial tasks (as computing the decomposition, preparing it,...). We provide an easy way to implement such dynamic programs that only requires the definition of the update rules. With this interface, dynamic programs for various problems, such as 3-coloring, can be implemented easily in about 100 lines of structured Java code.The theoretical foundation of the success of dynamic programming on tree decompositions is well understood due to Courcelle's celebrated theorem, which states that every MSO-definable problem can be efficiently solved if a tree decomposition of small width is given. We seek to provide practical access to this theorem as well, by presenting a lightweight model-checker for a small fragment of MSO. This fragment is powerful enough to describe many natural problems, and our model-checker turns out to be very competitive against similar state-of-the-art tools.Wed, 08 Aug 2018 16:13:25 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9469Average Whenever You Meet: Opportunistic Protocols for Community Detection
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9470
Consider the following asynchronous, opportunistic communication model over a graph G: in each round, one edge is activated uniformly and independently at random and (only) its two endpoints can exchange messages and perform local computations. Under this model, we study the following random process: The first time a vertex is an endpoint of an active edge, it chooses a random number, say +/- 1 with probability 1/2; then, in each round, the two endpoints of the currently active edge update their values to their average.We provide a rigorous analysis of the above process showing that, if G exhibits a two-community structure (for example, two expanders connected by a sparse cut), the values held by the nodes will collectively reflect the underlying community structure over a suitable phase of the above process. Our analysis requires new concentration bounds on the product of certain random matrices that are technically challenging and possibly of independent interest.We then exploit our analysis to design the first opportunistic protocols that approximately recover community structure using only logarithmic (or polylogarithmic, depending on the sparsity of the cut) work per node.Wed, 08 Aug 2018 16:13:25 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9470Polynomial-Time Approximation Schemes for k-center, k-median, and Capacitated Vehicle Routing in Bounded Highway Dimension
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9471
The concept of bounded highway dimension was developed to capture observed properties of road networks. We show that a graph of bounded highway dimension with a distinguished root vertex can be embedded into a graph of bounded treewidth in such a way that u-to-v distance is preserved up to an additive error of epsilon times the u-to-root plus v-to-root distances. We show that this embedding yields a PTAS for Bounded-Capacity Vehicle Routing in graphs of bounded highway dimension. In this problem, the input specifies a depot and a set of clients, each with a location and demand; the output is a set of depot-to-depot tours, where each client is visited by some tour and each tour covers at most Q units of client demand. Our PTAS can be extended to handle penalties for unvisited clients.We extend this embedding result to handle a set S of root vertices. This result implies a PTAS for Multiple Depot Bounded-Capacity Vehicle Routing: the tours can go from one depot to another. The embedding result also implies that, for fixed k, there is a PTAS for k-Center in graphs of bounded highway dimension. In this problem, the goal is to minimize d so that there exist k vertices (the centers) such that every vertex is within distance d of some center. Similarly, for fixed k, there is a PTAS for k-Median in graphs of bounded highway dimension. In this problem, the goal is to minimize the sum of distances to the k centers.Wed, 08 Aug 2018 16:13:25 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9471Fine-grained Lower Bounds on Cops and Robbers
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9472
Cops and Robbers is a classic pursuit-evasion game played between a group of g cops and one robber on an undirected N-vertex graph G. We prove that the complexity of deciding the winner in the game under optimal play requires Omega (N^{g-o(1)}) time on instances with O(N log^2 N) edges, conditioned on the Strong Exponential Time Hypothesis. Moreover, the problem of calculating the minimum number of cops needed to win the game is 2^{Omega (sqrt{N})}, conditioned on the weaker Exponential Time Hypothesis. Our conditional lower bound comes very close to a conditional upper bound: if Meyniel's conjecture holds then the cop number can be decided in 2^{O(sqrt{N}log N)} time.In recent years, the Strong Exponential Time Hypothesis has been used to obtain many lower bounds on classic combinatorial problems, such as graph diameter, LCS, EDIT-DISTANCE, and REGEXP matching. To our knowledge, these are the first conditional (S)ETH-hard lower bounds on a strategic game.Wed, 08 Aug 2018 16:13:25 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9472A Polynomial Kernel for Diamond-Free Editing
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9473
Given a fixed graph H, the H-free editing problem asks whether we can edit at most k edges to make a graph contain no induced copy of H. We obtain a polynomial kernel for this problem when H is a diamond. The incompressibility dichotomy for H being a 3-connected graph and the classical complexity dichotomy suggest that except for H being a complete/empty graph, H-free editing problems admit polynomial kernels only for a few small graphs H. Therefore, we believe that our result is an essential step toward a complete dichotomy on the compressibility of H-free editing. Additionally, we give a cubic-vertex kernel for the diamond-free edge deletion problem, which is far simpler than the previous kernel of the same size for the problem.Wed, 08 Aug 2018 16:13:25 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9473Parallel and I/O-efficient Randomisation of Massive Networks using Global Curveball Trades
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9474
Graph randomisation is a crucial task in the analysis and synthesis of networks. It is typically implemented as an edge switching process (ESMC) repeatedly swapping the nodes of random edge pairs while maintaining the degrees involved [Mihail and Zegura, 2003]. Curveball is a novel approach that instead considers the whole neighbourhoods of randomly drawn node pairs. Its Markov chain converges to a uniform distribution, and experiments suggest that it requires less steps than the established ESMC [{Carstens} et al., 2016]. Since trades however are more expensive, we study Curveball's practical runtime by introducing the first efficient Curveball algorithms: the I/O-efficient EM-CB for simple undirected graphs and its internal memory pendant IM-CB. Further, we propose the notion of a global trade processing every node in a graph during a single super step, and show that global trades converge to a uniform distribution and perform superior in practice. We then discuss EM-GCB and EM-PGCB for global trades and give experimental evidence that EM-PGCB achieves the quality of the state-of-the-art ESMC algorithm EM-ES [M. Hamann et al., 2017] nearly one order of magnitude faster.Wed, 08 Aug 2018 16:13:25 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9474Space-Optimal Quasi-Gray Codes with Logarithmic Read Complexity
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9475
A quasi-Gray code of dimension n and length l over an alphabet Sigma is a sequence of distinct words w_1,w_2,...,w_l from Sigma^n such that any two consecutive words differ in at most c coordinates, for some fixed constant c>0. In this paper we are interested in the read and write complexity of quasi-Gray codes in the bit-probe model, where we measure the number of symbols read and written in order to transform any word w_i into its successor w_{i+1}.We present construction of quasi-Gray codes of dimension n and length 3^n over the ternary alphabet {0,1,2} with worst-case read complexity O(log n) and write complexity 2. This generalizes to arbitrary odd-size alphabets. For the binary alphabet, we present quasi-Gray codes of dimension n and length at least 2^n - 20n with worst-case read complexity 6+log n and write complexity 2. This complements a recent result by Raskin [Raskin '17] who shows that any quasi-Gray code over binary alphabet of length 2^n has read complexity Omega(n).Our results significantly improve on previously known constructions and for the odd-size alphabets we break the Omega(n) worst-case barrier for space-optimal (non-redundant) quasi-Gray codes with constant number of writes. We obtain our results via a novel application of algebraic tools together with the principles of catalytic computation [Buhrman et al. '14, Ben-Or and Cleve '92, Barrington '89, Coppersmith and Grossman '75].Wed, 08 Aug 2018 16:13:25 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9475A Framework for In-place Graph Algorithms
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9476
Read-only memory (ROM) model is a classical model of computation to study time-space tradeoffs of algorithms. A classical result on the ROM model is that any algorithm to sort n numbers using O(s) words of extra space requires Omega (n^2/s) comparisons for lg n <= s <= n/lg n and the bound has also been recently matched by an algorithm. However, if we relax the model, we do have sorting algorithms (say Heapsort) that can sort using O(n lg n) comparisons using O(lg n) bits of extra space, even keeping a permutation of the given input sequence at anytime during the algorithm.We address similar relaxations for graph algorithms. We show that a simple natural relaxation of ROM model allows us to implement fundamental graph search methods like BFS and DFS more space efficiently than in ROM. By simply allowing elements in the adjacency list of a vertex to be permuted, we show that, on an undirected or directed connected graph G having n vertices and m edges, the vertices of G can be output in a DFS or BFS order using O(lg n) bits of extra space and O(n^3 lg n) time. Thus we obtain similar bounds for reachability and shortest path distance (both for undirected and directed graphs). With a little more (but still polynomial) time, we can also output vertices in the lex-DFS order. As reachability in directed graphs (even in DAGs) and shortest path distance (even in undirected graphs) are NL-complete, and lex-DFS is P-complete, our results show that our model is more powerful than ROM if L != P.En route, we also introduce and develop algorithms for another relaxation of ROM where the adjacency lists of the vertices are circular lists and we can modify only the heads of the lists. Here we first show a linear time DFS implementation using n + O(lg n) bits of extra space. Improving the extra space exponentially to only O(lg n) bits, we also obtain BFS and DFS albeit with a slightly slower running time. Both the models we propose maintain the graph structure throughout the algorithm, only the order of vertices in the adjacency list changes. In sharp contrast, for BFS and DFS, to the best of our knowledge, there are no algorithms in ROM that use even O(n^{1-epsilon}) bits of extra space; in fact, implementing DFS using cn bits for c<1 has been mentioned as an open problem. Furthermore, DFS (BFS, respectively) algorithms using n+o(n) (o(n), respectively) bits of extra use Reingold's [JACM, 2008] or Barnes et al's reachability algorithm [SICOMP, 1998] and hence have high runtime. Our results can be contrasted with the recent result of Buhrman et al. [STOC, 2014] which gives an algorithm for directed st-reachability on catalytic Turing machines using O(lg n) bits with catalytic space O(n^2 lg n) and time O(n^9).Wed, 08 Aug 2018 16:13:25 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9476Self-Assembly of Any Shape with Constant Tile Types using High Temperature
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9477
Inspired by nature and motivated by a lack of top-down tools for precise nanoscale manufacture, self-assembly is a bottom-up process where simple, unorganized components autonomously combine to form larger more complex structures. Such systems hide rich algorithmic properties - notably, Turing universality - and a self-assembly system can be seen as both the object to be manufactured as well as the machine controlling the manufacturing process. Thus, a benchmark problem in self-assembly is the unique assembly of shapes: to design a set of simple agents which, based on aggregation rules and random movement, self-assemble into a particular shape and nothing else. We use a popular model of self-assembly, the 2-handed or hierarchical tile assembly model, and allow the existence of repulsive forces, which is a well-studied variant. The technique utilizes a finely-tuned temperature (the minimum required affinity required for aggregation of separate complexes).We show that calibrating the temperature and the strength of the aggregation between the tiles, one can encode the shape to be assembled without increasing the number of distinct tile types. Precisely, we show one tile set for which the following holds: for any finite connected shape S, there exists a setting of binding strengths between tiles and a temperature under which the system uniquely assembles S at some scale factor. Our tile system only uses one repulsive glue type and the system is growth-only (it produces no unstable assemblies). The best previous unique shape assembly results in tile assembly models use O(K(S)/(log K(S))) distinct tile types, where K(S) is the Kolmogorov (descriptional) complexity of the shape S.Wed, 08 Aug 2018 16:13:25 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9477A Unified PTAS for Prize Collecting TSP and Steiner Tree Problem in Doubling Metrics
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9478
We present a unified (randomized) polynomial-time approximation scheme (PTAS) for the prize collecting traveling salesman problem (PCTSP) and the prize collecting Steiner tree problem (PCSTP) in doubling metrics. Given a metric space and a penalty function on a subset of points known as terminals, a solution is a subgraph on points in the metric space, whose cost is the weight of its edges plus the penalty due to terminals not covered by the subgraph. Under our unified framework, the solution subgraph needs to be Eulerian for PCTSP, while it needs to be a tree for PCSTP. Before our work, even a QPTAS for the problems in doubling metrics is not known.Our unified PTAS is based on the previous dynamic programming frameworks proposed in [Talwar STOC 2004] and [Bartal, Gottlieb, Krauthgamer STOC 2012]. However, since it is unknown which part of the optimal cost is due to edge lengths and which part is due to penalties of uncovered terminals, we need to develop new techniques to apply previous divide-and-conquer strategies and sparse instance decompositions.Wed, 08 Aug 2018 16:13:25 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9478Near-Optimal Distance Emulator for Planar Graphs
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9479
Given a graph G and a set of terminals T, a distance emulator of G is another graph H (not necessarily a subgraph of G) containing T, such that all the pairwise distances in G between vertices of T are preserved in H. An important open question is to find the smallest possible distance emulator.We prove that, given any subset of k terminals in an n-vertex undirected unweighted planar graph, we can construct in O~(n) time a distance emulator of size O~(min(k^2,sqrt{k * n})). This is optimal up to logarithmic factors. The existence of such distance emulator provides a straightforward framework to solve distance-related problems on planar graphs: Replace the input graph with the distance emulator, and apply whatever algorithm available to the resulting emulator. In particular, our result implies that, on any unweighted undirected planar graph, one can compute all-pairs shortest path distances among k terminals in O~(n) time when k=O(n^{1/3}).Wed, 08 Aug 2018 16:13:25 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9479Approximation Schemes for Geometric Coverage Problems
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9480
In their seminal work, Mustafa and Ray [Nabil H. Mustafa and Saurabh Ray, 2010] showed that a wide class of geometric set cover (SC) problems admit a PTAS via local search - this is one of the most general approaches known for such problems. Their result applies if a naturally defined "exchange graph" for two feasible solutions is planar and is based on subdividing this graph via a planar separator theorem due to Frederickson [Greg N. Frederickson, 1987]. Obtaining similar results for the related maximum coverage problem (MC) seems non-trivial due to the hard cardinality constraint. In fact, while Badanidiyuru, Kleinberg, and Lee [Ashwinkumar Badanidiyuru et al., 2012] have shown (via a different analysis) that local search yields a PTAS for two-dimensional real halfspaces, they only conjectured that the same holds true for dimension three. Interestingly, at this point it was already known that local search provides a PTAS for the corresponding set cover case and this followed directly from the approach of Mustafa and Ray.In this work we provide a way to address the above-mentioned issue. First, we propose a color-balanced version of the planar separator theorem. The resulting subdivision approximates locally in each part the global distribution of the colors. Second, we show how this roughly balanced subdivision can be employed in a more careful analysis to strictly obey the hard cardinality constraint. More specifically, we obtain a PTAS for any "planarizable" instance of MC and thus essentially for all cases where the corresponding SC instance can be tackled via the approach of Mustafa and Ray. As a corollary, we confirm the conjecture of Badanidiyuru, Kleinberg, and Lee [Ashwinkumar Badanidiyuru et al., 2012] regarding real halfspaces in dimension three. We feel that our ideas could also be helpful in other geometric settings involving a cardinality constraint.Wed, 08 Aug 2018 16:13:25 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9480Amortized Analysis of Asynchronous Price Dynamics
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9481
We extend a recently developed framework for analyzing asynchronous coordinate descent algorithms to show that an asynchronous version of tatonnement, a fundamental price dynamic widely studied in general equilibrium theory, converges toward a market equilibrium for Fisher markets with CES utilities or Leontief utilities, for which tatonnement is equivalent to coordinate descent.Wed, 08 Aug 2018 16:13:25 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9481Cycles to the Rescue! Novel Constraints to Compute Maximum Planar Subgraphs Fast
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9482
The NP-hard Maximum Planar Subgraph problem asks for a planar subgraph H of a given graph G such that H has maximum edge cardinality. For more than two decades, the only known non-trivial exact algorithm was based on integer linear programming and Kuratowski's famous planarity criterion. We build upon this approach and present new constraint classes - together with a lifting of the polyhedron - to obtain provably stronger LP-relaxations, and in turn faster algorithms in practice. The new constraints take Euler's polyhedron formula as a starting point and combine it with considering cycles in G. This paper discusses both the theoretical as well as the practical sides of this strengthening.Wed, 08 Aug 2018 16:13:25 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9482Parameterized Approximation Algorithms for Bidirected Steiner Network Problems
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9483
The Directed Steiner Network (DSN) problem takes as input a directed edge-weighted graph G=(V,E) and a set {D}subseteq V x V of k demand pairs. The aim is to compute the cheapest network N subseteq G for which there is an s -> t path for each (s,t)in {D}. It is known that this problem is notoriously hard as there is no k^{1/4-o(1)}-approximation algorithm under Gap-ETH, even when parameterizing the runtime by k [Dinur & Manurangsi, ITCS 2018]. In light of this, we systematically study several special cases of DSN and determine their parameterized approximability for the parameter k.For the bi-DSN_Planar problem, the aim is to compute a planar optimum solution N subseteq G in a bidirected graph G, i.e. for every edge uv of G the reverse edge vu exists and has the same weight. This problem is a generalization of several well-studied special cases. Our main result is that this problem admits a parameterized approximation scheme (PAS) for k. We also prove that our result is tight in the sense that (a) the runtime of our PAS cannot be significantly improved, and (b) it is unlikely that a PAS exists for any generalization of bi-DSN_Planar, unless FPT=W[1]. Additionally we study several generalizations of bi-DSN_Planar and obtain upper and lower bounds on obtainable runtimes parameterized by k.One important special case of DSN is the Strongly Connected Steiner Subgraph (SCSS) problem, for which the solution network N subseteq G needs to strongly connect a given set of k terminals. It has been observed before that for SCSS a parameterized 2-approximation exists when parameterized by k [Chitnis et al., IPEC 2013]. We show a tight inapproximability result: under Gap-ETH there is no (2-{epsilon})-approximation algorithm parameterized by k (for any epsilon>0). To the best of our knowledge, this is the first example of a W[1]-hard problem admitting a non-trivial parameterized approximation factor which is also known to be tight! Additionally we show that when restricting the input of SCSS to bidirected graphs, the problem remains NP-hard but becomes FPT for k.Wed, 08 Aug 2018 16:13:25 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9483Online Facility Location with Deletions
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9484
In this paper we study three previously unstudied variants of the online Facility Location problem, considering an intrinsic scenario when the clients and facilities are not only allowed to arrive to the system, but they can also depart at any moment.We begin with the study of a natural fully-dynamic online uncapacitated model where clients can be both added and removed. When a client arrives, then it has to be assigned either to an existing facility or to a new facility opened at the client's location. However, when a client who has been also one of the open facilities is to be removed, then our model has to allow to reconnect all clients that have been connected to that removed facility. In this model, we present an optimal O(log(n_{act}) / log log(n_{act}))-competitive algorithm, where n_{act} is the number of active clients at the end of the input sequence.Next, we turn our attention to the capacitated Facility Location problem. We first note that if no deletions are allowed, then one can achieve an optimal competitive ratio of O(log(n) / log(log n)), where n is the length of the sequence. However, when deletions are allowed, the capacitated version of the problem is significantly more challenging than the uncapacitated one. We show that still, using a more sophisticated algorithmic approach, one can obtain an online O(log N + log c log n)-competitive algorithm for the capacitated Facility Location problem in the fully dynamic model, where N is number of points in the input metric and c is the capacity of any open facility.Wed, 08 Aug 2018 16:13:25 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9484Improved Routing on the Delaunay Triangulation
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9485
A geometric graph G=(P,E) is a set of points in the plane and edges between pairs of points, where the weight of an edge is equal to the Euclidean distance between its two endpoints. In local routing we find a path through G from a source vertex s to a destination vertex t, using only knowledge of the current vertex, its incident edges, and the locations of s and t. We present an algorithm for local routing on the Delaunay triangulation, and show that it finds a path between a source vertex s and a target vertex t that is not longer than 3.56|st|, improving the previous bound of 5.9|st|.Wed, 08 Aug 2018 16:13:25 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9485On Geometric Prototype and Applications
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9486
In this paper, we propose to study a new geometric optimization problem called the "geometric prototype" in Euclidean space. Given a set of patterns, where each pattern is represented by a (weighted or unweighted) point set, the geometric prototype can be viewed as the "average pattern" minimizing the total matching cost to them. As a general model, the problem finds many applications in real-world, such as Wasserstein barycenter and ensemble clustering. The dimensionality could be either constant or high, depending on the applications. To our best knowledge, the general geometric prototype problem has yet to be seriously considered by the theory community. To bridge the gap between theory and practice, we first show that a small core-set can be obtained to substantially reduce the data size. Consequently, any existing heuristic or algorithm can run on the core-set to achieve a great improvement on the efficiency. As a new application of core-set, it needs to tackle a couple of challenges particularly in theory. Finally, we test our method on both image and high dimensional clustering datasets; the experimental results remain stable even if we run the algorithms on core-sets much smaller than the original datasets, while the running times are reduced significantly.Wed, 08 Aug 2018 16:13:25 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9486Improved Bounds for Multipass Pairing Heaps and Path-Balanced Binary Search Trees
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9487
We revisit multipass pairing heaps and path-balanced binary search trees (BSTs), two classical algorithms for data structure maintenance. The pairing heap is a simple and efficient "self-adjusting" heap, introduced in 1986 by Fredman, Sedgewick, Sleator, and Tarjan. In the multipass variant (one of the original pairing heap variants described by Fredman et al.) the minimum item is extracted via repeated pairing rounds in which neighboring siblings are linked.Path-balanced BSTs, proposed by Sleator (cf. Subramanian, 1996), are a natural alternative to Splay trees (Sleator and Tarjan, 1983). In a path-balanced BST, whenever an item is accessed, the search path leading to that item is re-arranged into a balanced tree.Despite their simplicity, both algorithms turned out to be difficult to analyse. Fredman et al. showed that operations in multipass pairing heaps take amortized O(log n * log log n / log log log n) time. For searching in path-balanced BSTs, Balasubramanian and Raman showed in 1995 the same amortized time bound of O(log n * log log n / log log log n), using a different argument.In this paper we show an explicit connection between the two algorithms and improve both bounds to O(log n * 2^{log^* n} * log^* n), respectively O(log n * 2^{log^* n} * (log^* n)^2), where log^* denotes the slowly growing iterated logarithm function. These are the first improvements in more than three, resp. two decades, approaching the information-theoretic lower bound of Omega(log n).Wed, 08 Aug 2018 16:13:25 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9487Improved Time and Space Bounds for Dynamic Range Mode
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9488
Given an array A of n elements, we wish to support queries for the most frequent and least frequent element in a subrange [l, r] of A. We also wish to support updates that change a particular element at index i or insert/ delete an element at index i. For the range mode problem, our data structure supports all operations in O(n^{2/3}) deterministic time using only O(n) space. This improves two results by Chan et al. [Timothy M. Chan et al., 2014]: a linear space data structure supporting update and query operations in O~(n^{3/4}) time and an O(n^{4/3}) space data structure supporting update and query operations in O~(n^{2/3}) time. For the range least frequent problem, we address two variations. In the first, we are allowed to answer with an element of A that may not appear in the query range, and in the second, the returned element must be present in the query range. For the first variation, we develop a data structure that supports queries in O~(n^{2/3}) time, updates in O(n^{2/3}) time, and occupies O(n) space. For the second variation, we develop a Monte Carlo data structure that supports queries in O(n^{2/3}) time, updates in O~(n^{2/3}) time, and occupies O~(n) space, but requires that updates are made independently of the results of previous queries. The Monte Carlo data structure is also capable of answering k-frequency queries; that is, the problem of finding an element of given frequency in the specified query range. Previously, no dynamic data structures were known for least frequent element or k-frequency queries.Wed, 08 Aug 2018 16:13:25 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9488Online Makespan Scheduling with Job Migration on Uniform Machines
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9489
In the classic minimum makespan scheduling problem, we are given an input sequence of n jobs with sizes. A scheduling algorithm has to assign the jobs to m parallel machines. The objective is to minimize the makespan, which is the time it takes until all jobs are processed. In this paper, we consider online scheduling algorithms without preemption. However, we allow the online algorithm to reassign up to k jobs to different machines in the final assignment.For m identical machines, Albers and Hellwig (Algorithmica, 2017) give tight bounds on the competitive ratio in this model. The precise ratio depends on, and increases with, m. It lies between 4/3 and ~~ 1.4659. They show that k = O(m) is sufficient to achieve this bound and no k = o(n) can result in a better bound.We study m uniform machines, i.e., machines with different speeds, and show that this setting is strictly harder. For sufficiently large m, there is a delta = Theta(1) such that, for m machines with only two different machine speeds, no online algorithm can achieve a competitive ratio of less than 1.4659 + delta with k = o(n).We present a new algorithm for the uniform machine setting. Depending on the speeds of the machines, our scheduling algorithm achieves a competitive ratio that lies between 4/3 and ~~ 1.7992 with k = O(m). We also show that k = Omega(m) is necessary to achieve a competitive ratio below 2.Our algorithm is based on a subtle imbalance with respect to the completion times of the machines, complemented by a bicriteria approximation algorithm that minimizes the makespan and maximizes the average completion time for certain sets of machines.Wed, 08 Aug 2018 16:13:25 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9489Truthful Prompt Scheduling for Minimizing Sum of Completion Times
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9490
We give a prompt online mechanism for minimizing the sum of [weighted] completion times. This is the first prompt online algorithm for the problem. When such jobs are strategic agents, delaying scheduling decisions makes little sense. Moreover, the mechanism has a particularly simple form of an anonymous menu of options.Wed, 08 Aug 2018 16:13:25 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9490Weighted Model Counting on the GPU by Exploiting Small Treewidth
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9491
We propose a novel solver that efficiently finds almost the exact number of solutions of a Boolean formula (#Sat) and the weighted model count of a weighted Boolean formula (WMC) if the treewidth of the given formula is sufficiently small. The basis of our approach are dynamic programming algorithms on tree decompositions, which we engineered towards efficient parallel execution on the GPU. We provide thorough experiments and compare the runtime of our system with state-of-the-art #Sat and WMC solvers. Our results are encouraging in the sense that also complex reasoning problems can be tackled by parameterized algorithms executed on the GPU if instances have treewidth at most 30, which is the case for more than half of counting and weighted counting benchmark instances.Wed, 08 Aug 2018 16:13:25 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9491Light Spanners for High Dimensional Norms via Stochastic Decompositions
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9492
Spanners for low dimensional spaces (e.g. Euclidean space of constant dimension, or doubling metrics) are well understood. This lies in contrast to the situation in high dimensional spaces, where except for the work of Har-Peled, Indyk and Sidiropoulos (SODA 2013), who showed that any n-point Euclidean metric has an O(t)-spanner with O~(n^{1+1/t^2}) edges, little is known.In this paper we study several aspects of spanners in high dimensional normed spaces. First, we build spanners for finite subsets of l_p with 1<p <=2. Second, our construction yields a spanner which is both sparse and also light, i.e., its total weight is not much larger than that of the minimum spanning tree. In particular, we show that any n-point subset of l_p for 1<p <=2 has an O(t)-spanner with n^{1+O~(1/t^p)} edges and lightness n^{O~(1/t^p)}.In fact, our results are more general, and they apply to any metric space admitting a certain low diameter stochastic decomposition. It is known that arbitrary metric spaces have an O(t)-spanner with lightness O(n^{1/t}). We exhibit the following tradeoff: metrics with decomposability parameter nu=nu(t) admit an O(t)-spanner with lightness O~(nu^{1/t}). For example, n-point Euclidean metrics have nu <=n^{1/t}, metrics with doubling constant lambda have nu <=lambda, and graphs of genus g have nu <=g. While these families do admit a (1+epsilon)-spanner, its lightness depend exponentially on the dimension (resp. log g). Our construction alleviates this exponential dependency, at the cost of incurring larger stretch.Wed, 08 Aug 2018 16:13:25 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9492On the Tractability of Optimization Problems on H-Graphs
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9493
For a graph H, a graph G is an H-graph if it is an intersection graph of connected subgraphs of some subdivision of H. These graphs naturally generalize several important graph classes like interval graphs or circular-arc graph. This notion was introduced in the early 1990s by Biro, Hujter, and Tuza. Recently, Chaplick et al. initiated the algorithmic study of H-graphs by showing that a number of fundamental optimization problems like Clique, Independent Set, or Dominating Set are solvable in polynomial time on H-graphs. We extend and complement these algorithmic findings in several directions.First we show that for every fixed H, the class of H-graphs is of logarithmically-bounded boolean-width. We also prove that H-graphs are graphs with polynomially many minimal separators. Pipelined with the plethora of known algorithms on graphs of bounded boolean-width and graphs with polynomially many minimal separators, this describes a large class of optimization problems that are solvable in polynomial time on H-graphs.The most fundamental optimization problems among those solvable in polynomial time on H-graphs are Clique, Independent Set, and Dominating Set. We provide a more refined complexity analysis of these problems from the perspective of parameterized complexity. We show that Independent Set and Dominating Set are W[1]-hard being parameterized by the size of H plus the size of the solution. On the other hand, we prove that when H is a tree, Dominating Set is fixed-parameter tractable (FPT) parameterized by the size of H. Besides, we show that Clique admits a polynomial kernel parameterized by H and the solution size.Wed, 08 Aug 2018 16:13:25 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9493On the Optimality of Pseudo-polynomial Algorithms for Integer Programming
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9494
In the classic Integer Programming (IP) problem, the objective is to decide whether, for a given m x n matrix A and an m-vector b=(b_1,..., b_m), there is a non-negative integer n-vector x such that Ax=b. Solving (IP) is an important step in numerous algorithms and it is important to obtain an understanding of the precise complexity of this problem as a function of natural parameters of the input.The classic pseudo-polynomial time algorithm of Papadimitriou [J. ACM 1981] for instances of (IP) with a constant number of constraints was only recently improved upon by Eisenbrand and Weismantel [SODA 2018] and Jansen and Rohwedder [ArXiv 2018]. We continue this line of work and show that under the Exponential Time Hypothesis (ETH), the algorithm of Jansen and Rohwedder is nearly optimal. We also show that when the matrix A is assumed to be non-negative, a component of Papadimitriou's original algorithm is already nearly optimal under ETH.This motivates us to pick up the line of research initiated by Cunningham and Geelen [IPCO 2007] who studied the complexity of solving (IP) with non-negative matrices in which the number of constraints may be unbounded, but the branch-width of the column-matroid corresponding to the constraint matrix is a constant. We prove a lower bound on the complexity of solving (IP) for such instances and obtain optimal results with respect to a closely related parameter, path-width. Specifically, we prove matching upper and lower bounds for (IP) when the path-width of the corresponding column-matroid is a constant.Wed, 08 Aug 2018 16:13:25 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9494Symmetry Exploitation for Online Machine Covering with Bounded Migration
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9495
Online models that allow recourse are highly effective in situations where classical models are too pessimistic. One such problem is the online machine covering problem on identical machines. In this setting, jobs arrive one by one and must be assigned to machines with the objective of maximizing the minimum machine load. When a job arrives, we are allowed to reassign some jobs as long as their total size is (at most) proportional to the processing time of the arriving job. The proportionality constant is called the migration factor of the algorithm.By rounding the processing times, which yields useful structural properties for online packing and covering problems, we design first a simple (1.7 + epsilon)-competitive algorithm using a migration factor of O(1/epsilon) which maintains at every arrival a locally optimal solution with respect to the Jump neighborhood. After that, we present as our main contribution a more involved (4/3+epsilon)-competitive algorithm using a migration factor of O~(1/epsilon^3). At every arrival, we run an adaptation of the Largest Processing Time first (LPT) algorithm. Since the new job can cause a complete change of the assignment of smaller jobs in both cases, a low migration factor is achieved by carefully exploiting the highly symmetric structure obtained by the rounding procedure.Wed, 08 Aug 2018 16:13:25 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9495Edit Distance with Block Operations
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9496
We consider the problem of edit distance in which block operations are allowed, i.e. we ask for the minimal number of (block) operations that are needed to transform a string s to t. We give O(log n) approximation algorithms, where n is the total length of the input strings, for the variants of the problem which allow the following sets of operations: block move; block move and block delete; block move and block copy; block move, block copy, and block uncopy. The results still hold if we additionally allow any of the following operations: character insert, character delete, block reversal, or block involution (involution is a generalisation of the reversal). Previously, algorithms only for the first and last variant were known, and they had approximation ratios O(log n log^*n) and O(log n (log^*n)^2), respectively. The edit distance with block moves is equivalent, up to a constant factor, to the common string partition problem, in which we are given two strings s, t and the goal is to partition s into minimal number of parts such that they can be permuted in order to obtain t. Thus we also obtain an O(log n) approximation for this problem (compared to the previous O(log n log^* n)).The results use a simplification of the previously used technique of locally consistent parsing, which groups short substrings of a string into phrases so that similar substrings are guaranteed to be grouped in a similar way. Instead of a sophisticated parsing technique relying on a deterministic coin tossing, we use a simple one based on a partition of the alphabet into two subalphabets. In particular, this lowers the running time from O(n log^* n) to O(n). The new algorithms (for block copy or block delete) use a similar algorithm, but the analysis is based on a specially tuned combinatorial function on sets of numbers.Wed, 08 Aug 2018 16:13:25 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9496A QPTAS for Gapless MEC
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9497
We consider the problem Minimum Error Correction (MEC). A MEC instance is an n x m matrix M with entries from {0,1,-}. Feasible solutions are composed of two binary m-bit strings, together with an assignment of each row of M to one of the two strings. The objective is to minimize the number of mismatches (errors) where the row has a value that differs from the assigned solution string. The symbol "-" is a wildcard that matches both 0 and 1. A MEC instance is gapless, if in each row of M all binary entries are consecutive.Gapless-MEC is a relevant problem in computational biology, and it is closely related to segmentation problems that were introduced by {[}Kleinberg-Papadimitriou-Raghavan STOC'98{]} in the context of data mining.Without restrictions, it is known to be UG-hard to compute an O(1)-approximate solution to MEC. For both MEC and Gapless-MEC, the best polynomial time approximation algorithm has a logarithmic performance guarantee. We partially settle the approximation status of Gapless-MEC by providing a quasi-polynomial time approximation scheme (QPTAS). Additionally, for the relevant case where the binary part of a row is not contained in the binary part of another row, we provide a polynomial time approximation scheme (PTAS).Wed, 08 Aug 2018 16:13:25 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9497FPT Algorithms for Embedding into Low Complexity Graphic Metrics
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9498
The Metric Embedding problem takes as input two metric spaces (X,D_X) and (Y,D_Y), and a positive integer d. The objective is to determine whether there is an embedding F:X - > Y such that the distortion d_{F} <= d. Such an embedding is called a distortion d embedding. In parameterized complexity, the Metric Embedding problem is known to be W-hard and therefore, not expected to have an FPT algorithm. In this paper, we consider the Gen-Graph Metric Embedding problem, where the two metric spaces are graph metrics. We explore the extent of tractability of the problem in the parameterized complexity setting. We determine whether an unweighted graph metric (G,D_G) can be embedded, or bijectively embedded, into another unweighted graph metric (H,D_H), where the graph H has low structural complexity. For example, H is a cycle, or H has bounded treewidth or bounded connected treewidth. The parameters for the algorithms are chosen from the upper bound d on distortion, bound Delta on the maximum degree of H, treewidth alpha of H, and the connected treewidth alpha_{c} of H.Our general approach to these problems can be summarized as trying to understand the behavior of the shortest paths in G under a low distortion embedding into H, and the structural relation the mapping of these paths has to shortest paths in H.Wed, 08 Aug 2018 16:13:25 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9498The Stochastic Score Classification Problem
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9499
Consider the following Stochastic Score Classification Problem. A doctor is assessing a patient's risk of developing a certain disease, and can perform n tests on the patient. Each test has a binary outcome, positive or negative. A positive result is an indication of risk, and a patient's score is the total number of positive test results. Test results are accurate. The doctor needs to classify the patient into one of B risk classes, depending on the score (e.g., LOW, MEDIUM, and HIGH risk). Each of these classes corresponds to a contiguous range of scores. Test i has probability p_i of being positive, and it costs c_i to perform. To reduce costs, instead of performing all tests, the doctor will perform them sequentially and stop testing when it is possible to determine the patient's risk category. The problem is to determine the order in which the doctor should perform the tests, so as to minimize expected testing cost. We provide approximation algorithms for adaptive and non-adaptive versions of this problem, and pose a number of open questions.Wed, 08 Aug 2018 16:13:25 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9499Improved Space-Time Tradeoffs for kSUM
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9500
In the kSUM problem we are given an array of numbers a_1,a_2,...,a_n and we are required to determine if there are k different elements in this array such that their sum is 0. This problem is a parameterized version of the well-studied SUBSET-SUM problem, and a special case is the 3SUM problem that is extensively used for proving conditional hardness. Several works investigated the interplay between time and space in the context of SUBSET-SUM. Recently, improved time-space tradeoffs were proven for kSUM using both randomized and deterministic algorithms.In this paper we obtain an improvement over the best known results for the time-space tradeoff for kSUM. A major ingredient in achieving these results is a general self-reduction from kSUM to mSUM where m<k, and several useful observations that enable this reduction and its implications. The main results we prove in this paper include the following: (i) The best known Las Vegas solution to kSUM running in approximately O(n^{k-delta sqrt{2k}}) time and using O(n^{delta}) space, for 0 <= delta <= 1. (ii) The best known deterministic solution to kSUM running in approximately O(n^{k-delta sqrt{k}}) time and using O(n^{delta}) space, for 0 <= delta <= 1. (iii) A space-time tradeoff for solving kSUM using O(n^{delta}) space, for delta>1. (iv) An algorithm for 6SUM running in O(n^4) time using just O(n^{2/3}) space. (v) A solution to 3SUM on random input using O(n^2) time and O(n^{1/3}) space, under the assumption of a random read-only access to random bits.Wed, 08 Aug 2018 16:13:25 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9500Dynamic Trees with Almost-Optimal Access Cost
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9501
An optimal binary search tree for an access sequence on elements is a static tree that minimizes the total search cost. Constructing perfectly optimal binary search trees is expensive so the most efficient algorithms construct almost optimal search trees. There exists a long literature of constructing almost optimal search trees dynamically, i.e., when the access pattern is not known in advance. All of these trees, e.g., splay trees and treaps, provide a multiplicative approximation to the optimal search cost.In this paper we show how to maintain an almost optimal weighted binary search tree under access operations and insertions of new elements where the approximation is an additive constant. More technically, we maintain a tree in which the depth of the leaf holding an element e_i does not exceed min(log(W/w_i),log n)+O(1) where w_i is the number of times e_i was accessed and W is the total length of the access sequence.Our techniques can also be used to encode a sequence of m symbols with a dynamic alphabetic code in O(m) time so that the encoding length is bounded by m(H+O(1)), where H is the entropy of the sequence. This is the first efficient algorithm for adaptive alphabetic coding that runs in constant time per symbol.Wed, 08 Aug 2018 16:13:25 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9501A Tree Structure For Dynamic Facility Location
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9502
We study the metric facility location problem with client insertions and deletions. This setting differs from the classic dynamic facility location problem, where the set of clients remains the same, but the metric space can change over time. We show a deterministic algorithm that maintains a constant factor approximation to the optimal solution in worst-case time O~(2^{O(kappa^2)}) per client insertion or deletion in metric spaces while answering queries about the cost in O(1) time, where kappa denotes the doubling dimension of the metric. For metric spaces with bounded doubling dimension, the update time is polylogarithmic in the parameters of the problem.Wed, 08 Aug 2018 16:13:25 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9502Dynamic Effective Resistances and Approximate Schur Complement on Separable Graphs
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9503
We consider the problem of dynamically maintaining (approximate) all-pairs effective resistances in separable graphs, which are those that admit an n^{c}-separator theorem for some c<1. We give a fully dynamic algorithm that maintains (1+epsilon)-approximations of the all-pairs effective resistances of an n-vertex graph G undergoing edge insertions and deletions with O~(sqrt{n}/epsilon^2) worst-case update time and O~(sqrt{n}/epsilon^2) worst-case query time, if G is guaranteed to be sqrt{n}-separable (i.e., it is taken from a class satisfying a sqrt{n}-separator theorem) and its separator can be computed in O~(n) time. Our algorithm is built upon a dynamic algorithm for maintaining approximate Schur complement that approximately preserves pairwise effective resistances among a set of terminals for separable graphs, which might be of independent interest.We complement our result by proving that for any two fixed vertices s and t, no incremental or decremental algorithm can maintain the s-t effective resistance for sqrt{n}-separable graphs with worst-case update time O(n^{1/2-delta}) and query time O(n^{1-delta}) for any delta>0, unless the Online Matrix Vector Multiplication (OMv) conjecture is false.We further show that for general graphs, no incremental or decremental algorithm can maintain the s-t effective resistance problem with worst-case update time O(n^{1-delta}) and query-time O(n^{2-delta}) for any delta >0, unless the OMv conjecture is false.Wed, 08 Aug 2018 16:13:25 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9503Buffered Count-Min Sketch on SSD: Theory and Experiments
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9504
Frequency estimation data structures such as the count-min sketch (CMS) have found numerous applications in databases, networking, computational biology and other domains. Many applications that use the count-min sketch process massive and rapidly evolving data sets. For data-intensive applications that aim to keep the overestimate error low, the count-min sketch becomes too large to store in available RAM and may have to migrate to external storage (e.g., SSD.) Due to the random-read/write nature of hash operations of the count-min sketch, simply placing it on SSD stifles the performance of time-critical applications, requiring about 4-6 random reads/writes to SSD per estimate (lookup) and update (insert) operation.In this paper, we expand on the preliminary idea of the buffered count-min sketch (BCMS) {[Eydi et al., 2017]}, an SSD variant of the count-min sketch, that uses hash localization to scale efficiently out of RAM while keeping the total error bounded. We describe the design and implementation of the buffered count-min sketch, and empirically show that our implementation achieves 3.7 x-4.7 x speedup on update and 4.3 x speedup on estimate operations compared to the traditional count-min sketch on SSD.Our design also offers an asymptotic improvement in the external-memory model over the original data structure: r random I/Os are reduced to 1 I/O for the estimate operation. For a data structure that uses k blocks on SSD, w as the word/counter size, r as the number of rows, M as the number of bits in the main memory, our data structure uses kwr/M amortized I/Os for updates, or, if kwr/M > 1, 1 I/O in the worst case. In typical scenarios, kwr/M is much smaller than 1. This is in contrast to O(r) I/Os incurred for each update in the original data structure.Lastly, we mathematically show that for the buffered count-min sketch, the error rate does not substantially degrade over the traditional count-min sketch. Specifically, we prove that for any query q, our data structure provides the guarantee: Pr[Error(q) >= n epsilon (1+o(1))] <= delta + o(1), which, up to o(1) terms, is the same guarantee as that of a traditional count-min sketch.Wed, 08 Aug 2018 16:13:25 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9504Scalable Katz Ranking Computation in Large Static and Dynamic Graphs
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9505
Network analysis defines a number of centrality measures to identify the most central nodes in a network. Fast computation of those measures is a major challenge in algorithmic network analysis. Aside from closeness and betweenness, Katz centrality is one of the established centrality measures. In this paper, we consider the problem of computing rankings for Katz centrality. In particular, we propose upper and lower bounds on the Katz score of a given node. While previous approaches relied on numerical approximation or heuristics to compute Katz centrality rankings, we construct an algorithm that iteratively improves those upper and lower bounds until a correct Katz ranking is obtained. We extend our algorithm to dynamic graphs while maintaining its correctness guarantees. Experiments demonstrate that our static graph algorithm outperforms both numerical approaches and heuristics with speedups between 1.5 x and 3.5 x, depending on the desired quality guarantees. Our dynamic graph algorithm improves upon the static algorithm for update batches of less than 10000 edges. We provide efficient parallel CPU and GPU implementations of our algorithms that enable near real-time Katz centrality computation for graphs with hundreds of millions of nodes in fractions of seconds.Wed, 08 Aug 2018 16:13:25 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9505Round-Hashing for Data Storage: Distributed Servers and External-Memory Tables
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9506
This paper proposes round-hashing, which is suitable for data storage on distributed servers and for implementing external-memory tables in which each lookup retrieves at most one single block of external memory, using a stash. For data storage, round-hashing is like consistent hashing as it avoids a full rehashing of the keys when new servers are added. Experiments show that the speed to serve requests is tenfold or more than the state of the art. In distributed data storage, this guarantees better throughput for serving requests and, moreover, greatly reduces decision times for which data should move to new servers as rescanning data is much faster.Wed, 08 Aug 2018 16:13:25 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9506Algorithmic Building Blocks for Asymmetric Memories
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9507
The future of main memory appears to lie in the direction of new non-volatile memory technologies that provide strong capacity-to-performance ratios, but have write operations that are much more expensive than reads in terms of energy, bandwidth, and latency. This asymmetry can have a significant effect on algorithm design, and in many cases it is possible to reduce writes at the cost of more reads. This paper studies which algorithmic techniques are useful in designing practical write-efficient algorithms. We focus on several fundamental algorithmic building blocks including unordered set/map implemented using hash tables, comparison sort, and graph traversal algorithms including breadth-first search and Dijkstra's algorithm. We introduce new algorithms and implementations that can reduce writes, and analyze the performance experimentally using a software simulator. Finally, we summarize interesting lessons and directions in designing write-efficient algorithms that can be valuable to share.Wed, 08 Aug 2018 16:13:25 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9507On the Decision Tree Complexity of String Matching
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9508
String matching is one of the most fundamental problems in computer science. A natural problem is to determine the number of characters that need to be queried (i.e. the decision tree complexity) in a string in order to decide whether this string contains a certain pattern. Rivest showed that for every pattern p, in the worst case any deterministic algorithm needs to query at least n-|p|+1 characters, where n is the length of the string and |p| is the length of the pattern. He further conjectured that this bound is tight. By using the adversary method, Tuza disproved this conjecture and showed that more than one half of binary patterns are evasive, i.e. any algorithm needs to query all the characters (see Section 1.1 for more details).In this paper, we give a query algorithm which settles the decision tree complexity of string matching except for a negligible fraction of patterns. Our algorithm shows that Tuza's criteria of evasive patterns are almost complete. Using the algebraic approach of Rivest and Vuillemin, we also give a new sufficient condition for the evasiveness of patterns, which is beyond Tuza's criteria. In addition, our result reveals an interesting connection to Skolem's Problem in mathematics.Wed, 08 Aug 2018 16:13:25 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9508Decremental SPQR-trees for Planar Graphs
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9509
We present a decremental data structure for maintaining the SPQR-tree of a planar graph subject to edge contractions and deletions. The update time, amortized over Omega(n) operations, is O(log^2 n). Via SPQR-trees, we give a decremental data structure for maintaining 3-vertex connectivity in planar graphs. It answers queries in O(1) time and processes edge deletions and contractions in O(log^2 n) amortized time. The previous best supported deletions and insertions in O(sqrt{n}) time.Wed, 08 Aug 2018 16:13:25 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9509Computing the Chromatic Number Using Graph Decompositions via Matrix Rank
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9510
Wed, 08 Aug 2018 16:13:25 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9510Polynomial Kernels for Hitting Forbidden Minors under Structural Parameterizations
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9511
We investigate polynomial-time preprocessing for the problem of hitting forbidden minors in a graph, using the framework of kernelization. For a fixed finite set of graphs F, the F-Deletion problem is the following: given a graph G and integer k, is it possible to delete k vertices from G to ensure the resulting graph does not contain any graph from F as a minor? Earlier work by Fomin, Lokshtanov, Misra, and Saurabh [FOCS'12] showed that when F contains a planar graph, an instance (G,k) can be reduced in polynomial time to an equivalent one of size k^{O(1)}. In this work we focus on structural measures of the complexity of an instance, with the aim of giving nontrivial preprocessing guarantees for instances whose solutions are large. Motivated by several impossibility results, we parameterize the F-Deletion problem by the size of a vertex modulator whose removal results in a graph of constant treedepth eta.We prove that for each set F of connected graphs and constant eta, the F-Deletion problem parameterized by the size of a treedepth-eta modulator has a polynomial kernel. Our kernelization is fully explicit and does not depend on protrusion reduction or well-quasi-ordering, which are sources of algorithmic non-constructivity in earlier works on F-Deletion. Our main technical contribution is to analyze how models of a forbidden minor in a graph G with modulator X, interact with the various connected components of G-X. Using the language of labeled minors, we analyze the fragments of potential forbidden minor models that can remain after removing an optimal F-Deletion solution from a single connected component of G-X. By bounding the number of different types of behavior that can occur by a polynomial in |X|, we obtain a polynomial kernel using a recursive preprocessing strategy. Our results extend earlier work for specific instances of F-Deletion such as Vertex Cover and Feedback Vertex Set. It also generalizes earlier preprocessing results for F-Deletion parameterized by a vertex cover, which is a treedepth-one modulator.Wed, 08 Aug 2018 16:13:25 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9511Quantum Algorithms for Connectivity and Related Problems
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9512
An important family of span programs, st-connectivity span programs, have been used to design quantum algorithms in various contexts, including a number of graph problems and formula evaluation problems. The complexity of the resulting algorithms depends on the largest positive witness size of any 1-input, and the largest negative witness size of any 0-input. Belovs and Reichardt first showed that the positive witness size is exactly characterized by the effective resistance of the input graph, but only rough upper bounds were known previously on the negative witness size. We show that the negative witness size in an st-connectivity span program is exactly characterized by the capacitance of the input graph. This gives a tight analysis for algorithms based on st-connectivity span programs on any set of inputs.We use this analysis to give a new quantum algorithm for estimating the capacitance of a graph. We also describe a new quantum algorithm for deciding if a graph is connected, which improves the previous best quantum algorithm for this problem if we're promised that either the graph has at least kappa>1 components, or the graph is connected and has small average resistance, which is upper bounded by the diameter. We also give an alternative algorithm for deciding if a graph is connected that can be better than our first algorithm when the maximum degree is small. Finally, using ideas from our second connectivity algorithm, we give an algorithm for estimating the algebraic connectivity of a graph, the second largest eigenvalue of the Laplacian.Wed, 08 Aug 2018 16:13:25 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9512Generalized Coloring of Permutations
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9513
A permutation pi is a merge of a permutation sigma and a permutation tau, if we can color the elements of pi red and blue so that the red elements have the same relative order as sigma and the blue ones as tau. We consider, for fixed hereditary permutation classes C and D, the complexity of determining whether a given permutation pi is a merge of an element of C with an element of D.We develop general algorithmic approaches for identifying polynomially tractable cases of merge recognition. Our tools include a version of nondeterministic logspace streaming recognizability of permutations, which we introduce, and a concept of bounded width decomposition, inspired by the work of Ahal and Rabinovich.As a consequence of the general results, we can provide nontrivial examples of tractable permutation merges involving commonly studied permutation classes, such as the class of layered permutations, the class of separable permutations, or the class of permutations avoiding a decreasing sequence of a given length.On the negative side, we obtain a general hardness result which implies, for example, that it is NP-complete to recognize the permutations that can be merged from two subpermutations avoiding the pattern 2413.Wed, 08 Aug 2018 16:13:25 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9513Solving Partition Problems Almost Always Requires Pushing Many Vertices Around
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9514
A fundamental graph problem is to recognize whether the vertex set of a graph G can be bipartitioned into sets A and B such that G[A] and G[B] satisfy properties Pi_A and Pi_B, respectively. This so-called (Pi_A,Pi_B)-Recognition problem generalizes amongst others the recognition of 3-colorable, bipartite, split, and monopolar graphs. A powerful algorithmic technique that can be used to obtain fixed-parameter algorithms for many cases of (Pi_A,Pi_B)-Recognition, as well as several other problems, is the pushing process. For bipartition problems, the process starts with an "almost correct" bipartition (A',B'), and pushes appropriate vertices from A' to B' and vice versa to eventually arrive at a correct bipartition.In this paper, we study whether (Pi_A,Pi_B)-Recognition problems for which the pushing process yields fixed-parameter algorithms also admit polynomial problem kernels. In our study, we focus on the first level above triviality, where Pi_A is the set of P_3-free graphs (disjoint unions of cliques, or cluster graphs), the parameter is the number of clusters in the cluster graph G[A], and Pi_B is characterized by a set H of connected forbidden induced subgraphs. We prove that, under the assumption that NP not subseteq coNP/poly, (Pi_A,Pi_B)-Recognition admits a polynomial kernel if and only if H contains a graph of order at most 2. In both the kernelization and the lower bound results, we make crucial use of the pushing process.Wed, 08 Aug 2018 16:13:25 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9514String Attractors: Verification and Optimization
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9515
String attractors [STOC 2018] are combinatorial objects recently introduced to unify all known dictionary compression techniques in a single theory. A set Gamma subseteq [1..n] is a k-attractor for a string S in Sigma^n if and only if every distinct substring of S of length at most k has an occurrence crossing at least one of the positions in Gamma. Finding the smallest k-attractor is NP-hard for k >= 3, but polylogarithmic approximations can be found using reductions from dictionary compressors. It is easy to reduce the k-attractor problem to a set-cover instance where the string's positions are interpreted as sets of substrings. The main result of this paper is a much more powerful reduction based on the truncated suffix tree. Our new characterization of the problem leads to more efficient algorithms for string attractors: we show how to check the validity and minimality of a k-attractor in near-optimal time and how to quickly compute exact solutions. For example, we prove that a minimum 3-attractor can be found in O(n) time when |Sigma| in O(sqrt[3+epsilon]{log n}) for some constant epsilon > 0, despite the problem being NP-hard for large Sigma.Wed, 08 Aug 2018 16:13:25 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9515Data Reduction for Maximum Matching on Real-World Graphs: Theory and Experiments
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9516
Finding a maximum-cardinality or maximum-weight matching in (edge-weighted) undirected graphs is among the most prominent problems of algorithmic graph theory. For n-vertex and m-edge graphs, the best known algorithms run in O~(m sqrt{n}) time. We build on recent theoretical work focusing on linear-time data reduction rules for finding maximum-cardinality matchings and complement the theoretical results by presenting and analyzing (thereby employing the kernelization methodology of parameterized complexity analysis) linear-time data reduction rules for the positive-integer-weighted case. Moreover, we experimentally demonstrate that these data reduction rules provide significant speedups of the state-of-the art implementation for computing matchings in real-world graphs: the average speedup is 3800% in the unweighted case and "just" 30% in the weighted case.Wed, 08 Aug 2018 16:13:25 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9516Searching a Tree with Permanently Noisy Advice
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9517
We consider a search problem on trees using unreliable guiding instructions. Specifically, an agent starts a search at the root of a tree aiming to find a treasure hidden at one of the nodes by an adversary. Each visited node holds information, called advice, regarding the most promising neighbor to continue the search. However, the memory holding this information may be unreliable. Modeling this scenario, we focus on a probabilistic setting. That is, the advice at a node is a pointer to one of its neighbors. With probability q each node is faulty, independently of other nodes, in which case its advice points at an arbitrary neighbor, chosen uniformly at random. Otherwise, the node is sound and points at the correct neighbor. Crucially, the advice is permanent, in the sense that querying a node several times would yield the same answer. We evaluate efficiency by two measures: The move complexity denotes the expected number of edge traversals, and the query complexity denotes the expected number of queries.Let Delta denote the maximal degree. Roughly speaking, the main message of this paper is that a phase transition occurs when the noise parameter q is roughly 1/sqrt{Delta}. More precisely, we prove that above the threshold, every search algorithm has query complexity (and move complexity) which is both exponential in the depth d of the treasure and polynomial in the number of nodes n. Conversely, below the threshold, there exists an algorithm with move complexity O(d sqrt{Delta}), and an algorithm with query complexity O(sqrt{Delta}log Delta log^2 n). Moreover, for the case of regular trees, we obtain an algorithm with query complexity O(sqrt{Delta}log n log log n). For q that is below but close to the threshold, the bound for the move complexity is tight, and the bounds for the query complexity are not far from the lower bound of Omega(sqrt{Delta}log_Delta n).In addition, we also consider a semi-adversarial variant, in which an adversary chooses the direction of advice at faulty nodes. For this variant, the threshold for efficient moving algorithms happens when the noise parameter is roughly 1/Delta. Above this threshold a simple protocol that follows each advice with a fixed probability already achieves optimal move complexity.Wed, 08 Aug 2018 16:13:25 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9517Efficient and Adaptive Parameterized Algorithms on Modular Decompositions
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9518
We study the influence of a graph parameter called modular-width on the time complexity for optimally solving well-known polynomial problems such as Maximum Matching, Triangle Counting, and Maximum s-t Vertex-Capacitated Flow. The modular-width of a graph depends on its (unique) modular decomposition tree, and can be computed in linear time O(n+m) for graphs with n vertices and m edges. Modular decompositions are an important tool for graph algorithms, e.g., for linear-time recognition of certain graph classes.Throughout, we obtain efficient parameterized algorithms of running times O(f(mw)n+m), O(n+f(mw)m) , or O(f(mw)+n+m) for low polynomial functions f and graphs of modular-width mw. Our algorithm for Maximum Matching, running in time O(mw^2 log mw n+m), is both faster and simpler than the recent O(mw^4n+m) time algorithm of Coudert et al. (SODA 2018). For several other problems, e.g., Triangle Counting and Maximum b-Matching, we give adaptive algorithms, meaning that their running times match the best unparameterized algorithms for worst-case modular-width of mw=Theta(n) and they outperform them already for mw=o(n), until reaching linear time for mw=O(1).Wed, 08 Aug 2018 16:13:25 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9518On Nondeterministic Derandomization of Freivalds' Algorithm: Consequences, Avenues and Algorithmic Progress
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9519
Motivated by studying the power of randomness, certifying algorithms and barriers for fine-grained reductions, we investigate the question whether the multiplication of two n x n matrices can be performed in near-optimal nondeterministic time O~(n^2). Since a classic algorithm due to Freivalds verifies correctness of matrix products probabilistically in time O(n^2), our question is a relaxation of the open problem of derandomizing Freivalds' algorithm.We discuss consequences of a positive or negative resolution of this problem and provide potential avenues towards resolving it. Particularly, we show that sufficiently fast deterministic verifiers for 3SUM or univariate polynomial identity testing yield faster deterministic verifiers for matrix multiplication. Furthermore, we present the partial algorithmic progress that distinguishing whether an integer matrix product is correct or contains between 1 and n erroneous entries can be performed in time O~(n^2) - interestingly, the difficult case of deterministic matrix product verification is not a problem of "finding a needle in the haystack", but rather cancellation effects in the presence of many errors.Our main technical contribution is a deterministic algorithm that corrects an integer matrix product containing at most t errors in time O~(sqrt{t} n^2 + t^2). To obtain this result, we show how to compute an integer matrix product with at most t nonzeroes in the same running time. This improves upon known deterministic output-sensitive integer matrix multiplication algorithms for t = Omega(n^{2/3}) nonzeroes, which is of independent interest.Wed, 08 Aug 2018 16:13:25 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9519Optimal Online Contention Resolution Schemes via Ex-Ante Prophet Inequalities
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9520
Online contention resolution schemes (OCRSs) were proposed by Feldman, Svensson, and Zenklusen [Moran Feldman et al., 2016] as a generic technique to round a fractional solution in the matroid polytope in an online fashion. It has found applications in several stochastic combinatorial problems where there is a commitment constraint: on seeing the value of a stochastic element, the algorithm has to immediately and irrevocably decide whether to select it while always maintaining an independent set in the matroid. Although OCRSs immediately lead to prophet inequalities, these prophet inequalities are not optimal. Can we instead use prophet inequalities to design optimal OCRSs?We design the first optimal 1/2-OCRS for matroids by reducing the problem to designing a matroid prophet inequality where we compare to the stronger benchmark of an ex-ante relaxation. We also introduce and design optimal (1-1/e)-random order CRSs for matroids, which are similar to OCRSs but the arrival order is chosen uniformly at random.Wed, 08 Aug 2018 16:13:25 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9520Equilibrium Computation in Atomic Splittable Routing Games
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9521
We present polynomial-time algorithms as well as hardness results for equilibrium computation in atomic splittable routing games, for the case of general convex cost functions. These games model traffic in freight transportation, market oligopolies, data networks, and various other applications. An atomic splittable routing game is played on a network where the edges have traffic-dependent cost functions, and player strategies correspond to flows in the network. A player can thus split its traffic arbitrarily among different paths. While many properties of equilibria in these games have been studied, efficient algorithms for equilibrium computation are known for only two cases: if cost functions are affine, or if players are symmetric. Neither of these conditions is met in most practical applications. We present two algorithms for routing games with general convex cost functions on parallel links. The first algorithm is exponential in the number of players, while the second is exponential in the number of edges; thus if either of these is small, we get a polynomial-time algorithm. These are the first algorithms for these games with convex cost functions. Lastly, we show that in general networks, given input C, it is NP-hard to decide if there exists an equilibrium where every player has cost at most C.Wed, 08 Aug 2018 16:13:25 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9521Online Non-Preemptive Scheduling to Minimize Weighted Flow-time on Unrelated Machines
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9522
In this paper, we consider the online problem of scheduling independent jobs non-preemptively so as to minimize the weighted flow-time on a set of unrelated machines. There has been a considerable amount of work on this problem in the preemptive setting where several competitive algorithms are known in the classical competitive model. However, the problem in the non-preemptive setting admits a strong lower bound. Recently, Lucarelli et al. presented an algorithm that achieves a O(1/epsilon^2)-competitive ratio when the algorithm is allowed to reject epsilon-fraction of total weight of jobs and has an epsilon-speed augmentation. They further showed that speed augmentation alone is insufficient to derive any competitive algorithm. An intriguing open question is whether there exists a scalable competitive algorithm that rejects a small fraction of total weights.In this paper, we affirmatively answer this question. Specifically, we show that there exists a O(1/epsilon^3)-competitive algorithm for minimizing weighted flow-time on a set of unrelated machine that rejects at most O(epsilon)-fraction of total weight of jobs. The design and analysis of the algorithm is based on the primal-dual technique. Our result asserts that alternative models beyond speed augmentation should be explored when designing online schedulers in the non-preemptive setting in an effort to find provably good algorithms.Wed, 08 Aug 2018 16:13:25 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9522Finding Stable Matchings That Are Robust to Errors in the Input
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9523
In this paper, we introduce the issue of finding solutions to the stable matching problem that are robust to errors in the input and we obtain the first algorithmic results on this topic. In the process, we also initiate work on a new structural question concerning the stable matching problem, namely finding relationships between the lattices of solutions of two "nearby" instances.Our main algorithmic result is the following: We identify a polynomially large class of errors, D, that can be introduced in a stable matching instance. Given an instance A of stable matching, let B be the instance that results after introducing one error from D, chosen via a discrete probability distribution. The problem is to find a stable matching for A that maximizes the probability of being stable for B as well. Via new structural properties of the type described in the question stated above, we give a polynomial time algorithm for this problem.Wed, 08 Aug 2018 16:13:25 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9523Disconnected Cuts in Claw-free Graphs
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9524
A disconnected cut of a connected graph is a vertex cut that itself also induces a disconnected subgraph. The corresponding decision problem is called Disconnected Cut. It is known that Disconnected Cut is NP-hard on general graphs, while polynomial-time algorithms exist for several graph classes. However, the complexity of the problem on claw-free graphs remained an open question. Its connection to the complexity of the problem to contract a claw-free graph to the 4-vertex cycle C_4 led Ito et al. (TCS 2011) to explicitly ask to resolve this open question. We prove that Disconnected Cut is polynomial-time solvable on claw-free graphs, answering the question of Ito et al. The basis for our result is a decomposition theorem for claw-free graphs of diameter 2, which we believe is of independent interest and builds on the research line initiated by Chudnovsky and Seymour (JCTB 2007-2012) and Hermelin et al. (ICALP 2011). On our way to exploit this decomposition theorem, we characterize how disconnected cuts interact with certain cobipartite subgraphs, and prove two further algorithmic results, namely that Disconnected Cut is polynomial-time solvable on circular-arc graphs and line graphs.Wed, 08 Aug 2018 16:13:25 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9524Practical Low-Dimensional Halfspace Range Space Sampling
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9525
We develop, analyze, implement, and compare new algorithms for creating epsilon-samples of range spaces defined by halfspaces which have size sub-quadratic in 1/epsilon, and have runtime linear in the input size and near-quadratic in 1/epsilon. The key to our solution is an efficient construction of partition trees. Despite not requiring any techniques developed after the early 1990s, apparently such a result was never explicitly described. We demonstrate that our implementations, including new implementations of several variants of partition trees, do indeed run in time linear in the input, appear to run linear in output size, and observe smaller error for the same size sample compared to the ubiquitous random sample (which requires size quadratic in 1/epsilon). This result has direct applications in speeding up discrepancy evaluation, approximate range counting, and spatial anomaly detection.Wed, 08 Aug 2018 16:13:25 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9525Nearly-Optimal Mergesorts: Fast, Practical Sorting Methods That Optimally Adapt to Existing Runs
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9526
We present two stable mergesort variants, "peeksort" and "powersort", that exploit existing runs and find nearly-optimal merging orders with negligible overhead. Previous methods either require substantial effort for determining the merging order (Takaoka 2009; Barbay & Navarro 2013) or do not have an optimal worst-case guarantee (Peters 2002; Auger, Nicaud & Pivoteau 2015; Buss & Knop 2018) . We demonstrate that our methods are competitive in terms of running time with state-of-the-art implementations of stable sorting methods.Wed, 08 Aug 2018 16:13:25 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9526On a Problem of Danzer
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9527
Let C be a bounded convex object in R^d, and P a set of n points lying outside C. Further let c_p, c_q be two integers with 1 <= c_q <= c_p <= n - floor[d/2], such that every c_p + floor[d/2] points of P contains a subset of size c_q + floor[d/2] whose convex-hull is disjoint from C. Then our main theorem states the existence of a partition of P into a small number of subsets, each of whose convex-hull is disjoint from C. Our proof is constructive and implies that such a partition can be computed in polynomial time.In particular, our general theorem implies polynomial bounds for Hadwiger-Debrunner (p, q) numbers for balls in R^d. For example, it follows from our theorem that when p > q >= (1+beta) * d/2 for beta > 0, then any set of balls satisfying the HD(p,q) property can be hit by O(q^2 p^{1+1/(beta)} log p) points. This is the first improvement over a nearly 60-year old exponential bound of roughly O(2^d).Our results also complement the results obtained in a recent work of Keller et al. where, apart from improvements to the bound on HD(p, q) for convex sets in R^d for various ranges of p and q, a polynomial bound is obtained for regions with low union complexity in the plane.Wed, 08 Aug 2018 16:13:25 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9527Quasi-Polynomial Time Approximation Schemes for Packing and Covering Problems in Planar Graphs
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9528
We consider two optimization problems in planar graphs. In {Maximum Weight Independent Set of Objects} we are given a graph G and a family D of {objects}, each being a connected subgraph of G with a prescribed weight, and the task is to find a maximum-weight subfamily of D consisting of pairwise disjoint objects. In {Minimum Weight Distance Set Cover} we are given an edge-weighted graph G, two sets D,C of vertices of G, where vertices of D have prescribed weights, and a nonnegative radius r. The task is to find a minimum-weight subset of D such that every vertex of C is at distance at most r from some selected vertex. Via simple reductions, these two problems generalize a number of geometric optimization tasks, notably {Maximum Weight Independent Set} for polygons in the plane and {Weighted Geometric Set Cover} for unit disks and unit squares. We present {quasi-polynomial time approximation schemes} (QPTASs) for both of the above problems in planar graphs: given an accuracy parameter epsilon>0 we can compute a solution whose weight is within multiplicative factor of (1+epsilon) from the optimum in time 2^{poly(1/epsilon,log |D|)}* n^{O(1)}, where n is the number of vertices of the input graph. Our main technical contribution is to transfer the techniques used for recursive approximation schemes for geometric problems due to Adamaszek, Har-Peled, and Wiese [Adamaszek and Wiese, 2013; Adamaszek and Wiese, 2014; Sariel Har-Peled, 2014] to the setting of planar graphs. In particular, this yields a purely combinatorial viewpoint on these methods.Wed, 08 Aug 2018 16:13:25 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9528On Learning Linear Functions from Subset and Its Applications in Quantum Computing
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9529
Wed, 08 Aug 2018 16:13:25 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9529Strong Collapse for Persistence
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9530
We introduce a fast and memory efficient approach to compute the persistent homology (PH) of a sequence of simplicial complexes. The basic idea is to simplify the complexes of the input sequence by using strong collapses, as introduced by J. Barmak and E. Miniam [DCG (2012)], and to compute the PH of an induced sequence of reduced simplicial complexes that has the same PH as the initial one. Our approach has several salient features that distinguishes it from previous work. It is not limited to filtrations (i.e. sequences of nested simplicial subcomplexes) but works for other types of sequences like towers and zigzags. To strong collapse a simplicial complex, we only need to store the maximal simplices of the complex, not the full set of all its simplices, which saves a lot of space and time. Moreover, the complexes in the sequence can be strong collapsed independently and in parallel. As a result and as demonstrated by numerous experiments on publicly available data sets, our approach is extremely fast and memory efficient in practice.Wed, 08 Aug 2018 16:13:25 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9530On the Complexity of the (Approximate) Nearest Colored Node Problem
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9531
Given a graph G=(V,E) where each vertex is assigned a color from the set C={c_1, c_2, .., c_sigma}. In the (approximate) nearest colored node problem, we want to query, given v in V and c in C, for the (approximate) distance dist^(v, c) from v to the nearest node of color c. For any integer 1 <= k <= log n, we present a Color Distance Oracle (also often referred to as Vertex-label Distance Oracle) of stretch 4k-5 using space O(kn sigma^{1/k}) and query time O(log{k}). This improves the query time from O(k) to O(log{k}) over the best known Color Distance Oracle by Chechik [Chechik, 2012].We then prove a lower bound in the cell probe model showing that even for unweighted undirected paths any static data structure that uses space S requires at least Omega (log (log{sigma} / log(S/n)+log log{n})) query time to give a distance estimate of stretch O(polylog(n)). This implies for the important case when sigma = Theta(n^{epsilon}) for some constant 0 < epsilon < 1, that our Color Distance Oracle has asymptotically optimal query time in regard to k, and that recent Color Distance Oracles for trees [Tsur, 2018] and planar graphs [Mozes and Skop, 2018] achieve asymptotically optimal query time in regard to n.We also investigate the setting where the data structure additionally has to support color-reassignments. We present the first Color Distance Oracle that achieves query times matching our lower bound from the static setting for large stretch yielding an exponential improvement over the best known query time [Chechik, 2014]. Finally, we give new conditional lower bounds proving the hardness of answering queries if edge insertions and deletion are allowed that strictly improve over recent bounds in time and generality.Wed, 08 Aug 2018 16:13:25 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9531Planar Support for Non-piercing Regions and Applications
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9532
Given a hypergraph H=(X,S), a planar support for H is a planar graph G with vertex set X, such that for each hyperedge S in S, the sub-graph of G induced by the vertices in S is connected. Planar supports for hypergraphs have found several algorithmic applications, including several packing and covering problems, hypergraph coloring, and in hypergraph visualization.The main result proved in this paper is the following: given two families of regions R and B in the plane, each of which consists of connected, non-piercing regions, the intersection hypergraph H_R(B) = (B, {B_r}_{r in R}), where B_r = {b in B: b cap r != empty set} has a planar support. Further, such a planar support can be computed in time polynomial in |R|, |B|, and the number of vertices in the arrangement of the regions in R cup B. Special cases of this result include the setting where either the family R, or the family B is a set of points.Our result unifies and generalizes several previous results on planar supports, PTASs for packing and covering problems on non-piercing regions in the plane and coloring of intersection hypergraph of non-piercing regions.Wed, 08 Aug 2018 16:13:25 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9532An Exact Algorithm for the Steiner Forest Problem
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9533
The Steiner forest problem asks for a minimum weight forest that spans a given number of terminal sets. The problem has famous linear programming based 2-approximations [Agrawal et al., 1995; Goemans and Williamson, 1995; Jain, 2001] whose bottleneck is the fact that the most natural formulation of the problem as an integer linear program (ILP) has an integrality gap of 2. We propose new cut-based ILP formulations for the problem along with exact branch-and-bound based algorithms. While our new formulations cannot improve the integrality gap, we can prove that one of them yields stronger linear programming bounds than the two previous strongest formulations: The directed cut formulation [Balakrishnan et al., 1989; Chopra and Rao, 1994] and the advanced flow-based formulation by Magnanti and Raghavan [Magnanti and Raghavan, 2005]. In an experimental evaluation, we show that the linear programming bounds of the new formulations are indeed strong on practical instances and that our new branch-and-bound algorithms outperform branch-and-bound algorithms based on the previous formulations. Our formulations can be seen as a cut-based analogon to [Magnanti and Raghavan, 2005], whose existence was an open problem.Wed, 08 Aug 2018 16:13:25 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9533Large Low-Diameter Graphs are Good Expanders
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9534
We revisit the classical question of the relationship between the diameter of a graph and its expansion properties. One direction is well understood: expander graphs exhibit essentially the lowest possible diameter. We focus on the reverse direction, showing that "sufficiently large" graphs of fixed diameter and degree must be "good" expanders. We prove this statement for various definitions of "sufficiently large" (multiplicative/additive factor from the largest possible size), for different forms of expansion (edge, vertex, and spectral expansion), and for both directed and undirected graphs. A recurring theme is that the lower the diameter of the graph and (more importantly) the larger its size, the better the expansion guarantees. Aside from inherent theoretical interest, our motivation stems from the domain of network design. Both low-diameter networks and expanders are prominent approaches to designing high-performance networks in parallel computing, HPC, datacenter networking, and beyond. Our results establish that these two approaches are, in fact, inextricably intertwined. We leave the reader with many intriguing questions for future research.Wed, 08 Aug 2018 16:13:25 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9534Improved Dynamic Graph Coloring
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9535
This paper studies the fundamental problem of graph coloring in fully dynamic graphs. Since the problem of computing an optimal coloring, or even approximating it to within n^{1-epsilon} for any epsilon > 0, is NP-hard in static graphs, there is no hope to achieve any meaningful computational results for general graphs in the dynamic setting. It is therefore only natural to consider the combinatorial aspects of dynamic coloring, or alternatively, study restricted families of graphs.Towards understanding the combinatorial aspects of this problem, one may assume a black-box access to a static algorithm for C-coloring any subgraph of the dynamic graph, and investigate the trade-off between the number of colors and the number of recolorings per update step. Optimizing the number of recolorings, sometimes referred to as the recourse bound, is important for various practical applications. In WADS'17, Barba et al. devised two complementary algorithms: For any beta > 0, the first (respectively, second) maintains an O(C beta n^{1/beta}) (resp., O(C beta))-coloring while recoloring O(beta) (resp., O(beta n^{1/beta})) vertices per update. Barba et al. also showed that the second trade-off appears to exhibit the right behavior, at least for beta = O(1): Any algorithm that maintains a c-coloring of an n-vertex dynamic forest must recolor Omega(n^{2/(c(c-1))}) vertices per update, for any constant c >= 2. Our contribution is two-fold:- We devise a new algorithm for general graphs that improves significantly upon the first trade-off in a wide range of parameters: For any beta > 0, we get a O~(C/(beta)log^2 n)-coloring with O(beta) recolorings per update, where the O~ notation supresses polyloglog(n) factors. In particular, for beta = O(1) we get constant recolorings with polylog(n) colors; not only is this an exponential improvement over the previous bound, but it also unveils a rather surprising phenomenon: The trade-off between the number of colors and recolorings is highly non-symmetric.- For uniformly sparse graphs, we use low out-degree orientations to strengthen the above result by bounding the update time of the algorithm rather than the number of recolorings. Then, we further improve this result by introducing a new data structure that refines bounded out-degree edge orientations and is of independent interest.Wed, 08 Aug 2018 16:13:25 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9535Soft Subdivision Motion Planning for Complex Planar Robots
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9536
The design and implementation of theoretically-sound robot motion planning algorithms is challenging. Within the framework of resolution-exact algorithms, it is possible to exploit soft predicates for collision detection. The design of soft predicates is a balancing act between easily implementable predicates and their accuracy/effectivity.In this paper, we focus on the class of planar polygonal rigid robots with arbitrarily complex geometry. We exploit the remarkable decomposability property of soft collision-detection predicates of such robots. We introduce a general technique to produce such a decomposition. If the robot is an m-gon, the complexity of this approach scales linearly in m. This contrasts with the O(m^3) complexity known for exact planners. It follows that we can now routinely produce soft predicates for any rigid polygonal robot. This results in resolution-exact planners for such robots within the general Soft Subdivision Search (SSS) framework. This is a significant advancement in the theory of sound and complete planners for planar robots.We implemented such decomposed predicates in our open-source Core Library. The experiments show that our algorithms are effective, perform in real time on non-trivial environments, and can outperform many sampling-based methods.Wed, 08 Aug 2018 16:13:25 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9536Front Matter, Table of Contents, Preface, Conference Organization
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9404
Front Matter, Table of Contents, Preface, Conference OrganizationThu, 02 Aug 2018 11:43:18 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9404Polylogarithmic Approximation Algorithms for Weighted-F-Deletion Problems
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9405
Let F be a family of graphs. A canonical vertex deletion problem corresponding to F is defined as follows: given an n-vertex undirected graph G and a weight function w: V(G) - >R^+, find a minimum weight subset S subseteq V(G) such that G-S belongs to F. This is known as Weighted F Vertex Deletion problem. In this paper we devise a recursive scheme to obtain O(log^{O(1)} n)-approximation algorithms for such problems, building upon the classical technique of finding balanced separators in a graph. Roughly speaking, our scheme applies to those problems, where an optimum solution S together with a well-structured set X, form a balanced separator of the input graph. In this paper, we obtain the first O(log^{O(1)} n)-approximation algorithms for the following vertex deletion problems.- Let {F} be a finite set of graphs containing a planar graph, and F=G(F) be the family of graphs such that every graph H in G(F) excludes all graphs in F as minors. The vertex deletion problem corresponding to F=G(F) is the Weighted Planar F-Minor-Free Deletion (WPF-MFD) problem. We give randomized and deterministic approximation algorithms for WPF-MFD with ratios O(log^{1.5} n) and O(log^2 n), respectively. Previously, only a randomized constant factor approximation algorithm for the unweighted version of the problem was known [FOCS 2012].- We give an O(log^2 n)-factor approximation algorithm for Weighted Chordal Vertex Deletion (WCVD), the vertex deletion problem to the family of chordal graphs. On the way to this algorithm, we also obtain a constant factor approximation algorithm for Multicut on chordal graphs.- We give an O(log^3 n)-factor approximation algorithm for Weighted Distance Hereditary Vertex Deletion (WDHVD), also known as Weighted Rankwidth-1 Vertex Deletion (WR-1VD). This is the vertex deletion problem to the family of distance hereditary graphs, or equivalently, the family of graphs of rankwidth one. We believe that our recursive scheme can be applied to obtain O(log^{O(1)} n)-approximation algorithms for many other problems as well.Thu, 02 Aug 2018 11:43:18 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9405Improved Approximation Bounds for the Minimum Constraint Removal Problem
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9406
In the minimum constraint removal problem, we are given a set of geometric objects as obstacles in the plane, and we want to find the minimum number of obstacles that must be removed to reach a target point t from the source point s by an obstacle-free path. The problem is known to be intractable, and (perhaps surprisingly) no sub-linear approximations are known even for simple obstacles such as rectangles and disks. The main result of our paper is a new approximation technique that gives O(sqrt{n})-approximation for rectangles, disks as well as rectilinear polygons. The technique also gives O(sqrt{n})-approximation for the minimum color path problem in graphs. We also present some inapproximability results for the geometric constraint removal problem.Thu, 02 Aug 2018 11:43:18 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9406A Tight 4/3 Approximation for Capacitated Vehicle Routing in Trees
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9407
Given a set of clients with demands, the Capacitated Vehicle Routing problem is to find a set of tours that collectively cover all client demand, such that the capacity of each vehicle is not exceeded and such that the sum of the tour lengths is minimized. In this paper, we provide a 4/3-approximation algorithm for Capacitated Vehicle Routing on trees, improving over the previous best-known approximation ratio of (sqrt{41}-1)/4 by Asano et al.[Asano et al., 2001], while using the same lower bound. Asano et al. show that there exist instances whose optimal cost is 4/3 times this lower bound. Notably, our 4/3 approximation ratio is therefore tight for this lower bound, achieving the best-possible performance.Thu, 02 Aug 2018 11:43:18 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9407Low Rank Approximation in the Presence of Outliers
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9408
We consider the problem of principal component analysis (PCA) in the presence of outliers. Given a matrix A (d x n) and parameters k, m, the goal is to remove a set of at most m columns of A (outliers), so as to minimize the rank-k approximation error of the remaining matrix (inliers). While much of the work on this problem has focused on recovery of the rank-k subspace under assumptions on the inliers and outliers, we focus on the approximation problem. Our main result shows that sampling-based methods developed in the outlier-free case give non-trivial guarantees even in the presence of outliers. Using this insight, we develop a simple algorithm that has bi-criteria guarantees. Further, unlike similar formulations for clustering, we show that bi-criteria guarantees are unavoidable for the problem, under appropriate complexity assumptions.Thu, 02 Aug 2018 11:43:18 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9408Greedy Bipartite Matching in Random Type Poisson Arrival Model
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9409
We introduce a new random input model for bipartite matching which we call the Random Type Poisson Arrival Model. Just like in the known i.i.d. model (introduced by Feldman et al. [Feldman et al., 2009]), online nodes have types in our model. In contrast to the adversarial types studied in the known i.i.d. model, following the random graphs studied in Mastin and Jaillet [A. Mastin, 2013], in our model each type graph is generated randomly by including each offline node in the neighborhood of an online node with probability c/n independently. In our model, nodes of the same type appear consecutively in the input and the number of times each type node appears is distributed according to the Poisson distribution with parameter 1. We analyze the performance of the simple greedy algorithm under this input model. The performance is controlled by the parameter c and we are able to exactly characterize the competitive ratio for the regimes c = o(1) and c = omega(1). We also provide a precise bound on the expected size of the matching in the remaining regime of constant c. We compare our results to the previous work of Mastin and Jaillet who analyzed the simple greedy algorithm in the G_{n,n,p} model where each online node type occurs exactly once. We essentially show that the approach of Mastin and Jaillet can be extended to work for the Random Type Poisson Arrival Model, although several nontrivial technical challenges need to be overcome. Intuitively, one can view the Random Type Poisson Arrival Model as the G_{n,n,p} model with less randomness; that is, instead of each online node having a new type, each online node has a chance of repeating the previous type.Thu, 02 Aug 2018 11:43:18 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9409Semi-Direct Sum Theorem and Nearest Neighbor under l_infty
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9410
We introduce semi-direct sum theorem as a framework for proving asymmetric communication lower bounds for the functions of the form V_{i=1}^n f(x,y_i). Utilizing tools developed in proving direct sum theorem for information complexity, we show that if the function is of the form V_{i=1}^n f(x,y_i) where Alice is given x and Bob is given y_i's, it suffices to prove a lower bound for a single f(x,y_i). This opens a new avenue of attack other than the conventional combinatorial technique (i.e. "richness lemma" from [Miltersen et al., 1995]) for proving randomized lower bounds for asymmetric communication for functions of such form.As the main technical result and an application of semi-direct sum framework, we prove an information lower bound on c-approximate Nearest Neighbor (ANN) under l_infty which implies that the algorithm of [Indyk, 2001] for c-approximate Nearest Neighbor under l_infty is optimal even under randomization for both decision tree and cell probe data structure model (under certain parameter assumption for the latter). In particular, this shows that randomization cannot improve [Indyk, 2001] under decision tree model. Previously only a deterministic lower bound was known by [Andoni et al., 2008] and randomized lower bound for cell probe model by [Kapralov and Panigrahy, 2012]. We suspect further applications of our framework in exhibiting randomized asymmetric communication lower bounds for big data applications.Thu, 02 Aug 2018 11:43:18 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9410Nearly Optimal Distinct Elements and Heavy Hitters on Sliding Windows
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9411
We study the distinct elements and l_p-heavy hitters problems in the sliding window model, where only the most recent n elements in the data stream form the underlying set. We first introduce the composable histogram, a simple twist on the exponential (Datar et al., SODA 2002) and smooth histograms (Braverman and Ostrovsky, FOCS 2007) that may be of independent interest. We then show that the composable histogram{} along with a careful combination of existing techniques to track either the identity or frequency of a few specific items suffices to obtain algorithms for both distinct elements and l_p-heavy hitters that are nearly optimal in both n and epsilon.Applying our new composable histogram framework, we provide an algorithm that outputs a (1+epsilon)-approximation to the number of distinct elements in the sliding window model and uses O{1/(epsilon^2) log n log (1/epsilon)log log n+ (1/epsilon) log^2 n} bits of space. For l_p-heavy hitters, we provide an algorithm using space O{(1/epsilon^p) log^2 n (log^2 log n+log 1/epsilon)} for 0<p <=2, improving upon the best-known algorithm for l_2-heavy hitters (Braverman et al., COCOON 2014), which has space complexity O{1/epsilon^4 log^3 n}. We also show complementing nearly optimal lower bounds of Omega ((1/epsilon) log^2 n+(1/epsilon^2) log n) for distinct elements and Omega ((1/epsilon^p) log^2 n) for l_p-heavy hitters, both tight up to O{log log n} and O{log 1/epsilon} factors.Thu, 02 Aug 2018 11:43:18 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9411Survivable Network Design for Group Connectivity in Low-Treewidth Graphs
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9412
In the Group Steiner Tree problem (GST), we are given a (edge or vertex)-weighted graph G=(V,E) on n vertices, together with a root vertex r and a collection of groups {S_i}_{i in [h]}: S_i subseteq V(G). The goal is to find a minimum-cost subgraph H that connects the root to every group. We consider a fault-tolerant variant of GST, which we call Restricted (Rooted) Group SNDP. In this setting, each group S_i has a demand k_i in [k], k in N, and we wish to find a minimum-cost subgraph H subseteq G such that, for each group S_i, there is a vertex in the group that is connected to the root via k_i (vertex or edge) disjoint paths.While GST admits O(log^2 n log h) approximation, its higher connectivity variants are known to be Label-Cover hard, and for the vertex-weighted version, the hardness holds even when k=2 (it is widely believed that there is no subpolynomial approximation for the Label-Cover problem [Bellare et al., STOC 1993]). More precisely, the problem admits no 2^{log^{1-epsilon}n}-approximation unless NP subseteq DTIME(n^{polylog(n)}). Previously, positive results were known only for the edge-weighted version when k=2 [Gupta et al., SODA 2010; Khandekar et al., Theor. Comput. Sci., 2012] and for a relaxed variant where k_i disjoint paths from r may end at different vertices in a group [Chalermsook et al., SODA 2015], for which the authors gave a bicriteria approximation. For k >= 3, there is no non-trivial approximation algorithm known for edge-weighted Restricted Group SNDP, except for the special case of the relaxed variant on trees (folklore).Our main result is an O(log n log h) approximation algorithm for Restricted Group SNDP that runs in time n^{f(k, w)}, where w is the treewidth of the input graph. Our algorithm works for both edge and vertex weighted variants, and the approximation ratio nearly matches the lower bound when k and w are constants. The key to achieving this result is a non-trivial extension of a framework introduced in [Chalermsook et al., SODA 2017]. This framework first embeds all feasible solutions to the problem into a dynamic program (DP) table. However, finding the optimal solution in the DP table remains intractable. We formulate a linear program relaxation for the DP and obtain an approximate solution via randomized rounding. This framework also allows us to systematically construct DP tables for high-connectivity problems. As a result, we present new exact algorithms for several variants of survivable network design problems in low-treewidth graphs.Thu, 02 Aug 2018 11:43:18 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9412Perturbation Resilient Clustering for k-Center and Related Problems via LP Relaxations
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9413
We consider clustering in the perturbation resilience model that has been studied since the work of Bilu and Linial [Yonatan Bilu and Nathan Linial, 2010] and Awasthi, Blum and Sheffet [Awasthi et al., 2012]. A clustering instance I is said to be alpha-perturbation resilient if the optimal solution does not change when the pairwise distances are modified by a factor of alpha and the perturbed distances satisfy the metric property - this is the metric perturbation resilience property introduced in [Angelidakis et al., 2017] and a weaker requirement than prior models. We make two high-level contributions. - We show that the natural LP relaxation of k-center and asymmetric k-center is integral for 2-perturbation resilient instances. We belive that demonstrating the goodness of standard LP relaxations complements existing results [Maria{-}Florina Balcan et al., 2016; Angelidakis et al., 2017] that are based on new algorithms designed for the perturbation model. - We define a simple new model of perturbation resilience for clustering with outliers. Using this model we show that the unified MST and dynamic programming based algorithm proposed in [Angelidakis et al., 2017] exactly solves the clustering with outliers problem for several common center based objectives (like k-center, k-means, k-median) when the instances is 2-perturbation resilient. We further show that a natural LP relxation is integral for 2-perturbation resilient instances of k-center with outliers.Thu, 02 Aug 2018 11:43:18 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9413Sherali-Adams Integrality Gaps Matching the Log-Density Threshold
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9414
The log-density method is a powerful algorithmic framework which in recent years has given rise to the best-known approximations for a variety of problems, including Densest-k-Subgraph and Small Set Bipartite Vertex Expansion. These approximations have been conjectured to be optimal based on various instantiations of a general conjecture: that it is hard to distinguish a fully random combinatorial structure from one which contains a similar planted sub-structure with the same "log-density".We bolster this conjecture by showing that in a random hypergraph with edge probability n^{-alpha}, Omega(log n) rounds of Sherali-Adams cannot rule out the existence of a k-subhypergraph with edge density k^{-alpha-o(1)}, for any k and alpha. This holds even when the bound on the objective function is lifted. This gives strong integrality gaps which exactly match the gap in the above distinguishing problems, as well as the best-known approximations, for Densest k-Subgraph, Smallest p-Edge Subgraph, their hypergraph extensions, and Small Set Bipartite Vertex Expansion (or equivalently, Minimum p-Union). Previously, such integrality gaps were known only for Densest k-Subgraph for one specific parameter setting.Thu, 02 Aug 2018 11:43:18 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9414Lower Bounds for Approximating Graph Parameters via Communication Complexity
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9415
In a celebrated work, Blais, Brody, and Matulef [Blais et al., 2012] developed a technique for proving property testing lower bounds via reductions from communication complexity. Their work focused on testing properties of functions, and yielded new lower bounds as well as simplified analyses of known lower bounds. Here, we take a further step in generalizing the methodology of [Blais et al., 2012] to analyze the query complexity of graph parameter estimation problems. In particular, our technique decouples the lower bound arguments from the representation of the graph, allowing it to work with any query type.We illustrate our technique by providing new simpler proofs of previously known tight lower bounds for the query complexity of several graph problems: estimating the number of edges in a graph, sampling edges from an almost-uniform distribution, estimating the number of triangles (and more generally, r-cliques) in a graph, and estimating the moments of the degree distribution of a graph. We also prove new lower bounds for estimating the edge connectivity of a graph and estimating the number of instances of any fixed subgraph in a graph. We show that the lower bounds for estimating the number of triangles and edge connectivity also hold in a strictly stronger computational model that allows access to uniformly random edge samples.Thu, 02 Aug 2018 11:43:18 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9415Communication Complexity of Correlated Equilibrium with Small Support
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9416
We define a two-player N x N game called the 2-cycle game, that has a unique pure Nash equilibrium which is also the only correlated equilibrium of the game. In this game, every 1/poly(N)-approximate correlated equilibrium is concentrated on the pure Nash equilibrium. We show that the randomized communication complexity of finding any 1/poly(N)-approximate correlated equilibrium of the game is Omega(N). For small approximation values, our lower bound answers an open question of Babichenko and Rubinstein (STOC 2017).Thu, 02 Aug 2018 11:43:18 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9416
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9417
Thu, 02 Aug 2018 11:43:18 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9417Online Makespan Minimization: The Power of Restart
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9418
We consider the online makespan minimization problem on identical machines. Chen and Vestjens (ORL 1997) show that the largest processing time first (LPT) algorithm is 1.5-competitive. For the special case of two machines, Noga and Seiden (TCS 2001) introduce the SLEEPY algorithm that achieves a competitive ratio of (5 - sqrt{5})/2 ~~ 1.382, matching the lower bound by Chen and Vestjens (ORL 1997). Furthermore, Noga and Seiden note that in many applications one can kill a job and restart it later, and they leave an open problem whether algorithms with restart can obtain better competitive ratios.We resolve this long-standing open problem on the positive end. Our algorithm has a natural rule for killing a processing job: a newly-arrived job replaces the smallest processing job if 1) the new job is larger than other pending jobs, 2) the new job is much larger than the processing one, and 3) the processed portion is small relative to the size of the new job. With appropriate choice of parameters, we show that our algorithm improves the 1.5 competitive ratio for the general case, and the 1.382 competitive ratio for the two-machine case.Thu, 02 Aug 2018 11:43:18 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9418On Sketching the q to p Norms
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9419
We initiate the study of data dimensionality reduction, or sketching, for the q -> p norms. Given an n x d matrix A, the q -> p norm, denoted |A |_{q -> p} = sup_{x in R^d \ 0} |Ax |_p / |x |_q, is a natural generalization of several matrix and vector norms studied in the data stream and sketching models, with applications to datamining, hardness of approximation, and oblivious routing. We say a distribution S on random matrices L in R^{nd} - > R^k is a (k,alpha)-sketching family if from L(A), one can approximate |A |_{q -> p} up to a factor alpha with constant probability. We provide upper and lower bounds on the sketching dimension k for every p, q in [1, infty], and in a number of cases our bounds are tight. While we mostly focus on constant alpha, we also consider large approximation factors alpha, as well as other variants of the problem such as when A has low rank.Thu, 02 Aug 2018 11:43:18 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9419Flow-time Optimization for Concurrent Open-Shop and Precedence Constrained Scheduling Models
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9420
Scheduling a set of jobs over a collection of machines is a fundamental problem that needs to be solved millions of times a day in various computing platforms: in operating systems, in large data clusters, and in data centers. Along with makespan, flow-time, which measures the length of time a job spends in a system before it completes, is arguably the most important metric to measure the performance of a scheduling algorithm. In recent years, there has been a remarkable progress in understanding flow-time based objective functions in diverse settings such as unrelated machines scheduling, broadcast scheduling, multi-dimensional scheduling, to name a few.Yet, our understanding of the flow-time objective is limited mostly to the scenarios where jobs have no dependencies. On the other hand, in almost all real world applications, think of MapReduce settings for example, jobs have dependencies that need to be respected while making scheduling decisions. In this paper, we take first steps towards understanding this complex problem. In particular, we consider two classical scheduling problems that capture dependencies across jobs: 1) concurrent open-shop scheduling (COSSP) and 2) precedence constrained scheduling. Our main motivation to study these problems specifically comes from their relevance to two scheduling problems that have gained importance in the context of data centers: co-flow scheduling and DAG scheduling. We design almost optimal approximation algorithms for COSSP and PCSP, and show hardness results.Thu, 02 Aug 2018 11:43:18 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9420Sublinear-Time Quadratic Minimization via Spectral Decomposition of Matrices
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9421
We design a sublinear-time approximation algorithm for quadratic function minimization problems with a better error bound than the previous algorithm by Hayashi and Yoshida (NIPS'16). Our approximation algorithm can be modified to handle the case where the minimization is done over a sphere. The analysis of our algorithms is obtained by combining results from graph limit theory, along with a novel spectral decomposition of matrices. Specifically, we prove that a matrix A can be decomposed into a structured part and a pseudorandom part, where the structured part is a block matrix with a polylogarithmic number of blocks, such that in each block all the entries are the same, and the pseudorandom part has a small spectral norm, achieving better error bound than the existing decomposition theorem of Frieze and Kannan (FOCS'96). As an additional application of the decomposition theorem, we give a sublinear-time approximation algorithm for computing the top singular values of a matrix.Thu, 02 Aug 2018 11:43:18 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9421Deterministic Heavy Hitters with Sublinear Query Time
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9422
We study the classic problem of finding l_1 heavy hitters in the streaming model. In the general turnstile model, we give the first deterministic sublinear-time sketching algorithm which takes a linear sketch of length O(epsilon^{-2} log n * log^*(epsilon^{-1})), which is only a factor of log^*(epsilon^{-1}) more than the best existing polynomial-time sketching algorithm (Nelson et al., RANDOM '12). Our approach is based on an iterative procedure, where most unrecovered heavy hitters are identified in each iteration. Although this technique has been extensively employed in the related problem of sparse recovery, this is the first time, to the best of our knowledge, that it has been used in the context of heavy hitters. Along the way we also obtain a sublinear time algorithm for the closely related problem of the l_1/l_1 compressed sensing, matching the space usage of previous (super-)linear time algorithms. In the strict turnstile model, we show that the runtime can be improved and the sketching matrix can be made strongly explicit with O(epsilon^{-2}log^3 n/log^3(1/epsilon)) rows.Thu, 02 Aug 2018 11:43:18 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9422On Low-Risk Heavy Hitters and Sparse Recovery Schemes
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9423
We study the heavy hitters and related sparse recovery problems in the low failure probability regime. This regime is not well-understood, and the main previous work on this is by Gilbert et al. (ICALP'13). We recognize an error in their analysis, improve their results, and contribute new sparse recovery algorithms, as well as provide upper and lower bounds for the heavy hitters problem with low failure probability. Our results are summarized as follows: 1) (Heavy Hitters) We study three natural variants for finding heavy hitters in the strict turnstile model, where the variant depends on the quality of the desired output. For the weakest variant, we give a randomized algorithm improving the failure probability analysis of the ubiquitous Count-Min data structure. We also give a new lower bound for deterministic schemes, resolving a question about this variant posed in Question 4 in the IITK Workshop on Algorithms for Data Streams (2006). Under the strongest and well-studied l_{infty}/ l_2 variant, we show that the classical Count-Sketch data structure is optimal for very low failure probabilities, which was previously unknown. 2) (Sparse Recovery Algorithms) For non-adaptive sparse-recovery, we give sublinear-time algorithms with low-failure probability, which improve upon Gilbert et al. (ICALP'13). In the adaptive case, we improve the failure probability from a constant by Indyk et al. (FOCS '11) to e^{-k^{0.99}}, where k is the sparsity parameter. 3) (Optimal Average-Case Sparse Recovery Bounds) We give matching upper and lower bounds in all parameters, including the failure probability, for the measurement complexity of the l_2/l_2 sparse recovery problem in the spiked-covariance model, completely settling its complexity in this model.Thu, 02 Aug 2018 11:43:18 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9423Mildly Exponential Time Approximation Algorithms for Vertex Cover, Balanced Separator and Uniform Sparsest Cut
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9424
In this work, we study the trade-off between the running time of approximation algorithms and their approximation guarantees. By leveraging a structure of the "hard" instances of the Arora-Rao-Vazirani lemma [Sanjeev Arora et al., 2009; James R. Lee, 2005], we show that the Sum-of-Squares hierarchy can be adapted to provide "fast", but still exponential time, approximation algorithms for several problems in the regime where they are believed to be NP-hard. Specifically, our framework yields the following algorithms; here n denote the number of vertices of the graph and r can be any positive real number greater than 1 (possibly depending on n). - A (2 - 1/(O(r)))-approximation algorithm for Vertex Cover that runs in exp (n/(2^{r^2)})n^{O(1)} time. - An O(r)-approximation algorithms for Uniform Sparsest Cut and Balanced Separator that runs in exp (n/(2^{r^2)})n^{O(1)} time. Our algorithm for Vertex Cover improves upon Bansal et al.'s algorithm [Nikhil Bansal et al., 2017] which achieves (2 - 1/(O(r)))-approximation in time exp (n/(r^r))n^{O(1)}. For Uniform Sparsest Cut and Balanced Separator, our algorithms improve upon O(r)-approximation exp (n/(2^r))n^{O(1)}-time algorithms that follow from a work of Charikar et al. [Moses Charikar et al., 2010].Thu, 02 Aug 2018 11:43:18 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9424Deterministic O(1)-Approximation Algorithms to 1-Center Clustering with Outliers
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9425
The 1-center clustering with outliers problem asks about identifying a prototypical robust statistic that approximates the location of a cluster of points. Given some constant 0 < alpha < 1 and n points such that alpha n of them are in some (unknown) ball of radius r, the goal is to compute a ball of radius O(r) that also contains alpha n points. This problem can be formulated with the points in a normed vector space such as R^d or in a general metric space.The problem has a simple randomized solution: a randomly selected point is a correct solution with constant probability, and its correctness can be verified in linear time. However, the deterministic complexity of this problem was not known. In this paper, for any L^p vector space, we show an O(nd)-time solution with a ball of radius O(r) for a fixed alpha > 1/2, and for any normed vector space, we show an O(nd)-time solution with a ball of radius O(r) when alpha > 1/2 as well as an O(nd log^{(k)}(n))-time solution with a ball of radius O(r) for all alpha > 0, k in N, where log^{(k)}(n) represents the kth iterated logarithm, assuming distance computation and vector space operations take O(d) time. For an arbitrary metric space, we show for any C in N an O(n^{1+1/C})-time solution that finds a ball of radius 2Cr, assuming distance computation between any pair of points takes O(1)-time, and show that for any alpha, C, an O(n^{1+1/C})-time solution that finds a ball of radius ((2C-3)(1-alpha)-1)r cannot exist.Thu, 02 Aug 2018 11:43:18 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9425Robust Online Speed Scaling With Deadline Uncertainty
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9426
A speed scaling problem is considered, where time is divided into slots, and jobs with payoff v arrive at the beginning of the slot with associated deadlines d. Each job takes one slot to be processed, and multiple jobs can be processed by the server in each slot with energy cost g(k) for processing k jobs in one slot. The payoff is accrued by the algorithm only if the job is processed by its deadline. We consider a robust version of this speed scaling problem, where a job on its arrival reveals its payoff v, however, the deadline is hidden to the online algorithm, which could potentially be chosen adversarially and known to the optimal offline algorithm. The objective is to derive a robust (to deadlines) and optimal online algorithm that achieves the best competitive ratio. We propose an algorithm (called min-LCR) and show that it is an optimal online algorithm for any convex energy cost function g(.). We do so without actually evaluating the optimal competitive ratio, and give a general proof that works for any convex g, which is rather novel. For the popular choice of energy cost function g(k) = k^alpha, alpha >= 2, we give concrete bounds on the competitive ratio of the algorithm, which ranges between 2.618 and 3 depending on the value of alpha. The best known online algorithm for the same problem, but where deadlines are revealed to the online algorithm has competitive ratio of 2 and a lower bound of sqrt{2}. Thus, importantly, lack of deadline knowledge does not make the problem degenerate, and the effect of deadline information on the optimal competitive ratio is limited.Thu, 02 Aug 2018 11:43:18 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9426Multi-Agent Submodular Optimization
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9427
Recent years have seen many algorithmic advances in the area of submodular optimization: (SO) min/max~f(S): S in F, where F is a given family of feasible sets over a ground set V and f:2^V - > R is submodular. This progress has been coupled with a wealth of new applications for these models. Our focus is on a more general class of multi-agent submodular optimization (MASO) min/max Sum_{i=1}^{k} f_i(S_i): S_1 u+ S_2 u+ ... u+ S_k in F. Here we use u+ to denote disjoint union and hence this model is attractive where resources are being allocated across k agents, each with its own submodular cost function f_i(). This was introduced in the minimization setting by Goel et al. In this paper we explore the extent to which the approximability of the multi-agent problems are linked to their single-agent versions, referred to informally as the multi-agent gap.We present different reductions that transform a multi-agent problem into a single-agent one. For minimization, we show that (MASO) has an O(alpha * min{k, log^2 (n)})-approximation whenever (SO) admits an alpha-approximation over the convex formulation. In addition, we discuss the class of "bounded blocker" families where there is a provably tight O(log n) multi-agent gap between (MASO) and (SO). For maximization, we show that monotone (resp. nonmonotone) (MASO) admits an alpha (1-1/e) (resp. alpha * 0.385) approximation whenever monotone (resp. nonmonotone) (SO) admits an alpha-approximation over the multilinear formulation; and the 1-1/e multi-agent gap for monotone objectives is tight. We also discuss several families (such as spanning trees, matroids, and p-systems) that have an (optimal) multi-agent gap of 1. These results substantially expand the family of tractable models for submodular maximization.Thu, 02 Aug 2018 11:43:18 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9427Generalized Assignment of Time-Sensitive Item Groups
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9428
We study the generalized assignment problem with time-sensitive item groups (chi-AGAP). It has central applications in advertisement placement on the Internet, and in virtual network embedding in Cloud data centers. We are given a set of items, partitioned into n groups, and a set of T identical bins (or, time-slots). Each group 1 <= j <= n has a time-window chi_j = [r_j, d_j]subseteq [T] in which it can be packed. Each item i in group j has a size s_i>0 and a non-negative utility u_{it} when packed into bin t in chi_j. A bin can accommodate at most one item from each group and the total size of the items in a bin cannot exceed its capacity. The goal is to find a feasible packing of a subset of the items in the bins such that the total utility from groups that are completely packed is maximized. Our main result is an Omega(1)-approximation algorithm for chi-AGAP. Our approximation technique relies on a non-trivial rounding of a configuration LP, which can be adapted to other common scenarios of resource allocation in Cloud data centers.Thu, 02 Aug 2018 11:43:18 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9428On Geodesically Convex Formulations for the Brascamp-Lieb Constant
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9429
We consider two non-convex formulations for computing the optimal constant in the Brascamp-Lieb inequality corresponding to a given datum and show that they are geodesically log-concave on the manifold of positive definite matrices endowed with the Riemannian metric corresponding to the Hessian of the log-determinant function. The first formulation is present in the work of Lieb [Lieb, 1990] and the second is new and inspired by the work of Bennett et al. [Bennett et al., 2008]. Recent work of Garg et al. [Ankit Garg et al., 2017] also implies a geodesically log-concave formulation of the Brascamp-Lieb constant through a reduction to the operator scaling problem. However, the dimension of the arising optimization problem in their reduction depends exponentially on the number of bits needed to describe the Brascamp-Lieb datum. The formulations presented here have dimensions that are polynomial in the bit complexity of the input datum.Thu, 02 Aug 2018 11:43:18 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9429Tensor Rank is Hard to Approximate
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9430
We prove that approximating the rank of a 3-tensor to within a factor of 1 + 1/1852 - delta, for any delta > 0, is NP-hard over any field. We do this via reduction from bounded occurrence 2-SAT.Thu, 02 Aug 2018 11:43:18 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9430An O(1)-Approximation Algorithm for Dynamic Weighted Vertex Cover with Soft Capacity
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9431
This study considers the soft capacitated vertex cover problem in a dynamic setting. This problem generalizes the dynamic model of the vertex cover problem, which has been intensively studied in recent years. Given a dynamically changing vertex-weighted graph G=(V,E), which allows edge insertions and edge deletions, the goal is to design a data structure that maintains an approximate minimum vertex cover while satisfying the capacity constraint of each vertex. That is, when picking a copy of a vertex v in the cover, the number of v's incident edges covered by the copy is up to a given capacity of v. We extend Bhattacharya et al.'s work [SODA'15 and ICALP'15] to obtain a deterministic primal-dual algorithm for maintaining a constant-factor approximate minimum capacitated vertex cover with O(log n / epsilon) amortized update time, where n is the number of vertices in the graph. The algorithm can be extended to (1) a more general model in which each edge is associated with a non-uniform and unsplittable demand, and (2) the more general capacitated set cover problem.Thu, 02 Aug 2018 11:43:18 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9431Fixed-Parameter Approximation Schemes for Weighted Flowtime
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9432
Given a set of n jobs with integral release dates, processing times and weights, it is a natural and important scheduling problem to compute a schedule that minimizes the sum of the weighted flow times of the jobs. There are strong lower bounds for the possible approximation ratios. In the non-preemptive case, even on a single machine the best known result is a O(sqrt{n})-approximation which is best possible. In the preemptive case on m identical machines there is a O(log min{n/m,P})-approximation (where P denotes the maximum job size) which is also best possible.We study the problem in the parametrized setting where our parameter k is an upper bound on the maximum (integral) processing time and weight of a job, a standard parameter for scheduling problems. We present a (1+epsilon)-approximation algorithm for the preemptive and the non-preemptive case of minimizing weighted flow time on m machines with a running time of f(k,epsilon,m)* n^{O(1)}, i.e., our combined parameters are k,epsilon, and m. Key to our results is to distinguish time intervals according to whether in the optimal solution the pending jobs have large or small total weight. Depending on this we employ dynamic programming, linear programming, greedy routines, or combinations of the latter to compute the schedule for each respective interval.Thu, 02 Aug 2018 11:43:18 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9432List-Decoding Homomorphism Codes with Arbitrary Codomains
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9433
The codewords of the homomorphism code aHom(G,H) are the affine homomorphisms between two finite groups, G and H, generalizing Hadamard codes. Following the work of Goldreich-Levin (1989), Grigorescu et al. (2006), Dinur et al. (2008), and Guo and Sudan (2014), we further expand the range of groups for which local list-decoding is possible up to mindist, the minimum distance of the code. In particular, for the first time, we do not require either G or H to be solvable. Specifically, we demonstrate a poly(1/epsilon) bound on the list size, i. e., on the number of codewords within distance (mindist-epsilon) from any received word, when G is either abelian or an alternating group, and H is an arbitrary (finite or infinite) group. We conjecture that a similar bound holds for all finite simple groups as domains; the alternating groups serve as the first test case.The abelian vs. arbitrary result permits us to adapt previous techniques to obtain efficient local list-decoding for this case. We also obtain efficient local list-decoding for the permutation representations of alternating groups (the codomain is a symmetric group) under the restriction that the domain G=A_n is paired with codomain H=S_m satisfying m < 2^{n-1}/sqrt{n}.The limitations on the codomain in the latter case arise from severe technical difficulties stemming from the need to solve the homomorphism extension (HomExt) problem in certain cases; these are addressed in a separate paper (Wuu 2018).We introduce an intermediate "semi-algorithmic" model we call Certificate List-Decoding that bypasses the HomExt bottleneck and works in the alternating vs. arbitrary setting. A certificate list-decoder produces partial homomorphisms that uniquely extend to the homomorphisms in the list. A homomorphism extender applied to a list of certificates yields the desired list.Thu, 02 Aug 2018 11:43:18 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9433Optimal Deterministic Extractors for Generalized Santha-Vazirani Sources
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9434
Let F be a finite alphabet and D be a finite set of distributions over F. A Generalized Santha-Vazirani (GSV) source of type (F, D), introduced by Beigi, Etesami and Gohari (ICALP 2015, SICOMP 2017), is a random sequence (F_1, ..., F_n) in F^n, where F_i is a sample from some distribution d in D whose choice may depend on F_1, ..., F_{i-1}.We show that all GSV source types (F, D) fall into one of three categories: (1) non-extractable; (2) extractable with error n^{-Theta(1)}; (3) extractable with error 2^{-Omega(n)}. We provide essentially randomness-optimal extraction algorithms for extractable sources. Our algorithm for category (2) sources extracts one bit with error epsilon from n = poly(1/epsilon) samples in time linear in n. Our algorithm for category (3) sources extracts m bits with error epsilon from n = O(m + log 1/epsilon) samples in time min{O(m2^m * n),n^{O(|F|)}}.We also give algorithms for classifying a GSV source type (F, D): Membership in category (1) can be decided in NP, while membership in category (3) is polynomial-time decidable.Thu, 02 Aug 2018 11:43:18 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9434Adaptive Lower Bound for Testing Monotonicity on the Line
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9435
In the property testing model, the task is to distinguish objects possessing some property from the objects that are far from it. One of such properties is monotonicity, when the objects are functions from one poset to another. This is an active area of research. In this paper we study query complexity of epsilon-testing monotonicity of a function f : [n]->[r]. All our lower bounds are for adaptive two-sided testers. - We prove a nearly tight lower bound for this problem in terms of r. The bound is Omega((log r)/(log log r)) when epsilon = 1/2. No previous satisfactory lower bound in terms of r was known.- We completely characterise query complexity of this problem in terms of n for smaller values of epsilon. The complexity is Theta(epsilon^{-1} log (epsilon n)). Apart from giving the lower bound, this improves on the best known upper bound.Finally, we give an alternative proof of the Omega(epsilon^{-1}d log n - epsilon^{-1}log epsilon^{-1}) lower bound for testing monotonicity on the hypergrid [n]^d due to Chakrabarty and Seshadhri (RANDOM'13).Thu, 02 Aug 2018 11:43:18 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9435Swendsen-Wang Dynamics for General Graphs in the Tree Uniqueness Region
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9436
The Swendsen-Wang dynamics is a popular algorithm for sampling from the Gibbs distribution for the ferromagnetic Ising model on a graph G=(V,E). The dynamics is a "global" Markov chain which is conjectured to converge to equilibrium in O(|V|^{1/4}) steps for any graph G at any (inverse) temperature beta. It was recently proved by Guo and Jerrum (2017) that the Swendsen-Wang dynamics has polynomial mixing time on any graph at all temperatures, yet there are few results providing o(|V|) upper bounds on its convergence time.We prove fast convergence of the Swendsen-Wang dynamics on general graphs in the tree uniqueness region of the ferromagnetic Ising model. In particular, when beta < beta_c(d) where beta_c(d) denotes the uniqueness/non-uniqueness threshold on infinite d-regular trees, we prove that the relaxation time (i.e., the inverse spectral gap) of the Swendsen-Wang dynamics is Theta(1) on any graph of maximum degree d >= 3. Our proof utilizes a version of the Swendsen-Wang dynamics which only updates isolated vertices. We establish that this variant of the Swendsen-Wang dynamics has mixing time O(log{|V|}) and relaxation time Theta(1) on any graph of maximum degree d for all beta < beta_c(d). We believe that this Markov chain may be of independent interest, as it is a monotone Swendsen-Wang type chain. As part of our proofs, we provide modest extensions of the technology of Mossel and Sly (2013) for analyzing mixing times and of the censoring result of Peres and Winkler (2013). Both of these results are for the Glauber dynamics, and we extend them here to general monotone Markov chains. This class of dynamics includes for example the heat-bath block dynamics, for which we obtain new tight mixing time bounds.Thu, 02 Aug 2018 11:43:18 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9436Sampling in Uniqueness from the Potts and Random-Cluster Models on Random Regular Graphs
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9437
We consider the problem of sampling from the Potts model on random regular graphs. It is conjectured that sampling is possible when the temperature of the model is in the so-called uniqueness regime of the regular tree, but positive algorithmic results have been for the most part elusive. In this paper, for all integers q >= 3 and Delta >= 3, we develop algorithms that produce samples within error o(1) from the q-state Potts model on random Delta-regular graphs, whenever the temperature is in uniqueness, for both the ferromagnetic and antiferromagnetic cases.The algorithm for the antiferromagnetic Potts model is based on iteratively adding the edges of the graph and resampling a bichromatic class that contains the endpoints of the newly added edge. Key to the algorithm is how to perform the resampling step efficiently since bichromatic classes can potentially induce linear-sized components. To this end, we exploit the tree uniqueness to show that the average growth of bichromatic components is typically small, which allows us to use correlation decay algorithms for the resampling step. While the precise uniqueness threshold on the tree is not known for general values of q and Delta in the antiferromagnetic case, our algorithm works throughout uniqueness regardless of its value.In the case of the ferromagnetic Potts model, we are able to simplify the algorithm significantly by utilising the random-cluster representation of the model. In particular, we demonstrate that a percolation-type algorithm succeeds in sampling from the random-cluster model with parameters p,q on random Delta-regular graphs for all values of q >= 1 and p<p_c(q,Delta), where p_c(q,Delta) corresponds to a uniqueness threshold for the model on the Delta-regular tree. When restricted to integer values of q, this yields a simplified algorithm for the ferromagnetic Potts model on random Delta-regular graphs.Thu, 02 Aug 2018 11:43:18 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9437Polar Codes with Exponentially Small Error at Finite Block Length
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9438
We show that the entire class of polar codes (up to a natural necessary condition) converge to capacity at block lengths polynomial in the gap to capacity, while simultaneously achieving failure probabilities that are exponentially small in the block length (i.e., decoding fails with probability exp(-N^{Omega(1)}) for codes of length N). Previously this combination was known only for one specific family within the class of polar codes, whereas we establish this whenever the polar code exhibits a condition necessary for any polarization.Our results adapt and strengthen a local analysis of polar codes due to the authors with Nakkiran and Rudra [Proc. STOC 2018]. Their analysis related the time-local behavior of a martingale to its global convergence, and this allowed them to prove that the broad class of polar codes converge to capacity at polynomial block lengths. Their analysis easily adapts to show exponentially small failure probabilities, provided the associated martingale, the "Arikan martingale", exhibits a corresponding strong local effect. The main contribution of this work is a much stronger local analysis of the Arikan martingale. This leads to the general result claimed above.In addition to our general result, we also show, for the first time, polar codes that achieve failure probability exp(-N^{beta}) for any beta < 1 while converging to capacity at block length polynomial in the gap to capacity. Finally we also show that the "local" approach can be combined with any analysis of failure probability of an arbitrary polar code to get essentially the same failure probability while achieving block length polynomial in the gap to capacity.Thu, 02 Aug 2018 11:43:18 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9438Approximate Degree and the Complexity of Depth Three Circuits
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9439
Threshold weight, margin complexity, and Majority-of-Threshold circuit size are basic complexity measures of Boolean functions that arise in learning theory, communication complexity, and circuit complexity. Each of these measures might exhibit a chasm at depth three: namely, all polynomial size Boolean circuits of depth two have polynomial complexity under the measure, but there may exist Boolean circuits of depth three that have essentially maximal complexity exp(Theta(n)). However, existing techniques are far from showing this: for all three measures, the best lower bound for depth three circuits is exp(Omega(n^{2/5})). Moreover, prior methods exclusively study block-composed functions. Such methods appear intrinsically unable to prove lower bounds better than exp(Omega(sqrt{n})) even for depth four circuits, and have yet to prove lower bounds better than exp(Omega(sqrt{n})) for circuits of any constant depth.We take a step toward showing that all of these complexity measures indeed exhibit a chasm at depth three. Specifically, for any arbitrarily small constant delta > 0, we exhibit a depth three circuit of polynomial size (in fact, an O(log n)-decision list) of complexity exp(Omega(n^{1/2-delta})) under each of these measures.Our methods go beyond the block-composed functions studied in prior work, and hence may not be subject to the same barriers. Accordingly, we suggest natural candidate functions that may exhibit stronger bounds.Thu, 02 Aug 2018 11:43:18 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9439Speeding up Switch Markov Chains for Sampling Bipartite Graphs with Given Degree Sequence
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9440
We consider the well-studied problem of uniformly sampling (bipartite) graphs with a given degree sequence, or equivalently, the uniform sampling of binary matrices with fixed row and column sums. In particular, we focus on Markov Chain Monte Carlo (MCMC) approaches, which proceed by making small changes that preserve the degree sequence to a given graph. Such Markov chains converge to the uniform distribution, but the challenge is to show that they do so quickly, i.e., that they are rapidly mixing.The standard example of this Markov chain approach for sampling bipartite graphs is the switch algorithm, that proceeds by locally switching two edges while preserving the degree sequence. The Curveball algorithm is a variation on this approach in which essentially multiple switches (trades) are performed simultaneously, with the goal of speeding up switch-based algorithms. Even though the Curveball algorithm is expected to mix faster than switch-based algorithms for many degree sequences, nothing is currently known about its mixing time. On the other hand, the switch algorithm has been proven to be rapidly mixing for several classes of degree sequences.In this work we present the first results regarding the mixing time of the Curveball algorithm. We give a theoretical comparison between the switch and Curveball algorithms in terms of their underlying Markov chains. As our main result, we show that the Curveball chain is rapidly mixing whenever a switch-based chain is rapidly mixing. We do this using a novel state space graph decomposition of the switch chain into Johnson graphs. This decomposition is of independent interest.Thu, 02 Aug 2018 11:43:18 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9440Randomness Extraction in AC0 and with Small Locality
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9441
Randomness extractors, which extract high quality (almost-uniform) random bits from biased random sources, are important objects both in theory and in practice. While there have been significant progress in obtaining near optimal constructions of randomness extractors in various settings, the computational complexity of randomness extractors is still much less studied. In particular, it is not clear whether randomness extractors with good parameters can be computed in several interesting complexity classes that are much weaker than P.In this paper we study randomness extractors in the following two models of computation: (1) constant-depth circuits (AC^0), and (2) the local computation model. Previous work in these models, such as [Viola, 2005], [Goldreich et al., 2015] and [Bogdanov and Guo, 2013], only achieve constructions with weak parameters. In this work we give explicit constructions of randomness extractors with much better parameters. Our results on AC^0 extractors refute a conjecture in [Goldreich et al., 2015] and answer several open problems there. We also provide a lower bound on the error of extractors in AC^0, which together with the entropy lower bound in [Viola, 2005; Goldreich et al., 2015] almost completely characterizes extractors in this class. Our results on local extractors also significantly improve the seed length in [Bogdanov and Guo, 2013]. As an application, we use our AC^0 extractors to study pseudorandom generators in AC^0, and show that we can construct both cryptographic pseudorandom generators (under reasonable computational assumptions) and unconditional pseudorandom generators for space bounded computation with very good parameters.Our constructions combine several previous techniques in randomness extractors, as well as introduce new techniques to reduce or preserve the complexity of extractors, which may be of independent interest. These include (1) a general way to reduce the error of strong seeded extractors while preserving the AC^0 property and small locality, and (2) a seeded randomness condenser with small locality.Thu, 02 Aug 2018 11:43:18 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9441Boolean Function Analysis on High-Dimensional Expanders
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9442
We initiate the study of Boolean function analysis on high-dimensional expanders. We describe an analog of the Fourier expansion and of the Fourier levels on simplicial complexes, and generalize the FKN theorem to high-dimensional expanders.Our results demonstrate that a high-dimensional expanding complex X can sometimes serve as a sparse model for the Boolean slice or hypercube, and quite possibly additional results from Boolean function analysis can be carried over to this sparse model. Therefore, this model can be viewed as a derandomization of the Boolean slice, containing |X(k)|=O(n) points in comparison to binom{n}{k+1} points in the (k+1)-slice (which consists of all n-bit strings with exactly k+1 ones).Thu, 02 Aug 2018 11:43:18 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9442Percolation of Lipschitz Surface and Tight Bounds on the Spread of Information Among Mobile Agents
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9443
We consider the problem of spread of information among mobile agents on the torus. The agents are initially distributed as a Poisson point process on the torus, and move as independent simple random walks. Two agents can share information whenever they are at the same vertex of the torus. We study the so-called flooding time: the amount of time it takes for information to be known by all agents. We establish a tight upper bound on the flooding time, and introduce a technique which we believe can be applicable to analyze other processes involving mobile agents.Thu, 02 Aug 2018 11:43:18 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9443Flipping out with Many Flips: Hardness of Testing k-Monotonicity
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9444
A function f:{0,1}^n - > {0,1} is said to be k-monotone if it flips between 0 and 1 at most k times on every ascending chain. Such functions represent a natural generalization of (1-)monotone functions, and have been recently studied in circuit complexity, PAC learning, and cryptography. Our work is part of a renewed focus in understanding testability of properties characterized by freeness of arbitrary order patterns as a generalization of monotonicity. Recently, Canonne et al. (ITCS 2017) initiate the study of k-monotone functions in the area of property testing, and Newman et al. (SODA 2017) study testability of families characterized by freeness from order patterns on real-valued functions over the line [n] domain.We study k-monotone functions in the more relaxed parametrized property testing model, introduced by Parnas et al. (JCSS, 72(6), 2006). In this process we resolve a problem left open in previous work. Specifically, our results include the following. 1) Testing 2-monotonicity on the hypercube non-adaptively with one-sided error requires an exponential in sqrt{n} number of queries. This behavior shows a stark contrast with testing (1-)monotonicity, which only needs O~(sqrt{n}) queries (Khot et al. (FOCS 2015)). Furthermore, even the apparently easier task of distinguishing 2-monotone functions from functions that are far from being n^{.01}-monotone also requires an exponential number of queries.2) On the hypercube [n]^d domain, there exists a testing algorithm that makes a constant number of queries and distinguishes functions that are k-monotone from functions that are far from being O(kd^2) -monotone. Such a dependency is likely necessary, given the lower bound above for the hypercube.Thu, 02 Aug 2018 11:43:18 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9444How Long Can Optimal Locally Repairable Codes Be?
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9445
A locally repairable code (LRC) with locality r allows for the recovery of any erased codeword symbol using only r other codeword symbols. A Singleton-type bound dictates the best possible trade-off between the dimension and distance of LRCs - an LRC attaining this trade-off is deemed optimal. Such optimal LRCs have been constructed over alphabets growing linearly in the block length. Unlike the classical Singleton bound, however, it was not known if such a linear growth in the alphabet size is necessary, or for that matter even if the alphabet needs to grow at all with the block length. Indeed, for small code distances 3,4, arbitrarily long optimal LRCs were known over fixed alphabets. Here, we prove that for distances d >=slant 5, the code length n of an optimal LRC over an alphabet of size q must be at most roughly O(d q^3). For the case d=5, our upper bound is O(q^2). We complement these bounds by showing the existence of optimal LRCs of length Omega_{d,r}(q^{1+1/floor[(d-3)/2]}) when d <=slant r+2. Our bounds match when d=5, pinning down n=Theta(q^2) as the asymptotically largest length of an optimal LRC for this case.Thu, 02 Aug 2018 11:43:18 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9445On Minrank and Forbidden Subgraphs
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9446
Thu, 02 Aug 2018 11:43:18 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9446Preserving Randomness for Adaptive Algorithms
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9447
Suppose Est is a randomized estimation algorithm that uses n random bits and outputs values in R^d. We show how to execute Est on k adaptively chosen inputs using only n + O(k log(d + 1)) random bits instead of the trivial nk (at the cost of mild increases in the error and failure probability). Our algorithm combines a variant of the INW pseudorandom generator [Impagliazzo et al., 1994] with a new scheme for shifting and rounding the outputs of Est. We prove that modifying the outputs of Est is necessary in this setting, and furthermore, our algorithm's randomness complexity is near-optimal in the case d <= O(1). As an application, we give a randomness-efficient version of the Goldreich-Levin algorithm; our algorithm finds all Fourier coefficients with absolute value at least theta of a function F: {0, 1}^n -> {-1, 1} using O(n log n) * poly(1/theta) queries to F and O(n) random bits (independent of theta), improving previous work by Bshouty et al. [Bshouty et al., 2004].Thu, 02 Aug 2018 11:43:18 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9447Commutative Algorithms Approximate the LLL-distribution
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9448
Thu, 02 Aug 2018 11:43:18 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9448The Cover Time of a Biased Random Walk on a Random Regular Graph of Odd Degree
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9449
We consider a random walk process, introduced by Orenshtein and Shinkar [Tal Orenshtein and Igor Shinkar, 2014], which prefers to visit previously unvisited edges, on the random r-regular graph G_r for any odd r >= 3. We show that this random walk process has asymptotic vertex and edge cover times 1/(r-2)n log n and r/(2(r-2))n log n, respectively, generalizing the result from [Cooper et al., to appear] from r = 3 to any larger odd r. This completes the study of the vertex cover time for fixed r >= 3, with [Petra Berenbrink et al., 2015] having previously shown that G_r has vertex cover time asymptotic to rn/2 when r >= 4 is even.Thu, 02 Aug 2018 11:43:18 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9449Satisfiability and Derandomization for Small Polynomial Threshold Circuits
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9450
A polynomial threshold function (PTF) is defined as the sign of a polynomial p : {0,1}^n ->R. A PTF circuit is a Boolean circuit whose gates are PTFs. We study the problems of exact and (promise) approximate counting for PTF circuits of constant depth. - Satisfiability (#SAT). We give the first zero-error randomized algorithm faster than exhaustive search that counts the number of satisfying assignments of a given constant-depth circuit with a super-linear number of wires whose gates are s-sparse PTFs, for s almost quadratic in the input size of the circuit; here a PTF is called s-sparse if its underlying polynomial has at most s monomials. More specifically, we show that, for any large enough constant c, given a depth-d circuit with (n^{2-1/c})-sparse PTF gates that has at most n^{1+epsilon_d} wires, where epsilon_d depends only on c and d, the number of satisfying assignments of the circuit can be computed in randomized time 2^{n-n^{epsilon_d}} with zero error. This generalizes the result by Chen, Santhanam and Srinivasan (CCC, 2016) who gave a SAT algorithm for constant-depth circuits of super-linear wire complexity with linear threshold function (LTF) gates only. - Quantified derandomization. The quantified derandomization problem, introduced by Goldreich and Wigderson (STOC, 2014), asks to compute the majority value of a given Boolean circuit, under the promise that the minority-value inputs to the circuit are very few. We give a quantified derandomization algorithm for constant-depth PTF circuits with a super-linear number of wires that runs in quasi-polynomial time. More specifically, we show that for any sufficiently large constant c, there is an algorithm that, given a degree-Delta PTF circuit C of depth d with n^{1+1/c^d} wires such that C has at most 2^{n^{1-1/c}} minority-value inputs, runs in quasi-polynomial time exp ((log n)^{O (Delta^2)}) and determines the majority value of C. (We obtain a similar quantified derandomization result for PTF circuits with n^{Delta}-sparse PTF gates.) This extends the recent result of Tell (STOC, 2018) for constant-depth LTF circuits of super-linear wire complexity. - Pseudorandom generators. We show how the classical Nisan-Wigderson (NW) generator (JCSS, 1994) yields a nontrivial pseudorandom generator for PTF circuits (of unrestricted depth) with sub-linearly many gates. As a corollary, we get a PRG for degree-Delta PTFs with the seed length exp (sqrt{Delta * log n})* log^2(1/epsilon).Thu, 02 Aug 2018 11:43:18 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9450High Order Random Walks: Beyond Spectral Gap
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9451
We study high order random walks on high dimensional expanders on simplicial complexes (i.e., hypergraphs). These walks walk from a k-face (i.e., a k-hyperedge) to a k-face if they are both contained in a k+1-face (i.e, a k+1 hyperedge). This naturally generalizes the random walks on graphs that walk from a vertex (0-face) to a vertex if they are both contained in an edge (1-face).Recent works have studied the spectrum of high order walks operators and deduced fast mixing. However, the spectral gap of high order walks operators is inherently small, due to natural obstructions (called coboundaries) that do not happen for walks on expander graphs.In this work we go beyond spectral gap, and relate the expansion of a function on k-faces (called k-cochain, for k=0, this is a function on vertices) to its structure.We show a Decomposition Theorem: For every k-cochain defined on high dimensional expander, there exists a decomposition of the cochain into i-cochains such that the square norm of the k-cochain is a sum of the square norms of the i-chains and such that the more weight the k-cochain has on higher levels of the decomposition the better is its expansion, or equivalently, the better is its shrinkage by the high order random walk operator. The following corollaries are implied by the Decomposition Theorem: - We characterize highly expanding k-cochains as those whose mass is concentrated on the highest levels of the decomposition that we construct. For example, a function on edges (i.e. a 1-cochain) which is locally thin (i.e. it contains few edges through every vertex) is highly expanding, while a function on edges that contains all edges through a single vertex is not highly expanding. - We get optimal mixing for high order random walks on Ramanujan complexes. Ramanujan complexes are recently discovered bounded degree high dimensional expanders. The optimality in their mixing that we prove here, enable us to get from them more efficient Two-Layer-Samplers than those presented by the previous work of Dinur and Kaufman.Thu, 02 Aug 2018 11:43:18 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9451Improved Composition Theorems for Functions and Relations
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9452
Thu, 02 Aug 2018 11:43:18 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9452Round Complexity Versus Randomness Complexity in Interactive Proofs
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9453
Consider an interactive proof system for some set S that has randomness complexity r(n) for instances of length n, and arbitrary round complexity. We show a public-coin interactive proof system for S of round complexity O(r(n)/log n). Furthermore, the randomness complexity is preserved up to a constant factor, and the resulting interactive proof system has perfect completeness.Thu, 02 Aug 2018 11:43:18 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9453Improved List-Decodability of Random Linear Binary Codes
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9454
Thu, 02 Aug 2018 11:43:18 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9454Sunflowers and Quasi-Sunflowers from Randomness Extractors
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9455
Thu, 02 Aug 2018 11:43:18 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9455Torpid Mixing of Markov Chains for the Six-vertex Model on Z^2
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9456
In this paper, we study the mixing time of two widely used Markov chain algorithms for the six-vertex model, Glauber dynamics and the directed-loop algorithm, on the square lattice Z^2. We prove, for the first time that, on finite regions of the square lattice these Markov chains are torpidly mixing under parameter settings in the ferroelectric phase and the anti-ferroelectric phase.Thu, 02 Aug 2018 11:43:18 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9456On the Testability of Graph Partition Properties
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9457
In this work we study the testability of a family of graph partition properties that generalizes a family previously studied by Goldreich, Goldwasser, and Ron (Journal of the ACM, 1998 ). While the family studied by Goldreich, Goldwasser, and Ron includes a variety of natural properties, such as k-colorability and containing a large cut, it does not include other properties of interest, such as split graphs, and more generally (p,q)-colorable graphs. The generalization we consider allows us to impose constraints on the edge-densities within and between parts (relative to the sizes of the parts). We denote the family studied in this work by GPP.We first show that all properties in GPP have a testing algorithm whose query complexity is polynomial in 1/epsilon, where epsilon is the given proximity parameter (and there is no dependence on the size of the graph). As the testing algorithm has two-sided error, we next address the question of which properties in GPP can be tested with one-sided error and query complexity polynomial in 1/epsilon. We answer this question by establishing a characterization result. Namely, we define a subfamily GPP_{0,1} of GPP and show that a property P in GPP is testable by a one-sided error algorithm that has query complexity poly(1/epsilon) if and only if P in GPP_{0,1}.Thu, 02 Aug 2018 11:43:18 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9457On Closeness to k-Wise Uniformity
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9458
A probability distribution over {-1, 1}^n is (epsilon, k)-wise uniform if, roughly, it is epsilon-close to the uniform distribution when restricted to any k coordinates. We consider the problem of how far an (epsilon, k)-wise uniform distribution can be from any globally k-wise uniform distribution. We show that every (epsilon, k)-wise uniform distribution is O(n^{k/2}epsilon)-close to a k-wise uniform distribution in total variation distance. In addition, we show that this bound is optimal for all even k: we find an (epsilon, k)-wise uniform distribution that is Omega(n^{k/2}epsilon)-far from any k-wise uniform distribution in total variation distance. For k=1, we get a better upper bound of O(epsilon), which is also optimal.One application of our closeness result is to the sample complexity of testing whether a distribution is k-wise uniform or delta-far from k-wise uniform. We give an upper bound of O(n^{k}/delta^2) (or O(log n/delta^2) when k = 1) on the required samples. We show an improved upper bound of O~(n^{k/2}/delta^2) for the special case of testing fully uniform vs. delta-far from k-wise uniform. Finally, we complement this with a matching lower bound of Omega(n/delta^2) when k = 2.Our results improve upon the best known bounds from [Alon et al., 2007], and have simpler proofs.Thu, 02 Aug 2018 11:43:18 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9458Pseudo-Derandomizing Learning and Approximation
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9459
We continue the study of pseudo-deterministic algorithms initiated by Gat and Goldwasser [Eran Gat and Shafi Goldwasser, 2011]. A pseudo-deterministic algorithm is a probabilistic algorithm which produces a fixed output with high probability. We explore pseudo-determinism in the settings of learning and approximation. Our goal is to simulate known randomized algorithms in these settings by pseudo-deterministic algorithms in a generic fashion - a goal we succinctly term pseudo-derandomization. Learning. In the setting of learning with membership queries, we first show that randomized learning algorithms can be derandomized (resp. pseudo-derandomized) under the standard hardness assumption that E (resp. BPE) requires large Boolean circuits. Thus, despite the fact that learning is an algorithmic task that requires interaction with an oracle, standard hardness assumptions suffice to (pseudo-)derandomize it. We also unconditionally pseudo-derandomize any {quasi-polynomial} time learning algorithm for polynomial size circuits on infinitely many input lengths in sub-exponential time.Next, we establish a generic connection between learning and derandomization in the reverse direction, by showing that deterministic (resp. pseudo-deterministic) learning algorithms for a concept class C imply hitting sets against C that are computable deterministically (resp. pseudo-deterministically). In particular, this suggests a new approach to constructing hitting set generators against AC^0[p] circuits by giving a deterministic learning algorithm for AC^0[p]. Approximation. Turning to approximation, we unconditionally pseudo-derandomize any poly-time randomized approximation scheme for integer-valued functions infinitely often in subexponential time over any samplable distribution on inputs. As a corollary, we get that the (0,1)-Permanent has a fully pseudo-deterministic approximation scheme running in sub-exponential time infinitely often over any samplable distribution on inputs.Finally, we {investigate} the notion of approximate canonization of Boolean circuits. We use a connection between pseudodeterministic learning and approximate canonization to show that if BPE does not have sub-exponential size circuits infinitely often, then there is a pseudo-deterministic approximate canonizer for AC^0[p] computable in quasi-polynomial time.Thu, 02 Aug 2018 11:43:18 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9459Luby-Velickovic-Wigderson Revisited: Improved Correlation Bounds and Pseudorandom Generators for Depth-Two Circuits
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9460
Thu, 02 Aug 2018 11:43:18 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9460Randomly Coloring Graphs of Logarithmically Bounded Pathwidth
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9461
We consider the problem of sampling a proper k-coloring of a graph of maximal degree Delta uniformly at random. We describe a new Markov chain for sampling colorings, and show that it mixes rapidly on graphs of logarithmically bounded pathwidth if k >=(1+epsilon)Delta, for any epsilon>0, using a hybrid paths argument.Thu, 02 Aug 2018 11:43:18 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9461Explicit Strong LTCs with Inverse Poly-Log Rate and Constant Soundness
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9462
An error-correcting code C subseteq F^n is called (q,epsilon)-strong locally testable code (LTC) if there exists a tester that makes at most q queries to the input word. This tester accepts all codewords with probability 1 and rejects all non-codewords x not in C with probability at least epsilon * delta(x,C), where delta(x,C) denotes the relative Hamming distance between the word x and the code C. The parameter q is called the query complexity and the parameter epsilon is called soundness.Goldreich and Sudan (J.ACM 2006) asked about the existence of strong LTCs with constant query complexity, constant relative distance, constant soundness and inverse polylogarithmic rate. They also asked about the explicit constructions of these codes.Strong LTCs with the required range of parameters were obtained recently in the works of Viderman (CCC 2013, FOCS 2013) based on the papers of Meir (SICOMP 2009) and Dinur (J.ACM 2007). However, the construction of these codes was probabilistic.In this work we show that codes presented in the works of Dinur (J.ACM 2007) and Ben-Sasson and Sudan (SICOMP 2005) provide the explicit construction of strong LTCs with the above range of parameters. Previously, such codes were proven to be weak LTCs. Using the results of Viderman (CCC 2013, FOCS 2013) we prove that such codes are, in fact, strong LTCs.Thu, 02 Aug 2018 11:43:18 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9462Front Matter, Table of Contents, Preface, Conference Organization
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9328
Front Matter, Table of Contents, Preface, Conference OrganizationMon, 30 Jul 2018 20:51:57 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9328Early Detection of Herding Behaviour during Emergency Evacuations
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9329
Social scientists have observed a number of irrational behaviours during emergency evacuations, caused by a range of possible cognitive biases. One such behaviour is herding - people following and trusting others to guide them, when they do not know where the nearest exit is. This behaviour may lead to safety under a knowledgeable leader, but can also lead to dead-ends. We present a method for the automatic early detection of herding behaviour to avoid suboptimal evacuations. The method comprises three steps: (i) people clusters identification during evacuation, (ii) collection of clusters' spatio-temporal information to extract features for describing cluster behaviour, and (iii) unsupervised learning classification of clusters' behaviour into 'benign' or 'harmful' herding. Results using a set of different detection scores show accuracies higher than baselines in identifying harmful behaviour; thus, laying the ground for timely irrational behaviour detection to increase the performance of emergency evacuation systems.Mon, 30 Jul 2018 20:51:57 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9329What Makes Spatial Data Big? A Discussion on How to Partition Spatial Data
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9330
The amount of available spatial data has significantly increased in the last years so that traditional analysis tools have become inappropriate to effectively manage them. Therefore, many attempts have been made in order to define extensions of existing MapReduce tools, such as Hadoop or Spark, with spatial capabilities in terms of data types and algorithms. Such extensions are mainly based on the partitioning techniques implemented for textual data where the dimension is given in terms of the number of occupied bytes. However, spatial data are characterized by other features which describe their dimension, such as the number of vertices or the MBR size of geometries, which greatly affect the performance of operations, like the spatial join, during data analysis. The result is that the use of traditional partitioning techniques prevents to completely exploit the benefit of the parallel execution provided by a MapReduce environment. This paper extensively analyses the problem considering the spatial join operation as use case, performing both a theoretical and an experimental analysis for it. Moreover, it provides a solution based on a different partitioning technique, which splits complex or extensive geometries. Finally, we validate the proposed solution by means of some experiments on synthetic and real datasets.Mon, 30 Jul 2018 20:51:57 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9330Intersections of Our World
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9331
There are several situations where the type of a street intersections can become very important, especially in the case of navigation studies. The types of intersections affect the route complexity and this has to be accounted for, e.g., already during the experimental design phase of a navigation study. In this work we introduce a formal definition for intersection types and present a framework that allows for extracting information about the intersections of our planet. We present a case study that demonstrates the importance and necessity of being able to extract this information.Mon, 30 Jul 2018 20:51:57 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9331Considerations of Graphical Proximity and Geographical Nearness
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9332
"Near things are more similar than more distant things" states Tobler's first law of geography. This seems obvious and is part to much cognitive research into the perception of the environment. The statement's validity for assessments of geographical nearness purely from map symbols has yet to be ascertained. This paper considers this issue through a theoretical framework grounded in Gestalt concepts, behavioral ecological psychology and information psychology. It sets out to consider how influential experience or training may be on the association of graphical proximity with geographical nearness. A pilot study presents some initial findings. The findings regarding the influence of experience or training are ambiguous, but point to the rapid acquisition of affordances in the survey instruments as another factor for future research.Mon, 30 Jul 2018 20:51:57 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9332An Empirical Study on the Names of Points of Interest and Their Changes with Geographic Distance
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9333
While Points Of Interest (POIs), such as restaurants, hotels, and barber shops, are part of urban areas irrespective of their specific locations, the names of these POIs often reveal valuable information related to local culture, landmarks, influential families, figures, events, and so on. Place names have long been studied by geographers, e.g., to understand their origins and relations to family names. However, there is a lack of large-scale empirical studies that examine the localness of place names and their changes with geographic distance. In addition to enhancing our understanding of the coherence of geographic regions, such empirical studies are also significant for geographic information retrieval where they can inform computational models and improve the accuracy of place name disambiguation. In this work, we conduct an empirical study based on 112,071 POIs in seven US metropolitan areas extracted from an open Yelp dataset. We propose to adopt term frequency and inverse document frequency in geographic contexts to identify local terms used in POI names and to analyze their usages across different POI types. Our results show an uneven usage of local terms across POI types, which is highly consistent among different geographic regions. We also examine the decaying effect of POI name similarity with the increase of distance among POIs. While our analysis focuses on urban POI names, the presented methods can be generalized to other place types as well, such as mountain peaks and streets.Mon, 30 Jul 2018 20:51:57 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9333Outlier Detection and Comparison of Origin-Destination Flows Using Data Depth
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9334
Advances in location-aware technology have resulted in massive trajectory data. Origin-destination (OD) trajectories provide rich information on urban flow and transport demand. This study describes a new method for detecting OD flows outliers and conducting hypothesis testing between two OD flow datasets in terms of the variations of spatial extent, that is, spread. The proposed method is based on data depth, which measures the centrality and outlyingness of a point with respect to a given dataset in R^d. Based on the center-outward ordering property, the proposed method analyzes the underlying characteristics of OD flows, such as location, outlyingness, and spread. The ability of the method to detect OD anomalies is compared with that of the Mahalanobis distance approach, and an F-test is used to verify the difference in scale. Empirical evaluation has demonstrated that our method effectively identifies OD flows outliers in an interactive way. Furthermore, the method can provide new perspectives such as spatial extent by considering the overall structure of data when comparing two different OD flows in terms of scale.Mon, 30 Jul 2018 20:51:57 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9334Is Salience Robust? A Heterogeneity Analysis of Survey Ratings
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9335
Differing weights for salience subdimensions (e.g. visual or structural salience) have been suggested since the early days of salience models in GIScience. Up until now, however, it remains unclear whether weights found in studies are robust across environments, objects and observers. In this study we examine the robustness of a survey-based salience model. Based on ratings of N_{o}=720 objects by N_{p}=250 different participants collected in-situ in two different European cities (Regensburg and Augsburg) we conduct a heterogeneity analysis taking into account environment and sense of direction stratified by gender. We find, first, empirical evidence that our model is invariant across environments, i.e. the strength of the relationships between the subdimensions of salience does not differ significantly. The structural model coefficients found can, hence, be used to calculate values for overall salience across different environments. Second, we provide empirical evidence that invariance of our measurement model is partly not given with respect to both, gender and sense of direction. These compositional invariance problems are a strong indicator for personal aspects playing an important role.Mon, 30 Jul 2018 20:51:57 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9335Labeling Points of Interest in Dynamic Maps using Disk Labels
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9336
Dynamic maps which support panning, rotating and zooming are available on every smartphone today. To label geographic features on these maps such that the user is presented with a consistent map view even on map interaction is a challenge. We are presenting a map labeling scheme, which allows to label maps at an interactive speed. For any possible map rotation the computed labeling remains free of intersections between labels. It is not required to remove labels from the map view to ensure this. The labeling scheme supports map panning and continuous zooming. During zooming a label appears and disappears only once. When zooming out of the map a label disappears only if it may overlap an equally or more important label in an arbitrary map rotation. This guarantees that more important labels are preferred to less important labels on small scale maps. We are presenting some extensions to the labeling that could be used for more sophisticated labeling features such as area labels turning into point labels at smaller map scales.The proposed labeling scheme relies on a preprocessing phase. In this phase for each label the map scale where it is removed from the map view is computed. During the phase of map presentation the precomputed label set must only be filtered, what can be done very fast. We are presenting some hints that allow to efficiently compute the labeling in the preprocessing phase. Using these a labeling of about 11 million labels can be computed in less than 20 minutes. We are also presenting a datastructure to efficiently filter the precomputed label set in the interaction phase.Mon, 30 Jul 2018 20:51:57 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9336Improving Discovery of Open Civic Data
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9337
We describe a method and system design for improved data discovery in an integrated network of open geospatial data that supports collaborative policy development between governments and local constituents. Metadata about civic data (such as thematic categories, user-generated tags, geo-references, or attribute schemata) primarily rely on technical vocabularies that reflect scientific or organizational hierarchies. By contrast, public consumers of data often search for information using colloquial terminology that does not align with official metadata vocabularies. For example, citizens searching for data about bicycle collisions in an area are unlikely to use the search terms with which organizations like Departments of Transportation describe relevant data. Users may also search with broad terms, such as "traffic safety", and will then not discover data tagged with narrower official terms, such as "vehicular crash". This mismatch raises the question of how to bridge the users' ways of talking and searching with the language of technical metadata. In similar situations, it has been beneficial to augment official metadata with semantic annotations that expand the discoverability and relevance recommendations of data, supporting more inclusive access. Adopting this strategy, we develop a method for automated semantic annotation, which aggregates similar thematic and geographic information. A novelty of our approach is the development and application of a crosscutting base vocabulary that supports the description of geospatial themes. The resulting annotation method is integrated into a novel open access collaboration platform (Esri's ArcGIS Hub) that supports public dissemination of civic data and is in use by thousands of government agencies. Our semantic annotation method improves data discovery for users across organizational repositories and has the potential to facilitate the coordination of community and organizational work, improving the transparency and efficacy of government policies.Mon, 30 Jul 2018 20:51:57 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9337Local Co-location Pattern Detection: A Summary of Results
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9338
Given a set of spatial objects of different features (e.g., mall, hospital) and a spatial relation (e.g., geographic proximity), the problem of local co-location pattern detection (LCPD) pairs co-location patterns and localities such that the co-location patterns tend to exist inside the paired localities. A co-location pattern is a set of spatial features, the objects of which are often related to each other. Local co-location patterns are common in many fields, such as public security, and public health. For example, assault crimes and drunk driving events co-locate near bars. The problem is computationally challenging because of the exponential number of potential co-location patterns and candidate localities. The related work applies data-unaware or clustering heuristics to partition the study area, which results in incomplete enumeration of possible localities. In this study, we formally defined the LCPD problem where the candidate locality was defined using minimum orthogonal bounding rectangles (MOBRs). Then, we proposed a Quadruplet & Grid Filter-Refine (QGFR) algorithm that leveraged an MOBR enumeration lemma, and a novel upper bound on the participation index to efficiently prune the search space. The experimental evaluation showed that the QGFR algorithm reduced the computation cost substantially. One case study using the North American Atlas-Hydrography and U.S. Major City Datasets was conducted to discover local co-location patterns which would be missed if the entire dataset was analyzed or methods proposed by the related work were applied.Mon, 30 Jul 2018 20:51:57 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9338Detection and Localization of Traffic Signals with GPS Floating Car Data and Random Forest
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9339
As Floating Car Data are becoming increasingly available, in recent years many research works focused on leveraging them to infer road map geometry, topology and attributes. In this paper, we present an algorithm, relying on supervised learning to detect and localize traffic signals based on the spatial distribution of vehicle stop points. Our main contribution is to provide a single framework to address both problems. The proposed method has been experimented with a one-month dataset of real-world GPS traces, collected on the road network of Mitaka (Japan). The results show that this method provides accurate results in terms of localization and performs advantageously compared to the OpenStreetMap database in exhaustivity. Among many potential applications, the output predictions may be used as a prior map and/or combined with other sources of data to guide autonomous vehicles.Mon, 30 Jul 2018 20:51:57 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9339Heterogeneous Skeleton for Summarizing Continuously Distributed Demand in a Region
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9340
There has long been interest in the skeleton of a spatial object in GIScience. The reasons for this are many, as it has proven to be an extremely useful summary and explanatory representation of complex objects. While much research has focused on issues of computational complexity and efficiency in extracting the skeletal and medial axis representations as well as interpreting the final product, little attention has been paid to fundamental assumptions about the underlying object. This paper discusses the implied assumption of homogeneity associated with methods for deriving a skeleton. Further, it is demonstrated that addressing heterogeneity complicates both the interpretation and identification of a meaningful skeleton. The heterogeneous skeleton is introduced and formalized, along with a method for its identification. Application results are presented to illustrate the heterogeneous skeleton and provides comparative contrast to homogeneity assumptions.Mon, 30 Jul 2018 20:51:57 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9340A Network Flow Model for the Analysis of Green Spaces in Urban Areas
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9341
Green spaces in urban areas offer great possibilities of recreation, provided that they are easily accessible. Therefore, an ideal city should offer large green spaces close to where its residents live. Although there are several measures for the assessment of urban green spaces, the existing measures usually focus either on the total size of green spaces or on their accessibility. Hence, in this paper, we present a new methodology for assessing green-space provision and accessibility in an integrated way. The core of our methodology is an algorithm based on linear programming that computes an optimal assignment between residential areas and green spaces. In a basic setting, it assigns a green space of a prescribed size exclusively to each resident such that the average distance between residents and assigned green spaces is minimized. We contribute a detailed presentation on how to engineer an assignment-based method such that it yields reasonable results (e.g., by considering distances in the road network) and becomes efficient enough for the analysis of large metropolitan areas (e.g., we were able to process an instance of Berlin with about 130000 polygons representing green spaces, 18000 polygons representing residential areas, and 6 million road segments). Furthermore, we show that the optimal assignments resulting from our method enable a subsequent analysis that reveals both interesting global properties of a city as well as spatial patterns. For example, our method allows us to identify neighborhoods with a shortage of green spaces, which will help spatial planners in their decision making.Mon, 30 Jul 2018 20:51:57 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9341Continuous Obstructed Detour Queries
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9342
In this paper, we introduce Continuous Obstructed Detour (COD) Queries, a novel query type in spatial databases. COD queries continuously return the nearest point of interests (POIs) such as a restaurant, an ATM machine and a pharmacy with respect to the current location and the fixed destination of a moving pedestrian in presence of obstacles like a fence, a lake or a private building. The path towards a destination is typically not predetermined and the nearest POIs can change over time with the change of a pedestrian's current location towards a fixed destination. The distance to a POI is measured as the summation of the obstructed distance from the pedestrian's current location to the POI and the obstructed distance from the POI to the pedestrian's destination. Evaluating the query for every change of a pedestrian's location would incur extremely high processing overhead. We develop an efficient solution for COD queries and verify the effectiveness and efficiency of our solution in experiments.Mon, 30 Jul 2018 20:51:57 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9342Enhanced Multi Criteria Decision Analysis for Planning Power Transmission Lines
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9343
The energy transition towards alternative energy sources requires new power transmission lines to connect these additional energy production plants with electricity distribution centers. For this reason, Multi Criteria Decision Analysis (MCDA) offers a useful approach to determine the optimal path of future transmission lines with minimum impact on the environment, on the landscape, and on affected citizens. As objections could deteriorate such a project and in turn increase costs, transparent communication regarding the planning procedure is required that fosters citizens' acceptance. In this context, GIS-based information on the criteria taken into account and for modeling possible power transmission lines is essential. However, planners often forget that the underlying multi criteria decision model and the used data might lead to biased results. Therefore, this study empirically investigates the effect of various MCDA parameters by applying a sensitivity analysis on a multi criteria decision model. The output of this analysis is evaluated combining a Cluster Analysis, a Principal Component Analysis, and a Multivariate Analysis of Variance. Our results indicate that the variability of different corridor alternatives can be increased by using different MCDA parameter combinations. In particular, we found that applying continuous boundary models on areas leads to more distinct corridor alternatives than using a sharp-edged model, and better reflects actual planning practice for protecting areas against transmission lines. Comparing the results of two study areas, we conclude that our decision model behaved similarly across both sites and, hence, that the proposed procedure for enhancing the decision model is applicable to other study areas with comparable topographies. These results can help decision-makers and transmission line planners in simplifying and improving their decision models in order to increase credibility, legitimacy, and thus practical applicability.Mon, 30 Jul 2018 20:51:57 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9343FUTURES-AMR: Towards an Adaptive Mesh Refinement Framework for Geosimulations
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9344
Adaptive Mesh Refinement (AMR) is a computational technique used to reduce the amount of computation and memory required in scientific simulations. Geosimulations are scientific simulations using geographic data, routinely used to predict outcomes of urbanization in urban studies. However, the lack of support for AMR techniques with geosimulations limits exploring prediction outcomes at multiple resolutions. In this paper, we propose an adaptive mesh refinement framework FUTURES-AMR, based on static user-defined policies to enable multi-resolution geosimulations. We develop a prototype for the cellular automaton based urban growth simulation FUTURES by exploiting static and dynamic mesh refinement techniques in conjunction with the Patch Growing Algorithm (PGA). While, the static refinement technique supports a statically defined fixed resolution mesh simulation at a location, the dynamic refinement technique supports dynamically refining the resolution based on simulation outcomes at runtime. Further, we develop two approaches - asynchronous AMR and synchronous AMR, suitable for parallel execution in a distributed computing environment with varying support for solution integration of the multi-resolution results. Finally, using the FUTURES-AMR framework with different policies in an urban study, we demonstrate reduced execution time, and low memory overhead for a multi-resolution simulation.Mon, 30 Jul 2018 20:51:57 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9344xNet+SC: Classifying Places Based on Images by Incorporating Spatial Contexts
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9345
With recent advancements in deep convolutional neural networks, researchers in geographic information science gained access to powerful models to address challenging problems such as extracting objects from satellite imagery. However, as the underlying techniques are essentially borrowed from other research fields, e.g., computer vision or machine translation, they are often not spatially explicit. In this paper, we demonstrate how utilizing the rich information embedded in spatial contexts (SC) can substantially improve the classification of place types from images of their facades and interiors. By experimenting with different types of spatial contexts, namely spatial relatedness, spatial co-location, and spatial sequence pattern, we improve the accuracy of state-of-the-art models such as ResNet - which are known to outperform humans on the ImageNet dataset - by over 40%. Our study raises awareness for leveraging spatial contexts and domain knowledge in general in advancing deep learning models, thereby also demonstrating that theory-driven and data-driven approaches are mutually beneficial.Mon, 30 Jul 2018 20:51:57 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9345A Critical Look at Cryptogovernance of the Real World: Challenges for Spatial Representation and Uncertainty on the Blockchain (Short Paper)
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9346
Innovation in distributed ledger technologies-blockchains and smart contracts-has been lauded as a game-changer for environmental governance and transparency. Here we critically consider how problems related to spatial representation and uncertainty complicate the picture, focusing on two cases. The first regards the impact of uncertainty on the transfer of spatial assets, and the second regards its impact on smart contract code that relies on software oracles that report sensor measurements of the physical world. Cryptogovernance of the environment will require substantial research on both these fronts if it is to become a reality.Mon, 30 Jul 2018 20:51:57 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9346Towards Optimal Deployment of a Sensor Network in a 3D Indoor Environment for the Mobility of People with Disabilities (Short Paper)
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9347
Mobility of people with disabilities is one of the most important challenges for their social integration. There have been significant effort to develop assistive technologies to guide the PWD during their mobility in recent years. However, these technologies have limitations when it comes to the navigation and guidance of these people through accessible routes. This is specifically problematic in indoor environments where detection, location and tracking of people, and other dynamic objects that may limit the mobility of these people, are very challenging. Thus, many researches have leveraged the use of sensors to track users and dynamic objects in indoor environments. However, in most of the described methods, the sensors are manually deployed. Due to the complexity of indoor environments, the diversity of sensors and their sensing models, as well as the diversity of the profiles of people with disabilities and their needs during their mobility, the optimal deployment of a sensor network is a challenging task. There exist several optimization methods to maximize coverage and minimize the number of sensors while maintaining the minimum connectivity between the sensor nodes in a network. Most of the current sensor network optimization methods oversimplify the environment and do not consider the complexity of 3D indoor environments. In this paper, we propose a novel 3D local optimization algorithm based on a geometric spatial data structure that takes into account some of these complexities for the purpose of helping PWD in their mobility in 3D indoor environments such as shopping centers, museums and other public buildings.Mon, 30 Jul 2018 20:51:57 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9347Challenges in Creating an Annotated Set of Geospatial Natural Language Descriptions (Short Paper)
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9348
In order to extract and map location information from natural language descriptions, a first step is to identify different language elements within the descriptions. In this paper, we describe a method and discuss the challenges faced in creating an annotated set of geospatial natural language descriptions using manual tagging, with the purpose of supporting validation and machine learning approaches to annotation and text interpretation.Mon, 30 Jul 2018 20:51:57 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9348Improved and More Complete Conceptual Model for the Revision of IndoorGML (Short Paper)
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9349
With the increasing number of indoor navigation applications, it is essential to have clear and complete conceptual model (in the form of UML class diagram) for IndoorGML. The current version of IndoorGML standard has an incomplete class diagram (incomplete w.r.t. attributes, of which some are appearing in the XML/GML schema), and that provides confusion for the users of the standard. Furthermore, there are some issues related to unclear association names, unclear class names, classes that related to the Primal space and the Dual space, code lists not specific per type (which should have their own code list values), untyped relationships to external object classes, and semantically overlapping classes. In this paper, we propose an enhancement for IndoorGML conceptual model (UML class diagram) to avoid the misunderstanding. We propose a conceptual model that maps the classes of the standard in a better way. This conceptual model is the basis for 1) a database schema when storing IndoorGML data, 2) the XML schema when exchanging IndoorGML data, and 3) when developing IndoorGML applications with an intuitive and clear GUI. Furthermore, the proposed conceptual model provides constraints for more meaningful model and to define more sharply what is considered valid data. This paper briefly reports these preliminary results on the UML conceptual model.Mon, 30 Jul 2018 20:51:57 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9349Design for Geospatially Enabled Climate Modeling and Alert System (CLIMSYS): A Position Paper (Short Paper)
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9350
The paper brings the focus on to multi-disciplinary approach of presenting climate analysis studies, taking help of interdisciplinary fields to structure the information. The system CLIMSYS provides the crucial element of spatially enabling climate data processing. Even though climate change is a matter of great scientific relevance and of broad general interest, there are some problems related to its communication. Its a fact that finding practical, workable and cost-efficient solutions to the problems posed by climate change is now a world priority and one which links government and non-government organizations in a way not seen before. An approach that should suffice is to create an accessible intelligent system that houses prior knowledge and curates the incoming data to deliver meaningful results. The objective of the proposed research is to develop a generalized system for climate data analysis that facilitates open sharing, central implementation, integrated components, knowledge creation, data format understanding, inferencing and ultimately optimal solution delivery, by the way of geospatial enablement.Mon, 30 Jul 2018 20:51:57 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9350Geographical Exploration and Analysis Extended to Textual Content (Short Paper)
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9351
Textual and socio-economical regional features can be integrated and merged by linearly combining the between-regions corresponding dissimilarities. The scheme accommodates for various squared Euclidean socio-economical and textual dissimilarities (such as chi2 or cosine dissimilarities derived from document-term matrix or topic modelling). Also, spatial configuration of the regions can be represented by a weighted unoriented network whose vertex weights match the relative importance of regions. Association between the network and the dissimilarities expresses in the multivariate spatial autocorrelation index delta, generalizing Moran's I, whose local version can be cartographied. Our case study bears on the Wikipedia notices and socio-economic profiles for the 2251 Swiss municipalities, whose weights (socio-economical or textual) can be freely chosen.Mon, 30 Jul 2018 20:51:57 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9351Evaluating Efficiency of Spatial Analysis in Cloud Computing Platforms (Short Paper)
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9352
The increase of high-resolution spatial data and methodological developments in recent years has enabled a detailed analysis of individuals' experience in space and over time. However, despite the increasing availability of data and technological advances, such individual-level analysis is not always possible in practice because of its computing requirements. To overcome this limitation, there has been a considerable amount of research on the use of high-performance, public cloud computing platforms for spatial analysis and simulation. In this paper, we aim to evaluate the efficiency of spatial analysis in cloud computing platforms. We compared the computing speed for calculating the Moran's I index between a local machine and spot instances on clouds, and our results demonstrated that there could be significant improvements in terms of computing time when the analysis was performed parallel on clouds.Mon, 30 Jul 2018 20:51:57 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9352Towards the Usefulness of User-Generated Content to Understand Traffic Events (Short Paper)
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9353
This paper explores the usefulness of Twitter data to detect traffic events and their geographical locations in India through machine learning and NLP. We develop a classification module that can identify tweets relevant for traffic authorities with 0.80 recall accuracy using a Naive Bayes classifier. The proposed model also handles vernacular geographical aspects while retrieving place information from unstructured texts using a multi-layered georeferencing module. This work shows Mumbai has a wide spread use of Twitter for traffic information dissemination with substantial geographical information contributed by the users.Mon, 30 Jul 2018 20:51:57 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9353Unfolding Urban Structures: Towards Route Prediction and Automated City Modeling (Short Paper)
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9354
This paper extends previous work concerning intersection classification by including a new set of statistics that enable to describe the structure of a city at a higher level of detail. Namely, we suggest to analyze sequences of intersections of different types. We start with sequences of length two and present a probabilistic model to derive statistics for longer sequences. We validate the results by comparing them with real frequencies. Finally, we discuss how this work can contribute to the generation of virtual cities as well as to spatial configuration search.Mon, 30 Jul 2018 20:51:57 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9354Deconstructed and Inverted Multi-Criteria Evaluation for On-The-Fly Scenario Development and Decision-Making (Short Paper)
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9355
We propose a variation of the conventional spatial multi-criteria evaluation workflow for suitability analysis that allows efficient on-the fly scenario development for decision-making. Our approach proposes to reconstruct the conventional MCE workflow in order to exclude computationally expensive geoprocessing from the iterative scenario development. We then introduce a procedure that replaces costly iterations of spatial operations with one off-line preprocessing step followed by iterations of much less computationally expensive database queries. We illustrate our approach for deconstructed and inverted multi-criteria analysis with a case study aiming at selecting suitable sites for wind turbines in the Swiss alps.Mon, 30 Jul 2018 20:51:57 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9355Space-Time Representation of Accessible Areas for Wheelchair Users in Urban Areas (Short Paper)
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9356
Providing personalized information on the accessibility of urban places for people with disabilities can significantly increase their social participation. This information should be adapted with respect to their needs at the specific time and space. Location-based technologies are considered as proper services to provide such information and encourage mobility of these people in urban areas. However, generally these services focus on the spatial conditions of the accessibility and ignore users' capabilities and time dependent constraints. This is much more challenging for people with disabilities given the diversity of their physical capabilities and preferences. To address this issue, we propose an approach to measure the space-time accessibility of urban areas considering environmental characteristics, users' capabilities, and time constraints. The proposed approach is unique and it highlights time constraint that is rooted in time geography theory. Unlike the classical time geography, which suggests a uniform travel velocity, we consider a variable travel velocity in the proposed approach, which is more relevant to the mobility of people with disabilities. To implement the proposed method, a Fuzzy approach is applied to evaluate the wheelchair speeds for the segments of a pedestrian network. The proposed approach is implemented in Saint-Roch, Quebec City for a case study and the results are presented and discussed.Mon, 30 Jul 2018 20:51:57 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9356Spatial Periodicity Analysis of Urban Elements Application to the Ancient City of Amida (Short Paper)
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9357
The characterization of urban structures using morphological indicators is the subject of many applications in the domains of urban planning and transport, but also in less traditional disciplines, such as urban archeology. When reading actual urban plans, it may be possible to identify relics of ancient cities, and to characterize them with the help of appropriate indicators. In this context, we propose a method for the characterization of the spacing between urban elements based on the analysis of their spatial periodicity. The purpose of this method is to detect specific distances in the actual urban structure, potentially characteristic of ancient measurement units. This method is implemented in a GIS software, to facilitate its use by historians and archeologists, and is illustrated by an application on the ancient roman city of Amida (Diyarbakir, Turkey).Mon, 30 Jul 2018 20:51:57 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9357Gaze Sequences and Map Task Complexity (Short Paper)
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9358
As maps are visual representations of spatial context to communicate geographic information, analysis of gaze behavior is promising to improve map design. In this research we investigate the impact of map task complexity and different legend types on the visual attention of a user. With an eye tracking experiment we could show that the complexity of two map tasks can be measured and compared based on AOI sequences analysis. This knowledge can help to improve map design for static maps or in the context of interactive systems, create better map interfaces, that adapt to the user's current task.Mon, 30 Jul 2018 20:51:57 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9358Facilitating the Interoperable Use of Cross-Domain Statistical Data Based on Standardized Identifiers (Short Paper)
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9359
In the big data era, the successful sharing and integration of data from various resources becomes an essential requirement. As statistical data serves as the foundation for professional domains to report the phenomena in the reality according to the selected administration units, its importance has been well recognized. However, statistical data is typically collected and published by different responsible agencies, hence the heterogeneity of how the data is designed, prepared and disseminated becomes an obstacle impeding the automatic and interoperable use in multidisciplinary applications. From a standardization perspective, this research proposes an identifier-based framework for modeling the spatial, temporal and thematic aspects of cross-domain statistical data, such that any piece of distributed statistical information can be correctly and automatically interpreted without any ambiguity for further analysis and exploration. The results indicate the proposed mechanism successfully enables a comprehensive management of indicators from different resources and enhances the easier data retrieval and correct use across different domains. Meanwhile, the interface design exemplifies an innovated improvement on the presentation and interpretation of statistical information. The proposed solution can be readily implemented for building a transparent sharing environment for the National Spatial Data Infrastructure (NSDI).Mon, 30 Jul 2018 20:51:57 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9359Identification of Geographical Segmentation of the Rental Apartment Market in the Tokyo Metropolitan Area (Short Paper)
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9360
It is often said that the real estate market is divided geographically in such a manner that the value of attributes of real estate properties is different for each area. This study proposes a new approach to the investigation of the geographical segmentation of the real estate market. We develop a price model with many regional explanatory variables, and implement the generalized fused lasso - a regression method for promoting sparsity - to extract the areas where the valuation standard is the same. The proposed method is applied to rental data of apartments in the Tokyo metropolitan area, and we find that the geographical segmentation displays hierarchal patterns. Specifically, we observe that the market is divided by wards, railway lines and stations, and neighbourhoods.Mon, 30 Jul 2018 20:51:57 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9360Automatic Wall Detection and Building Topology and Property of 2D Floor Plan (Short Paper)
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9361
Recently, indoor space construction information has been actively carried out primarily in large buildings and in underground facilities. However, the building of this data was done by only a handful of people, and it was a time- and money-intensive task. Therefore, the technology of automatically extracting a wall and constructing a 3D model from architectural floor plans was developed. Complete automation is still limited by accuracy issues, and only a few sets of floor plan data to which the technology can be applied exist. In addition, it is difficult to extract complicated walls and their thickness to build the wall-junction structure of indoor spatial information, which requires significant topological information in the automation process. In this paper, we propose an automatic method of extracting the wall from an architectural floor plan suitable for the restoration of the indoor spatial information according to the indoor spatial information standard.Mon, 30 Jul 2018 20:51:57 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9361Mapping Wildlife Species Distribution With Social Media: Augmenting Text Classification With Species Names (Short Paper)
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9362
Social media has considerable potential as a source of passive citizen science observations of the natural environment, including wildlife monitoring. Here we compare and combine two main strategies for using social media postings to predict species distributions: (i) identifying postings that explicitly mention the target species name and (ii) using a text classifier that exploits all tags to construct a model of the locations where the species occurs. We find that the first strategy has high precision but suffers from low recall, with the second strategy achieving a better overall performance. We furthermore show that even better performance is achieved with a meta classifier that combines data on the presence or absence of species name tags with the predictions from the text classifier.Mon, 30 Jul 2018 20:51:57 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9362Multimodal-Transport Collaborative Evacuation Strategies for Urban Serious Emergency Incidents Based on Multi-Sources Spatiotemporal Data (Short Paper)
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9363
When serious emergency events happen in metropolitan cities where pedestrians and vehicles are in high-density, single modal-transport cannot meet the requirements of quick evacuations. Existing mixed modes of transportation lacks spatiotemporal collaborative ability, which cannot work together to accomplish evacuation tasks in a safe and efficient way. It is of great scientific significance and application value for emergency response to adopt multimodal-transport evacuations and improve their spatial-temporal collaboration ability. However, multimodal-transport evacuation strategies for urban serious emergency event are great challenge to be solved. The reasons lie in that: (1) large-scale urban emergency environment are extremely complicated involving many geographical elements (e.g., road, buildings, over-pass, square, hydrographic net, etc.); (2) Evacuated objects are dynamic and hard to be predicted. (3) the distributions of pedestrians and vehicles are unknown. To such issues, this paper reveals both collaborative and competitive mechanisms of multimodal-transport, and further makes global optimal evacuation strategies from the macro-optimization perspective. Considering detailed geographical environment, pedestrian, vehicle and urban rail transit, a multi-objective multi-dynamic-constraints optimization model for multimodal-transport collaborative emergency evacuation is constructed. Take crowd incidents in Shenzhen as example, empirical experiments with real-world data are conducted to evaluate the evacuation strategies and path planning. It is expected to obtain innovative research achievements on theory and method of urban emergency evacuation in serious emergency events. Moreover, this research results provide spatial-temporal decision support for urban emergency response, which is benefit to constructing smart and safe cities.Mon, 30 Jul 2018 20:51:57 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9363A New Map Symbol Design Method for Real-Time Visualization of Geo-Sensor Data (Short Paper)
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9364
Maps are an excellent way to present data with spatial components. For the large-scale geo-sensors being utilized in recent years, the map-based management and visualization of geo-senor data have become ubiquitous. Without a doubt, managing and visualizing geo-sensor data on maps will have vastly more future applications. However, current maps typically do not support real-time communication in the Internet of Things (IoT), and it is difficult to implement real-time visualization of sensor data on a map. Map symbols are the language of maps. In this paper, we describe a new map symbol design method for geo-sensor data acquisition and visualization on maps. We refer to the sensor data visual method in supervisory control and data acquisition system (SCADA) and apply it to the design process of map symbols. Based on the traditional vector map symbol, the mapping relationship between the sensor data and the graphic element is defined in the map symbol design process. When the map symbol is rendered in the map, the map symbol is integrated into the map layer. The communication module in the map that communicates with the sensor device receives real-time sensor data and triggers a refresh of the map layer according to the mapping profile. All the methods and processes shown herein have been verified in GeoTools.Mon, 30 Jul 2018 20:51:57 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9364How Do Texture and Color Communicate Uncertainty in Climate Change Map Displays? (Short Paper)
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9365
We report on an empirical study with over hundred online participants where we investigated how texture and color value, two popular visual variables used to convey uncertainty in maps, are understood by non-domain-experts. Participants intuit denser dot textures to mean greater attribute certainty; irrespective of whether the dot pattern is labeled certain or uncertain. With this additional empirical evidence, we hope to further improve our understanding of how non-domain experts interpret uncertainty information depicted in map displays. This in turn will allow us to more clearly and legibly communicate uncertainty information in climate change maps, so that these displays can be unmistakably understood by decision-makers and the general public.Mon, 30 Jul 2018 20:51:57 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9365An Analytical Framework for Understanding Urban Functionality from Human Activities (Short Paper)
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9366
The intertwined relationship between urban functionality and human activity has been widely recognized and quantified with the assistance of big geospatial data. In specific, urban land uses as an important facet of urban structure can be identified from spatiotemporal patterns of aggregate human activities. In this article, we propose a space, time and activity cuboid based analytical framework for clustering urban spaces into different categories of urban functionality based on the variation of activity intensity (T-fiber), mixture (A-fiber) and interaction (I- and O-fiber). The ability of the proposed framework is empirically evaluated by three case studies.Mon, 30 Jul 2018 20:51:57 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9366Application of Style Transfer in the Vectorization Process of Floorplans (Short Paper)
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9367
As the market for indoor spatial information burgeons, the construction of indoor spatial databases consequently gain attention. Since floorplans are portable records of buildings, they are an indispensable source for the efficient construction of indoor environments. However, as previous research on floorplan information retrieval usually targeted specific formats, a system for constructing spatial information must include heuristic refinement steps. This study aims to convert diverse floorplans into an integrated format using the style transfer by deep networks. Our deep networks mimic a robust perception of human that recognize the cell structure of floorplans under various formats. The integrated format ensures that unified post-processing steps are required to the vectorization of floorplans. Through this process, indoor spatial information is constructed in a pragmatic way, using a plethora of architectural floorplans.Mon, 30 Jul 2018 20:51:57 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9367Estimating Building Age from Google Street View Images Using Deep Learning (Short Paper)
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9368
Building databases are a fundamental component of urban analysis. However such databases usually lack detailed attributes such as building age. With a large volume of building images being accessible online via API (such as Google Street View), as well as the fast development of image processing techniques such as deep learning, it becomes feasible to extract information from images to enrich building databases. This paper proposes a novel method to estimate building age based on the convolutional neural network for image features extraction and support vector machine for construction year regression. The contributions of this paper are two-fold: First, to our knowledge, this is the first attempt for estimating building age from images by using deep learning techniques. It provides new insight for planners to apply image processing and deep learning techniques for building database enrichment. Second, an image-base building age estimation framework is proposed which doesn't require information on building height, floor area, construction materials and therefore makes the analysis process simpler and more efficient.Mon, 30 Jul 2018 20:51:57 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9368Center Point of Simple Area Feature Based on Triangulation Skeleton Graph (Short Paper)
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9369
In the area of cartography and geographic information science, the center points of area features are related to many fields. The centroid is a conventional choice of center point of area feature. However, it is not suitable for features with a complex shape for the center point may be outside the area or not fit the visual center so well. This paper proposes a novel method to calculate the center point of area feature based on triangulation skeleton graph. This paper defines two kinds of centrality of vertices in skeleton graph according to the centrality theory in graph and network analysis. Through the measurement of vertices centrality, the center points of polygon area features are defined as the vertices with maximum centrality.Mon, 30 Jul 2018 20:51:57 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9369The Use of Particle Swarm Optimization for a Vector Cellular Automata Model of Land Use Change (Short Paper)
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9370
Cellular automata (CA) is an important area of research in GIScience, with recent research developing vector-based models in addition to the traditional raster data formats. One active area of research is the calibration of transition rules, particularly when applied to vector CA. Here we evaluate a particle swarm optimization (PSO) process to calibrate a vector CA model of land use change for a sub-region of Ipswich in Queensland, Australia, for the period 1999-2016. We compare the results with those for a raster CA of the same dataset. The spatial indices of the vector PSO-CA model exceed that of the raster model, with spatial accuracies being 82.45% and 76.47%, respectively. In addition, the vector PSO-CA model achieved a higher kappa coefficient. Vector-based PSO-CA model can be used for the exploration of urbanization process and provide a better understanding of land use change.Mon, 30 Jul 2018 20:51:57 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9370Towards a Comprehensive Temporal Classification of Footfall Patterns in the Cities of Great Britain (Short Paper)
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9371
The temporal fluctuations of footfall in the urban areas have long been a neglected research problem, and this mainly has to do with the past technological limitations and inability to consistently collect large volumes of data at fine intra-day temporal resolutions. This paper makes use of the extensive set of footfall measurements acquired by the Wi-Fi sensors installed in the retail units across the British town centres, shopping centres and retail parks. We present the methodology for classifying the diurnal temporal signatures of human activity at the urban microsite locations and identify characteristic profiles which make them distinctive regarding when people visit them. We conclude that there exist significant differences regarding the time when different locations are the busiest during the day, and this undoubtedly has a substantial impact on how retailers should plan where and how their businesses operate.Mon, 30 Jul 2018 20:51:57 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9371Is This Statement About A Place? Comparing two perspectives (Short Paper)
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9372
Text often includes references to places by name; in prior work, more than 20% of a sample of event-related tweets were found to include place names. Research has addressed the challenge of leveraging the geographic data reflected in text statements, with well-developed methods to recognize location mentions in text and related work on automated toponym resolution (deciding which place in the world is meant by a place name). A core issue that remains is to distinguish between text that mentions a place or places and text that is about a place or places. This paper presents the first step in research to address this challenge. The research reported here sets the conceptual and practical groundwork for subsequent supervised machine learning research; that research will leverage human-produced training data, for which a judgment is made about whether a statement is or is not about a place (or places), to train computational methods to do this classification for large volumes of text. The research step presented here focuses on three questions: (1) what kinds of entities are typically conceptualized as places, (2) what features of a statement prompt the reader to judge a statement to be about a place (or not about a place) and (3) how do judgments of whether or not a statement is about a place compare between a group of experts who have studied the concept of "place" from a geographic perspective and a cross-section of individuals recruited through a crowdsourcing platform to make these judgments.Mon, 30 Jul 2018 20:51:57 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9372Geospatial Semantics for Spatial Prediction (Short Paper)
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9373
In this paper the potential of geospatial semantics for spatial predictions is explored. Therefore data from the LinkedGeoData platform is used to predict landcover classes described by the CORINE dataset. Geo-objects obtained from LinkedGeoData are described by an OWL ontology, which is utilized for the purpose of spatial prediction within this paper. This prediction is based on an association analysis which computes the collocations between the landcover classes and the semantically described geo-objects. The paper provides an analysis of the learned association rules and finally concludes with a discussion on the promising potential of geospatial semantics for spatial predictions, as well as potentially fruitful future research within this domain.Mon, 30 Jul 2018 20:51:57 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9373Docked vs. Dockless Bike-sharing: Contrasting Spatiotemporal Patterns (Short Paper)
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9374
U.S. urban centers are currently experiencing explosive growth in commercial dockless bike-sharing services. Tens of thousands of bikes have shown up across the country in recent months providing limited time for municipal governments to set regulations or assess their impact on government-funded dock-based bike-sharing programs. Washington, D.C. offers an unprecedented opportunity to examine the activity patterns of both docked and dockless bike-sharing services given the history of bike-sharing in the city and the recent availability of dockless bike data. This work presents an exploratory step in understanding how dockless bike-sharing services are being used within a city and the ways in which the activity patterns differ from traditional dock station-based programs.Mon, 30 Jul 2018 20:51:57 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9374OpenPOI: An Open Place of Interest Platform (Short Paper)
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9375
Places of Interest (POI) are a principal component of how human behavior is captured in today's geographic information. Increasingly, access to POI datasets are being restricted - even silo-ed - for commercial use, with vendors often impeding access to the very users that contribute the data. Open mapping platforms such as OpenStreetMap (OSM) offer access to a plethora of geospatial data though they can be limited in the attribute resolution or range of information associated with the data. Nuanced descriptive information associated with POI, e.g., ambience, are not captured by such platforms. Furthermore, interactions with a POI, such as checking in, or recommending a menu item, are inherently place-based concepts. Many of these interactions occur with high temporal volatility that involves frequent interaction with a platform, arguably inappropriate for the "changeset" model adopted by OSM and related datasets. In this short paper we propose OpenPOI, an open platform for storing, serving, and interacting with places of interests and the activities they afford.Mon, 30 Jul 2018 20:51:57 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9375Exploring Shifting Densities through a Movement-based Cartographic Interface (Short Paper)
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9376
Animated maps are widely used for representing shifting densities. Though there is evidence that animations can provide better memory recall than static charts, it could be a consequence of using better techniques for animation than for static representations. However, the lack of control makes them frustrating for users, while animated choropleth maps can cause change blindness. In this paper, we propose an interactive animation technique which employs the lenticular printing metaphor and benefits from the user's proprioceptive sense to explore density changes over time. We hypothesized that using a tangible interface based on the body movement would improve memory recall and, consequently, animated map reading.Mon, 30 Jul 2018 20:51:57 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9376Geotagging Location Information Extracted from Unstructured Data (Short Paper)
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9377
Location information is an essential element of location-based services and is used in various ways. Unstructured data contain different types of location information, but coordinate values are required to determine the exact location. In Twitter, a typical social network service (SNS) platform of unstructured data, the number of geotagged tweets is low. If we can estimate the location of text by geotagging a large number of unstructured data, we can estimate the location of the event in real-time. This study is a base study on extracting the location information by using the named entity recognizer provided by the Exobrain API and applying geotagging to unstructured data in Hangul (Korean). We used Chosun news articles, which are grammatically correct and well organized, instead of tweets to extract three location-related categories, namely "location," "organization," and "artifact". We used the named entity recognizer and geotagged each sentence in combination of the fields in each category. The results of the study showed that 61% of the 800 test sentences did not have the location-related information, thus hindering geotagging. In 11.75% of the test sentences, geotagging was possible with only the given location information extracted using the named entity recognizer. The remaining 27.25% of the sentences contained information on more than two locations from the same subcategories and hence required location estimation from candidate locations. In future research, we plan to apply the results of this study to develop location estimation algorithm that makes use of the extracted location-related entities from purely unstructured data such as that on SNSs.Mon, 30 Jul 2018 20:51:57 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9377Linked Open Data Vocabularies for Semantically Annotated Repositories of Data Quality Measures (Short Paper)
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9378
The fitness for purpose concerns many different aspects of data quality. These aspects are usually assessed independently by different data quality measures. However, for the assessment of the fitness for purpose, a holistic understanding of these aspects is needed. In this paper we discuss two Linked Open Data vocabularies for formally describing measures and their relations. These vocabularies can be used to semantically annotate repositories of data quality measures, which accordingly adhere to common standards even if being distributed on multiple servers. This allows for a better understanding of how data quality measures relate and mutually constrain. As a result, it becomes possible to improve intrinsic data quality measures by evaluating their effectivity and by combining them.Mon, 30 Jul 2018 20:51:57 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9378Need A Boost? A Comparison of Traditional Commuting Models with the XGBoost Model for Predicting Commuting Flows (Short Paper)
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9379
Commuting models estimate the number of commuting trips from home to work locations in a given area. Since their infancy, they have been increasingly used in a variety of fields to reduce traffic and pollution, drive infrastructure choices, and solve a variety of other problems. Traditional commuting models, such as gravity and radiation models, typically have a strict structural form and limited number of input variables, which may limit their ability to predict commuting flows as well as machine learning models that might better capture the complex dynamics of the commuting process. To determine whether machine learning models might add value to the field of commuter flow prediction, we compare and discuss the performance of two standard traditional models with the XGBoost machine learning algorithm for predicting home to work commuter flows from a well-known United States commuting dataset. We find that the XGBoost model outperforms the traditional models on three commonly used metrics, indicating that machine learning models may add value to the field of commuter flow prediction.Mon, 30 Jul 2018 20:51:57 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9379Modeling Road Traffic Takes Time (Short Paper)
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9380
To model dynamic road traffic environment, it is imperative to integrate spatial and temporal knowledge about its evolution into a single model. This paper introduces temporal dimension which provides a method to reason about time-varying spatial information in a spatio-temporal graph-based model. Two types of evolution, topological and attributed, of time-varying graph (TVG) are considered which require the time domain to be discrete and/or continuous, and the TVG proposed includes time-varying node/edge presence and labeling functions. Theoretical concepts presented in this paper will guide us through the process of application development in future.Mon, 30 Jul 2018 20:51:57 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9380Diversity in Spatial Language Within Communities: The Interplay of Culture, Language and Landscape in Representations of Space (Short Paper)
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9381
Significant diversity exists in the way languages structure spatial reference, and this has been shown to correlate with diversity in non-linguistic spatial behaviour. However, most research in spatial language has focused on diversity between languages: on which spatial referential strategies are represented in the grammar, and to a lesser extent which of these strategies are preferred overall in a given language. However, comparing languages as a whole and treating each language as a single data point provides a very partial picture of linguistic spatial behaviour, failing to recognise the very significant diversity that exists within languages, a largely under-investigated but now emerging field of research. This paper focuses on language-internal diversity, and on the central role of a range of sociocultural and demographic factors that intervene in the relationship between humans, languages, and the physical environments in which communities live.Mon, 30 Jul 2018 20:51:57 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9381Flexible Patterns of Place for Function-based Search of Space (Short Paper)
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9382
Place is a human interpretation of space; it augments the latter with information related to human activities, services, emotions and so forth. Searching for places rather than traditional space-based search represents significant challenges. The most prevalent method of addressing place-related queries is based on placenames but has limited potential due to the vagueness of natural language and its tendency to lead to ambiguous interpretations. In previous work we proposed a system-oriented formalization of place that goes beyond placenames by introducing composition patterns of place. In this study, we introduce flexibility into these patterns in terms of what is necessarily or possibly included when describing the spatial composition of a place and propose a novel automated process of extracting these patterns relying on both theoretical and empirical knowledge. The proposed methodology is exemplified through the use case of locating all the shopping areas within London, UK.Mon, 30 Jul 2018 20:51:57 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9382Novel Models for Multi-Scale Spatial and Temporal Analyses (Short Paper)
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9383
Multi-scale analysis for spatio-temporal data forms a fundamental challenge for many analytic systems. In geographic information systems, analysis and modeling at pre-defined spatial and temporal scales may miss critical relationships in other scales. Previous studies have investigated the uses of the triangle model as a multi-scale framework in analyzing temporal data. This article demonstrates the utilities of the triangle model and pyramid model for multi-scale spatial analysis through real-world analytical tasks and discusses the potential of developing a unified modeling framework that integrates the two models.Mon, 30 Jul 2018 20:51:57 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9383Geosocial Media Data as Predictors in a GWR Application to Forecast Crime Hotspots (Short Paper)
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9384
In this paper we forecast hotspots of street crime in Portland, Oregon. Our approach uses geosocial media posts, which define the predictors in geographically weighted regression (GWR) models. We use two predictors that are both derived from Twitter data. The first one is the population at risk of being victim of street crime. The second one is the crime related tweets. These two predictors were used in GWR to create models that depict future street crime hotspots. The predicted hotspots enclosed more than 23% of the future street crimes in 1% of the study area and also outperformed the prediction efficiency of a baseline approach. Future work will focus on optimizing the prediction parameters and testing the applicability of this approach to other mobile crime types.Mon, 30 Jul 2018 20:51:57 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9384Who Masks? Correlates of Individual Location-Masking Behavior in an Online Survey (Short Paper)
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9385
Geomasking traditionally refers to a set of techniques employed by a data steward to protect the privacy of data subjects by altering geographic coordinates. Data subjects themselves may make efforts to obfuscate their location data and protect their geoprivacy. Among these individual-level strategies are providing incorrect address data, limiting the precision of address data, or map-based location masking. This study examines the prevalence of these three location-masking behaviors in an online survey of California residents, finding that such behavior takes place across social groups. There are no significant differences across income level, education, ethnicity, sex, and urban locations. Instead, the primary differences are linked to intervening variables of knowledge and attitudes about location privacy.Mon, 30 Jul 2018 20:51:57 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9385Dynamically-Spaced Geo-Grid Segmentation for Weighted Point Sampling on a Polygon Map Layer (Short Paper)
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9386
Geo-grid algorithms divide a large polygon area into several smaller polygons, which are important for studying or executing a set of operations on underlying topological features of a map. The current geo-grid algorithms divide a large polygon in to a set of smaller but equal size polygons only (e.g. is ArcMaps Fishnet). The time to create a geo-grid is typically proportional to number of smaller polygons created. This raises two problems - (i) They cannot skip unwanted areas (such as water bodies, given about 71% percent of the Earth's surface is water-covered); (ii) They are incognizant to any underlying feature set that requires more deliberation. In this work, we propose a novel dynamically spaced geo-grid segmentation algorithm that overcomes these challenges and provides a computationally optimal output for borderline cases of an uneven polygon. Our method uses an underlying topological feature of population distributions, from the LandScan Global 2016 dataset, for creating grids as a function of these weighted features. We benchmark our results against available algorithms and found our approach improves geo-grid creation. Later on, we demonstrate the proposed approach is more effective in harvesting Points of Interest data from a crowd-sourced platform.Mon, 30 Jul 2018 20:51:57 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9386The Landform Reference Ontology (LFRO): A Foundation for Exploring Linguistic and Geospatial Conceptualization of Landforms (Short Paper)
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9387
Mon, 30 Jul 2018 20:51:57 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9387Abstract Data Types for Spatio-Temporal Remote Sensing Analysis (Short Paper)
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9388
Abstract data types are a helpful framework to formalise analyses and make them more transparent, reproducible and comprehensible. We are revisiting an approach based on the space, time and theme dimensions of remotely sensed data, and extending it with a more differentiated understanding of space-time representations. In contrast to existing approaches and implementations that consider only fixed spatial units (e.g. pixels), our approach allows investigations of the spatial units' spatio-temporal characteristics, such as the size and shape of their geometry, and their relationships. Five different abstract data types are identified to describe geographical phenomenon, either directly or in combination: coverage, time series, trajectory, composition and evolution.Mon, 30 Jul 2018 20:51:57 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9388Towards Vandalism Detection in OpenStreetMap Through a Data Driven Approach (Short Paper)
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9389
Vandalism is a phenomenon that has affected by now the digital domain, in particular in the context of Volunteered Geographic Information projects. This paper aims at proposing a methodology to detect vandalism in the OpenStreetMap project. First, an analysis of related works sheds light on the lack of consensus when it comes to defining vandalism in VGI from both conceptual and practical points of view. Second, we present experiments on the use of clustering-based outlier detection methods to identify vandalism in OSM. The outcome of this study focuses on choosing the right variables when it comes to detecting vandalism in OSM.Mon, 30 Jul 2018 20:51:57 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9389A Conceptual Framework for Representation of Location-based Social Media Activities (Short Paper)
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9390
This research develops a conceptual framework for the representation and analysis of location-based social media activities (LBSMA) in GIS. With increasing popularity of location-based social networking, social media platforms have become new channels to observe human activities in physical and virtual worlds. At the same time, there is a shift of some human interactions from the physical space to the virtual social space. Traditional geographical representation in GIS is not sufficient to handle the increased sophistication of human activities related to, or embedded in, location-based social media data. This research proposes an ontology for the location-based social media activity data and a conceptual framework for them to be modeled in a GIS environment so that interconnections of human activities in spatial-temporal-social dimensions can be represented, organized, retrieved, analyzed, and visualized in the system.Mon, 30 Jul 2018 20:51:57 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9390Towards the Statistical Analysis and Visualization of Places (Short Paper)
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9391
The concept of place recently gains momentum in GIScience. In some fields like human geography, spatial cognition or information theory, this topic already has a longer scholarly tradition. This is however not yet completely the case with statistical spatial analysis and cartography. Despite that, taking full advantage of the plethora of user-generated information that we have available these days requires mature place-based statistical and visualization concepts. This paper contributes to these developments: We integrate existing place definitions into an understanding of places as a system of interlinked, constituent characteristics. Based on this, challenges and first promising conceptual ideas are discussed from statistical and visualization viewpoints.Mon, 30 Jul 2018 20:51:57 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9391An Experimental Comparison of Two Definitions for Groups of Moving Entities (Short Paper)
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9392
Two of the grouping definitions for trajectories that have been developed in recent years allow a continuous motion model and allow varying shape groups. One of these definitions was suggested as a refinement of the other. In this paper we perform an experimental comparison to highlight the differences in these two definitions on various data sets.Mon, 30 Jul 2018 20:51:57 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9392Extracting Geospatial Information from Social Media Data for Hazard Mitigation, Typhoon Hato as Case Study (Short Paper)
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9393
With social media widely used for interpersonal communication, it has served as one important channel for information creation and propagation especially during hazard events. Users of social media in hazard-affected area can capture and upload hazard information more timely by portable and internet-connected electric devices such as smart phones or tablet computers equipped with (Global Positioning System) GPS devices and cameras. The information from social media(e.g. Twitter, facebook, sina-weibo, WebChat, etc.) contains a lot of hazard related information including texts, pictures, and videos. Most important thing is that a fair proportion of these crowd-sourcing information is valuable for the geospatial analysis in Geographic information system (GIS) during the hazard mitigation process. The geospatial information (position of observer, hazard-affected region, status of damages, etc) can be acquired and extracted from social media data. And hazard related information could also be used as the GIS attributes. But social media data obtained from crowd-sourcing is quite complex and fragmented on format or semantics. In this paper, we introduced the method how to acquire and extract fine-grained hazard damage geospatial information. According to the need of hazard relief, we classified the extracted information into eleven hazard loss categories and we also analyzed the public's sentiment to the hazard. The 2017 typhoon "Hato" was selected as the case study to test the method introduced.Mon, 30 Jul 2018 20:51:57 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9393Propagation of Uncertainty for Volunteered Geographic Information in Machine Learning (Short Paper)
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9394
Although crowdsourcing drives much of the interest in Machine Learning (ML) in Geographic Information Science (GIScience), the impact of uncertainty of Volunteered Geographic Information (VGI) on ML has been insufficiently studied. This significantly hampers the application of ML in GIScience. In this paper, we briefly delineate five common stages of employing VGI in ML processes, introduce some examples, and then describe propagation of uncertainty of VGI.Mon, 30 Jul 2018 20:51:57 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9394Satellite Image Spoofing: Creating Remote Sensing Dataset with Generative Adversarial Networks (Short Paper)
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9395
The rise of Artificial Intelligence (AI) has brought up both opportunities and challenges for today's evolving GIScience. Its ability in image classification, object detection and feature extraction has been frequently praised. However, it may also apply for falsifying geospatial data. To demonstrate the thrilling power of AI, this research explored the potentials of deep learning algorithms in capturing geographic features and creating fake satellite images according to the learned 'sense'. Specifically, Generative Adversarial Networks (GANs) is used to capture geographic features of a certain place from a group of web maps and satellite images, and transfer the features to another place. Corvallis is selected as the study area, and fake datasets with 'learned' style from three big cities (i.e. New York City, Seattle and Beijing) are generated through CycleGAN. The empirical results show that GANs can 'remember' a certain 'sense of place' and further apply that 'sense' to another place. With this paper, we would like to raise both public and GIScientists' awareness in the potential occurrence of fake satellite images, and its impacts on various geospatial applications, such as environmental monitoring, urban planning, and land use development.Mon, 30 Jul 2018 20:51:57 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9395A Safety Evaluation Method of Evacuation Routes in Urban Areas in Case of Earthquake Disasters Using Ant Colony Optimization and Geographic Information Systems (Short Paper)
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9396
The present study aims to propose the method for the quantitative evaluation of safety concerning evacuation routes in case of earthquake disasters in urban areas using Ant Colony Optimization (ACO) algorithm and Geographic Information Systems (GIS). Regarding the safety evaluation method, firstly, the similarity in safety was focused on while taking into consideration road blockage probability, and after classifying roads by means of the hierarchical cluster analysis, the congestion rates of evacuation routes using ACO simulations were estimated. Based on these results, the multiple evacuation routes extracted were visualized on digital maps by means of GIS, and its safety was evaluated. Furthermore, the selection of safe evacuation routes between evacuation sites, for cases when the possibility of large-scale evacuation after an earthquake disaster is high, is made possible. As the safety evaluation method is based on public information, by obtaining the same geographic information as the present study, it is effective in other areas regardless of whether the information is of the past and future. Therefore, in addition to spatial reproducibility, the safety evaluation method also has high temporal reproducibility. Because safety evaluations are conducted on evacuation routes based on quantified data, highly safe evacuation routes that are selected have been quantitatively evaluated, and thus serve as an effective indicator when selecting evacuation routes.Mon, 30 Jul 2018 20:51:57 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9396Analysis of Irregular Spatial Data with Machine Learning: Classification of Building Patterns with a Graph Convolutional Neural Network (Short Paper)
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9397
Machine learning methods such as Convolutional Neural Network (CNN) are becoming an integral part of scientific research in many disciplines, the analysis of spatial data often failed to these powerful methods because of its irregularity. By using the graph Fourier transform and convolution theorem, we try to convert the convolution operation into a point-wise product in Fourier domain and build a learning architecture of graph CNN for the classification of building patterns. Experiments showed that this method has achieved outstanding results in identifying regular and irregular patterns, and has significantly improved in comparing with other methods.Mon, 30 Jul 2018 20:51:57 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9397Assessing Neighborhood Conditions using Geographic Object-Based Image Analysis and Spatial Analysis (Short Paper)
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9398
Traditionally, understanding urban neighborhood conditions heavily relies on time-consuming and labor-intensive surveying. As the growing development of computer vision and GIScience technology, neighborhood conditions assessment can be more cost-effective and time-efficient. This study utilized Google Earth Engine (GEE) to acquire 1m aerial imagery from the National Agriculture Image Program (NAIP). The features within two main categories: (i) aesthetics and (ii) street morphology that have been selected to reflect neighborhood socio-economic (SE) and demographic (DG) conditions were subsequently extracted through geographic object-based image analysis (GEOBIA) routine. Finally, coefficient analysis was performed to validate the relationship between selected SE indicators, generated via spatial analysis, as well as actual SE and DG data within region of interests (ROIs). We hope this pilot study can be leveraged to perform cost- and time- effective neighborhood conditions assessment in support of community data assessment on both demographics and health issues.Mon, 30 Jul 2018 20:51:57 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9398Spatial Information Extraction from Text Using Spatio-Ontological Reasoning (Short Paper)
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9399
This paper is involved with extracting spatial information from text. We seek to geo-reference all spatial entities mentioned in a piece of text. The focus of this paper is to investigate the contribution of spatial and ontological reasoning to spatial interpretation of text. A preliminary study considering descriptions of cities and geographical regions from English Wikipedia suggests that spatial and ontological reasoning can be more effective to resolve ambiguities in text than a classical text understanding pipeline relying on parsing.Mon, 30 Jul 2018 20:51:57 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9399Scalable Spatial Join for WFS Clients (Short Paper)
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9400
Web Feature Service (WFS) is a popular Web service for geospatial data, which is represented as sets of features that can be queried using the GetFeature request protocol. However, queries involving spatial joins are not efficiently supported by WFS server implementations such as GeoServer. Performing spatial join at client side is unfortunately expensive and not scalable. In this paper, we propose a simple and yet scalable strategy for performing spatial joins at client side after querying WFS data. Our approach is based on the fact that Web clients of WFS data are often used for query-based visual exploration. In visual exploration, the queried spatial objects can be filtered for a particular zoom level and spatial extent and be simplified before spatial join and still serve their purpose. This way, we can drastically reduce the number of spatial objects retrieved from WFS servers and reduce the computation cost of spatial join, so that even a simple plane-sweep algorithm can yield acceptable performance for interactive applications.Mon, 30 Jul 2018 20:51:57 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9400Modelling Spatial Patterns Using Graph Convolutional Networks (Short Paper)
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9401
The understanding of geographical reality is a process of data representation and pattern discovery. Former studies mainly adopted continuous-field models to represent spatial variables and to investigate the underlying spatial continuity/heterogeneity in a regular spatial domain. In this article, we introduce a more generalized model based on graph convolutional neural networks that can capture the complex parameters of spatial patterns underlying graph-structured spatial data, which generally contain both Euclidean spatial information and non-Euclidean feature information. A trainable site-selection framework is proposed to demonstrate the feasibility of our model in geographic decision problems.Mon, 30 Jul 2018 20:51:57 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9401Front Matter, Table of Contents, Preface, Conference Organization
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9302
Front Matter, Table of Contents, Preface, Conference OrganizationThu, 26 Jul 2018 14:57:11 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9302A Duality-Based Method for Identifying Elemental Balance Violations in Metabolic Network Models
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9303
Elemental balance, the property of having the same number of each type of atom on both sides of the equation, is a fundamental feature of chemical reactions. In metabolic network models, this property is typically verified on a reaction-by-reaction basis. In this paper we show how violations of elemental balance can be efficiently detected in an entire network, without the need for specifying the chemical formula of each of the metabolites, which enhances a modeler's ability to automatically verify that their model satisfies elemental balance.Our method makes use of duality theory, linear programming, and mixed integer linear programming, and runs efficiently on genome-scale metabolic networks (GSMNs). We detect elemental balance violations in 40 out of 84 metabolic network models in the BiGG database. We also identify a short list of reactions that are candidates for being elementally imbalanced. Out of these candidates, nearly half turn out to be truly imbalanced reactions, and the rest can be seen as witnesses of elemental balance violations elsewhere in the network. The majority of these violations involve a proton imbalance, a known challenge of metabolic network reconstruction.Our approach is efficient, easy to use and powerful. It can be helpful to metabolic network modelers during model verification. Our methods are fully integrated into the MONGOOSE software suite and are available at https://github.com/WGS-TB/MongooseGUI3.Thu, 26 Jul 2018 14:57:11 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9303Prefix-Free Parsing for Building Big BWTs
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9304
High-throughput sequencing technologies have led to explosive growth of genomic databases; one of which will soon reach hundreds of terabytes. For many applications we want to build and store indexes of these databases but constructing such indexes is a challenge. Fortunately, many of these genomic databases are highly-repetitive - a characteristic that can be exploited and enable the computation of the Burrows-Wheeler Transform (BWT), which underlies many popular indexes. In this paper, we introduce a preprocessing algorithm, referred to as prefix-free parsing, that takes a text T as input, and in one-pass generates a dictionary D and a parse P of T with the property that the BWT of T can be constructed from D and P using workspace proportional to their total size and O(|T|)-time. Our experiments show that D and P are significantly smaller than T in practice, and thus, can fit in a reasonable internal memory even when T is very large. Therefore, prefix-free parsing eases BWT construction, which is pertinent to many bioinformatics applications.Thu, 26 Jul 2018 14:57:11 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9304Detecting Mutations by eBWT
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9305
In this paper we develop a theory describing how the extended Burrows-Wheeler Transform (eBWT) of a collection of DNA fragments tends to cluster together the copies of nucleotides sequenced from a genome G. Our theory accurately predicts how many copies of any nucleotide are expected inside each such cluster, and how an elegant and precise LCP array based procedure can locate these clusters in the eBWT.Our findings are very general and can be applied to a wide range of different problems. In this paper, we consider the case of alignment-free and reference-free SNPs discovery in multiple collections of reads. We note that, in accordance with our theoretical results, SNPs are clustered in the eBWT of the reads collection, and we develop a tool finding SNPs with a simple scan of the eBWT and LCP arrays. Preliminary results show that our method requires much less coverage than state-of-the-art tools while drastically improving precision and sensitivity.Thu, 26 Jul 2018 14:57:11 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9305Haplotype-aware graph indexes
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9306
The variation graph toolkit (VG) represents genetic variation as a graph. Each path in the graph is a potential haplotype, though most paths are unlikely recombinations of true haplotypes. We augment the VG model with haplotype information to identify which paths are more likely to be correct. For this purpose, we develop a scalable implementation of the graph extension of the positional Burrows-Wheeler transform. We demonstrate the scalability of the new implementation by indexing the 1000 Genomes Project haplotypes. We also develop an algorithm for simplifying variation graphs for k-mer indexing without losing any k-mers in the haplotypes.Thu, 26 Jul 2018 14:57:11 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9306Reconciling Multiple Genes Trees via Segmental Duplications and Losses
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9307
Reconciling gene trees with a species tree is a fundamental problem to understand the evolution of gene families. Many existing approaches reconcile each gene tree independently. However, it is well-known that the evolution of gene families is interconnected. In this paper, we extend a previous approach to reconcile a set of gene trees with a species tree based on segmental macro-evolutionary events, where segmental duplication events and losses are associated with cost delta and lambda, respectively. We show that the problem is polynomial-time solvable when delta <= lambda (via LCA-mapping), while if delta > lambda the problem is NP-hard, even when lambda = 0 and a single gene tree is given, solving a long standing open problem on the complexity of the reconciliation problem. On the positive side, we give a fixed-parameter algorithm for the problem, where the parameters are delta/lambda and the number d of segmental duplications, of time complexity O(ceil[delta/lambda]^d * n * delta/lambda). Finally, we demonstrate the usefulness of this algorithm on two previously studied real datasets: we first show that our method can be used to confirm or refute hypothetical segmental duplications on a set of 16 eukaryotes, then show how we can detect whole genome duplications in yeast genomes.Thu, 26 Jul 2018 14:57:11 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9307Protein Classification with Improved Topological Data Analysis
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9308
Automated annotation and analysis of protein molecules have long been a topic of interest due to immediate applications in medicine and drug design. In this work, we propose a topology based, fast, scalable, and parameter-free technique to generate protein signatures.We build an initial simplicial complex using information about the protein's constituent atoms, including its radius and existing chemical bonds, to model the hierarchical structure of the molecule. Simplicial collapse is used to construct a filtration which we use to compute persistent homology. This information constitutes our signature for the protein. In addition, we demonstrate that this technique scales well to large proteins. Our method shows sizable time and memory improvements compared to other topology based approaches. We use the signature to train a protein domain classifier. Finally, we compare this classifier against models built from state-of-the-art structure-based protein signatures on standard datasets to achieve a substantial improvement in accuracy.Thu, 26 Jul 2018 14:57:11 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9308A Dynamic Algorithm for Network Propagation
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9309
Network propagation is a powerful transformation that amplifies signal-to-noise ratio in biological and other data. To date, most of its applications in the biological domain employed standard techniques for its computation that require O(m) time for a network with n vertices and m edges. When applied in a dynamic setting where the network is constantly modified, the cost of these computations becomes prohibitive. Here we study, for the first time in the biological context, the complexity of dynamic algorithms for network propagation. We develop a vertex decremental algorithm that is motivated by various biological applications and can maintain propagation scores over general weights at an amortized cost of O(m/(n^{1/4})) per update. In application to real networks, the dynamic algorithm achieves significant, 50- to 100-fold, speedups over conventional static methods for network propagation, demonstrating its great potential in practice.Thu, 26 Jul 2018 14:57:11 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9309New Absolute Fast Converging Phylogeny Estimation Methods with Improved Scalability and Accuracy
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9310
Absolute fast converging (AFC) phylogeny estimation methods are ones that have been proven to recover the true tree with high probability given sequences whose lengths are polynomial in the number of number of leaves in the tree (once the shortest and longest branch lengths are fixed). While there has been a large literature on AFC methods, the best in terms of empirical performance was DCM_NJ, published in SODA 2001. The main empirical advantage of DCM_NJ over other AFC methods is its use of neighbor joining (NJ) to construct trees on smaller taxon subsets, which are then combined into a tree on the full set of species using a supertree method; in contrast, the other AFC methods in essence depend on quartet trees that are computed independently of each other, which reduces accuracy compared to neighbor joining. However, DCM_NJ is unlikely to scale to large datasets due to its reliance on supertree methods, as no current supertree methods are able to scale to large datasets with high accuracy. In this study we present a new approach to large-scale phylogeny estimation that shares some of the features of DCM_NJ but bypasses the use of supertree methods. We prove that this new approach is AFC and uses polynomial time. Furthermore, we describe variations on this basic approach that can be used with leaf-disjoint constraint trees (computed using methods such as maximum likelihood) to produce other AFC methods that are likely to provide even better accuracy. Thus, we present a new generalizable technique for large-scale tree estimation that is designed to improve scalability for phylogeny estimation methods to ultra-large datasets, and that can be used in a variety of settings (including tree estimation from unaligned sequences, and species tree estimation from gene trees).Thu, 26 Jul 2018 14:57:11 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9310An Average-Case Sublinear Exact Li and Stephens Forward Algorithm
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9311
Hidden Markov models of haplotype inheritance such as the Li and Stephens model allow for computationally tractable probability calculations using the forward algorithms as long as the representative reference panel used in the model is sufficiently small. Specifically, the monoploid Li and Stephens model and its variants are linear in reference panel size unless heuristic approximations are used. However, sequencing projects numbering in the thousands to hundreds of thousands of individuals are underway, and others numbering in the millions are anticipated.To make the Li and Stephens forward algorithm for these datasets computationally tractable, we have created a numerically exact version of the algorithm with observed average case O(nk^{0.35}) runtime in number of genetic sites n and reference panel size k. This avoids any tradeoff between runtime and model complexity. We demonstrate that our approach also provides a succinct data structure for general purpose haplotype data storage. We discuss generalizations of our algorithmic techniques to other hidden Markov models.Thu, 26 Jul 2018 14:57:11 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9311External memory BWT and LCP computation for sequence collections with applications
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9312
We propose an external memory algorithm for the computation of the BWT and LCP array for a collection of sequences. Our algorithm takes the amount of available memory as an input parameter, and tries to make the best use of it by splitting the input collection into subcollections sufficiently small that it can compute their BWT in RAM using an optimal linear time algorithm. Next, it merges the partial BWTs in external memory and in the process it also computes the LCP values. We show that our algorithm performs O(n maxlcp) sequential I/Os, where n is the total length of the collection and maxlcp is the maximum LCP value. The experimental results show that our algorithm outperforms the current best algorithm for collections of sequences with different lengths and when the average LCP of the collection is relatively small compared to the length of the sequences.In the second part of the paper, we show that our algorithm can be modified to output two additional arrays that, combined with the BWT and LCP arrays, provide simple, scan based, external memory algorithms for three well known problems in bioinformatics: the computation of the all pairs suffix-prefix overlaps, the computation of maximal repeats, and the construction of succinct de Bruijn graphs.Thu, 26 Jul 2018 14:57:11 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9312Kermit: Guided Long Read Assembly using Coloured Overlap Graphs
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9313
With long reads getting even longer and cheaper, large scale sequencing projects can be accomplished without short reads at an affordable cost. Due to the high error rates and less mature tools, de novo assembly of long reads is still challenging and often results in a large collection of contigs. Dense linkage maps are collections of markers whose location on the genome is approximately known. Therefore they provide long range information that has the potential to greatly aid in de novo assembly. Previously linkage maps have been used to detect misassemblies and to manually order contigs. However, no fully automated tools exist to incorporate linkage maps in assembly but instead large amounts of manual labour is needed to order the contigs into chromosomes. We formulate the genome assembly problem in the presence of linkage maps and present the first method for guided genome assembly using linkage maps. Our method is based on an additional cleaning step added to the assembly. We show that it can simplify the underlying assembly graph, resulting in more contiguous assemblies and reducing the amount of misassemblies when compared to de novo assembly.Thu, 26 Jul 2018 14:57:11 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9313A Succinct Solution to Rmap Alignment
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9314
We present Kohdista, which is an index-based algorithm for finding pairwise alignments between single molecule maps (Rmaps). The novelty of our approach is the formulation of the alignment problem as automaton path matching, and the application of modern index-based data structures. In particular, we combine the use of the Generalized Compressed Suffix Array (GCSA) index with the wavelet tree in order to build Kohdista. We validate Kohdista on simulated E. coli data, showing the approach successfully finds alignments between Rmaps simulated from overlapping genomic regions. Lastly, we demonstrate Kohdista is the only method that is capable of finding a significant number of high quality pairwise Rmap alignments for large eukaryote organisms in reasonable time. Kohdista is available at https://github.com/mmuggli/KOHDISTA/.Thu, 26 Jul 2018 14:57:11 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9314Spalter: A Meta Machine Learning Approach to Distinguish True DNA Variants from Sequencing Artefacts
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9315
Being able to distinguish between true DNA variants and technical sequencing artefacts is a fundamental task in whole genome, exome or targeted gene analysis. Variant calling tools provide diagnostic parameters, such as strand bias or an aggregated overall quality for each called variant, to help users make an informed choice about which variants to accept or discard. Having several such quality indicators poses a problem for the users of variant callers because they need to set or adjust thresholds for each such indicator. Alternatively, machine learning methods can be used to train a classifier based on these indicators. This approach needs large sets of labeled training data, which is not easily available.The new approach presented here relies on the idea that a true DNA variant exists independently of technical features of the read in which it appears (e.g. base quality, strand, position in the read). Therefore the nucleotide separability classification problem - predicting the nucleotide state of each read in a given pileup based on technical features only - should be near impossible to solve for true variants. Nucleotide separability, i.e. achievable classification accuracy, can either be used to distinguish between true variants and technical artefacts directly, using a thresholding approach, or it can be used as a meta-feature to train a separability-based classifier. This article explores both possibilities with promising results, showing accuracies around 90%.Thu, 26 Jul 2018 14:57:11 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9315Essential Simplices in Persistent Homology and Subtle Admixture Detection
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9316
We introduce a robust mathematical definition of the notion of essential elements in a basis of the homology space and prove that these elements are unique. Next we give a novel visualization of the essential elements of the basis of the homology space through a rainfall-like plot (RFL). This plot is data-centric, i.e., is associated with the individual samples of the data, as opposed to the structure-centric barcodes of persistent homology. The proof-of-concept was tested on data generated by SimRA that simulates different admixture scenarios. We show that the barcode analysis can be used not just to detect the presence of admixture but also estimate the number of admixed populations. We also demonstrate that data-centric RFL plots have the potential to further disentangle the common history into admixture events and relative timing of the events, even in very complex scenarios.Thu, 26 Jul 2018 14:57:11 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9316Minimum Segmentation for Pan-genomic Founder Reconstruction in Linear Time
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9317
Given a threshold L and a set R = {R_1, ..., R_m} of m strings (haplotype sequences), each having length n, the minimum segmentation problem for founder reconstruction is to partition [1,n] into set P of disjoint segments such that each segment [a,b] in P has length at least L and the number d(a,b)=|{R_i[a,b] : 1 <= i <= m}| of distinct substrings at segment [a,b] is minimized over [a,b] in P. The distinct substrings in the segments represent founder blocks that can be concatenated to form max{d(a,b) : [a,b] in P} founder sequences representing the original R such that crossovers happen only at segment boundaries. We give an optimal O(mn) time algorithm to solve the problem, improving over earlier O(mn^2). This improvement enables to exploit the algorithm on a pan-genomic setting of input strings being aligned haplotype sequences of complete human chromosomes, with a goal of finding a representative set of references that can be indexed for read alignment and variant calling. We implemented the new algorithm and give some experimental evidence on the practicality of the approach on this pan-genomic setting.Thu, 26 Jul 2018 14:57:11 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9317Multiple-Choice Knapsack for Assigning Partial Atomic Charges in Drug-Like Molecules
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9318
A key factor in computational drug design is the consistency and reliability with which intermolecular interactions between a wide variety of molecules can be described. Here we present a procedure to efficiently, reliably and automatically assign partial atomic charges to atoms based on known distributions. We formally introduce the molecular charge assignment problem, where the task is to select a charge from a set of candidate charges for every atom of a given query molecule. Charges are accompanied by a score that depends on their observed frequency in similar neighbourhoods (chemical environments) in a database of previously parameterised molecules. The aim is to assign the charges such that the total charge equals a known target charge within a margin of error while maximizing the sum of the charge scores. We show that the problem is a variant of the well-studied multiple-choice knapsack problem and thus weakly NP-complete. We propose solutions based on Integer Linear Programming and a pseudo-polynomial time Dynamic Programming algorithm. We show that the results obtained for novel molecules not included in the database are comparable to the ones obtained performing explicit charge calculations while decreasing the time to determine partial charges for a molecule by several orders of magnitude, that is, from hours or even days to below a second.Our software is openly available at https://github.com/enitram/charge_assign.Thu, 26 Jul 2018 14:57:11 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9318l1-Penalised Ordinal Polytomous Regression Estimators with Application to Gene Expression Studies
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9319
Qualitative but ordered random variables, such as severity of a pathology, are of paramount importance in biostatistics and medicine. Understanding the conditional distribution of such qualitative variables as a function of other explanatory variables can be performed using a specific regression model known as ordinal polytomous regression. Variable selection in the ordinal polytomous regression model is a computationally difficult combinatorial optimisation problem which is however crucial when practitioners need to understand which covariates are physically related to the output and which covariates are not. One easy way to circumvent the computational hardness of variable selection is to introduce a penalised maximum likelihood estimator based on some well chosen non-smooth penalisation function such as, e.g., the l_1-norm. In the case of the Gaussian linear model, the l_1-penalised least-squares estimator, also known as LASSO estimator, has attracted a lot of attention in the last decade, both from the theoretical and algorithmic viewpoints. However, even in the Gaussian linear model, accurate calibration of the relaxation parameter, i.e., the relative weight of the penalisation term in the estimation cost function is still considered a difficult problem that has to be addressed with caution. In the present paper, we apply l_1-penalisation to the ordinal polytomous regression model and compare several hyper-parameter calibration strategies. Our main contributions are: (a) a useful and simple l_1 penalised estimator for ordinal polytomous regression and a thorough description of how to apply Nesterov's accelerated gradient and the online Frank-Wolfe methods to the problem of computing this estimator, (b) a new hyper-parameter calibration method for the proposed model, based on the QUT idea of Giacobino et al. and (c) a code which can be freely used that implements the proposed estimation procedure.Thu, 26 Jul 2018 14:57:11 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9319Differentially Mutated Subnetworks Discovery
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9320
We study the problem of identifying differentially mutated subnetworks of a large gene-gene interaction network, that is, subnetworks that display a significant difference in mutation frequency in two sets of cancer samples. We formally define the associated computational problem and show that the problem is NP-hard. We propose a novel and efficient algorithm, called DAMOKLE to identify differentially mutated subnetworks given genome-wide mutation data for two sets of cancer samples. We prove that DAMOKLE identifies subnetworks with a statistically significant difference in mutation frequency when the data comes from a reasonable generative model, provided enough samples are available. We test DAMOKLE on simulated and real data, showing that DAMOKLE does indeed find subnetworks with significant differences in mutation frequency and that it provides novel insights not obtained by standard methods.Thu, 26 Jul 2018 14:57:11 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9320DGEN: A Test Statistic for Detection of General Introgression Scenarios
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9321
When two species hybridize, one outcome is the integration of genetic material from one species into the genome of the other, a process known as introgression. Detecting introgression in genomic data is a very important question in evolutionary biology. However, given that hybridization occurs between closely related species, a complicating factor for introgression detection is the presence of incomplete lineage sorting, or ILS. The D-statistic, famously referred to as the "ABBA-BABA" test, was proposed for introgression detection in the presence of ILS in data sets that consist of four genomes. More recently, D_FOIL - a set of statistics - was introduced to extend the D-statistic to data sets of five genomes.The major contribution of this paper is demonstrating that the invariants underlying both the D-statistic and D_FOIL can be derived automatically from the probability mass functions of gene tree topologies under the null species tree model and alternative phylogenetic network model. Computational requirements aside, this automatic derivation provides a way to generalize these statistics to data sets of any size and with any scenarios of introgression. We demonstrate the accuracy of the general statistic, which we call D_GEN, on simulated data sets with varying rates of introgression, and apply it to an empirical data set of mosquito genomes.We have implemented D_GEN and made it available, both as a graphical user interface tool and as a command-line tool, as part of the freely available, open-source software package ALPHA (https://github.com/chilleo/ALPHA).Thu, 26 Jul 2018 14:57:11 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9321PRINCE: Accurate Approximation of the Copy Number of Tandem Repeats
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9322
Variable-Number Tandem Repeats (VNTR) are genomic regions where a short sequence of DNA is repeated with no space in between repeats. While a fixed set of VNTRs is typically identified for a given species, the copy number at each VNTR varies between individuals within a species. Although VNTRs are found in both prokaryotic and eukaryotic genomes, the methodology called multi-locus VNTR analysis (MLVA) is widely used to distinguish different strains of bacteria, as well as cluster strains that might be epidemiologically related and investigate evolutionary rates.We propose PRINCE (Processing Reads to Infer the Number of Copies via Estimation), an algorithm that is able to accurately estimate the copy number of a VNTR given the sequence of a single repeat unit and a set of short reads from a whole-genome sequence (WGS) experiment. This is a challenging problem, especially in the cases when the repeat region is longer than the expected read length. Our proposed method computes a statistical approximation of the local coverage inside the repeat region. This approximation is then mapped to the copy number using a linear function whose parameters are fitted to simulated data. We test PRINCE on the genomes of three datasets of Mycobacterium tuberculosis strains and show that it is more than twice as accurate as a previous method.An implementation of PRINCE in the Python language is freely available at https://github.com/WGS-TB/PythonPRINCE.Thu, 26 Jul 2018 14:57:11 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9322Degenerate String Comparison and Applications
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9323
A generalised degenerate string (GD string) S^ is a sequence of n sets of strings of total size N, where the ith set contains strings of the same length k_i but this length can vary between different sets. We denote the sum of these lengths k_0, k_1,...,k_{n-1} by W. This type of uncertain sequence can represent, for example, a gapless multiple sequence alignment of width W in a compact form. Our first result in this paper is an O(N+M)-time algorithm for deciding whether the intersection of two GD strings of total sizes N and M, respectively, over an integer alphabet, is non-empty. This result is based on a combinatorial result of independent interest: although the intersection of two GD strings can be exponential in the total size of the two strings, it can be represented in only linear space. A similar result can be obtained by employing an automata-based approach but its cost is alphabet-dependent. We then apply our string comparison algorithm to compute palindromes in GD strings. We present an O(min{W,n^2}N)-time algorithm for computing all palindromes in S^. Furthermore, we show a similar conditional lower bound for computing maximal palindromes in S^. Finally, proof-of-concept experimental results are presented using real protein datasets.Thu, 26 Jul 2018 14:57:11 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9323A Multi-labeled Tree Edit Distance for Comparing "Clonal Trees" of Tumor Progression
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9324
We introduce a new edit distance measure between a pair of "clonal trees", each representing the progression and mutational heterogeneity of a tumor sample, constructed by the use of single cell or bulk high throughput sequencing data. In a clonal tree, each vertex represents a specific tumor clone, and is labeled with one or more mutations in a way that each mutation is assigned to the oldest clone that harbors it. Given two clonal trees, our multi-labeled tree edit distance (MLTED) measure is defined as the minimum number of mutation/label deletions, (empty) leaf deletions, and vertex (clonal) expansions, applied in any order, to convert each of the two trees to the maximal common tree. We show that the MLTED measure can be computed efficiently in polynomial time and it captures the similarity between trees of different clonal granularity well. We have implemented our algorithm to compute MLTED exactly and applied it to a variety of data sets successfully. The source code of our method can be found in: https://github.com/khaled-rahman/leafDelTED.Thu, 26 Jul 2018 14:57:11 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9324Heuristic Algorithms for the Maximum Colorful Subtree Problem
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9325
In metabolomics, small molecules are structurally elucidated using tandem mass spectrometry (MS/MS); this computational task can be formulated as the Maximum Colorful Subtree problem, which is NP-hard. Unfortunately, data from a single metabolite requires us to solve hundreds or thousands of instances of this problem - and in a single Liquid Chromatography MS/MS run, hundreds or thousands of metabolites are measured.Here, we comprehensively evaluate the performance of several heuristic algorithms for the problem. Unfortunately, as is often the case in bioinformatics, the structure of the (chemically) true solution is not known to us; therefore we can only evaluate against the optimal solution of an instance. Evaluating the quality of a heuristic based on scores can be misleading: Even a slightly suboptimal solution can be structurally very different from the optimal solution, but it is the structure of a solution and not its score that is relevant for the downstream analysis. To this end, we propose a different evaluation setup: Given a set of candidate instances of which exactly one is known to be correct, the heuristic in question solves each instance to the best of its ability, producing a score for each instance, which is then used to rank the instances. We then evaluate whether the correct instance is ranked highly by the heuristic.We find that one particular heuristic consistently ranks the correct instance in a top position. We also find that the scores of the best heuristic solutions are very close to the optimal score; in contrast, the structure of the solutions can deviate significantly from the optimal structures. Integrating the heuristic allowed us to speed up computations in practice by a factor of 100-fold.Thu, 26 Jul 2018 14:57:11 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9325Parsimonious Migration History Problem: Complexity and Algorithms
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9326
In many evolutionary processes we observe extant taxa in different geographical or anatomical locations. To reconstruct the migration history from a given phylogenetic tree T, one can model locations using an additional character and apply parsimony criteria to assign a location to each internal vertex of T. The migration criterion assumes that migrations are independent events. This assumption does not hold for evolutionary processes where distinct taxa from different lineages comigrate from one location to another in a single event, as is the case in metastasis and in certain infectious diseases. To account for such cases, the comigration criterion was recently introduced, and used as an optimization criterion in the Parsimonious Migration History (PMH) problem. In this work, we show that PMH is NP-hard. In addition, we show that a variant of PMH is fixed parameter tractable (FPT) in the number of locations. On simulated instances of practical size, we demonstrate that our FPT algorithm outperforms a previous integer linear program in terms of running time.Thu, 26 Jul 2018 14:57:11 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9326The Wasserstein Distance as a Dissimilarity Measure for Mass Spectra with Application to Spectral Deconvolution
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9327
We propose a new approach for the comparison of mass spectra using a metric known in the computer science under the name of Earth Mover's Distance and in mathematics as the Wasserstein distance. We argue that this approach allows for natural and robust solutions to various problems in the analysis of mass spectra. In particular, we show an application to the problem of deconvolution, in which we infer proportions of several overlapping isotopic envelopes of similar compounds. Combined with the previously proposed generator of isotopic envelopes, IsoSpec, our approach works for a wide range of masses and charges in the presence of several types of measurement inaccuracies. To reduce the computational complexity of the solution, we derive an effective implementation of the Interior Point Method as the optimization procedure. The software for mass spectral comparison and deconvolution based on Wasserstein distance is available at https://github.com/mciach/wassersteinms.Thu, 26 Jul 2018 14:57:11 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9327Automatic Quality Assurance and Release (Dagstuhl Seminar 18122)
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9299
This report documents the program and the outcomes of Dagstuhl Seminar 18122 "Automatic Quality Assurance and Release". The main goal of this seminar was to bridge the knowledge divide on how researchers and industry professionals reason about and implement DevOps for automatic quality assurance. Through the seminar, we have built up a common understanding of DevOps tools and practices, but we have also identified major academic and educational challenges for this field of research. Wed, 25 Jul 2018 07:58:29 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9299Machine Learning and Model Checking Join Forces (Dagstuhl Seminar 18121)
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9298
This report documents the program and the outcomes of Dagstuhl Seminar 18121 "Machine Learning and Model Checking Join Forces". This Dagstuhl Seminar brought together researchers working in the fields of machine learning and model checking. It helped to identify new research topics on the one hand and to help with current problems on the other hand.Wed, 25 Jul 2018 07:57:39 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9298Coding Theory for Inference, Learning and Optimization (Dagstuhl Seminar 18112)
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9297
This report documents the program and the outcomes of Dagstuhl Seminar 18112, "Coding Theory for Inference, Learning and Optimization."Coding theory has recently found new applications in areas such as distributed machine learning, dimension reduction, and variety of statistical problems involving estimation and inference. In machine learning applications that use large-scale data, it is desirable to communicate the results of distributed computations in an efficient and robust manner. In dimension reduction applications, the pseudorandom properties of algebraic codes may be used to construct projection matrices that are deterministic and facilitate algorithmic efficiency. Finally, relationships that have been forged between coding theory and problems in theoretical computer science, such as k-SAT or the planted clique problem, lead to a new interpretation of the sharp thresholds encountered in these settings in terms of thresholds in channel coding theory.The aim of this Dagstuhl Seminar was to draw together researchers from industry and academia that are working in coding theory, particularly in these different (and somewhat disparate) application areas of machine learning and inference. The discussions and collaborations facilitated by this seminar were intended to spark new ideas about how coding theory may be used to improve and inform modern techniques for data analytics.Wed, 25 Jul 2018 07:56:51 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9297Loop Optimization (Dagstuhl Seminar 18111)
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9296
This report documents the programme of Dagstuhl Seminar 18111 "Loop Optimization". The seminar brought together experts from three areas: (1) model-based loop optimization, chiefly, in the polyhedron model, (2) rewriting and program transformation, and (3) metaprogramming and symbolic evaluation. Its aim was to review the 20+ years of progress since the Dagstuhl Seminar 9616 "Loop Parallelization" in 1996 and identify the challenges that remain.Wed, 25 Jul 2018 07:56:02 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9296Dynamic Traffic Models in Transportation Science (Dagstuhl Seminar 18102)
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9295
Traffic assignment models are crucial for traffic planners to be able to predict traffic distributions, especially, in light of possible changes of the infrastructure, e.g., road constructions, traffic light controls, etc. The starting point of the seminar was the observation that there is a trend in the transportation community (science as well as industry) to base such predictions on complex computer-based simulations that are capable of resolving many elements of a real transportation system. On the other hand, within the past few years, the theory of dynamic traffic assignments in terms of equilibrium existence and equilibrium computation has not matured to the point matching the model complexity inherent in simulations. In view of the above, this interdisciplinary seminar brought together leading scientists in the areas traffic simulations, algorithmic game theory and dynamic traffic assignment as well as people from industry with strong scientific background who identified possible ways to bridge the described gap.Wed, 25 Jul 2018 07:55:51 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9295Scheduling (Dagstuhl Seminar 18101)
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9294
This report documents the program and outcomes of the Dagstuhl Seminar 18101 "Scheduling" in March 2018. The seminar brought together algorithmically oriented researchers from two communities with interests in resource management: (i) the scheduling community and (ii) the networking and distributed computing community. The primary objective of the seminar was to expose each community to the important problems and techniques from the other community, and to facilitate dialog and collaboration between researchers. Wed, 25 Jul 2018 07:55:38 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9294LIPIcs, Volume 109, ECOOP'18, Complete Volume
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9301
LIPIcs, Volume 109, ECOOP'18, Complete VolumeTue, 24 Jul 2018 09:56:33 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9301LIPIcs, Volume 111, TQC'18, Complete Volume
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9300
LIPIcs, Volume 111, TQC'18, Complete VolumeTue, 24 Jul 2018 08:05:53 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9300The Logical Execution Time Paradigm: New Perspectives for Multicore Systems (Dagstuhl Seminar 18092)
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9293
This report documents the program and the outcomes of Dagstuhl Seminar 18092 "The Logical Execution Time Paradigm: New Perspectives for Multicore Systems". The seminar brought together academic and industrial researchers working on challenges related to the Logical Execution Time Paradigm (LET). The main purpose was to promote a closer interaction between the sub-communities involved in the application of LET to multicore systems, with a particular emphasis on the automotive domain.Mon, 23 Jul 2018 14:20:06 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9293Data Consistency in Distributed Systems: Algorithms, Programs, and Databases (Dagstuhl Seminar 18091)
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9292
For decades distributed computing has been mainly an academic subject. Today, it has become mainstream: our connected world demands applications that are inherently distributed, and the usage of shared, distributed, peer-to-peer or cloud-computing infrastructures are increasingly common. However, writing distributed applications that are both correct and well distributed (e.g., highly available) is extremely challenging.In fact, there exists a fundamental trade-off between data consistency, availability, and the ability to tolerate failures. This trade-off has implications on the design of the entire distributed computing infrastructure, including storage systems, compilers and runtimes, application development frameworks and programming languages. Unfortunately, this also has significant implications on the programming model exposed to the designers and developers of applications. We need to enable programmers who are not experts in these subtle aspects to build distributed applications that remain correct in the presence of concurrency, failures, churn, replication, dynamically-changing and partial information, high load, absence of a single line of time, etc.This Dagstuhl Seminar proposes to bring together researchers and practitioners in the areas of distributed systems, programming languages, verifications, and databases. We would like to understand the lessons learnt in building scalable and correct distributed systems, the design patterns that have emerged, and explore opportunities for distilling these into programming methodologies, programming tools, and languages to make distributed computing easier and more accessible.Main issues in discussion:Application writers are constantly making trade-offs between consistency and availability. What kinds of tools and methodologies can we provide to simplify this decision making? How does one understand the implications of a design choice?Available systems are hard to design, test and debug. Do existing testing and debugging tools suffice for identifying and isolating bugs due to weak consistency? How can these problems be identified in production using live monitoring?Can we formalize commonly desired (generic) correctness (or performance) properties? How can we teach programmers about these formalisms and make them accessible to a wide audience?Can we build verification or testing tools to check that systems have these desired correctness properties?How do applications achieve the required properties, while ensuring adequate performance, in practice? What design patterns and idioms work well?To what degree can these properties be guaranteed by the platform (programming language, libraries, and runtime system)? What are the responsibilities of the application developer, and what tools and information does she have?Mon, 23 Jul 2018 14:18:58 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9292Formal Methods for the Synthesis of Biomolecular Circuits (Dagstuhl Seminar 18082)
http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9291
This report documents the program and the outcomes of Dagstuhl Seminar 18082 "Formal Methods for the Synthesis of Biomolecular Circuits". Synthetic biology aims for the rational bottom-up engineering of new biological functionalities. Recent years have witnessed an increase in the degree of "rationality" in the design of synthetic biomolecular circuits. With it, fewer design-build-test cycles were necessary to achieve a desired circuit performance. Most of these success stories reported the realization of logic circuits, typically operating via regulation of gene expression and/or direct manipulation of DNA sequences with recombinases, executing combinatorial and sometimes sequential logic. This was often achieved with the help of two ingredients, a library of previously well-characterized parts and some computational modeling. Hence, although circuits in synthetic biology are still by far less understood and characterized than electronic circuits, the opportunity for the formal synthesis of circuit designs with respect to a behavioral specification starts to emerge in synthetic biology.Mon, 23 Jul 2018 14:18:29 +0200http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=9291