https://www.dagstuhl.de/09451

01. – 06. November 2009, Dagstuhl Seminar 09451

Geometric Networks, Metric Space Embeddings and Spatial Data Mining

Organisatoren

Gautam Das (University of Texas at Arlington, US)
Joachim Gudmundsson (NICTA – Sydney, AU)
Rolf Klein (Universität Bonn, DE)
Christian Knauer (FU Berlin, DE)
Michiel Smid (Carleton University – Ottawa, CA)

Auskunft zu diesem Dagstuhl Seminar erteilt

Dagstuhl Service Team

Dokumente

Dagstuhl Seminar Proceedings DROPS
Teilnehmerliste

Press Review

Summary

This seminar has brought together scientists from three different communities (geometric networks, metric space embeddings, and spatial data mining) who have numerous interests in common, all of which are related to distance problems. The seminar was a continuation of the Dagstuhl seminar 06481 (Geometric Networks and Metric Space Embeddings) which was held in 2006. The main purpose of the current seminar was to continue, and intensify, the cooperation between the geometric network and the metric space communities. This time, we invited people from spatial data mining to provide the extra application stimulus.

A geometric network is a graph mapped into the Euclidean plane or a Euclidean space of low dimension. It is said to be a spanner if the network distance between any pair of nodes is bounded by a constant times their Euclidean distance. Geometric networks are the backbone of any model for the flow of goods, traffic or information. They also play an important role in telecommunication, VLSI design, motion planning (robotics), pattern matching, data compression, bio-informatics (gene analysis), and sensor networks. One is interested in spanners with other useful properties such as a linear number of edges, small total edge length, small node degree, few crossings, or small link diameter. Apart from these applications, geometric spanners have had great impact on the construction of approximation algorithms, e.g., for the traveling salesperson problem.

The similarity between individual objects of a given finite domain becomes apparent if the objects can be represented by points in the plane, or in 3-space, in such a way that the Euclidean distance between two points corresponds to the similarity of the objects they are associated with. The question when such representations exist has led to the theory of embedding finite metric spaces into normed vector spaces. It is of particular interest for storage, visualization, and retrieval of high-dimensional data, e.g., in information retrieval.

Both problems (metric space embedding and spanner construction) have received a lot of attention during the last 10-15 years, within their respective communities. Indeed, the first monograph on spanners (co-authored by the 5th organizer) has been published meanwhile, and metric space embeddings have been addressed in several books and book chapters. In both cases, we are applying two different metrics to the point pairs of the same set, and we are looking for the maximum (or minimum) ratio of all values. In metric space embeddings we compare the measure of similarity of the objects against the (Euclidean) distance of their associated points, and the maximum ratio is called the distortion of the embedding. In a spanning geometric network, we compare the shortest path distance in the network against the Euclidean distance between two points; here, the extremal ratio is called dilation or stretch factor. In both areas many questions are open. For example, it is not known how to construct the triangulation of minimum dilation over a given point set, with or without extra Steiner points allowed.

Data mining can be seen as the science of extracting useful information from large data sets or databases. It is the principle of sorting through large amounts of data and finding relevant information. It is usually used by business intelligence organizations and financial analysts, but it is increasingly used in the sciences to extract information from the enormous data sets generated by modern experimental and observational methods. In many applications of data mining, the high dimensionality of the data restricts the choice of data processing methods. Such application areas include the analysis of market basket data, text documents, image data and so on; in these cases the dimensionality is large due to either a wealth of alternative products, a large vocabulary, or the use of large image windows, respectively. A common tool used to develop efficient algorithms is to reduce the number of dimensions. A statistically optimal way of dimensionality reduction is to project the data onto a lower-dimensional orthogonal subspace that captures as much of the variation of the data as possible. The best (in mean-square sense) and most widely used way to do this is principal component analysis (PCA); unfortunately, it is quite expensive to compute for high-dimensional data sets. A computationally simple method of dimensionality reduction that does not introduce a signifficant distortion in the data set is random projection. Here, the original high-dimensional data is projected onto a lower-dimensional subspace using a random matrix whose columns have unit lengths. Random projection has been found to be a computationally efficient, yet sufficiently accurate method for dimensionality reduction of high-dimensional data sets.

The seminar's aim was at crossfertilization between the three communities. The main results of the seminar can be summarized as follows.

  • During the seminar, it became clear that methods developed in the theory of finite metric spaces help in analyzing geometric networks. Conversely, the algorithmic techniques developed in geometric network design are of interest to people working on embedding problems.
  • There was a fruitful exchange of ideas which stimulated interesting discussions and future cooperations.
  • The seminar will advance a comparative theory of distance measures.
  • The knowledge gained during the seminar will help in reducing the complexity of high-dimensional data, as is important in data mining and related areas.

Related Dagstuhl Seminar

Classification

  • Data Structures
  • Algorithms
  • Complexity
  • Data Mining

Keywords

  • Geometric networks
  • Metric space embedding
  • Data mining

Buchausstellung

Bücher der Teilnehmer 

Buchausstellung im Erdgeschoss der Bibliothek

(nur in der Veranstaltungswoche).

Dokumentation

In der Reihe Dagstuhl Reports werden alle Dagstuhl-Seminare und Dagstuhl-Perspektiven-Workshops dokumentiert. Die Organisatoren stellen zusammen mit dem Collector des Seminars einen Bericht zusammen, der die Beiträge der Autoren zusammenfasst und um eine Zusammenfassung ergänzt.

 

Download Übersichtsflyer (PDF).

Publikationen

Es besteht weiterhin die Möglichkeit, eine umfassende Kollektion begutachteter Arbeiten in der Reihe Dagstuhl Follow-Ups zu publizieren.

Dagstuhl's Impact

Bitte informieren Sie uns, wenn eine Veröffentlichung ausgehend von
Ihrem Seminar entsteht. Derartige Veröffentlichungen werden von uns in der Rubrik Dagstuhl's Impact separat aufgelistet  und im Erdgeschoss der Bibliothek präsentiert.