https://www.dagstuhl.de/09451
November 1 – 6 , 2009, Dagstuhl Seminar 09451
Geometric Networks, Metric Space Embeddings and Spatial Data Mining
Organizers
Gautam Das (University of Texas at Arlington, US)
Joachim Gudmundsson (NICTA – Sydney, AU)
Rolf Klein (Universität Bonn, DE)
Christian Knauer (FU Berlin, DE)
Michiel Smid (Carleton University – Ottawa, CA)
For support, please contact
Documents
Dagstuhl Seminar Proceedings
List of Participants
Press Room
Press Review

Geometrische Netzwerke, Einbettung metrischer Räume und Analyse räumlicher Daten.
Wolfgang Back und Wolfgang Rudolph vom Computerclub Zwei im Gespräch mit Prof. Dr. Rolf Klein (Interview ab 20:50 min)
Summary
This seminar has brought together scientists from three different communities (geometric networks, metric space embeddings, and spatial data mining) who have numerous interests in common, all of which are related to distance problems. The seminar was a continuation of the Dagstuhl seminar 06481 (Geometric Networks and Metric Space Embeddings) which was held in 2006. The main purpose of the current seminar was to continue, and intensify, the cooperation between the geometric network and the metric space communities. This time, we invited people from spatial data mining to provide the extra application stimulus.
A geometric network is a graph mapped into the Euclidean plane or a Euclidean space of low dimension. It is said to be a spanner if the network distance between any pair of nodes is bounded by a constant times their Euclidean distance. Geometric networks are the backbone of any model for the flow of goods, traffic or information. They also play an important role in telecommunication, VLSI design, motion planning (robotics), pattern matching, data compression, bioinformatics (gene analysis), and sensor networks. One is interested in spanners with other useful properties such as a linear number of edges, small total edge length, small node degree, few crossings, or small link diameter. Apart from these applications, geometric spanners have had great impact on the construction of approximation algorithms, e.g., for the traveling salesperson problem.
The similarity between individual objects of a given finite domain becomes apparent if the objects can be represented by points in the plane, or in 3space, in such a way that the Euclidean distance between two points corresponds to the similarity of the objects they are associated with. The question when such representations exist has led to the theory of embedding finite metric spaces into normed vector spaces. It is of particular interest for storage, visualization, and retrieval of highdimensional data, e.g., in information retrieval.
Both problems (metric space embedding and spanner construction) have received a lot of attention during the last 1015 years, within their respective communities. Indeed, the first monograph on spanners (coauthored by the 5th organizer) has been published meanwhile, and metric space embeddings have been addressed in several books and book chapters. In both cases, we are applying two different metrics to the point pairs of the same set, and we are looking for the maximum (or minimum) ratio of all values. In metric space embeddings we compare the measure of similarity of the objects against the (Euclidean) distance of their associated points, and the maximum ratio is called the distortion of the embedding. In a spanning geometric network, we compare the shortest path distance in the network against the Euclidean distance between two points; here, the extremal ratio is called dilation or stretch factor. In both areas many questions are open. For example, it is not known how to construct the triangulation of minimum dilation over a given point set, with or without extra Steiner points allowed.
Data mining can be seen as the science of extracting useful information from large data sets or databases. It is the principle of sorting through large amounts of data and finding relevant information. It is usually used by business intelligence organizations and financial analysts, but it is increasingly used in the sciences to extract information from the enormous data sets generated by modern experimental and observational methods. In many applications of data mining, the high dimensionality of the data restricts the choice of data processing methods. Such application areas include the analysis of market basket data, text documents, image data and so on; in these cases the dimensionality is large due to either a wealth of alternative products, a large vocabulary, or the use of large image windows, respectively. A common tool used to develop efficient algorithms is to reduce the number of dimensions. A statistically optimal way of dimensionality reduction is to project the data onto a lowerdimensional orthogonal subspace that captures as much of the variation of the data as possible. The best (in meansquare sense) and most widely used way to do this is principal component analysis (PCA); unfortunately, it is quite expensive to compute for highdimensional data sets. A computationally simple method of dimensionality reduction that does not introduce a signifficant distortion in the data set is random projection. Here, the original highdimensional data is projected onto a lowerdimensional subspace using a random matrix whose columns have unit lengths. Random projection has been found to be a computationally efficient, yet sufficiently accurate method for dimensionality reduction of highdimensional data sets.
The seminar's aim was at crossfertilization between the three communities. The main results of the seminar can be summarized as follows.
 During the seminar, it became clear that methods developed in the theory of finite metric spaces help in analyzing geometric networks. Conversely, the algorithmic techniques developed in geometric network design are of interest to people working on embedding problems.
 There was a fruitful exchange of ideas which stimulated interesting discussions and future cooperations.
 The seminar will advance a comparative theory of distance measures.
 The knowledge gained during the seminar will help in reducing the complexity of highdimensional data, as is important in data mining and related areas.
Related Dagstuhl Seminar
 06481: "Geometric Networks and Metric Space Embeddings " (2006)
Classification
 Data Structures
 Algorithms
 Complexity
 Data Mining
Keywords
 Geometric networks
 Metric space embedding
 Data mining