https://www.dagstuhl.de/09451

November 1 – 6 , 2009, Dagstuhl Seminar 09451

Geometric Networks, Metric Space Embeddings and Spatial Data Mining

Organizers

Gautam Das (University of Texas at Arlington, US)
Joachim Gudmundsson (NICTA – Sydney, AU)
Rolf Klein (Universität Bonn, DE)
Christian Knauer (FU Berlin, DE)
Michiel Smid (Carleton University – Ottawa, CA)

For support, please contact

Dagstuhl Service Team

Documents

Dagstuhl Seminar Proceedings DROPS
List of Participants

Press Review

Summary

This seminar has brought together scientists from three different communities (geometric networks, metric space embeddings, and spatial data mining) who have numerous interests in common, all of which are related to distance problems. The seminar was a continuation of the Dagstuhl seminar 06481 (Geometric Networks and Metric Space Embeddings) which was held in 2006. The main purpose of the current seminar was to continue, and intensify, the cooperation between the geometric network and the metric space communities. This time, we invited people from spatial data mining to provide the extra application stimulus.

A geometric network is a graph mapped into the Euclidean plane or a Euclidean space of low dimension. It is said to be a spanner if the network distance between any pair of nodes is bounded by a constant times their Euclidean distance. Geometric networks are the backbone of any model for the flow of goods, traffic or information. They also play an important role in telecommunication, VLSI design, motion planning (robotics), pattern matching, data compression, bio-informatics (gene analysis), and sensor networks. One is interested in spanners with other useful properties such as a linear number of edges, small total edge length, small node degree, few crossings, or small link diameter. Apart from these applications, geometric spanners have had great impact on the construction of approximation algorithms, e.g., for the traveling salesperson problem.

The similarity between individual objects of a given finite domain becomes apparent if the objects can be represented by points in the plane, or in 3-space, in such a way that the Euclidean distance between two points corresponds to the similarity of the objects they are associated with. The question when such representations exist has led to the theory of embedding finite metric spaces into normed vector spaces. It is of particular interest for storage, visualization, and retrieval of high-dimensional data, e.g., in information retrieval.

Both problems (metric space embedding and spanner construction) have received a lot of attention during the last 10-15 years, within their respective communities. Indeed, the first monograph on spanners (co-authored by the 5th organizer) has been published meanwhile, and metric space embeddings have been addressed in several books and book chapters. In both cases, we are applying two different metrics to the point pairs of the same set, and we are looking for the maximum (or minimum) ratio of all values. In metric space embeddings we compare the measure of similarity of the objects against the (Euclidean) distance of their associated points, and the maximum ratio is called the distortion of the embedding. In a spanning geometric network, we compare the shortest path distance in the network against the Euclidean distance between two points; here, the extremal ratio is called dilation or stretch factor. In both areas many questions are open. For example, it is not known how to construct the triangulation of minimum dilation over a given point set, with or without extra Steiner points allowed.

Data mining can be seen as the science of extracting useful information from large data sets or databases. It is the principle of sorting through large amounts of data and finding relevant information. It is usually used by business intelligence organizations and financial analysts, but it is increasingly used in the sciences to extract information from the enormous data sets generated by modern experimental and observational methods. In many applications of data mining, the high dimensionality of the data restricts the choice of data processing methods. Such application areas include the analysis of market basket data, text documents, image data and so on; in these cases the dimensionality is large due to either a wealth of alternative products, a large vocabulary, or the use of large image windows, respectively. A common tool used to develop efficient algorithms is to reduce the number of dimensions. A statistically optimal way of dimensionality reduction is to project the data onto a lower-dimensional orthogonal subspace that captures as much of the variation of the data as possible. The best (in mean-square sense) and most widely used way to do this is principal component analysis (PCA); unfortunately, it is quite expensive to compute for high-dimensional data sets. A computationally simple method of dimensionality reduction that does not introduce a signifficant distortion in the data set is random projection. Here, the original high-dimensional data is projected onto a lower-dimensional subspace using a random matrix whose columns have unit lengths. Random projection has been found to be a computationally efficient, yet sufficiently accurate method for dimensionality reduction of high-dimensional data sets.

The seminar's aim was at crossfertilization between the three communities. The main results of the seminar can be summarized as follows.

  • During the seminar, it became clear that methods developed in the theory of finite metric spaces help in analyzing geometric networks. Conversely, the algorithmic techniques developed in geometric network design are of interest to people working on embedding problems.
  • There was a fruitful exchange of ideas which stimulated interesting discussions and future cooperations.
  • The seminar will advance a comparative theory of distance measures.
  • The knowledge gained during the seminar will help in reducing the complexity of high-dimensional data, as is important in data mining and related areas.

Related Dagstuhl Seminar

Classification

  • Data Structures
  • Algorithms
  • Complexity
  • Data Mining

Keywords

  • Geometric networks
  • Metric space embedding
  • Data mining

Documentation

In the series Dagstuhl Reports each Dagstuhl Seminar and Dagstuhl Perspectives Workshop is documented. The seminar organizers, in cooperation with the collector, prepare a report that includes contributions from the participants' talks together with a summary of the seminar.

 

Download overview leaflet (PDF).

Publications

Furthermore, a comprehensive peer-reviewed collection of research papers can be published in the series Dagstuhl Follow-Ups.

Dagstuhl's Impact

Please inform us when a publication was published as a result from your seminar. These publications are listed in the category Dagstuhl's Impact and are presented on a special shelf on the ground floor of the library.