Dagstuhl Seminar 07071: Web Information Retrieval and Linear Algebra Algorithms

Dagstuhl Seminar 07071

Web Information Retrieval and Linear Algebra Algorithms

( Feb 11 – Feb 16, 2007 )

(Click in the middle of the image to enlarge)

Permalink

Please use the following short url to reference this page: https://www.dagstuhl.de/07071

Organizers

Andreas Frommer (Bergische Universität Wuppertal, DE)
Michael W. Mahoney (Yahoo Research - Sunnyvale, US)
Daniel B. Szyld (Temple University - Philadelphia, US)

Contact

Dagstuhl Service Team

Publications

Web Information Retrieval and Linear Algebra Algorithms. Andreas Frommer, Michael W. Mahoney, and Daniel B. Szyld (Eds.). Dagstuhl Seminar Proceedings, Volume 7071. June 28, 2007

Summary

Show Summary

A seminar concentrating on the intersection of the fields of information retrieval and other web-related aspects with numerical and applied linear algebra techniques was held with the attendance of scientists from industry and academia.

The scientific community has witnessed the increasing importance of linear algebra algorithms and of Markov chain modeling in several applications from computer science. Of particular importance is linear algebra algorithms to study the structure of the Web and information retrieval (IR) on the Web. The main focus of the seminar was the evolving theory and computational aspects of methods for web information retrieval, including search engines, that are inspired by traditional and recent advances in algorithms for linear algebra problems. To this end, the seminar brought together scientists from academia with background in computer science or numerical mathematics and scientists working in industry, mostly from Yahoo Research (both from the US and Europe).

Structure of the seminar

The seminar was attended by forty-seven participants coming from thirteen different countries. We had a good mixture of graduate students, young researchers, scientists in mid-career, and senior investigators from academia and industry. There was a total of thirty-one talks. Due to the diverse backgrounds of the attendees it was decided to have five longer expository talks which included introductions to the subjects and methods of the respective fields.

Outcome of the seminar

We want to highlight that the seminar really fostered interaction between people from academia and industry. Many participants observed that they benefited greatly from the contributions presented from researchers working in other fields or other settings.

Among the findings of this seminar, we mention the following: While it became clear from the scientists working in web retrieval that Pagerank now is just a minor ingredient in web ranking algorithms, it turns out that Pageranklike approaches continue to play an important role in other areas such as social science or community behavior. In this area, but also in more advanced, semantic models, the properties of eigenvalues and eigenvectors of huge sparse matrices and their computation continue to be at the heart of current research. Similarly, other classical matrix factorization techniques like the singular value decomposition have new applications, for example, in cluster analysis.

Techniques using low rank (and thus data efficient) approximations to huge matrices become increasingly important for data analysis and representation. For example, recent work has focused on employing randomization to improve low-rank computations and also large statistical regression problems. A particularly difficult issue is that traditional methods such as the SVD and QR decomposition destroy sparsity. Thus, low-rank approximations that respect sparsity are important. A second issue is that in many applications, one is not interested in the results of low-rank computations per se, but instead one wants to use it to learn from the data. Thus, studying matrix decompositions with good learning or generalization properties is important. Relatedly, in many cases an important question has to do with the best way to represent the data, i.e., which vector space is most appropriate to model the data in order to perform efficient computations.

Asynchronous iterative approaches, as they arise naturally in loosely coupled networks of processors have been analyzed from the theoretical side and are being used in practice. One challenging problem discussed, was that of data streams which cannot be stored, so that standard numerical techniques have to be enhanced, for example, with statistical analyses or using novel algorithmic methods. Another point of intersection between the disciplines were novel graph partitioning approaches using iterative methods from numerical linear algebra. This represents a particularly challenging direction since the local geometry of the data that arise in Web IR applications is very different than the geometry that arises in traditional applications.

Participants

Show Participants

Martin Afanasjew (University of Dundee, GB)
H. Bast (MPI für Informatik - Saarbrücken, DE)
Michael Berry (University of Tennessee, US)
Dario Andrea Bini (University of Pisa, IT)
Stefan Borovac (Bergische Universität Wuppertal, DE)
Peter Buchholz (TU Dortmund, DE) [dblp]
Anirban DasGupta (Yahoo! Research - Santa Clara, US)
Tugrul Dayar (Bilkent University - Ankara, TR)
Gianna Maria Del Corso (University of Pisa, IT)
Debora Donato (Yahoo Research - Barcelona, ES)
Michael Eiermann (TU Bergakademie Freiberg, DE)
Lars Elden (Linköping University, SE)
Robert Elsässer (Universität Paderborn, DE) [dblp]
Andreas Frommer (Bergische Universität Wuppertal, DE) [dblp]
Efstratios Gallopoulos (University of Patras, GR)
David F. Gleich (Stanford University, US) [dblp]
Gene H. Golub (Stanford University, US)
Chen Greif (University of British Columbia - Vancouver, CA)
Stefan Güttel (TU Bergakademie Freiberg, DE)
Boulos Harb (University of Pennsylvania - Philadelphia, US)
Stephen Kirkland (University of Regina, CA) [dblp]
Philip Knight (The University of Strathclyde - Glasgow, GB)
Giorgios Kollias (University of Patras, GR)
Udo Krieger (Universität Bamberg, DE) [dblp]
Bruno Lang (Bergische Universität Wuppertal, DE)
Kevin Lang (Yahoo! Research - Santa Clara, US)
Amy N. Langville (College of Charleston, US)
Sebastien Loisel (Temple University - Philadelphia, US)
Julia Luxenburger (MPI für Informatik - Saarbrücken, DE)
Michael W. Mahoney (Yahoo Research - Sunnyvale, US) [dblp]
Ivo Marek (Czech Technical University - Prague, CZ)
Volker Mehrmann (TU Berlin, DE)
Beatrice Meini (University of Pisa, IT)
Henning Meyerhenke (Universität Paderborn, DE) [dblp]
Reinhard Nabben (TU Berlin, DE)
Michela Redivo-Zaglia (University of Padova, IT)
Tamás Sarlós (Hungarian Academy of Sciences - Budapest, HU) [dblp]
Stefano Serra Capizzano (Università dell'Insubria - Como, IT)
Daniel B. Szyld (Temple University - Philadelphia, US)
Bora Uçar (Cerfacs - Toulouse, FR) [dblp]
Paul Van Dooren (UC Louvain-la-Neuve, BE)
Sebastiano Vigna (University of Milan, IT) [dblp]
Elena Virnik (TU Berlin, DE)
Hugo Zaragoza (Yahoo Research - Barcelona, ES)

Classification

data bases / information retrieval
modelling / simulation
web
security / cryptography
networks
interdisciplinary

Keywords

Markov chains
numerical methods
web information retrieval
performance evaluation
intrusion detection
aggregation-disaggregation methods
graph-oriented decomposition

Seminar 07071

Search the Dagstuhl Website

Schloss Dagstuhl Services

Seminars

Within this website:

External resources:

Publishing

Within this website:

External resources:

dblp

Within this website:

External resources:

Dagstuhl Seminar 07071

Web Information Retrieval and Linear Algebra Algorithms

( Feb 11 – Feb 16, 2007 )

Permalink

Organizers

Contact

Publications

Summary

Structure of the seminar

Outcome of the seminar

Participants

Classification

Keywords