http://www.dagstuhl.de/17181

May 1 – 5 , 2017, Dagstuhl Seminar 17181

Theory and Applications of Hashing

Organizers

Martin Dietzfelbinger (TU Ilmenau, DE)
Michael Mitzenmacher (Harvard University – Cambridge, US)
Rasmus Pagh (IT University of Copenhagen, DK)
David P. Woodruff (IBM Almaden Center – San Jose, US)

Coordinators

Martin Aumüller (IT University of Copenhagen, DK)


1 / 5 >

For support, please contact

Susanne Bach-Bernhard for administrative matters

Andreas Dolzmann for scientific matters

Dagstuhl Reports

As part of the mandatory documentation, participants are asked to submit their talk abstracts, working group results, etc. for publication in our series Dagstuhl Reports via the Dagstuhl Reports Submission System.

Documents

List of Participants
Shared Documents
Dagstuhl Seminar Wiki

(Use seminar number and access code to log in)

Motivation

The idea of hashing was proposed in the 1950s as an efficient method for implementing symbol tables in compilers. In succeeding decades it has emerged as an algorithmic tool that goes well beyond its original purpose, providing solutions for a wide range of algorithmic problems. While the theory of hashing may appear mature, in fact, many new advances have been made in recent years. The rich interplay between theory and practice in the field of hashing cannot be overstated. Applications drive the need for new algorithms and data structures based on hashing, as well as the need for more efficient hash function families with provable theoretical guarantees. Specific implementations developed in “the field” often do not have theoretical guarantees on performance or accuracy, creating new theoretical problems and driving a need to make theory and practice meet. Thus, the aim of this Dagstuhl Seminar is to bring together researchers with an interest in hashing methods (and more generally random mappings) from various perspectives, spanning theory and a diverse set of application areas. In this way we hope to identify opportunities for further advancing the field.

The theoretical side includes aspects such as the design of hash functions, the design of algorithm and data structure primitives, and the mathematical analysis of hashing schemes. For hash function design, since Carter and Wegman proposed universal hashing in 1977, there has been a fertile research agenda identifying sufficient randomness properties of hash functions for various applications, and on the other hand devising resource-efficient hash functions having these properties. While new simple and efficient hash function constructions with strong theoretical properties keep appearing, there are still many applications of hashing for which no efficient hash function construction is known. At the same time it is increasingly clear that traditional measures of hash function performance are inadequate to describe the randomness properties needed in many applications, and a more direct analysis is needed.

While hashing is interesting in its own right, it has also become a foundational building block for higher level algorithms and data structures, each of which can in turn often find uses in a variety of application spaces. The list of such basic algorithms and structures is long, and growing: cuckoo hash tables, Bloom filters of various kinds, including invertible Bloom lookup tables, balanced allocations, expanders derived from hashing, count-min filters, sketches of many different kinds, similarity measures for text and images, and so on. These algorithms and data structures provide the interface between the lower level of hash function design and the higher level applications; often they extend what we require from the hash functions.

Regarding applications, hashing is of course heavily used in information storage and retrieval contexts. It has been a central tool in the design of algorithms for data streams. In recent years the use of hashing has spread to many other settings where data is stored and accessible, but scalability can be achieved only by resorting to approximation algorithms. One early success story in this direction is min-wise hashing for identifying (near-)duplicate web pages. Bloom filters have become a staple in systems, with countless variations expanding on its initial functionality, such as counts associated with set elements or aging out of set items as new elements are dynamically added. In the field of machine learning, random mappings of data to lower-dimensional vectors that are easier to handle is of increasing importance for big data applications. Designing randomized mappings that meet the criteria of machine learning applications has been an active research area in recent years. We believe the topic of the Dagstuhl Seminar is highly relevant for companies dealing with big data, as some of the invitees who made fundamental contributions to the theory and applications of hashing are now employed in industry; we expect that applications beyond information retrieval, data stream analysis, and machine learning will also be discussed at the Dagstuhl Seminar.

License
  Creative Commons BY 3.0 DE
  Martin Dietzfelbinger and Michael Mitzenmacher and Rasmus Pagh and David P. Woodruff

Classification

  • Data Bases / Information Retrieval
  • Data Structures / Algorithms / Complexity

Keywords

  • Hash function construction and analysis
  • Hashing primitives
  • Information retrieval applications
  • Locality-sensitive hashing
  • Data streaming applications
  • Machine learning applications
  • Connections to complexity theory

Book exhibition

Books from the participants of the current Seminar 

Book exhibition in the library, ground floor, during the seminar week.

Documentation

In the series Dagstuhl Reports each Dagstuhl Seminar and Dagstuhl Perspectives Workshop is documented. The seminar organizers, in cooperation with the collector, prepare a report that includes contributions from the participants' talks together with a summary of the seminar.

 

Download overview leaflet (PDF).

Publications

Furthermore, a comprehensive peer-reviewed collection of research papers can be published in the series Dagstuhl Follow-Ups.

Dagstuhl's Impact

Please inform us when a publication was published as a result from your seminar. These publications are listed in the category Dagstuhl's Impact and are presented on a special shelf on the ground floor of the library.