https://www.dagstuhl.de/19241

10. – 14. Juni 2019, Dagstuhl-Seminar 19241

25 Years of the Burrows-Wheeler Transform

Organisatoren

Travis Gagie (Universidad Diego Portales, CL)
Giovanni Manzini (University of Eastern Piedmont – Alessandria, IT)
Gonzalo Navarro (University of Chile – Santiago de Chile, CL)
Jens Stoye (Universität Bielefeld, DE)

Auskunft zu diesem Dagstuhl-Seminar erteilen

Susanne Bach-Bernhard zu administrativen Fragen

Andreas Dolzmann zu wissenschaftlichen Fragen

Dokumente

Teilnehmerliste
Gemeinsame Dokumente
Programm des Dagstuhl-Seminars [pdf]

Motivation

When it was introduced in a technical report in May 1994, no one could have foreseen the impact the Burrows-Wheeler Transform (BWT) would have far beyond the field of data compression for which it was originally intended. Of course it first made a significant impact on compression, both in theory and in practice (e.g., as the basis for bzip2). New horizons opened up in 2000 with the introduction of the FM-index, a compressed suffix array based on the BWT. Among other applications, in the next decade FM-indexes became the heart of the DNA aligners such as Bowtie, BWA and SOAP 2 that helped pave the way for the genomics revolution.

Generalizations of BWT to labelled trees, de Bruijn graphs, automata, haplotype sequences and genomic reference graphs have kept the exchange of ideas lively between researchers in algorithms and data structures, bioinformatics, combinatorics on words and information retrieval. Burrows and Wheeler's original technical report is still cited hundreds of times every year, subsequent papers are cited thousands of times, and new results about or using the BWT appear in the top conferences and journals.

By now, 25 years since its publication, probably no one person knows all the results that have been proven about the BWT, but we hope the expertise gathered together at this Dagstuhl Seminar will make progress on the following topics, among others:

FM-indexes for genomic databases: FM-indexes shine for indexing one or a few genomes, but they have not scaled well to indexing the genomic databases that have resulted from high-throughput sequencing technologies. An important problem has been that the suffix array samples used to locate occurrences of patterns must be fairly large or locating becomes very slow. Very recently, a way was discovered to greatly compress also the suffix array sample for repetitive texts, opening the door to indexing thousands of genomes. We expect this seminar will lead to a fuller understanding of this advance and how it can be applied in practice. Another challenge has been beating run-length compression of genomic databases’ BWTs, by identifying additional structure.

More generalizations of the BWT: Many BWT researchers have heard of the generalizations to trees and graphs mentioned above, but it seems few except specialists in algorithms and data structures know about its recent extension to indexed parameterized and order-preserving pattern matching, few except specialists in combinatorics on words know about the alternating BWT, and few except bioinformaticians know about the positional BWT — but each of these may have applications in the other areas. Also, a partially unifying framework has recently been proposed, but there are still many open problems.

New challenges in bioinformatics: Papers on the BWT are published in many venues and no single conference brings together all the experts from algorithms and data structures, combinatorics on words, and theoretical and applied bioinformatics. This disconnect between the areas hurts us all because it prevents knowledge being shared efficiently. The BWT has recently been applied to some surprising bioinformatics problems, such as building ancestral recombination graphs and optical read mapping, and we expect other possibilities will emerge from interdisciplinary discussions.

Motivation text license
  Creative Commons BY 3.0 DE
  Travis Gagie, Giovanni Manzini, Gonzalo Navarro, and Jens Stoye

Classification

  • Bioinformatics
  • Data Bases / Information Retrieval
  • Data Structures / Algorithms / Complexity

Keywords

  • Burrows-Wheeler Transform
  • Data Compression
  • Indexing
  • Sequence Alignment

Dokumentation

In der Reihe Dagstuhl Reports werden alle Dagstuhl-Seminare und Dagstuhl-Perspektiven-Workshops dokumentiert. Die Organisatoren stellen zusammen mit dem Collector des Seminars einen Bericht zusammen, der die Beiträge der Autoren zusammenfasst und um eine Zusammenfassung ergänzt.

 

Download Übersichtsflyer (PDF).

Publikationen

Es besteht weiterhin die Möglichkeit, eine umfassende Kollektion begutachteter Arbeiten in der Reihe Dagstuhl Follow-Ups zu publizieren.

Dagstuhl's Impact

Bitte informieren Sie uns, wenn eine Veröffentlichung ausgehend von
Ihrem Seminar entsteht. Derartige Veröffentlichungen werden von uns in der Rubrik Dagstuhl's Impact separat aufgelistet  und im Erdgeschoss der Bibliothek präsentiert.