Dagstuhl Seminar 16351: Next Generation Sequencing – Algorithms, and Software For Biomedical Applications

(Aug 28 – Sep 02, 2016)

Permalink

Please use the following short url to reference this page: https://www.dagstuhl.de/16351

Organizers

Gene Myers (MPI - Dresden, DE)
Mihai Pop (University of Maryland - College Park, US)
Knut Reinert (FU Berlin, DE)
Tandy Warnow (University of Illinois - Urbana-Champaign, US)

Contact

Andreas Dolzmann (for scientific matters)
Annette Beyer (for administrative matters)

Publications

Next Generation Sequencing (Dagstuhl Seminar 16351). Gene Myers, Mihai Pop, Knut Reinert, and Tandy Warnow. In Dagstuhl Reports, Volume 6, Issue 8, pp. 91-130, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2017)

Motivation

In recent years, Next Generation Sequencing (NGS) data have begun to appear in many applications that are clinically relevant, such as resequencing of cancer patients, disease-gene discovery and diagnostics for rare diseases, microbiome analyses, and gene expression profiling, to name but a few. Other fields of biological research, such as phylogenomics, functional genomics, and metagenomics, are also making increasing use of the new sequencing technologies.

The analysis of sequencing data is demanding because of the enormous data volume and the need for fast turnaround time, accuracy, reproducibility, and data security. Addressing these issues requires expertise in a large variety of areas: algorithm design, high performance computing on big data (and hardware acceleration), statistical modeling and estimation, and specific domain knowledge for each medical problem. In this Dagstuhl Seminar we aim to bring together leading experts from both sides – computer scientists, including theoreticians, algorithmicists, and tool developers, as well as leading researchers who work primarily on the application side in the biomedical sector – to discuss the state of the art and to identify areas of research that might benefit from a joint effort of all the groups involved.

The key goal of this seminar is a free and deep exchange of ideas and needs between the community of algorithmicists and theoreticians and the community of practitioners from the biomedical field. On the one hand, state-of-the-art methods from computer science will be presented to experimentalists. On the other hand, leading experimentalists will present novel techniques and sketch problems arising from these techniques. Following this exchange, we will discuss the implications that new types of data or experimental protocols have for the algorithms and data structures needed. The schedule will move from applications to algorithms and implementations, so that the experimentalists can leave the seminar earlier if they wish.

We envision a number of topics that we will address in the seminar such as:

Data structures and algorithms for large data sets: The Burrows-Wheeler transform (BWT) and the subsequent search indices by Ferragina and Manzini triggered a plethora of BWT-based read mappers. Fields like pan- and metagenomics, phylogenomics, and genome analysis of large cohorts will pose new computational problems and challenges.
Challenges arising from new experimental frontiers: New experimental technologies offer exciting opportunities for computational research. Examples of such technologies are single cell sequencing, long read sequencing, long-range restriction mapping, and hybrid sequencing strategies (e.g. Moleculo and 10X Genomics). We will discuss the impact of new technologies on the underlying computational models and methods.
Software engineering (tools, testing, and libraries), hardware acceleration: NGS sequence analysis is currently largely driven by academic tools that exhibit considerable variability in their quality and maintainability. This is somewhat disconcerting, given that such tools will soon be used to compute results upon which treatment decisions will be made. The use of software libraries or adherence to a set of coding standards would help alleviate the problem, but neither is currently in widespread use. Hence there is a need to discuss this issue.
New problems in the upcoming age of genomes: Genome structure and variation: Genomic variations between individuals and within (microbial) communities are increasingly being linked to interesting biological effects, e.g. in cancer genesis. Variations include not only single nucleotide variations (SNVs) but also larger-scale, so-called structural variations. While a number of methods for detecting variants already exist, approaches for analyzing the impact of variants on specific phenotypes are still in their infancy.
Training of interdisciplinary experts: Current and future computational approaches can rarely be used blindly and often need to be adapted to the experiment at hand. There is thus a strong need to train interdisciplinary experts. Given the time constraints in undergraduate and graduate curricula, it is challenging to identify what skills should be taught. Also, the dynamic nature of our field makes textbooks and other teaching materials obsolete in a matter of years. How can the development of appropriate training materials keep up with the rapid changes in our field, both in content and in the types of skills that future biomedical scientists will need?
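
As a small illustration of the BWT-based indexing mentioned in the first topic above, the following Python sketch builds a Burrows-Wheeler transform and counts pattern occurrences by backward search, the core operation of the Ferragina-Manzini index. It is a toy under stated assumptions, not how any particular read mapper is implemented: the function names `bwt` and `count_occurrences` are hypothetical, the transform is built by naively sorting all rotations, and the occurrence table is stored uncompressed, so it is only suitable for very short texts.

```python
from collections import Counter


def bwt(text: str, sentinel: str = "$") -> str:
    """Return the Burrows-Wheeler transform of text (naive rotation sort)."""
    if sentinel in text:
        raise ValueError("text must not contain the sentinel character")
    s = text + sentinel  # the sentinel must sort before every text character
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def count_occurrences(bwt_text: str, pattern: str) -> int:
    """Count occurrences of pattern in the original text via backward search."""
    counts = Counter(bwt_text)

    # c_table[c]: number of characters in the text that are smaller than c.
    c_table, total = {}, 0
    for ch in sorted(counts):
        c_table[ch] = total
        total += counts[ch]

    # occ[c][i]: number of occurrences of c in bwt_text[:i] (uncompressed).
    occ = {ch: [0] * (len(bwt_text) + 1) for ch in counts}
    for i, ch in enumerate(bwt_text):
        for key, row in occ.items():
            row[i + 1] = row[i] + (1 if key == ch else 0)

    # Maintain a half-open interval [lo, hi) of sorted rotations that start
    # with the pattern suffix processed so far, extending it right to left.
    lo, hi = 0, len(bwt_text)
    for ch in reversed(pattern):
        if ch not in c_table:
            return 0
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    index = bwt("ACGTACGT")
    print(index)
    print(count_occurrences(index, "ACG"))
```

Each step of the backward search narrows the interval using only the two tables, so the query time depends on the pattern length rather than the text length. Practical indices replace the dense `occ` table with sampled or compressed rank structures to keep memory usage manageable for genome-scale texts.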

Summary

Motivation

The analysis of sequencing data is demanding because of the enormous data volume and the need for fast turnaround time, accuracy, reproducibility, and data security. Addressing these issues requires expertise in a large variety of areas: algorithm design, high performance computing on big data (and hardware acceleration), statistical modeling and estimation, and specific domain knowledge for each medical problem. In this Dagstuhl Seminar we aimed to bring together leading experts from both sides – computer scientists, including theoreticians, algorithmicists, and tool developers, as well as leading researchers who work primarily on the application side in the biomedical sector – to discuss the state of the art and to identify areas of research that might benefit from a joint effort of all the groups involved.

Goals of the seminar

The key goal of this seminar was a free and deep exchange of ideas and needs between the community of algorithmicists and theoreticians and the community of practitioners from the biomedical field. This exchange was meant to trigger discussions about the implications that new types of data or experimental protocols have for the algorithms and data structures needed.

Results

We started the seminar with a number of challenge talks to encourage discussion about the various topics introduced in the proposal. Before the seminar started we identified three areas the participants were most interested in, namely:

Data structures and algorithms for large data sets, hardware acceleration
New problems in the upcoming age of genomes
Challenges arising from new experimental frontiers and validation

For the first area, Laurent Mouchard, Gene Myers, and Simon Gog presented results and challenges; for the second area, Siavash Mirarab, Niko Beerenwinkel, Shibu Yooseph, and Kay Nieselt introduced some thoughts; and for the last area, Jason Chin, Ewan Birney, Alice McHardy, and Pascal Costanza talked about challenges. Abstracts for most of those talks can be found in the seminar report. Following this introductory phase, the participants organized themselves into working groups on relatively broad topics. These first breakout groups covered:

Haplotype phasing
Big data
Pangenomics data representation
Cancer genomics
Metagenomics
Assembly

The results of the groups were discussed in plenary sessions interleaved with some impromptu talks. Afterwards, the participants split up into smaller, more focused breakout groups, which were very well received. Indeed, some participants went on to extend data formats for assembly or to improve recent results on full-text string indices.

Based on the initial feedback from the participants, we think that the topic of the seminar was interesting and led to a lively exchange of ideas. We thus intend to revisit the field in another Dagstuhl seminar in the coming years, most likely organized by different leaders of the field in order to account for upcoming changes in the field. In such a seminar we intend to encourage more people from clinical bioinformatics to join the discussions.

Creative Commons BY 3.0 Unported license

Gene Myers, Mihai Pop, Knut Reinert, and Tandy Warnow

Participants

Niko Beerenwinkel (ETH Zürich - Basel, CH)
Ewan Birney (European Bioinformatics Institute - Cambridge, GB)
Christina Boucher (Colorado State University - Fort Collins, US)
Jason Chin (Pacific Biosciences - Menlo Park, US)
Pascal Costanza (Intel Corporation, BE)
Anthony J. Cox (Illumina - United Kingdom, GB)
Fabio Cunial (MPI - Dresden, DE)
Richard Durbin (Wellcome Trust Sanger Institute - Cambridge, GB)
Mohammed El-Kebir (Brown University - Providence, US)
Anne-Katrin Emde (New York Genome Center, US)
Simon Gog (KIT - Karlsruher Institut für Technologie, DE)
Hannes Hauswedell (FU Berlin, DE)
Daniel H. Huson (Universität Tübingen, DE)
André Kahles (ETH Zürich, CH)
Birte Kehr (deCode Genetics - Reykjavik, IS)
Gunnar W. Klau (CWI - Amsterdam, NL)
Oliver Kohlbacher (Universität Tübingen, DE)
Ben Langmead (Johns Hopkins University - Baltimore, US)
Pietro Lio (University of Cambridge, GB)
Veli Mäkinen (University of Helsinki, FI)
Tobias Marschall (Universität des Saarlandes, DE)
Alice Carolyn McHardy (Helmholtz Zentrum - Braunschweig, DE)
Siavash Mirarab (University of California at San Diego, US)
Laurent Mouchard (University of Rouen, FR)
Gene Myers (MPI - Dresden, DE)
Luay Nakhleh (Rice University - Houston, US)
Kay Katja Nieselt (Universität Tübingen, DE)
Enno Ohlebusch (Universität Ulm, DE)
Adam M. Phillippy (National Institutes of Health - Rockville, US)
Mihai Pop (University of Maryland - College Park, US)
Simon J. Puglisi (University of Helsinki, FI)
Gunnar Rätsch (ETH Zürich, CH)
Tobias Rausch (EMBL - Heidelberg, DE)
Knut Reinert (FU Berlin, DE)
Karin Remington (Computationality, US)
Bernhard Renard (Robert Koch Institut - Berlin, DE)
S. Cenk Sahinalp (Simon Fraser University - Burnaby, CA)
Enrico Siragusa (IBM TJ Watson Research Center - Yorktown Heights, US)
Peter F. Stadler (Universität Leipzig, DE)
Granger Sutton (The J. Craig Venter Institute - Rockville, US)
German Tischler-Höhle (MPI - Dresden, DE)
Esko Ukkonen (University of Helsinki, FI)
Tandy Warnow (University of Illinois - Urbana-Champaign, US)
David Weese (SAP Innovation Center - Potsdam, DE)
Shibu Yooseph (University of Central Florida - Orlando, US)

Classification

bioinformatics
data structures / algorithms / complexity
software engineering

Keywords

Sequence analysis
DNA Sequence Assembly
Expression Profiles
Cancer
Human Disease
Software Engineering (Tools & Libraries)
Next Generation Sequencing
