28.08.16 - 02.09.16, Seminar 16351

Next Generation Sequencing - Algorithms, and Software For Biomedical Applications

This seminar description was published on our web pages before the seminar and was used in the invitation to the seminar.

Motivation

In recent years, Next Generation Sequencing (NGS) data have begun to appear in many clinically relevant applications, such as resequencing of cancer patients, disease-gene discovery and diagnostics for rare diseases, microbiome analyses, and gene expression profiling, to name but a few. Other fields of biological research, such as phylogenomics, functional genomics, and metagenomics, are also making increasing use of the new sequencing technologies.

The analysis of sequencing data is demanding because of the enormous data volume and the need for fast turnaround times, accuracy, reproducibility, and data security. Addressing these issues requires expertise in a wide variety of areas: algorithm design, high-performance computing on big data (including hardware acceleration), statistical modeling and estimation, and specific domain knowledge for each medical problem. This Dagstuhl Seminar aims to bring together leading experts from both sides, namely computer scientists (theoreticians, algorithmicists, and tool developers) and leading researchers who work primarily on applications in the biomedical sector, to discuss the state of the art and to identify areas of research that might benefit from a joint effort of all the groups involved.

The key goal of this seminar is a free and in-depth exchange of ideas and needs between algorithmicists and theoreticians on the one side and practitioners from the biomedical field on the other. State-of-the-art methods from computer science will be presented to experimentalists, and, in turn, leading experimentalists will present novel techniques and sketch the problems arising from them. Following this exchange, we will discuss the implications that new types of data or experimental protocols have for the required algorithms and data structures. The schedule will move from applications to algorithms and implementations, so that experimentalists can leave the seminar early if they wish.

Topics we envision addressing in the seminar include:

  • Data structures and algorithms for large data sets: The Burrows-Wheeler transform (BWT) and the search index subsequently developed on top of it by Ferragina and Manzini (the FM-index) triggered a plethora of BWT-based read mappers. Fields such as pan- and metagenomics, phylogenomics, and genome analysis of large cohorts will pose new computational problems and challenges.
  • Challenges arising from new experimental frontiers: New experimental technologies offer exciting opportunities for computational research. Examples include single-cell sequencing, long-read sequencing, long-range restriction mapping, and hybrid sequencing strategies (e.g., Moleculo and 10X Genomics). We will discuss the impact of these technologies on the underlying computational models and methods.
  • Software engineering (tools, testing, and libraries) and hardware acceleration: NGS sequence analysis is currently driven largely by academic tools that vary considerably in quality and maintainability. This is somewhat disconcerting, given that such tools will soon be used to compute results on which treatment decisions are based. The use of software libraries or adherence to a set of coding standards would help alleviate the problem, but neither is currently in widespread use. This issue therefore needs to be discussed.
  • New problems in the upcoming age of genomes (genome structure and variation): Genomic variations between individuals and within (microbial) communities are increasingly being linked to interesting biological effects, e.g., in carcinogenesis. Variations include not only single nucleotide variations (SNVs) but also larger-scale, so-called structural variations. While a number of methods for detecting variants already exist, approaches for analyzing the impact of variants on specific phenotypes are still in their infancy.
  • Training of interdisciplinary experts: Current and future computational approaches can rarely be applied blindly; they often need to be adapted to the experiment at hand. There is thus a strong need to train interdisciplinary experts. Given the time constraints of undergraduate and graduate curricula, it is challenging to decide which skills should be taught. Moreover, the dynamic nature of our field makes textbooks and other teaching materials obsolete within a few years. How can the development of appropriate training materials keep pace with the rapid changes in our field, both in content and in the types of skills future biomedical scientists will need?