Dagstuhl Seminar 18161: Visualization of Biological Data

Dagstuhl Seminar 18161

Visualization of Biological Data – Crossroads

( Apr 15 – Apr 20, 2018 )

Permalink

Please use the following short url to reference this page: https://www.dagstuhl.de/18161

Organizers

Jan Aerts (KU Leuven, BE)
Nils Gehlenborg (Harvard University, US)
Georgeta Elisabeta Marai (University of Illinois - Chicago, US)
Kay Katja Nieselt (Universität Tübingen, DE)

Contact

Michael Gerke (for scientific matters)
Annette Beyer (for administrative matters)

Publications

Visualization of Biological Data - Crossroads (Dagstuhl Seminar 18161). Jan Aerts, Nils Gehlenborg, Georgeta Elisabeta Marai, and Kay Katja Nieselt. In Dagstuhl Reports, Volume 8, Issue 4, pp. 32-71, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2018)

Motivation

The rapidly expanding application of experimental high-throughput and high-resolution methods in biology is creating enormous challenges for the visualization of biological data. To meet these challenges, a large variety of expertise from the visualization, bioinformatics and biology domains is required. This expertise encompasses visualization and design knowledge, algorithm design, strong implementation skills for analyzing and visualizing big data, statistical knowledge, and specific domain knowledge for different application problems. In particular, it is of increasing importance to develop powerful and integrative visualization methods combined with computational analytical methods. Furthermore, because of the growing relevance of visualization for bioinformatics, teaching visualization should also become part of the bioinformatics curriculum.

With this Dagstuhl Seminar we want to continue the process of community building across the disciplines of biology, bioinformatics, and visualization. We aim to bring together researchers from the different domains to discuss how to continue the BioVis interdisciplinary dialogue, to foster the development of an international community, to discuss the state of the art, and to identify areas of research that might benefit from joint efforts of all groups involved.

During the seminar we therefore envision addressing the following topics:

Challenges in the integrative and visual analysis of high-dimensional and complex medical data
Visual analysis of stochastic biological networks
Beyond genomes: visual representations of metagenomes and pangenomes
On the BioVis crossroads: Continued collaborations between the visualization and bioinformatics communities
Designing a curriculum for teaching visualization in bioinformatics

While the topics emphasize different, highly challenging problems in biology, we also envision identifying newly needed visualization paradigms and developments that can help to solve these challenges. Further topics may therefore be added, depending on the specific interests of the participants.

As one outcome of the seminar, we plan to summarize the results in a white paper, in which the computational, the visualization and the application domain aspects are collected and published to not only ensure that a broad audience is reached but also that our crossroads journey may continue.

Creative Commons BY 3.0 DE

Jan Aerts, Nils Gehlenborg, Georgeta Elisabeta Marai, and Kay Nieselt

Summary

The rapidly expanding application of experimental high-throughput and high-resolution methods in biology is creating enormous challenges for the visualization of biological data. To meet these challenges, a large variety of expertise from the visualization, bioinformatics and biology domains is required. This expertise encompasses visualization and design knowledge, algorithm design, strong implementation skills for analyzing and visualizing big data, statistical knowledge, and specific domain knowledge for different application problems. In particular, it is of increasing importance to develop powerful and integrative visualization methods combined with computational analytical methods. Furthermore, because of the growing relevance of visualization for bioinformatics, teaching visualization should also become part of the bioinformatics curriculum.

With this Dagstuhl Seminar we wanted to continue the process of community building across the disciplines of biology, bioinformatics, and visualization. We aimed to bring together researchers from the different domains to discuss how to continue the BioVis interdisciplinary dialogue, to foster the development of an international community, to discuss the state of the art, and to identify areas of research that might benefit from joint efforts of all groups involved.

Based on the topics identified in the seminar proposal, as well as the interest and expertise of the confirmed participants, the following four topics were chosen as focus areas for the seminar, in addition to the overarching topic of collaboration between the data visualization, bioinformatics, and biology communities:

Visualization challenges related to high-dimensional medical data. Patient data is increasingly available in many forms including genomic, transcriptomic, epigenetic, proteomic, histologic, radiologic, and clinical, resulting in large (100s of TBs, 1000s of patients), heterogeneous (dozens of data types per patient) data repositories. Repositories such as The Cancer Genome Atlas (TCGA) contain a multitude of patient records which can be used for patient stratification, for high-risk group and response to treatment discoveries, or for disease subtype/biomarker discoveries. Still, patient records from the clinic are used singularly to diagnose patients in the clinic without including likely insights from other sources. Similarly, molecular expression signatures from the omic sources barely impinge on the clinical observations. There is an urgent need to bridge the precision medicine gap between the laboratory and the clinic, as well as a need to bridge the quantitative sciences with biology. Additionally, many precision medicine studies plan to include sensor data (e.g. physical activity, sleep, and other patient-worn sensors) that will add another dimension of complexity that analysis and visualization tools need to take into account.

This highly relevant topic focused on visual analytic tools and collaborations that will promote and leverage notions of patient similarity across the phenotypical scales. Scalable and robust machine learning methods will need to work synergistically to integrate evidence of similarity while meaningful visual encodings should simultaneously summarize and illuminate patient similitude at the individual and group level. This topic is closely related to some of the topics below.

Visualization of biological networks. Modeling the stochasticity of genetic circuits is an important field of research in systems biology, and can help elucidate the mechanisms of cell behavior, which in turn can be the basis of diseases. These models can further enable predictions of important phenotypic cellular states. However, the analysis of stochastic probability distributions is difficult due to their spatiotemporal and multidimensional nature, and due to the typically large number of simulations run under varying settings. Moreover, stochastic network researchers often emphasize that what is of biological significance is often not of statistical significance: numerical analyses often miss small or rare events of particular biological relevance. A visual approach can help, in contrast, in mining the network dynamics through the landscape defined by these probability distributions.

Another major challenge relates to finding "stable behavior" of networks, including those recruited in signal transduction. Multistability and bistability have often been studied in metabolic chemically reactive networks. Necessary conditions have been formulated to imply the emergence of stable phenotypes. However, these methods have been deployed on small networks. Recently many groups have recognized that scalable methods can be explored using steady state or quasi steady state models that are derived from stoichiometry and rate-action kinetics. These unfortunately suffer from the lack of methods that will examine the large parametric space. Consider this: N interacting molecules imply N^2 interactions and in turn the same order of the governing "parameters" (activation rates and abundances). For even mid-size portions of salient pathways (EGFR, B-cell Receptor activation, etc.) finding stable states is challenging. It is certainly the case that a complete graph is never realized and sparsity and network mining can be used to glean the necessary structure. Design of experiments followed by visualization of parametric spaces will be required to search for these stable points. Furthermore, the huge size of this space may require new scalable approaches for the visualization.
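The scaling argument in this paragraph can be made concrete with a little arithmetic. The following is only an illustrative sketch, not a method from the seminar: the function names are hypothetical, the quadratic count is the worst-case (complete graph) bound mentioned above, and the grid search with a fixed number of levels per parameter is an assumption chosen to show how quickly an exhaustive scan of the parameter space becomes infeasible.

```python
def max_pairwise_interactions(n_molecules: int) -> int:
    """Worst-case number of ordered pairwise interactions (complete graph): N^2."""
    return n_molecules * n_molecules


def grid_size(n_parameters: int, levels: int) -> int:
    """Number of points in an exhaustive grid with `levels` values per parameter."""
    return levels ** n_parameters


# Hypothetical network sizes, chosen only for illustration.
for n in (5, 10, 50):
    params = max_pairwise_interactions(n)
    # Even with just two levels per parameter the grid explodes;
    # report the number of decimal digits instead of the value itself.
    digits = len(str(grid_size(params, 2)))
    print(f"N={n}: up to {params} parameters, 2-level grid has {digits} digits")
```

In practice the interaction graph is sparse, as the text notes, so the real parameter count is far below this bound; the sketch only shows why exhaustive scanning does not scale and why design of experiments and visual exploration are proposed instead.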

Visualization for pan-genomics. With the advent of next-generation sequencing we can observe the increase of genome data both in the field of metagenomics (simultaneous assessment of many species) as well as within the field of pan-genomics. In metagenomics, the aim is to understand the composition and operation of complex microbial consortia in environmental samples. In pan-genomics, on the other hand, genomes within a species are studied. While originally a pan-genome has been referred to as the full complement of genes in a clade (mainly a species in bacteria or archaea), this has recently been generalized to considering a pan-genome as any collection of genomic sequences to be analyzed jointly or to be used as a reference rather than a single genome.

In bioinformatics, both topics impose a number of computational challenges. For example, a recent review paper by Marschall et al. on "Computational Pan-Genomics: Status, Promises and Challenges" (DOI: 10.1093/bib/bbw089) addresses current efforts in this sub-area of bioinformatics. This area needs novel, qualitatively different computational methods and paradigms. While the development of new promising computational methods and new data structures both in metagenomics and pangenomics can be observed, a number of open challenges exist. One of them in the area of pangenomics is, for example, the transition from the representation of reference genomes as strings to representations as graphs. However, the important topic of pangenome visualization has not been addressed in the aforementioned review. Interestingly, this was taken up in a break-out session in a recent Dagstuhl seminar on "Next Generation Sequencing - Algorithms, and Software For Biomedical Applications", and identified as a topic of urgent interest and demand. One observation, for example, is that in pan-genomes there are segments of conserved regions interspersed by highly variable regions. An open question here is how to visualize the highly variable regions, or how to interpret their content in the context of their neighborhood. Other open visualization topics involve the visual representation of the graph structure underlying pangenomes.
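To illustrate the shift from string to graph representations, here is a minimal sketch under strong simplifying assumptions. It presumes that each genome has already been cut into shared, labelled segments (the strain names and segment identifiers below are made up), and it does not reproduce any particular pangenome tool or file format. Segments become nodes, adjacencies become edges, and a segment is called conserved only if every genome contains it.

```python
from collections import defaultdict


def build_segment_graph(genomes):
    """Return (edges, support) for genomes given as ordered lists of segment IDs.

    edges[a] is the set of segments that directly follow segment a in some genome;
    support[s] is the set of genome names that contain segment s.
    """
    edges = defaultdict(set)
    support = defaultdict(set)
    for name, segments in genomes.items():
        for seg in segments:
            support[seg].add(name)
        for a, b in zip(segments, segments[1:]):
            edges[a].add(b)
    return edges, support


def classify_segments(support, n_genomes):
    """Label a segment 'conserved' if all genomes contain it, else 'variable'."""
    return {
        seg: "conserved" if len(names) == n_genomes else "variable"
        for seg, names in support.items()
    }


# Hypothetical toy input: three strains sharing a conserved backbone s1 ... s4.
genomes = {
    "strain_A": ["s1", "s2", "s4"],
    "strain_B": ["s1", "s3", "s4"],
    "strain_C": ["s1", "s4"],
}
edges, support = build_segment_graph(genomes)
labels = classify_segments(support, len(genomes))
for seg in sorted(labels):
    print(seg, labels[seg], sorted(edges[seg]))
```

In this toy case the conserved segments s1 and s4 frame a variable region in which the strains take different paths (through s2, through s3, or directly), which is the kind of structure the open visualization questions above are concerned with.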

In the field of metagenomics some common visualization approaches, such as heatmaps or scatter plots in combination with principal component analyses, are used; however, many open challenges exist. In particular, those visualization tools that are developed for genomics studies fall short in representing large-scale, high-dimensional metagenomics studies. The magnitude of the data in particular makes it hard to meaningfully represent biologically valuable information from complex analysis results. Thus, in this topic too, the question of large-scale and heterogeneous data visualization is of central importance.

Curriculum development of biological data visualization. Parallel to the recognized need to teach bioinformatics students about big data in biology, there is a growing need to familiarise students with modern visual analytics methodologies applied to biological data, and to provide hands-on training. While several community members are teaching summer camps, tutorials, and workshops on biological data visualization, many of these educational sessions take the form of an introduction to specific tools. We find ourselves handling similar questions: what is exploratory data visualization, what is visual analytics, which frameworks exist for thinking about visualization, how can we explore the design space, and how can we visualise biological data to gain insight into them, so that hypotheses can be generated or explored and further targeted analyses can be defined?

Despite the increasing importance of visualization for bioinformatics, there is currently a general lack of integration into the bioinformatics education, and a useful and appropriate curriculum has not yet been developed. In this topic the following questions will be addressed: What should a modern and seminal curriculum for visualization in bioinformatics look like? How far along the introductory visualization courses should this curriculum go, while allowing biological data topics as well? What are the essential topics, and how can comprehensive training be achieved?

The schedule for the seminar was developed by the organizers based on previous successful Dagstuhl seminars. Emphasis was given to a balance between prepared talks and panels and breakout groups for less structured discussions focused on a selection of highly relevant topics. Three types of plenary presentations were available to participants who had indicated interest in presenting during the seminar: overview talks (20 minutes plus 10 minutes for questions), regular talks (10 minutes plus 5 minutes for questions), and panel presentations (5 minutes per speaker followed by a 20 to 25 minute discussion). The breakout groups met multiple times for several hours during the week and reported back to the overall group on several occasions. This format successfully brought bioinformatics and visualization researchers onto the same platform, and enabled researchers to reach a common, deep understanding through their questions and answers. It also stimulated very long, intense, and fruitful discussions that were deeply appreciated by all participants.

This report describes in detail the outcomes of this meeting. Our outcomes include a set of white papers summarizing the breakout sessions, overviews of the talks, and a detailed curriculum for biological data visualization courses.

Creative Commons BY 3.0 Unported license

Jan Aerts, Nils Gehlenborg, Georgeta Elisabeta Marai, and Kay Katja Nieselt

Participants

Jan Aerts (KU Leuven, BE)
Katja Bühler (VRVis - Wien, AT)
Sheelagh Carpendale (University of Calgary, CA)
James L. Chen (Ohio State University - Columbus, US)
Arlene Chung (University of North Carolina - School of Medicine, US)
Anamaria Crisan (University of British Columbia - Vancouver, CA)
Mirjam Figaschewski (Universität Tübingen, DE)
Angus Forbes (University of California, Santa Cruz, US)
Nils Gehlenborg (Harvard University, US)
Carsten Görg (University of Colorado - Aurora, US)
David H. Gotz (University of North Carolina - Chapel Hill, US)
Helena Jambor (TU Dresden, DE)
Jessie Kennedy (Edinburgh Napier University, GB)
Karsten Klein (Universität Konstanz, DE)
Anne Knudsen (University of Calgary, CA)
Barbora Kozlíková (Masaryk University - Brno, CZ)
Michael Krone (Universität Stuttgart, DE)
Martin Krzywinski (BC Cancer Research Centre - Vancouver, CA)
Alexander Lex (University of Utah - Salt Lake City, US)
Raghu Machiraju (The Ohio State University - Columbus, US)
Georgeta Elisabeta Marai (University of Illinois - Chicago, US)
Lennart Martens (Ghent University, BE)
Ewy A. Mathé (Ohio State University - Columbus, US)
Torsten Möller (Universität Wien, AT)
Scooter Morris (UC - San Francisco, US)
Cydney Nielsen (BC Cancer Agency - Vancouver, CA)
Kay Katja Nieselt (Universität Tübingen, DE)
Bruno Pinaud (University of Bordeaux, FR)
James Procter (University of Dundee, GB)
William Ray (Ohio State University - Columbus, US)
Jens Rittscher (University of Oxford, GB)
Jos B.T.M. Roerdink (University of Groningen, NL)
Timo Ropinski (Universität Ulm, DE)
Ryo Sakai (PharmiWeb Solutions - Bracknell, GB)
Falk Schreiber (Universität Konstanz, DE)
Christian Stolte (New York Genome Center, US)
Marc Streit (Johannes Kepler Universität Linz, AT)
Granger Sutton (J. Craig Venter Institute - Rockville, US)
Danielle Szafir (University of Colorado - Boulder, US)
Cagatay Turkay (City - University of London, GB)
Michel A. Westenberg (TU Eindhoven, NL)
Blaz Zupan (University of Ljubljana, SI)

Related Seminars

Dagstuhl Seminar 12372: Biological Data Visualization (2012-09-09 - 2012-09-14)
Dagstuhl Seminar 21401: Visualization of Biological Data - From Analysis to Communication (2021-10-03 - 2021-10-08)
Dagstuhl Seminar 23451: Visualization of Biomedical Data - Shaping the Future and Building Bridges (2023-11-05 - 2023-11-10)
Dagstuhl Seminar 26101: Contextualising Complexity - Faithful Visualisations for Biology (2026-03-01 - 2026-03-06)

Classification

bioinformatics
data structures / algorithms / complexity

Keywords

Visualisation
Visual Analytics
Sequence analysis
Omics
Imaging
