http://www.dagstuhl.de/13491

December 1 – 6, 2013, Dagstuhl Seminar 13491

Computational Mass Spectrometry

Organizers

Rudolf Aebersold (ETH Zürich, CH)
Oliver Kohlbacher (Universität Tübingen, DE)
Olga Vitek (Purdue University – West Lafayette, US)

Documents

Dagstuhl Report, Volume 3, Issue 12

Summary

Motivation

Mass spectrometry (MS) is an analytical technique of immense versatility. Detection of explosives at airports, urine tests for doping in sports, tests for cancer biomarkers in a clinic: all of these rely on mass spectrometry as the key analytical technique. During the last decade, technological advances have resulted in a flood of mass spectrometric data (high-resolution mass spectrometry; mass spectrometry coupled to high-performance liquid chromatography, HPLC-MS). The publication of the first human genome in 2001 was a key event that enabled the explosive development of proteomics, which led to the conception of the Human Proteome Project in 2010. Today, mass spectrometric techniques are an indispensable tool in the life sciences. Their development, however, is increasingly hampered by the lack of computational tools for analyzing the data. Modern instrumentation can easily produce data sets of hundreds of gigabytes from an individual sample. Most experimental groups are no longer able to deal with both the amount and the inherent complexity of these data. Computer science has the necessary tools to address these problems. It is thus necessary to intensify collaboration between the three key communities involved: life scientists applying MS; analytical chemists and engineers developing the instruments; and computer scientists, bioinformaticians and statisticians developing algorithms and software for data analysis.

Goals

The seminar 'Computational Mass Spectrometry' is a follow-up to the successful Dagstuhl seminars on 'Computational Proteomics' (05471 and 08101). The different title was chosen to reflect the growing scope of computational mass spectrometry: from proteomics to metabolomics, lipidomics, and glycomics.

The goal of the seminar was thus to assess the state of the art for the field of computational mass spectrometry as a whole and to identify the challenges the field will face in the years to come. To this end we put together a list of participants covering both computational and experimental aspects of mass spectrometry, from industry and academia around the world. The results of these discussions were then to be summarized in a joint status paper.

Results

The seminar was very productive and led to a number of tangible outcomes summarized below.

The Big Challenges

Not unexpectedly, it turned out to be difficult to identify the big challenges of the coming years, and views on them differed considerably. After lengthy discussions, we were able to categorize the challenges. We are currently finalizing the draft of a paper on these challenges for computational mass spectrometry, which is to be submitted by the end of March 2014. The paper is the joint work of all participants and will document the current state of the field. The challenges identified were the following:

Challenges of computational and statistical interpretation of mass spectra
  • Identification
    Identification of analytes is still a challenge. In proteomics, the identification of post-translational modifications and of different proteoforms poses problems. The identification of non-tryptic peptides (peptidomics, MHC ligands) is also an interesting problem. Estimation of false-discovery rates based on target-decoy approaches has been criticized, but there is still a distinct lack of established alternatives. With the increasing interest in small-molecule mass spectrometry, the identification of metabolites, glycans, and lipids is increasingly becoming an issue, and algorithmic support for it is still lacking.
  • Quantification
    Quantification faces challenges due to the still-growing diversity of experimental methods for analyte quantification, which necessitates the continual development of new computational approaches. There are also more fundamental statistical problems, for example, inferring the absence of an analyte from the absence of a signal. Quantification is also expected to contribute to the understanding of protein complexes and their stoichiometry.
Challenges arising from new experimental frontiers
  • Data-independent acquisition
    The recent developments of data-independent acquisition techniques resulted in a set of entirely new computational challenges due to the different structure of the underlying data.
  • Imaging
    Imaging mass spectrometry has become mature on the experimental side. The analysis of spatially resolved MS data, however, poses entirely new problems for computational mass spectrometry, with increased complexity and data volume.
  • Single-cell mass spectrometry
    Multi-parameter single cell mass spectrometry enables the characterization of rare and heterogeneous cell populations and prevents the typical averaging across a whole tissue/cell population. The key challenge will be the development of new computational tools able to define biologically meaningful cell types and then model the dynamic behaviour of the biological processes.
  • Top-down proteomics
    Despite the obvious advantages of top-down approaches for functional proteomics, isoform identification and related topics, the approach suffers from unmet challenges on the computational side. Methods for mass spectrum deconvolution need to be improved, and algorithms for the identification of multiple PTM sites are required.
Challenges of extracting maximal information from datasets
  • Democratization of data
    Public availability of large datasets enables novel types of studies in computational mass spectrometry (data mining). Standardized deposition of these data, and reliable repositories to handle them, remain a major problem that needs to be addressed.
  • Integration of MS data with different technologies
    Increasingly, computational biologists face data from multiple omics technologies. Integrating data from computational mass spectrometry across omics levels (genomics with transcriptomics, transcriptomics with proteomics, proteomics with metabolomics) poses a difficult data integration challenge, but will be essential for a more comprehensive view of the biological systems under study.
  • Visualization of heterogeneous data sets
    The amount, structure and complexity of large-scale mass spectrometric data turn out to be a challenging issue. While some end-users of these methods tend to be interested in a final, aggregated result of a complex data analysis pipeline, it is often essential to analyze the data conveniently down to the raw spectra. Tools for navigating these data sets on all levels are not yet available.
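
As an illustration of the target-decoy approach to false-discovery rate estimation mentioned under Identification above, the following is a minimal Python sketch. It assumes a concatenated target-decoy search in which every peptide-spectrum match (PSM) carries a score (higher is better) and a decoy flag. The function names, the tuple layout, and the example scores are hypothetical and chosen for illustration only; they are not taken from the seminar or from any particular search engine. The FDR at a score threshold is estimated as the number of decoys divided by the number of targets at or above that threshold, and the q-value of a PSM is the smallest FDR at which it would still be accepted.

```python
from typing import List, Tuple

PSM = Tuple[float, bool]  # (score, is_decoy); layout is an assumption of this sketch


def target_decoy_qvalues(psms: List[PSM]) -> List[Tuple[float, bool, float]]:
    """Return (score, is_decoy, q_value) tuples sorted by descending score."""
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)

    # Running FDR estimate: decoys / targets among all PSMs scoring at
    # least as well as the current one (capped at 1.0). Ties are not
    # treated specially in this sketch.
    fdrs = []
    targets = decoys = 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(min(1.0, decoys / max(targets, 1)))

    # q-value: minimum FDR over this threshold and all more permissive ones.
    qvalues = [0.0] * len(fdrs)
    running_min = float("inf")
    for i in range(len(fdrs) - 1, -1, -1):
        running_min = min(running_min, fdrs[i])
        qvalues[i] = running_min

    return [(s, d, q) for (s, d), q in zip(ranked, qvalues)]


def accepted_targets(psms: List[PSM], max_q: float) -> List[float]:
    """Scores of target PSMs whose q-value does not exceed max_q."""
    return [s for s, d, q in target_decoy_qvalues(psms) if not d and q <= max_q]


# Made-up example scores, for illustration only.
EXAMPLE_PSMS: List[PSM] = [
    (9.1, False), (8.7, False), (8.2, True),
    (7.9, False), (7.5, False), (6.0, True),
]

if __name__ == "__main__":
    for score, is_decoy, q in target_decoy_qvalues(EXAMPLE_PSMS):
        label = "decoy" if is_decoy else "target"
        print(f"{score:5.1f}  {label:6s}  q={q:.2f}")
```

The sketch also makes the criticism noted above easy to see: the estimate depends entirely on the decoys behaving like incorrect target matches, and with few PSMs a single decoy changes the estimate substantially.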

Community Building

Participants felt that computational mass spectrometry lacks a structured community. Researchers in computational mass spectrometry come from diverse backgrounds: statistics, computer science, analytical sciences, biology, or medicine. Traditionally they are thus organized in different scientific societies, for example the International Society for Computational Biology (ISCB), the American Society for Mass Spectrometry (ASMS), the Human Proteome Organization (HUPO), the Metabolomics Society, and of course various national societies. Many participants attend both computational and experimental conferences in the area of mass spectrometry organized by these different organizations. Participants suggested forming subgroups for computational mass spectrometry in the different societies. At the same time, in order to avoid duplication of structures and efforts, it was planned to share these subgroups across the societies, establish joint chairs of these groups, organize joint workshops, and coordinate educational activities.

After the Dagstuhl seminar we contacted ISCB and HUPO to discuss the formation of these subgroups. After intensive discussion with the societies, HUPO and ISCB both agreed to this plan. A HUPO subgroup CompMS on computational mass spectrometry was formed. In parallel, ISCB agreed to form a Community of Special Interest (CoSI) CompMS. Both subgroups share a joint structure. A joint steering committee (Oliver Kohlbacher, Olga Vitek, Shoba Ranganathan, Henning Hermjakob, and Ruedi Aebersold) has been established to guide both groups through their formation period. The groups have set up a joint mailing list and a website, and are currently planning initial kick-off meetings as satellite workshops to ISMB 2014 (Boston) and HUPO 2014 (Madrid).

Teaching Initiative

Recognizing the great need for educational materials for various audiences (bioinformaticians, biologists, computer scientists), some participants launched an initiative to assemble these materials as online courses. Discussions of this initiative are well advanced. The current plan is to define a core curriculum for mass spectrometry, which will be open for discussion within the computational mass spectrometry community. Once the contents of the core curriculum have been established, tutorial papers will be solicited for the various modules of the curriculum. These papers will refer to each other, will use a coherent vocabulary and notation, and will appear as an online paper collection in PLoS Computational Biology (edited by Theodore Alexandrov). Additional materials will be included, for example, online courses and lecture videos. An initial tutorial workshop is currently being planned to kickstart the further development of the curriculum.

Reviewing Guidelines

A working group discussed the problems that computational papers face in the reviewing process. The main driver for this discussion was expediting the review process, specifically by reducing the number of review cycles. It is worth noting that the Journal of Proteome Research (JPR), published by the American Chemical Society (ACS), presents a special case, since this journal is the only one in the field that does not have a regular mechanism for reviewers to see the comments of the other reviewers and the corresponding responses of the authors after each round of review. The proposal initially on the table was to share all reviews among reviewers and invite comments and changes before the editorial decision is made in the first round of review. Such a system is, for instance, already in place via EasyChair (software used for the RECOMB meetings, but not for proteomics journals). After discussion, it was decided that it would clearly be beneficial if JPR distributed all reviews among reviewers after each stage of revision. It was felt, however, that collecting comments and feedback from the reviewers (based on sending them all reviews) before the editor reaches an initial decision would only be necessary in cases of substantive disagreement among reviewers on critical points. These ad hoc communications can be handled in a semi-manual way within the existing manuscript management systems used by the proteomics journals, with the added benefit of maintaining an audit trail for the process. The reviewing guidelines developed by the participants in Dagstuhl are currently being discussed with the editorial boards of different journals (currently J. Proteome Res. and Mol. Cell. Prot.).

License
  Creative Commons BY 3.0 Unported license
  Ruedi Aebersold and Oliver Kohlbacher and Olga Vitek

Classification

  • Bioinformatics
  • Data Bases / Information Retrieval
  • Data Structures / Algorithms / Complexity

Keywords

  • Computational Mass Spectrometry
  • Proteomics
  • Metabolomics
  • Bioinformatics
  • Statistics
