Mass Spectrometry (MS) is an analytical technique of immense versatility. Advances in mass spectrometers and in the associated experimental workflows, as well as the sequencing of the genomes of many organisms, now enable versatile, high-throughput, and sensitive proteomic and metabolomic experiments. A flood of mass spectrometric data is currently being produced. The major bottleneck is therefore no longer the limited availability of data, but our limited ability to synthesize and interpret it.
Computational and statistical methods are key to fast, objective, and reproducible synthesis and interpretation of mass spectrometric measurements. Substantial efforts have recently been invested in developing algorithms and tools for these tasks. However, many challenges remain, and they can only be addressed through a strong synergy between the life sciences and computational research. It is thus necessary to intensify communication and collaboration between three key communities: life scientists who use mass spectrometry as a tool for biological and biomedical investigations, analytical chemists developing the instruments and the experimental protocols, and computer scientists, bioinformaticians, and statisticians developing algorithms and software.
The goal of this seminar is to bring together the leading experts from these three communities, assess the state of the field of computational mass spectrometry, and identify areas in need of innovative solutions. On the one hand, life scientists will have the opportunity to present novel experimental techniques and discuss open problems. On the other hand, computational scientists will present existing tools and potentially promising new computational techniques.
The organization of the seminar will follow the long-standing Dagstuhl tradition in that the organizers will only outline the plan of the week and the areas for discussion. The exact schedule and the topics of the presentations will be determined dynamically by the participants. The overall setting is designed to encourage informal interactions. This seminar is the third mass spectrometry-oriented seminar at Dagstuhl. Our past experience indicates that this will be an exciting seminar with more intense discussions and debates than a typical workshop.
Mass Spectrometry (MS) is an analytical technique of immense versatility. Detection of explosives at airports, urine tests for doping in sports, and tests for cancer biomarkers in the clinic all rely on mass spectrometry as the key analytical technique. During the last decade, technological advances have resulted in a flood of mass spectrometric data. These advances include high-resolution mass spectrometry and mass spectrometry coupled to high-performance liquid chromatography (HPLC-MS). The publication of the first human genome in 2001 was a key event that enabled the explosive development of proteomics, which led to the conception of the Human Proteome Project in 2010. Today, mass spectrometric techniques are an indispensable tool in the life sciences. Their development, however, is increasingly hampered by the lack of computational tools for analyzing the data. Modern instrumentation can easily produce data sets of hundreds of gigabytes from a single sample, and most experimental groups are no longer able to deal with both the amount and the inherent complexity of these data. Computer science has the necessary tools to address these problems. It is thus necessary to intensify collaboration between the three key communities involved: life scientists applying MS; analytical chemists and engineers developing the instruments; and computer scientists, bioinformaticians, and statisticians developing algorithms and software for data analysis.
The seminar 'Computational Mass Spectrometry' is a follow-up to the successful Dagstuhl seminars on 'Computational Proteomics' (05471 and 08101). The new title was chosen to reflect the growing scope of computational mass spectrometry: from proteomics to metabolomics, lipidomics, and glycomics.
The goal of the seminar was thus to assess the state of the art in the field of computational mass spectrometry as a whole and to identify the challenges the field will face in the coming years. To this end, we invited participants from industry and academia around the world, covering both the computational and the experimental aspects of mass spectrometry. The results of these discussions were to be summarized in a joint status paper.
The seminar was very productive and led to a number of tangible outcomes summarized below.
The Big Challenges
Not unexpectedly, it turned out to be difficult to identify the big challenges of the coming years, and views on this differed considerably. After lengthy discussions, we were able to categorize the challenges. We are currently finalizing the draft of a paper on these challenges for computational mass spectrometry, which is to be submitted by the end of March 2014. The paper is joint work of all participants and will document the current state of the field. The challenges identified were the following:
- Challenges of computational and statistical interpretation of mass spectra
Identification of analytes is still a challenge. In proteomics, the identification of post-translational modifications and of different proteoforms poses problems, as does the identification of non-tryptic peptides (peptidomics, MHC ligands). Estimating false-discovery rates with target-decoy approaches has been criticized, but established alternatives are still distinctly lacking. With the increasing interest in small-molecule mass spectrometry, the identification of metabolites, glycans, and lipids is becoming an issue, and algorithmic support for it is still lacking.
Quantification faces challenges due to the still-growing diversity of experimental methods for analyte quantification, which necessitates the continual development of new computational approaches. There are also more fundamental statistical problems, for example, inferring the absence of an analyte from the absence of a signal. Quantification is also expected to contribute to the understanding of protein complexes and their stoichiometry.
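The basic target-decoy idea works as follows. Spectra are searched against a database of real (target) sequences concatenated with reversed or shuffled (decoy) sequences. The number of decoy hits above a score threshold then serves as an estimate of the number of false target hits above that threshold.

The Python sketch below is a minimal illustration of this idea. It rests on simplifying assumptions: higher scores are better, the search is a concatenated target-decoy search, and ties in score are ignored. The function name `target_decoy_qvalues` is hypothetical and not taken from any specific tool. The sketch estimates the FDR as decoys/targets at each threshold and converts these estimates into monotone q-values.

```python
from typing import List, Tuple

def target_decoy_qvalues(psms: List[Tuple[float, bool]]) -> List[Tuple[float, bool, float]]:
    """Estimate q-values for PSMs from a concatenated target-decoy search.

    psms: (score, is_decoy) pairs, where higher scores are better.
    Returns (score, is_decoy, q_value) triples sorted by descending score.
    Ties in score are not treated specially in this sketch.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)

    # FDR estimate at each threshold: decoys above it / targets above it.
    fdrs = []
    targets = decoys = 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # q-value: smallest estimated FDR at which this PSM would still be accepted,
    # i.e. the running minimum of the FDR taken from the lowest score upwards.
    qvalues = [0.0] * len(fdrs)
    running_min = float("inf")
    for i in range(len(fdrs) - 1, -1, -1):
        running_min = min(running_min, fdrs[i])
        qvalues[i] = running_min

    return [(score, is_decoy, q) for (score, is_decoy), q in zip(ranked, qvalues)]
```

A common convention is to report the target PSMs with q <= 0.01. The sketch relies on the assumption that decoy matches model incorrect target matches well, and that is the kind of assumption the approach has been criticized for.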
- Data-independent acquisition
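Recent developments in data-independent acquisition (DIA) techniques have resulted in a set of entirely new computational challenges, because the underlying data are structured differently. Fragment spectra are acquired for wide precursor isolation windows rather than for individually selected precursors, so each spectrum mixes fragments from many co-isolated analytes.
- Imaging mass spectrometry
Imaging mass spectrometry has matured on the experimental side. The analysis of spatially resolved MS data, however, poses entirely new problems for computational mass spectrometry, with increased complexity and data volume.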
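The Python sketch below illustrates why DIA data call for different algorithms. In data-dependent acquisition, one would look up the spectrum that was triggered for a given precursor. Here, one instead collects the chromatogram of a known fragment ion across all acquisition cycles, using only the spectra whose isolation window contains the precursor. This is a targeted-extraction strategy, one of several used for DIA.

The `Ms2Scan` record, the function name `extract_fragment_xic`, and the 20 ppm default tolerance are illustrative assumptions. They do not reflect the format or API of any particular tool.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Ms2Scan:
    rt: float                         # retention time (minutes)
    window: Tuple[float, float]       # precursor isolation window [low, high) in m/z
    peaks: List[Tuple[float, float]]  # fragment peaks as (m/z, intensity)

def extract_fragment_xic(
    scans: List[Ms2Scan],
    precursor_mz: float,
    fragment_mz: float,
    tol_ppm: float = 20.0,
) -> List[Tuple[float, float]]:
    """Extracted ion chromatogram of one fragment of one precursor from DIA scans."""
    tol = fragment_mz * tol_ppm * 1e-6
    xic = []
    for scan in scans:
        low, high = scan.window
        if not (low <= precursor_mz < high):
            continue  # the precursor was not co-isolated in this window
        # Sum all fragment peaks within the m/z tolerance; in DIA these may
        # also include fragments of other co-isolated precursors.
        intensity = sum(i for mz, i in scan.peaks if abs(mz - fragment_mz) <= tol)
        xic.append((scan.rt, intensity))
    return xic
```

In targeted-extraction workflows, the evidence for a peptide then comes mainly from whether several such fragment chromatograms co-elute. That is a different problem from matching one spectrum to one peptide.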
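As a rough illustration of the data structure involved, the sketch below treats an imaging data set as a dense cube of shape (rows, columns, m/z bins). It computes an ion image for one m/z value by summing, at every pixel, the intensities within a ppm tolerance.

Two assumptions apply. First, all pixel spectra have already been resampled onto one shared m/z axis; real data are often stored per pixel, for example in the imzML format. Second, the function name `ion_image` is hypothetical.

```python
import numpy as np

def ion_image(cube: np.ndarray, mz_axis: np.ndarray, target_mz: float,
              tol_ppm: float = 10.0) -> np.ndarray:
    """Ion image for target_mz from a dense imaging MS data cube.

    cube: intensities of shape (n_rows, n_cols, n_mz), one spectrum per pixel,
          all resampled onto the shared axis mz_axis (shape (n_mz,)).
    Returns an (n_rows, n_cols) array of summed intensities within the tolerance.
    """
    tol = target_mz * tol_ppm * 1e-6
    in_window = np.abs(mz_axis - target_mz) <= tol  # boolean mask over m/z bins
    return cube[:, :, in_window].sum(axis=2)
```

The dense representation also shows where the data volume comes from. A map of 100 x 100 pixels with 100,000 m/z bins already amounts to 10^9 values, about 4 GB at 4 bytes per value. For this reason, sparse or on-disk representations are often needed in practice.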
- Single-cell mass spectrometry
Multi-parameter single-cell mass spectrometry enables the characterization of rare and heterogeneous cell populations. It also avoids the averaging across a whole tissue or cell population that is typical of bulk measurements. The key challenge will be the development of new computational tools that can define biologically meaningful cell types and then model the dynamic behaviour of the underlying biological processes.
- Top-down proteomics
Despite the obvious advantages of top-down approaches for functional proteomics, isoform identification, and related topics, the approach still faces unmet challenges on the computational side. Methods for mass spectrum deconvolution need to be improved, and algorithms for identifying multiple PTM sites are required.
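Charge-state deconvolution can be illustrated with the relation between a protonated ion [M+zH]^z+ and its neutral mass, M = z * (m/z - m_p), where m_p ≈ 1.007276 Da is the proton mass. The naive Python sketch below interprets every peak at every candidate charge and returns the neutral-mass bin supported by the most peaks.

This is an illustration only. It ignores isotopic envelopes, noise, and overlapping proteoforms, all of which real deconvolution algorithms must handle. The function names, the charge range, and the 1 Da bin width are assumptions.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple

PROTON_MASS = 1.007276  # mass of a proton in Da

def neutral_mass(mz: float, z: int) -> float:
    """Neutral mass M of an [M+zH]z+ ion observed at m/z."""
    return z * (mz - PROTON_MASS)

def deconvolve_charge_series(
    mzs: List[float],
    charges: Iterable[int] = range(1, 51),
    bin_width: float = 1.0,
) -> Tuple[float, int]:
    """Naive charge-state deconvolution by voting on neutral-mass bins.

    Each peak is interpreted at every candidate charge; each peak votes at most
    once per mass bin. Returns (bin mass, number of supporting peaks).
    """
    charges = list(charges)
    votes = defaultdict(set)  # mass bin index -> indices of supporting peaks
    for idx, mz in enumerate(mzs):
        for z in charges:
            mass_bin = round(neutral_mass(mz, z) / bin_width)
            votes[mass_bin].add(idx)
    # Most supporting peaks wins; ties go to the lower mass to suppress
    # harmonics (the same peaks at doubled or tripled charges support 2M, 3M).
    best = max(votes, key=lambda b: (len(votes[b]), -b))
    return best * bin_width, len(votes[best])
```

The tie-breaking rule is a crude guard against harmonic artifacts. A charge series z = 10..15 for mass M is explained equally well as 2M at charges 20..30 or 3M at charges 30..45. This is a small instance of the ambiguities that make deconvolution hard.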
- Democratization of data
Public availability of large datasets enables novel types of studies in computational mass spectrometry (data mining). However, the standardized deposition of such data, and reliable repositories to handle them, are still a major problem that needs to be addressed.
- Integration of MS data with different technologies
Increasingly, computational biologists face data from multiple omics technologies. Integrating data from computational mass spectrometry across omics levels (genomics with transcriptomics, transcriptomics with proteomics, proteomics with metabolomics) poses a difficult data integration challenge, but will be essential for a more comprehensive view of the biological systems under study.
- Visualization of heterogeneous data sets
The amount, structure, and complexity of large-scale mass spectrometric data make visualization a challenging issue. Some end-users of these methods are mainly interested in the final, aggregated result of a complex data analysis pipeline. Even so, it is often essential to be able to inspect the data conveniently all the way down to the raw spectra. Tools for navigating these data sets at all levels are not yet available.
Participants felt that computational mass spectrometry lacks a structured community. Researchers in computational mass spectrometry come from diverse backgrounds: statistics, computer science, analytical sciences, biology, or medicine. Traditionally, they are therefore organized in different scientific societies. Examples are the International Society for Computational Biology (ISCB), the American Society for Mass Spectrometry (ASMS), the Human Proteome Organization (HUPO), the Metabolomics Society, and of course various national societies. Many participants attend both computational and experimental mass spectrometry conferences organized by these different organizations. Participants suggested forming subgroups for computational mass spectrometry in the different societies. At the same time, to avoid duplicating structures and efforts, the plan was to share these subgroups across the societies. This would be done by establishing joint chairs for the groups, organizing joint workshops, and coordinating educational activities.
After the Dagstuhl seminar, we contacted ISCB and HUPO to discuss the formation of these subgroups. After intensive discussions, both HUPO and ISCB agreed to this plan. A HUPO subgroup on computational mass spectrometry, CompMS, was formed. In parallel, ISCB agreed to form a Community of Special Interest (CoSI), also named CompMS. Both subgroups share a joint structure. A joint steering committee (Oliver Kohlbacher, Olga Vitek, Shoba Ranganathan, Henning Hermjakob, and Ruedi Aebersold) has been established to guide both groups through their formation period. The groups have set up a joint mailing list and a website. They are currently planning initial kick-off meetings as satellite workshops to ISMB 2014 (Boston) and HUPO 2014 (Madrid).
Recognizing the great need for educational materials for various audiences (bioinformaticians, biologists, computer scientists), some participants launched an initiative to assemble these materials as online courses. Discussions of this initiative are well advanced. The current plan is to develop a core curriculum for mass spectrometry, which will be open for discussion within the computational mass spectrometry community. Once the contents of the core curriculum have been established, tutorial papers will be solicited for its various modules. These papers will refer to each other and use a coherent vocabulary and notation. They will appear as an online paper collection in PLoS Computational Biology (edited by Theodore Alexandrov). Additional materials, for example online courses and lecture videos, will also be included. An initial tutorial workshop is being planned to kickstart the further development of the curriculum.
A working group discussed the problems that computational papers face in the reviewing process. The main motivation for this discussion was to expedite the review process, specifically by reducing the number of review cycles. The Journal of Proteome Research (JPR), published by the American Chemical Society (ACS), presents a special case. It is the only journal in the field without a regular mechanism that lets reviewers see the other reviewers' comments, and the authors' corresponding responses, after each round of review. The proposal initially on the table was to share all reviews among the reviewers and to invite comments and changes before the first editorial decision in the first round of review. Such a system is already in place, for instance, in EasyChair (the software used for the RECOMB meetings, but not for proteomics journals). After discussion, it was agreed that it would clearly be beneficial if JPR distributed all reviews among the reviewers after each stage of revision. However, it was felt that the editor only needs to collect comments and feedback from the reviewers (by sending them all reviews) before reaching an initial decision when the reviewers substantively disagree on critical points. These ad hoc communications can be handled semi-manually within the existing manuscript management systems used by the proteomics journals, with the added benefit of maintaining an audit trail. The reviewing guidelines developed by the participants in Dagstuhl are currently being discussed with the editorial boards of several journals (currently J. Proteome Res. and Mol. Cell. Proteomics).
- Rudolf Aebersold (ETH Zürich, CH)
- Theodore Alexandrov (Universität Bremen, DE)
- Dario Amodei (Stanford University, US)
- Sebastian Böcker (Universität Jena, DE)
- Bernd Bodenmiller (Universität Zürich, CH)
- Karsten Boldt (Universitätsklinikum Tübingen, DE)
- Daniel R. Boutz (University of Texas - Austin, US)
- Julia Burkhart (ISAS - Dortmund, DE)
- Manfred Claassen (ETH Zürich, CH)
- John Cottrell (Matrix Science Ltd. - London, GB)
- Eric Deutsch (Institute for Systems Biology - Seattle, US)
- Joshua Elias (Stanford University, US)
- David Fenyö (New York University, US)
- Anne-Claude Gingras (University of Toronto, CA)
- Henning Hermjakob (European Bioinformatics Institute - Cambridge, GB)
- Lukas Käll (KTH - Royal Institute of Technology, SE)
- Sangtae Kim (Pacific Northwest National Lab. - Richland, US)
- Oliver Kohlbacher (Universität Tübingen, DE)
- Theresa Kristl (Universität Salzburg, AT)
- Bernhard Küster (TU München, DE)
- Henry Lam (The Hong Kong Univ. of Science & Technology, HK)
- Wolf D. Lehmann (DKFZ - Heidelberg, DE)
- Kathryn Lilley (University of Cambridge, GB)
- Michal Linial (The Hebrew University of Jerusalem, IL)
- Mike MacCoss (University of Washington - Seattle, US)
- Brendan MacLean (University of Washington - Seattle, US)
- Alexander Makarov (Thermo Fisher GmbH - Bremen, DE)
- Lennart Martens (Ghent University, BE)
- Sara Nasso (ETH Zürich, CH)
- Alexey Nesvizhskii (University of Michigan - Ann Arbor, US)
- Steffen Neumann (IPB - Halle, DE)
- Paola Picotti (ETH Zürich, CH)
- Knut Reinert (FU Berlin, DE)
- Bernhard Renard (RKI - Berlin, DE)
- Hannes Röst (ETH Zürich, CH)
- William Stafford Noble (University of Washington - Seattle, US)
- Stephen Tate (SCIEX - Concord, CA)
- Andreas Tholey (Universität Kiel, DE)
- Henning Urlaub (MPI für Biophysikalische Chemie - Göttingen, DE)
- Olga Vitek (Purdue University - West Lafayette, US)
- Christian von Mering (Universität Zürich, CH)
- Susan T. Weintraub (The University of Texas Health Science Center, US)
- Witold E. Wolski (ETH Zürich, CH)
- René Zahedi (ISAS - Dortmund, DE)
- Dagstuhl Seminar 05471: Computational Proteomics (2005-11-20 - 2005-11-25)
- Dagstuhl Seminar 08101: Computational Proteomics (2008-03-02 - 2008-03-07)
- Dagstuhl Seminar 15351: Computational Mass Spectrometry (2015-08-23 - 2015-08-28)
- Dagstuhl Seminar 17421: Computational Proteomics (2017-10-15 - 2017-10-20)
- Dagstuhl Seminar 19351: Computational Proteomics (2019-08-25 - 2019-08-30)
- Dagstuhl Seminar 21271: Computational Proteomics (2021-07-04 - 2021-07-09)
- Dagstuhl Seminar 23301: Computational Proteomics (2023-07-23 - 2023-07-28)
- data bases / information retrieval
- data structures / algorithms / complexity
- Computational Mass Spectrometry