- Expanding the Use of Spectral Libraries in Proteomics : article - Deutsch, Eric W.; Perez-Riverol, Yasset; Chalkley, Robert J.; Wilhelm, Mathias; Tate, Stephen; Sachsenberg, Timo; Walzer, Mathias; Käll, Lukas; Schymanski, Emma L.; Kuster, Bernhard; Neumann, Steffen; Lam, Henry; Böcker, Sebastian; Delanghe, Bernard; Wilmes, Paul; Dorfer, Viktoria; Volders, Pieter-Jan; Jehmlich, Nico; Vissers, Johannes P. C.; Wolan, Dennis W.; Wang, Ana Y.; Mendoza, Luis; Shofstahl, Jim; Dowsey, Andrew W.; Griss, Johannes; Salek, Reza M.; Binz, Pierre-Alain; Vizcaino, Juan Antonio; Bandeira, Nuno; Röst, Hannes - Washington, D.C. : American Chemical Society, 2018. - 10 pp.
Proteomics has become a data science. Driven by continuous improvements in instrumentation and increasing sophistication of experimental approaches, the sensitivity of the analyses has grown rapidly while the complexity of the experimental designs has expanded. The result is the generation of ever more data, and often of more complex data as well. A side effect of this recent transition to a data science is that much of the acquired proteomics data remains incompletely interpreted due to a lack of truly exhaustive analysis approaches.
Fortunately, there has been a very strong drive to publicly share data in biology, and proteomics is no exception. The result is that a very large amount of essentially uninterpreted yet undoubtedly valuable data is currently available (over 100 TB to date, and doubling yearly).
Other omics fields have meanwhile experienced a similar evolution, in the process transforming much of molecular biology into an overall data science. This is evident in the ever more central role of bioinformatics as the main gateway to knowledge in biology.
All these advances create great opportunities for new discoveries, yet in order to make maximal use of these opportunities it will be paramount to bring experimental and computational experts from across different domains together to intensify collaborations.
Indeed, it has become quite clear that a more complete understanding of biological systems will require the integration of data across the four traditional omics domains: genomics, transcriptomics, proteomics and metabolomics. At the same time, it has also become evident that the importance of integration extends to the macroscopic level as well, through meta-(proteo-)omics studies that analyze entire communities of (microbial) cells, often in relation to a host organism.
In this Dagstuhl Seminar on Computational Proteomics, we will therefore bring together the three key communities involved in proteomics: life scientists relying on mass spectrometry; analytical chemists and engineers developing the instruments; and computer scientists, bioinformaticians and statisticians developing algorithms and software. In addition, we will reach out beyond proteomics and initiate collaborations with scientists from the genomics, transcriptomics, and metabolomics domains, with the ultimate aim of extensive integration across these fields.
Our key topics for discussion and investigation at this Dagstuhl Seminar will follow the outlines of the challenges identified above, and will center on:
Integration of proteomics and transcriptomics data to model the dynamics of gene expression
The processes that drive gene transcription and translation remain poorly understood, as is evident from long non-coding RNAs (lncRNAs), small open reading frames (sORFs), and complex correlations between protein and mRNA abundances. We will therefore endeavor to develop integrative strategies to explore these issues in more detail.
Analysis and interpretation of public proteomics data in orthogonal contexts
The mass of proteomics data in the public domain is uniquely suited to orthogonal reanalysis. Notable examples could be the integration with transcriptomics data to better understand translation, or the exploration of the large proportion (70%) of as-yet unidentified spectra.
Assessing and addressing the specific computational challenges of metaproteomics
Metaproteomics is a relatively young discipline, but is rapidly becoming more popular. Yet its data analysis requirements are quite specific, and very few dedicated tools or algorithms exist. We will therefore chart the greatest needs in the field, and plan how to address these.
Exploration of the key computational interfaces between omics domains
The interfaces between omics domains hold considerable promise, and we should chart the obvious overlaps and opportunities for cross-omics integration as seeds for collaboration.
Training of integrative bioinformatics experts
The future of bioinformatics will undoubtedly involve a lot of integrative data analysis, and we should consider carefully how we can ensure that we train future researchers appropriately.
The Dagstuhl Seminar 17421 "Computational Proteomics" discussed in depth the current challenges facing the field of computational proteomics, while at the same time reaching out across the field's borders to engage with other computational omics fields at the joint interfaces. The issues that were discussed reflect the emergence of novel applications within the field of proteomics, notably proteogenomics (the identification of proteins based on sequence data obtained from prior genomics and/or transcriptomics analyses), and metaproteomics (the study of the combined proteome across an entire community of (micro-)organisms). These two new proteomics approaches share several challenges, which predominantly revolve around the sensitive identification of proteins from large databases while maintaining an acceptably low false discovery rate (FDR). The ramifications of these issues, and possible solutions, were first introduced in short but thought-provoking talks, followed by a plenary discussion to delineate the initial discussion sub-topics. Afterwards, working groups addressed these initial considerations in great detail.
In addition, both proteogenomics and metaproteomics suffer from coverage issues, as neither is currently capable of providing anywhere near a complete view of the true complexity of the (meta-)proteome. This issue is exacerbated by the fact that the true extent of the proteome remains unknown, and is likely to be time-dependent as well. As a result, a separate working group was created to discuss the issues and possible remedies related to proteome coverage.
The field of proteomics has, however, not only extended into novel application areas, but also continues to see strong development of novel technologies. Over the past few years, the most impactful of these has been data-independent acquisition (DIA), which comes with its own unique computational challenges. On the one hand, the analysis of DIA data currently relies heavily on spectral libraries, which have so far been a rather niche product in proteomics (as opposed to, for instance, metabolomics, where spectral libraries have a much longer and much more fruitful history). On the other hand, FDR estimation remains contested in DIA approaches. As a result, two further working groups were established during the seminar: one on the applications of, and methods to create, spectral libraries, and the other on the specific challenge of calculating a reliable FDR when performing spectral library searching.
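To make the FDR estimation problem mentioned above more concrete, the following is a minimal sketch of target-decoy q-value estimation, a widely used approach in sequence database searching. The seminar summary does not state which estimation methods the working groups considered, so this technique is chosen here purely for illustration; the function names and the toy scores are hypothetical.

```python
from typing import List, Tuple

# A match is (score, is_decoy); a higher score means a better match.
Match = Tuple[float, bool]


def target_decoy_qvalues(matches: List[Match]) -> List[Tuple[float, bool, float]]:
    """Return matches sorted by descending score, each with a q-value.

    Assumption: the number of decoy matches above a score threshold
    estimates the number of false target matches above that threshold.
    """
    ranked = sorted(matches, key=lambda m: m[0], reverse=True)

    # Estimated FDR at each rank: decoys so far / targets so far.
    fdrs = []
    targets = decoys = 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # q-value: the lowest FDR at this rank or at any lower-scoring rank.
    qvalues = []
    running_min = float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        qvalues.append(running_min)
    qvalues.reverse()

    return [(score, is_decoy, q) for (score, is_decoy), q in zip(ranked, qvalues)]


def count_accepted_targets(matches: List[Match], max_fdr: float = 0.01) -> int:
    """Count target matches whose q-value does not exceed max_fdr."""
    return sum(
        1
        for _, is_decoy, q in target_decoy_qvalues(matches)
        if not is_decoy and q <= max_fdr
    )


if __name__ == "__main__":
    # Toy data: four target matches and two decoy matches.
    toy = [(10.0, False), (9.0, False), (8.0, True),
           (7.0, False), (6.0, False), (5.0, True)]
    for row in target_decoy_qvalues(toy):
        print(row)
    print(count_accepted_targets(toy, max_fdr=0.01))
```

In this sketch, a larger search space (as in proteogenomics or metaproteomics) tends to yield more high-scoring decoy matches, which raises the score threshold needed to reach a given FDR and so reduces the number of accepted identifications. How best to generate decoys for spectral libraries is a separate question that the sketch does not address.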
Another key topic of the seminar was the (orthogonal) re-use of public proteomics data. The discussion focused on the provision of metadata for the assembled proteomics data, as this is the key bottleneck facing researchers who wish to perform large-scale re-analysis of public proteomics data, especially when the objective is to obtain biological knowledge. A working group was therefore created to explore the issues with metadata provision, and to identify ways to improve the current suboptimal state of metadata reporting.
Throughout the seminar, the topic of visualizing the acquired data and the obtained results came up regularly. A corresponding working group was therefore set up to delineate the state of the art in proteomics data visualization, and to explore the issues with, and opportunities of, advanced visualizations in proteomics.
As a last core topic, a short introductory talk and a subsequent working group were dedicated to the education of computational proteomics researchers, with special focus on their ability to work at the interfaces with other omics fields (genomics, transcriptomics, and metabolomics). This working group assembled an extensive list of already available materials, along with an overview of the different roles and specializations that can be found across informaticians, bioinformaticians, and biologists, and of how each field should evolve in order to bring these more closely together in the future.
In addition to the above-mentioned topic introduction talks and the associated working groups, two talks illustrated specific topics of the seminar. Paul Wilmes showed his recent work in bringing metaproteomics together with advanced metatranscriptomics and metagenomics, showing that the flexible use of sequence assembly graphs at the nucleotide level opens up many highly interesting possibilities at the proteome level through enhanced identification. Nevertheless, it was observed that there is strong enrichment for genes with unknown function at the protein identification level, highlighting quite clearly that we have yet to achieve a more complete biochemical understanding of microbial ecosystems. Finally, Magnus Palmblad delighted the participants with a highly original talk on the exploration of mass spectrometry data (of both peptides and small molecules) through the five senses (sight, hearing, touch, smell, and taste).
- Magnus Arntzen (Norwegian University of Life Sciences - As, NO)
- Nuno Bandeira (University of California - San Diego, US)
- Harald Barsnes (University of Bergen, NO)
- Sebastian Böcker (Universität Jena, DE)
- Robert Chalkley (UC - San Francisco, US)
- John Cottrell (Matrix Science Ltd. - London, GB)
- Ileana M. Cristea (Princeton University, US)
- Bernard Delanghe (Thermo Fisher GmbH - Bremen, DE)
- Eric Deutsch (Institute for Systems Biology - Seattle, US)
- Viktoria Dorfer (University of Applied Sciences Upper Austria, AT)
- Julien Gagneur (TU München, DE)
- Laurent Gatto (University of Cambridge, GB)
- Marco Hennrich (EMBL - Heidelberg, DE)
- Nico Jehmlich (UFZ - Leipzig, DE)
- Lukas Käll (KTH - Royal Institute of Technology, SE)
- Oliver Kohlbacher (Universität Tübingen, DE)
- Jeroen Krijgsveld (DKFZ - Heidelberg, DE)
- Bernhard Küster (TU München, DE)
- Lydie Lane (Swiss Institute of Bioinformatics, CH)
- Kathryn Lilley (University of Cambridge, GB)
- Frédérique Lisacek (Swiss Institute of Bioinformatics, CH)
- Lennart Martens (Ghent University, BE)
- Gerben Menschaert (Ghent University, BE)
- Bart Mesuere (Ghent University, BE)
- Thilo Muth (Robert Koch Institut - Berlin, DE)
- Magnus Palmblad (Leiden University Medical Center, NL)
- Phillip Pope (Norwegian University of Life Sciences - As, NO)
- Hannes Röst (University of Toronto, CA)
- Timo Sachsenberg (Universität Tübingen, DE)
- Veit Schwämmle (University of Southern Denmark - Odense, DK)
- Stephen Tate (SCIEX - Concord, CA)
- Elien Vandermarliere (Ghent University, BE)
- Hans Vissers (Waters Corporation - Wilmslow, GB)
- Olga Vitek (Northeastern University - Boston, US)
- Juan Antonio Vizcaino (EBI - Hinxton, GB)
- Pieter-Jan Volders (Ghent University, BE)
- Mathias Walzer (EBI - Hinxton, GB)
- Ana L. Wang (Scripps Research Institute - La Jolla, US)
- Mathias Wilhelm (TU München, DE)
- Paul Wilmes (University of Luxembourg, LU)
- Dennis Wolan (Scripps Research Institute - La Jolla, US)
- Henrik Zauber (Max-Delbrück-Centrum - Berlin, DE)
- Dagstuhl Seminar 05471: Computational Proteomics (2005-11-20 - 2005-11-25)
- Dagstuhl Seminar 08101: Computational Proteomics (2008-03-02 - 2008-03-07)
- Dagstuhl Seminar 13491: Computational Mass Spectrometry (2013-12-01 - 2013-12-06)
- Dagstuhl Seminar 15351: Computational Mass Spectrometry (2015-08-23 - 2015-08-28)
- Dagstuhl Seminar 19351: Computational Proteomics (2019-08-25 - 2019-08-30)
- Dagstuhl Seminar 21271: Computational Proteomics (2021-07-04 - 2021-07-09)
- Dagstuhl Seminar 23301: Computational Proteomics (2023-07-23 - 2023-07-28)
- Computational Mass Spectrometry
- Computational Biology
- Integrative Bioinformatics
- Large Scale Public Data