Dagstuhl Seminar 19351
(Aug 25 – Aug 30, 2019)
- Nuno Bandeira (University of California - San Diego, US)
- Ileana M. Cristea (Princeton University, US)
- Lennart Martens (Ghent University, BE)
- Shida Kunz (for scientific matters)
- Data management of sensitive human proteomics data : current practices, recommendations and perspectives for the future : article - Bandeira, Nuno; Deutsch, Eric W.; Kohlbacher, Oliver; Martens, Lennart; Vizcaino, Juan A. - Amsterdam : Elsevier, 2021. - 27 pp. - (Molecular and Cellular Proteomics ; 2021 : open access).
- Towards Increased Reliability, Transparency and Accessibility in Crosslinking Mass Spectrometry : article - Leitner, Alexander; Bonvin, Alexandre M. J. J.; Borchers, Christoph H.; Chalkley, Robert J.; Chamot-Rooke, Julia; Combe, Colin W.; Cox, Jürgen; Dong, Meng-Qiu; Rappsilber, Juri; Wilkins, Marc R.; Vizcaino, Juan A.; Viner, Rosa; Urlaub, Henning; Thalassinos, Konstantinos; Stengel, Florian; Sobott, Frank; Sinz, Andrea; Schriemer, David; Schmidt, Carla; Scheltema, Richard A.; Sali, Andrej; Petrotchenko, Evgeniy; Novak, Petr; Netz, Eugen; Moritz, Robert L.; Mechtler, Karl; Kohlbacher, Oliver; Kalisman, Nir; Jones, Andrew R.; Ishihama, Yasushi; Huang, Lan; Hoopmann, Michael R.; Heck, Albert J. R.; Gozzo, Fabio C.; Götze, Michael; Fischer, Lutz - Cornell University : arXiv.org, 2020. - 25 pp.
Mass spectrometry (MS) based proteomics has seen an enormous increase in analytical capability over the past twenty years, which has allowed the field to become a cornerstone technology in the life sciences. Concomitant with this increased analytical capability, the field has seen rapid and continuous growth of the amount of data it produces. This in turn led to a strong dependence on dedicated, state-of-the-art computational approaches to process and interpret the acquired data. Moreover, the field has seen the development of complex experimental approaches that allow researchers to dig ever deeper into protein biology. Yet these new approaches also require their own dedicated analysis algorithms.
Two such novel approaches that have gained prominence over the past two years are data independent acquisition (DIA), and protein cross-linking experiments. The DIA approach omits the selection of a narrow mass-over-charge (m/z) range prior to fragmenting a peptide analyte, thus effectively creating compound spectra that consist of a multitude of co-fragmented peptides. These spectra contrast markedly with those from traditional data dependent acquisition (DDA), where spectra are typically derived from the fragmentation of a single peptide. Current identification algorithms, built for the interpretation of DDA spectra, are therefore not suited to handling DIA spectra.
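The contrast between the two acquisition modes can be illustrated with a minimal sketch. The precursor m/z values and the window widths below are made-up numbers chosen for illustration, not settings from any real instrument method, and `precursors_in_window` is a hypothetical helper rather than part of any proteomics library.

```python
from typing import List


def precursors_in_window(precursor_mzs: List[float], center: float, width: float) -> List[float]:
    """Return the precursor m/z values that fall inside an isolation window."""
    half = width / 2.0
    return [mz for mz in precursor_mzs if center - half <= mz <= center + half]


# Hypothetical precursor ions eluting at the same time (illustrative m/z values).
co_eluting = [500.3, 503.8, 507.1, 512.6, 518.9, 530.2]

# DDA-like: a narrow window (assumed 2 m/z wide) centred on one selected precursor.
dda = precursors_in_window(co_eluting, center=507.1, width=2.0)

# DIA-like: a wide window (assumed 25 m/z wide) covering a whole m/z segment.
dia = precursors_in_window(co_eluting, center=512.5, width=25.0)

print("precursors fragmented together (DDA-like):", len(dda))
print("precursors fragmented together (DIA-like):", len(dia))
```

With these assumed numbers the narrow window isolates a single precursor, while the wide window passes several at once, so the resulting fragment spectrum mixes ions from all of them.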
A second challenge emerges for protein cross-linking, where small-molecule cross-linkers are used to establish a covalent link between two sections of a single protein, or between two different proteins. The resulting peptide analytes of interest are thus covalently cross-linked, creating a so-called chimeric di-peptide. When fragmented, these cross-linked di-peptides create complex spectra, with many novel types of fragment ions. Interpretation of these spectra thus requires dedicated algorithms that should moreover be able to adapt to the large variety of small molecule cross-linkers available today.
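One reason such data is computationally demanding can be sketched as follows: if any two peptides may be linked, the number of candidate pairs grows quadratically with the number of peptides. The example below is a naive enumeration under simplifying assumptions (neutral masses, a single linker per pair, made-up peptide masses); `candidate_pairs` and `matching_pairs` are hypothetical helpers, and the linker mass is an illustrative parameter (the value used is the mass shift commonly quoted for a DSS/BS3 link, taken here as an assumption). It is not how any particular cross-linking search engine works.

```python
from itertools import combinations_with_replacement
from typing import Iterator, List, Tuple


def candidate_pairs(peptide_masses: List[float], linker_mass: float) -> Iterator[Tuple[int, int, float]]:
    """Yield (i, j, mass) for every unordered peptide pair joined by one linker."""
    for i, j in combinations_with_replacement(range(len(peptide_masses)), 2):
        yield i, j, peptide_masses[i] + peptide_masses[j] + linker_mass


def matching_pairs(peptide_masses: List[float], linker_mass: float,
                   precursor_mass: float, tol: float) -> List[Tuple[int, int]]:
    """Return index pairs whose combined mass lies within tol (Da) of the precursor."""
    return [(i, j) for i, j, mass in candidate_pairs(peptide_masses, linker_mass)
            if abs(mass - precursor_mass) <= tol]


# Illustrative neutral peptide masses in Da (made up for this example).
masses = [800.40, 950.50, 1200.60, 1500.75]
LINKER = 138.068  # assumed linker mass shift in Da

# n peptides give n * (n + 1) / 2 unordered pairs, including self-pairs.
n_pairs = sum(1 for _ in candidate_pairs(masses, LINKER))
hits = matching_pairs(masses, LINKER, precursor_mass=2289.17, tol=0.01)

print(n_pairs, hits)
```

For a realistic database with many thousands of peptides, exhaustive pairing of this kind quickly becomes impractical, which is one way to see why dedicated algorithms are needed.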
Apart from these experiment-specific data analysis challenges, the field also must find ways to deal effectively with all the public data that is being amassed at an increasing rate. Indeed, while there is an enormous (and growing) amount of proteomics data now available in the public domain, efforts to channel these data into a comprehensive picture of the proteome of a given cell, tissue, or organism are still at an early stage. The field therefore needs to develop novel ways to assemble and present proteomes for direct consumption by biologists, which will require new algorithms to combine and filter data collected across tens of thousands of individual analyses, along with novel visualization approaches that are tailored to biologists. To successfully address these challenges, different experts need to be brought together: computer scientists, bioinformaticians and statisticians that develop algorithms, approaches and software for the interpretation of the acquired data; life scientists that rely on mass spectrometry-based proteomics as a key means to elucidate biology; and analytical chemists and engineers that develop the instruments.
Our key topics for discussion and investigation at this seminar will follow the outlines of the challenges identified above, and will center on:
- Identification and quantification of DIA data
There is an urgent need to bring together the various researchers involved in establishing novel approaches for DIA analytics, and in developing novel algorithms to process DIA data, so that the specific features of these data can be leveraged for robust identification and quantification.
- Algorithms for the analysis of protein cross-linking data
Cross-linking MS data exposes weaknesses in current scoring functions, as well as scaling issues. This creates a clear need for fresh approaches to the processing of cross-linking data, including data from cleavable cross-linkers. We will therefore bring together experimentalists and bioinformaticians working in cross-linking MS to derive novel solutions to support this field.
- Creating an online view on complete, browsable proteomes from public data
We will investigate approaches to combine data across tens of thousands of analyses into high-quality proteomes and develop an interface for biologists to explore and interrogate such a proteome based on a new visual design language for proteomics.
- Detecting interesting biology from proteomics findings
The re-processing of public data is highly likely to deliver novel biology, yet we are currently extremely poorly equipped to detect such biologically significant findings, or to assess their role or importance. We will therefore investigate the creation of such methods and approaches in this seminar.
The Dagstuhl Seminar 19351 'Computational Proteomics' discussed several key challenges facing the field of computational proteomics. The topics discussed were varied and wide-ranging, and radiated out from the four topics set out at the start.
These four topics were (i) personally identifiable proteomics data; (ii) unique computational challenges in data-independent acquisition (DIA) approaches; (iii) computational approaches for cross-linking proteomics; and (iv) the visual design of proteomics data and results, to communicate more clearly to the broad life sciences community. A cross-cutting topic was introduced as well, which focused on proteotyping in clinical trials, as it brings many of the previous challenges together by asking the logical but complex question of how proteomics approaches, data, and associated computational methods and tools can become part of routine clinical trial data acquisition, monitoring and processing.
Based on these initial topics, breakout sessions were organized around proteomics data privacy, dealing with data from DIA approaches, how to best utilize computational approaches to use cross-linking for structural elucidation, and the importance of visualisation of proteomics data and results to engender excitement for the field's capabilities in the life sciences in general. However, these breakout sessions in turn inspired additional breakout sessions on associated topics.
The DIA and cross-linking breakouts both yielded the issue of ambiguity in identification as a cross-cutting topic that merited its own dedicated breakout session. A closely related breakout session, derived from the proteomics privacy and DIA sessions, centered on open modification searches, which are now becoming feasible in proteomics for the first time, but which are also prone to potentially crippling ambiguity issues while raising even more complex privacy issues. The visual design breakout explicitly identified multi-omics data integration as a direct offshoot of its discussions, which led to a dedicated breakout session on this topic as well. Another emerging breakout session concerned public data, which was triggered by both the DIA and cross-linking topics because of their shared need to disseminate their respective specialised data and results in a standardised, uniform, and well-structured manner. Finally, the cross-linking and DIA topics also led to a breakout session on ion mobility, as this technological advance was seen as a key aspect in the future of these technologies.
Each of these breakout sessions had exciting outcomes, and gave rise to future research ideas and collaborations. The proteomics privacy breakout concluded that the field is now ready to delve in more detail into the issues surrounding proteomics data privacy, and that a white paper will be written that can be used to propose policy and to inform the community. The DIA breakout identified three future tasks: (i) to develop a perspective manuscript that will discuss peptide-centric and spectrum-centric FDR, as well as the effects of shared evidence; (ii) to conduct an experiment for testing DDA versus DIA on the same sample to discover the sampling space for precursors and fragments; and (iii) to conduct a second experiment for understanding target/decoy scoring for different decoy generation models using both synthetic and predicted target/decoy peptides. The cross-linking breakout concluded that a cross-linked ribosomal protein complex should be used as a standardized dataset publicly available to the community, while a 'Minimum Information Requirements About a Cross Linking Experiment (MIRACLE)' was proposed to unify results from many cross-linking tools. The results will also be presented at the Symposium on Structural Proteomics in Göttingen in November 2019. The visual design breakout came up with many fine-grained conclusions, but also with an overall design philosophy which centered on three levels of technical detail, depending on the audience: (i) interfaces for detailed data exploration for experienced consumers; (ii) interfaces with minimal technical information, focusing on high-level data for the specific scientific question for novice consumers; and (iii) interfaces with only relevant information for clinical decision making (e.g. short list of proteins significantly affected by the disease) for clinicians.
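For readers unfamiliar with the target/decoy idea mentioned among the DIA tasks, a minimal sketch of the simplest commonly used estimator is given here: the FDR at a score threshold is approximated by the number of decoy matches divided by the number of target matches at or above that threshold. The scores and labels are made up, `fdr_at_threshold` is a hypothetical helper, and this is not one of the specific decoy generation models or FDR definitions that the breakout planned to compare.

```python
from typing import List, Tuple


def fdr_at_threshold(matches: List[Tuple[float, bool]], threshold: float) -> float:
    """Estimate FDR as (#decoys / #targets) among matches scoring >= threshold."""
    targets = sum(1 for score, is_decoy in matches if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in matches if score >= threshold and is_decoy)
    # With no accepted targets there is nothing to report; return 0.0 by convention here.
    return decoys / targets if targets else 0.0


# Illustrative (score, is_decoy) pairs; values are invented for this example.
matches = [(9.1, False), (8.7, False), (8.2, False), (7.9, True),
           (7.5, False), (6.8, False), (6.1, True), (5.4, False)]

strict = fdr_at_threshold(matches, 8.0)   # 0 decoys among 3 targets
lenient = fdr_at_threshold(matches, 6.0)  # 2 decoys among 5 targets

print(strict, lenient)
```

Lowering the threshold accepts more matches at the cost of a higher estimated error rate. How decoys are generated, and whether the rate is counted per spectrum or per peptide, changes the estimate, which is the kind of question the planned experiments were meant to examine.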
The five offshoot breakouts described above also came to conclusions, and the interested reader is referred to the corresponding abstracts for details.
Overall, the 2019 Dagstuhl Seminar on Computational Proteomics was extremely successful as a catalyst for careful yet original thinking about key challenges in the field, and as a means to make progress by setting important, high-impact goals to work on in close collaboration. Moreover, during the Seminar, several highly interesting topics for a future Dagstuhl Seminar on Computational Proteomics were proposed, showing that this active and inspired community has not yet run out of challenges, nor out of ideas and opportunities!
- Nuno Bandeira (University of California - San Diego, US)
- Harald Barsnes (University of Bergen, NO)
- Pedro Beltrao (EBI - Hinxton, GB)
- Sebastian Böcker (Universität Jena, DE)
- Robert Chalkley (University of California - San Francisco, US)
- Lieven Clement (Ghent University, BE)
- Frank Conlon (University of North Carolina - Chapel Hill, US)
- David Creasy (Matrix Science Ltd. - London, GB)
- Bernard Delanghe (Thermo Fisher GmbH - Bremen, DE)
- Eric Deutsch (Institute for Systems Biology - Seattle, US)
- Maarten Dhaenens (Ghent University, BE)
- Joshua Elias (Chan Zuckerberg Biohub, US)
- Michael Götze (ETH Zürich, CH)
- Rebekah Gundry (University of Nebraska - Omaha, US)
- Sicheng Hao (Northeastern University - Boston, US)
- Nils Hoffmann (ISAS - Dortmund, DE)
- Michael Hoopmann (Institute for Systems Biology - Seattle, US)
- Lukas Käll (KTH Royal Institute of Technology - Solna, SE)
- Michelle Kennedy (Princeton University, US)
- Benoît Kunath (University of Luxembourg, LU)
- Lennart Martens (Ghent University, BE)
- Magnus Palmblad (Leiden University Medical Center, NL)
- Hannes Röst (University of Toronto, CA)
- Renee Salz (Radboud University Nijmegen, NL)
- Birgit Schilling (Buck Institute - Novato, US)
- Brian Searle (Institute for Systems Biology - Seattle, US)
- Natalia Sizochenko (Dartmouth College - Hanover, US)
- Stefan Tenzer (Universität Mainz, DE)
- Yves Vandenbrouck (CEA - Grenoble, FR)
- Hans Vissers (Waters Corporation - Wilmslow, GB)
- Olga Vitek (Northeastern University - Boston, US)
- Juan Antonio Vizcaino (EBI - Hinxton, GB)
- Mathias Wilhelm (TU München, DE)
- Bernd Wollscheid (ETH Zürich, CH)
- Roman Zubarev (Karolinska Institute - Stockholm, SE)
- Dagstuhl Seminar 05471: Computational Proteomics (2005-11-20 - 2005-11-25)
- Dagstuhl Seminar 08101: Computational Proteomics (2008-03-02 - 2008-03-07)
- Dagstuhl Seminar 13491: Computational Mass Spectrometry (2013-12-01 - 2013-12-06)
- Dagstuhl Seminar 15351: Computational Mass Spectrometry (2015-08-23 - 2015-08-28)
- Dagstuhl Seminar 17421: Computational Proteomics (2017-10-15 - 2017-10-20)
- Dagstuhl Seminar 21271: Computational Proteomics (2021-07-04 - 2021-07-09)
- Dagstuhl Seminar 23301: Computational Proteomics (2023-07-23 - 2023-07-28)
- Computational Mass Spectrometry
- Computational Biology