Dagstuhl Seminar 23301: Computational Proteomics

( Jul 23 – Jul 28, 2023 )

Permalink

Please use the following short url to reference this page: https://www.dagstuhl.de/23301

Organizers

Rebekah Gundry (University of Nebraska - Omaha, US)
Lennart Martens (Ghent University, BE)
Magnus Palmblad (Leiden University Medical Center, NL)

Contact

Andreas Dolzmann (for scientific matters)
Jutka Gasiorowski (for administrative matters)

Summary

The Dagstuhl Seminar 23301 "Computational Proteomics" was based around three key topics of rapid development in mass spectrometry-based proteomics, which were discussed in-depth in light of their challenges and opportunities. These three topics were: (i) the expanding and highly successful adoption of machine learning (ML) approaches in proteomics; (ii) the varied computational challenges posed by the very recent, but very rapidly evolving field of single cell proteomics; and (iii) the possible paths to adoption of advanced computational approaches in the challenging field of glycoproteomics. Each of these topics was introduced by a short lecture, delivered by an expert in the field, and focused on two main goals: (i) to provide an informed opinion of the current state of the field, while highlighting its key challenges; and (ii) to thus entice the participants to contribute their own views on this topic, to help set the agenda for the discussions throughout the remainder of the seminar. Apart from these three invited talks, two ad-hoc talks also emerged during the seminar, and these concerned the specific topic of large scale spectral clustering, and the promise of using the Rust programming language in proteomics applications.

Based on the ideas collected after each of the introductory, topic-specific presentations, a list of discussion points was collated for each topic, and three parallel breakout sessions were then organised in the mornings and afternoons around these discussion points. A final, joint session in the evening of each day served to bring all participants from the different breakout groups together again, and to summarise the key points of their respective discussions. Moreover, these joint sessions were also used to update the lists of discussion points for the three topics with any newly emerged points, and to reprioritise discussion points for the next day's breakout sessions.

The Machine Learning Working Group explored a wide range of topics, mostly focused on refining current approaches (for instance, by introducing quality control and explainable AI), as well as on setting out new applications for ML in the field. The latter included the possibility of a foundational model, the extended prediction of analyte behaviour in the instrumentation, and the possibility to analyse the resulting models to gain a better understanding of the physico-chemical properties at play in the analytics workflow. Finally, community-building efforts were discussed, and suggestions were made for improvements to existing initiatives (notably proteomicsML.org), as well as for novel community engagements.

The Single Cell Proteomics Working Group discussed the applicability of current tools in the new discipline of single cell proteomics. Correspondingly, capacity issues in current tools also came up, in light of the fast-growing data sizes for single cell experiments, which are currently at hundreds of analytical runs, but are likely to expand soon towards thousands of runs, well beyond the capabilities of present-day algorithms. Standardisation of this field and its metadata, which is even more important given the sheer size and complexity of typical single cell proteomics data sets, was also considered in some detail. This entailed the standardisation of the workflows and algorithm parameters, which are currently very diverse and specific, as well as the standards for data and metadata representation and dissemination.

The Glycomics and Glycoproteomics Working Group saw plentiful opportunities for the field to strengthen its bioinformatics, and put emphasis on adopting machine learning techniques. However, the group also saw open challenges regarding the collection and annotation of glycomics and glycoproteomics data. Some time was also spent on identifying current weaknesses in the field, notably the quantification of glycopeptides, and possible avenues for addressing these.

At the end of the seminar, a critical assessment of the seminar was performed by all participants, highlighting the strengths and improvement points of the overall Seminar organisation, and a list of potential future Dagstuhl Seminar topics was drafted based on the participants' input. The assessment of the Seminar highlighted in particular the extremely fruitful nature of the open and engaging discussions, the unique and highly valuable nature of Schloss Dagstuhl and its unmatched seminars, and the ongoing gratitude of the computational proteomics community for the opportunity to convene in this singular setting. Concerning possible future topics, a plethora of enticing options were put forward, indicating that the field of computational proteomics remains in full expansion and that it continues to brim with both challenges and promise!

Creative Commons BY 4.0

Lennart Martens, Rebekah Gundry, and Magnus Palmblad

Motivation

Novel algorithms have become driving forces of innovation in mass spectrometry (MS) based proteomics, as these computational advances have allowed much improved recovery of actionable information from acquired data, in turn propelling the field forward towards ever more sophisticated experimental approaches. Obviously, this also creates a highly fertile ground for interdisciplinary discussions and brainstorming on the evolution and future of computational proteomics.

We have therefore identified four highly interesting computational challenges (and thus opportunities) that have come to the foreground in the field of proteomics: (i) the quickly emerging and expanding application of Machine Learning throughout the field of proteomics; (ii) dedicated computational support for the newly emerging field of single-cell proteomics analyses; (iii) the identification and localization of protein glycosylation using mass spectrometry; and (iv) computational approaches to support the increasingly important role of proteomics in the discovery, design, and quality control of (novel) therapeutics.

Machine Learning is becoming pervasive across the field, which also means it cuts across the other topics, while the other three challenges represent specific application domains of proteomics (single-cell analysis, glycoprotein characterization, and therapeutics). However, between these application domains there are very interesting areas of overlap, such as analysis of therapeutic antibody glycosylation, and glycan analysis of the surfaces of single cells. This Dagstuhl Seminar on Computational Proteomics is therefore built around these four core challenges/opportunities.

As different experts need to be brought together to tackle these four topics, this seminar is thoroughly interdisciplinary: computer scientists, bioinformaticians and statisticians, who develop algorithms and software for data interpretation; experimental life scientists that rely on proteomics as a key means to elucidate biology; and analytical chemists and engineers that develop new instruments and approaches to deliver ever more comprehensive and accurate data. Throughout, industry plays crucial roles as instrument and software vendors, and as advanced users driving applications, including in the development of new diagnostics and pharmaceuticals. Industry participation is therefore explicitly included in this seminar.

Invitees therefore come from very diverse backgrounds, including computational and experimental scientists, and academic and industrial researchers across the four relevant domains of this seminar. The goal is to uncover as-yet unexplored synergies from interactions within and across these various backgrounds; a process which has already proven highly effective and inspiring in previous Dagstuhl Seminars on Computational Proteomics. A strong focus throughout the week will thus be on the free exchange of ideas between participants of different backgrounds, to benefit maximally from obvious as well as less obvious synergies and to provide maximal opportunity for cross-fertilization of ideas. To accommodate this, the seminar structure will be quite flexible, allowing for spontaneous working groups to emerge alongside pre-planned ones, and providing opportunity for any interested participant to start a discussion on any related topic of interest.

This interdisciplinary Dagstuhl Seminar is therefore poised to enable novel, breakthrough developments in computational proteomics around the four topics: (i) to develop new applications and methods for advanced machine learning in computational proteomics; (ii) to address computational challenges posed by single-cell proteomics; (iii) to build on a combination of cutting edge algorithms and novel computational approaches to allow glycan analysis in glycoproteomics; and (iv) to reinforce the computational foundation of fast-growing proteomics applications in discovery, characterization, and quality control of (novel) therapeutics.

Creative Commons BY 4.0

Rebekah Gundry, Lennart Martens, and Magnus Palmblad

Participants

Kiyoko Aoki-Kinoshita (Soka University - Tokyo, JP)
Robbin Bouwmeester (Ghent University, BE)
Robert Chalkley (University of California - San Francisco, US)
Bernard Delanghe (Thermo Fisher GmbH - Bremen, DE)
Viktoria Dorfer (University of Applied Sciences Upper Austria, AT)
Melanie Föll (Universitätsklinikum Freiburg, DE)
Laurent Gatto (University of Louvain, BE)
Arzu Tugce Guler (Leiden, NL)
Rebekah Gundry (University of Nebraska - Omaha, US)
Tiannan Guo (Westlake University - Hangzhou, CN)
Catherine Hayes (Swiss Institute of Bioinformatics - Geneva, CH)
Michael Hoopmann (Institute for Systems Biology - Seattle, US)
Lukas Käll (KTH Royal Institute of Technology - Solna, SE)
Ville Koskinen (Matrix Science Ltd. - London, GB)
Lennart Martens (Ghent University, BE)
Karina Martinez (George Washington University - Washington, DC, US)
Sriram Neelamegham (University at Buffalo - SUNY, US)
Magnus Palmblad (Leiden University Medical Center, NL)
Erdmann Rapp (MPI - Magdeburg, DE)
Tobias Schmidt (MSAID - Garching, DE)
Veit Schwämmle (University of Southern Denmark - Odense, DK)
Mathias Wilhelm (TU München - Freising, DE)
Dirk Winkelhardt (Ruhr-Universität Bochum, DE & ELIXIR Germany - Jülich, DE)
Bernd Wollscheid (ETH Zürich, CH)
Gamze Nur Yapici (Koc University - Istanbul, TR)

Related Seminars

Dagstuhl Seminar 05471: Computational Proteomics (2005-11-20 - 2005-11-25)
Dagstuhl Seminar 08101: Computational Proteomics (2008-03-02 - 2008-03-07)
Dagstuhl Seminar 13491: Computational Mass Spectrometry (2013-12-01 - 2013-12-06)
Dagstuhl Seminar 15351: Computational Mass Spectrometry (2015-08-23 - 2015-08-28)
Dagstuhl Seminar 17421: Computational Proteomics (2017-10-15 - 2017-10-20)
Dagstuhl Seminar 19351: Computational Proteomics (2019-08-25 - 2019-08-30)
Dagstuhl Seminar 21271: Computational Proteomics (2021-07-04 - 2021-07-09)
Dagstuhl Seminar 25351: Computational Proteomics (2025-08-24 - 2025-08-29)

Classification

Artificial Intelligence
Machine Learning
Other Computer Science

Keywords

proteomics
bioinformatics
machine learning
therapeutics
mass spectrometry
