Dagstuhl Seminar 24451
Machine Learning for Protein-Protein and Protein-Ligand Interactions
(Nov 03 – Nov 08, 2024)
Organizers
- Anne-Florence Bitbol (EPFL - Lausanne, CH)
- Jennifer Listgarten (University of California - Berkeley, US)
- Tomas Pluskal (IOCB - Prague, CZ)
Contact
- Andreas Dolzmann (for scientific matters)
- Christina Schwarz (for administrative matters)
The Dagstuhl Seminar 24451, titled "Machine Learning for Protein-Protein and Protein-Ligand Interactions", convened leading experts from computational biology, chemistry, and machine learning (ML) to explore advancements and challenges in understanding biomolecular interactions. This was the first seminar of its kind, and it aimed to address pressing issues such as integrating domain knowledge into ML models, ensuring data availability and quality, and fostering effective interdisciplinary collaboration. The event facilitated discussions on theoretical advancements, practical challenges, and future research directions in leveraging ML for protein science and drug discovery.
A central theme of the seminar was the exploration of ML techniques for predicting protein-protein and protein-ligand interactions with improved accuracy and interpretability. Topics included representation learning, generative modeling, and the role of inductive biases in structuring ML models. The emergence of protein language models (pLMs) was a particular highlight, as they offer sustainability and efficiency advantages over traditional sequence alignment-based methods. These models were recognized for their ability to enhance structural and functional predictions, particularly in cases where evolutionary data is scarce or unreliable.
Another critical discussion focused on benchmarking ML models for biomolecular interactions. Participants debated the limitations of current datasets, particularly in terms of negative data and class imbalances, and emphasized the need for standardized benchmarks. Reliable benchmarking was identified as essential for validating new approaches, particularly in enzyme design and protein interaction network analysis.
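The concern about class imbalance can be illustrated with a small sketch. The toy benchmark below is hypothetical (10 interacting pairs among 1000 candidates, not a dataset discussed at the seminar), and the helper functions are illustrative names rather than any particular library's API. It shows how a predictor that never reports an interaction can still score high on plain accuracy, while balanced accuracy, the mean of per-class recall, exposes it as uninformative.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels, where 1 = interacting pair."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    """Fraction of all pairs that are classified correctly."""
    tp, tn, _, _ = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of the recall on positives and the recall on negatives."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    recall_pos = tp / (tp + fn) if (tp + fn) else 0.0
    recall_neg = tn / (tn + fp) if (tn + fp) else 0.0
    return (recall_pos + recall_neg) / 2


# Hypothetical toy benchmark (assumption): 10 positives among 1000 candidate pairs.
y_true = [1] * 10 + [0] * 990

# A trivial baseline that predicts "no interaction" for every pair.
y_all_negative = [0] * 1000

acc = accuracy(y_true, y_all_negative)
bal_acc = balanced_accuracy(y_true, y_all_negative)
print(f"accuracy = {acc:.2f}, balanced accuracy = {bal_acc:.2f}")
```

On this toy data the trivial baseline is correct on every negative and wrong on every positive, which is one reason a standardized benchmark would likely need to report imbalance-aware metrics alongside accuracy.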
The seminar also tackled real-world challenges in applying ML to protein science. One of the most pressing issues was the scarcity of high-quality datasets, particularly for protein-ligand interactions. Participants proposed new standards for curating balanced datasets and ensuring the inclusion of negative data to improve ML model training. Additionally, discussions highlighted the importance of integrating experimental constraints into ML workflows, which would enhance cost-effectiveness in protein engineering and directed evolution.
The integration of computational and wet-lab research was another key topic. Effective collaboration between these fields remains a major challenge due to differing methodologies and objectives. Strategies were proposed to improve synergy between computational modeling and experimental validation, including the development of interdisciplinary training programs and shared data repositories.
Inspired by previous Dagstuhl meetings, the seminar adopted a discussion-driven format, encouraging open exchanges and collaboration. Several actionable outcomes emerged, including initiatives to enhance data accessibility, develop standardized benchmarking frameworks, and refine ML model architectures for improved performance in biomolecular applications.
The event successfully bridged computational and experimental research, paving the way for future innovations in protein science and drug discovery. Moving forward, participants emphasized the need for continued interdisciplinary collaboration, the development of more reliable datasets, and the refinement of ML techniques to better capture the complexity of protein interactions. The outcomes of this seminar are expected to significantly influence the future of machine learning applications in biology and chemistry, setting the stage for groundbreaking advancements in the field.

Over the past few decades, machine learning (ML) has driven progress on a wide range of problems in computational biology and biochemistry, particularly in understanding the structure and function of proteins. Similarly, in cheminformatics, ML is increasingly influencing pharmaceutical decision making and enabling novel drug design strategies. However, one area of great importance still requires further advances, likely involving significant innovations: the understanding, prediction, and design of protein-protein and protein-ligand interactions. This Dagstuhl Seminar aims to connect the protein-ML and cheminformatics-ML communities and foster their communication with key experts in biology and chemistry. The seminar will allow us to discuss both theoretical and application-oriented ML topics in the context of protein-protein and protein-ligand interactions. The initial topics for discussion are listed below:
Stream I: Theory and foundations
- Injecting biological and chemical knowledge as inductive bias into modeling
- Representation learning for proteins and small molecules
- Generative modeling for design and prediction
Stream II: Real-world applications
- Tackling the data availability problem
- Fundamental training datasets
- Model performance monitoring
- Reliable performance benchmarking
- Efficient interaction between ML experts and biology experts
- Interpretability and effective reduction of predictions to practice
We consider flexibility to be an important aspect of the seminar organization, and we will not impose a rigid structure. The core of the workshop will be based on discussion and brainstorming sessions, and not on formal conference-style presentations. The program for each day will be determined by collective discussion and by voting on the most interesting topics, which will typically result in splitting the participants into several smaller subgroups based on their interests. Each evening, we will meet with all the participants for a debriefing session, report on the outcomes of individual discussions, and plan the program for the next day. After dinner, we will have less formal concurrent sessions in smaller groups focusing on specific areas that comprise more immediately tractable problems. These evening sessions may also include short overview presentations on specific topics by selected participants.

- Anne-Florence Bitbol (EPFL - Lausanne, CH)
- Sebastian Böcker (Friedrich-Schiller-Universität Jena, DE)
- Alexandre Bonvin (Utrecht University, NL)
- Anton Bushuiev (Czech Technical University - Prague, CZ)
- Roman Bushuiev (The Czech Academy of Sciences - Prague, CZ)
- Alessandra Carbone (Sorbonne University - Paris, FR)
- Alberto Cazzaniga (AREA Science Park - Trieste, IT)
- Simona Cocco (ENS - Paris, FR)
- Francesca Cuturello (AREA Science Park - Trieste, IT)
- Christian Dallago (Nvidia - München, DE)
- Arne Elofsson (Stockholm University - Solna, SE)
- Sergei Grudinin (CNRS - St. Martin-d'Hères, FR)
- Ilia Igashov (EPFL - Lausanne, CH)
- Petr Kouba (Czech Technical University - Prague, CZ)
- Jessica Lanini (Novartis AG - Basel, CH)
- Andrew Leach (University of Manchester, GB)
- Jennifer Listgarten (University of California - Berkeley, US)
- Cyril Malbranke (EPFL - Lausanne, CH)
- Hiroshi Mamitsuka (Kyoto University, JP)
- Céline Marquet (TU München - Garching, DE)
- Simon Mathis (University of Cambridge, GB)
- Stanislav Mazurenko (Masaryk University - Brno, CZ)
- Remi Monasson (ENS - Paris, FR)
- Hunter Nisonoff (University of California - Berkeley, US)
- Armita Nourmohammad (University of Washington - Seattle, US)
- Tomas Pluskal (IOCB - Prague, CZ)
- Burkhard Rost (TU München, DE)
- Juho Rousu (Aalto University, FI)
- Alexander Schug (Jülich Supercomputing Centre, DE)
- Josef Sivic (Czech Technical University - Prague, CZ)
- Martin Steinegger (Seoul National University, KR)
- Aalt-Jan van Dijk (University of Amsterdam, NL)
- Pablo Varas Pardo (Institute of Mathematical Sciences - Madrid, ES)
- Andrea Volkamer (Universität des Saarlandes - Saarbrücken, DE)
- Martin Weigt (Sorbonne University - Paris, FR)
- Julius Wenckstern (EPFL - Lausanne, CH)
- Bruce Wittmann (Microsoft - Redmond, US)
- Xiaotong Xu (Utrecht University, NL)
- Omri Yakir (Tel Aviv University, IL)
- Lenka Zdeborova (EPFL - Lausanne, CH)
Classification
- Machine Learning
- Other Computer Science
Keywords
- protein
- ligand
- molecular interactions
- biological machine learning
- generative models