Dagstuhl Seminar 24511
Coding Theory and Algorithms for Emerging Technologies in Synthetic Biology
(Dec 15 – Dec 20, 2024)
Organizers
- R. B. (TU München, DE)
- Olgica Milenkovic (University of Illinois - Urbana Champaign, US)
- Zohar Yakhini (Reichman University - Herzliya, IL)
- Yonatan Yehezkeally (TU München, DE)
Contact
- Michael Gerke (for scientific matters)
- Susanne Bach-Bernhard (for administrative matters)
Designing DNA-based storage systems inherently requires joint efforts among biologists, chemists, engineers, and computer scientists, because these systems differ markedly from classical storage media at every stage. For example, cost-effective synthesis introduces insertion and deletion errors on top of the well-understood substitution errors that occur in classical media, and much less is known about correcting them. Error-correction techniques may also depend on the targeted application, owing to the intrinsic properties of the stored data and the effects of the different error types (this is observed, e.g., when storing images in DNA-based storage systems). Further, strands in the storage container are not ordered in memory; during sequencing it is therefore not possible to tell which strand is being read, which makes error correction even more challenging. Lastly, sequencing techniques yield many noisy copies of sub-sequences of the stored strands, which can be leveraged to reconstruct the stored strands more efficiently.
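To make the difference between substitution errors and insertion/deletion errors concrete, the following is a minimal Python sketch and not a model of any real synthesis or sequencing technology. The function names (`ids_channel`, `edit_distance`, `positional_mismatches`) and the independent per-base error probabilities are assumptions introduced purely for illustration. It shows that a single deletion shifts every later base, so a position-by-position comparison reports many mismatches even though the edit (Levenshtein) distance is only 1.

```python
import random

ALPHABET = "ACGT"

def ids_channel(strand, p_ins, p_del, p_sub, rng):
    """Toy insertion/deletion/substitution channel (illustrative assumption:
    errors hit each base independently with the given probabilities)."""
    out = []
    for base in strand:
        if rng.random() < p_ins:
            out.append(rng.choice(ALPHABET))  # spurious base inserted before `base`
        r = rng.random()
        if r < p_del:
            continue  # base is deleted
        if r < p_del + p_sub:
            # substitute with one of the three other bases
            base = rng.choice([b for b in ALPHABET if b != base])
        out.append(base)
    return "".join(out)

def edit_distance(a, b):
    """Levenshtein distance: minimum number of insertions, deletions,
    and substitutions turning `a` into `b` (row-by-row dynamic program)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # match or substitute
        prev = cur
    return prev[-1]

def positional_mismatches(a, b):
    """Mismatches when comparing position by position over the common prefix."""
    return sum(x != y for x, y in zip(a, b))

if __name__ == "__main__":
    stored = "ACGTACGT"
    read = stored[1:]  # a single deletion of the first base
    print(positional_mismatches(stored, read))  # every aligned position differs
    print(edit_distance(stored, read))          # yet only one edit occurred
    noisy = ids_channel(stored, 0.05, 0.05, 0.05, random.Random(0))
    print(noisy, edit_distance(stored, noisy))
```

Codes designed only for substitutions compare symbols position by position; this small example suggests why that view breaks down once a single base is lost or gained.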
The first large-scale experiments demonstrating the potential of in vitro DNA storage were reported by Church et al., who recovered 643 KB of data [1], and Goldman et al., who did the same for a 739 KB message [2]. However, neither group recovered the entire message successfully, because appropriate coding solutions for correcting errors were not used. Most published studies report that either substitutions or deletions are the most prominent error type in DNA-based storage systems, depending on the specific synthesis and sequencing technology. Consequently, coding-theoretic aspects of DNA-based storage systems have recently received significant attention. However, these theoretical works have not yet led to viable storage technologies.
Progress in enabling information storage in DNA has been driven by progress in using synthetic DNA in more general applications. In [3], the authors demonstrated how high-throughput synthesis can be used to understand, optimize, and fine-tune the functionality of biological systems. This work involved careful design of the reagents (the composition of a large library of candidates) as well as rigorous statistical data analysis to support the results. This Dagstuhl Seminar therefore covered general design and analysis frameworks for high-throughput experiments. One specific and prominent type of high-throughput experiment that is strongly related to synthetic DNA is CRISPR screening. These experiments silence or activate a large number of elements in genomes in order to optimize and tune certain outcomes, ranging from growth rates in plants and bovine cultures to eliciting immune responses in cancer patients.
Informed by this observation, this seminar ultimately aimed at forging closer connections between information theorists, computer scientists, data scientists, biologists, and chemists to: (i) drive joint progress in coding-theoretic techniques specifically tailored to the emerging synthesis and sequencing technologies; (ii) better understand, and initiate innovation in, the application of computer science techniques to high-throughput experimental synthetic biology; and (iii) shape an application-driven design of low-error, cost-effective DNA-based storage systems. The seminar schedule was designed to be flexible, allowing participants to present their research and expertise while interactively accommodating audience input. The plenaries exposed participants to a range of underlying fields, namely the genetic code, CRISPR and gene editing, bioinformatics, informatics and machine learning for medical applications, the utility of coding theory for DNA-based information systems, and market data-storage applications. Meanwhile, working groups enabled participants to leverage their interdisciplinary backgrounds and share their knowledge and expertise to envision holistic solutions to contemporary challenges. To advertise the research of junior participants, the schedule included a handful of short talks showcasing their results. During the discussions, a couple of participants noticed implications of their own research for the topics under discussion; “pop-up” talks were therefore scheduled for them to share their thoughts. Throughout, much fun was had, and connections were forged in a myriad of ice-breaking activities.
References
- [1] G. M. Church, Y. Gao, and S. Kosuri, “Next-generation digital information storage in DNA,” Science, vol. 337, no. 6102, p. 1628, Sep. 2012.
- [2] N. Goldman, P. Bertone, S. Chen, C. Dessimoz, E. M. LeProust, B. Sipos, and E. Birney, “Towards practical, high-capacity, low-maintenance information storage in synthesized DNA,” Nature, vol. 494, no. 7435, pp. 77–80, Jan. 2013.
- [3] E. Sharon, Y. Kalma, A. Sharp, T. Raveh-Sadka, M. Levo, D. Zeevi, L. Keren, Z. Yakhini, A. Weinberger, and E. Segal, “Inferring gene regulatory logic from high-throughput measurements of thousands of systematically designed promoters,” Nature Biotechnology, vol. 30, no. 6, pp. 521–530, 2012.

Progress in understanding genes and genomes has given a boost to the use of synthetic DNA for biological and technological applications. Synthetic nucleic acids play a central role in synthetic biology and in emerging therapeutic paradigms, e.g., genome editing and nucleic acid vaccines. DNA-based data storage is making significant progress and, thanks to its extreme data density, high durability, and timelessness, promises to become the next standard for data archival systems.
Synthetic biology and the use of synthetic DNA for information storage applications bring important algorithmic and data analysis challenges. In synthetic biology, reagent and assay design are often driven by algorithmic approaches. Novel synthesis technologies reduce cost by several orders of magnitude at the price of increased error rates, raising new coding-theoretic questions.
This Dagstuhl Seminar aims to bring together leading biologists, chemists, engineers, computer/data scientists, and theoreticians working on synthetic biology and on DNA-based data storage, to enable joint work in small groups that discuss and explore recent advancements and current challenges. The seminar will also facilitate the initiation of collaborative work and thus possibly pave the way to addressing open challenges.
The topics envisaged for discussion include: (i) coding-theoretic challenges and methods for native DNA-based data storage systems; (ii) implications of emerging sequencing technologies, novel synthesis methods, and application-specific data structures; and (iii) information aspects of high-throughput synthetic biology, including CRISPR screening experiments.

Participants
- Roee Amit (Technion - Haifa, IL)
- Iryna Andriyanova (CY Cergy Paris University, FR)
- R. B. (TU München, DE)
- Anisha Banerjee (TU München, DE)
- Daniella Bar-Lev (Technion - Haifa, IL)
- Jessica Bariffi (TU München, DE)
- Salim El Rouayheb (Rutgers University - Piscataway, US)
- Ohad Elishco (Ben Gurion University - Beer Sheva, IL)
- Nick Goldman (European Molecular Biology Laboratory - Hinxton, GB)
- Alexandre Graell i Amat (Chalmers University of Technology - Göteborg, SE)
- Francesca Granito (ETH Zürich, CH)
- Robert Grass (ETH Zürich, CH)
- Jasper Groen (TU Delft, NL)
- Anina Gruica (Technical University of Denmark - Lyngby, DK)
- Serge Kas Hanna (Université Côte d’Azur - Sophia Antipolis, FR)
- Cai Kui (Singapore University of Technology and Design, SG)
- Olgica Milenkovic (University of Illinois - Urbana Champaign, US)
- Lior Nissim (The Hebrew University of Jerusalem, IL)
- Tzachi Pilpel (Weizmann Institute of Science - Rehovot, IL)
- Nimesh Pinnamaneni (Helixworks Technologies Ltd. - Cork, IE)
- Inbal Preuss (Technion - Haifa, IL)
- Roni Rak (Agriculture Research Organization, IL)
- João Ribeiro (IST - Lisbon, PT)
- Eirik Rosnes (Simula Research Laboratory - Oslo, NO)
- Omer Sabary (Technion - Haifa, IL)
- Benno Schwikowski (Institut Pasteur & LIX - Paris, FR & MPI - Berlin, DE)
- Ilan Shomorony (University of Illinois - Urbana Champaign, US)
- Roman Sokolovskii (Imperial College London, GB)
- Mark Somoza (Leibniz-Institut für Lebensmittel-Systembiologie - Freising, DE)
- Kasra Tabatabaei (New England Biolabs - Ipswich, US)
- Jennifer Tang (MIT - Cambridge, US)
- Emanuele Viterbo (Monash University - Clayton, AU)
- Van Khu Vu (National University of Singapore, SG)
- Frederik Walter (TU München, DE)
- Zhiying Wang (University of California - Irvine, US)
- Eitan Yaakobi (Technion - Haifa, IL)
- Zohar Yakhini (Reichman University - Herzliya, IL)
- Yonatan Yehezkeally (TU München, DE)
Classification
- Emerging Technologies
- Information Theory
Keywords
- Synthetic biology
- DNA-based data storage
- Edit error-correcting codes