Dagstuhl Seminar 24431

Automated Programming and Program Repair

(Oct 20 – Oct 25, 2024)

Permalink
Please use the following short URL to reference this page: https://www.dagstuhl.de/24431

Organizers

  • Claire Le Goues (Carnegie Mellon University - Pittsburgh, US)
  • Michael Pradel (Universität Stuttgart, DE)
  • Abhik Roychoudhury (National University of Singapore, SG)
  • Shin Hwei Tan (Concordia University - Montreal, CA)

Summary

Automated tools that generate and improve code promise to fundamentally change software development. For example, there is a recent trend towards automated code generation with large language models, as evidenced by the capabilities of Codex/Copilot, ChatGPT, and GPT-4. These models, along with other techniques such as search-based and semantic analysis-based approaches, have the potential to automate significant parts of today's software development process. In particular, there are promising techniques for automated programming and automated program repair. Automated programming refers to techniques that suggest newly written code, e.g., in the form of code completion tools. The capabilities of such tools have increased from moderately successful single-token predictions just a few years ago to predicting entire functions with relatively high accuracy. Techniques for automated programming include large language models that predict code based on natural language specifications of the intended behavior.
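
As an illustration of the kind of automated programming described above, the following minimal sketch shows how a code completion tool might ask a large language model to predict an entire function body from a natural-language specification. The helper query_llm is a hypothetical placeholder for whichever model API is used; it is not part of any specific tool mentioned here.

# Minimal sketch of LLM-based code completion from a natural-language
# specification. `query_llm` is a hypothetical stand-in for a real model API.

def query_llm(prompt: str) -> str:
    """Hypothetical LLM call; replace with an actual model API."""
    raise NotImplementedError

def complete_function(signature: str, docstring: str) -> str:
    """Ask the model to write a function body for the given signature
    and natural-language description of its intended behavior."""
    prompt = (
        "Complete the following Python function. Return only the code.\n\n"
        f'{signature}\n    """{docstring}"""\n'
    )
    return query_llm(prompt)

# With a real model wired into `query_llm`, one could request, e.g.:
# candidate = complete_function(
#     "def median(xs: list[float]) -> float:",
#     "Return the median of a non-empty list of numbers.",
# )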

Automated program repair refers to a suite of techniques for automated rectification of errors or vulnerabilities in programs. Automated program repair technologies were originally developed for reducing the debugging effort for manually written code. In other words, automated program repair technologies are meant to boost developer productivity in locating and fixing errors in manually written code. Automated programming and automated program repair strongly overlap in terms of their goals and the techniques used to achieve these goals. Both streams of research aim at generating correct source code while having to cope with limited knowledge of the behavior this source code is meant to have. Since formal specifications of correct program behavior are typically not available, both techniques try to infer specifications from various program artifacts, such as large code corpora, past program versions, natural language documentation, or various executions of the program. To address the challenge of predicting likely correct code, both streams of research combine techniques from machine learning, search-based approaches, and semantic code analysis.
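
Since formal specifications are rarely available, many repair techniques follow a generate-and-validate loop that uses the test suite as a proxy specification. The following minimal sketch illustrates that loop; propose_patches and run_tests are hypothetical placeholders for a patch-generation strategy (search-based, semantic, or LLM-based) and a test runner.

# Minimal sketch of the generate-and-validate paradigm in automated program
# repair: candidate patches are generated for suspicious locations and kept
# only if the full test suite passes. All helpers are hypothetical placeholders.

from typing import Callable, Iterable, Optional

def generate_and_validate(
    program: str,
    suspicious_locations: Iterable[int],
    propose_patches: Callable[[str, int], Iterable[str]],
    run_tests: Callable[[str], bool],
) -> Optional[str]:
    """Return the first patched program that passes all tests, if any."""
    for location in suspicious_locations:                    # e.g., ranked by fault localization
        for patched in propose_patches(program, location):   # search-based, semantic, or LLM-based
            if run_tests(patched):                           # validation against the test suite
                return patched                               # plausible (test-adequate) patch
    return None                                              # no plausible patch found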

Despite these similarities, the subcommunities working on automated programming and automated program repair are only partially aware of each other's most recent techniques. This seminar set out to explore the intersection of the two fields in order to foster collaborations between them, in particular by discussing recent and potential future work in the following directions: (1) Apply program repair to fix code generated by code completion models. The code generated by large language models often leaves significant room for improvement in terms of correctness, raising the question of whether automated program repair techniques can be used for last-mile improvement of automatically generated code. (2) Apply the generate-and-validate paradigm from program repair to the code completion problem. For example, such techniques can repeatedly generate code completion candidates and validate them by running test suites. (3) Apply language model-based code generators to the program repair problem. Once the location of a bug has been (heuristically) determined, large language models can predict candidate code snippets to replace the incorrect code. (4) Use the ability of large language models to infer the intended behavior of code from natural language information embedded in the code. For example, such information can be turned into assertions or test cases, which can then guide automated program repair. (5) In addition to predicting (fixed) code, generate evidence that the final code is trustworthy. Such evidence may take the form of tests generated along with the code, or other certificates obtained from formal reasoning.
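
To make direction (2) above concrete, the sketch below shows one possible way to transfer the generate-and-validate idea to code completion: sample several candidate completions and keep the first one that passes the available tests. The helpers sample_completion and passes_tests are hypothetical; they stand in for an LLM sampling interface and a test runner.

# Minimal sketch of direction (2): sample several completion candidates from a
# model and validate each against a test suite, in the spirit of
# generate-and-validate repair. All helpers are hypothetical placeholders.

import random
from typing import Callable, Optional

def best_of_n_completion(
    spec: str,
    sample_completion: Callable[[str, int], str],
    passes_tests: Callable[[str], bool],
    n: int = 10,
) -> Optional[str]:
    """Return the first of n sampled completions that passes the tests."""
    for _ in range(n):
        seed = random.randrange(2**32)              # vary sampling to obtain diverse candidates
        candidate = sample_completion(spec, seed)   # hypothetical LLM sampling call
        if passes_tests(candidate):                 # validation step borrowed from program repair
            return candidate
    return None                                     # fall back to, e.g., last-mile repair (direction 1)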

To discuss topics at the intersection of automated programming and program repair, Dagstuhl Seminar 24431 brought together 33 participants from academia and industry (e.g., Microsoft and Google) for five days. The program comprised more than 20 talks and three panel discussions. Overall, the seminar stimulated many discussions in which researchers identified future research directions and initiated potential international collaborations.

Before the seminar, all participants had received an invitation to give a talk of flexible duration (a lightning update of around 5 minutes, a short talk of around 10 minutes, or a long talk of around 25 minutes). More than 20 participants replied positively to the invitation, resulting in a great variety of talks. The first day of the seminar (October 21, 2024) started with an introduction by the organizers and self-introductions by all participants, followed by several short and long talks. The day ended with a panel discussion on "Benchmarks for LLM Code Generation". On October 22, 2024, several talks were followed by a panel discussion on "LLMs beyond just coding assistance". On October 23, 2024, talks took place in the morning, followed by an excursion to Mettlach and Villa Borg after lunch. On October 24, 2024, a few inspiring talks were followed by a panel discussion on "Obstacles for deploying program repair techniques".

Overall, the seminar received very positive feedback from participants, both in person and formally via email. Notably, one participant wrote to one of the organizers that "It was my best Dagstuhl Seminar last October, and I really appreciate your organizing of the seminar once again", indicating that the seminar left a particularly good impression compared to other Dagstuhl Seminars the participants have attended. Several participants also complimented Dagstuhl on the diversity of the social events (e.g., the excursion, the treetop walk, and the sauna) and on the babysitting services provided for participants attending the seminar with young children.

In terms of collaborations, a few actionable topics were discussed. An opinion piece on AI software engineers, titled "AI Software Engineer: Programming with Trust", is now available: https://arxiv.org/abs/2502.13767. Another potential collaboration is a critical review of benchmarks crafted by the AI community, such as SWE-Bench. Meanwhile, AutoCodeRover (presented by one of the organizers at Dagstuhl), an NUS spinoff, was officially acquired on February 19, 2025, by SonarSource, a leader in code quality via its static analysis solutions.

The seminar focused on the following key themes:

  • Topics at the intersection of automated programming and automated program repair, analyzing progress in both fields.
  • Understanding common mistakes in automatically generated code.
  • Discussing the theme of “Trusted Automated Programming”, which focuses on:
    • How automatically generated code can be made more trustworthy.
    • How to generate evidence that improvements to auto-generated code maintain trustworthiness.
    • How to decide, based on such evidence, when to incorporate automatically generated code into an existing software project with a stable code-base.
  • Important challenges in automated program repair and automated programming in general.
  • Using large language models (LLMs) beyond just coding assistance.
  • Obstacles in deploying program repair techniques in real-world settings.

The seminar also identified several critical challenges:

  1. The problem of curating widely accepted benchmarks for code generation that serve both the Software Engineering and AI communities.
  2. The problem of designing evaluation criteria that effectively assess the quality and reliability of auto-generated code.
  3. The challenges and opportunities in applying LLM-based techniques beyond traditional automated program repair (APR).
  4. The obstacles in training developers to effectively use program repair techniques in real-world software development.
Copyright Shin Hwei Tan

Motivation

Automated programming refers to techniques that suggest newly written code, e.g., in the form of code completion tools. Techniques for automated programming include large language models that predict code based on natural language specifications of the intended behavior. The recent development of technologies like Codex and ChatGPT has prompted us to examine the possibility of automated programming in the future.

Automated program repair refers to a suite of techniques for automated rectification of errors or vulnerabilities in programs. Automated program repair technologies were originally developed to reduce the debugging effort for manually written code. Interestingly, however, these techniques can also be adapted to improve automatically generated code.

This Dagstuhl Seminar will explore the intersection of these two fields. We plan to discuss recent work and potential future work in the following directions:

  1. Apply program repair to fix code generated by code completion models. The code generated by large language models often has significant room for improvement in terms of correctness. We can also consider repair of automatically generated code in general.
  2. Apply the generate-and-validate paradigm from program repair to the code completion problem.
  3. Apply language model-based code generators to the program repair problem.
  4. Use the ability of large language models to infer the intended behavior of code from natural language information embedded in the code.
  5. In addition to predicting (fixed) code, generate evidence that the final code is trustworthy. Such evidence may take the form of tests generated along with the code, or other certificates obtained from formal reasoning.

This seminar will also help build greater connections and cross-community understanding between formal verification researchers and researchers in program repair and automated programming. Formal verification or symbolic analysis techniques can be adapted to provide certificates of correctness, which may be useful for integrating code generated by large language models into a software project. This can help manually written code coexist with automatically generated code from large language models. This possibility will be discussed and examined during the seminar.
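
One lightweight way to connect these communities, sketched below, is to attach executable evidence to generated code before it is merged: the candidate implementation is checked against an executable postcondition on a bounded input domain. This is only an illustrative bounded check under assumed helper names, not a formal verification technique from any specific tool, but it hints at the kind of certificate that symbolic or formal methods could strengthen.

# Minimal sketch of producing lightweight evidence for generated code:
# check a candidate implementation against an executable specification on a
# bounded input domain before integrating it. This is a bounded check, not a
# proof; the helper names and the example specification are illustrative.

from itertools import product
from typing import Callable, Iterable, Tuple

def bounded_check(
    candidate: Callable[..., int],
    postcondition: Callable[..., bool],
    domain: Iterable[Tuple],
) -> bool:
    """Return True if the postcondition holds on every input in the domain."""
    return all(postcondition(*args, candidate(*args)) for args in domain)

# Example: evidence that a (possibly LLM-generated) `my_max` behaves like max
# on all pairs of small integers.
def my_max(a: int, b: int) -> int:      # stand-in for automatically generated code
    return a if a >= b else b

ok = bounded_check(
    my_max,
    lambda a, b, result: result == max(a, b),
    product(range(-5, 6), repeat=2),
)
print("bounded evidence:", ok)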

Copyright Claire Le Goues, Michael Pradel, Abhik Roychoudhury, and Shin Hwei Tan

Participants


  • Earl T. Barr (University College London, GB) [dblp]
  • Islem Bouzenia (Universität Stuttgart, DE) [dblp]
  • Yuriy Brun (University of Massachusetts Amherst, US) [dblp]
  • Cristian Cadar (Imperial College London, GB) [dblp]
  • Celso G. Camilo-Junior (Federal University of Goiás, BR) [dblp]
  • Satish Chandra (Google - Mountain View, US) [dblp]
  • Chunyang Chen (TU München - Heilbronn, DE) [dblp]
  • Tse-Hsun Chen (Concordia University - Montreal, CA) [dblp]
  • Zimin Chen (Deutsche Telekom - Bonn, DE) [dblp]
  • Andreea Costea (National University of Singapore, SG) [dblp]
  • Premkumar T. Devanbu (University of California - Davis, US) [dblp]
  • Alexander Frömmgen (Google - München, DE) [dblp]
  • Ahmed E. Hassan (Queen's University - Kingston, CA) [dblp]
  • Sungmin Kang (KAIST - Daejeon, KR)
  • Dongsun Kim (Kyungpook National University, KR) [dblp]
  • Claire Le Goues (Carnegie Mellon University - Pittsburgh, US) [dblp]
  • Yiling Lou (Fudan University - Shanghai, CN) [dblp]
  • Fernanda Madeiral (VU Amsterdam, NL) [dblp]
  • Matías Martínez (UPC Barcelona Tech, ES) [dblp]
  • Martin Monperrus (KTH Royal Institute of Technology - Stockholm, SE) [dblp]
  • Nikhil Parasaram (University College London, GB)
  • Michael Pradel (Universität Stuttgart, DE) [dblp]
  • Nikitha Rao (Carnegie Mellon University - Pittsburgh, US) [dblp]
  • Abhik Roychoudhury (National University of Singapore, SG) [dblp]
  • André Silva (KTH Royal Institute of Technology - Stockholm, SE)
  • Gustavo Soares (Microsoft Corporation - Redmond, US) [dblp]
  • Gang (Gary) Tan (Pennsylvania State University - University Park, US) [dblp]
  • Lin Tan (Purdue University - West Lafayette, US) [dblp]
  • Shin Hwei Tan (Concordia University - Montreal, CA) [dblp]
  • Yingfei Xiong (Peking University, CN) [dblp]
  • Jinqiu Yang (Concordia University - Montreal, CA) [dblp]
  • He Ye (Carnegie Mellon University - Pittsburgh, US)
  • Jooyong Yi (Ulsan National Institute of Science and Technology, KR) [dblp]
  • Jie Zhang (King's College London, GB) [dblp]
  • Lingming Zhang (University of Illinois - Urbana-Champaign, US) [dblp]

Related Seminars
  • Dagstuhl Seminar 17022: Automated Program Repair (2017-01-08 - 2017-01-13) (Details)

Classification
  • Software Engineering

Keywords
  • Program repair
  • Auto-coding
  • Program Synthesis
  • Trustworthy Software
  • Large Language Models