Our 5-day Dagstuhl Seminar on "Programming Language Processing" (PLP) brought together researchers and practitioners from the software engineering, programming languages, and natural language processing communities. The seminar focused on activities prepared in advance by the participants, such as talks, demos of tools and challenges, and tutorials, as well as informal discussions anchored around the prepared activities. We gave each participant who wanted to present their work an opportunity to do so. In addition, we asked specific people to present specific topics, e.g., experts in a particularly relevant subfield to prepare a tutorial, or creators of a particularly relevant tool to give a tool demo.
In addition to talks and informal discussions, there were several break-out sessions during which participants discussed specific topics in smaller groups and eventually reported back to the other participants. In particular, we had break-out sessions on the following topics:
- How (if at all) do AI programming assistants change programming?
- Interpreting neural models of code.
- Do we still need per-task models, or do large language models solve it all?
- What software engineering tasks are not yet explored (sufficiently) by neural models?
- How should and will computer science education change in response to ML-based coding tools?
- What kinds of guarantees can we expect, and do we want, from ML-based developer tools? What human factors in interacting with ML systems are relevant?
- How can learned models use existing tools, e.g., compilers and interpreters, to improve their predictions?
As a result of the seminar, several participants plan to launch various follow-up activities, such as joint publications and transferring promising ideas from academia to industry.
Program analysis is at the core of many tools that software developers rely on during their daily work. Instead of analyzing programs in the traditional way, based on symbolic reasoning, there is increasing interest in learning-based program analysis among both academic researchers and industry practitioners. A learning-based approach is motivated by the huge amounts of available source code and other data, by the undecidability of practically all interesting program analysis questions, and by recent progress in machine learning and natural language processing. Here, we call the emerging field of learning-based program analysis "programming language processing" (PLP), in analogy to "natural language processing". Current work shows PLP to be effective for a variety of tasks, including code completion, bug detection, type prediction, program synthesis, code summarization, and program repair.
This Dagstuhl Seminar will bring together researchers and practitioners from three communities – software engineering, programming languages, and natural language processing – providing a unique opportunity for cross-fertilization and inter-disciplinary progress. We will discuss machine learning models of code, integrating learning-based and traditional program analysis, and learning from natural language information associated with software. We expect the seminar to lead to a better understanding of the commonalities and differences between natural and programming languages, a set of standardized tasks and datasets, and an understanding of the challenges and opportunities in industry adoption of PLP.
Topics to be discussed (non-exhaustive list):
- Effective models of code
- Techniques for obtaining, cleaning, and preprocessing training data
- Integrating learning-based and traditional program analysis
- Learning from natural language information associated with programs
- Standardized tasks, leaderboards
- Challenges and opportunities in industry adoption
The main purpose of the seminar is to connect the participants and their respective research ideas. Hence, one expected outcome is new collaborations between the participants, in particular across subfields of computer science that usually interact only sparsely. As another concrete outcome of the seminar, we plan to write an article that summarizes the current state of PLP, open challenges, and a set of specific tasks the community could focus on. The article should neither survey all existing work nor focus on one or two specific technical contributions. Instead, the goal is to provide a forward-looking perspective on PLP for a broader, general computer science audience.
- Rui Abreu (Meta Platforms - Bellevue, US) [dblp]
- Edward E. Aftandilian (GitHub - San Francisco, US) [dblp]
- Jürgen Cito (TU Wien, AT) [dblp]
- Premkumar T. Devanbu (University of California - Davis, US) [dblp]
- Elizabeth Dinella (University of Pennsylvania - Philadelphia, US)
- Aryaz Eghbali (Universität Stuttgart, DE)
- Khashayar Etemadi Someoliayi (KTH Royal Institute of Technology - Stockholm, SE)
- Yoav Goldberg (Bar-Ilan University - Ramat Gan, IL) [dblp]
- Lars Grunske (HU Berlin, DE) [dblp]
- Jingxuan He (ETH Zürich, CH)
- Maliheh Izadi (TU Delft, NL)
- Reyhaneh Jabbarvand (University of Illinois - Urbana-Champaign, US)
- Wei Le (Iowa State University - Ames, US) [dblp]
- Martin Monperrus (KTH Royal Institute of Technology - Stockholm, SE) [dblp]
- Alex Polozov (Google - Mountain View, US) [dblp]
- Michael Pradel (Universität Stuttgart, DE) [dblp]
- Baishakhi Ray (Columbia University - New York, US) [dblp]
- Cedric Richter (Universität Oldenburg, DE) [dblp]
- Romain Robbes (CNRS - Bordeaux, FR & University of Bordeaux, FR) [dblp]
- Baptiste Rozière (Meta AI - Paris, FR)
- Jan Arne Sparka (HU Berlin, DE)
- Charles Sutton (Google - Mountain View, US) [dblp]
- Lin Tan (Purdue University - West Lafayette, US) [dblp]
- Eran Yahav (Technion - Haifa, IL) [dblp]
- Albert Ziegler (GitHub - San Francisco, US)
- Neural and Evolutionary Computing
- Programming Languages
- Software Engineering
- Neural Software Analysis