Dagstuhl Seminar 23062: Programming Language Processing

(Feb 5 – Feb 10, 2023)

Permalink

Please use the following short URL to link to this page: https://www.dagstuhl.de/23062

Organizers

Michael Pradel (Universität Stuttgart, DE)
Baishakhi Ray (Columbia University - New York, US)
Charles Sutton (Google - Mountain View, US)
Eran Yahav (Technion - Haifa, IL)

Contact

Marsha Kleinbauer (for scientific matters)
Christina Schwarz (for administrative matters)

Publications

Michael Pradel, Baishakhi Ray, Charles Sutton, and Eran Yahav. Programming Language Processing (Dagstuhl Seminar 23062). In Dagstuhl Reports, Volume 13, Issue 2, pp. 20-32, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2023)

Summary

Our five-day Dagstuhl Seminar on "Programming Language Processing" (PLP) brought together researchers and practitioners from the software engineering, programming languages, and natural language processing communities. The seminar focused on activities that participants prepared in advance, such as talks, demos of tools and challenges, and tutorials, as well as informal discussions anchored around those activities. Every participant who wanted to present their work had the opportunity to do so. In addition, we asked specific people to present specific topics, e.g., experts in a particularly relevant subfield to prepare a tutorial, or creators of a particularly relevant tool to give a demo.

In addition to talks and informal discussions, there were several break-out sessions during which participants discussed specific topics in smaller groups and eventually reported back to the other participants. In particular, we had break-out sessions on the following topics:

How (if at all) do AI programming assistants change programming?
Interpreting neural models of code.
Do we still need per-task models, or do large language models solve it all?
What software engineering tasks are not yet explored (sufficiently) by neural models?
How should and will computer science education change in response to ML-based coding tools?
What kinds of guarantees can we expect, and do we want, from ML-based developer tools? What human factors in interacting with ML systems are relevant?
How can learned models use existing tools, e.g., compilers and interpreters, to improve their predictions?

As a result of the seminar, several participants plan to launch various follow-up activities, such as joint publications and transferring promising ideas from academia to industry.

Creative Commons BY 4.0

Michael Pradel, Baishakhi Ray, Charles Sutton, and Eran Yahav

Motivation

Program analysis is at the core of many tools that software developers rely on in their daily work. Alongside the traditional approach of analyzing programs through symbolic reasoning, there is increasing interest in learning-based program analysis among both academic researchers and industry practitioners. A learning-based approach is motivated by the huge amounts of available source code and other data, by the undecidability of practically all interesting program analysis questions, and by recent progress in machine learning and natural language processing. Here, we call the emerging field of learning-based program analysis "programming language processing" (PLP), in analogy to "natural language processing". Current work shows PLP to be effective for a variety of tasks, including code completion, bug detection, type prediction, program synthesis, code summarization, and program repair.

This Dagstuhl Seminar will bring together researchers and practitioners from three communities (software engineering, programming languages, and natural language processing), providing a unique opportunity for cross-fertilization and interdisciplinary progress. We will discuss machine learning models of code, the integration of learning-based and traditional program analysis, and learning from natural language information associated with software. We expect the seminar to lead to a better understanding of the commonalities and differences between natural and programming languages, a set of standardized tasks and datasets, and an understanding of the challenges and opportunities in industry adoption of PLP.

Topics to be discussed (non-exhaustive list):

Effective models of code
Techniques for obtaining, cleaning, and preprocessing training data
Integrating learning-based and traditional program analysis
Learning from natural language information associated with programs
Standardized tasks, leaderboards
Challenges and opportunities in industry adoption

The main purpose of the seminar is to connect the participants and their respective research ideas. Hence, one expected outcome is new collaborations between the participants, in particular across different subfields of computer science that usually interact only sparsely. As another concrete outcome of the seminar, we plan to write an article that summarizes the current state of PLP, open challenges, and a set of specific tasks the community could focus on. The article should neither survey all existing work nor focus on one or two specific technical contributions. Instead, the goal is to provide a forward-looking perspective of PLP to a broader, general computer science audience.

Creative Commons BY 4.0

Michael Pradel, Baishakhi Ray, Charles Sutton, and Eran Yahav

Participants

Rui Abreu (Meta Platforms - Bellevue, US) [dblp]
Edward E. Aftandilian (GitHub - San Francisco, US) [dblp]
Jürgen Cito (TU Wien, AT) [dblp]
Premkumar T. Devanbu (University of California - Davis, US) [dblp]
Elizabeth Dinella (University of Pennsylvania - Philadelphia, US)
Aryaz Eghbali (Universität Stuttgart, DE)
Khashayar Etemadi Someoliayi (KTH Royal Institute of Technology - Stockholm, SE)
Yoav Goldberg (Bar-Ilan University - Ramat Gan, IL) [dblp]
Lars Grunske (HU Berlin, DE) [dblp]
Jingxuan He (ETH Zürich, CH)
Maliheh Izadi (TU Delft, NL)
Reyhaneh Jabbarvand (University of Illinois - Urbana-Champaign, US)
Wei Le (Iowa State University - Ames, US) [dblp]
Martin Monperrus (KTH Royal Institute of Technology - Stockholm, SE) [dblp]
Alex Polozov (Google - Mountain View, US) [dblp]
Michael Pradel (Universität Stuttgart, DE) [dblp]
Baishakhi Ray (Columbia University - New York, US) [dblp]
Cedric Richter (Universität Oldenburg, DE) [dblp]
Romain Robbes (CNRS - Bordeaux, FR & University of Bordeaux, FR) [dblp]
Baptiste Rozière (Meta AI - Paris, FR)
Jan Arne Sparka (HU Berlin, DE)
Charles Sutton (Google - Mountain View, US) [dblp]
Lin Tan (Purdue University - West Lafayette, US) [dblp]
Eran Yahav (Technion - Haifa, IL) [dblp]
Albert Ziegler (GitHub - San Francisco, US)

Classification

Neural and Evolutionary Computing
Programming Languages
Software Engineering

Keywords

ML4PL
ML4SE
Neural Software Analysis
