Dagstuhl Seminar 24172: Code Search

( Apr 21 – Apr 24, 2024 )

Permalink

Please use the following short url to reference this page: https://www.dagstuhl.de/24172

Organizers

Satish Chandra (Google - Mountain View, US)
Michael Pradel (Universität Stuttgart, DE)
Kathryn T. Stolee (North Carolina State University - Raleigh, US)

Contact

Marsha Kleinbauer (for scientific matters)
Simone Schilke (for administrative matters)

Publications

Satish Chandra, Michael Pradel, and Kathryn T. Stolee. Code Search (Dagstuhl Seminar 24172). In Dagstuhl Reports, Volume 14, Issue 4, pp. 108-123, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Summary

The 3-day Dagstuhl Seminar on “Code Search” brought together leading experts from academia and industry to discuss and advance the field of code search. This seminar highlighted the critical role of code search in various software engineering activities, from locating where an error was thrown to learning new APIs or programming languages. It also emphasized the importance of search in automated software engineering tasks like automated program repair, code recommendation, and clone detection. The emergence of generative AI tools, which offer alternative methods for finding and reusing code, was also a significant topic of discussion.

Participants explored the implications of code search research on developer productivity, code quality, and software engineering ethics. They examined the diverse tools available for code search, ranging from internal company tools to open-source platforms like GitHub, and generative AI tools like ChatGPT. The seminar addressed various dimensions of code search, such as appropriate scope for search results, indexing methodologies, and combinations of code search and LLMs, e.g., in the form of retrieval-augmented generation.

In addition to talks and informal discussions, there were several breakout sessions during which participants discussed specific topics in smaller groups and then reported back to the other participants. Section 4.1 of the full report provides an overview of the breakout sessions.

As a result of the seminar, several participants plan to launch follow-up activities, such as joint publications and the transfer of promising ideas from academia to industry.

Creative Commons BY 4.0

Kathryn T. Stolee, Satish Chandra, and Michael Pradel

Motivation

Code search describes the process of retrieving source code from a repository, where that source code matches a query. Whether a developer is looking for where an error was thrown, learning how to use a new-to-them API, learning a new programming language, or browsing their team’s directory to familiarize themselves with the codebase, search underpins all these activities. Beyond those human-driven software engineering processes, search is also a component in automated software engineering, such as automated program repair, code example recommendation, and clone detection. Furthermore, new generative AI tools have challenged traditional code search by presenting alternative approaches to finding and reusing code.

Code search research has implications for developer productivity, code quality, and software engineering ethics, and tools to facilitate code search are widely available. Some are internal to companies (e.g., Google has invested substantially in this), others are open source (e.g., GitHub has a search interface for public repositories), while still others generate code to match a user query (e.g., ChatGPT). Students and professionals also use generic web search to find source code examples. Across these platforms, query formats, indexing, rankings, the origin of the code, and use cases all vary. This provides many avenues for innovation and exploration in code search research.

For example, what is the appropriate scope for a search result? This question has implications for the underlying technology (e.g., should the indexed unit be a file, function, sub-function, or something else?) and for the use case (e.g., does the user want to adapt the code to their context? Are they seeking to understand a code base? Or something else?). There are many other questions worth exploring: How should source code be indexed? Which search results should appear first? Are there artifacts beyond the code itself that should be surfaced, such as diffs against previous versions or documentation? What diversity of results should be shown to the user? What are the ethical considerations with code search, and with code search vs. code generation?

This Dagstuhl Seminar brings together experts in mining software repositories, human factors in software engineering, software documentation, code examples, program analysis, and industrial code search systems to bridge the gap between industry and academia and set the roadmap for the next decade of code search research.

Expected outcomes of this seminar include: new ideas on how to better support developers in searching for code across different user segments (e.g., industrial, open source software, student populations, developers with low language familiarity), clarity on how search can help during different stages of software development (e.g., writing new code, debugging existing issues, reviewing code), a better understanding of code search ethics, and guidelines for more rigorous, repeatable evaluations for code search research.

Creative Commons BY 4.0

Satish Chandra, Michael Pradel, and Kathryn T. Stolee

Participants

Boris Bokowski (Google - München, DE)
José Cambronero (Microsoft - Redmond, US)
Satish Chandra (Google - Mountain View, US)
Jürgen Cito (TU Wien, AT)
Luca Di Grazia (Universität Stuttgart, DE)
Elena Leah Glassman (Harvard University - Allston, US)
Georgios Gousios (TU Delft, NL)
Reid Holmes (University of British Columbia - Vancouver, CA)
Ciera Jaspan (Google - Mountain View, US)
Tobias Kiecker (HU Berlin, DE)
Dongsun Kim (Kyungpook National University, KR)
Miryung Kim (University of California at Los Angeles, US & Amazon Web Services - Palo Alto, US)
Jens Krinke (University College London, GB)
Julia Lawall (INRIA - Paris, FR)
Gabriel Matute (University of California - Berkeley, US)
Alexander Neubeck (GitHub - San Francisco, US)
Michael Pradel (Universität Stuttgart, DE)
Nikitha Rao (Carnegie Mellon University - Pittsburgh, US)
Kathryn T. Stolee (North Carolina State University - Raleigh, US)
Christoph Treude (The University of Melbourne, AU)
Jan Van den Bussche (Hasselt University, BE)
Rijnard van Tonder (Mysten Labs - Palo Alto, US)
Bogdan Vasilescu (Carnegie Mellon University - Pittsburgh, US)
Cristina Videira Lopes (University of California - Irvine, US)
Tobias Welp (Google - München, DE)
Bowen Xu (North Carolina State University - Raleigh, US)
Svetlana Zemlyanskaya (JetBrains GmbH - München, DE)

Classification

Software Engineering

Keywords

code search
developer tools
