Code search is the process of retrieving, from a repository, source code that matches a query. Whether a developer is looking for where an error was thrown, learning how to use an unfamiliar API, learning a new programming language, or browsing their team’s directory to familiarize themselves with the codebase, search underpins all of these activities. Beyond those human-driven software engineering processes, search is also a component of automated software engineering tasks, such as automated program repair, code example recommendation, and clone detection. Furthermore, new generative AI tools have challenged traditional code search by presenting alternative approaches to finding and reusing code.
Code search research has implications for developer productivity, code quality, and software engineering ethics, and tools to facilitate code search are widely available. Some are internal to companies (e.g., Google has invested substantially in internal code search), others are open source (e.g., GitHub has a search interface for public repositories), while still others generate code to match a user query (e.g., ChatGPT). Students and professionals use generic web search to find source code examples as well. Across these platforms, query formats, indexing, rankings, the origin of the code, and use cases all vary. This variety provides many avenues for innovation and exploration in code search research.
For example, what is the appropriate scope for a search result? This question has implications for the underlying technology (e.g., should the indexed unit be a file, function, sub-function, or something else?) and for the use case (e.g., does the user want to adapt the code to their context? Are they seeking to understand a codebase? Or something else?). There are many other questions worth exploring: How should source code be indexed? Which search results should appear first? Are there artifacts beyond the code itself that should be surfaced, such as diffs against previous versions or documentation? How diverse should the results shown to the user be? What are the ethical considerations of code search, and of code search versus code generation?
This Dagstuhl Seminar brings together experts in mining software repositories, human factors in software engineering, software documentation, code examples, program analysis, and industrial code search systems to bridge the gap between industry and academia and set the roadmap for the next decade of code search research.
Expected outcomes of this seminar include: new ideas on how to better support developers in searching for code across different user segments (e.g., industrial, open source software, student populations, developers with low language familiarity), clarity on how search can help during different stages of software development (e.g., writing new code, debugging existing issues, reviewing code), a better understanding of code search ethics, and guidelines for more rigorous, repeatable evaluations for code search research.
- Software Engineering
- code search
- developer tools