

Dagstuhl Seminar 26482

Generative and Agentic Software Engineering Beyond Code

(Nov 22 – Nov 27, 2026)

Permalink
Please use the following short url to reference this page: https://www.dagstuhl.de/26482

Organizers
  • Sven Apel (Universität des Saarlandes - Saarbrücken, DE)
  • Jin Guo (McGill University - Montréal, CA)
  • Rashina Hoda (Monash University - Melbourne, AU)
  • Walid Maalej (Hasso-Plattner-Institut, Universität Potsdam, DE)

Motivation

Over the last few years, there has been a huge surge of interest in Large Language Models (LLMs), Generative AI, and Agentic AI for software engineering (SE), both in research and in practice. The area is generally referred to as generative and agentic software engineering, or AI4SE. Multiple meta-studies have identified hundreds of research papers on LLMs for SE, particularly for code generation, informally called “vibe coding”. Since OpenAI’s Andrej Karpathy coined this term in early 2025, hundreds of books on vibe coding have appeared on Amazon. AI4SE is reshaping, and will continue to reshape, software development as we have known it over the last decades. When “prompted in the right way”, state-of-the-art foundation models can generate not only source code and patches, but also specifications and requirements, models and architectural blueprints, documentation and summaries, as well as project plans and work items. Recent studies particularly highlight the potential of multimodal LLMs trained on images, sketches, and other data (in addition to code and text) to, for example, generate user interfaces and support UI/UX design and user testing.

So far, the main research focus of AI4SE has been on a) source code generation (chunks and patches) and b) increasing automation. However, many companies, such as Siemens and SAP, have huge repositories of requirements, models, and other artefacts of all kinds, which pose a challenge for existing AI4SE solutions on the one hand, but also hold enormous potential for automation and value creation on the other. We argue that the biggest potential of AI4SE (and perhaps also its biggest challenges) lies in:

  1. Generating useful, high-quality non-code artefacts that are needed by various project members throughout the entire software lifecycle.
  2. Supporting developer-AI interaction for optimal quality, productivity, and developer experience.

Beyond the focus on code-generation use cases, from a methodological point of view, typical studies in this field investigate prompt designs, LLM tuning, and benchmarking. Often, open-source datasets (e.g., of specifications and corresponding code chunks, defects and corresponding patches, or methods and corresponding documentation) are used with various models and prompts to determine how far we can go and how accurately LLMs reproduce the “golden sets”.

Despite strong prediction performance, studies have also repeatedly highlighted severe quality issues in the generated artefacts. Recent field studies of programming and refactoring show that AI models can indeed boost productivity but clearly fail to fully substitute for developers and domain experts. In particular, when tasks become more specific and complex than the simple programs found in evaluation benchmarks and programming courses, AI tends to generate faulty code, miss important constraints, or hallucinate. It is therefore crucial for AI4SE research to address the development process in its entire complexity and diversity, covering real-world artefacts such as requirements, models, UI/UX designs, discussions and explanations, backlogs, etc.

Preliminary evidence points in the same direction: AI can be a great tool complementing developers. Yet it remains unclear what a useful generated artefact means and what good developer-AI interaction looks like. Today, we know what makes a good stakeholder interview, a good issue report, a good architecture, or good requirements; but we do not yet know what makes a good developer-AI interaction or what makes an AI-generated artefact useful to software engineering teams.

Copyright Walid Maalej, Jin Guo, Rashina Hoda, and Sven Apel

Classification
  • Artificial Intelligence
  • Human-Computer Interaction
  • Software Engineering

Keywords
  • Agentic Software Engineering
  • Generative Requirements Engineering
  • Generative Design
  • Software Architecture
  • Software Documentation
  • Generative AI
  • Socio-Technical Aspects in Software Engineering