

Dagstuhl Seminar 26482

Generative and Agentic Software Engineering Beyond Code

(Nov 22 – Nov 27, 2026)

Permalink
Please use the following short url to reference this page: https://www.dagstuhl.de/26482

Organizers
  • Sven Apel (Universität des Saarlandes - Saarbrücken, DE)
  • Jin Guo (McGill University - Montréal, CA)
  • Rashina Hoda (Monash University - Melbourne, AU)
  • Walid Maalej (Hasso-Plattner-Institut, Universität Potsdam, DE)

Motivation

Over the last few years, there has been a huge surge of interest in Large Language Models (LLMs), Generative AI, and Agentic AI for software engineering (SE), both in research and in practice. The area is generally referred to as generative and agentic software engineering, or AI4SE. Multiple meta-studies have identified hundreds of research papers on LLMs for SE, particularly for code generation, informally called “vibe coding”. Since OpenAI’s Andrej Karpathy coined this term in early 2025, hundreds of books on vibe coding have appeared on Amazon. AI4SE is reshaping, and will continue to reshape, software development as we have known it over the last decades. When “prompted in the right way”, state-of-the-art foundation models can generate not only source code and patches, but also specifications and requirements, models and architectural blueprints, documentation and summaries, as well as project plans and work items. Recent studies particularly highlight the potential of multimodal LLMs trained on images, sketches, and other data (in addition to code and text) to, for example, generate user interfaces and support UI/UX design and user testing.

So far, the main research focus of AI4SE has been on a) source code generation (chunks and patches) and b) increasing automation. However, many companies, such as Siemens and SAP, have huge repositories of requirements, models, and other artefacts of all kinds, which pose a challenge for existing AI4SE solutions on the one hand, but also hold enormous potential for automation and value creation on the other. We argue that the biggest potential of AI4SE (and perhaps also its biggest challenges) lies in:

  1. Generating useful, high-quality non-code artefacts that are needed by various project members throughout the entire software lifecycle.
  2. Supporting developer-AI interaction for optimal quality, productivity, and developer experience.

Beyond the focus on code-generation use cases, from a methodological point of view, typical studies in this field investigate prompt designs, LLM tuning, and benchmarking. Often, open-source datasets (e.g., of specifications and corresponding code chunks, defects and corresponding patches, or methods and corresponding documentation) are used with various models and prompts to determine how far we can go and how accurately LLMs reproduce the “golden sets”.

Despite strong prediction performance, studies have also repeatedly highlighted severe quality issues in the generated artefacts. Recent field studies of programming and refactoring show that AI models can indeed boost productivity but clearly fail to fully substitute for developers and domain experts. In particular, when tasks become more specific and complex than the simple programs found in evaluation benchmarks and programming courses, AI tends to generate faulty code, miss important constraints, or hallucinate. It is therefore crucial for AI4SE research to address the development process in its entire complexity and diversity, covering real-world artefacts such as requirements, models, UI/UX designs, discussions and explanations, backlogs, etc.

Preliminary evidence points in the same direction: AI can be a great tool complementing developers. Yet it remains unclear what a useful generated artefact means and what good developer-AI interaction looks like. Today, we know what makes a good stakeholder interview, a good issue report, a good architecture, or good requirements; but we do not yet know what makes a good developer-AI interaction or what makes an AI-generated artefact useful to software engineering teams.

Copyright Walid Maalej, Jin Guo, Rashina Hoda, and Sven Apel

Classification
  • Artificial Intelligence
  • Human-Computer Interaction
  • Software Engineering

Keywords
  • Agentic Software Engineering
  • Generative Requirements Engineering
  • Generative Design
  • Software Architecture
  • Software Documentation
  • Generative AI
  • Socio-Technical Aspects in Software Engineering