
Dagstuhl Seminar 06301

Duplication, Redundancy, and Similarity in Software

(Jul 23 – Jul 26, 2006)

A long-standing ideal in software development is to avoid duplication and redundancy. Duplicated and redundant code increases code size, makes the many code variants hard to understand, and causes maintenance headaches. The goal of avoiding redundancy has provided the impetus for investigations into software reuse, software refactoring, modularization, and parameterization. Despite this ethic, software in practice frequently contains many redundancies and duplications. For instance, the technique of "code scavenging" is frequently used: code fragments are copied and pasted, creating so-called "clones" of duplicated or highly similar code. Redundancies also arise in other ways, including missed reuse opportunities, purposeful duplication for efficiency reasons, and duplication through parallel or forked development threads.

Because redundancies frequently exist in code, methods for detecting and removing them are needed in many contexts. Over the past few decades, scattered research efforts have contributed towards addressing these issues. Techniques for finding similar code and for removing duplication have been investigated in several specific areas, such as software reverse engineering, plagiarism detection in student programs, copyright infringement investigation, software evolution analysis, code compaction (e.g., for mobile devices), and design pattern discovery and extraction. Common to all these research areas are the problems involved in understanding the redundancies and finding similar code, whether within a software system, between versions of a system, or between different systems. Although this research has progressed over decades, only recently has the pace of activity picked up enough to establish significant research momentum. This seminar gathers leading scientists from the different areas related to software redundancy, together with young researchers ready to pick up the ball.

Reflections, Conclusions, and Acknowledgments

The remaining entries in these proceedings are of three types. The first are summaries of the keynote presentations. The aim of these summaries is to outline in broad strokes the breadth of topics in the area, and to firmly assert that there is more to the area than simply “clone detection.” Following these is a summary report on the terminological discussions that permeated the seminar. Finally, reports on the working sessions are included; these document their outcomes, which primarily consist of open questions and issues. We are hopeful that they will be instrumental in the next wave of research in the area.

As organizers, we hoped the seminar would bring about a new understanding of the field and, in so doing, help lay the foundations for future research in the area. Reflecting on the seminar, we conclude that it produced many successes. The discussions were lively, and many interesting ideas for future research were raised in the working groups and in the open discussions during the working group reporting sessions. We believe that the variety of the participants' interests served a key purpose: it helped broaden the scope and forced a critical reexamination of foundational assumptions, including terminology and concepts.

In closing, we thank the participants for their cooperation, discussion, and efforts. We especially thank the champions for their leadership, and every participant who spent time writing up reports or summaries or presenting the reports orally. We are particularly grateful to the Dagstuhl organization and the German government for making the seminar possible.

Participants

  • Paul Anderson (GrammaTech Inc. - Ithaca, US)
  • Stefan Bellon (TTI Stuttgart, DE)
  • Magiel Bruntink (CWI - Amsterdam, NL)
  • James R. Cordy (Queen's University - Kingston, CA)
  • Thomas R. Dean (Kingston University - Kingston upon Thames, GB)
  • Massimiliano Di Penta (University of Sannio, IT)
  • Mohammad El-Ramly (University of Leicester, GB)
  • William Evans (University of British Columbia - Vancouver, CA)
  • Pierre Frenzel (Universität Bremen, DE)
  • Simon Giesecke (Universität Oldenburg, DE)
  • Michael W. Godfrey (University of Waterloo, CA)
  • Ahmed E. Hassan (University of Waterloo, CA)
  • Toshihiro Kamiya (AIST - Tokyo, JP)
  • Cory J. Kapser (University of Waterloo, CA)
  • Holger M. Kienle (University of Victoria, CA)
  • Raghavan Komondoor (New Delhi, IN)
  • Kostas Kontogiannis (University of Waterloo, CA)
  • Rainer Koschke (Universität Bremen, DE)
  • Jens Krinke (FernUniversität in Hagen, DE)
  • Kiarash Mahdavi (King's College London, GB)
  • Ettore Merlo (École Polytechnique - Montréal, CA)
  • Leon Moonen (CWI - Amsterdam, NL)
  • Hausi A. Müller (University of Victoria, CA)
  • Markus Pizka (TU München, DE)
  • Ganesan Ramalingam (IBM India Research Lab, IN)
  • Matthias Rieger (University of Antwerp, BE)
  • Filip van Rysselberghe (University of Antwerp, BE)
  • Andrew Walenstein (University of Louisiana - Lafayette, US)
  • Peter Weißgerber (Universität Trier, DE)
  • Jürgen Wolff von Gudenberg (Universität Würzburg, DE)

Classification

  • Sw-engineering
  • ACM Classifications: D.2.7, D.2.13, K.5.1

Keywords

  • software clones
  • code redundancy
  • clone detection
  • redundancy removal
  • software refactoring
  • software