July 23–26, 2006, Dagstuhl Seminar 06301

Duplication, Redundancy, and Similarity in Software


Rainer Koschke (Universität Bremen, DE)
Arun Lakhotia (University of Louisiana – Lafayette, US)
Ettore Merlo (École Polytechnique – Montréal, CA)
Andrew Walenstein (University of Louisiana – Lafayette, US)



A long-standing goal in software development is to avoid duplication and redundancy. Duplication and redundancy can increase the size of the code, make the many code variants hard to understand, and cause maintenance headaches. The goal of avoiding redundancy has provided the impetus for investigations into software reuse, software refactoring, modularization, and parameterization. Despite this ethic, software in practice frequently contains many redundancies and duplications. For instance, the technique of "code scavenging" is frequently used: code fragments are copied and pasted, creating so-called "clones" of duplicated or highly similar code. Redundancies can also arise in other ways, including missed reuse opportunities, purposeful duplication for efficiency reasons, and duplication through parallel or forked development threads.

Because redundancies frequently do exist in code, methods for detecting and removing them from software are needed in many contexts. Over the past few decades, scattered research efforts have contributed towards addressing these issues. Techniques for finding similar code and for removing duplication have been investigated in several specific areas, such as software reverse engineering, plagiarism in student programs, copyright infringement investigation, software evolution analysis, code compaction (e.g., for mobile devices), and design pattern discovery and extraction. Common to all these research areas are the problems involved in understanding the redundancies and finding similar code, whether within a software system, between versions of a system, or between different systems. Although this research has progressed over decades, only recently has the pace of activity picked up enough for significant research momentum to be established. This seminar gathers leading scientists from all the different areas related to software redundancy, along with young researchers ready to pick up the ball.

Reflections, Conclusions, and Acknowledgments

The remaining entries in these proceedings are of three types. The first are summaries of the keynote presentations. The aim of these summaries is to establish broad-brush outlines of the breadth of topics in the area, and to firmly assert that there is more to the area than simply “clone detection.” Following this is a summary report on the terminological discussions that permeated the seminar. Finally, reports on working sessions are included; these serve to document their outcomes, which primarily consist of open questions and issues. We are hopeful that they will be instrumental in the next wave of research in the area.

As organizers, we hoped the seminar would bring about a new understanding of the field and, in so doing, help lay the foundations for future research in the area. Reflecting on the seminar, we conclude that it produced many successes. The discussions were lively, and we know that many interesting ideas for future research were discussed in the working groups and in the open discussions during the working group reporting sessions. We believe that the variety of interests among the participants served a key purpose: it helped broaden the scope and forced a critical reexamination of foundational assumptions, including terminology and concepts.

In closing, we wish to thank the participants for their cooperation, discussion, and efforts. We especially thank the champions for their leadership, and every participant who spent time writing up reports or summaries or presenting the reports orally. We are particularly grateful to the Dagstuhl organization and the German government for making the seminar possible.


  • Sw-engineering
  • ACM Classifications: D.2.7, D.2.13, K.5.1


  • Software clones
  • Code redundancy
  • Clone detection
  • Redundancy removal
  • Software refactoring
  • Software

