July 23 – 26, 2006, Dagstuhl Seminar 06301
Duplication, Redundancy, and Similarity in Software
Rainer Koschke (Universität Bremen, DE)
Arun Lakhotia (University of Louisiana – Lafayette, US)
Ettore Merlo (École Polytechnique – Montréal, CA)
Andrew Walenstein (University of Louisiana – Lafayette, US)
A venerable and long-standing ideal in software development is to avoid duplication and redundancy. Duplication and redundancy can increase the size of the code, make the many code variants hard to understand, and cause maintenance headaches. The goal of avoiding redundancy has provided the impetus for investigations into software reuse, software refactoring, modularization, and parameterization. Despite this ethic of avoiding redundancy, software in practice frequently contains many redundancies and duplications. For instance, the technique of "code scavenging" is frequently used: code fragments are copied and then pasted, thereby creating so-called "clones" of duplicated or highly similar code. Redundancies can also arise in various other ways, including missed reuse opportunities, purposeful duplication because of efficiency concerns, and duplication through parallel or forked development threads.
Because redundancies frequently do exist in code, methods for detecting and removing them from software are needed in many contexts. Over the past few decades, scattered research efforts have contributed towards addressing these issues. Techniques for finding similar code and for removing duplication have been investigated in several specific areas, such as software reverse engineering, plagiarism in student programs, copyright infringement investigation, software evolution analysis, code compaction (e.g., for mobile devices), and design pattern discovery and extraction. Common to all these research areas are the problems involved in understanding the redundancies and finding similar code, either within a software system, between versions of a system, or between different systems. Although this research has progressed over decades, only recently has the pace of activity in this area picked up such that significant research momentum could be established. This seminar gathers leading scientists from all the different areas related to software redundancy, along with young researchers ready to pick up the ball.
Reflections, Conclusions, and Acknowledgments
The remaining entries in these proceedings are of three types. The first are summaries of the keynote presentations. The aim of these summaries is to establish broad-brush outlines of the breadth of topics in the area, and to firmly assert that there is more to the area than simply “clone detection.” Following these is a summary report on the terminological discussions that permeated the seminar. Finally, reports on working sessions are included; these serve to document their outcomes, which primarily consist of open questions and issues. We are hopeful that they will be instrumental in the next wave of research in the area.
As organizers, we hoped the seminar would bring about a new understanding of the field and, in so doing, help lay the foundations for future research in the area. Reflecting on the seminar, we conclude that it produced many successes. The discussions were lively, and we know that many interesting ideas for future research were discussed in the working groups and in the open discussions during the working group reporting sessions. We believe that the variety of interests among the participants served a key purpose: it helped broaden the scope and forced a critical reexamination of foundational assumptions, including terminology and concepts.
In closing, we wish to thank the participants for their cooperation, discussion, and efforts. We especially thank the champions for their leadership, and every participant who spent time writing up reports or summaries or presenting the reports orally. We are particularly grateful to the Dagstuhl organization and the German government for making the seminar possible.
- ACM Classifications: D.2.7
- Software clones
- Code redundancy
- Clone detection
- Redundancy removal
- Software refactoring