22.06.14 - 27.06.14, Seminar 14261

Software Development Analytics

This seminar description was published on our web pages before the seminar and was used in the invitation to the seminar.


Software and its development generate an inordinate amount of data. For example, check-ins, work items, bug reports and test executions are recorded in software repositories such as CVS, Subversion, Git, and Bugzilla. Telemetry data, run-time traces, and log files reflect how customers experience software: they record application and feature usage and expose performance and reliability.

The sheer amount is impressive: As of July 2013, Mozilla Firefox had 900,000 bug reports, and platforms such as SourceForge.net and GitHub hosted millions of projects with millions of users. Industrial projects have many sources of data at similar scale. But how can this data be used to improve software? Software analytics takes this data and turns it into actionable insight to inform better decisions related to software. Analytics is commonly used in many businesses, notably in marketing, to better reach and understand customers. The application of analytics to software data is becoming more popular.

To a large extent, software analytics is about what we can learn and share about software. The data come from our own projects but also from software projects by others. Looking back at decades of research in empirical software engineering and mining software repositories, software analytics lets us share all of the following:

  • Sharing insights. Specific lessons learned or empirical findings. An example is the finding that in Windows Vista it was possible to build high-quality software using distributed teams if the management was structured around code functionality (Christian Bird and his colleagues).
  • Sharing models. One of the early models was proposed by Fumio Akiyama and says that we should expect over a dozen bugs per 1,000 lines of code. In addition to defect models, plenty of other models (for example effort estimation, retention and engagement) can be built for software.
  • Sharing methods. Empirical findings such as insights and models are often context-specific, that is, they depend on the project that was studied. However, the method ("recipe") to create findings can often be applied across projects. By methods we mean the techniques by which we can transform data into insight and models.
  • Sharing data. By sharing data, we can use and evolve methods to create better insight and models.
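
As a rough illustration of what a shareable defect model can look like, the sketch below encodes the linear regression commonly attributed to Akiyama in the secondary literature (defects ≈ 4.86 + 0.018 × lines of code). The coefficients are an assumption taken from that literature, not from this seminar description, and the function name is hypothetical.

```python
def akiyama_defects(lines_of_code: float) -> float:
    """Estimate the defect count from program size.

    Assumed form of Akiyama's model: D = 4.86 + 0.018 * L,
    where L is the number of lines of code. The coefficients
    are quoted from secondary sources and are an assumption here.
    """
    if lines_of_code < 0:
        raise ValueError("lines_of_code must be non-negative")
    return 4.86 + 0.018 * lines_of_code


if __name__ == "__main__":
    # Print the estimate for a few module sizes.
    for loc in (100, 1_000, 10_000):
        print(f"{loc:>6} LOC -> {akiyama_defects(loc):.1f} estimated defects")
```

Under these assumed coefficients, a module of 1,000 lines comes out at roughly 23 defects, in line with the "over a dozen" figure above. Like any such model, it is context-specific and would need recalibration on a project's own data.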

Despite many achievements, there are several challenges ahead for software analytics: How can we make data useful to a wide audience, not just to developers but to anyone involved in software? What can we learn from the vast amount of unexplored data? How can we learn from incomplete or biased data? How can we better tie usage analytics to development analytics? When and what lessons can we take from one project and apply to another? How can we establish smart data science as a discipline in software engineering practice and research as well as education?

In this seminar, we bring together researchers and practitioners from academia and industry who are interested in empirical software engineering and mining software repositories to share their insights, models, methods, and/or data. More specifically, we invite you to (1) discuss the next generation of software analytics; and to (2) contribute to a Software Analytics Manifesto that describes the extent to which software data can be exploited to support decisions related to development and usage of software.

We expect the seminar to outline a set of challenges for analytics on software data, which will help to focus the research effort in this field. The seminar will provide ample opportunities for discussion and a platform for collaboration among attendees. We expect the seminar to set exciting directions for understanding and acting on data. Please join us for the future of software analytics.