http://www.dagstuhl.de/14261

June 22 – 27, 2014, Dagstuhl Seminar 14261

Software Development Analytics

Organizers

Harald Gall (Universität Zürich, CH)
Tim Menzies (West Virginia University – Morgantown, US)
Laurie Williams (North Carolina State University – Raleigh, US)
Thomas Zimmermann (Microsoft Corporation – Redmond, US)

For support, please contact

Dagstuhl Service Team

Documents

Dagstuhl Report, Volume 4, Issue 6

Summary

Software and its development generate an inordinate amount of data. For example, check-ins, work items, bug reports, and test executions are recorded in software repositories and trackers such as CVS, Subversion, Git, and Bugzilla. Telemetry data, run-time traces, and log files reflect how customers experience software, including application and feature usage, performance, and reliability. The sheer amount is impressive:

  • As of July 2013, Mozilla Firefox had 900,000 bug reports, and platforms such as Sourceforge.net and GitHub hosted millions of projects with millions of users.
  • Industrial projects have many sources of data at similar scale.

But how can this data be used to improve software? Software analytics takes this data and turns it into actionable insight that informs better decisions related to software. Analytics is already common in many businesses, notably in marketing, where it is used to better reach and understand customers. Its application to software data is becoming more popular.

To a large extent, software analytics is about what we can learn and share about software. The data include not only our own projects but also the software projects of others. Building on decades of research in empirical software engineering and mining software repositories, software analytics lets us share all of the following:

  • Sharing insights. Specific lessons learned or empirical findings. An example is that in Windows Vista it was possible to build high-quality software using distributed teams if management was structured around code functionality (Christian Bird and his colleagues).
  • Sharing models. One of the early models was proposed by Fumio Akiyama and says that we should expect over a dozen bugs per 1,000 lines of code. In addition to defect models, plenty of other models (for example, effort estimation, retention, and engagement) can be built for software.
  • Sharing methods. Empirical findings such as insights and models are often context-specific, that is, they depend on the project that was studied. However, the method ("recipe") used to create findings can often be applied across projects. By "methods" we mean the techniques by which we can transform data into insight and models.
  • Sharing data. By sharing data, we can use and evolve methods to create better insight and models.
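
As an illustration of what "sharing a model" can look like, the following is a minimal sketch of the linear form in which Akiyama's defect model is commonly cited (expected defects = 4.86 + 0.018 × lines of code). The coefficients come from that commonly cited form, not from this seminar summary, and the function and constant names are hypothetical. It illustrates a shareable model; it is not a recommendation to predict defects from size alone.

```python
# Sketch of a size-based defect model in the commonly cited linear form
# attributed to Akiyama: defects = 4.86 + 0.018 * LOC.
# The coefficients are an assumption taken from that commonly cited form;
# all names below are hypothetical and chosen for illustration only.

AKIYAMA_INTERCEPT = 4.86   # baseline defects, independent of size
AKIYAMA_SLOPE = 0.018      # additional defects per line of code


def akiyama_expected_defects(lines_of_code: int) -> float:
    """Return the expected number of defects for a module of the given size."""
    if lines_of_code < 0:
        raise ValueError("lines_of_code must be non-negative")
    return AKIYAMA_INTERCEPT + AKIYAMA_SLOPE * lines_of_code


if __name__ == "__main__":
    # Print the model's estimate for a few module sizes.
    for loc in (100, 1_000, 10_000):
        print(f"{loc:>6} LOC -> {akiyama_expected_defects(loc):.2f} expected defects")
```

For a module of 1,000 lines this form yields roughly 23 expected defects, which is consistent with the "over a dozen bugs per 1,000 lines of code" figure above. As the "Sharing methods" item notes, such coefficients are context-specific, so what usually transfers across projects is the fitting method, not the numbers.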

The goal of this seminar was to build a roadmap for future work in this area. Despite many achievements, there are several challenges ahead for software analytics:

  • How can we make data useful to a wide audience, not just to developers but to anyone involved in software?
  • What can we learn from the vast amount of unexplored data?
  • How can we learn from incomplete or biased data?
  • How can we better tie usage analytics to development analytics?
  • When and what lessons can we take from one project and apply to another?
  • How can we establish smart data science as a discipline in software engineering practice and research as well as education?

Seminar Format

In this seminar, we brought together researchers and practitioners from academia and industry who are interested in empirical software engineering and mining software repositories to share their insights, models, methods, and/or data. Before the seminar, we surveyed the participants online to collect relevant themes and papers. Most themes from the survey fell into the categories of method (e.g., measurement, visualization, combination of qualitative with quantitative methods), data (e.g., usage/telemetry, security, code, people), and best practices and fallacies (e.g., how to choose techniques, how to deal with noise and missing data, correlation vs. causation). A theme that also emerged in the pre-Dagstuhl survey was analytics for the purpose of theory formation, i.e., "data analysis to support software engineering theory formation (or, data analytics in support of software science, as opposed to software engineering)".

At the seminar, we required that attendees

  1. discuss the next generation of software analytics;
  2. contribute to a Software Analytics Manifesto that describes the extent to which software data can be exploited to support decisions related to development and usage of software.

Attendees were required to outline a set of challenges for analytics on software data, which will help to focus the research effort in this field. The seminar provided ample opportunities for discussion and a platform for collaboration between attendees, since our time was divided equally between:

  1. Plenary sessions where everyone gave short (10-minute) presentations on their work.
  2. Breakout sessions where focus groups worked on shared tasks.

Our schedule was very dynamic. Each day ended with a "think-pair-share" session in which a focus for the next day was debated first in pairs, then shared with the whole group. Each night, the seminar organizers took away the cards generated in the "think-pair-share" sessions and used that feedback to adjust the next day's effort.

License
  Creative Commons BY 3.0 Unported license
  Harald Gall, Tim Menzies, Laurie Williams, and Thomas Zimmermann

Classification

  • Software Engineering

Keywords

  • Software development
  • Data-driven decision making
  • Analytics
  • Empirical software engineering
  • Mining software repositories
  • Business intelligence
  • Predictive analytics

Book exhibition

Books from the participants of the current Seminar 

Book exhibition in the library, ground floor, during the seminar week.

Documentation

In the series Dagstuhl Reports each Dagstuhl Seminar and Dagstuhl Perspectives Workshop is documented. The seminar organizers, in cooperation with the collector, prepare a report that includes contributions from the participants' talks together with a summary of the seminar.

 


Publications

Furthermore, a comprehensive peer-reviewed collection of research papers can be published in the series Dagstuhl Follow-Ups.

Dagstuhl's Impact

Please inform us when a publication results from your seminar. These publications are listed in the category Dagstuhl's Impact and are presented on a special shelf on the ground floor of the library.
