http://www.dagstuhl.de/14261

June 22 – 27, 2014, Dagstuhl Seminar 14261

Software Development Analytics

Organizers

Harald Gall (Universität Zürich, CH)
Tim Menzies (West Virginia University – Morgantown, US)
Laurie Williams (North Carolina State University – Raleigh, US)
Thomas Zimmermann (Microsoft Corporation – Redmond, US)

For information about this Dagstuhl Seminar, please contact

Dagstuhl Service Team

Documents

Dagstuhl Report, Volume 4, Issue 6
Motivation text
List of participants
Shared documents

Summary

Software and its development generate an inordinate amount of data. For example, check-ins, work items, bug reports, and test executions are recorded in software repositories such as CVS, Subversion, Git, and Bugzilla. Telemetry data, run-time traces, and log files reflect how customers experience software, including application and feature usage, performance, and reliability. The sheer amount is truly impressive:

  • As of July 2013, Mozilla Firefox had 900,000 bug reports, and platforms such as SourceForge.net and GitHub hosted millions of projects with millions of users.
  • Industrial projects have many sources of data at similar scale.

But how can this data be used to improve software? Software analytics takes this data and turns it into actionable insight to inform better decisions related to software. Analytics is commonly used in many businesses, notably in marketing, to better reach and understand customers. The application of analytics to software data is becoming more popular.

To a large extent, software analytics is about what we can learn and share about software. The data cover not only our own projects but also the software projects of others. Looking back at decades of research in empirical software engineering and mining software repositories, software analytics lets us share all of the following:

  • Sharing insights. Specific lessons learned or empirical findings. An example is the finding by Christian Bird and his colleagues that, for Windows Vista, it was possible to build high-quality software with distributed teams if management was structured around code functionality.
  • Sharing models. One of the early models was proposed by Fumio Akiyama and says that we should expect over a dozen bugs per 1,000 lines of code. In addition to defect models, plenty of other models (for example effort estimation, retention and engagement) can be built for software.
  • Sharing methods. Empirical findings such as insights and models are often context-specific, i.e., they depend on the project that was studied. However, the method ("recipe") used to create findings can often be applied across projects. By "methods" we mean the techniques by which we can transform data into insight and models.
  • Sharing data. By sharing data, we can use and evolve methods to create better insight and models.
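
As an illustration of what a shared model can look like, the size-based defect model mentioned above can be written as a one-line function. This is only a sketch: the coefficients are an assumption taken from the commonly cited form of Akiyama's regression (defects = 4.86 + 0.018 × lines of code), which is not stated in this summary, and the function name is hypothetical.

```python
# Minimal sketch of a size-based defect model in the spirit of Akiyama's.
# ASSUMPTION: the coefficients are the commonly cited form of the regression
# (defects = 4.86 + 0.018 * lines of code); they do not appear in this summary.

INTERCEPT = 4.86          # assumed constant term
DEFECTS_PER_LINE = 0.018  # assumed slope: defects per line of code


def predicted_defects(lines_of_code: int) -> float:
    """Return the expected number of defects for a module of the given size."""
    if lines_of_code < 0:
        raise ValueError("lines_of_code must be non-negative")
    return INTERCEPT + DEFECTS_PER_LINE * lines_of_code


if __name__ == "__main__":
    # Print the model's estimate for a few module sizes.
    for loc in (1_000, 10_000, 100_000):
        print(f"{loc:>7} LOC: about {predicted_defects(loc):.0f} expected defects")
```

Under these assumed coefficients, a 1,000-line module comes out at roughly 23 expected defects, which is consistent with "over a dozen bugs per 1,000 lines of code". As the list notes, such a model is context-specific; the reusable part is the method of fitting it to a project's own data.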

The goal of this seminar was to build a roadmap for future work in this area. Despite many achievements, there are several challenges ahead for software analytics:

  • How can we make data useful to a wide audience, not just to developers but to anyone involved in software?
  • What can we learn from the vast amount of unexplored data?
  • How can we learn from incomplete or biased data?
  • How can we better tie usage analytics to development analytics?
  • When and what lessons can we take from one project and apply to another?
  • How can we establish smart data science as a discipline in software engineering practice and research as well as education?

Seminar Format

In this seminar, we brought together researchers and practitioners from academia and industry who are interested in empirical software engineering and mining software repositories to share their insights, models, methods, and/or data. Before the seminar, we asked the participants, through an online survey, for relevant themes and papers. Most themes from the survey fell into the categories of method (e.g., measurement, visualization, combination of qualitative with quantitative methods), data (e.g., usage/telemetry, security, code, people), and best practices and fallacies (e.g., how to choose techniques, how to deal with noise and missing data, correlation vs. causation). A theme that also emerged in the pre-Dagstuhl survey was analytics for the purpose of theory formation, i.e., "data analysis to support software engineering theory formation (or, data analytics in support of software science, as opposed to software engineering)".

At the seminar, we required that attendees

  1. discuss the next generation of software analytics;
  2. contribute to a Software Analytics Manifesto that describes the extent to which software data can be exploited to support decisions related to development and usage of software.

Attendees were required to outline a set of challenges for analytics on software data, which will help to focus the research effort in this field. The seminar provided ample opportunities for discussion between attendees and also provided a platform for collaboration, since our time was divided equally between:

  1. Plenary sessions where everyone gave short (10-minute) presentations on their work.
  2. Breakout sessions where focus groups worked on shared tasks.

Our schedule was very dynamic. Each day ended with a "think-pair-share" session where some focus for the next day was debated first in pairs, then shared with the whole group. Each night, the seminar organizers would take away the cards generated in the "think-pair-share" sessions and use that feedback to reflect on how to adjust the next day's effort.

License
  Creative Commons BY 3.0 Unported license
  Harald Gall, Tim Menzies, Laurie Williams, and Thomas Zimmermann

Classification

  • Software Engineering

Keywords

  • Software development
  • Data-driven decision making
  • Analytics
  • Empirical software engineering
  • Mining software repositories
  • Business intelligence
  • Predictive analytics

Book exhibition

Books by the participants

Book exhibition on the ground floor of the library

(only during the week of the event).

Documentation

All Dagstuhl Seminars and Dagstuhl Perspectives Workshops are documented in the Dagstuhl Reports series. The organizers, together with the seminar's collector, compile a report that summarizes the authors' contributions and adds an overall summary.

Download overview flyer (PDF).

Publications

It remains possible to publish a comprehensive collection of peer-reviewed papers in the Dagstuhl Follow-Ups series.

Dagstuhl's Impact

Please let us know if a publication results from your seminar. Such publications are listed separately in the Dagstuhl's Impact section and displayed on the ground floor of the library.