- Simone Schilke (for administrative matters)
Big data technology promises to improve people’s lives, accelerate scientific discovery and innovation, and bring about positive societal change. Yet, if not used responsibly, large-scale data analysis can increase economic inequality, affirm systemic bias, and even destabilize global markets. While the potential benefits of data analysis techniques are well accepted, the importance of using them responsibly -- that is, in a fair, neutral and transparent manner -- is rarely considered.
This seminar will bring together academic and industry researchers from several areas of computer science, including data management, data mining, machine learning, security/privacy, and computer networks, as well as social sciences researchers, data journalists, and those active in government think-tanks and policy initiatives. The goal of the seminar is to assess the state of data analysis in terms of fairness and transparency, present recent research, identify new research challenges, and derive an agenda for computer science efforts in responsible data use.
Our society is data-driven. Large-scale data analysis, known as Big data, is distinctly present in the private lives of individuals, is a dominant force in commercial domains as varied as automatic manufacturing, e-commerce and personalized medicine, and assists in - or fully automates - decision making in the public and private sectors. Data-driven algorithms are used in criminal sentencing - ruling who goes free and who remains behind bars, in college admissions - granting or denying access to education, and in employment and credit decisions - offering or withholding economic opportunities.
The promise of Big data is to improve people's lives, accelerate scientific discovery and innovation, and enable broader participation. Yet, if not used responsibly, Big data can increase economic inequality and affirm systemic bias, polarize rather than democratize, and deny opportunities rather than improve access. Worse yet, all this can be done in a way that is non-transparent and defies public scrutiny.
Big data impacts individuals, groups and society as a whole. Because of the central role played by this technology, it must be used responsibly - in accordance with the ethical and moral norms that govern our society, and adhering to the appropriate legal and policy frameworks. And as journalists [3], legal and policy scholars [1,2], and governments [4,5] are calling for algorithmic fairness and greater insight into data-driven algorithmic processes, there is an urgent need to define a broad and coordinated computer science research agenda in this area. The primary goal of the Dagstuhl Seminar "Data, Responsibly" was to make progress towards such an agenda.
The seminar brought together academic and industry researchers from several areas of computer science, including a broad representation of data management, but also data mining, security/privacy, and computer networks, as well as social sciences researchers, data journalists, and those active in government think-tanks and policy initiatives. The problem we aim to address is inherently transdisciplinary. For this reason, it was important to have input from policy and legal scholars, and to have representation from multiple areas within computer science. We were able to attract a mix of European, North American, and South American participants. Out of 39 participants, 10 were women.
Specific goals of the seminar were to:
- assess the state of data analysis in terms of fairness, transparency and diversity;
- identify new research challenges;
- develop an agenda for computer science research in responsible data analysis and use, with a particular focus on potential high-impact contributions from the data management community;
- solicit perspectives on the necessary education efforts, and on responsible research and innovation practices.
The seminar included technical talks and break-out sessions. Technical talks were organized into themes, which included fairness and diversity, transparency and accountability, tracking and transparency, personal information management, education, and responsible research and innovation. Participants suggested topics for seven working groups, which met over one or multiple days.
The organizers felt that the seminar was very successful - ideas were exchanged, discussions were lively and insightful, and we are aware of several collaborations that were started as a result of the seminar. The participants and the organizers all felt that the topic of the seminar is broad, fast moving and extremely important, and that it would be beneficial to hold another seminar on this topic in the near future.
Details about the program are contained in the remainder of this document.
- [1] Kate Crawford. Artificial Intelligence’s White Guy Problem. The New York Times, June 25, 2016.
- [2] Kate Crawford and Ryan Calo. There is a blind spot in AI research. Nature / Comment 538(7625), October 13, 2016.
- [3] Julia Angwin, Jeff Larson, Surya Mattu, and Lauren Kirchner. Machine Bias. ProPublica, May 23, 2016.
- [4] Executive Office of the President, The White House. Big Data: Seizing Opportunities, Preserving Values. May 2014.
- [5] Parliament and Council of the European Union. General Data Protection Regulation. 2016.
- Serge Abiteboul (ENS - Cachan, FR)
- Marcelo Arenas (Pontificia Universidad Catolica de Chile, CL)
- Solon Issac Barocas (Microsoft - New York, US)
- Chaitanya Baru (NSF - Arlington, US)
- Claudia Bauzer Medeiros (UNICAMP - Campinas, BR)
- Bettina Berendt (KU Leuven, BE)
- Claude Castelluccia (INRIA - Grenoble, FR)
- Nicholas Diakopoulos (University of Maryland - College Park, US)
- Marina Drosou (Hellenic Police - Athen, GR)
- Gerald Friedland (ICSI - Berkeley, US)
- Sorelle Friedler (Haverford College, US)
- Irini Fundulaki (FORTH - Heraklion, GR)
- Krishna P. Gummadi (MPI-SWS - Saarbrücken, DE)
- Michael Hay (Colgate University - Hamilton, US)
- Bill Howe (University of Washington - Seattle, US)
- H. V. Jagadish (University of Michigan - Ann Arbor, US)
- Benny Kimelfeld (Technion - Haifa, IL)
- Amélie Marian (Rutgers University - Piscataway, US)
- Pauli Miettinen (MPI für Informatik - Saarbrücken, DE)
- Gerome Miklau (University of Massachusetts - Amherst, US)
- Wolfgang Nejdl (Leibniz Universität Hannover, DE)
- Benjamin Nguyen (INSA - Bourges, FR)
- Evaggelia Pitoura (University of Ioannina, GR)
- Salvatore Ruggieri (University of Pisa, IT)
- Rishiraj Saha Roy (MPI für Informatik - Saarbrücken, DE)
- Arnaud Sahuguet (Cornell Tech NYC, US)
- Eric Simon (SAP France, FR)
- Julia Stoyanovich (Drexel Univ. - Philadelphia, US)
- Jannik Strötgen (MPI für Informatik - Saarbrücken, DE)
- Fabian Suchanek (Télécom ParisTech, FR)
- Kristene Unsworth (Drexel Univ. - Philadelphia, US)
- Jan Van den Bussche (Hasselt University, BE)
- Suresh Venkatasubramanian (University of Utah - Salt Lake City, US)
- Agnès Voisard (FU Berlin, DE)
- Nicholas Weaver (ICSI - Berkeley, US)
- Gerhard Weikum (MPI für Informatik - Saarbrücken, DE)
- Christo Wilson (Northeastern University - Boston, US)
- Cong Yu (Google - New York, US)
- Ben Zevenbergen (University of Oxford, GB)
- data bases / information retrieval
- world wide web / internet
- data management
- data mining
- machine learning
- data analysis
- big data
- data fairness
- data transparency
- data provenance