Evaluation in the Crowd: Crowdsourcing and Human-Centred Experiments
(22 – 27 November 2015)
- Daniel Archambault (Swansea University, GB)
- Tobias Hoßfeld (Universität Duisburg-Essen, DE)
- Helen C. Purchase (University of Glasgow, GB)
- Susanne Bach-Bernhard (for administrative matters)
- Evaluation in the Crowd: Crowdsourcing and Human-Centered Experiments: Dagstuhl Seminar 15481, Dagstuhl Castle, Germany, November 22-27, 2015, Revised Contributions - Archambault, Daniel; Purchase, Helen C.; Hoßfeld, Tobias - Springer, 2017. - VI, 190 pp. - (Lecture Notes in Computer Science: State-of-the-Art Survey; 10264). ISBN: 978-3-319-66434-7 / 3-319-66434-4.
- Information Visualization Evaluation Using Crowdsourcing: article: EuroVis 2018 - Borgo, Rita; Micallef, Luana; Bach, Benjamin; McGee, Fintan; Lee, Bongshin - Chichester: Wiley, 2018. - 24 pp. - (Computer Graphics Forum; 2018).
- Report on the Dagstuhl Seminar 15481 "Evaluation in the Crowd: Crowdsourcing and Human-Centred Experiments" (November 2015): article in KuVS Newsletter - Hoßfeld, Tobias; Archambault, Daniel; Purchase, Helen C. - GI Fachgruppe KuVS, 2015. - pp. 13-14.
Human-centred empirical evaluations play an important role in the fields of HCI, visualization, and graphics in testing the effectiveness of visual representations. The advent of crowdsourcing platforms (such as Amazon Mechanical Turk) has provided a revolutionary methodology to conduct human-centred experiments. Through such platforms, experiments can now collect data from hundreds, even thousands, of participants from a diverse user community over a matter of weeks, greatly increasing the ease with which we can collect data as well as the power and generalizability of experimental results. However, the use of such experimental platforms does not come without its problems: ensuring participant investment in the task, defining experimental controls, collecting qualitative data, and understanding the ethics behind deploying such experiments en masse.
The focus of this Dagstuhl seminar is to discuss experiences and methodological considerations when using crowdsourcing platforms to run human-centred experiments. We aim to bring together researchers in areas that use crowdsourcing to run human-centred experiments and aim to have a high degree of interdisciplinarity. We target members of the human-computer interaction, visualization, psychology, and applied perception research communities as typical users of crowdsourcing platforms. We also wish to engage researchers who develop the technology that makes crowdsourcing possible and researchers who have studied the crowdsourcing community. The following topics will be discussed:
- Crowdsourcing Platforms vs. The Laboratory. The laboratory setting for human-centred experiments has been employed for decades and has a well understood methodology with known advantages and limitations. Studies performed on crowdsourcing platforms provide new opportunities and new challenges. A cross community discussion over the nature of these technologies as well as their advantages and limitations is needed. When should we use crowdsourcing? More importantly, when should we not?
- Scientifically Rigorous Methodologies. Understanding the strengths and limitations of a crowdsourcing platform can help us refine our human-centred experimental methodologies. When running between-subjects experiments, what considerations do we need to make when allocating our participant pools that will be compared? Are within-subjects experiments too taxing for crowdsourced participants? How do we effectively collect qualitative information?
- Crowdsourcing Experiments in Human-Computer Interaction, Visualization, and Applied Perception/Graphics. Each of our fields has unique challenges when designing, deploying, and analysing the results of crowdsourcing evaluation. We are especially interested in the experiences and best practice findings of our communities in regards to these methodologies.
- Getting to Know the Crowd. Much of this seminar examines the ways that our research communities can use the technology in order to evaluate the software systems and techniques that we design. However, it is important to consider the people who accept and perform the jobs that we post on these platforms. Who are they, and what motivates them?
- Ethics in Experiments. Even though the participants of a crowdsourcing study never walk into the laboratory, ethical considerations behind this new platform need to be discussed. What additional considerations are needed beyond standard ethical procedures when running crowdsourcing experiments? How do we ensure that we are compensating our participants adequately for their work, while considering the nature of microtasks?
The intended output of this seminar is an edited volume of articles that will become a primer text on the use of crowdsourcing in our diverse research communities. We also expect that with the range of researchers invited to this seminar new collaborative and interdisciplinary projects will be fostered.
In various areas of computer science, such as visualization, graphics, and multimedia, it is often necessary to involve users, for example to measure a system's performance with respect to its users, or to measure its perceived quality or usability. A popular and scientifically rigorous method for assessing this performance or subjective quality is formal experimentation, where participants are asked to perform tasks on visual representations and their performance is measured quantitatively (often through response time and errors). To evaluate perceived quality, users conduct experiments with the system under investigation or complete user surveys. Such subjective tests and user surveys are also required in other scientific areas, such as psychology. One approach is to conduct such empirical evaluations in the laboratory, often with the experimenter present, allowing for the controlled collection of quantitative and qualitative data. However, laboratory studies are limited in the number and diversity of participants they can recruit, and are time-consuming and expensive to run. Crowdsourcing platforms can address these limitations by providing an infrastructure for the deployment of experiments and the collection of data over diverse user populations, often allowing hundreds, sometimes even thousands, of participants to be run in parallel over one or two weeks. However, when running experiments on such a platform, it is hard to ensure that participants are actively engaging with the experiment, and experimental controls are difficult to implement. Qualitative data is often difficult, if not impossible, to collect, as the experimenter is not present in the room to conduct an exit survey. Finally, and most importantly, the ethics behind running such experiments require further consideration: when we post a job on a crowdsourcing platform, it is easy to forget that there are people on the other side of the machine completing it for us.
The focus of this Dagstuhl seminar was to discuss experiences and methodological considerations when using crowdsourcing platforms to run human-centred experiments that test the effectiveness of visual representations in these fields. We primarily targeted members of the human-computer interaction, visualization, and applied perception research communities, as these communities often engage in human-centred experimental methodologies to evaluate their developed technologies and have deployed such technologies on crowdsourcing platforms in the past. We also engaged researchers who study the technology that makes crowdsourcing possible. Finally, researchers from psychology, social science, and computer science who study the crowdsourcing community participated and brought another perspective to this topic. In total, 40 researchers from 13 different countries participated in the seminar. The seminar was held over one week and included topic talks, stimulus talks, and flash ('late breaking') talks. In a 'madness' session, all participants introduced themselves in a fast-paced session, within one minute each. The participants stated their areas of interest, their expectations from the seminar, and their view on crowdsourcing science. The major interests of the participants were reflected in the different working groups:
- Technology to support Crowdsourcing
- Crowdworkers and the Crowdsourcing Community
- Crowdsourcing experiments vs laboratory experiments
- The use of Crowdsourcing in Psychology research
- The use of Crowdsourcing in Visualisation research
- Using Crowdsourcing to assess Quality of Experience
The abstracts from the different talks, as well as the summaries of the working groups, can be found on the seminar homepage and in this Dagstuhl report. Apart from the report, we will produce an edited volume of articles that will become a primer text on (1) the crowdsourcing technology and methodology, (2) a comparison between crowdsourcing and lab experiments, (3) the use of crowdsourcing for visualization, psychology, and applied perception empirical studies, and (4) the nature of crowdworkers and their work, their motivation and demographic background, as well as the relationships among people forming the crowdsourcing community.
- Daniel Archambault (Swansea University, GB) [dblp]
- Benjamin Bach (Microsoft Research - Inria Joint Centre, FR) [dblp]
- Kathrin Ballweg (TU Darmstadt, DE) [dblp]
- Rita Borgo (Swansea University, GB) [dblp]
- Alessandro Bozzon (TU Delft, NL) [dblp]
- Sheelagh Carpendale (University of Calgary, CA) [dblp]
- Remco Chang (Tufts University - Medford, US) [dblp]
- Min Chen (University of Oxford, GB) [dblp]
- Stephan Diehl (Universität Trier, DE) [dblp]
- Darren J. Edwards (Swansea University, GB) [dblp]
- Sebastian Egger-Lampl (AIT Austrian Institute of Technology - Wien, AT) [dblp]
- Sara Irina Fabrikant (Universität Zürich, CH) [dblp]
- Brian D. Fisher (Simon Fraser University - Surrey, CA) [dblp]
- Ujwal Gadiraju (Leibniz Universität Hannover, DE) [dblp]
- Neha Gupta (University of Nottingham, GB) [dblp]
- Matthias Hirth (Universität Würzburg, DE) [dblp]
- Tobias Hoßfeld (Universität Duisburg-Essen, DE) [dblp]
- Jason Jacques (University of Cambridge, GB) [dblp]
- Radu Jianu (Florida International University - Miami, US) [dblp]
- Christian Keimel (IRT - München, DE) [dblp]
- Andreas Kerren (Linnaeus University - Växjö, SE) [dblp]
- Stephen G. Kobourov (University of Arizona - Tucson, US) [dblp]
- Bongshin Lee (Microsoft Research - Redmond, US) [dblp]
- David Martin (Xerox Research Centre Europe - Grenoble, FR) [dblp]
- Andrea Mauri (Polytechnic University of Milan, IT) [dblp]
- Fintan McGee (Luxembourg Inst. of Science & Technology, LU) [dblp]
- Luana Micallef (HIIT - Helsinki, FI) [dblp]
- Sebastian Möller (TU Berlin, DE) [dblp]
- Babak Naderi (TU Berlin, DE) [dblp]
- Martin Nöllenburg (TU Wien, AT) [dblp]
- Helen C. Purchase (University of Glasgow, GB) [dblp]
- Judith Redi (TU Delft, NL) [dblp]
- Peter Rodgers (University of Kent, GB) [dblp]
- Dietmar Saupe (Universität Konstanz, DE) [dblp]
- Ognjen Scekic (TU Wien, AT) [dblp]
- Paolo Simonetto (Romano d'Ezzelino (VI), IT) [dblp]
- Tatiana von Landesberger (TU Darmstadt, DE) [dblp]
- Ina Wechsung (TU Berlin, DE) [dblp]
- Michael Wybrow (Monash University - Caulfield, AU) [dblp]
- Michelle X. Zhou (Juji Inc. - Saratoga, US) [dblp]
- computer graphics / computer vision
- society / human-computer interaction
- world wide web / internet
- Information Visualization
- Data Visualization
- Applied Perception
- Human-Computer Interaction
- Empirical Evaluations