October 27 to November 1, 2013, Dagstuhl Seminar 13441
Evaluation Methodologies in Information Retrieval
Evaluation of information retrieval (IR) systems has a long tradition. However, the test-collection-based evaluation paradigm is of limited value for assessing today's IR applications, since it fails to address major aspects of the IR process. New evaluation methodologies are therefore needed that can deal with the following issues:
- In interactive IR, users have a wide variety of interaction possibilities. The classical paradigm considers only the document ranking for a single query. In contrast, functions such as search term completion, query term suggestion, faceted search, document clustering, and query-biased summaries also strongly influence the user's search experience and thus should be considered in an evaluation.
- From a user's point of view, the performance of an IR system should be evaluated in terms of how well the user is supported over a whole search session. Typically, users initiate a session with a specific goal (e.g. acquiring crucial information for making a decision, learning about topics or events of interest, or simply being entertained). The overall quality of a system should therefore be evaluated with respect to the user's goal; how this can be achieved, however, is an open research issue.
- There is an increasing number of search applications (especially on mobile devices) that support specific tasks (e.g. finding the nearest restaurant, comparing prices for a specific product). Here, goal-oriented evaluation may be more straightforward. From an IR researcher's point of view, however, we would like to learn how much the underlying IR engine contributes to this quality, and how it can be improved.
- Besides ad hoc retrieval, monitoring and filtering are important IR task types. Here, streams of short messages (e.g. tweets, chat posts) pose new challenges, and it is an open question whether classical relevance-based evaluation is sufficient for a user-oriented evaluation. (A sketch contrasting classical, session-level, and filtering measures follows this list.)
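To make these measure families concrete, the following Python sketch contrasts a classical single-ranking measure (nDCG), a session-level variant in the spirit of sDCG (Järvelin et al.; the parameters b and bq follow that proposal, but the details are simplified here), and a TREC-filtering-style linear utility over stream decisions. All judgments are made up for illustration.

```python
import math

def dcg(gains, b=2):
    """Discounted cumulative gain: graded relevance, log-discounted by rank."""
    return sum(g / max(1.0, math.log(r, b)) for r, g in enumerate(gains, 1))

def ndcg(gains):
    """Classical single-query measure: DCG normalized by the ideal ranking."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0

def sdcg(session, b=2, bq=4):
    """Session-level sketch in the spirit of sDCG (Jarvelin et al.):
    rankings of later queries in a session are discounted by 1 + log_bq(q)."""
    return sum(dcg(gains, b) / (1 + math.log(q, bq))
               for q, gains in enumerate(session, 1))

def linear_utility(delivered, credit=2.0, penalty=1.0):
    """TREC-filtering-style utility over a stream: reward each relevant
    message delivered, penalize each non-relevant one."""
    return sum(credit if relevant else -penalty for relevant in delivered)

# Hypothetical graded judgments (0-3): one query vs. a three-query session.
print(ndcg([3, 1, 0, 2]))                         # classical, single ranking
print(sdcg([[3, 1, 0], [2, 2], [0, 3, 1]]))       # whole search session
print(linear_utility([True, False, True, True]))  # filtering decisions
```

None of these is a finished answer to the issues above: nDCG ignores interaction entirely, sDCG still reduces a session to judged rankings, and linear utility says nothing about how users experience a message stream.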
In order to address these issues, there is a need for the development of appropriate methodologies such as:
- Evaluation infrastructures that provide test-beds along with evaluation methods, software, and databases for computing measures and for collecting and comparing results.
- Reusable test-beds for interactive IR evaluation, which hardly exist at the moment (with the exception of simulation approaches such as the TREC Session track). Sharing data from user experiments could be an important step in this direction.
- Living labs, which use operational systems as experimental platforms on which to conduct user-based experiments at scale. To be usable, they require sites that attract enough traffic and an architecture that allows components from different research groups to be plugged in (see the interleaving sketch after this list).
- Frameworks for modeling system-user interactions with clear methodological implications.
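Living-lab comparisons of two rankers on live traffic are often run with interleaving, for instance team-draft interleaving. The sketch below (Python, with hypothetical rankings) shows the core idea; it follows the published method loosely, and the credit bookkeeping is simplified.

```python
import random

def team_draft_interleave(ranking_a, ranking_b, rng=random):
    """Team-draft interleaving (after Radlinski et al.): in each round a
    coin flip decides which ranker drafts first; each ranker then adds its
    highest-ranked document not yet picked to the interleaved list."""
    pool = set(ranking_a) | set(ranking_b)
    interleaved, credit, picked = [], {}, set()
    while len(picked) < len(pool):
        order = ('A', 'B') if rng.random() < 0.5 else ('B', 'A')
        for team in order:
            ranking = ranking_a if team == 'A' else ranking_b
            doc = next((d for d in ranking if d not in picked), None)
            if doc is not None:
                picked.add(doc)
                interleaved.append(doc)
                credit[doc] = team  # clicks on doc count for this team
    return interleaved, credit

# Hypothetical result lists from two systems plugged into the same site.
docs, credit = team_draft_interleave(list("abcde"), list("bacfe"))
# Over many impressions, the team whose documents attract more clicks
# wins the pairwise comparison between the two systems.
```

The appeal for living labs is that such a component only needs the two rankings and the click log, so different research groups can plug their rankers into the same operational site.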
This seminar aims to
- Increase understanding of the central problems in evaluating information retrieval
- Foster cross-fertilization of ideas on evaluation approaches among the different IR evaluation communities
- Create new methodologies and approaches for solving existing problems
- Enhance the validity and reliability of future evaluation experiments
- Examine, in the long run, how to extract pertinent IR system design elements from the results of evaluation experiments
To attain the goals of the seminar, each participant will be expected to identify one to five crucial issues in IR evaluation methodology. These perspectives will result in primarily theoretical presentations with empirical examples from current studies. Based on these contributions, we will select a set of methodological issues for further development in smaller working groups. The expected outcomes of the seminar will form the basis for one or more new evaluation frameworks and improved methodological solutions.
- Data Bases / Information Retrieval
- Evaluation design
- Evaluation analysis
- Test collections
- Lab experiments
- Living labs