Dagstuhl Seminar 08131: Ontologies and Text Mining for Life Sciences: Current Status and Future Perspectives

( Mar 24 – Mar 28, 2008 )

Permalink

Please use the following short url to reference this page: https://www.dagstuhl.de/08131

Organizers

Michael Ashburner (University of Cambridge, GB)
Ulf Leser (HU Berlin, DE)
Dietrich Rebholz-Schuhmann (European Bioinformatics Institute - Cambridge, GB)

Contact

Annette Beyer (for administrative matters)

Publications

Ontologies and Text Mining for Life Sciences: Current Status and Future Perspectives. Michael Ashburner, Ulf Leser, and Dietrich Rebholz-Schuhmann (Eds.). Dagstuhl Seminar Proceedings, Volume 8131. June 3, 2008

Summary

Researchers in Text Mining and researchers developing ontological resources both provide solutions for properly preserving semantic information, i.e., in ontologies and/or fact databases. Researchers from the two fields tend to work independently of each other, yet both have an interest in profiting from ongoing research in the complementary domain. The relatedness of the two domains led to the idea of organizing a workshop that brings together members of both research communities.

Life Science researchers report their findings in scientific publications. These documents are nowadays distributed electronically and are increasingly processed automatically so that the findings and data they contain can be incorporated into structured scientific databases. Methods for this purpose are generally subsumed under the term “Text Mining”, which encompasses techniques from the fields of machine learning, information retrieval and natural language processing. Text Mining-based solutions have, for instance, been developed for the identification of protein-protein interactions and gene regulatory events, for the functional annotation of proteins, for the identification and prioritization of disease-related genes, and for the analysis of results from high-throughput experiments.

Text Mining for the Life Sciences has received considerable interest in recent years. It is now an established topic at conferences and workshops (e.g., ISMB, KDD, ECCB, Coling, ACL, PSB) and has led to international large-scale challenge evaluations (KDD Cup, the Genomics track at TREC, BioCreative I & II, BioNLP). This interest is driven by the ever-increasing number of publications, which imposes an unbearable workload on the individual researcher, and by promising advances in natural language processing and machine learning that could solve this problem if they are integrated into biomedical applications.

Text Mining has to cope with a large semantic gap between the raw textual data and the representation of meaningful results in databases, e.g., the normalization of events mentioned in the text to conceptual representations of events according to “textbook” knowledge. It is hoped that ontologies can fill this gap by delivering a structured representation of biomedical knowledge. Although large and increasingly comprehensive biological ontologies are now available for many relevant topics (e.g., the Gene Ontology, the Sequence Ontology, phenotype ontologies, etc.), it has not yet been established which types of resources are best suited for Text Mining solutions.

Examining the aims of research in Text Mining and in ontology design, we find that ontologies are not designed to support Text Mining but rather to improve the annotation of database content. Although Text Mining solutions aim to fill databases with content, they do not easily find ontological concepts in the literature. Moreover, ontological resources are not designed to support Text Mining, in the sense that ontological terms do not necessarily match the demands of a natural language processing system. Nevertheless, the Text Mining community exploits ontological resources to link evidence extracted from the literature to ontological concepts. Furthermore, ontologies are not only a tool but also a target of Text Mining research: many methods have been devised that automatically or semi-automatically construct new ontologies or enrich existing ones by extracting terms and relationships from biomedical text collections.

These areas are studied by a community of researchers working in a highly interdisciplinary way across biology, biochemistry, chemistry, medicine, machine learning, formal ontologies, natural language processing, bioinformatics and other domains. The aim of this seminar was to bring together researchers from all of these areas to survey the state of the art in both research fields, to discuss the suitability and progress of available resources, to identify areas where tools, standards, or resources are lacking, and to foster joint opportunities for Text Mining and ontology research for the benefit of life science research.

In preparation for the seminar, the organizers identified four areas that best highlight the achievements and challenges in bringing together ontologies, Text Mining, and biological research:

exploring the benefits resulting from improved relations between Text Mining and biological ontologies,
technical advances in Text Mining and their application to life science research,
the impact of advanced natural language processing (NLP) methods, and
success stories of Text Mining solutions with and without ontological support.

The seminar brought together more than 40 internationally renowned researchers from all the domains mentioned above. The atmosphere of the seminar is best described as a prolonged, lively and heated discussion, driven mainly by the divergence of requirements, goals, and expectations between the Text Mining and the ontology communities. On the other hand, a number of talks pointed out the successful integration of Text Mining solutions into research on ontology design and the exploitation of ontological resources for successful Text Mining solutions.

Participants

Michael Ashburner (University of Cambridge, GB)
Elena Beisswanger (Universität Jena, DE)
Judith A. Blake (The Jackson Laboratory - Bar Harbor, US)
Christopher Brewster (University of Sheffield, GB)
Ted Briscoe (University of Cambridge, GB)
Paul Buitelaar (National University of Ireland - Galway, IE)
Anita Burgun-Parenthoine (University of Rennes, FR)
Nigel Collier (National Institute of Informatics - Tokyo, JP)
Anna Divoli (University of Chicago, US)
Juliane Fluck (Fraunhofer SCAI - St. Augustin, DE)
Jörg Hakenberg (Arizona State University - Tempe, US)
Robert Hoehndorf (MPI for Evolutionary Anthropology, DE)
Martin Hofmann-Apitius (Fraunhofer SCAI - St. Augustin, DE)
Jung-Jae Kim (European Bioinformatics Institute - Cambridge, GB)
Martin Krallinger (CNIO - Madrid, ES)
Michael Krauthammer (Yale University, US)
Robert Kueffner (LMU München, DE)
Ulf Leser (HU Berlin, DE)
Suzanna Lewis (Lawrence Berkeley National Laboratory, US)
David Milward (Linguamatics Ltd. - Cambridge, GB)
Hans-Michael Mueller (CalTech - Pasadena, US)
Peter Murray-Rust (University of Cambridge, GB)
Goran Nenadic (University of Manchester, GB)
Jong C. Park (KAIST - Daejeon, KR)
Dietrich Rebholz-Schuhmann (European Bioinformatics Institute - Cambridge, GB)
Jasmin Saric (Boehringer Ingelheim Pharma GmbH, DE)
Andreas Schlicker (MPI für Informatik - Saarbrücken, DE)
Michael Schroeder (TU Dresden, DE)
Johannes Schuchhardt (MicroDiscovery GmbH - Berlin, DE)
Stefan Schulz (Uniklinikum Freiburg, DE)
David Shotton (University of Oxford, GB)
Irena Spasic (Univ. of Manchester, GB)
Robert Stevens (University of Manchester, GB)
Jian Su (Infocomm Research - Singapore, SG)
Laszlo van den Hoek (Erasmus Univ. - Rotterdam, NL)
Erik van Mulligen (Erasmus Univ. - Rotterdam, NL)
Thomas Wächter (TU Dresden, DE)

Classification

data bases / information retrieval
semantics / formal methods
bioinformatics

Keywords

Text mining
natural language processing
ontologies
ontology design
machine learning
bioinformatics
medical informatics
knowledge management
