SmartER Affiliations

Enhancing Open Repositories through Harvesting and Extracting Affiliation Data as First-class Citizen

October 2023 – September 2026

Description

The dblp computer science bibliography provides search functionality over the metadata of computer science publications and links to their full-text PDFs. To this end, dblp provides high-quality metadata on author names, titles, and venues, including a unique identification of authors whenever possible. As the next step, we plan to extend the dblp database to include affiliation information wherever it is available. We have compiled three use cases in which users benefit from affiliation data. These include direct benefits, such as new and useful search functionality, and indirect benefits, such as better author disambiguation and a more accurate data basis for scientometric studies and for measuring scientific output. The goal of this project is to develop and evaluate an e-research tool chain that addresses all three use cases and elevates affiliations to a first-class citizen in the dblp data environment.

We have broken this challenge down into four tasks: get the data, extract the metadata, integrate it into both the back end and the front end of dblp, and introduce the data to the community. Specifically, we will build a multi-source metadata harvester that automatically discovers and collects metadata from structured and unstructured web sources such as RDF on the Web of Data, full-text PDFs, websites, and custom APIs provided by publishers. We download the content, then extract and cleanse the metadata from the different web sources. For example, we apply entity recognition to extract the metadata from a PDF, in particular the authors’ affiliations, and match the extracted metadata to external knowledge bases, e.g., lists of known affiliations. The extracted information is fused into a metadata record based on an extended, provenance-aware metadata model. We ingest the new metadata records into the dblp database, where they are manually inspected, edited, and confirmed by curators using the dblp editorial manager. This iterative manual inspection generates feedback, which is used to improve both the machine learning model for extracting affiliation information and the metadata harvester.
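The matching and fusion steps described above can be illustrated with a minimal sketch. It is not the project's implementation: the list of known affiliations, the similarity threshold, and all class and function names (`match_affiliation`, `AffiliationClaim`, `MetadataRecord`, `fuse`) are assumptions made for illustration, and simple string similarity from the Python standard library stands in for the entity recognition and knowledge-base matching the project plans to use.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import List, Optional, Tuple

# Hypothetical stand-in for an external knowledge base of known affiliations.
KNOWN_AFFILIATIONS = [
    "Schloss Dagstuhl - Leibniz Center for Informatics",
    "University of Trier",
    "Saarland University",
]


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in text.lower())
    return " ".join(cleaned.split())


def match_affiliation(raw: str, threshold: float = 0.8) -> Optional[str]:
    """Return the most similar known affiliation, or None if below the threshold.

    The threshold of 0.8 is an arbitrary illustrative value.
    """
    best_name, best_score = None, 0.0
    for name in KNOWN_AFFILIATIONS:
        score = SequenceMatcher(None, normalize(raw), normalize(name)).ratio()
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None


@dataclass
class AffiliationClaim:
    """One author-affiliation statement together with its provenance."""
    author: str
    affiliation: str
    source: str              # where the statement was harvested from, e.g. "pdf"
    confirmed: bool = False  # set to True only after a curator has checked it


@dataclass
class MetadataRecord:
    """A publication record that keeps every claim with its source."""
    title: str
    claims: List[AffiliationClaim] = field(default_factory=list)


def fuse(title: str, extracted: List[Tuple[str, str, str]]) -> MetadataRecord:
    """Build a record from (author, raw affiliation string, source) triples.

    Strings that match no known affiliation are skipped in this sketch.
    """
    record = MetadataRecord(title)
    for author, raw, source in extracted:
        matched = match_affiliation(raw)
        if matched is not None:
            record.claims.append(AffiliationClaim(author, matched, source))
    return record
```

Keeping the `source` and `confirmed` fields on each claim, rather than on the record as a whole, is one way to let curators confirm or reject individual statements and to trace a rejected statement back to the harvester or extractor that produced it.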

Through user studies, we tailor the new affiliation interface to the users’ needs, and we also take care to integrate the new information into the ongoing author disambiguation and quality assurance processes of dblp’s editorial management system. Finally, all gathered information will be made publicly available in accordance with the FAIR principles as part of the ongoing dblp effort to support the e-research community with high-quality and trustworthy datasets, which are already used by thousands of researchers and software developers worldwide.

Partners

Organisation

The project is funded by a grant from the German Research Foundation (DFG) under the funding programme "e-Research Technologies" (grant project number 515537520).

Web links
Infrastructures
dblp