September 26 – October 1, 2021, Event 21393

Interoperability of Metadata Standards in Cross-Domain Science III


Simon Cox (CSIRO – Clayton South, AU)
Arofan Gregory (Jaffrey, US)
Simon Hodson (CODATA – Paris, FR)
Steven McEachern (Australian National University – Acton, AU)
Joachim Wackerow (GESIS – Mannheim, DE)

For support, please contact Heike Clemens.


This workshop builds on the outcomes of two previous Dagstuhl Workshops in this series, held in 2018 and 2019, and on the virtual collaborations conducted during the pandemic that furthered those outcomes. The focus is on the alignment of standards and technologies for cross-domain data combination and on the application of this work to concrete use cases. The workshop will make a direct contribution to the International Science Council CODATA Decadal Programme on ‘Making Data Work for Cross-Domain Grand Challenges’, which will be formally launched in October 2021.

Workshop Goals

The workshop will focus on the challenges and solutions for data combination in three case studies (listed below). For each case study, an overview of relevant semantics and metadata profiles will be prepared before the workshop. The workshop will explore and report on the challenge of semantic alignment and the utility of potential solutions. Outputs will include:

  1. Short overview reports on the state of semantics and metadata for each case study;
  2. Recommendations for future work and possible solutions to improve interoperability of semantics and metadata in the chosen case studies.

Background: The Challenge of Combining Cross-Domain Data

To face many of today’s global grand challenges, data is needed from different domains and disciplines, and from different institutional levels, and it must be interoperable to be useful. Research projects in such fields, whether for policy or scientific purposes, often involve the use of data from a wide variety of sources, ranging from specific, local data sets to those supplied by higher-level national and international organizations. A huge proportion of research effort is expended to integrate and harmonize this data so that a meaningful analysis can be conducted.

Global grand challenges require data coming from a wide range of domains and institutional levels, presenting us with diverse issues:

  • Semantics, classifications, and terminology must be clear not only across domains and national boundaries, but also vertically within chains of data reporting and use
  • Metadata specifications for different purposes must be comprehensible at a computational as well as human-readable level, requiring both harmonization/alignment and better machine-actionable models and techniques
  • The provenance and processing of data must be made explicit in a fashion which supports further computation, enabling machine reproducibility of findings
  • The connection between scientific micro-data and official statistics at the national and international level must be strengthened, to improve both usability and quality for policy and scientific researchers alike
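To make the provenance point above concrete, a minimal sketch of machine-actionable lineage metadata follows. It is loosely inspired by the W3C PROV model, but the field names, dataset identifiers, and parameters are all hypothetical illustrations, not a real PROV serialization or any record from the case studies:

```python
# Hypothetical sketch: a derived dataset whose provenance is recorded in a
# machine-readable form, so that tools can trace (and in principle re-run)
# the processing chain. Names are illustrative, not from a real standard.

harmonized = {
    "entity": "harmonized_income_table",
    "was_derived_from": ["national_survey_2020", "un_sdg_indicator_1_2_1"],
    "was_generated_by": {
        "activity": "currency_and_unit_harmonization",
        "software": "harmonize.py",            # hypothetical script name
        "parameters": {"target_currency": "USD", "base_year": 2020},
    },
}

def inputs(record: dict) -> list:
    """List the source datasets a derived entity depends on."""
    return record["was_derived_from"]

print(inputs(harmonized))  # ['national_survey_2020', 'un_sdg_indicator_1_2_1']
```

Because the derivation step and its parameters are explicit data rather than prose, a machine can traverse the chain of sources, a prerequisite for the computational reproducibility the list above calls for.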

New approaches to the discovery and analysis of data allow for truly large-scale projects which rely heavily on automation, cutting across traditional boundaries between domains and disciplines. Such approaches require that the data resources themselves be amenable to automated discovery, access, and use. Some of the basic concepts behind such approaches can be found in the FAIR Data Principles.

At heart, however, these ideas require that the multiplicity of models used to structure and describe data be themselves amenable to interchange at a computational level. This demands standardization and mapping across the various data and metadata models, efforts which have been ongoing, to some degree, for many years. By themselves, however, these efforts are insufficient. What is needed are models and frameworks at a higher level of abstraction which can support such harmonization at a computational level. Efforts to develop such models are emerging but are still nascent. Further, a community of widespread practice is needed so that such models will be adopted and used.
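The mapping effort described above can be sketched in miniature as a machine-actionable crosswalk between two metadata profiles. The field names and the mapping here are purely illustrative assumptions, not drawn from DDI-CDI or any actual specification:

```python
# Illustrative only: both profiles and the crosswalk between them are
# hypothetical, standing in for real metadata specifications.

# A record described under metadata profile A (e.g. a survey archive).
record_a = {
    "title": "Household Income Survey 2020",
    "creator": "National Statistics Office",
    "temporal_coverage": "2020",
}

# A crosswalk: profile-A field name -> profile-B field name.
CROSSWALK_A_TO_B = {
    "title": "dataset_name",
    "creator": "publisher",
    "temporal_coverage": "reference_period",
}

def translate(record: dict, crosswalk: dict) -> dict:
    """Re-express a record under the target profile, dropping unmapped fields."""
    return {crosswalk[k]: v for k, v in record.items() if k in crosswalk}

record_b = translate(record_a, CROSSWALK_A_TO_B)
print(record_b["dataset_name"])  # Household Income Survey 2020
```

Real interoperability is far harder than this field-renaming toy suggests (values, classifications, and semantics must also align), which is precisely why higher-level abstract models, rather than pairwise crosswalks alone, are needed.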

This work brings together technologists, experts from the world of official statistics and global policy-monitoring data, and researchers with a scientific and academic focus. Technologies which address the creation, management, and exchange of metadata will be central to this work, supporting discovery, analysis, automated processing, and enhanced reusability of data. The intersection of these technologies with machine-learning approaches will also be considered. A broad range of standard models and specifications in these areas will serve as a focus of the effort, looking not only at how such models can be aligned, but also at how best to perform computation across them.

Workshop Case Studies

Building on the achievements of the previous workshops and the progress made in virtual Working Groups during the pandemic, this workshop will apply the progress made in semantics and metadata specifications (including DDI-CDI) to the following case studies:

  • Policy monitoring and the research-policy interface (disaster risk reduction and SDGs);
  • Resilient and Healthy Cities;
  • Infectious Diseases (including COVID-19).

For each case study, an online collaborative Working Group will prepare an up-to-date audit of standards, specifications and vocabularies that are key to research in their domain. The challenge of semantic alignment and the utility of potential solutions, including profiles for FAIR digital objects and DDI-CDI, will be explored and appraised.

Motivation text license
  Creative Commons BY 3.0 DE
  Joachim Wackerow
