October 11 – 16, 2020, Event 20423

CANCELLED Interoperability of Metadata Standards in Cross-Domain Science, Health, and Social Science Applications III

Due to the COVID-19 pandemic, this seminar was cancelled. A related event was scheduled for September 26 – October 1, 2021 (Seminar 21393).


Simon Cox (CSIRO – Clayton South, AU)
Arofan Gregory (Jaffrey, US)
Simon Hodson (CODATA – Paris, FR)
Steven McEachern (Australian National University – Acton, AU)
Joachim Wackerow (GESIS – Mannheim, DE)

This workshop builds on the outcomes of two previous Dagstuhl Workshops in 2018 and 2019 on the alignment of standards and technologies for cross-domain data combination. The first two workshops in this series have produced draft guidelines and use case documentation to provide insight into the cross-domain challenges which form the focus of the ISC CODATA Decadal Programme on ‘Making Data Work for Cross-Domain Grand Challenges’.

Four initial Working Groups emerged from discussions at the workshop in 2019. This third workshop will act as a face-to-face meeting and as a sprint for them, following and augmenting their online collaboration.

Scope and Background

To face many of today’s global grand challenges, data is needed from different domains and disciplines, and from different institutional levels, and it must be interoperable to be useful. Research projects in such fields, whether for policy or scientific purposes, often involve the use of data from a wide variety of sources, ranging from specific, local data sets to those supplied by higher-level national and international organizations. A huge proportion of research effort is expended to integrate and harmonize this data so that a meaningful analysis can be conducted.

Drawing data from such a wide range of domains and institutional levels raises several issues:

  • Semantics, classifications, and terminology must be clear not only across domains and national boundaries, but also vertically within chains of data reporting and use
  • Metadata specifications for different purposes must be comprehensible at a computational as well as human-readable level, requiring both harmonization/alignment and better machine-actionable models and techniques
  • The provenance and processing of data must be made explicit in a fashion which supports further computation, enabling machine reproducibility of findings
  • The connection between scientific micro-data and official statistics at the national and international level must be strengthened, to improve both usability and quality for policy and scientific researchers alike

Working Groups

Each group contains experts in the use case and its data, as well as experts in relevant standards, technologies, and semantic/conceptual models.

Group 1:

Semantic Interoperability and Conceptual Framework
This group focuses on conceptual and semantic issues impacting the problem space. There are draft guidelines for vocabularies (a.k.a. terminologies, classifications, ontologies, etc.) and how they are managed and published. These will form one basis of the work. Additionally, a conceptual framework for data-sharing activities has been discussed and drafted but is in early stages of development. This framework is critical in communicating and organizing the outputs from all of the other working groups. Other activities may be identified as the work proceeds.

Group 2:

Policy Monitoring Indicators
This group focuses on the Sustainable Development Goals (SDGs), Disaster Risk Reduction (DRR), and other policy monitoring indicators, and the relationship of these data to scientific work in the disaster risk reduction, infectious disease, and resilient cities groups. One area of focus would be technology alignment in support of researcher use of international aggregates (like the SDGs), covering W3C RDF Data Cube Vocabulary, Statistical Data and Metadata eXchange (SDMX), and various related aspects.

Group 3:

Infectious Disease
The focus of this group is on the end-to-end flow of data in infectious disease research. This will initially cover the Health and Demographic Surveillance System (HDSS) infrastructure work from last year and build on the London School of Hygiene & Tropical Medicine (LSHTM) case integrating this data with other data, for end users’ analysis and for better use of SDG data. It will in turn be expanded to include an examination of pandemic epidemiology data, particularly with reference to Coronavirus disease 2019 (COVID-19). All of the draft guidelines in these areas would be further explored. Relationships to DRR and the Indicator Graph would also be explored. This group covers many technical areas, including reusable approaches to data integration and intersection with

Group 4:

Resilient Cities
This group focuses on the topics emerging from recent work in Medellín, Colombia, and on other examples, including some from India and elsewhere. Specifically, data related to transportation, health, planning, and measuring economic impacts will be examined, as a means of approaching challenges for data integration. The use of international data for reference purposes will be considered, as will the harmonization needed around time, geography, and other issues identified in earlier workshops.


This work brings together experts from both the world of official statistics and global policy monitoring data, technologists, and researchers with a scientific and academic focus. Technologies which address the creation, management and exchange of metadata will be central to this work, to support discovery, analysis, automated processing, and enhanced reusability of data. Further, the intersection of these technologies with machine learning approaches will be considered. A broad range of standard models and specifications in these areas will serve as a focus of the effort, looking not only at how such models can be aligned, but also how best to perform computation across them.

