Gregory, Arofan; Todd, Jim; Amadi, David; Greenfield, Jay; Muyingo, Sylvia; Tomlin, Keith
One of the key requirements for FAIR data reuse is that the user of a FAIR data resource understands the exact nature of the data. The FAIR principles set out the kinds of metadata needed to describe data, but implementers must also understand how these metadata can be provided in order to realise FAIR effectively within their systems. This implementation guide describes how all aspects of the data are made available for use, both within and outside the INSPIRE Network community, using standard metadata to describe the data. It explores how generic standards can be used to express the agreed community metadata set. The INSPIRE platform supports network studies that use population health data in standing up their own instances of a common data model, the OMOP CDM. The WorldFAIR project explores what data infrastructures need in order to provide data in line with the FAIR principles within and across domains.
The types of metadata used in INSPIRE are aligned as much as possible with existing and popular models common in the public health domain. Primary among these are the standards (and tools) coming from OHDSI (Observational Health Data Sciences and Informatics), notably their OMOP Common Data Model (CDM). This suite of products addresses the definition of specific concepts and their semantics, standard (primarily medical) classifications, and the mechanism for selecting data from among those available to produce a specific cohort for analysis. These standards are common within the public health domain internationally, and INSPIRE has chosen to use them to reduce the significant cost of developing tools for many aspects of data and metadata management and use.
FAIR demands that we provide data in a useful way to those who may not be familiar with the community tools and standards used by INSPIRE. More generic standards are thus needed to support this broader community. It is significant that members of the OHDSI community have already looked at how Schema.org – developed and supported by many popular search engines, Google foremost among them – can be used in combination with the OHDSI OMOP CDM to describe data resources. Here, INSPIRE builds on that work to describe how INSPIRE data resources, specifically, can be documented in a way which will be maximally accessible to users both within the community and external to it.
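As a rough illustration of the kind of generic description discussed above, the following Python sketch builds a Schema.org `Dataset` record as JSON-LD. It is not taken from the implementation guide: the function name, the field values, and the choice to record the data model under `keywords` and the selection protocol under `measurementTechnique` are assumptions made purely for illustration.

```python
import json


def build_dataset_description(name, description, keywords, protocol_summary):
    """Return a Schema.org Dataset description as a JSON-LD style dict.

    Hypothetical helper for illustration only; the mapping of fields to
    Schema.org properties is an assumption, not the guide's agreed profile.
    """
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        # Assumption: the data model and domain are surfaced as plain keywords.
        "keywords": list(keywords),
        # Assumption: a short free-text note on how the data were selected.
        "measurementTechnique": protocol_summary,
    }


# Placeholder values, not a real INSPIRE data resource.
example = build_dataset_description(
    name="Example population health dataset",
    description="Illustrative record for a dataset held in an OMOP CDM instance.",
    keywords=["OMOP CDM", "population health"],
    protocol_summary="Cohort selected according to a documented study protocol.",
)

serialised = json.dumps(example, indent=2)
print(serialised)
```

A record of this shape could be embedded in a landing page so that general-purpose search engines can index it, while the community-specific detail remains available through the OHDSI tooling.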
One critical part of the overall information set provided by standard FAIR metadata is a description of the experiment for which the data was used, and of the protocol employed in the selection and analysis of the data. This aspect of the metadata description is a major focus of the implementation guide, and one for which Schema.org appears to be well suited.
WorldFAIR Work Package (WP) 07 is one of eleven domain-specific case studies being undertaken by the WorldFAIR project; the practices emerging from these case studies are analysed across domains in WP02. Early indications from WP02 suggest that Schema.org is one of the standards which will be recommended as part of the Cross-Domain Interoperability Framework (CDIF). This implementation guide contributes to an understanding of exactly how Schema.org fits into the description of domain data.
While some open questions remain, the implementation guide has achieved its primary goal of showing how standards such as Schema.org can be used within the public health domain to provide a complete set of the information needed for FAIR data use across and within domain boundaries.
The report is available on Zenodo.