WorldFAIR Final Policy Brief (Deliverable D1.4)
Authors: Hodson, Simon; Gregory, Arofan
This document presents the policy recommendations from the WorldFAIR project. It synthesises the project’s findings and presents recommendations for specific stakeholders, for the European Open Science Cloud and other Research Infrastructures globally.
To meet the challenges and opportunities confronting 21st century science, including the need to support interdisciplinary research and the impact of AI, it calls for a shift to a data engineering approach and for investment in metadata uplift and the implementation of the FAIR principles to enable this.
FAIR Implementation Profiles (FIPs) in WorldFAIR: What Have We Learnt? (Deliverable 2.1)
Authors: Gregory, Arofan; Hodson, Simon
Report on the FAIR Implementation Profiles completed by project Case Studies in 2022. Project Deliverable D2.1 for EC WIDERA-funded project “WorldFAIR: Global cooperation on FAIR data policy and practice”.
This report gives a brief overview of the experience of the WorldFAIR project in using FAIR Implementation Profiles (FIPs). It describes the WorldFAIR project, its objectives and its rich set of Case Studies; and it introduces FIPs as a methodology for listing the FAIR implementation decisions made by a given community of practice. Subsequently, the report gives an overview of the initial feedback and findings from the Case Studies, and considers a number of issues and points of discussion that emerged from this exercise. Finally, and most importantly, we describe how we think the experience of using FIPs will assist each Case Study in its work to implement FAIR, and will assist the project as a whole in the development of two key outputs: the Cross-Domain Interoperability Framework (CDIF), and domain-sensitive recommendations for FAIR assessment.
We hope this report will be of interest to data experts who want to find out more about the WorldFAIR project, its remarkable and diverse array of Case Studies, and about FIPs. It is important to stress that this report does not set out to give a comprehensive appraisal of the FIPs approach and could not do so. All the WorldFAIR Case Studies have developed an initial FIP, but the process of reflection on practice will continue throughout the project. Each Case Study will complete at least one further FIP, and in some cases more than one, towards the end of the project and this will enrich our understanding of the utility of the approach. At that stage, we intend to be able to incorporate some robust prospective and aspirational considerations, and we need to consider how best to represent this in the FIPs.
As noted above, the final section of this report looks forward to the development of the Cross-Domain Interoperability Framework (CDIF), and domain-sensitive recommendations for FAIR assessment. On both these counts, we consider that the FIPs approach has helped considerably:
For the CDIF, through helping refine our initial functional analysis of the requirements for cross-domain FAIR, and—as predicted—helping identify some candidate cross-domain standards.
For the FAIR assessment recommendations, through demonstrating that the FIPs can provide an empirical basis for such recommendations, reflecting both the current practice, and the aspirations of a given community or research domain.
WorldFAIR’s Experience with FIPs – second set of FAIR Implementation Profiles for each case study (Deliverable 2.2)
This report provides a summary of the experience of the WorldFAIR project using FAIR Implementation Profiles (FIPs). It briefly revisits the project’s interest in the FIPs approach and summarises the use of FIPs by each of the eleven WorldFAIR case study work packages (WPs).
The report then provides feedback on the use of FIPs. This includes a discussion of what went well and what went less well; considerations from the case studies on how FIPs might best be used; and constructive feedback on how the approach, and the supporting tools and technologies, might be improved. These points are presented as a set of targeted recommendations at the end of the report.
Cross-Domain Interoperability Framework (Deliverable 2.3)
The Cross-Domain Interoperability Framework (CDIF) is designed to support FAIR implementation for these projects by establishing a ‘lingua franca’ for this information, based on existing standards and technologies to support interoperability, in both human- and machine-actionable fashion. CDIF is a set of implementation recommendations, based on profiles of common, domain-neutral metadata standards which are aligned to work together to support core functions required by FAIR.
This report presents a core set of five CDIF profiles, which address the most important functions for cross-domain FAIR implementation.
- Discovery (discovery of data and metadata resources)
- Data access (specifically, machine-actionable descriptions of access conditions and permitted use)
- Controlled vocabularies (good practices for the publication of controlled vocabularies and semantic artefacts)
- Data integration (description of the structural and semantic aspects of data to make it integration-ready)
- Universals (the description of ‘universal’ elements, time, geography, and units of measurement).
Each of these profiles is supported by specific recommendations, including the set of metadata fields in specific standards to use, and the method of implementation to be employed for machine-level interoperability.
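To make the idea of a Discovery profile concrete, the following is a minimal sketch of a machine-readable dataset description. It assumes schema.org expressed as JSON-LD as the domain-neutral vocabulary; the field selection, the helper name `build_discovery_record`, and the `example.org` identifier are illustrative assumptions and are not taken from the CDIF recommendations themselves, which should be consulted for the actual required fields.

```python
import json


def build_discovery_record(identifier, name, description, license_url, keywords):
    """Return a schema.org-style JSON-LD description of a dataset.

    The choice of fields is an assumption made for illustration only.
    """
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "@id": identifier,
        "name": name,
        "description": description,
        "license": license_url,
        "keywords": list(keywords),
    }


# Hypothetical dataset; the identifier and text are placeholders.
record = build_discovery_record(
    identifier="https://example.org/dataset/123",
    name="Example geochemical measurements",
    description="Illustrative record used to show the shape of the metadata.",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    keywords=["geochemistry", "FAIR"],
)

# Serialise to JSON so the record can be embedded in a landing page or harvested.
print(json.dumps(record, indent=2))
```

A record of this kind covers only discovery; access conditions, controlled vocabularies, and structural descriptions would be handled by the other profiles listed above.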
A further set of topics is examined, establishing the priorities for further work. These include:
- Provenance (the description of provenance and processing)
- Context (the description of ‘context’ in the form of dependencies between fields within the data and a description of the research setting)
- Perspectives on AI (discussing the impacts of AI and the role that metadata can play)
- Packaging (the creation of archival and dissemination packages)
- Additional Data Formats (support for some of the data formats not fully supported in the initial release, such as NetCDF, Parquet, and HDF5).
In each of these topics, current discussions are documented, and considerations for further work are provided.
Recommendations and framework for FAIR Assessment within (and across) disciplines (Deliverable 2.4)
This report presents the WorldFAIR recommendations on FAIR assessment and discusses a framework for FAIR assessment to take into account the practices of scientific disciplines. This report surveys current activity on FAIR assessment tools and highlights some of the issues that have been encountered, particularly in relation to the practices of repositories serving particular research disciplines.
The report then discusses the purpose of FAIR assessment. After discussing current approaches, we again argue that FAIR assessment should consider community practice and convergence on recognised FAIR standards and technologies. The report then examines means of understanding domain practices. Here we point to the work of the WorldFAIR case studies. Finally, after discussing some of the challenges of FAIR assessment in relation to domain-specific and cross-domain requirements, including those of machine-to-machine interoperability, we present recommendations towards a framework for FAIR assessment, and towards FAIR assessment more broadly.
Digital Recommendations For Chemistry FAIR Data Policy And Practice (Deliverable 3.1)
Authors: McEwen, Leah; Bruno, Ian
The overarching goal of the WorldFAIR Chemistry Work Package (WP03) is to support the use of chemical data standards in research workflows, between and across disciplines. This will enable downstream data reuse through provision of practical direction and resources. The aim of this deliverable is to establish a framework that can be used by policymakers and developers of services and tools to support FAIR (Findable, Accessible, Interoperable and Reusable) reporting of chemical data. Specific objectives are to highlight applicability of existing standards at a practical level and to identify gaps that need to be addressed to achieve wider data re-use goals.
This report reviews some of the critical and persistent issues around documentation of chemical information. These were identified through a series of webinar panels on the theme: “What is a chemical?”, and through other conferences, workshops, and ongoing collaborative projects run as part of the WorldFAIR project and by the International Union of Pure and Applied Chemistry (IUPAC, the lead organisation of WP03).
Chemicals are everywhere and every tangible object has a chemical nature that impacts its use and behaviour in the environment. As chemical data and chemical principles are increasingly applied broadly across disciplines, the range of representations and contexts for chemical substances and data become more diverse and less easy to precisely define. Molecular entities are fundamental to our understanding of material properties and underlie the configuration of many chemical data models and resources, but it is also critical to look beyond the molecule to particles, surfaces, and states, and their behaviour under different conditions. Few chemistry-related disciplines have mature standards, and better practices in data reporting and interoperability are needed across the board, in both industry and academia. This will allow sharing and reuse requirements to be met in relation to international chemicals management policies and sustainable development goals.
This report additionally considers documentation requirements to achieve FAIR sharing of chemistry data in ways that are Reliable, Interpretable, Processable, and Exchangeable (RIPE), and with minimal loss of quality. Increasing the level of consumable FAIR data depends upon documenting data upstream of sharing to ensure that meaning and quality can be assessed and reassessed appropriately. It is not enough for data objects to be accessible; data need to be accompanied by metadata which provide the contextual information required to enable interoperability and reuse. Fully articulating the scope, structure, and exposure of metadata is critical to enable broader technical mechanisms for programmatic data exchange. The RIPE framework can help research ecosystems across sectors to focus on information requirements, resources and practices required to facilitate provision of data that are mature for sharing, and fully AI-ready across a broad range of use cases. Consistent and comprehensive communication of existing and emerging standards and resources is an important priority to effectively address the challenges confronting meaningful and effective reuse of chemistry data.
Collectively, the chemistry community has over a century of experience in developing and refining standards for communicating high quality chemical information. Explorations undertaken within and alongside WP03 are helping to clarify where these fall short of FAIR ideals, and how we can advance in addressing more complex needs across chemistry and other disciplines. While we have many of the components needed, further refinement of current processes and tools is necessary to enable establishment and use of workflows for sharing quality chemical data, particularly in interdisciplinary contexts. The present focus on the FAIR data principles provides a framework to enable previously well-established chemistry standards to become accessible and applicable for automated programmatic reuse. We envisage this report as a living document evolving over the course of the project, as we further assess IUPAC digital standards to support FAIR chemical data sharing. Future sections are planned that will provide a Roadmap and a Sustainability Blueprint for standards development and adoption as part of our collective recommendations for supporting chemical data reporting policy and practice.
The primary target audience for this report is the range of professionals involved in building and managing systems and services that support process engineers, scientists and other researchers working with data. We will also reach those involved in information management and communication, including professionals in publishing houses, libraries, standards organisations and at other information resources. Additional audiences include chemists, data scientists and other researchers who are actively working with informatics and programmatic applications, and those who are in positions to influence policy that impacts chemical data reporting and exchange.
Other deliverables under development in WorldFAIR Chemistry (WP03) will further demonstrate and facilitate the use of chemistry data standards, including a digital cookbook of interactive recipes demonstrating how to handle chemical data (D3.2 Training package), and protocol specifications for exchanging chemical representations and other metadata via Application Programming Interface (API) services (D3.3 Utility services).
WP03 activities are coordinated through the International Union of Pure and Applied Chemistry (IUPAC), the world authority on chemical nomenclature and terminology that constitute a common global language for communicating chemistry. In the context of the formal IUPAC process for reviewing and adopting consensus standards in chemistry, this work should be regarded as provisional guidance. Complete review and adoption of standards through IUPAC to reach the status of “Recommendation,” which has a specific meaning in the IUPAC lexicon, will occur after WP03 is complete.
WorldFAIR Training Package: FAIR Chemistry Cookbook (Deliverable 3.2)
Authors: Chalk, Stuart; Munday, Sam; Kroenlein, Ken; McEwen, Leah; Mustafa, Fatima
The International Union of Pure and Applied Chemistry (IUPAC) has initiated a community project through the WorldFAIR Initiative to develop an online community resource of practical and re-usable training materials that demonstrate how to manage digital data files and content. The overall goal is to get practical tools and tips in the hands of practicing chemists to lower barriers and smooth the adoption of best practices for sharing and re-using FAIR chemical data.
The purpose of this deliverable is to develop a digital web resource that will support various user groups in the chemical sciences and allied fields with training in the FAIR principles and machine-readable chemical data. This “Cookbook” serves as a toolbox of interactive recipes for implementing FAIR at various levels and for various user experience levels, ranging from educators who need demonstration resources for instruction, to students who learn by doing, to practitioners who need a quick orientation on a tool or resource.
This Cookbook is envisioned to serve a variety of users across a wide range of scientific domains and sectors. Commercial, academic, and educational institutions are anticipated to benefit. Researchers and development units, including scientists incorporating machine learning into their research and librarians supporting researchers, would find this resource especially helpful. The Cookbook’s broad applicability will be extremely helpful to journal editors, repository curators, systems and software developers, and anyone else handling chemical data.
This report describes an online cookbook platform that can be readily accessed with broadly available online infrastructure and exemplifies the FAIR principles and best practices in cheminformatics. Developing a sustainable infrastructure that invites practical contributions from chemists, data scientists, educators, and students worldwide enables IUPAC to leverage the collective expertise of the community in best practices for managing and reusing chemical data.
The overarching goal of the WorldFAIR Chemistry Work Package (WP03) is to support the use of chemical data standards in research and curation workflows and to enable downstream data reuse through provision of practical direction and resources. Other deliverables developed under the WorldFAIR Chemistry (WP03) case study include a framework that can be used by policymakers and developers of services and tools to support FAIR reporting of chemical data (D3.1 Digital guidance), and a specification for a shared data model for chemical information exchange (D3.3 Utility Services for Chemistry Standards).
WP03 activities are coordinated through IUPAC, the world authority on chemical nomenclature and terminology that constitute a common global language for communicating chemistry. In the context of the formal IUPAC process for reviewing and adopting consensus standards in chemistry, this work should be regarded as provisional guidance. Complete review and adoption of standards through IUPAC to reach the status of “Recommendation,” which has a specific meaning in the IUPAC lexicon, will occur after WP03 is complete. The IUPAC WorldFAIR project thanks all contributors to the original Cookbook project, listed in Appendix 8.1.
Digital Utility services for Chemistry Standards For Chemistry FAIR Data Policy And Practice (Deliverable 3.3)
Authors: Thiessen, Paul; Bolton, Evan; Williams, Antony; McEwen, Leah Rae
The International Union of Pure and Applied Chemistry (IUPAC) has initiated a community project through the WorldFAIR initiative to define a common protocol for programmatic exchange of chemical representations. Representing chemical substances in the form of distinct chemical structures is fundamental to communicating chemical information. Validation of chemical structure description is an essential requirement for the re-usability of FAIR chemical data. The outcome of this work includes a specification that articulates a shared data model for chemical information exchange through an API that can be implemented by any system that manages chemical structure records. This deliverable outlines a conceptual framework and provides a demo prototype to engage community input and adoption.
This deliverable aims to describe criteria for web-based services that participating organisations can implement based on their existing and/or preferred technologies (e.g., toolkits, programming languages). The services are intended to confirm chemical identity and provide real time feedback on the machine-readability of chemical data and metadata representations based on IUPAC standard rule sets and community best practices. The goal is to support a range of stakeholders engaging in chemical data exchange online, including providers of chemical databases, curators of chemical repositories, chemistry application developers, chemical toolkit developers, and researchers sharing, searching and analysing chemical information programmatically. The initial specification focuses on resolving chemical entities and validating chemical structure representations.
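As a rough illustration of what "real time feedback on machine-readability" could look like, the sketch below performs a purely syntactic check on two common chemical identifiers and returns a small feedback record. It is not the protocol specified in the deliverable: the function name `check_identifier` and the shape of the returned dictionary are hypothetical, and the check relies only on the publicly documented layout of an InChIKey (14, 10, and 1 uppercase letters separated by hyphens) and the `InChI=1S/` prefix of a standard InChI. It does not confirm chemical identity or validate the structure itself.

```python
import re

# An InChIKey has the layout AAAAAAAAAAAAAA-BBBBBBBBBB-C (uppercase letters only).
INCHIKEY_PATTERN = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


def check_identifier(value):
    """Return a feedback record for a candidate identifier (hypothetical format)."""
    value = value.strip()
    if value.startswith("InChI="):
        kind = "inchi"
        # Standard InChI strings begin with the version prefix "InChI=1S/".
        well_formed = value.startswith("InChI=1S/") and len(value) > len("InChI=1S/")
    else:
        kind = "inchikey"
        well_formed = bool(INCHIKEY_PATTERN.match(value))
    message = "syntactically well formed" if well_formed else "does not match expected layout"
    return {"input": value, "type": kind, "well_formed": well_formed, "message": message}


# Water, as a standard InChI and as an InChIKey, followed by a malformed input.
for candidate in ["InChI=1S/H2O/h1H2", "XLYOFNOQVPJJNP-UHFFFAOYSA-N", "not-a-key"]:
    print(check_identifier(candidate))
```

A full service of the kind described would go further, for example by parsing the structure with a cheminformatics toolkit and applying IUPAC rule sets; the syntactic check above is only the cheapest first step.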
The overarching goal of the WorldFAIR Chemistry Work Package (WP03) is to support the use of chemical data standards in research and curation workflows, between and across disciplines. This will enable downstream data reuse through provision of practical direction and resources. Other deliverables developed under the WorldFAIR Chemistry (WP03) case study further demonstrate and facilitate the use of chemistry data standards, including a framework that can be used by policymakers and developers of services and tools to support FAIR reporting of chemical data (D3.1 Digital guidance), and a digital ‘cookbook’ of interactive recipes demonstrating how to handle digital chemical data (D3.2 Training package).
WP03 activities are coordinated through IUPAC, the world authority on chemical nomenclature and terminology that constitute a common global language for communicating chemistry. In the context of the formal IUPAC process for reviewing and adopting consensus standards in chemistry, this work should be regarded as provisional guidance. Complete review and adoption of standards through IUPAC to reach the status of “Recommendation,” which has a specific meaning in the IUPAC lexicon, will occur after WP03 is complete.
This work was supported (in part) by the U.S. National Center for Biotechnology Information of the National Library of Medicine (NLM), U.S. National Institutes of Health.
This manuscript has been reviewed by the Center for Computational Toxicology and Exposure, United States Environmental Protection Agency and approved for publication. Approval does not signify that the contents necessarily reflect the views and policies of the Agency nor does mention of trade names or commercial products constitute endorsement or recommendation for use. The authors declare no conflict of interest.
Nanomaterials Domain-Specific FAIRification Mapping (Deliverable 4.1)
Authors: Lynch, Iseult; Afantitis, Antreas; Exner, Thomas; Papadiamantis, Anastasios
WP04 of WorldFAIR aims to increase the FAIRness (Findability, Accessibility, Interoperability and Reusability) of nanomaterials datasets and computational models. The initial focus is on toxicity and safety-related datasets as this is where the bulk of the effort has been to date. We note that nanomaterials are a very broad category of materials, combining chemicals and materials features, and overlapping strongly with the emerging domain of advanced materials. Nanosafety is a very broad domain, covering exposure, toxicity and risk assessment and requires extensive characterisation of the pristine (as-produced) nanomaterials and their physical, chemical, biological and macromolecular transformations within the various environments in which they are present. Thus, the positioning of nanomaterials between Chemistry (WP03) and Geochemistry (WP05) is intentional, as there are strong overlaps of concepts and approaches with both, and solutions applicable to one of these domains are likely to be applicable to others also, leading to the potential for mutual learning and accelerated implementation of the FAIR concepts across these domains.
This specific deliverable lays out the domain and its communities, and the various projects and contributors active in the FAIR-nanomaterials and nanosafety domain. It then presents our initial FAIR Implementation Profile (FIP) which describes the current state of the field (an ‘As-Is’ FIP) and discusses the domain-specific challenges relating to nanomaterials and its FAIR landscape. The deliverable then lays out the developments needed to reach the ‘To-Be’ FIP, as the optimal approach to make nanomaterials and nanosafety data FAIR, based on current best practice across the FAIR community. We note that the As-Is FIP will be updated at the end of the WorldFAIR project, to capture: the rapid development in the field; our own efforts to enhance the number of domain-specific FAIR-enabling Resources (FERs) and FAIR-supporting resources (FSRs); FIPs underway in the aforementioned WorldFAIR Chemistry (WP03) and Geochemistry (WP05) case studies, as well as the efforts underway in the Partnership for Assessment of the Risks of Chemicals (PARC), which has a very strong FAIR data focus. Subsequent activities in WP04 will implement a case study to foster development and piloting of interoperability standards, and guidelines for increasing FAIRness in the interlinked scientific disciplines of chemical toxicity, nanomaterials toxicity and materials modelling.
The FAIR mapping represents a critical step towards identifying both the domain-specific features and the general features needed to maximise nanosafety data and model FAIRness, highlighting areas for further development and standardisation especially in the domain-specific aspects such as metadata standards and ontologies. Existing ontologies have major gaps in their semantic coverage, and project-specific terminology is not fully integrated/accessible via ontology look-up services. Building on the mapping and ‘As-Is’ FIP for nanosafety, WP04 will develop an index (registry) and workflows for FAIRification of nanoinformatics tools and models (D4.2), and recommendations for nanomaterials-specific human and machine-readable provenance and persistence policies (D4.3).
This nanomaterials FIP, and its subsequent iteration, form a critical part of the overarching WorldFAIR cross-domain mapping and indeed have already been integrated into D2.1 and D11.1.
WorldFAIR FAIRification of nanoinformatics tools and models recommendations (Deliverable 4.2)
Authors: Varsou, Dimitra-Danai; Kolokathis, Panagiotis; Tsoumanis, Andreas; Melagraki, Georgia; Exner, Thomas; Papadiamantis, Anastasios
Nanomaterials, with properties of both chemicals and particles, offer exciting opportunities in a range of industrial and consumer applications, from sensing and diagnostics to precision medicine and agriculture. Paradoxically, the properties that make them advantageous for applications, including their small size and large surface area, are also the source of concerns regarding potential negative impacts arising from their uptake by, and interactions with, humans and the environment. Given the enormous diversity of nanomaterials compositions, it is not possible to individually test them using the current time, cost and animal-intensive regulatory testing approaches, driving an urgent need for alternative in silico approaches to predict nanomaterials safety (nanoinformatics).
The nanomaterials safety community have been actively developing a range of modelling approaches, spanning from physics-based models to data-driven approaches including machine learning models. As these models utilise and generate extensive datasets, there is a requirement for good practice in data documentation to support model development. Additionally, the models and associated software need to be FAIR (findable, accessible, interoperable and re-usable). While there is much in common with the FAIR needs for software in chemoinformatics, there are some unique aspects to nanomaterials software (nanoinformatics) that require domain-specific tailoring.
This WorldFAIR Deliverable report, which is targeted towards nanoinformatics model developers, presents a set of recommendations and prototypes for FAIRification of nanoinformatics tools and models. The deliverable is a stand-alone document focused primarily on FAIRification of nanoinformatics tools and software, while also addressing FAIRification of the underpinning (and resulting) datasets. It also emphasises organising the datasets into ready-for-modelling formats, for example via NanoPharos, and using KNIME nodes to integrate the datasets directly into the modelling software and to return the resulting predictions and validation statistics to the database for further re-use.
This report provides an analysis of the direction FAIRification of nanoinformatics software could/should take. The report provides examples of the approaches and best practice that have emerged from Horizon 2020-funded (H2020) nanosafety-specific projects including NanoCommons, NanoSolveIT, RiskGONE, CompSafeNano to support FAIR software. Approaches and best practice examples include the documentation of models and software via existing and emerging metadata standards, establishment of a registry of nanoinformatics models, deployment of predictive models as web applications or application programming interfaces and a demonstration of model interoperability and enhanced re-usability via containerisation and deployment via a cloud platform. Recommendations for next steps are provided to drive progress.
Nanomaterials Human / machine-readable provenance and persistence policies (Deliverable 4.3)
The WorldFAIR Nanomaterials case study (WP04) has addressed several important topics including undertaking a mapping of the FAIR landscape for nanomaterials (D4.1) and development of best practice for FAIR nanoinformatics models (D4.2). D4.3, presented here, complements and extends the previous two deliverables focussing in particular on Provenance and Persistence of nanomaterials data, considering both human and machine actionable aspects. D4.3 addresses, in particular, FAIR principle R1.2 whereby (Meta)data are associated with detailed provenance, which is essential to enable re-use of data as it provides assurances to potential re-users as to where the data came from, how it was generated and for what purpose, and FAIR principle A2: Metadata should be accessible even when the data is no longer available, which is an essential aspect of ensuring provenance and persistence of data and its associated metadata.
The deliverable builds on work and best practice from:
- FAIR experts and FAIR-focussed projects (e.g., FAIRsFAIR) on the role and importance of persistent identifiers, unique identifiers and resolvable identifiers (collectively persistent identifiers (PIDs), universal unique identifiers (UUIDs) or globally unique, persistent and resolvable identifiers (GUPRIs)) to support data provenance including a landscape mapping of the types of PIDs;
- research performed in the nanomaterials domain as a pilot project around persistent identifiers for nanomaterials themselves, bearing in mind their complexity (as both chemicals and particles) and dynamic nature whereby many of their properties are extrinsic and context-dependent;
- intensive discussions with other case studies from WorldFAIR, mainly Chemistry, Geochemistry, Biodiversity, Agricultural Biodiversity and Cultural Heritage; these were facilitated in part through the WorldFAIR week-long hackathon from 1-6 October 2023 focussed on the WorldFAIR Cross-Domain Interoperability Framework (CDIF) to consider how our nanomaterials domain-specific solutions (such as the InstanceMap tool) can be extended to cover other domains, or mapped to the approaches used in other domains (such as biodiversity). This Dagstuhl Workshop created a unique opportunity for the case studies to present and discuss their individual approaches to data provenance related to events (e.g., sampling events, measurement events, data creation events), samples and occurrences, leading to a convergence in approach around the provenance ontology (PROV-O), the Global Biodiversity Information Facility (GBIF) New Data model, and the process for organising events and samples. In the nanomaterials context, we have also mapped an existing provenance ontology (PROV-O) and the GBIF approach to the community-developed tools for capturing the evolution of nanomaterials along their lifecycle and in products, formalised via the Instance Map Tool (Exner et al., 2024).
While it has long been known that nanomaterials are very dynamic, this evolution of the materials during storage and sampling has not been systematically incorporated into materials provenance documentation which makes comparison of data challenging, as materials may have been handled differently and thus evolved differently, leading to different outcomes in terms of their toxicity.
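The point about materials evolving through handling can be illustrated with a small provenance chain. The sketch below is not the Instance Map Tool and does not reproduce any mapping from the deliverable; it assumes only the PROV-O terms `prov:Entity`, `prov:wasDerivedFrom` and `prov:wasGeneratedBy`, and every identifier, class name and function name in it is hypothetical. Each handling step produces a new material instance that points back to the instance it was derived from, so two datasets can be compared by the lineage of the material they measured.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MaterialInstance:
    """One state of a nanomaterial at a point in its lifecycle (illustrative)."""
    identifier: str
    description: str
    derived_from: Optional["MaterialInstance"] = None
    generated_by: Optional[str] = None  # identifier of the handling activity


def lineage(instance):
    """Walk back through derived_from links, newest instance first."""
    chain = []
    node = instance
    while node is not None:
        chain.append(node.identifier)
        node = node.derived_from
    return chain


def to_prov(instance):
    """Express one instance as a PROV-O-style dictionary."""
    record = {"@id": instance.identifier, "@type": "prov:Entity",
              "rdfs:label": instance.description}
    if instance.derived_from is not None:
        record["prov:wasDerivedFrom"] = {"@id": instance.derived_from.identifier}
    if instance.generated_by is not None:
        record["prov:wasGeneratedBy"] = {"@id": instance.generated_by}
    return record


# Hypothetical identifiers for three states of the same material.
pristine = MaterialInstance("urn:example:nm-001", "pristine (as-produced) batch",
                            generated_by="urn:example:activity/synthesis")
stored = MaterialInstance("urn:example:nm-001-stored", "after six months of storage",
                          derived_from=pristine,
                          generated_by="urn:example:activity/storage")
dispersed = MaterialInstance("urn:example:nm-001-dispersed", "dispersed in test medium",
                             derived_from=stored,
                             generated_by="urn:example:activity/dispersion")

print(lineage(dispersed))
print(to_prov(dispersed))
```

Recording the handling activity on every instance is what lets a re-user see that two toxicity results for nominally the same material were obtained from instances with different histories.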
The guidance and policy developed and presented here in WorldFAIR D4.3 will support increased implementation of best practice around complete documentation of provenance information about nanomaterials, their samples and the data arising from their production, use, testing, etc. These recommendations are being taken up by current nanosafety projects MACRAME, PINK, CHIASMA and INSIGHT, among others, and will be further documented as a workflow for creation of FAIR digital objects from nanomaterials datasets.
WorldFAIR Nanomaterials prototype (Milestone)
This Milestone (MS11) outlines a prototype implementation of support tools to allow FAIR data generation throughout various stages of the data production and management workflow in nanomaterials science using the FAIR maturity indicators as guiding principles.
WorldFAIR (MS14) Nanomaterials FAIRification demonstration
Guidelines on how to exploit cloud solutions including the use of containerisation as a means to enhance sustainability, interoperability and re-usability of nanoinformatics models were developed in the WorldFAIR WP04 – Nanomaterials case study and presented in Deliverable Report D4.2 – FAIRification of nanoinformatics tools and models recommendations. In this milestone report (MS14), we provide a worked example of how containerisation and deployment to cloud platforms are achieved in practice for nanoinformatics models.
Formalisation of OneGeochemistry (Deliverable 5.1)
Report on the formalisation of the OneGeochemistry CODATA Working Group.
Project Deliverable D5.1 for EC WIDERA-funded project “WorldFAIR: Global cooperation on FAIR data policy and practice”.
The WorldFAIR Geochemistry Work Package Deliverable 5.1 sets out to formalise the OneGeochemistry Initiative. With the exponential growth of data volumes and production, better coordination and collaboration is needed within the Earth and Planetary Science community producing geochemical data. The mission of OneGeochemistry is to address this need and in order to do so effectively the OneGeochemistry Interim Board has applied to become the OneGeochemistry CODATA Working Group. This application has been approved by the CODATA Executive Committee. The OneGeochemistry CODATA Working Group will be led by a chair and co-chair and will form expert advisory groups where required. Becoming a CODATA Working Group gives the OneGeochemistry Initiative credibility and authority to successfully pursue a long-term governance structure and accomplish the other WorldFAIR deliverables of WP05 (Geochemistry).
Accomplishing an outline of the methodology used to populate and update FAIR Implementation Profiles and to promulgate knowledge of them, as well as creating a set of guidelines for laboratories and repositories on how to use FAIR Implementation Profiles and common variables to QA/QC data, will enable FAIRer (Wilkinson et al., 2016) geochemical data, which will in turn make interdisciplinary use easier.
Geochemical data has direct application to six of the seventeen UN Sustainable Development Goals (SDG#6 (Clean Water and Sanitation); SDG#7 (Affordable and Clean Energy); SDG#8 (Decent Work and Economic Growth); SDG#9 (Industry, Innovation and Infrastructure); SDG#13 (Climate Action); SDG#15 (Life on Land)), and FAIR geochemical data will accelerate the generation of new geoscientific knowledge and discoveries. Within the greater framework of the WorldFAIR project, this deliverable has come together in collaboration with CODATA (WP01 and WP02) and the International Union of Pure and Applied Chemistry (IUPAC, WP03).
Geochemistry Scientific Content Component (Milestone)
Prent, Alexander; Wyborn, Lesley; Farrington, Rebecca; Lehnert, Kerstin; Klöcking, Marthe; Elger, Kirsten; Hezel, Dominik; ter Maat, Geertje; Profeta, Lucia
WorldFAIR Milestone 6, reported here, specifies work done and being undertaken for Deliverable 5.2 (due month 20), ‘Geochemistry Methodology and Outreach’, which has the following description: “This deliverable will outline the methodology used to develop and update FIPs and promulgate knowledge of them, including publishers to ensure the quality, interoperability and reusability of data in publications”.
As geochemical data is collected on a diversity of natural and synthetic samples (rocks, sediments, minerals, fossils, meteorites, cosmic dust, fluids, gases, etc), from the Earth or other planetary bodies, there is an incredible range of analytical instruments used and hundreds of analytical techniques applied. This results in a community with many subdisciplines that produce typically ‘long tail’ data – data that are highly specific and small in volume. The community and the data produced are heterogeneous and overlaps of common minimum variables are scarce.
We conclude that developing a single FAIR Implementation Profile (FIP) for all geochemical data will not be possible; rather, there will need to be multiple linked FIPs for geochemistry subdisciplines and at multiple levels of granularity. As a FIP is underpinned by FAIR Enabling Resources (FERs), many such FERs need to be publicly available or need to be published. By specifying the FER(s) that accompany each FAIR Principle within the individual FIP, users of any geochemical dataset or database will have accurate documentation for each FAIR Principle, which in turn enhances machine readability.
This Milestone describes progress towards developing a methodology designed to assist in defining the individual FERs required to fully describe the minimum scientific and technical variables used to describe any geochemical analysis. These FERs will enable the generation of multiple FIPs, facilitating published results to be reproduced and shared globally with sufficient metadata to make any geochemical resource FAIR for both humans and machines.
This Milestone report then discusses how the components of this methodology are being executed in the community, discusses resulting progress towards minimum common variables of samples, discusses how to make best practices for geochemical methods available online and specifies a set of vocabularies published to describe methodologies.
Geochemistry Methodology and Outreach (Deliverable 5.2)
Together with the earlier WorldFAIR Milestone 6, this D5.2 report focuses on advocating the utility and significance of FAIR Implementation Profiles (FIPs) for the geochemistry community, culminating in a set of policy and organisational recommendations. The primary goal of this report is to foster alignment across the complex and heterogeneous geochemistry community in producing and integrating FAIR data for the huge diversity of sample types and target analytes of this community, each often having numerous analytical methods. This document presents various ways in which the community can increase FAIRness through the publication of FERs for different levels of data granularity and FAIR community size and complexity (Figure 2). Additionally, it suggests that barriers to interoperability of data between methodologies can be overcome through data abstraction (Box 1).
Following the FIP methodology, this D5.2 report makes reference to the fifteen FAIR Principles, divided into scientific and technical components. Scientific component implementations, and related community engagement, are to be based on best practice publications that outline data reporting and methodology descriptions from within specific geochemistry sub-disciplines. Parts of these publications, including tables and images in PDF or document formats, could be converted into machine-actionable FAIR Enabling Resources (FERs), and be part of a generic FIP for geochemistry. Technical components need to be generated, reviewed and assessed by geochemistry data infrastructure and repository technical staff, along with the development of additionally needed FERs in consultation with other FAIR data management expert groups (e.g., CODATA-DDI Alliance activity, the DDI-CDI group, the RDA Vocabulary Services Interest Group, IUPAC, etc.) and the “Ten Simple Rules for making a vocabulary FAIR” (Cox et al., 2021).
This report is the result of interactions with the geochemistry community through the OneGeochemistry Initiative, its board members, research infrastructure experts, analytical facilities and international leaders in geochemistry data management systems (EarthChem, DIGIS-GEOROC, AGN–AusGeochem, GFZ Data Services, NFDI4Earth, and EPOS MSL Laboratories).
Guidelines for implementing Geochemistry FIPs (Deliverable 5.3)
As a long-tail scientific discipline with highly specific and heterogeneous analytical methods, the geochemistry community faces challenges in achieving FAIR data compliance. While many repositories satisfy the Findable and Accessible principles of FAIR, increased modernisation of existing standards and development of additional data standards are required to achieve Interoperability and Reusability of data.
This third deliverable of the WorldFAIR Geochemistry Work Package (WP05) aims to guide the geochemistry data infrastructure community towards convergence by identifying FAIR Enabling Resources (FERs) that are currently being used by the geosciences community. Promulgation of used resources and their uptake by other infrastructure providers is part of the push towards convergence. The WorldFAIR Geochemistry Work Package proposes creating a reference FIP or catalogue of FERs to promote interoperability and prevent duplication of efforts. The geochemistry reference FIP is designed as a living document, allowing continual updates by the community. It serves as a tool for laboratories, repositories, and infrastructure providers to enhance data FAIRness. Through the provision of a reference FAIR Implementation Profile (FIP) or catalogue of FERs as part of this report, data providers and producers are provided with a tool to help select FERs when building or updating their infrastructure to become more FAIR.
Together with the second deliverable (D5.2) of the WorldFAIR Geochemistry Work Package which outlined the usefulness and importance of FIPs, this report and the associated reference FIP can be used by the geochemistry community – particularly by data creators and providers – to improve their FAIRness. We recommend that new and emerging geochemistry data producers and providers consult the geochemistry reference FIP and ideally choose to implement existing FERs, although the selection and implementation of FERs should align with the principles and community needs that the specific data system serves.
The goal is to facilitate the implementation of commonly-used FERs, and so improve data FAIRness, with a resource that fosters interoperability, accelerates convergence on data standards, and ultimately enhances the accessibility and reusability of geochemical data. This report and the reference FIP aim to encourage the reuse of available resources, prevent duplication, and enhance convergence on data standards within the geochemistry community. Community collaboration, the continuous evolution of the living reference FIP document to support FAIR compliance, and convergence towards standardisation are all needed to continue improving FAIRness in the geochemistry data community.
Cross-national social sciences survey FAIR implementation case studies (Deliverable 6.1)
McEachern, Steven; Orten, Hilde; Thome Petersen, Hanna; Perry, Ryan
This report provides an overview of the data harmonisation practices of comparative (cross-national) social surveys, through case studies of: (1) the European Social Survey (ESS) and (2) a satellite study, the Australian Social Survey International – European Social Survey (AUSSI-ESS). To do this, we compare and contrast the practices between the Australian Data Archive and Sikt.no, the organisations responsible for the data management of ESS and AUSSI-ESS.
The case studies consider the current data management and harmonisation practices of study partners in the ESS, including an analysis of how current practices align with FAIR data standards, particularly leveraging FAIR Implementation Profiles (FIPs) and FAIR Enabling Resources (FERs).
The comparative analysis of the two case studies considers key similarities and differences in the management of the two data collections. Core differences in the use of standards and accessible, persistent registry services are highlighted, as these impact on the potential for shared, integrated reuse of services and content between the two partner organisations.
The report concludes with a set of recommended practices for improved management and automation of ESS data going forward, setting the stage for Phase 2 of WorldFAIR Work Package 6, and outlines the proposed means for implementing this management in the two partner organisations. In order to advance implementation of the FAIR principles, and to improve interoperability and reusability of digital data in social sciences research, these recommendations focus on three areas of shared interest:
- Aligning standards
- Establishing common tools
- Establishing and using registries
Cross-national Social Sciences survey best practice guidelines (Deliverable 6.2)
McEachern, Steven; Orten, Hilde; Perry, Ryan; Strand, Kristina
The Social Surveys Work Package (WP06) of the WorldFAIR project is focussed on the improvement of FAIR practices in the management of harmonised content in cross-national social surveys. The first report from the Work Package (Deliverable 6.1) provided an overview of the practices of comparative (cross-national) social surveys, through case studies of: (1) the European Social Survey (ESS) and (2) a satellite study, the Australian Social Survey International – European Social Survey (AUSSI-ESS). The focus of this Deliverable 6.2 is oriented towards progress on three recommendations (Rs) from that first report – the use of the DDI Lifecycle and variable cascade (R6.1 and R6.2), and requirements for formal registries of variables and reusable content (R6.5). To achieve this, this paper explores the development of recommended practices for the management and processing of cross-national survey data for the establishment of harmonised social science datasets.
In this deliverable, we outline a proposed workflow for the processing of data harmonisation of social surveys, that takes account of the practical steps required to bring diverse content together in a machine-actionable way, and that could best take advantage of external registered, persistent content. This workflow considers the core steps involved in the harmonisation process, key issues that occur in the processing of data during this process, and potential resolutions of these issues. These resolutions are all oriented towards improving FAIR practices in the harmonisation process – through the use of reusable, accessible metadata structures that can both improve processing consistency for current projects, and be applied to future harmonisation projects.
The key conclusions are two-fold. Firstly, there is a key need for the application of standardised workflows to enable consistent interaction with registry content held across multiple data repositories. The proposed workflow detailed in this report is a first effort at such a workflow model. Secondly, there is a need for consistent pre-processing of data and metadata within repositories to reduce error handling in the harmonisation process. The final section of the report provides an initial set of processing rules that could be used in such circumstances. The final third phase of this Work Package will then focus on testing this workflow and rules on new waves of data coming from the AUSSI and ESS projects.
WorldFAIR Pilot Testing Harmonisation Workflows (Deliverable 6.3)
McEachern Steven, Orten Hilde, Perry Ryan, Strand Kristina
The Social Surveys Work Package (WP06) of the WorldFAIR project is focussed on the improvement of FAIR practices in the management of harmonised content in cross-national social surveys. The first report from the Work Package (Deliverable 6.1) provided an overview of the practices of comparative (cross-national) social surveys, through case studies of: (1) the European Social Survey (ESS) and (2) a satellite study, the Australian Social Survey International – European Social Survey (AUSSI-ESS). The focus of Deliverable 6.2 was on three recommendations (Rs) from that first report – the use of the DDI Lifecycle and variable cascade (R6.1 and R6.2), and requirements for formal registries of variables and reusable content (R6.5). In Deliverable 6.2, we outlined a proposed workflow for the processing of data harmonisation of social surveys that takes account of the practical steps required to bring diverse content together in a machine-actionable way, and that could best take advantage of external registered, persistent content. This workflow considers the core steps involved in the harmonisation process, key issues that occur in the processing of data during this process, and potential resolutions of these issues.
This third report picks up from the previous two to test proof-of-concept implementations of the workflows outlined in the second WP report, and to trial the use of standardised workflows based on registry services available at the Australian Data Archive (ADA) and Sikt through their respective Colectica registries. The workflow steps are piloted with another comparative social survey, the International Social Survey Program (ISSP), to evaluate the Cross-Cultural Survey Harmonisation (CCSH) workflow as a suitable process for machine-to-machine based survey harmonisation.
The pilot demonstrated that the CCSH workflow established in Deliverable 6.2 and piloted here in Phase 3 of Work Package 6 appears to be a viable method for standardising and progressively automating the process of survey data harmonisation. The six-step workflow, based on well-established procedures in cross-cultural survey data, has been shown to be a suitable means of engaging with both human and machine-mediated processes. The workflow was developed from the Sikt and ADA processes for managing the harmonisation of ESS and AUSSI-ESS, and was then piloted with the similarly structured ISSP survey data. This suggests that the workflow itself may be well suited to projects of this type.
Having said this, the workflow pilot also shows that there is still a significant degree of human manual input required. To this end, the following recommendations are proposed coming out of this Phase 3 work:
- Establishment of standardised access controls both to data and metadata registries, to limit the need for less technical users to navigate access control systems.
- Establishment of a code repository for interaction with social science metadata repositories.
- Establishment of mechanisms for reuse of conceptual variable and other reference metadata across the DDI standards ecosystem. (It was not clear, for example, how to use or reference a conceptual variable in the Sikt ESS metadata registry within the ADA harmonisation tool.)
- Standardised practices and code libraries for the creation of DDI resource packages for external reuse (to facilitate the reuse described in the third recommendation above).
Population Health Data Implementation Guide (Deliverable 7.1)
Gregory, Arofan; Todd, Jim; Amadi, David; Greenfield, Jay; Muyingo, Sylvia; Tomlin, Keith
One of the key requirements for FAIR data reuse is that the user of a FAIR data resource understands the exact nature of the data. The FAIR principles talk about the kinds of metadata needed to describe data, but it is necessary for implementers to understand how these metadata can be provided, to effectively realise FAIR within their systems. This implementation guide describes the way all aspects of the data are made available for use, both within and from outside the INSPIRE Network community, using standard metadata to describe the data. This is an exploration of how generic standards can be used to express the agreed community metadata set. The INSPIRE platform supports network studies using population health data to stand up their own instances of a common data model called the OMOP CDM. The WorldFAIR project is an exploration to facilitate a better understanding of what is needed for data infrastructures to provide data in line with the FAIR principles within and across domains.
The types of metadata used in INSPIRE are aligned as much as possible with existing and popular models common in the public health domain. Primary among these are the standards (and tools) coming from OHDSI (Observational Health Data Sciences and Informatics), notably their OMOP Common Data Model (CDM). This suite of products addresses the definition of specific concepts and their semantics, standard (primarily medical) classifications, and the mechanism for selecting data from among those available to produce a specific cohort for analysis. These standards are common within the public health domain internationally, and INSPIRE has chosen to use them to reduce the significant cost of developing tools for many aspects of data and metadata management and use.
FAIR demands that we provide data in a useful way to those who may not be familiar with the community tools and standards used by INSPIRE. More generic standards are thus needed to support this broader community. It is significant that members of the OHDSI community have already looked at how Schema.org – developed and supported by many popular search engines, Google foremost among them – can be used in combination with the OHDSI OMOP CDM to describe data resources. Here, INSPIRE builds on that work to describe how INSPIRE data resources, specifically, can be documented in a way which will be maximally accessible to users both within the community and external to it.
One critical part of the overall information set provided by standard FAIR metadata is a description of the experiment for which the data was used, and the protocol employed in the selection and analysis of the data. This aspect of the metadata description is a major focus of the implementation guide, and one for which Schema.org would seem to be well-suited.
WorldFAIR WP (Work Package) 07 is one of eleven domain-specific case studies being undertaken by the WorldFAIR project, with the domain-specific practices being analysed across these domains in WP02. Early indications from WP02 suggest that Schema.org is one of the standards which will be recommended as part of the Cross-Domain Interoperability Framework (CDIF). This implementation guide contributes to an understanding of exactly how Schema.org fits into the description of domain data.
While some open questions remain, the implementation guide has achieved its primary goal of showing how standards such as Schema.org can be used within the public health domain to provide a complete set of the information needed for FAIR data use across and within domain boundaries.
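As an illustration of the kind of description discussed above, the following is a minimal sketch of a Schema.org `Dataset` record serialised as JSON-LD. It is not the INSPIRE metadata profile and does not reproduce the OHDSI community's work; the dataset name, URL, variables and the helper function `dataset_jsonld` are all hypothetical. Only well-known Schema.org terms are used (`Dataset`, `name`, `description`, `url`, `keywords`, `measurementTechnique`, `variableMeasured`, `PropertyValue`).

```python
import json


def dataset_jsonld(name, description, url, variables, keywords, technique):
    """Build a Schema.org Dataset description as a JSON-LD dictionary."""
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "keywords": keywords,
        # Free-text pointer to the protocol or method behind the data.
        "measurementTechnique": technique,
        # One PropertyValue per variable so that search tools can index them.
        "variableMeasured": [
            {"@type": "PropertyValue", "name": variable} for variable in variables
        ],
    }


# Hypothetical example values, for illustration only.
EXAMPLE = dataset_jsonld(
    name="Example population health cohort",
    description="Illustrative cohort extracted from a common data model instance.",
    url="https://example.org/datasets/cohort-1",
    variables=["year_of_birth", "sex", "visit_date"],
    keywords=["population health", "cohort"],
    technique="Cohort selection protocol (hypothetical)",
)

if __name__ == "__main__":
    print(json.dumps(EXAMPLE, indent=2))
```

A record of this shape can be embedded in a dataset landing page so that generic harvesters, as well as domain tools, can discover what the resource contains.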
Population health resource library and training package (Deliverable 7.2)
Authors: Todd Jim, Tomlin Keith, Bhattacharjee Tathagata, Amadi David, Greenfield Jay, Fils Doug, Mailosi Dorothy, Kanjala Chifundo, Molloy Laura
This project, WorldFAIR – Global Cooperation on FAIR Data Policy and Practice, is funded by the European Commission’s WIDERA coordination and support programme under the Grant Agreement no. 101058393. The project consists of 14 work packages, of which work package 7 (WP07) focusses on Population Health. WP07 is led by London School of Hygiene and Tropical Medicine working under the INSPIRE network. The work builds on the delivery of the Observational Medical Outcomes Partnership (OMOP) common data model (CDM) which includes funding by Wellcome (formerly Wellcome Trust) and IDRC Canada.
The objective of WP07 is to develop a suite of methods and standards to provide the framework for the Go-FAIR principles for population health data. These standards form the basis of an AI-Ready description of data suitable for use by population health scientists, and understandable across domain and institutional boundaries. The first deliverable (D7.1) identified the Implementation Guide that could be used for population health data, and how it can be developed. This deliverable (D7.2) provides a step-by-step guide as to how to achieve the standards. The deliverable is aimed at population health scientists in low-resource settings, who know their own data and want to make those data FAIR.
Population health uses many different tools to collect and manage data. One set of tools includes the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) and an OHDSI data analysis workbench that runs on top of it. The OMOP common data model has been used to harmonise and share data, and previous work has shown the tools needed to make OMOP data FAIR. Beyond the data themselves, the results from the analyses conducted on OMOP data can be used as indicators for the success of development goals, including the United Nations Sustainable Development Goals (SDGs). At each stage the tools, data, models and activities need to be described in a way that can be understood by other scientists and by computer search algorithms.
This deliverable provides an introduction to the processes involved in making population health data FAIR in a pipeline that spans data collection through data analysis into an SDMX indicators database, and gives seven tutorials on what is needed at each step in this pipeline. It outlines the need to describe the study and the study context, how to use DDI Codebook and DDI Lifecycle with study data and how to use repositories like GitHub to make the metadata available. The next tutorials describe the extract-transform-load (ETL) process for putting the data into an OMOP CDM and the role of JSON-LD in preparing the data for machine searching in Schema.org in line with DDI-CDI.
Together these tutorials give an overview of the steps in the OMOP processes which are a pipeline for the data, and how these steps can be performed and documented. Finally the tutorials show how predictive and causal analysis can be conducted and documented using the OMOP CDM and the OHDSI data analysis workbench and how the results can be integrated into an SDMX data cube, which would align with UN standards for SDG indicators.
The deliverable does not provide detailed training for each step, but rather introduces the topic and clarifies the practical knowledge and skills that are needed to make this type of health data more FAIR.
Note: Six of the tutorials are hosted on the WorldFAIR Vimeo channel, https://vimeo.com/user/91439529/folder/18642763, which provides functionality for playing videos. Alternatively you can download the tutorials and play in your local environment to experience full functionality. The remaining tutorial is accessible via this report.
Population Health Data Policy and practice recommendations (Deliverable 7.3)
This is the third and final deliverable from WorldFAIR WP07 on Population Health. Its primary aim is to provide a broad set of practical recommendations for health data producers in low- and middle-income countries who are seeking to make their data FAIR. This includes recommendations for: data documentation methods for those new to the process; harmonisation of data structures through use of a Common Data Model; standards for the publication of rich machine-readable metadata; methods for developing study packages to conduct federated analyses within and across domains; and the publication of FAIR Implementation Profiles.
This document is a synthesis of the discovery and work described in the two previous WP07 deliverables. As a reflection of the environment in which WP07 works, the methods proposed are open-source, freely available and frequently used. Some of them, such as the OHDSI/OMOP Common Data Model and the schema.org metadata standard, have grown in popularity in high-income settings and are underpinned by active user networks and freely available teaching resources. Their promotion here supports the argument for a coherent global standardisation of FAIR data methods, and this consistency promotes capacity building of the required data skills in lower-income settings that are integral to the adoption and success of FAIR data aspirations world-wide.
As well as reinforcing compatibility within the health data domain, WP07 has also chosen standards and methodologies that are compatible with other domains that comprise the WorldFAIR project. This approach is expanded in detail within WorldFAIR’s WP02 Cross-Domain Interoperability Framework (CDIF) which recognises that true interoperability of data – the ability to combine them in a scientifically rigorous way within and across domains – can dramatically enhance their value.
The intersections between population health, climate change and humanitarian crises are already acknowledged, and within WorldFAIR itself WP07 has clear links with urban health (WP08), social surveys (WP06), and disaster risk reduction (WP12). Such commonality in FAIR approaches across domains lies at the heart of what WorldFAIR seeks to achieve.
These recommendations are written for all health data producers in lower- and middle-income countries, regardless of their stage on the FAIR journey. Some may have little or no experience of data documentation; others may be ready to adopt a Common Data Model or publish machine-readable metadata; whilst others may be ready to conduct federated analyses. This document suggests recommendations for all, and these recommendations are consistent with those that would be made in higher-income settings.
The closing part of this document explores the burgeoning field of FAIR metrics – tools for measuring the extent to which FAIR has been achieved. This document does not add to this body of work because it takes the view that aiming for the FAIR principles by a data producer is such an individual and iterative process that following these recommendations, rather than applying an externally-generated metric, is more likely to achieve the goal of making health data FAIR.
Urban Health Data – Guidelines And Recommendations (Deliverable 8.1)
Ortigoza, Ana
This report provides a summary of actions and findings of the Urban Health Mapping and Assessment (Task 8.1) for WorldFAIR Work Package 08. Firstly, we assessed the implementation of FAIR principles within the Urban Health field, through two case studies: 1) the elaboration of a web data platform for the SALURBAL project (Urban Health in Latin America); and 2) the elaboration of a FAIR Implementation Profile (FIP) for the Urban Health discipline in general. Secondly, we focused on the data collection and harmonisation process of health survey data that was led by the SALURBAL team and allowed the elaboration of consensus on terminologies and procedures that facilitates the use of survey health data in cities for research and action.
The FAIR Implementation Profile for the SALURBAL case study contributed to the renovation process that the data system was undergoing, offering valuable guidance on good practices currently possible for making data FAIR. The elaboration and documentation of standard procedures used in data and metadata identification for the SALURBAL web platform not only contributed to their findability (‘F’) but also the access and reuse of the data (‘A’, ‘R’). For the Urban Health discipline, the FAIR Implementation Profile shed light on the lack of a common repository for urban health data, and showed that most urban health data should be encoded following DDI standards. It also showed the inconsistent process of identifying data and metadata used in urban health. Lessons learned during the process support recommendations towards 1. the promotion of a deeper understanding of the FAIR principles among urban health researchers and practitioners; and 2. the systematisation of a data management plan during the design and initial steps of a research project that can guide the implementation of FAIR principles across different domains and working groups. The results of the SALURBAL (WorldFAIR WP08) FIP were used to create a web resource (A FAIR Primer) which contextualises the activity, provides information about the FAIR principles, relays some of the FIP’s findings and provides guidance for making SALURBAL data more FAIR. This resource is included here as Appendix A.
Regarding health survey data, we found challenges in the harmonisation process that may hinder the use and interoperability of data between countries, and within countries across time, such as: 1. disagreement in the definition of risk factors; 2. lack of consistency in categories or measurement units used for an indicator; 3. discrepancy in scales and questionnaires used for retrieving information about similar health behaviours or health outcomes. We leveraged the SALURBAL experience during this harmonisation process to propose a guideline for future harmonisation of health survey data, in which the main recommendations focus on the need to 1. generate consensus on definitions and measurements in health data; and 2. revisit questionnaires and scales commonly used for some health behaviours and establish commitment on common uses.
The development of these deliverables for Task 8.1 made visible the gaps and needs of FAIR implementation in the urban health research community. Consequently, we will design and develop dissemination and training materials that can support and guide researchers and practitioners.
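The second challenge, inconsistent categories and measurement units, can be illustrated with a small sketch. This is not the SALURBAL harmonisation procedure: the survey names, codings, field names and the function `harmonise_record` are hypothetical, chosen only to show how explicit mapping tables make harmonisation decisions visible and repeatable.

```python
# Hypothetical source-specific codings for one harmonised indicator.
SMOKING_MAP = {
    "survey_a": {1: "current", 2: "former", 3: "never"},
    "survey_b": {"Y": "current", "N": "never"},  # this source has no "former" category
}

# Exact conversion factor from pounds to kilograms.
LB_TO_KG = 0.45359237


def harmonise_record(source, record):
    """Map one raw survey record onto harmonised categories and units."""
    weight = float(record["weight"])
    unit = record["weight_unit"]
    if unit == "lb":
        weight_kg = weight * LB_TO_KG
    elif unit == "kg":
        weight_kg = weight
    else:
        # Fail loudly rather than silently mixing units.
        raise ValueError(f"unknown weight unit: {unit!r}")
    return {
        "source": source,
        # Codes missing from the mapping are flagged instead of guessed.
        "smoking_status": SMOKING_MAP[source].get(record["smoking"], "unknown"),
        "weight_kg": round(weight_kg, 1),
    }


if __name__ == "__main__":
    print(harmonise_record("survey_a", {"smoking": 2, "weight": 70, "weight_unit": "kg"}))
    print(harmonise_record("survey_b", {"smoking": "Y", "weight": 154, "weight_unit": "lb"}))
```

Keeping the mappings as data rather than burying them in analysis scripts also makes them easy to publish alongside the harmonised dataset.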
Urban Health Data: Learning and training workshop (Milestone)
Authors: Ortigoza Ana, Bilal Usama, Li Ran, Anderson Theresa
The purpose of WorldFAIR WP08 is to advance the FAIR and CARE principles [Wilkinson 2016; Russo Carroll 2021] in the practices of Urban Health research. To do this, the WP is developing guidelines and training in best practice. Interim guidelines were developed and presented in D8.1 “Urban Health Data – Guidelines and Recommendations”. These guidelines and other work then informed the development of training materials for the course “How to create a data management plan with CARE and FAIR: introduction to essential concepts and good practices in data management and sharing for researchers and practitioners in urban/ public health” which was delivered as a special component of the Summer Course series at the Urban Health Collaborative – Drexel University during the week of 26-30 June 2023.
The present Milestone (WorldFAIR MS 12) provides information about the course for which the training materials were developed. The materials used in the workshop are available at https://zenodo.org/records/10231138. The materials will be further developed and refined through a consultation workshop and presented in D8.2 as the final version of the training materials and the documentation guidelines.
Urban health data – learning and training (Deliverable 8.2)
WorldFAIR Deliverable 8.2 “Urban Health Data – Learning and training” aims to describe efforts from WorldFAIR WP08 to create a community of practice in FAIR and CARE principles for urban health, train this community of practice, and show improvements in our own implementation of the FAIR principles in our work. This deliverable has two key parts: 1) describing a training course we developed in June 2023 on FAIR (Findable, Accessible, Interoperable and Reusable) and CARE (Collective Benefit, Authority to control, Respect, and Ethics) principles and an activity we are conducting in May 2024 to bring together the local FAIR/CARE community in Philadelphia (and beyond); and 2) an updated FAIR implementation profile as part of the SALURBAL (Salud Urbana en América Latina – Urban Health in Latin America) project.
In the first part, we describe our inaugural course, “How to create a data management plan with CARE and FAIR”, conducted 26-30 June 2023 with ten participants. The course aimed to bridge gaps in understanding the FAIR and CARE principles. We found an increase in participants’ knowledge and emerging interests in the topics through pre- and post-course surveys. The course encountered challenges in exemplifying the FAIR and CARE principles in practical urban health settings, highlighting a need for improved materials that clearly illustrate how to operationalise these principles. Moving forward, the Urban Health Collaborative is organising a series of activities in May 2024 to continue developing a community of practice around these principles. These include roundtable discussions, webinars, and interviews with project leaders, aimed at refining training materials and fostering a culture of continued education and support in data management and stewardship. This will include a keynote lecture by Dr. Theresa Anderson that will be livestreamed to the WorldFAIR community.
Our updated FAIR Implementation Profile describes our efforts to harmonise multi-country urban health data, aiming to create a machine-actionable resource that aligns with the FAIR principles. This initiative represents a critical step in addressing the data management complexities of urban health research, offering a pragmatic approach to the harmonisation of extensive datasets across various countries and domains. We summarise the project journey through the development and implementation of a data engineering initiative. We detail the use of data engineering techniques such as Dimensional Modelling and One Big Table to manage and organise data, making it accessible and efficient for analysis. In updating WorldFAIR Deliverable 8.1 here, we connect the project’s specific challenges to the broader goal of establishing a robust, FAIR-compliant data infrastructure in urban health research. Finally, we introduce the SALURBAL portal, a user-friendly interface designed to facilitate public access to the project’s harmonised data. This portal is expected to serve as a model for similar initiatives, promoting the principles of FAIR data management and underlining the potential benefits of such approaches in urban health and beyond.
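As a rough illustration of the two techniques named above, the following sketch joins a small dimensional model (one fact table with two dimension tables) into a single denormalised "One Big Table". It is not taken from the SALURBAL pipeline: every table name, column name, and value is a hypothetical placeholder.

```python
# Hypothetical star schema: one fact table plus two dimension tables.
# All names and values below are invented for illustration only.
dim_city = {
    1: {"city_name": "City A", "country": "Country X"},
    2: {"city_name": "City B", "country": "Country Y"},
}
dim_indicator = {
    10: {"indicator": "life_expectancy", "unit": "years"},
    11: {"indicator": "pm25", "unit": "ug/m3"},
}
fact_measurement = [
    {"city_id": 1, "indicator_id": 10, "year": 2015, "value": 75.2},
    {"city_id": 1, "indicator_id": 11, "year": 2015, "value": 18.4},
    {"city_id": 2, "indicator_id": 10, "year": 2015, "value": 77.9},
]


def build_one_big_table(facts, cities, indicators):
    """Denormalise the star schema: join each fact row to its dimensions."""
    rows = []
    for fact in facts:
        row = {"year": fact["year"], "value": fact["value"]}
        row.update(cities[fact["city_id"]])             # add city attributes
        row.update(indicators[fact["indicator_id"]])    # add indicator attributes
        rows.append(row)
    return rows


one_big_table = build_one_big_table(fact_measurement, dim_city, dim_indicator)
for row in one_big_table:
    print(row)
```

The dimensional model keeps each descriptive attribute in one place, while the flattened table trades that normalisation for rows that can be analysed without further joins.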
Data Standard For Sharing Ecological And Environmental Monitoring Data Documented For Community Review (Deliverable 9.1)
Authors: Miller, Joe; Robertson, Tim; Wieczorek, John
Deliverable 9.1 for the WorldFAIR Project’s Biodiversity Work Package (WP09). Biodiversity standards are essential for FAIR data, in particular for interoperability. Current standards need to be improved with new data models to better reflect the complexity of biodiversity and serve the information needed to address biodiversity loss and climate change.
This Deliverable D9.1, focused on Task 9.1, describes the FAIR data model being developed in WorldFAIR WP09 with the Global Biodiversity Information Facility (GBIF) leading a community collaboration.
Facilitated by the WorldFAIR project, GBIF’s engagement with the biodiversity community has led to this Deliverable – a new draft core Unified Model. The model was developed in collaboration with the Biodiversity Information Standards Group (TDWG) and through community consultation via webinars, open drafting of documents and solicitation of test datasets from the various stakeholders. A review of comparable standards has led to the development of a draft core model framework that is known to the community which should make adoption easier.
The new model is centred around the ‘Event’ – something happened at some place during some period of time, optimally described by a protocol. This conclusion is based on research which describes how successful models are expressed and the flexibility of the Event to accommodate many types of data. The Unified Model is applicable to all currently-used data types and potential new data to be shared with GBIF. The current community engagement approach (more on which in D9.2) is to test individual components of the model with engagement activities and example datasets. This will continue with new tests expanding the potential utility of the model.
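To make the Event-centred idea concrete, here is a minimal sketch of how an event and the records attached to it might be represented. It does not reproduce the GBIF Unified Model schema: the class names, field names, and sample values are assumptions chosen only to mirror the description "something happened at some place during some period of time, optimally described by a protocol".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Event:
    """Something that happened at some place during some period of time."""
    event_id: str
    location: str
    start_date: str                   # ISO 8601 date string
    end_date: str                     # ISO 8601 date string
    protocol: Optional[str] = None    # how the event was carried out, if known


@dataclass
class Occurrence:
    """Evidence of a taxon, linked back to the event that produced it."""
    occurrence_id: str
    event_id: str
    scientific_name: str


def occurrences_for_event(event: Event, occurrences: List[Occurrence]) -> List[Occurrence]:
    """Return the occurrences recorded during the given event."""
    return [o for o in occurrences if o.event_id == event.event_id]


# Hypothetical sample data.
survey = Event("ev-1", "Site A", "2023-05-01", "2023-05-02", protocol="transect walk")
records = [
    Occurrence("occ-1", "ev-1", "Apis mellifera"),
    Occurrence("occ-2", "ev-2", "Bombus terrestris"),
]
print(occurrences_for_event(survey, records))
```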
The WP tasks performed to date (collation of previous material, engagement with TDWG and Darwin Core (DwC) standard leads, webinars, use of shared documents to build use cases, building of exemplar datasets for collection management systems, and provision of several avenues for feedback) have resulted in the new provisional model currently under consultation. Testing and community engagement to date indicate that the model will better reflect the complexity of biodiversity data, leading to more efficient use of our community’s data in research and policy. The feedback we have received also indicates a steep learning curve for the future implementation of the data model. This feedback is essential for the development of publishing tools.
This work aligns with the overall objectives of WorldFAIR by focusing development on improving the interoperability of biodiversity data. Our FAIR Implementation Profile (FIP) will be enhanced by this improved functionality. This work promotes cross-domain interaction, as the Unified Model will enhance sharing of data in related Work Packages such as Agricultural Biodiversity, Oceans, and Geochemistry in the final portion of the WorldFAIR grant period. This work has been undertaken in alignment with the overall WorldFAIR goals, in particular WP02 on Engagement, Synthesis, Recommendations and FAIR Assessment.
Community consultation and finalisation of Biodiversity FAIR data impact: Final data model and training materials completed and shared (Deliverable 9.2)
Authors: Miller Joe, Robertson Tim, Wieczorek John
Biodiversity standards are essential for FAIR data, in particular for interoperability. Current standards need to be improved with new data models to better reflect the complexity of biodiversity and serve the information needed to address biodiversity loss and climate change. This Deliverable D9.2 is focussed on the results of Task 9.2: ‘Community consultation and finalisation of Biodiversity FAIR Data Impact’.
Facilitated via WorldFAIR, Global Biodiversity Information Facility (GBIF)’s engagement with the biodiversity community has led to this Deliverable – a new community-approved data standard that has progressed through a long community-led process. This Deliverable details the importance of collaboration and getting broad uptake of a new standard by the primary user community. This aligns with the overall objectives of WorldFAIR by focusing development on improving the interoperability of biodiversity data. Our FAIR Implementation Profile (FIP) will be enhanced by this improved functionality. This work promotes cross-domain interaction as the Unified Data Model will enhance sharing of data in related Work Packages such as Agricultural Biodiversity, Oceans, and Geochemistry in the final portion of the WorldFAIR grant period. This work has been undertaken in alignment with the overall WorldFAIR goals, in particular WP02 on Engagement, Synthesis, Recommendations and FAIR Assessment.
The new model, CamTrap DB, has been incorporated into the new and evolving GBIF Unified Data Model (see WorldFAIR D9.1). CamTrap DB is an important update to the previous camera trap data models. The use of camera traps to monitor biodiversity non-invasively is growing. Cameras are set in nature and take images of organisms, with the resulting images being processed into GBIF occurrences – the evidence of a species at a particular place and time. Technologies have greatly improved in recent years and a plethora of new camera traps have been deployed, alongside the advent of AI technologies to more easily and rapidly identify organisms on the camera trap images. Camera traps are set to become major tools for biodiversity monitoring. In order to meet these goals, the image data management needed improvement: data management rather than data collection is now the limiting factor.
GBIF, through WorldFAIR, helped the camera trap research community rapidly take up the updated model, facilitating the adoption of best practices and the development of new GBIF infrastructure for FAIR data publication. This work is now adopted as part of the GBIF Unified Data Model. This work is an example of a research infrastructure facilitating the development of a community standard and making sure the community’s best practices are codified and used within the research infrastructure.
Agriculture-related pollinator data standards use cases report (Deliverable 10.1)
Trekels, Maarten; Pignatari Drucker, Debora; Salim, José Augusto; Ollerton, Jeff; Poelen, Jorrit; Miranda Soares, Filipi; Rünzel, Max; Kasina, Muo; Groom, Quentin; Devoto, Mariano
Although pollination is an essential ecosystem service that sustains life on Earth, data on this vital process is largely scattered or unavailable, limiting our understanding of the current state of pollinators and hindering effective actions for their conservation and sustainable management. In addition to the well-known challenges of biodiversity data management, such as taxonomic accuracy, the recording of biotic interactions like pollination presents further difficulties in proper representation and sharing. Currently, the widely-used standard for representing biodiversity data, Darwin Core, lacks properties that allow for adequately handling biotic interaction data, and there is a need for FAIR vocabularies for properly representing plant-pollinator interactions. Given the importance of mobilising plant-pollinator interaction data also for food production and security, the Research Data Alliance Improving Global Agricultural Data Community of Practice has brought together partners from representative groups to address the challenges of advancing interoperability and mobilising plant-pollinator data for reuse. This report presents an overview of projects, good practices, tools, and examples for creating, managing and sharing data related to plant-pollinator interactions, along with a work plan for conducting pilots in the next phase of the project.
We present the main existing data indexing systems and aggregators for plant-pollinator interaction data, as well as citizen science and community-based sourcing initiatives. We also describe current challenges for taxonomic knowledge and present two data models and one semantic tool that will be explored in the next phase. In preparation for the next phase, which will provide best practices and FAIR-aligned guidelines for documenting and sharing plant-pollinator interactions based on pilot efforts with data, this Case Study comprehensively examined the methods and platforms used to create and share such data. By understanding the nature of data from various sources and authors, the alignment of the retrieved datasets with the FAIR principles was also taken into consideration. We discovered that a large amount of data on plant-pollinator interaction is made available as supplementary files of research articles in a diversity of formats and that there are opportunities for improving current practices for data mobilisation in this domain. The diversity of approaches and the absence of appropriate data vocabularies causes confusion, information loss, and the need for complex data interpretation and transformation. Our explorations and analyses provided valuable insights for structuring the next phase of the project, including the selection of the pilot use cases and the development of a ‘FAIR best practices’ guide for sharing plant-pollinator interaction data. This work primarily focuses on enhancing the interoperability of data on plant-pollinator interactions, envisioning its connection with the effort WorldFAIR is undertaking to develop a Cross-Domain Interoperability Framework.
WorldFAIR Agricultural Biodiversity Standards, Best Practices and Guidelines Recommendations (Deliverable 10.2)
Drucker, Debora et al.
The WorldFAIR Case Study on Agricultural Biodiversity (WP10) addresses the challenges of advancing interoperability and mobilising plant-pollinator interactions data for reuse. Previous efforts, reported in Deliverable 10.1 – from our discovery phase – provided an overview of projects, best practices, tools, and examples for creating, managing and sharing data related to plant-pollinator interactions, along with a work plan for conducting pilot studies. The current report presents the results from the pilot phase of the Case Study, which involved six pilot studies adopting standards and recommendations from the discovery phase. The pilots enabled the handling of concrete examples and the generation of reusable materials tailored to this domain, as well as providing better estimates for the overall costs of adoption for future projects.
Our approach for plant-pollinator data standardisation is based on the widely-used standard for representing biodiversity data, Darwin Core, developed and maintained by the Biodiversity Information Standards (TDWG), in conjunction with a data model and vocabulary proposed by the Brazilian Network of Plant-Pollinator Interactions (REBIPP). The pilot studies also underwent a process of “FAIRification” (i.e., transforming data into a format that adheres to the FAIR data principles) using the Global Biotic Interactions (GloBI, Poelen et al. 2014) platform. Additionally, we present the publishing model for Biotic Interactions developed in collaboration with the Global Biodiversity Information Facility (GBIF), which leads the WorldFAIR Case Study on Biodiversity, as part of the proposed GBIF New Data Model, along with a concrete example of its use by one of the pilots. This effort led to the development of ‘FAIR best practices’ guidelines for sharing plant-pollinator interaction data. The primary focus of this work is to enhance the interoperability of data on plant-pollinator interactions, aligning with WorldFAIR efforts to develop a Cross-Domain Interoperability Framework. We have successfully promoted the adoption of standards and increased the interoperability of plant-pollinator interactions data, resulting in a process that allows for tracing the provenance of the data, as well as facilitating the reuse of datasets crucial for understanding this essential ecosystem service and its changes due to human impact.
Our effort demonstrates there are several possible paths for FAIRification, tailored to institutional needs, and we have shown that different approaches can contribute to promoting data interoperability and data availability for reuse, which is the ultimate goal of this initiative. Consequently, we have successfully ensured FAIR data for understanding plant-pollinator interactions at biologically-relevant scales for crops, with broad participation from initiatives in Europe, South America, Africa, North America, and elsewhere. We have also established concrete guidelines on FAIR data best practices customised for pollination data, metadata, and other digital objects, promoting the scalable adoption of these standards and FAIR data best practices by multiple initiatives. We believe this effort can assist similar initiatives in adopting interoperability standards for this domain and contribute to our understanding of how plant-pollinator interactions contribute to sustain life on Earth.
WorldFAIR Agricultural Biodiversity Standards, Best Practices and Guidelines Recommendations: Tutorial (Deliverable 10.2)
Gonzalez-Vaquero Rocio Ana et al.
Although plant-pollinator datasets have become available worldwide, most of these datasets are published as supplementary files of research articles in a diversity of formats (WorldFAIR Deliverable 10.1, Trekels et al. 2023). In these cases, the FAIR principles (Findable, Accessible, Interoperable, and Reusable data, Wilkinson et al. 2016) are not fulfilled, which limits data retrieval and their use in subsequent integrative studies. Especially for datasets published as interaction networks, information regarding each interaction and species traits is still sparse, despite the efforts of numerous initiatives (Salim et al. 2022).
Currently, the most used standard to share biodiversity data is Darwin Core (DwC, https://dwc.tdwg.org), which is a set of terms that allow capturing information about events, taxa and their occurrence in nature. The standard is extensible, so that particular information can be captured under discipline-specific terms. For plant-pollinator interactions in particular, a vocabulary of terms has been recently proposed by the Brazilian Network of Plant-Pollinator Interactions (REBIPP, https://www.rebipp.org.br, Salim et al. 2022). Thus, REBIPP terms can be used together with DwC to standardise plant-pollinator datasets and to make them accessible on different platforms, such as the REBIPP database.
The aim of this tutorial is to facilitate the standardisation of any spreadsheet containing plant-pollinator interaction data, and sharing the final product in the REBIPP platform. A standardised dataset can be shared in other platforms as well, such as GloBI (https://www.globalbioticinteractions.org) and GBIF (https://www.gbif.org/new-data-model) (see WorldFAIR Deliverables 10.2 and 10.3 for more information). The process of cleaning and standardising the data is done with OpenRefine (https://openrefine.org), a free, open source program useful for transforming biodiversity data en masse. Plant-pollinator databases may differ considerably in the level of detail of the information they contain and in how this information is presented, but we expect that every user will be able to standardise their own data by reproducing some of the steps detailed herein.
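The tutorial itself carries out this standardisation in OpenRefine. Purely as an illustration of the underlying idea (renaming spreadsheet columns to standard terms and tidying values), the sketch below does a comparable column mapping in Python. The source column names and the sample row are hypothetical, and only a handful of well-known Darwin Core terms are used; REBIPP terms are deliberately left out rather than guessed.

```python
import csv
import io

# Hypothetical spreadsheet headers mapped to Darwin Core terms.
COLUMN_TO_DWC = {
    "species": "scientificName",
    "date": "eventDate",
    "lat": "decimalLatitude",
    "lon": "decimalLongitude",
    "collector": "recordedBy",
}


def standardise(csv_text, mapping):
    """Rename known columns to Darwin Core terms and trim stray whitespace."""
    reader = csv.DictReader(io.StringIO(csv_text))
    records = []
    for row in reader:
        record = {}
        for column, value in row.items():
            term = mapping.get(column)
            if term is not None:              # unmapped columns are dropped
                record[term] = (value or "").strip()
        records.append(record)
    return records


# Invented sample data, with a trailing space to show the clean-up step.
raw = (
    "species,date,lat,lon,collector,notes\n"
    "Apis mellifera ,2021-03-04,-23.5,-46.6,A. Example,visiting flower\n"
)
records = standardise(raw, COLUMN_TO_DWC)
print(records[0])
```

In practice, unmapped columns such as the free-text notes here would need a decision of their own (an extension term, a remarks field, or exclusion) rather than being silently dropped.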
WorldFAIR Agricultural biodiversity FAIR data assessment rubrics (Deliverable 10.3)
Drucker, Debora et al.
The WorldFAIR Case Study on Agricultural Biodiversity (WP10) addresses the challenges of advancing interoperability and mobilising plant-pollinator interactions data for reuse. Previous efforts, reported in WorldFAIR Deliverable 10.1, ‘Agriculture-related pollinator data standards use cases report’ (Trekels et al., 2023), provided an overview of projects, good practices, tools, and examples for creating, managing and sharing data related to plant-pollinator interactions. It also outlined a work plan for conducting pilot studies. Deliverable 10.2 (Drucker et al., 2024) presented Agricultural Biodiversity Standards, Best Practices and Guidelines Recommendations. This deliverable presented results from six pilot studies that adopted standards and recommendations from the earlier report. The current report complements the efforts with Agricultural Biodiversity FAIR data assessment rubrics.
We introduce a set of FAIR assessment tools tailored to the plant-pollinator interactions domain. These tools are designed to help researchers and institutions evaluate adherence to the FAIR principles. In the discovery phase, we found that a significant amount of data on plant-pollinator interactions is available as supplementary files of research articles, in a diversity of formats such as PDFs, Excel spreadsheets, and text files. The diversity of approaches and the lack of appropriate data vocabularies lead to confusion, information loss, and the need for complex data interpretation and transformation. Our proposed framework primarily targets researchers in this domain who wish to assess the FAIRness of the data they produce and take action to improve it. However, we believe it can also benefit data reviewers, data stewards, data repository managers and librarians dealing with plant-pollinator data. Our approach focuses on being as familiar as possible with the researcher’s practices, language, and jargon. Ultimately, we aim to promote data publishing and reuse in the plant-pollinator interactions domain.
We present a ‘Rubric for the assessment of Plant-Pollinator Interactions Data’ with examples from the data from the pilots developed in Deliverable 10.2 and in relation to the FAIR Implementation Profile (FIP) created by Work Package 10. We conduct ‘dataset assessments’ of available data from research projects surveyed in the discovery phase. Additionally, we describe in detail the ‘Automated FAIR-enabled Data Reviews’ generated by the Global Biotic Interactions (GloBI) infrastructure, with examples from the pilots.
We believe the tools described in this report will encourage data publishing and reuse in the plant-pollinator interactions domain. Moving from diverse approaches and siloed initiatives to widely available FAIR plant-pollination interactions data for scientists and decision-makers will enable the development of integrative studies that enhance our understanding of species biology, behaviour, ecology, phenology, and evolution.
An assessment of the ocean data priority areas for development and implementation roadmap (Deliverable 11.1)
After an introduction to the Intergovernmental Oceanographic Commission of UNESCO’s Ocean Data and Information System (ODIS), the report summarises an evaluation of FAIR Implementation Profiles and FAIR Enabling Resources compiled from across WorldFAIR case studies. It then synthesises supplementary insights obtained through a survey distributed across project partners, and identifies a pathway to implement sustainable cross-domain (meta)data flows to inform and support the current development of the Cross-domain Interoperability Framework (CDIF), a major output of WorldFAIR.
The WorldFAIR case studies on biodiversity, disaster risk reduction, chemistry, and cultural heritage were identified as focal points to bridge with ODIS, being complementary to the strategic priorities of marine science and sustainable ocean management and offering clear socio-technical interfaces compatible with ODIS’s own interoperability approaches. The high-level roadmap in this report outlines the general approach that will be pursued in the remaining tasks in the ocean science case study.
In summary, this report draws on insight into current data practices from the international, multi-domain WorldFAIR consortium to identify the most viable routes to establish and sustain cross-domain data interoperability.
WorldFAIR New interoperability specifications and policy recommendations (Deliverable 11.2)
Author: Pier Luigi Buttigieg
Following closely from WorldFAIR Deliverable 11.1, this deliverable introduces a set of (meta)data interoperability specifications and recommendations for policies that would ensure their meaningful implementation and development within projects such as WorldFAIR and frameworks such as the Cross-Domain Interoperability Framework (CDIF). This is a concrete step towards interoperable regional and global data spaces (in the term’s technical and accurate sense) using domain- and regionally neutral interoperability conventions. This is essential to power the emerging integrative, AI-augmented ecosystems such as digital twins, cloud-native solutions, and virtualisation engines.
Specifications developed in the central case study of Work Package (WP) 11 – the Intergovernmental Oceanographic Commission’s Ocean Data and Information System (ODIS) – were screened for their appropriateness to the cross-domain goals of WorldFAIR, and are now ready for community review in the CDIF community development space. Collectively, these specifications are referred to as “CDIF Core” and include (meta)data exchange specifications for 44 entity types (including datasets and other creative works, projects, organisations, and software) that are now available for review by other domains. These specifications are already aligned to prevailing conventions and web architectural patterns used by millions of machines worldwide (based on exchange of JSON-LD/schema.org). As such, this deliverable reports on a concrete advancement of CDIF and anticipates demonstrations of cross-work-package digital exchanges. Should these exchanges be successful and the specifications be well-managed under CDIF, a pathway to come full circle and align ODIS to CDIF is also described.
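For readers unfamiliar with the JSON-LD/schema.org exchange pattern mentioned above, the sketch below builds and serialises a minimal schema.org `Dataset` description. It is only an illustration of the general pattern and not the CDIF Core or ODIS specification: the helper function, the example.org URLs, and the chosen subset of properties are assumptions.

```python
import json


def dataset_record(name, description, url, license_url):
    """Build a minimal schema.org Dataset description as a JSON-LD document."""
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "license": license_url,
    }


# Hypothetical dataset; every value below is a placeholder.
record = dataset_record(
    name="Example coastal temperature series",
    description="Placeholder description of a dataset.",
    url="https://example.org/datasets/coastal-temperature",
    license_url="https://example.org/licences/open",
)

# A record like this is typically published on the Web so that
# harvesters can discover and parse it.
serialised = json.dumps(record, indent=2)
print(serialised)
```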
Perhaps of more importance in the long-term, this deliverable also describes, at a technical level, how the CDIF Core specifications should be managed by WorldFAIR and external implementation groups to maintain coherence and alignment, while supporting discoverable innovation. It also describes 1) how to nest domain- or regionally-specific content into the domain-neutral CDIF Core; and 2) how co-developers may feed back into the CDIF Core specifications in a principled and fully traceable way.
Following this work, WP11 will – as described in Task 11.3 – leverage CDIF Core to pursue demonstrations of cross-domain interoperability with WorldFAIR partners (particularly in biodiversity, cultural heritage, and disaster risk reduction) and/or themes to confirm the viability of the approaches described in this report.
Ocean Science and Sustainable Development Demonstration (Deliverable 11.3)
Following closely from WorldFAIR Deliverables 11.1 and 11.2, this deliverable briefly reports on demonstrations of verified, FAIR (meta)data exchanges with selected, independent partners through the Ocean Data and Information System (ODIS). These exchanges are facilitated through Web architectural approaches and linked open data (LOD) norms, which the ODIS Architecture (ODIS-Arch) has used to create digital supply chains between highly diverse data systems around the world. With WorldFAIR’s support, ODIS-Arch has been extended with new (meta)data profiles to support cross-domain interoperability, in alignment with the principles of the emerging Cross-Domain Interoperability Framework (CDIF). Due to its success, the ODIS approach to domain-independent, interoperable data and information flow is being used as a guiding reference implementation for CDIF (described further in D11.2), aligned to its core principles.
While primarily concerned with Work Package (WP) 11, this deliverable also has bearing on the thematic areas of WP03 (Chemistry), WP05 (Geochemistry), WP09 (Biodiversity), WP10 (Agricultural Biodiversity), WP12 (Disaster Risk Reduction), and WP13 (Cultural Heritage). This document focuses on WP03, WP09, WP12, and WP13, but the demonstrations and new specifications noted are relevant to WP05 and WP10 due to their thematic proximity to the former WPs. As described in Section 2, (meta)data from each domain represented in these WPs is now flowing across the ODIS Federation, and is – to varying degrees – FAIR within and beyond it. Due to alignments in LOD implementation, direct interoperation between ODIS (WP11) and GBIF (WP9) is now being implemented, which has far-reaching implications for marine biodiversity data flow and the strengthening of Essential Ocean Variable (EOV) data systems. Presently, interoperation potential with other WPs is less technically direct; however, clear avenues to mature the status quo are present and discussed in Section 2, based on insights from WorldFAIR case studies.
In terms of the wider datascape, this deliverable shows that cross-domain digital interoperability can be straightforward, should 1) a trusted, regionally and domain-neutral entity provide coordination and conflict resolution (in the case of WP11, the International Oceanographic Data and Information Exchange of IOC-UNESCO, the Intergovernmental Oceanographic Commission), 2) the will to collaborate, rather than compete, exists across partners, 3) global perspective and multilateralism inform highly competent technical leadership, and 4) clear implementation and operational concerns are ranked above untested innovation and bureaucratic convenience.
The overall conclusion of this deliverable is one of great optimism: the demonstrations presented here, and the trend across the other WorldFAIR case studies discussed, indicate a strong convergence towards domain-neutral (meta)data exchange over the Web, where domain-specific conventions are either translated to or embedded within generic serialisations and semantics to allow rapid and accurate communication across highly diverse implementation and operational scenarios. Work to secure the progress made in WorldFAIR will continue and seek further resourcing to fulfil the great promise co-developed during the project.
Disaster Risk Reduction Case Study Report
Bolland, Jill; Fakhruddin, Bapon; Reinen-Hamill, Richard
This report describes the types of data used for disaster risk reduction (DRR) and provides two country case studies, for Fiji and Sudan, with an in-depth look at the DRR datasets and associated metadata used by each country. These datasets were assessed against 15 FAIR (Findable, Accessible, Interoperable, Reusable) data metrics to identify which elements of FAIR were met. The report also provides a broader context giving details on the national, regional, and global agencies providing or hosting DRR data as well as initiatives aiming to increase the FAIRness of DRR data.
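The general shape of such an assessment, checking a dataset's metadata against a list of yes/no metrics and reporting which were met, can be sketched as follows. The three checks shown are hypothetical stand-ins and are not the 15 metrics used in the report; the metadata field names and the sample dataset are likewise invented.

```python
def fair_score(metadata, checks):
    """Run each check against the metadata; return per-check results and the share met."""
    results = {name: bool(check(metadata)) for name, check in checks.items()}
    share_met = sum(results.values()) / len(checks)
    return results, share_met


# Hypothetical checks, for illustration only.
CHECKS = {
    "has_persistent_identifier": lambda m: bool(m.get("identifier")),
    "has_licence": lambda m: bool(m.get("licence")),
    "machine_readable_format": lambda m: m.get("format") in {"csv", "json", "geotiff", "netcdf"},
}

# Invented metadata for a dataset that has an identifier but is shared as a PDF
# without a stated licence.
dataset = {"identifier": "doi:10.0000/example", "format": "pdf"}

results, share_met = fair_score(dataset, CHECKS)
print(results)
print(f"{share_met:.0%} of checks met")
```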
Both of our case study countries are using remote sensing data which were assessed as having the richest metadata and met most of the FAIR metrics used in the assessment. Strategies for exploiting this data are discussed as they have great potential to provide up to date information during an emergency and to fill gaps in DRR data.
An essential task for any scientific discipline is the establishment of common standards and terminologies. Historically, standards have differed considerably, with agencies creating standards and vocabularies based on their own use cases and priorities; consequently, there is currently no universal standard used by all DRR practitioners. We discuss the most widely used standard definitions and provide suggestions for harmonising standards. As both the United Nations Office for Disaster Risk Reduction (UNDRR) and the World Meteorological Organisation (WMO) have been working toward improving the FAIRness and consistency of DRR data, we describe their efforts and outline their lessons learned and recommendations. Our next deliverable, which discusses metadata standards, controlled vocabularies, and ontologies, will add to this discussion.
While the current report focuses entirely on the DRR research area, DRR research is interdisciplinary by nature, encompassing researchers from earth sciences, climate change and environmental sciences, social studies, cultural information, and others. A key recommendation from the UNDRR is that there should be interdisciplinary collaboration when setting standards and definitions; therefore, increasing FAIRness in DRR has the potential to increase FAIRness across many related disciplines.
The study found that the data used by Fiji and Sudan for DRR is missing many key FAIR data elements. Hazard data tended to score highest for FAIRness, particularly hazard data originating from satellites. In contrast, vulnerability and exposure data were the least FAIR with little metadata and limited machine readability. However, there are some excellent regional and global initiatives aimed at increasing the level of FAIRness in DRR data. The UNDRR is currently reinventing its DRR database to provide a much more coherent and consistent view of the state of DRR both globally and nationally. We applaud this project and believe that significant effort should be made by the global and regional agencies to work together to provide standards, controlled vocabularies, data distribution platforms, resources and guidance for all people working to reduce the impact of disasters.
Disaster Risk Reduction Domain-specific FAIR vocabularies (Deliverable 12.2)
Bolland, Jill; Shanker, Neeraj; Reinen-Hamill, Richard; Fakhruddin, Bapon
Disasters are inherently complex with wide-ranging and cascading impacts. The exponential growth in data generated daily, coupled with the complex nature of disasters, means we are hitting the limits of humans’ capacity to fully exploit all the data available for disaster risk reduction (DRR). This can be addressed with well-designed, pretrained Artificial Intelligence (AI) algorithms that can analyse large, complex datasets and fuse heterogeneous data. However, machine-readable, semantically linked data is a precursor for the use of AI in DRR.
Nations possessing ample resources and technical proficiency are better positioned to leverage DRR data effectively, thereby potentially creating disparities in the accessibility and application of DRR data. Recent advances in technology – particularly remote sensing data, which is income-agnostic and provides global coverage – provide an opportunity to reduce DRR data gaps. Global DRR institutions should collaborate proactively with countries and regional institutions to ensure the provision of Findable, Accessible, Interoperable, and Reusable (FAIR) and open DRR data. This could help bridge any historical or emergent DRR data inequalities.
This deliverable explores the use of vocabularies in the DRR domain and how controlled vocabularies coupled with ontologies can enhance the semantic value of DRR data, thereby improving interoperability. Enhancing semantic interoperability would result in improved collaboration and communication within the DRR domain and facilitate collaborations with other scientific domains. The final sections of this report provide examples of the use of remote sensing data and AI for DRR. We hope that the ideas and suggested actions in this report can be used to transform raw DRR data into valuable insights and decisions that produce tangible reductions in the impact of disasters worldwide.
Disaster Risk Reduction findings and recommendations (Deliverable 12.3)
This report provides recommendations to Disaster Risk Reduction (DRR) practitioners for increasing FAIRness in data used for all phases of Disaster Risk Reduction. These recommendations are based on literature reviews; our first deliverable, which provided a detailed assessment of FAIRness of data for two country case studies; our second deliverable, on the state of data management and vocabularies in DRR; and our own opinions as DRR researchers.
There is much to do to improve the FAIRness of DRR data, but we suggest that the first action is to improve the accuracy and interoperability of international disaster databases. Data on historical events are fundamental to our understanding of the future impact of disasters and our ability to mitigate and recover from them. Ensuring the accuracy of disaster databases is a fundamental step toward building safer, more resilient communities and nations.
Cultural Heritage Mapping Report: Practices and Policies supporting Cultural Heritage image sharing platforms (Deliverable 13.1)
Knazook, Beth; Murphy, Joan
Deliverable 13.1 Cultural Heritage Mapping Report: Practices and Policies supporting Cultural Heritage image sharing platforms outlines current practices guiding online digital image sharing by institutions charged with providing care and access to cultural memory, in order to identify how these practices may be adapted to promote and support the FAIR principles for data sharing. It looks closely at the policies and best practices endorsed by a range of professional bodies and institutions representative of Galleries, Libraries, Archives and Museums (the ‘GLAMs’) which facilitate the acquisition and delivery, discovery, description, digitisation standards and preservation of digital image collections. The second half of the report further highlights the technical mechanisms for aggregating and exchanging images that have already produced a high degree of image interoperability in the sector with a survey of six national and international image sharing platforms: DigitalNZ, Digital Public Library of America (DPLA), Europeana, Wikimedia Commons, Internet Archive and Flickr. This report will be a valuable resource in producing recommendations for aligning existing professional practice in the sector with the FAIR principles – a key milestone for the case study.
The report concludes with some thoughts on the position of the Digital Repository of Ireland (DRI) as an image sharing platform within this landscape, as a stewarding repository for both cultural heritage organisations in Ireland seeking to preserve and make accessible their collections as well as research projects curating, examining, preparing and delivering cultural heritage data for reuse. At the end of the WorldFAIR project, the DRI will aim to have tested and implemented recommendations that align established collections delivery mechanisms to facilitate the use of cultural heritage images as research data, improving the findability, accessibility, interoperability and reusability of Ireland’s visual cultural memory.
Cultural Heritage image sharing recommendations report (Deliverable 13.2)
Knazook, Beth; Murphy, Joan
Deliverable 13.2 for the WorldFAIR Project’s Cultural Heritage Work Package (WP13). Although the cultural heritage sector has only recently begun to think of traditional gallery, library, archival and museum (‘GLAM’) collections as data, long-established practices guiding the management and sharing of information resources have aligned the domain well with the FAIR principles for research data, as evidenced in complementary workflows and standards that support discovery, access, reuse, and persistence. As explored in the previous report by Work Package 13 for the WorldFAIR Project, D13.1 Practices and policies supporting cultural heritage image sharing platforms, memory institutions are in an important position to influence cross-domain data sharing practices and raise critical questions about why and how those practices are implemented.
Deliverable 13.2 aims to build on our understanding of what it means to support FAIR in the sharing of image data derived from GLAM collections. This report looks at previous efforts by the sector towards FAIR alignment and presents 5 recommendations designed to be implemented and tested at the DRI that are also broadly applicable to the work of the GLAMs. The recommendations are ultimately a roadmap for the Digital Repository of Ireland (DRI) to follow in improving repository services, as well as a call for continued dialogue around ‘what is FAIR?’ within the cultural heritage research data landscape.
Implementation and Testing of Cultural Heritage Image Sharing Recommendations: DRI Case Study Report (Deliverable 13.3)
This is a summary report describing a use case for infrastructures that host image data for the cultural heritage sector. WorldFAIR Project Work Package 13 (Cultural Heritage) was tasked with providing a pathway to enabling wider adoption of image-sharing policies and practices in Galleries, Libraries, Archives and Museums (‘GLAM’ organisations) which align with the FAIR principles for research data, and are easily exchanged with commonly-used technologies and standards for sharing data in other domains of practice represented in the WorldFAIR Cross-Domain Interoperability Framework (CDIF).
This report builds on the findings of a mapping exercise in Deliverable 13.1, ‘Cultural Heritage Mapping Report: Practices and policies supporting cultural heritage image sharing platforms’ and outlines the process undertaken by the Digital Repository of Ireland (DRI) to respond to the general recommendations developed by an international working group in Deliverable 13.2, ‘Cultural Heritage Image Sharing Recommendations Report’, both through technical implementation and the development of best practice guidance for DRI members and the broader user community served by the repository.
Deliverable 13.3 describes the implementation plan and current state of activities, some of which will still be in progress at the time of publication. This is primarily a record of decisions made towards realising the five key recommendations. Further resources which document the outcomes of these activities will be made available in the DRI repository at the close of the WorldFAIR project.
Cultural Heritage Test Ingest (MS13)
This Milestone describes the development of a case study for FAIR-aligned image sharing in the cultural heritage domain, and details the technical implementation carried out in the test environment at the Digital Repository of Ireland (DRI), Ireland’s national repository for cultural heritage, humanities and social sciences data.
It contributes to an understanding of the D13.3 ‘DRI Case Study Report’, which outlines in greater detail the context, discussion, actions and results of DRI’s response to the 5 broadly generalisable recommendations produced by a coordinated working group of cultural heritage professionals for D13.2 ‘Cultural Heritage Image Sharing Recommendations Report’. Both of these deliverables followed on from the publication of D13.1 ‘Cultural Heritage Mapping Report: Practices and policies supporting cultural heritage image sharing platforms’, which demonstrated the ways in which image data and metadata interoperability had already been explored within the cultural heritage sector.



