Defining a core metadata framework for cross-domain data sharing and reuse

October 1 @ 8:00 am to October 6 @ 5:00 pm

The workshop takes place at Schloss Dagstuhl – Leibniz Center for Informatics from October 1 to October 6, 2023. See also the corresponding Dagstuhl web page and its information on COVID-19.


Discovery, Access, and Evaluation

This group will look at the existing work on data discovery and access, and extend it to address the related aspects of data assessment and evaluation. While these functions involve several discrete areas, much of the metadata they require is the same. The group will look at these functions and how they combine, with an eye toward defining a coherent set of metadata appropriate to each distinct step, up to the point of integration and analysis.

Integration and Semantic Mapping

Data integration has several aspects, among which are the combination of data at a structural level and the ability to equate similar semantics related to the data. The roles played by concepts are the point of contact between these. While structural manipulation of data can be automated to a great extent, the mapping of semantics – even when informed by knowledge of the roles played by the relevant concepts – is much less amenable to automation. This group will look at how the different aspects of data integration can be addressed, and what is possible or desirable in terms of automation and support for non-automated activities.

Standard Expression of “Universals”

Some types of information do not require domain-specific expression, either because they are described in a consistent fashion across domain boundaries, or because the domain descriptions of them are universally employed. This category includes not only time, geography, and (basic) units of measure, but also extends to such ubiquitous classifications as the species taxonomy. Understanding where a “lingua franca” such as CDIF is needed, and where it is not, is important in providing guidance to adopters. This group will explore where the limits of that need lie, and the implications for interoperability across domain boundaries.

Events, Occurrences, and Samples

Many domains have similar approaches for organizing their data: events become the subject of measurements. Often, samples are a critical aspect of such approaches. The act of measurement – the occurrence – is often an important focus. This type of data description – and the description of relevant data collection and sampling events – has implications for what constitutes sufficient provenance information. These elements are combined in different ways across domains, and yet bear many similarities. The goal of this group is to identify similarities across domains, and to consider the requirements for exchange of this information across domain boundaries. The discussion is likely to be an exploratory one, but can draw on the models and standards that address these topics, such as GBIF, OGC’s Observations & Measurements, OHDSI’s OMOP CDM, and others.

Cross-Cutting and Presentational/Editorial Issues

The organization and presentation of the CDIF guidelines require that each functional area be presented in a stepwise fashion, with a clear return on effort for adoption. If CDIF is to be adopted by the target communities, it must be easy for practitioners in different domains to understand what they should implement, according to what data they wish to provide for cross-domain use. This group will look at any such cross-cutting issues and how they can be addressed and documented for the purposes of the CDIF guidelines. It is expected that this group will perform a coordinating role across the work of other groups, as needed.