EHDS Regulation Article 60) Duties of health data holders
The health data holder shall communicate to the health data access body a description of the dataset it holds in accordance with Article 77. The health data holder shall, at a minimum on an annual basis, check that its dataset description in the national dataset catalogue is accurate and up to date.
These records are crucial for effective data discovery, which is supported by dataset catalogues designed as Web platforms. These platforms must expose metadata records and ensure that their data discovery systems can perform searches ranging from simple full-text queries to more complex faceted search or linked data queries. Within the EHDS, the governance of these dataset catalogues is overseen by Health Data Access Bodies, which makes support for a seamless data discovery experience an essential part of their role.
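As a rough illustration of the difference between full-text and faceted search, the following minimal sketch filters a small in-memory list of metadata records. The records, field names (`theme`, `access_rights`) and function names are assumptions made for this example; they do not reflect any particular catalogue's API or the official DCAT-AP property names.

```python
from collections import Counter

# Hypothetical metadata records; the field names are illustrative only.
RECORDS = [
    {"title": "Cancer registry 2020",
     "description": "Population-based cancer incidence data",
     "theme": "health", "access_rights": "restricted"},
    {"title": "Hospital discharge summaries",
     "description": "Inpatient episodes and diagnoses",
     "theme": "health", "access_rights": "restricted"},
    {"title": "Air quality measurements",
     "description": "Daily pollutant levels",
     "theme": "environment", "access_rights": "public"},
]


def full_text_search(records, query):
    """Return records whose title or description contains every query term."""
    terms = query.lower().split()
    return [
        r for r in records
        if all(t in (r["title"] + " " + r["description"]).lower() for t in terms)
    ]


def faceted_search(records, **facets):
    """Return records whose fields exactly match all given facet values."""
    return [r for r in records if all(r.get(k) == v for k, v in facets.items())]


def facet_counts(records, field):
    """Count how many records fall under each value of a facet field."""
    return Counter(r[field] for r in records)


if __name__ == "__main__":
    print([r["title"] for r in full_text_search(RECORDS, "cancer incidence")])
    print([r["title"] for r in faceted_search(RECORDS, theme="health")])
    print(dict(facet_counts(RECORDS, "access_rights")))
```

A production catalogue would typically delegate this to a search index or a SPARQL endpoint; the sketch only shows why consistent, structured metadata fields are a precondition for faceted search.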

DCAT-AP (Data Catalog Vocabulary - Application Profile) is a metadata standard designed to describe datasets in a way that facilitates their discovery, accessibility, and interoperability. By providing a structured common vocabulary (a lingua franca) for dataset catalogues, DCAT-AP makes it easier to share and reuse dataset descriptions across platforms. It enables data users to search for datasets efficiently, understand their content and purpose, and reuse them.
Enhancing Findability and Reuse according to the FAIR data principles
DCAT-AP supports the implementation of the FAIR data principles. FAIR stands for Findability, Accessibility, Interoperability, and Reusability: a set of guidelines for making data and metadata easier to find, access, combine, and reuse by both people and machines. The EHDS Regulation endorses the FAIR data principles to ensure health data sharing and responsible reuse across the EU. The Regulation lays the foundation for an ecosystem of federated national dataset catalogues alongside a central catalogue.
EHDS Regulation Article 79) EU Dataset Catalogue
1. The Commission shall establish an EU dataset catalogue connecting the national dataset catalogues established by the health data access bodies in each Member State as well as the dataset catalogues of authorised participants in HealthData@EU.
2. The EU dataset catalogue, the national dataset catalogues and the dataset catalogues of authorised participants in HealthData@EU shall be made publicly available.
The EHDS also provides the necessary metadata properties that align with what a data user might search for. What is the dataset about? Who created the dataset? Who is responsible for it? Who can access the dataset, and under what conditions? When was the dataset published? Where was the dataset collected? Where can the dataset be accessed or downloaded? Why was the dataset created? How is the dataset structured (e.g., data model, file format, schema)? For instance, DCAT-AP includes essential metadata properties such as dataset types (e.g., geospatial or statistical data), access conditions (e.g., open data or protected data), and dataset themes, which support efficient categorisation and discoverability.
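To make these questions concrete, the sketch below builds a minimal dataset description as JSON-LD using terms from the DCAT and Dublin Core Terms namespaces. It is an illustrative example only, not a complete or validated DCAT-AP record: the `example.org` URIs, the dataset itself, and the function name `build_dataset_record` are hypothetical, and the chosen controlled-vocabulary values are assumptions for the purpose of illustration.

```python
import json

XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"
EU_AUTHORITY = "http://publications.europa.eu/resource/authority"


def build_dataset_record():
    """Build an illustrative dataset description (hypothetical dataset)."""
    return {
        "@context": {
            "dcat": "http://www.w3.org/ns/dcat#",
            "dct": "http://purl.org/dc/terms/",
        },
        "@id": "https://example.org/dataset/cancer-registry-2020",
        "@type": "dcat:Dataset",
        # What is the dataset about?
        "dct:title": "Cancer registry 2020",
        "dct:description": "Population-based cancer incidence data.",
        "dcat:theme": {"@id": EU_AUTHORITY + "/data-theme/HEAL"},
        # Who created it, and who is responsible for it?
        "dct:creator": {"@id": "https://example.org/org/registry"},
        "dct:publisher": {"@id": "https://example.org/org/registry"},
        # Who can access it, and under what conditions?
        "dct:accessRights": {"@id": EU_AUTHORITY + "/access-right/RESTRICTED"},
        # When was it published, and where was it collected?
        "dct:issued": {"@value": "2021-03-01", "@type": XSD_DATE},
        "dct:spatial": {"@id": EU_AUTHORITY + "/country/IRL"},
        # Where can it be accessed, and how is it structured?
        "dcat:distribution": {
            "@type": "dcat:Distribution",
            "dcat:accessURL": {"@id": "https://example.org/access/cancer-registry-2020"},
            "dct:format": {"@id": EU_AUTHORITY + "/file-type/CSV"},
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_dataset_record(), indent=2))
```

Each comment in the sketch maps one of the user questions above to a metadata property, which is the main point of the example; a real record would need to follow the mandatory and recommended properties of the applicable DCAT-AP release.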
EHDS Regulation Article 80) Minimum specifications for datasets of high impact
The Commission may, by means of implementing acts, determine the minimum specifications for datasets of high impact for secondary use, taking into account existing Union infrastructures, standards, guidelines and recommendations.
Interoperability and machine readability of DCAT-AP
DCAT-AP is inherently designed to support interoperability, machine readability, and machine actionability, using RDF (Resource Description Framework) to enable seamless interaction between health data platforms and AI applications. Because the same metadata can be read by people and processed by software, datasets become more accessible not only to human users but also to machines such as AI applications, provided that dataset catalogues offer the necessary interfaces and are designed with machine interoperability in mind.

Its linked data approach (an RDF-based structure), coupled with the use of persistent identifiers (HTTP URIs), promotes openness and enables the creation of interconnected knowledge graphs that point to single sources of truth. This enhances the discoverability of datasets and reduces the need for duplicated descriptions. Cross-domain compatibility, federation, and semantic interoperability ensure that metadata records are harvestable, indexable, and manageable across federated catalogues. The standard also includes rules for managing duplication of metadata records across federated catalogues. By leveraging open controlled vocabularies to describe concepts and relations, DCAT-AP models information in a way that machines can both read and act on. This capability extends the semantic web, increasing the discoverability and reuse of datasets catalogued with DCAT-AP.
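The role of persistent identifiers in a federated setting can be sketched as follows: when records harvested from several catalogues share the same HTTP URI, they can be recognised as descriptions of the same dataset and collapsed into one entry. The merge policy used here (keep the most recently modified record) is an assumption chosen for illustration and is not the rule defined by DCAT-AP; the record layout, URIs, and function name are likewise hypothetical.

```python
# Hypothetical harvested records from two catalogues; "uri" is the
# persistent HTTP URI of the dataset, "modified" an ISO 8601 date.
NATIONAL_CATALOGUE = [
    {"uri": "https://example.org/dataset/a", "title": "Dataset A", "modified": "2023-01-10"},
    {"uri": "https://example.org/dataset/b", "title": "Dataset B", "modified": "2023-02-01"},
]
EU_CATALOGUE = [
    {"uri": "https://example.org/dataset/a", "title": "Dataset A (updated)", "modified": "2023-06-15"},
]


def merge_harvested(*catalogues):
    """Merge harvested records, keeping one record per persistent URI.

    Assumed policy: when the same URI appears more than once, keep the
    record with the latest "modified" date (ISO dates sort as strings).
    """
    merged = {}
    for records in catalogues:
        for rec in records:
            current = merged.get(rec["uri"])
            if current is None or rec["modified"] > current["modified"]:
                merged[rec["uri"]] = rec
    return merged


if __name__ == "__main__":
    for uri, rec in merge_harvested(NATIONAL_CATALOGUE, EU_CATALOGUE).items():
        print(uri, "->", rec["title"])
```

Without stable identifiers, the two descriptions of "Dataset A" would have to be matched on titles or other free text, which is far less reliable.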
DCAT-AP in the EU Data Spaces
DCAT-AP helps ensure that metadata standards across EU data spaces are aligned, which fosters the seamless sharing of data across borders and domains. This is essential to the goals of the European Data Strategy, which aims to create a single European data market.
Example of the interconnection between EOSC EU Node and EHDS
The DataCite Metadata Schema, utilised within the European Open Science Cloud (EOSC), offers a standardised framework for describing and cataloguing research data, publications, and other scholarly outputs. This standardisation is vital for ensuring the discoverability, accessibility, and reuse of research outputs, thereby supporting open science and promoting a connected and interoperable research environment across Europe. The envisioned alignment of DCAT-AP and DataCite standards within the RDF framework is intended to facilitate the interconnection between the European Health Data Space (EHDS) and the EOSC through DataCite-to-DCAT-AP mapping. While DCAT-AP is used to catalogue datasets that can be employed in research, the DataCite schema provides a detailed description of digital objects generated by research and assigns Persistent Identifiers (PIDs) for their citation. These PIDs can be seamlessly integrated into DCAT catalogues, thereby enhancing both the discoverability and understanding of datasets. For instance, this interconnection enables machines to efficiently list all research outputs associated with a specific dataset. This is particularly beneficial when handling sensitive health data that cannot be shared directly, as it allows researchers to discover 'proxy' information related to the sensitive data. This approach enhances researchers' understanding of the dataset's relevance and potential applicability in their studies.
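The idea of listing research outputs associated with a dataset can be sketched with DataCite-style records, in which each output declares its links through `relatedIdentifiers` entries carrying a `relationType`. The records, DOIs, dataset URI, and the function name `outputs_for_dataset` below are hypothetical, and the sketch works on plain dictionaries rather than on an actual DataCite-to-DCAT-AP mapping in RDF.

```python
# Hypothetical persistent identifier of a dataset listed in a DCAT catalogue.
DATASET_PID = "https://example.org/dataset/cancer-registry-2020"

# Hypothetical DataCite-style metadata for three research outputs.
DATACITE_RECORDS = [
    {"doi": "10.5555/paper-1",
     "titles": [{"title": "Survival analysis using registry data"}],
     "relatedIdentifiers": [
         {"relatedIdentifier": DATASET_PID,
          "relatedIdentifierType": "URL",
          "relationType": "IsDerivedFrom"}]},
    {"doi": "10.5555/paper-2",
     "titles": [{"title": "An unrelated study"}],
     "relatedIdentifiers": []},
    {"doi": "10.5555/software-1",
     "titles": [{"title": "Registry data cleaning scripts"}],
     "relatedIdentifiers": [
         {"relatedIdentifier": DATASET_PID,
          "relatedIdentifierType": "URL",
          "relationType": "References"}]},
]


def outputs_for_dataset(records, dataset_pid):
    """List research outputs that declare a relation to the given dataset."""
    hits = []
    for rec in records:
        for rel in rec.get("relatedIdentifiers", []):
            if rel.get("relatedIdentifier") == dataset_pid:
                hits.append({
                    "doi": rec["doi"],
                    "title": rec["titles"][0]["title"],
                    "relation": rel["relationType"],
                })
                break  # one hit per output is enough for the listing
    return hits


if __name__ == "__main__":
    for hit in outputs_for_dataset(DATACITE_RECORDS, DATASET_PID):
        print(hit["doi"], hit["relation"], hit["title"])
```

For sensitive health data, a listing like this gives researchers the 'proxy' information described above: publications and software that used the dataset, without exposing the data itself.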

The interconnection of DCAT-AP and DataCite standards enhances the interoperability and findability of datasets, reinforcing the connection to FAIR data principles and supporting open science and health data reuse across Europe. The governance of these dataset catalogues within the scope of the EHDS is overseen by Health Data Access Bodies, in line with the EHDS Regulation. These bodies are responsible for ensuring that metadata records meet high-quality standards for data discovery.
'European Health Data Space – Working with the future European metadata catalogue'
EUPHA conference 9 November 2023, 9:00-10:00, Dublin