HealthDCAT-AP mandates the use of persistent dereferenceable URIs (i.e., HTTP URIs) for the identifier of the metadata record (i.e., dct:identifier). |
Example: https://hdab-catalogue.org/publisher_0/datasets/68112f77-8f2c-496a-8398-77e52b60c883 |
A persistent identifier is a stable and unique reference to a metadata record or dataset. In HealthDCAT-AP, these identifiers are typically HTTP URIs: web addresses that point directly to the description (metadata) of a specific dataset. With persistent URIs, metadata can be reliably accessed and shared across platforms, so that even if the metadata is moved or copied, the original reference remains intact. This supports consistent data management and discoverability over time. The requirement also applies to other DCAT properties that link a dataset to other datasets (e.g., IsVersionOf, ...) or to external resources (e.g., IsReferencedBy, ...), where persistent dereferenceable URIs are expected. Persistent dereferenceable URIs are essential for the stability, reliability, and accessibility of resources within the EU Health Data Space and beyond. Digital Object Identifiers (DOIs) illustrate the benefit: a DOI is a unique alphanumeric string assigned to a digital object that provides a permanent link to its location on the Internet, which improves the organisation, sharing, and citation of digital content in fields such as scientific publishing. The same logic applies to metadata records that are shared and exchanged between dataset catalogues: in HealthDCAT-AP, each metadata record is assigned a persistent dereferenceable HTTP URI, so that the original metadata can always be reliably accessed. The primary dataset catalogue serves as the single source of truth, and any updates must occur at the source. Copies made for discoverability must always reference the original metadata. |
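As a rough illustration of the identifier requirement, the following Python sketch checks whether an identifier is at least syntactically an HTTP(S) URI. The function name `is_http_uri` is hypothetical and not part of HealthDCAT-AP. The check is purely syntactic: it does not dereference the URI, so it cannot confirm that the URI resolves or that it is persistent.

```python
from urllib.parse import urlparse


def is_http_uri(identifier: str) -> bool:
    """Return True if the identifier is syntactically an absolute HTTP(S) URI."""
    parts = urlparse(identifier.strip())
    # An absolute HTTP URI needs both an http/https scheme and a host part.
    return parts.scheme in ("http", "https") and bool(parts.netloc)


# The example identifier given above passes the check.
print(is_http_uri(
    "https://hdab-catalogue.org/publisher_0/datasets/"
    "68112f77-8f2c-496a-8398-77e52b60c883"
))

# A bare UUID is unique but is not an HTTP URI, so it fails the check.
print(is_http_uri("68112f77-8f2c-496a-8398-77e52b60c883"))
```

A catalogue would still need to dereference the URI (and monitor it over time) to establish that it actually meets the requirement.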
Webinar DCAT-AP Health - Session 1 (June 2023) by DIGIT.D2 – Interoperability – Use Case 1: Harvesting 'PURIs allow to explore the network of catalogues beyond a single portal.' |
While copies of the master metadata can be made to enhance discoverability on the Web, updates must be made only to the master metadata, at the primary source, by the data holder. Maintaining persistent dereferenceable URIs for metadata identifiers allows efficient metadata management. 'It is the user that has to ensure that its use of a dataset is legally acceptable, not the data catalogue. And they can do this because users can retrieve the original metadata'. Thus, at any time, an up-to-date copy of the metadata can be retrieved by returning to its source via its main identifier. |
If a metadata record does not meet the HealthDCAT-AP requirements for metadata identifiers, Health Data Access Bodies (HDABs) are required to assign an additional persistent, dereferenceable URI (HTTP URI) to the metadata identifier while retaining the original identifier provided by the data holder. When aggregating metadata, HDABs must ensure the integrity of metadata. DCAT-AP 3.0's Catalogue Record class provides HDABs with the necessary metadata elements for managing aggregated metadata:
|
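The identifier rule for HDABs described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a prescribed implementation: the record is modelled as a plain dictionary, the base URI `HDAB_BASE` and the function `ensure_http_identifier` are hypothetical, and a random UUID is used only as a stand-in for whatever minting scheme an HDAB actually operates.

```python
import uuid
from urllib.parse import urlparse

# Hypothetical base URI of an HDAB catalogue (assumption for illustration).
HDAB_BASE = "https://hdab-catalogue.example/datasets/"


def is_http_uri(value: str) -> bool:
    """Syntactic check only: absolute HTTP(S) URI with a host part."""
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def ensure_http_identifier(record: dict) -> dict:
    """Return a copy of the record whose identifiers include an HTTP URI.

    The identifier supplied by the data holder is always retained; a new
    URI is added only when no existing identifier is an HTTP URI.
    """
    identifiers = list(record.get("dct:identifier", []))
    if not any(is_http_uri(i) for i in identifiers):
        identifiers.append(HDAB_BASE + str(uuid.uuid4()))
    return {**record, "dct:identifier": identifiers}


harvested = {"dct:title": ["Example dataset"], "dct:identifier": ["local-42"]}
print(ensure_http_identifier(harvested)["dct:identifier"])
```

For the added URI to be persistent in practice, the HDAB would have to store the mapping between the original and the assigned identifier and keep the URI resolvable. That part is outside this sketch.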
DCAT is an RDF vocabulary for representing a dataset catalogue. It is a knowledge graph (Linked Data) which relies on:
|
Together, these Web ontologies form a knowledge graph: a fine-grained structure of information that machines can interpret unambiguously. With a knowledge graph, stakeholders can more easily find, understand, and use datasets, because the relationships between datasets and resources (such as publications or code) are explicitly defined. |
Annex 2: When describing a dataset, the DCAT-AP metadata model includes more relationships expressed as HTTP URIs, such as SKOS code lists (shown in orange) and Web resources (shown in blue), than textual information (shown in black). |
Figure 2: DCAT-AP includes properties that link a dataset to other datasets or resources using HTTP URIs. |
To understand how a dataset can be used, data users benefit from investigating its previous applications. As a knowledge graph, DCAT helps bridge input data with research data, offering insights into how datasets have been utilised (Annex 2). Requiring persistent URIs for metadata identifiers therefore serves two purposes: it supports the management of a system of distributed catalogues, and it helps build the European Health Data Space as an extended data graph in which all resources are interconnected over the Web via communication standards and in which machines can create knowledge (See: 2.1 Background - DCAT-AP in the EU Data Spaces). |
Linked data principles: |
|
One requirement for operating the EU Health Data Space is to maintain HTTP URIs and to be able to identify broken URIs (i.e., broken relationships). Another requirement for governing the EHDS will be to measure its level of maturity, for instance the percentage of URIs or the level of interoperability achieved. Because metadata acts as a proxy for the dataset, it also provides information on the data's conformity to standards (ref: dct:conformsTo). The principles of linked data (i.e., using URIs, providing useful information, and linking to other URIs) are foundational for building this interconnected space, so keeping URIs functional and measuring interoperability will be key to the maturity and effectiveness of the European Health Data Space. |
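One possible maturity indicator mentioned above, the percentage of URIs, could be computed along these lines. This is only a sketch under assumptions: the record is a plain dictionary mapping property names to lists of string values, the function name `uri_share` is hypothetical, and the sample values are invented for illustration. Detecting broken URIs would additionally require dereferencing each URI, which is not shown here.

```python
from urllib.parse import urlparse


def is_http_uri(value: str) -> bool:
    """Syntactic check only: absolute HTTP(S) URI with a host part."""
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def uri_share(record: dict) -> float:
    """Percentage of property values in a record that are HTTP(S) URIs."""
    values = [v for vals in record.values() for v in vals]
    if not values:
        return 0.0
    return 100.0 * sum(is_http_uri(v) for v in values) / len(values)


# Invented sample record: two URI values and two free-text values.
record = {
    "dct:identifier": ["https://hdab-catalogue.example/datasets/1"],
    "dct:conformsTo": ["https://standards.example/spec"],
    "dct:title": ["Example dataset"],
    "dct:description": ["Free-text description of the dataset"],
}
print(uri_share(record))  # 50.0
```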
The EHDS Regulation introduces the obligation for data holders to review their metadata records (aka 'dataset description') once a year. |
3. ... The health data holder shall, at a minimum on an annual basis, check that its dataset description in the national dataset catalogue is accurate and up to date. |
This newly introduced rule requires data holders to operate a metadata review process, which in turn calls for a new feature in DCAT-AP. In the CatalogRecord class of DCAT-AP 3.0, the mandatory property dct:modified shows when the metadata was last updated, but it does not indicate when it was last reviewed. In the same class, the 'listing date' indicates the date on which the description of the dataset was included in the catalogue. If the EU central health dataset catalogue or the national catalogues are to manage the review process, HealthDCAT-AP would need a new property, such as healthdcatap:reviewDate, to show when the metadata was last reviewed by the data holder. Alternatively, an internal technical metadata property, "reviewDate", could be managed within the data holder's internal system, without being shared across catalogues, to address this requirement. According to the EHDS Regulation, data holders are responsible for reviewing their dataset descriptions, and no mandate is given to dataset catalogues for this process. In both harvesting and publishing scenarios, it is best practice to include the DCAT CatalogRecord alongside the Dataset, Distribution, and DataService classes. The CatalogRecord is essential because it provides metadata about the metadata entry itself. Notably, it indicates (through dct:conformsTo) the Application Profile to which the metadata of the catalogued resource adheres. This enables aggregator agents to validate the metadata against the appropriate constraints during integration. This information is particularly valuable because HealthDCAT-AP introduces three distinct sets of cardinalities, so knowing the profile is crucial for accurate validation and interoperability across catalogues. |
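The difference between a modification date and a review date can be made concrete with a small sketch. It assumes the hypothetical property healthdcatap:reviewDate discussed above (which does not exist in DCAT-AP 3.0), reads "annual" as 365 days, and uses invented dates. The function name `review_overdue` is likewise only illustrative.

```python
from datetime import date, timedelta

# Assumption: "at a minimum on an annual basis" is read as at most 365 days.
REVIEW_INTERVAL = timedelta(days=365)


def review_overdue(last_review: date, today: date) -> bool:
    """Return True if more than one review interval has elapsed."""
    return today - last_review > REVIEW_INTERVAL


# Invented catalogue record: it was modified after its last review, so
# dct:modified alone would not show when the review took place.
catalog_record = {
    "dct:modified": date(2024, 3, 1),
    "healthdcatap:reviewDate": date(2024, 1, 15),  # hypothetical property
}

print(review_overdue(catalog_record["healthdcatap:reviewDate"], date(2025, 2, 1)))
```

The same check would work whether the review date is exchanged between catalogues or kept only in the data holder's internal system.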


