Article 51 of the EHDS Regulation expands the scope of health data to include additional domains that impact health, such as pathogen and environmental data. Many of these domains already rely on specific standards for describing, categorising, or modeling their data, often tailored to their unique purposes. Below is a brief, non-exhaustive overview of some of these standards and their focal areas:
- HL7 (Health Level Seven International) wikidata:Q17054989 defines a set of international standards for the exchange, integration, sharing, and retrieval of electronic health information. HL7 standards provide a comprehensive framework for clinical and administrative data. Their primary scope is the exchange of individual data elements (e.g., patient demographics, clinical observations): they describe individual health records or transactions, not entire datasets. HL7's data models and messaging standards include HL7 V2, V3, and FHIR.
- FHIR (Fast Healthcare Interoperability Resources) wikidata:Q19597236 is a standard describing data formats and elements for exchanging electronic health records. Developed by HL7, it is designed to enable fast and efficient exchange of healthcare information. It uses modern web technologies and focuses on interoperability. FHIR focuses on specific elements like patients, observations, medications, and other clinical data points rather than on metadata for datasets as a whole.
- OpenEHR wikidata:Q838025 is an open standard specification in health informatics that describes the management, storage, retrieval, and exchange of health data in electronic health records (EHRs). Part of the OpenEHR framework, OpenEHR Archetypes are formal models or templates that define the structure, meaning, and relationships of health-related data in an interoperable and standardised way.
- ICD (International Classification of Diseases) wikidata:Q50018 is a globally recognised standard, maintained by the World Health Organization (WHO), that provides standardised classification codes for diseases and health conditions. ICD codes and descriptions can be used to standardise the classification of health-related datasets in HealthDCAT-AP.
- LOINC (Logical Observation Identifiers Names and Codes) wikidata:Q502480 is a universal standard for identifying health measurements, laboratory observations, and clinical data. LOINC codes can be used to describe lab tests, measurements, and other clinical observations in HealthDCAT-AP.
- SNOMED CT (Systematized Nomenclature of Medicine Clinical Terms) wikidata:Q37616346 is a comprehensive clinical terminology that provides codes, terms, synonyms, and definitions used in clinical documentation and reporting such as diseases, clinical findings, and procedures. SNOMED CT can be utilised to describe clinical concepts and healthcare terms in HealthDCAT-AP.
- The ISO/IEC 11179 wikidata:Q3146900 standard provides guidelines for metadata registries, including the registration and management of metadata for data elements. It offers a structured approach to define and manage metadata elements, which can be applied to health datasets. As DCAT does not provide recommendations on metadata management, ISO/IEC 11179 can serve as a complementary standard to provide the necessary guidelines for Health Data Access Bodies to manage metadata effectively. Together, DCAT and ISO/IEC 11179 can support the creation of interoperable and well-governed health data spaces.
- The OMOP (Observational Medical Outcomes Partnership) wikidata:Q125499706 Common Data Model standardises the format and content of observational health datasets, such as clinical observations, treatments, and outcomes data.
- CDISC (Clinical Data Interchange Standards Consortium) wikidata:Q571067 standards facilitate the exchange of clinical trial data and include models like CDASH (Clinical Data Acquisition Standards Harmonization) and SDTM (Study Data Tabulation Model).
- SDMX (Statistical Data and Metadata Exchange) wikidata:Q2713163 is an international initiative that aims to standardise the exchange of statistical data and metadata. It is used by statistical organisations to describe and exchange entire datasets and their metadata, and it is the foundation for StatDCAT-AP.
- DICOM (Digital Imaging and Communications in Medicine) wikidata:Q81095 is a standard for handling, storing, printing, and transmitting medical imaging information.
- MeSH (Medical Subject Headings) wikidata:Q199897 is a comprehensive controlled vocabulary used by the National Library of Medicine (NLM) to index and organise biomedical and health-related information in databases like PubMed and MEDLINE. MeSH terms facilitate precise and consistent search and retrieval of scientific and medical information by categorising content into hierarchical topics and subtopics. This system includes descriptors, qualifiers, and supplementary concept records to cover various aspects of medical knowledge, ensuring that researchers, healthcare professionals, and librarians can find relevant information efficiently.
- GSIM (Generic Statistical Information Model) wikidata:Q122873933 is a reference framework of internationally agreed definitions, attributes, and relationships that describe the pieces of information used in the production of official statistics (information objects). The framework enables generic descriptions of the definition, management, and use of data and metadata throughout the statistical production process.
- WHO classifications: https://www.who.int/standards/classifications
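
The `wikidata:Q…` identifiers in the list above are compact references. A minimal sketch of how they could be expanded into full URIs for use in machine-readable metadata is given below. It assumes the standard Wikidata entity URI pattern (`http://www.wikidata.org/entity/Q…`); the names `STANDARDS` and `expand_wikidata` are hypothetical helpers for illustration and are not defined by HealthDCAT-AP.

```python
# Illustrative sketch only: helper names are hypothetical, not part of HealthDCAT-AP.
WIKIDATA_ENTITY_BASE = "http://www.wikidata.org/entity/"

# A subset of the identifiers listed above, copied as they appear in the text.
STANDARDS = {
    "HL7": "wikidata:Q17054989",
    "FHIR": "wikidata:Q19597236",
    "ICD": "wikidata:Q50018",
    "SNOMED CT": "wikidata:Q37616346",
    "DICOM": "wikidata:Q81095",
}


def expand_wikidata(curie: str) -> str:
    """Expand a compact 'wikidata:Q123' reference into a full entity URI."""
    prefix, sep, qid = curie.partition(":")
    # Accept only the 'wikidata' prefix followed by 'Q' and digits.
    if prefix != "wikidata" or not sep or not (qid.startswith("Q") and qid[1:].isdigit()):
        raise ValueError(f"not a Wikidata item reference: {curie!r}")
    return WIKIDATA_ENTITY_BASE + qid


# Full URIs keyed by the short name of each standard.
STANDARD_URIS = {name: expand_wikidata(curie) for name, curie in STANDARDS.items()}

if __name__ == "__main__":
    for name, uri in STANDARD_URIS.items():
        print(f"{name}: {uri}")
```

Rejecting anything that is not a well-formed item reference keeps malformed identifiers out of catalogue records, at the cost of not supporting other prefixes.
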
By integrating established health data standards, HealthDCAT-AP offers a comprehensive framework for describing health datasets in a machine-actionable and interoperable way. This ensures that health data, whether clinical, epidemiological, or genomic, can be efficiently exchanged across Member States, facilitating secondary uses such as research and public health analysis. Integrating these standards into HealthDCAT-AP properties improves interoperability by promoting standardised data exchange protocols as well as harmonised data models and formats. These standards were developed to meet specific domain requirements and are highly effective within their respective contexts. HealthDCAT-AP should therefore adopt a standard as soon as widespread community usage has made it a de facto standard. During the implementation phase of the EHDS, reviewing the common standards that health data holders use to describe their datasets would be highly beneficial. Such a review would offer valuable insights for the effective governance of the health data space, for instance to evaluate and further foster advances in data harmonisation.

HealthDCAT-AP has been designed to serve as a comprehensive framework for describing diverse health data that utilise various data models, exchange services, formats, and vocabularies, including thesauri and ontologies. It provides an integrated solution for managing data relevant to Article 33 within health dataset catalogues. The vocabulary of HealthDCAT-AP must be robust enough to describe the standards associated with the dataset in a machine-actionable way.
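
As a rough illustration of what "machine-actionable" could mean here, the sketch below builds a small JSON-LD dataset description that points to the standards a dataset follows. It assumes that `dct:conformsTo` (from DCAT and Dublin Core Terms) is used to link a dataset to a standard; the dataset URI, the title, and the function name `describe_dataset` are hypothetical, and the exact properties required by HealthDCAT-AP should be taken from the profile itself.

```python
import json


def describe_dataset(dataset_uri: str, title: str, standard_uris: list[str]) -> dict:
    """Build a minimal JSON-LD description linking a dataset to the standards it follows."""
    return {
        "@context": {
            "dcat": "http://www.w3.org/ns/dcat#",
            "dct": "http://purl.org/dc/terms/",
        },
        "@id": dataset_uri,
        "@type": "dcat:Dataset",
        "dct:title": title,
        # Assumption: each standard is referenced by URI through dct:conformsTo.
        "dct:conformsTo": [{"@id": uri} for uri in standard_uris],
    }


# Hypothetical dataset; the two URIs expand the FHIR and OMOP Wikidata identifiers
# given earlier in this section.
example = describe_dataset(
    "https://example.org/dataset/icu-observations",
    "ICU observations (illustrative)",
    [
        "http://www.wikidata.org/entity/Q19597236",
        "http://www.wikidata.org/entity/Q125499706",
    ],
)

if __name__ == "__main__":
    print(json.dumps(example, indent=2))
```

Referencing standards by URI rather than by free-text name lets a catalogue filter datasets by standard without string matching.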

Examples of health Data Models, Services, Formats, and Ontologies for use in HealthDCAT-AP

Data models:
- OMOP Common Data Model: A framework for transforming data from various sources (e.g., EHRs, claims data) into a common format.
- OpenEHR: Open standard for managing electronic health records (EHRs).
- SDTM (Study Data Tabulation Model) defines a standard structure for human clinical trial (study) data tabulations
- etc.

Data services:
- FHIR API: Standard APIs for exchanging healthcare data using RESTful principles.
- OpenEHR API: Facilitating the management and exchange of electronic health records.
- DICOM Protocol: Beyond being just a file format, DICOM also defines the communication protocol used to exchange medical images and associated information between medical devices, such as scanners, servers, workstations, and printers. This ensures that different systems and devices can communicate and interpret the data correctly.

Formats:
- DICOM File Format: The DICOM standard defines a file format for medical images, which includes both the image data (e.g., MRI, CT scans, X-rays) and associated metadata (e.g., patient information, imaging parameters). The file extension is typically .dcm.
- FASTA is a text-based format used for representing nucleotide sequences or peptide sequences (proteins) in bioinformatics.
- etc.

Thesauri:
- ICD is used globally for health management, epidemiology, and clinical purposes, providing codes for diseases, conditions, and procedures.
- UMLS integrates multiple health and biomedical vocabularies, providing a large compendium of healthcare-related terms and their relationships.
- RxNorm: A normalised naming system for generic and branded drugs
- MESH (Medical Subject Headings)
- etc.

Ontologies:
- GO (Gene Ontology) provides a controlled vocabulary to describe gene and gene product attributes across all species, focusing on biological processes, cellular components, and molecular functions.
- OBO (Open Biological and Biomedical Ontologies)
- DOID (Disease Ontology)
- etc.
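
To make the two formats named above more concrete, the following sketch guesses whether the first bytes of a file look like DICOM or FASTA, which could help when filling in the format of a distribution. It relies on two well-known conventions: a DICOM file starts with a 128-byte preamble followed by the ASCII marker `DICM`, and a FASTA record starts with a `>` header line. The function name `sniff_format` is hypothetical, and the check is a heuristic rather than a validator.

```python
def sniff_format(data: bytes) -> str:
    """Return 'DICOM', 'FASTA', or 'unknown' based on the leading bytes of a file."""
    # DICOM files carry a 128-byte preamble followed by the 4-byte marker "DICM".
    if len(data) >= 132 and data[128:132] == b"DICM":
        return "DICOM"
    # FASTA records begin with a '>' header line (leading whitespace tolerated here).
    if data.lstrip().startswith(b">"):
        return "FASTA"
    return "unknown"


if __name__ == "__main__":
    dicom_like = b"\x00" * 128 + b"DICM" + b"\x02\x00"
    fasta_like = b">seq1 example\nACGTACGT\n"
    print(sniff_format(dicom_like))
    print(sniff_format(fasta_like))
    print(sniff_format(b"plain text"))
```

A heuristic like this only suggests a format; the data holder's declared format should remain authoritative.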

Search for Medical data models:
Medical-Data-Models.org is a web-based platform designed to provide access to a comprehensive repository of data models that are used in clinical research and patient care.