Glossary

EHDS Regulation Article 2 Definitions

(y) 'dataset catalogue' means a collection of dataset descriptions, arranged in a systematic manner and including a user-oriented public part, in which information concerning individual dataset parameters is accessible by electronic means through an online portal;

(f) 'interoperability' means the ability of organisations, as well as of software applications or devices from the same manufacturer or different manufacturers, to interact through the processes they support, involving the exchange of information and knowledge, without changing the content of the data, between those organisations, software applications or devices;

(u) 'health data user' means a natural or legal person, including Union institutions, bodies, offices or agencies, which has been granted lawful access to electronic health data for secondary use pursuant to a data permit, a health data request approval or an access approval by an authorised participant in HealthData@EU;

(w) 'dataset' means a structured collection of electronic health data;

(ad) 'data quality' means the degree to which the elements of electronic health data are suitable for their intended primary and secondary use;

(aa) 'data quality and utility label' means a graphic diagram, including a scale, describing the data quality and conditions of use of a dataset;

(t) 'health data holder' means any natural or legal person, public authority, agency or other body in the healthcare or the care sectors, including reimbursement services where necessary, as well as any natural or legal person developing products or services intended for the health, healthcare or care sectors, developing or manufacturing wellness applications, performing research in relation to the healthcare or care sectors or acting as a mortality registry, as well as any Union institution, body, office or agency, that has either:
(i) the right or obligation, in accordance with applicable Union or national law and in its capacity as a controller or joint controller, to process personal electronic health data for the provision of healthcare or care or for the purposes of public health, reimbursement, research, innovation, policy making, official statistics or patient safety or for regulatory purposes; or
(ii) the ability to make available non-personal electronic health data through the control of the technical design of a product and related services, including by registering, providing, restricting access to or exchanging such data;

Recital (64): The establishment of one or more health data access bodies, supporting access to electronic health data in Member States, is essential to promoting the secondary use of health-related data. Member States should therefore establish one or more health data access bodies to reflect, inter alia, their constitutional, organisational and administrative structure. However, one of those health data access bodies should be designated as a coordinator in the event there is more than one health data access body. Where a Member State establishes several health data access bodies, it should lay down rules at national level to ensure the coordinated participation of those bodies in the European Health Data Space Board (the 'EHDS Board'). That Member State should, in particular, designate one health data access body to function as a single contact point for the effective participation of those bodies, and ensure swift and smooth cooperation with other health data access bodies, the EHDS Board and the Commission. Health data access bodies could vary in terms of organisation and size, spanning from a dedicated fully fledged organisation to a unit or department in an existing organisation.

Other definitions

Controlled vocabulary: A controlled vocabulary is a predefined, standardised set of terms and phrases used to ensure consistency in naming and categorising concepts within a dataset. It restricts the use of alternative terms or synonyms to avoid ambiguity and maintain uniformity in data description. Controlled vocabularies are commonly used in metadata, taxonomies, and classification systems to improve data discoverability, interoperability, and accuracy in search and retrieval across datasets or systems. Examples include thesauri, ontologies, and code lists.
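As an illustration of how a controlled vocabulary restricts synonyms, the sketch below normalises free-text theme values to preferred terms from a small code list. The code list, its terms and the function name normalise_theme are hypothetical and not taken from any published vocabulary.

```python
# Hypothetical code list: preferred term -> synonyms that are not allowed.
# The terms are illustrative only, not from a published vocabulary.
THEME_CODE_LIST = {
    "cardiology": {"heart medicine", "cardiac care"},
    "oncology": {"cancer care", "tumour medicine"},
}

# Reverse index so that a synonym can be mapped back to its preferred term.
SYNONYM_TO_PREFERRED = {
    synonym: preferred
    for preferred, synonyms in THEME_CODE_LIST.items()
    for synonym in synonyms
}


def normalise_theme(value: str) -> str:
    """Return the preferred term for a value, or raise if it is not in the list."""
    term = value.strip().lower()
    if term in THEME_CODE_LIST:
        return term
    if term in SYNONYM_TO_PREFERRED:
        return SYNONYM_TO_PREFERRED[term]
    raise ValueError(f"{value!r} is not in the controlled vocabulary")


print(normalise_theme("Cardiac care"))  # cardiology
```

Rejecting unknown values outright, rather than storing them as-is, is what keeps descriptions across datasets uniform and searchable.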

Content negotiation: Content negotiation refers to mechanisms defined as part of HTTP that make it possible to serve different versions of a document (or, more generally, representations of a resource) at the same URI, so that user agents can specify which version best fits their capabilities (Wikipedia). This mechanism can, for example, be used to serve an RDF representation of a DCAT metadata record for data exchange, or an HTML page for browsers to display.
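A minimal sketch of server-driven content negotiation, assuming the server offers a metadata record as text/turtle (RDF) and text/html: it parses the client's Accept header and picks the representation with the highest q-value. Media type parameters other than q are ignored, and the function names are illustrative rather than part of any framework.

```python
from __future__ import annotations


def parse_accept(header: str) -> list[tuple[str, float]]:
    """Split an Accept header into (media range, q-value) pairs."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_range, q = fields[0].lower(), 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        if media_range:
            ranges.append((media_range, q))
    return ranges


def specificity(media_range: str, media_type: str) -> int | None:
    """2 for an exact match, 1 for type/*, 0 for */*, None if the range does not match."""
    if media_range == media_type:
        return 2
    if media_range == "*/*":
        return 0
    if media_range.endswith("/*") and media_type.split("/")[0] == media_range[:-2]:
        return 1
    return None


def negotiate(accept: str, available: list[str]) -> str | None:
    """Return the available media type with the highest q-value, or None."""
    ranges = parse_accept(accept)
    best, best_q = None, 0.0
    for media_type in available:
        matched = []
        for media_range, q in ranges:
            s = specificity(media_range, media_type)
            if s is not None:
                matched.append((s, q))
        if not matched:
            continue
        # The most specific matching range determines the weight.
        _, q = max(matched)
        if q > best_q:  # ties keep the server's earlier (preferred) type
            best, best_q = media_type, q
    return best


# A linked data client asking for Turtle gets RDF; a browser gets HTML.
print(negotiate("text/turtle;q=0.9, text/html;q=0.5", ["text/html", "text/turtle"]))
print(negotiate("text/html,application/xhtml+xml,*/*;q=0.8", ["text/turtle", "text/html"]))
```

Both calls use the same URI in practice; only the Accept header differs, which is the point of the mechanism.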

Data dictionary: A data dictionary is a centralised repository of metadata that provides definitions, descriptions, and details about the structure, fields, and variables within a dataset. It typically includes information such as data types, allowed values, relationships between fields, and the meaning of each element. A data dictionary helps data users understand the content and structure of a dataset, facilitating proper use and interpretation of the data.
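To show how a data dictionary supports correct use of a dataset, the sketch below encodes a few entries (name, type, meaning, allowed values) and checks a record against them. The fields and the FieldSpec and validate_record names are hypothetical, chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class FieldSpec:
    """One data dictionary entry: a variable, its type, meaning and allowed values."""
    name: str
    data_type: type
    description: str
    allowed_values: set | None = None  # None means "any value of the right type"


# Hypothetical dictionary for an illustrative patient-level dataset.
DATA_DICTIONARY = [
    FieldSpec("patient_id", str, "Pseudonymised patient identifier"),
    FieldSpec("age", int, "Age in years at admission"),
    FieldSpec("sex", str, "Administrative sex", {"female", "male", "unknown"}),
]


def validate_record(record: dict) -> list[str]:
    """Return the problems found when checking a record against the dictionary."""
    problems = []
    for spec in DATA_DICTIONARY:
        if spec.name not in record:
            problems.append(f"missing field {spec.name}")
            continue
        value = record[spec.name]
        if not isinstance(value, spec.data_type):
            problems.append(f"{spec.name}: expected {spec.data_type.__name__}")
        elif spec.allowed_values is not None and value not in spec.allowed_values:
            problems.append(f"{spec.name}: {value!r} is not an allowed value")
    return problems


print(validate_record({"patient_id": "P002", "age": "42", "sex": "other"}))
# ['age: expected int', "sex: 'other' is not an allowed value"]
```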

Knowledge graph: In knowledge representation and reasoning, a knowledge graph is a knowledge base that uses a graph-structured data model or topology to represent and operate on data. Knowledge graphs are often used to store interlinked descriptions of entities – objects, events, situations or abstract concepts – while also encoding the free-form semantics or relationships underlying these entities (Wikipedia).
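The graph-structured model can be sketched as a list of (subject, predicate, object) triples, with entities as nodes and predicates as labelled edges. The entities below are hypothetical; the sketch shows a simple pattern query and a traversal that collects everything linked to a starting entity.

```python
from collections import defaultdict

# Illustrative triples (subject, predicate, object); all names are hypothetical.
TRIPLES = [
    ("CancerRegistryDataset", "publishedBy", "NationalRegistryAgency"),
    ("CancerRegistryDataset", "hasTheme", "Oncology"),
    ("NationalRegistryAgency", "locatedIn", "MemberStateA"),
    ("Oncology", "broader", "Medicine"),
]

# Index outgoing edges per subject so that neighbours can be found quickly.
outgoing = defaultdict(list)
for s, p, o in TRIPLES:
    outgoing[s].append((p, o))


def subjects_with(predicate, obj):
    """Pattern query: which subjects have the given predicate and object?"""
    return [s for s, p, o in TRIPLES if p == predicate and o == obj]


def reachable(start):
    """Return all entities reachable from start by following edges (depth-first)."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for _, obj in outgoing.get(node, []):
            if obj not in seen:
                seen.add(obj)
                stack.append(obj)
    return seen


print(subjects_with("hasTheme", "Oncology"))  # ['CancerRegistryDataset']
```

Real knowledge graphs use URIs instead of plain strings and a query language such as SPARQL, but the underlying structure is the same.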

Linked data: Linked data is the general term for a set of best practices for exposing data in machine-readable form using the content-negotiation feature of the standard HTTP web protocol. These best practices support the development of tools that link and make use of data from multiple web sources without having to deal with many different proprietary and incompatible application programming interfaces (APIs); instead, they use HTTP to request data in a structured form meant for machines rather than human-readable displays (doi.org).

Namespace: In the context of Linked Data, a namespace helps give records unique names. A namespace is a component of a URI: in a group of URIs produced as part of a dataset, the shared part of the URIs is often the namespace. For example, all concepts of the Language of Bindings thesaurus start with "https://w3id.org/lob/", which is the namespace for the thesaurus. In Linked Data (for example in Turtle), the namespace may be declared with a shortcut using the @prefix keyword, for example: @prefix lob: <https://w3id.org/lob/> . The prefix lob can then be used instead of the full namespace.
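What a prefix declaration does can be sketched as simple string expansion: a prefixed name such as lob:something is turned into the full URI by replacing lob with the namespace "https://w3id.org/lob/", and a full URI can be shortened again. The local name "concept/example" is a hypothetical placeholder; real Language of Bindings concept URIs may follow a different pattern.

```python
from __future__ import annotations

# Prefix map, equivalent to declaring lob as a prefix for https://w3id.org/lob/
PREFIXES = {
    "lob": "https://w3id.org/lob/",
}


def expand(curie: str, prefixes: dict[str, str]) -> str:
    """Turn a prefixed name such as 'lob:xyz' into a full URI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise KeyError(f"no namespace declared for {curie!r}")
    return prefixes[prefix] + local


def compact(uri: str, prefixes: dict[str, str]) -> str:
    """Shorten a full URI using the longest matching namespace, if any."""
    candidates = [(ns, p) for p, ns in prefixes.items() if uri.startswith(ns)]
    if not candidates:
        return uri  # no declared namespace applies; keep the full URI
    ns, prefix = max(candidates, key=lambda c: len(c[0]))
    return f"{prefix}:{uri[len(ns):]}"


print(expand("lob:concept/example", PREFIXES))  # https://w3id.org/lob/concept/example
```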

Tabular data: Tabular data refers to data organised in a structured format of rows and columns, where each row represents a single record or entity, and each column represents a specific attribute or variable. This structure is commonly found in spreadsheets or relational databases, making it easy to store, query, and analyse. Tabular data is often used for structured datasets where relationships between variables are well-defined. (See: Metadata vocabulary for tabular data)
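The row-and-column structure can be illustrated with Python's standard csv module: each row becomes one record and each column header becomes an attribute name. The column names and values below are hypothetical sample data.

```python
import csv
import io

# Illustrative CSV content; column names and values are hypothetical.
CSV_TEXT = """patient_id,age,diagnosis_code
P001,54,C50
P002,61,I21
"""

# Each row is one record; each column header becomes a key (attribute).
rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))

# CSV stores everything as text, so types must be applied explicitly.
ages = [int(row["age"]) for row in rows]

print(rows[0])  # {'patient_id': 'P001', 'age': '54', 'diagnosis_code': 'C50'}
print(sum(ages) / len(ages))  # 57.5
```

Because plain CSV carries no types or meanings, a separate description of the columns (such as a data dictionary or tabular metadata) is needed to interpret it reliably.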

Semantic Web: The Semantic Web, sometimes known as Web 3.0, is an extension of the World Wide Web through standards set by the World Wide Web Consortium (W3C). The goal of the Semantic Web is to make Internet data machine-readable. To enable the encoding of semantics with the data, technologies such as Resource Description Framework (RDF) and Web Ontology Language (OWL) are used. These technologies are used to formally represent metadata (Wikipedia).

SKOS: The Simple Knowledge Organization System (SKOS) provides a model for expressing the basic structure and content of concept schemes such as thesauri, classification schemes, subject heading lists, taxonomies, folksonomies, and other similar types of controlled vocabulary. As an application of the Resource Description Framework (RDF), SKOS allows concepts to be composed and published on the World Wide Web, linked with data on the Web and integrated into other concept schemes. In basic SKOS, conceptual resources (concepts) are identified with URIs, labeled with strings in one or more natural languages, documented with various types of note, semantically related to each other in informal hierarchies and association networks, and aggregated into concept schemes.
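The core SKOS ideas (concepts identified by URIs, multilingual preferred labels, and broader links forming a hierarchy) can be sketched in plain Python. The concept scheme and its example.org URIs are hypothetical placeholders; a real system would load SKOS data with an RDF library instead.

```python
from dataclasses import dataclass, field

BASE = "https://example.org/scheme/"  # placeholder namespace, not a real scheme


@dataclass
class Concept:
    uri: str
    pref_labels: dict  # language tag -> preferred label (like skos:prefLabel)
    broader: list = field(default_factory=list)  # broader concept URIs (like skos:broader)


SCHEME = {c.uri: c for c in [
    Concept(BASE + "disease", {"en": "Disease", "fr": "Maladie"}),
    Concept(BASE + "cancer", {"en": "Cancer", "fr": "Cancer"}, [BASE + "disease"]),
    Concept(BASE + "breast-cancer", {"en": "Breast cancer", "fr": "Cancer du sein"},
            [BASE + "cancer"]),
]}


def ancestors(uri):
    """Follow broader links transitively, mimicking skos:broaderTransitive inference."""
    result, stack = [], list(SCHEME[uri].broader)
    while stack:
        parent = stack.pop()
        if parent not in result:
            result.append(parent)
            stack.extend(SCHEME[parent].broader)
    return result


def label(uri, lang, fallback="en"):
    """Return the preferred label in the requested language, else the fallback."""
    labels = SCHEME[uri].pref_labels
    return labels.get(lang, labels[fallback])


print([label(u, "fr") for u in ancestors(BASE + "breast-cancer")])  # ['Cancer', 'Maladie']
```

Walking the broader chain like this is what lets a search for a general concept also find records annotated with narrower ones.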

URI: URI stands for 'Uniform Resource Identifier'. A URI is a unique identifier for a resource, such as a documentation record, that we want others to refer to. For example, the concept of 'paper', as defined in the Getty Arts & Architecture Thesaurus (AAT), can be referenced by the URI 'http://vocab.getty.edu/aat/300014109'. One of the benefits of using URIs is machine disambiguation: when a record refers to 'paper' according to the Getty AAT definition, it is clear to a machine where to point users. A URI can also be matched with words for 'paper' in other languages, which makes records language-independent.
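Language independence can be sketched as follows: records store the AAT URI rather than a word, and labels in several languages are attached to that URI. The labels here are ordinary words for 'paper' chosen for illustration, not labels retrieved from the Getty AAT, and the helper names are hypothetical.

```python
AAT_PAPER = "http://vocab.getty.edu/aat/300014109"

# Illustrative multilingual labels (common words for 'paper'),
# not labels retrieved from the Getty AAT.
LABELS = {
    AAT_PAPER: {"en": "paper", "nl": "papier", "fr": "papier", "de": "Papier"},
}

# Records store the URI, not a language-specific word.
records = [
    {"id": "record-1", "material": AAT_PAPER},
    {"id": "record-2", "material": AAT_PAPER},
]


def display(record, lang):
    """Show the material in the requested language, falling back to English."""
    labels = LABELS[record["material"]]
    return labels.get(lang, labels["en"])


def find_by_label(word, records):
    """Resolve a word in any language to URIs, then match records by URI."""
    uris = {
        uri for uri, labels in LABELS.items()
        if word.lower() in (l.lower() for l in labels.values())
    }
    return [r["id"] for r in records if r["material"] in uris]


print(display(records[0], "de"))  # Papier
print(find_by_label("papier", records))  # ['record-1', 'record-2']
```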
