Welcome to the Metadata Editor. This tool is designed to create and edit metadata records compliant with the HealthDCAT-AP metadata standard.
The editor is integrated with a FAIR Data Point (FDP) Metadata Catalogue, which stores all metadata. No data is saved within the editor itself.
Procedure
tbd.
tbd.
HealthDCAT-AP Metadata Standard
HealthDCAT-AP is designed as an extension of DCAT-AP and incorporates its principal classes, such as dcat:Catalog, dcat:CatalogRecord, dcat:Dataset, dcat:Distribution, and dcat:DataService. This application profile customises the DCAT data model for catalogued health resources within the scope of the proposal for a Regulation of the European Parliament and of the Council on the European Health Data Space [EUR-Lex - 52022PC0197]. DCAT stands for Data Catalog Vocabulary. It is a standard for describing data available in online catalogues, making datasets easier to find and understand. Imagine a library catalogue, but for datasets instead of books. DCAT is fundamentally based on Uniform Resource Identifiers (URIs), which lets it use the structure of the web to organise and identify datasets.
Here's why using URIs is advantageous:
Unique Identification: Every dataset has a unique URI, ensuring clear reference, like a library's call number.
Standardization: URIs provide a consistent format, simplifying data sharing and discovery.
Linkability: URIs allow datasets to connect with metadata and related resources, creating a web of references.
Centralized Updates: Changes made once via a URI are visible to all users, ensuring up-to-date information.
Accessibility: URIs make data accessible globally, enabling wider collaboration.
Integration: Datasets with URIs can easily integrate into web applications and services.
Persistence: Well-designed URIs remain stable over time, ensuring long-term data access.
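As an illustration of the points above, the following minimal Python sketch writes a dataset and one distribution as DCAT Turtle, each identified by its own URI. The example.org URIs and the function names are hypothetical placeholders and not part of the editor; only the dcat: and dct: namespaces are taken from the DCAT vocabulary.

```python
# Minimal sketch: describe a dataset and one distribution by URI in Turtle.
# The example.org URIs are placeholders, not real catalogue identifiers.
PREFIXES = (
    "@prefix dcat: <http://www.w3.org/ns/dcat#> .\n"
    "@prefix dct: <http://purl.org/dc/terms/> .\n"
)


def escape_literal(text: str) -> str:
    """Escape backslashes and double quotes for a single-line Turtle literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def dataset_to_turtle(dataset_uri: str, title: str, distribution_uri: str) -> str:
    """Return a Turtle snippet that links a dataset URI to a distribution URI."""
    return (
        f"{PREFIXES}\n"
        f"<{dataset_uri}> a dcat:Dataset ;\n"
        f'    dct:title "{escape_literal(title)}"@en ;\n'
        f"    dcat:distribution <{distribution_uri}> .\n"
        f"\n"
        f"<{distribution_uri}> a dcat:Distribution .\n"
    )


turtle = dataset_to_turtle(
    "https://example.org/dataset/link-vacc",
    "Linking of registers for COVID-19 vaccine surveillance",
    "https://example.org/distribution/link-vacc-sample",
)
print(turtle)
```

Because the dataset and the distribution each have their own URI, other records can reference either one without copying its description.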
Push Metadata
This feature enables you to transfer your record to the Metadata Catalogue.
Procedure
tbd.
tbd.
Backup RDF
This feature allows you to create a local backup of your work in progress.
Procedure
tbd.
tbd.
Import RDF
This feature enables you to import a local backup of your work in progress or an external DCAT RDF Turtle file. If the RDF was not created with this editor, it should be compliant with DCAT-AP 3.0. Parsing errors may occur, for example, if a property is linked to free text instead of a URI, if the Dataset is a blank node, or if the file relies on an outdated version of DCAT. Only recognized properties will be loaded into the editor.
The import function will only incorporate properties of the Dataset, Distributions, Samples, and Analytics classes.
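Two of the parsing problems named above can be checked before import. The sketch below is only an illustration and does not reflect how the editor itself parses files: it assumes the triples have already been parsed into (subject, predicate, object) strings in an N-Triples-like form, and the set of URI-valued properties is a hypothetical selection.

```python
# Terms are N-Triples-like strings: "<uri>", "_:blank" or '"literal"'.
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
DCAT_DATASET = "<http://www.w3.org/ns/dcat#Dataset>"

# Hypothetical selection of properties whose values should be URIs.
URI_VALUED = {
    "<http://purl.org/dc/terms/publisher>",
    "<http://www.w3.org/ns/dcat#theme>",
}


def find_import_problems(triples):
    """Return human-readable warnings for a list of (s, p, o) triples."""
    problems = []
    for subject, predicate, obj in triples:
        # A dataset typed as dcat:Dataset must not be a blank node.
        if predicate == RDF_TYPE and obj == DCAT_DATASET and subject.startswith("_:"):
            problems.append("Dataset is a blank node")
        # A URI-valued property must not point to a literal (free text).
        if predicate in URI_VALUED and obj.startswith('"'):
            problems.append(f"{predicate} points to free text instead of a URI")
    return problems


sample = [
    ("_:b0", RDF_TYPE, DCAT_DATASET),
    ("_:b0", "<http://purl.org/dc/terms/publisher>", '"Sciensano"'),
]
for warning in find_import_problems(sample):
    print(warning)
```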
Procedure
tbd.
tbd.
Validate Your Metadata
Publish Your Metadata
Use this menu to publish your metadata. Enter your credentials to finalize the submission and add the record to the catalogue.
If you do not already have credentials, please contact EHDS2@sciensano.be
If you don’t have credentials, please enter your email. Your metadata will be published as a draft and won't be immediately visible on the catalogue page.
You are about to duplicate this record. Please select the new catalogue from the list below to continue.
Advanced Mode
You are about to create a backup of this metadata record. This ensures that all changes are saved and can be restored if necessary.
Status
Next?
Dataset Discovery
A set of metadata properties that describe essential attributes of the dataset, facilitating its intelligibility, relevance, and usability for a variety of purposes. These properties provide insights into the content, scope, context, etc.
Definition: A name given to the dataset.
Usage: This property can be repeated for parallel language versions of the name.
Example: Linking of registers for COVID-19 vaccine surveillance
Definition: Alternative title of the dataset such as an acronym.
Example: LINK-VACC
Definition: A keyword or tag describing the dataset.
Example: Corona virus
Definition: A free-text account of the dataset.
Usage: This property can be repeated for parallel language versions of the description.
Example: The LINK-VACC project links selected variables from existing registries for COVID-19 vaccine surveillance, in order to ensure the monitoring of COVID-19 vaccines in the phase following their marketing authorization (post-authorization surveillance). This includes the measurement of uptake and coverage of the vaccination, the estimation of vaccine effectiveness, and continuous monitoring of the vaccine's safety. For these purposes, existing pseudonymized data on COVID-19 laboratory test results, hospitalized COVID-19 patients, COVID-19 vaccinations, underlying health problems, socio-demographic and -economic factors, and healthcare worker status are linked.
Definition: A statement about the lineage of a dataset.
Usage: Information about how the data was collected, including methodologies, tools, and protocols used.
Example: The data for the LINK-VACC project is sourced from several existing databases, including Vaccinnet+, HealthData COVID-19 database (Contact tracing and Clinic database), CoBRHA, STATBEL, and the AIM database. These databases collectively provide comprehensive demographic, clinical, and socio-economic data relevant to the project's objectives.
Definition: A free-text statement of the purpose of the processing of data or personal data.
Example: The primary objective of Sciensano's LINK-VACC project is to monitor COVID-19 vaccines post-authorization and evaluate the public health value of prioritizing vaccination for people with comorbidities. This involves assessing the vaccines' effectiveness and safety in the broader population context, beyond the limited scope of clinical trials, and determining future vaccination policies in public health emergencies such as epidemics or pandemics.
Release date definition: The date of formal issuance (e.g., publication) of the dataset.
Modification date definition: The most recent date on which the Dataset was changed or modified.
Release Date
Modification Date
Definition: A definition of the population within the dataset.
Example: The population targeted by the LINK-VACC project comprises all individuals in Belgium who have received a COVID-19 vaccine, undergone testing for COVID-19, or have been hospitalized with a confirmed diagnosis of COVID-19. The project also considers healthcare professionals and the general Belgian population for understanding vaccination coverage and effectiveness, especially among those with comorbidities and varying socio-economic backgrounds.
Definition: A geographic region that is covered by the dataset.
Example: Belgium
Select
Definition: A temporal period that the dataset covers.
Start Date
End Date
Definition: A language of the dataset.
Usage: This property can be repeated if there are multiple languages in the dataset.
Example: English.
Select
Definition: A temporal period for which the dataset is available for secondary use.
Start Date
End Date
Definition: The frequency at which the dataset is updated.
Select
Definition: The minimum spatial separation resolvable in a dataset, measured in meters.
Example: 10.3
Number
Definition: The minimum time period resolvable in the dataset.
Years
Months
Days
Hours
Minutes
Seconds
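In DCAT, temporal resolution is expressed as an xsd:duration value. The sketch below shows one way the six fields above could be combined into such a value; the function name is hypothetical, and whether the editor serialises the fields in exactly this way is an assumption.

```python
def to_xsd_duration(years=0, months=0, days=0, hours=0, minutes=0, seconds=0):
    """Combine the six form fields into an xsd:duration string."""
    parts = (years, months, days, hours, minutes, seconds)
    if any(p < 0 for p in parts):
        raise ValueError("duration components must not be negative")
    # Date components come before the "T" separator, time components after it.
    date_part = "".join(f"{v}{u}" for v, u in zip(parts[:3], "YMD") if v)
    time_part = "".join(f"{v}{u}" for v, u in zip(parts[3:], "HMS") if v)
    if not (date_part or time_part):
        return "PT0S"  # zero-length duration
    return "P" + date_part + ("T" + time_part if time_part else "")


print(to_xsd_duration(days=1))                # P1D
print(to_xsd_duration(hours=1, minutes=30))   # PT1H30M
```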
Contacts
All forms for submitting contact details related to the dataset.
To ensure accuracy, consistency, and ease of reference, contact information should preferably be provided as URIs from an authoritative register, that is, global identifiers that can be dereferenced (accessed via HTTP) to retrieve RDF data.
Definition: An entity (organisation) responsible for making the dataset available.
Usage: In addition to the Publisher information, the Publisher Type (Definition: A type of organisation that makes the dataset available) must be provided as well as a Publisher Note (Definition: A description of the publisher activities).
Example of Publisher Note: Sciensano is a research institute and the national public health institute of Belgium. It is a so-called federal scientific institution that operates under the authority of the federal minister of Public Health and the federal minister of Agriculture of Belgium.
Provide Register URI
Name
URL
Mail
URI
Trusted Data Holder
Select
Publisher Type
Select
Publisher Note
Definition: Health Data Access Body supporting access to data in the Member State.
Provide HDAB Register URI
Name
URL
Mail
Select
Select
Definition: An entity responsible for producing the dataset.
Definition: Contact information that can be used for sending comments about the dataset.
Documentation
A collection of metadata properties that provide detailed information about the dataset's documentation, lineage, relationships, and legal context. To enhance accessibility, HTTP URIs are used to reference or resolve these resources, allowing users to directly access relevant information.
Definition: A page or document about this dataset.
Definition: A web page that provides access to the dataset, its distributions and/or additional information.
Example: https://sciensano.service-now.com/sp
URI
Definition: A description of a relationship with another resource.
Example: The LINK-VACC project is related to five other existing projects. Each relationship is expressed as a "Qualified relation" that provides the landing page (URL) of the project and uses a controlled vocabulary to define the nature of the relationship.
Definition: An Agent having some form of responsibility for the resource.
Example: The Belgian Public Health Institute is the "processor" of the LINK-VACC dataset.
Definition: An activity that generated, or provides the business context for, the creation of the dataset.
Example: Data linkage.
Definition: A statement related to the quality of the dataset, such as a rating, quality certificate, or feedback that can be associated with the dataset.
Definition: The version indicator (name or identifier) of a resource.
Text
Definition: A description of the differences between this version and a previous version of the Dataset.
Usage: This property can be repeated for parallel language versions of the version notes.
Definition: A related dataset that is a version, edition, or adaptation of the described dataset.
Definition: This property refers to a related dataset of which the described dataset is a version, edition, or adaptation.
URI
Definition: A related dataset from which the described dataset is derived.
URI
Definition: A related resource, such as a publication, that references, cites, or otherwise points to the dataset.
URI
Definition: A related resource.
URI
Definition: The legal basis used to justify processing of personal data.
Definition: A secondary identifier of the dataset, such as MAST/ADS, DataCite, DOI, EZID or W3ID.
Definition: TBD.
Example: TBD.
URI
Categorisation
A set of metadata properties that classify and describe the key characteristics and compliance aspects of the dataset. These properties serve as filters within a data catalogue.
Example: Health (mandatory by default, with the option to add other themes)
Select
Definition: The legislation that mandates the creation or management of the dataset.
Usage: For health datasets, the value must include the ELI of the EHDS Regulation. Multiple legislations may apply to the dataset.
Example: http://data.europa.eu/eli/reg/2022/868/oj (ELI of the Data Governance Act)
URI
Definition: The health category to which this dataset belongs as described in the Commission Regulation on the European Health Data Space laying down a list of categories of electronic data for secondary use, Art.33.
Select
Definition: A category of the dataset or tag describing the dataset.
Usage: A dataset may be associated with multiple themes. Wikidata HTTP URIs MUST be used.
Example: http://www.wikidata.org/entity/Q84263196
URI
Definition: A type of the dataset.
Usage: A dataset may be associated with multiple dataset types like "statistical" and "High Value Dataset".
Example: Personal data (for personal electronic health data)
Select
Definition: Key elements that represent an individual in the dataset.
Usage: The list of key elements representing an individual in the dataset is expected to be comprehensive and complete.
Example: Age Exact, Blood type, Current Employment, etc.
Select
Definition: An implementing rule or other specification for the dataset (e.g., a formal standard).
Usage: Wikidata HTTP URIs MUST be used.
Definition: Coding systems in use (e.g., ICD-10-CM, DRGs, SNOMED CT).
Usage: Wikidata HTTP URIs MUST be used. For each coding system, the most relevant code values used in the dataset should be provided (Specify Code Values).
Example: https://www.wikidata.org/wiki/Q24254958 (Orphanet Rare Disease Ontology) Examples of code values: 26348 [Identifier (HTTP URI): http://www.orpha.net/ORDO/Orphanet_26348] (Acquired prothrombin deficiency), etc.
Definition: The typical minimum and maximum age range of the population represented in the dataset.
Usage: The values provided should indicate the general age coverage and must not reveal any personal information (e.g., use ranges like "0-106").
Minimum age
Number
Maximum age
Number
Definition: Size of the dataset in terms of the number of records.
Usage: An approximate count of the records is expected.
Number
Definition: Number of records for unique individuals.
Usage: An approximate count of the records is expected.
Number
Data Access
A set of metadata properties that describe various ways to access and interact with the dataset:
Dataset Distribution: Describes how the dataset is made accessible.
Dataset Sample: Provides subsets or representative examples of the dataset to facilitate evaluation and understanding.
Dataset Analytics: Offers insights into the dataset by describing the analytical tools, services, or methods available to users for deriving value from the data.
Definition: Information that indicates whether the Dataset is publicly accessible, has access restrictions or is not public.
Example:
Public: The dataset is available under general open data rules, such as those covered by the High Value Datasets Implementing Regulation.
Restricted: The dataset contains protected data and is accessible only under specific conditions, as outlined in regulations like the Data Governance Act.
Non-public: The dataset includes resources that may contain sensitive or personal information, falling under regulations such as the EHDS Regulation.
Select
Definition: An available distribution for the dataset.
Usage: For sensitive health datasets (e.g., personal electronic health data), a distribution must include the landing page of the Health Data Access Body supporting data access.
Definition: A sample distribution of the dataset.
Usage: For sensitive data, HealthDCAT-AP requires data holders to provide a sample distribution of the dataset (e.g., mock-up data, anonymized data, synthetic data, etc.) in any computer-readable format (e.g., CSV, JSON). If applicable, a data dictionary should also be published. The data dictionary must be published using CSVW, resulting in an RDF format for the sample distribution. A more complex use case involves merging both requirements by simultaneously producing the dataset sample as tabular data along with its data dictionary using CSVW.
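A CSVW data dictionary is a JSON metadata document that describes the columns of a CSV file. The following sketch builds a small one in Python; the file name, the variables, and the function name are hypothetical, and the result is only a minimal outline that should be checked against the CSVW specification before publication.

```python
import json


def build_csvw_dictionary(csv_url, variables):
    """Build a CSVW metadata document for a sample CSV file.

    `variables` is a list of (name, title, datatype, description) tuples.
    """
    return {
        "@context": "http://www.w3.org/ns/csvw",
        "url": csv_url,
        "tableSchema": {
            "columns": [
                {
                    "name": name,
                    "titles": title,
                    "datatype": datatype,
                    "dc:description": description,
                }
                for name, title, datatype, description in variables
            ]
        },
    }


# Hypothetical variables of a mock-up sample file.
dictionary = build_csvw_dictionary(
    "sample.csv",
    [
        ("age", "Age", "integer", "Age of the individual in years"),
        ("vaccination_date", "Vaccination date", "date", "Date of vaccine administration"),
    ],
)
print(json.dumps(dictionary, indent=2))
```

By CSVW convention, the metadata for sample.csv is saved next to it as sample.csv-metadata.json.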
You can import variables into the HealthDCAT-AP editor using a prefilled Excel or CSV template. Select your preferred format below to download the corresponding template. Once the template is filled, upload it to integrate variable descriptions into your metadata record.
Definition: An analytics distribution of the dataset.
Usage: Data holders are encouraged to provide HTTP URIs pointing to API endpoints or document repositories where users can access or request associated resources, such as technical reports on the dataset, quality measurements, and usability indicators, or analytics services such as data visualization tools.
Example: http://atlas.ecdc.europa.eu/public/index.aspx (Surveillance Atlas of Infectious Diseases)
A set of metadata properties used primarily for metadata management within a data repository. These properties, often referred to as administrative metadata, support the internal management of datasets and are typically not intended for exchange with external systems or users. Organisations may also add their own properties to further tailor the management of datasets according to their specific needs.
Definition: The main identifier for the Dataset, e.g. the URI or other unique identifier in the context of the Catalogue.
Usage: The use of persistent dereferenceable URIs (i.e., HTTP URIs) is mandatory in the HealthDCAT-AP profile.
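A quick syntactic check for an HTTP URI can be sketched with the Python standard library. The function name is hypothetical, and the check only looks at the scheme and host: it cannot tell whether the URI is persistent or actually dereferenceable.

```python
from urllib.parse import urlparse


def is_http_uri(value: str) -> bool:
    """Return True if the value has an http(s) scheme and a host part."""
    parsed = urlparse(value.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


# The example.org identifier is a placeholder.
for candidate in ("https://example.org/dataset/link-vacc", "urn:uuid:1234", "LINK-VACC"):
    print(candidate, is_http_uri(candidate))
```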
Definition: A timestamp indicating the last date of revision of the metadata.
Usage: The EHDS Regulation states that the health data holder shall, at a minimum, on an annual basis check that its dataset description in the national catalogue is accurate and up to date.
Metadata Update Date
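The annual check described above can be turned into a simple reminder. The sketch below assumes that "annual" means one calendar year after the last metadata update; that interpretation and the function name are illustrative assumptions, not the wording of the Regulation.

```python
from datetime import date


def is_review_overdue(last_update: date, today: date) -> bool:
    """Return True if more than one calendar year has passed since last_update."""
    try:
        deadline = last_update.replace(year=last_update.year + 1)
    except ValueError:
        # 29 February has no counterpart in the following year; use 28 February.
        deadline = last_update.replace(year=last_update.year + 1, day=28)
    return today > deadline


print(is_review_overdue(date(2024, 1, 15), date(2025, 1, 16)))
```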
Data Quality
[Under Development] A set of metadata properties to generate a data quality and utility label (certificate) of the related health dataset.