A semantic framework defines the meanings of domain concepts and the relationships between them. Each concept is identified by a persistent unique identifier. Two datasets that refer to the same concept can therefore be identified easily, because their semantic annotation is consistent across the entire data space.
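To illustrate why consistent identifiers make shared concepts easy to detect, here is a minimal sketch. It assumes that each dataset's metadata carries a set of concept URIs. The field name `concepts`, the dataset names, and the `http://example.org/concept/...` URIs are hypothetical placeholders; in practice they would be persistent identifiers such as Wikidata entity URIs.

```python
"""Sketch: find datasets that share a concept via persistent identifiers.

Assumption: the dataset names, the "concepts" field and the example.org
URIs are hypothetical placeholders, not part of any real catalogue.
"""
from itertools import combinations

# Hypothetical dataset records annotated with persistent concept URIs.
DATASETS = {
    "registry-a": {"concepts": {"http://example.org/concept/C1",
                                "http://example.org/concept/C2"}},
    "survey-b": {"concepts": {"http://example.org/concept/C2",
                              "http://example.org/concept/C3"}},
    "trial-c": {"concepts": {"http://example.org/concept/C4"}},
}


def shared_concepts(datasets):
    """Return {(name_a, name_b): shared URIs} for every pair sharing a concept."""
    links = {}
    # Sort by dataset name so that the pair order is deterministic.
    for (a, meta_a), (b, meta_b) in combinations(sorted(datasets.items()), 2):
        # Identical identifiers mean identical concepts: a plain set
        # intersection is enough, with no label matching or fuzzy search.
        common = meta_a["concepts"] & meta_b["concepts"]
        if common:
            links[(a, b)] = common
    return links


if __name__ == "__main__":
    for pair, uris in shared_concepts(DATASETS).items():
        print(pair, sorted(uris))
```

Because every dataset uses the same identifier for the same concept, the comparison reduces to exact matching. Without shared identifiers, the same check would require matching free-text labels across languages and spellings.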
The common abstractions of a semantic framework are:
- Entities: objects, concepts, or things within a domain. Each entity is a distinct object that can be uniquely identified.
- Attributes: properties or characteristics of entities that provide more information about an entity.
- Relationships: descriptions of how entities are connected or related to one another. They help define the interactions between different entities.
- Generalisation ("is_a" relations): classes (or types) are used to categorise entities into groups based on shared characteristics. Each instance of a class is an individual entity.
- Hierarchies and taxonomies: organise entities into parent-child relationships, creating a structured classification system.
- Ontologies: comprehensive frameworks that define the entities, attributes, relationships, and rules within a specific domain. They provide a formal representation of knowledge.
- Identifiers: unique values assigned to entities to distinguish them from one another.
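The following minimal sketch shows how these abstractions fit together. Entities, attributes, and relationships are stored as (subject, predicate, object) triples. The sketch then computes the transitive closure of "is_a" so that an instance is also classified under every more general class. The labels, predicates, and the `case_0001` entity are illustrative assumptions and are not taken from Wikidata or any specific ontology.

```python
"""Sketch: entities, attributes, relationships and "is_a" generalisation.

Assumption: all triples, labels and predicate names below are illustrative
only; they are not taken from Wikidata or any specific ontology.
"""
from collections import defaultdict

# (subject, predicate, object) triples.
TRIPLES = [
    ("type_2_diabetes", "is_a", "diabetes_mellitus"),      # generalisation
    ("diabetes_mellitus", "is_a", "metabolic_disease"),
    ("metabolic_disease", "is_a", "disease"),
    ("case_0001", "instance_of", "type_2_diabetes"),       # instance -> class
    ("case_0001", "onset_year", 2019),                      # attribute
    ("case_0001", "recorded_in", "registry_a"),             # relationship
]


def build_index(triples):
    """Index triples as subject -> predicate -> set of objects."""
    index = defaultdict(lambda: defaultdict(set))
    for subject, predicate, obj in triples:
        index[subject][predicate].add(obj)
    return index


def ancestors(index, cls):
    """All superclasses reachable through "is_a" (transitive closure)."""
    seen, stack = set(), [cls]
    while stack:
        for parent in index[stack.pop()]["is_a"]:
            if parent not in seen:  # the visited set also guards against cycles
                seen.add(parent)
                stack.append(parent)
    return seen


def classes_of(index, entity):
    """Direct classes of an entity plus every class they generalise to."""
    result = set()
    for cls in index[entity]["instance_of"]:
        result |= {cls} | ancestors(index, cls)
    return result


if __name__ == "__main__":
    index = build_index(TRIPLES)
    print(sorted(classes_of(index, "case_0001")))
```

Following "is_a" transitively is what makes a hierarchy useful for retrieval. A search for the general class `disease` also finds `case_0001`, even though that entity is only directly annotated with the most specific class.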
There are two options for implementing an effective, harmonised semantic framework for the European Health Data Space (EHDS):
Option 1: Implement the semantic framework from scratch using Linked Data technologies. For example, Wikidata and Wikibase are the tools SEMIC has chosen to enable collaboration in creating semantic data models.
Tools: https://joinup.ec.europa.eu/collection/semic-support-centre/wikidata-and-wikibase
Example of a domain semantic framework: https://linkedopendata.eu/wiki/The_EU_Knowledge_Graph
(Resources provided by Andrea Perego)

Option 2: Use an existing, operational semantic framework to speed up the implementation phase of the data space. The solution proposed for the European Health Data Space, and already in use in HealthDCAT-AP, is to rely on Wikidata.org as a large-scale, human-readable and machine-readable, multilingual, multidisciplinary, collaborative, centralised, editable, structured, and linked knowledge base. The scientific publication 'Wikidata: A large-scale collaborative ontological medical database' sets out the advantages Wikidata.org offers as a 'large-scale semantic framework' and a 'valuable medical resource' for the EHDS. The paper 'Wikidata as a knowledge graph for the life sciences' provides further insight into Wikidata's suitability for the EHDS.
'The lack of an integrated and structured version of biomedical knowledge hinders efficient querying or mining of that information, thus preventing the full utilisation of our accumulated scientific knowledge.'
'Interoperable: Wikidata items are extensively cross-linked to other biomedical resources using Universal Resource Identifiers (URIs), which unambiguously anchor these concepts in the Linked Open Data cloud (Jacobsen et al., 2018). Wikidata is also available in many standard formats in computer programming and knowledge management, including JSON, XML, and RDF.'
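Because Wikidata is available as JSON, a single concept URI can be resolved to labels in many languages. The minimal sketch below reads the multilingual labels from a Wikidata-style entity document and keys them by the concept URI (`http://www.wikidata.org/entity/<QID>`). The assumptions are as follows. `SAMPLE` imitates the shape of the entity JSON served at `https://www.wikidata.org/wiki/Special:EntityData/<QID>.json`, trimmed to the `labels` member. The item ID `Q0` and its labels are placeholders, not a real Wikidata item. No network access is performed.

```python
"""Sketch: read multilingual labels from a Wikidata-style JSON entity.

Assumption: SAMPLE mimics the shape of Wikidata's entity JSON, trimmed to
the "labels" member; the item ID "Q0" and its labels are placeholders.
"""
import json

SAMPLE = """
{"entities": {"Q0": {"id": "Q0", "type": "item",
  "labels": {"en": {"language": "en", "value": "example concept"},
             "fr": {"language": "fr", "value": "concept d'exemple"},
             "de": {"language": "de", "value": "Beispielkonzept"}}}}}
"""

# Wikidata concept URIs use this base followed by the item ID.
CONCEPT_URI_BASE = "http://www.wikidata.org/entity/"


def labels_by_uri(entity_json):
    """Map each entity's concept URI to a {language code: label} dictionary."""
    data = json.loads(entity_json)
    result = {}
    for qid, entity in data["entities"].items():
        labels = {lang: item["value"]
                  for lang, item in entity.get("labels", {}).items()}
        result[CONCEPT_URI_BASE + qid] = labels
    return result


if __name__ == "__main__":
    for uri, labels in labels_by_uri(SAMPLE).items():
        print(uri, labels.get("en"), labels.get("fr"), labels.get("de"))
```

The URI stays the same whichever language a user reads. This is what lets metadata written in different EU languages point at one shared concept.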
GeoNames: an external semantic resource used in DCAT-AP
GeoNames.org is a geographical database that integrates various geographical data sources. It is accessible through various web services, is available under a Creative Commons Attribution 4.0 licence, and follows a community-driven model. The database contains over 25 million geographical names and consists of 11 million unique features, which include information such as location names, coordinates, and additional metadata. GeoNames identifiers can be used as persistent URIs in the DCAT-AP dct:spatial property to interlink geographical data with other datasets, enhancing the semantic richness of data integration.

Comment: HealthDCAT-AP has introduced several new properties, and DCAT-AP also includes properties for which it is beneficial to use Wikidata as a large authority vocabulary. Standardising the values of these properties as HTTP URIs is essential, and drawing these values from an authoritative vocabulary enhances consistency: