It leverages natural language processing (NLP), machine learning (such as Large Language Models), and ontologies to interpret the context, intent, and relationships between words in a query. Unlike traditional keyword-based search, which matches literal terms, semantic search algorithms consider the broader context of a query to better interpret user intent.

For example, Elasticsearch and OpenSearch are search engines that go beyond basic keyword matching. They can power semantic search, which interprets the intent behind a query instead of only matching keywords. By leveraging Natural Language Processing (NLP) and machine learning techniques, such as Large Language Models (LLMs), these tools improve how datasets are searched. For instance, a semantic search can recognise that 'myocardial infarction' and 'heart attack' are related terms, providing users with more accurate and relevant search results.

For non-technical users, this means they can use more natural language queries, such as 'What are the main causes of heart disease in Europe?' and receive contextually appropriate results. Technical users, on the other hand, will benefit from vector-based search, where terms are represented as numerical vectors, making it easier to capture relationships between different datasets.
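
The idea behind vector-based search can be illustrated with a minimal sketch. It assumes hand-written, three-dimensional toy vectors (`TOY_VECTORS` is hypothetical; a real system would obtain much larger vectors from an embedding model) and ranks terms by cosine similarity, a common but not the only possible similarity measure.

```python
import math

# Hypothetical 3-dimensional embeddings, hand-written for illustration only.
# A real system would obtain vectors from an embedding model.
TOY_VECTORS = {
    "heart attack": [0.90, 0.10, 0.05],
    "myocardial infarction": [0.88, 0.12, 0.07],
    "air quality": [0.05, 0.20, 0.95],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_term, vectors):
    """Rank all other terms by similarity to query_term, most similar first."""
    query_vec = vectors[query_term]
    scores = [
        (term, cosine_similarity(query_vec, vec))
        for term, vec in vectors.items()
        if term != query_term
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for term, score in rank_by_similarity("heart attack", TOY_VECTORS):
        print(f"{term}: {score:.3f}")
```

With these toy values, 'myocardial infarction' ranks well above 'air quality' for the query 'heart attack', even though the two medical terms share no keywords.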

Recognising the significance of semantic search as a major evolution in search technology, HealthDCAT-AP has introduced new free-text properties and RDF concepts to further extend the DCAT knowledge graph:

- Free-text properties in HealthDCAT-AP: Properties such as dct:description, healthdcatap:populationCoverage (what), dpv:hasPurpose (why), dct:provenance (how), and healthdcatap:publisherNote (who) are designed to enhance NLP/LLM capabilities.
Figure 3: Screenshot of a Proof Of Concept genAI application integrated in a dataset catalogue developed by the Belgian Health Data Agency.
- Ontology integration in HealthDCAT-AP: The extended HealthDCAT-AP knowledge graph is enriched with additional vocabularies and properties, such as the Data Privacy Vocabulary, healthdcatap:healthTheme and healthdcatap:hasCodeValues (based on SKOS), and applies Linked Data principles to create a more interconnected and meaningful search experience. Using the SPARQL Protocol and RDF Query Language (SPARQL), users can run precise, targeted queries that explore the relationships embedded within the graph. This integration lets users discover information that is both semantically enriched and contextually relevant, connecting and interpreting otherwise disparate data sources.
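
As a rough illustration of why a concept hierarchy helps search, the sketch below follows "broader" links between concepts, which is the kind of transitive lookup a SPARQL property path performs over SKOS or class hierarchies. The concept labels, dataset identifiers, and the `BROADER` and `DATASET_THEMES` tables are invented for this example and do not come from HealthDCAT-AP.

```python
# Hypothetical concept scheme: each concept maps to its broader concept,
# mirroring skos:broader links. Labels and dataset IDs are invented.
BROADER = {
    "myocardial infarction": "ischaemic heart disease",
    "ischaemic heart disease": "cardiovascular disease",
    "stroke": "cardiovascular disease",
    "asthma": "respiratory disease",
}

# Hypothetical catalogue: each dataset is tagged with one theme concept.
DATASET_THEMES = {
    "dataset-a": "myocardial infarction",
    "dataset-b": "stroke",
    "dataset-c": "asthma",
}


def is_same_or_narrower(concept, target, broader):
    """True if concept equals target or reaches it by following broader links."""
    seen = set()  # guards against accidental cycles in the hierarchy
    while concept is not None and concept not in seen:
        if concept == target:
            return True
        seen.add(concept)
        concept = broader.get(concept)
    return False


def find_datasets(target, dataset_themes, broader):
    """Return dataset IDs whose theme is the target concept or a narrower one."""
    return sorted(
        dataset_id
        for dataset_id, theme in dataset_themes.items()
        if is_same_or_narrower(theme, target, broader)
    )


if __name__ == "__main__":
    print(find_datasets("cardiovascular disease", DATASET_THEMES, BROADER))
```

A search for the broad concept 'cardiovascular disease' finds datasets tagged only with narrower concepts, which a plain keyword match on the tag would miss.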

This approach is exemplified in Rajaram Kaliyaperumal's presentation, "Metadata and FAIR Data Point," particularly in the slides titled "Why storing metadata in RDF is better?" In this presentation, two use cases are demonstrated, both relying on Linked Data principles to showcase the advantages of RDF-based metadata storage. These use cases highlight the potential of linked data infrastructures, illustrating how semantic web technologies can significantly enhance data discoverability and interoperability.

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX ejp: <http://purl.org/ejp-rd/vocabulary/>
PREFIX dcterm: <http://purl.org/dc/terms/>
PREFIX fdo: <http://rdf.biosemantics.org/ontologies/fdp-o#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

SELECT DISTINCT ?resource ?id ?title ?description ?homepage (STR(?disease_name) AS ?disease) ?country_name ?country_code WHERE {

  VALUES ?disease_iri { <http://www.orpha.net/ORDO/Orphanet_98056> }

  # GET all the sub diseases of our input disease class
  ?ordo_class_iri rdfs:subClassOf* ?disease_iri ;
                  rdfs:label ?disease_name .

  SERVICE <http://178.63.49.197:7300/repositories/ordo-catalog-fdp> {
    ?resource a ?type ;
              dcat:theme ?ordo_class_iri ;
              dcterm:description ?description ;
              dcterm:title ?title ;
              dcterm:publisher [ dcterm:spatial ?publisher_location ] ;
              dcat:landingPage ?homepage ;
              fdo:metadataIdentifier [ dcterm:identifier ?id ] .

    ?publisher_location skos:relatedMatch ?wiki_data_uri .
    ?wiki_data_uri rdfs:label ?country_name ;
                   wdt:P297 ?country_code .
  }
}

Example SPARQL query 1 (source and copyright 2022: Rajaram Kaliyaperumal, Semantic Web expert with software developer background). Following Linked Data principles, the query combines remote resources, namely a DCAT catalogue (FAIR Data Point), the Orphanet ORDO ontology, and Wikidata, to retrieve information on available resources.
Source: Metadata and FAIR DATA POINT (Rajaram Kaliyaperumal (Biosemantics group, Leiden))
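
A query such as the one above is typically sent to an endpoint over HTTP using the SPARQL Protocol. The following is a minimal sketch using only the Python standard library; the endpoint URL `https://example.org/sparql` and the helper names are hypothetical, and the endpoint is assumed to accept GET requests and to return the standard SPARQL JSON results format.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint, used only to illustrate the request shape.
ENDPOINT = "https://example.org/sparql"


def build_sparql_request(endpoint, query):
    """Build a SPARQL Protocol GET request asking for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def bindings_to_rows(results):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in results["results"]["bindings"]
    ]


def run_query(endpoint, query, timeout=30):
    """Send the query and return flattened rows (performs a network call)."""
    request = build_sparql_request(endpoint, query)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return bindings_to_rows(json.load(response))
```

Calling `run_query(ENDPOINT, query_text)` would return one dictionary per result row, keyed by the variable names in the SELECT clause; variables left unbound in a row are simply absent from that row's dictionary.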