Linked Data#
Linked data is structured data that can be queried in a semantic way in combination with a broader body of existing data. In effect, the data can be read and understood autonomously by computers by virtue of attached metadata with specific properties. Several web technologies facilitate the development of linked data, which when made available online can be regarded as part of the semantic web. The primary technologies are:
the Hypertext Transfer Protocol (HTTP)
Uniform Resource Identifiers (URIs) or the more general Internationalized Resource Identifiers (IRIs)
the Resource Description Framework (RDF)
RDF is a data model for metadata. It allows metadata to be expressed as a directed graph, with each edge expressed as a ‘triple’. The elements of the triple are the subject node (that the edge leads from), the predicate joining the two nodes and the object node that the edge enters. Each part can be identified by a URI; the object can alternatively be a literal. The RDF graph can be serialized, commonly with the Terse RDF Triple Language (Turtle), though RDF/XML and JSON-LD are also used. If it needs to be embedded in HTML then RDFa can be used. Other associated technologies are SPARQL, which is a query language for RDF graphs, and RDF Schema (RDFS) and the Web Ontology Language (OWL), which allow semantic descriptions of RDF data. A triplestore is a type of database for holding RDF graphs as triples. The Shapes Constraint Language (SHACL) is used to describe and validate RDF graphs.
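The triple model and the idea of querying it can be sketched in plain Python. This is only an illustration, not how a real triplestore or SPARQL engine is implemented: the `Triple` type, the `match` helper and the `http://example.org/` namespace are assumptions made for the example, and the output format is N-Triples (a line-based subset of Turtle) rather than full Turtle.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    subject: str           # URI of the node the edge leaves
    predicate: str         # URI naming the edge
    obj: str               # URI of the node the edge enters, or a literal value
    literal: bool = False  # True when obj is a literal rather than a URI

EX = "http://example.org/"           # hypothetical namespace for the example
FOAF = "http://xmlns.com/foaf/0.1/"  # FOAF vocabulary namespace

graph = [
    Triple(EX + "alice", FOAF + "name", "Alice", literal=True),
    Triple(EX + "alice", FOAF + "knows", EX + "bob"),
    Triple(EX + "bob", FOAF + "name", "Bob", literal=True),
]

def to_ntriples(t: Triple) -> str:
    """Serialize one triple as an N-Triples line (handles only backslash and quote escapes)."""
    if t.literal:
        escaped = t.obj.replace("\\", "\\\\").replace('"', '\\"')
        obj = f'"{escaped}"'
    else:
        obj = f"<{t.obj}>"
    return f"<{t.subject}> <{t.predicate}> {obj} ."

def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t.subject == s)
            and (p is None or t.predicate == p)
            and (o is None or t.obj == o)]

lines = [to_ntriples(t) for t in graph]
known_by_alice = [t.obj for t in match(graph, s=EX + "alice", p=FOAF + "knows")]
```

The wildcard pattern in `match` loosely mirrors a single SPARQL triple pattern such as `ex:alice foaf:knows ?who`; a real query engine also supports joins across patterns, filters and more.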
Ontologies in the context of linked data are a way to describe classes of objects (individuals), via nouns, and relations (property assertions) between the objects, via verbs. They are constructed in such a way as to allow autonomous formal logical reasoning (OWL is grounded in description logics) based on the relations they describe. That is, new information can be generated about an object or relationship through inference based on links described by an ontology. OWL allows for the authoring of ontologies.
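As a rough illustration of inference, the sketch below applies two simple RDFS-style rules (subclass transitivity and type inheritance) until no new triples appear. It is deliberately much weaker than an OWL reasoner, and the `ex:` class and individual names are hypothetical.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

# Asserted facts: a small class hierarchy plus one individual (hypothetical names).
facts = {
    ("ex:Student", SUBCLASS_OF, "ex:Person"),
    ("ex:Person", SUBCLASS_OF, "ex:Agent"),
    ("ex:alice", RDF_TYPE, "ex:Student"),
}

def infer(triples):
    """Apply two RDFS-style rules repeatedly until a fixed point is reached."""
    closure = set(triples)
    while True:
        new = set()
        for (s, p, o) in closure:
            for (s2, p2, o2) in closure:
                # Only chain through subclass links that start where this triple ends.
                if o != s2 or p2 != SUBCLASS_OF:
                    continue
                if p == SUBCLASS_OF:
                    new.add((s, SUBCLASS_OF, o2))  # A ⊑ B, B ⊑ C  =>  A ⊑ C
                elif p == RDF_TYPE:
                    new.add((s, RDF_TYPE, o2))     # x : A, A ⊑ B  =>  x : B
        if new <= closure:
            return closure
        closure |= new

# Triples that were not asserted but follow from the hierarchy.
inferred = infer(facts) - facts
```

Here `inferred` holds the statements that `ex:alice` is also an `ex:Person` and an `ex:Agent`, and that `ex:Student` is a subclass of `ex:Agent`, none of which were stated directly.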
Vocabularies and Ontologies#
Vocabularies can be used in RDF graphs, such as through RDFS or OWL. Persistent URLs (PURLs) are often used to host vocabularies, so that they can continue to be resolved even if their hosting address changes. purl.org and similar are often seen in vocabulary and ontology hosting.
Dublin Core: A general-purpose metadata vocabulary for describing resources. See also RFC 2413 and http://purl.org/dc/elements/1.1/.
Friend of a Friend (FOAF): Ontology describing persons and their relationships with other people and objects. Used in WebID.
VCard: Description of people and organisations
Org: Description of organisations
Provenance, Authoring and Versioning (PAV): IRI http://purl.org/pav/
SPDX: allows exchange of data about software packages
Web Annotation Ontology
Simple Knowledge Organisation System (SKOS)
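Vocabulary terms are usually written in RDF serializations as prefixed names that expand to full IRIs under the vocabulary's namespace. The sketch below shows that expansion and its inverse; the `expand` and `compact` helpers are hypothetical names for this example, while the namespace IRIs are those published for Dublin Core, FOAF, PAV and SKOS.

```python
PREFIXES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "pav": "http://purl.org/pav/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}

def expand(name: str, prefixes=PREFIXES) -> str:
    """Expand a prefixed name such as 'dc:title' into a full IRI."""
    prefix, sep, local = name.partition(":")
    if not sep or prefix not in prefixes:
        raise KeyError(f"unknown prefix in {name!r}")
    return prefixes[prefix] + local

def compact(iri: str, prefixes=PREFIXES) -> str:
    """Shorten a full IRI to a prefixed name when a known namespace matches."""
    # Try the longest namespace first so the most specific match wins.
    for prefix, namespace in sorted(prefixes.items(), key=lambda kv: -len(kv[1])):
        if iri.startswith(namespace):
            return f"{prefix}:{iri[len(namespace):]}"
    return iri  # no known namespace: leave the IRI unchanged

title_iri = expand("dc:title")
short = compact("http://xmlns.com/foaf/0.1/Person")
```

Because the namespaces are persistent URLs, the expanded IRIs stay stable even if the vocabulary's actual hosting location changes.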
Data and Datasets#
VoID: RDFS vocabulary for expressing metadata about RDF datasets
Data Catalog Vocabulary (DCAT): Interoperability for data catalogs published on the web, widely used in the EU public sector. W3C resource and namespace: http://www.w3.org/ns/dcat#.
RDF Data Cube Vocabulary: Vocabulary for describing data cubes, for multidimensional data
DCAT-AP: DCAT Application Profile for data portals in Europe.
StatDCAT-AP: DCAT-AP extension for statistical data sets in data portals.
Metadata Schema for the Description of Research Data Repositories
Data Quality Vocabulary
Dataset Usage Vocabulary
Collections:
schema.org: Vocabularies include internet media (Book, Movie, etc.), Person and Organization
EuroVoc: EU Vocabularies
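To give a feel for what a DCAT description looks like, the sketch below builds a minimal dataset record as JSON-LD using only the standard library. The dataset, its identifiers and the `example.org` URLs are invented for illustration; the `dcat:` and `dct:` terms used are from the DCAT and Dublin Core Terms vocabularies, but a real catalog entry (for example under DCAT-AP) would need more properties than shown here.

```python
import json

# A minimal, hypothetical dataset description using DCAT terms in JSON-LD.
dataset = {
    "@context": {
        "dcat": "http://www.w3.org/ns/dcat#",
        "dct": "http://purl.org/dc/terms/",
    },
    "@id": "http://example.org/dataset/air-quality",  # hypothetical identifier
    "@type": "dcat:Dataset",
    "dct:title": "Air quality measurements",
    "dcat:keyword": ["air", "environment"],
    "dcat:distribution": {
        "@id": "http://example.org/dataset/air-quality/csv",
        "@type": "dcat:Distribution",
        # The value is an IRI, so it is written as a node reference, not a string.
        "dcat:downloadURL": {"@id": "http://example.org/files/air-quality.csv"},
    },
}

doc = json.dumps(dataset, indent=2)
```

The `@context` block plays the same role as prefix declarations in Turtle: it maps the short `dcat:` and `dct:` prefixes to their namespace IRIs so that a JSON-LD processor can interpret the record as RDF.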
Software#
Python:
PyLADAPI: Python Linked Data API - plugin for Flask or FastAPI providing linked data functions.
Java:
Apache Jena: A Java framework for linked data applications that includes an RDF triplestore. See also Fuseki, its SPARQL server.
Research Objects#
Research Object model: https://wf4ever.github.io/ro/2016-01-28/
Research Object Bundle: https://www.researchobject.org/specifications/bundle/
Scientific Applications#
The Infrastructure for Spatial Information in Europe (INSPIRE) is a set of resources for spatial data information sharing in Europe. It includes a set of directives to make sure that published data is interoperable. Some INSPIRE resources:
Spatial data service type: https://inspire.ec.europa.eu/metadata-codelist/SpatialDataServiceType/
Degrees of Conformity: https://inspire.ec.europa.eu/metadata-codelist/DegreeOfConformity/
EU Open Data Portal: Merged into data.europa.eu
Further Reading#
Statistical Data and Metadata Exchange (SDMX): Community initiative to enhance the exchange of statistical data and metadata. Also includes the Content-Oriented Guidelines (COG).
SchemaVer: Semantic schema versioning