Reflections on implementing Linked Open Data on participating Nordic spatial infrastructure projects (Bergen)

October 2022

Øyvind Liland Gjesdal and Peder Gammeltoft


It is always interesting to try to put action behind one’s words. At the recent Nordic Spatial Humanities workshop in Bergen, we set ourselves the goal of implementing a common Linked Open Data (LOD) ontology for all the participating projects: Icelandic Saga Map, Mapping Saints, Norse World, Norwegian place-names, and Swedish place-name register. It was with some trepidation that the Bergen team, Henrik Askjer and the blog’s authors, Øyvind and Peder, set this goal. As you can read in the previous post, some of the participating projects were already LOD-compatible, whereas others were struggling with practical implementations and theoretical concepts.

The challenge was to get people into LOD ‘thinking’ while also aligning their data to the same ontology and data model. From what we saw at the previous workshop and gathered from the discussions there, we understood that the participating institutions’ data were both complex and quite heterogeneous, with ‘spatiality’ as the one element they all had in common.


Our chosen model for the workshop was to map our data to CIDOC-CRM (CIDOC CRM Special Interest Group 2022) using the profile defined in the Linked Art Data Model. “The Linked Art Data Model is an application profile that can be used to describe cultural heritage resources, with a focus on artworks and museum-oriented activities. It defines common patterns and terms to ensure that the resulting data can be easily used and is based on real-world data and use cases.” (Linked Art Community 2022a) These common patterns describe, with examples, how to use the CIDOC-CRM conceptual model in practice, combined with terms from the Getty vocabularies to describe types. For our purposes, we focused mainly on implementing the Places component (Linked Art Community 2022b). In addition, the Getty vocabularies are now also available using the linked.art profile (Getty 2022).


We also implemented the existing place types as SKOS concepts and used them in addition to, or instead of, the Getty vocabularies. The documentation from the Linked Art community has examples and illustrations of the model, which we could follow when we went through the OpenRefine process for writing Resource Description Framework (RDF) output.
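To make the pattern concrete, a place classified with a local SKOS concept could look roughly like the following in Linked Art's JSON-LD serialisation. This is a sketch under assumptions, not data from the workshop: every example.org URL, the labels, and the helper name make_place are hypothetical, and the keys follow our reading of the Linked Art Places documentation.

```python
import json

# Context document published by the Linked Art community.
LINKED_ART_CONTEXT = "https://linked.art/ns/v1/linked-art.json"


def make_place(place_url: str, name: str, type_url: str, type_label: str) -> dict:
    """Build a minimal Linked Art style Place record as a Python dict."""
    return {
        "@context": LINKED_ART_CONTEXT,
        "id": place_url,
        "type": "Place",
        "_label": name,
        # The place name is a separate Name entity holding the string.
        "identified_by": [{"type": "Name", "content": name}],
        # The place type points at a concept; here a local SKOS concept
        # stands in for a Getty vocabulary term.
        "classified_as": [
            {"id": type_url, "type": "Type", "_label": type_label}
        ],
    }


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    place = make_place(
        "https://example.org/place/0001",
        "Bergen",
        "https://example.org/placetype/town",
        "town",
    )
    print(json.dumps(place, indent=2, ensure_ascii=False))
```

Swapping the local concept URL for a Getty term only changes the "id" under classified_as, which is why the two kinds of vocabulary can be used side by side.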


The first step was to identify which entities were present in our data and decide how to name them and give them URLs. To keep our URLs persistent, we used OpenRefine’s ability to call out to Python (Jython) to create a version 3 UUID from a given seed. For example, the log from our Chronicles RDF export contains this entry from our Norse World OpenRefine project:


Create new column work_uuid based on column Work by filling 2687 rows with jython:import uuid return str(uuid.uuid3(uuid.NAMESPACE_DNS, "http://norseworld.uu.se/work/id" + value.encode('utf-8')))


This entry shows that we create a new column in OpenRefine. It is based on the value of the Work column, [1] which is unique per work; we prepend a Norse World URL to that value and generate a UUID from the combined string. The result is a unique, repeatable UUID that we can use to create the entity Work for the Norse World datasets. For Norse World, we created such columns for multiple entities, namely location id, attestation id, work id, and locality id. For more information about the entities, see https://www.uu.se/en/research/infrastructure/norseworld/infrastructure/data-and-metadata.
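The same idea can be sketched outside OpenRefine in plain Python 3. This is a minimal sketch rather than the workshop's code: the function name entity_uuid and the sample cell values are hypothetical, and only the prefix and the uuid.uuid3 call with uuid.NAMESPACE_DNS are taken from the log entry above.

```python
import uuid

# Prefix taken from the OpenRefine log entry above.
WORK_PREFIX = "http://norseworld.uu.se/work/id"


def entity_uuid(prefix: str, value: str) -> str:
    """Return a repeatable version 3 UUID for prefix + value.

    uuid3 hashes the namespace and the name with MD5, so the same
    input always yields the same UUID.
    """
    return str(uuid.uuid3(uuid.NAMESPACE_DNS, prefix + value))


if __name__ == "__main__":
    # "Example work" is a made-up cell value, not a real Norse World title.
    first = entity_uuid(WORK_PREFIX, "Example work")
    second = entity_uuid(WORK_PREFIX, "Example work")
    print(first == second)  # same seed, so the two UUIDs are identical
```

Because the UUID depends only on the prefix and the cell value, re-running the export on the same data reproduces the same identifiers, which is what keeps the URLs stable between exports.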


Once we had populated all the URLs we wanted to use for entities in OpenRefine, we started the RDF transformation using the RDF extension (Atescomp 2022). We added some new namespaces (crm: and skos:) and used the vocabulary import utility to import the downloadable vocabularies for CRM (version 7.1.1 is the latest version with a downloadable ontology) and SKOS. Together with the Linked Art documentation, the RDF extension then gave us easy access to the terms of our model. In addition, OpenRefine offers a preview window, which gave us a fast feedback cycle on our mappings.











Figure 1. RDF transform.


























Figure 2. RDF preview.


The modeling differed between some of the datasets, and we solved this by looking at and implementing other linked.art components where needed. For example, the Mapping Saints dataset implemented patterns from the Object component (Linked Art Community 2022c) and could also have drawn on the Person component.


On the last day of the workshop, we published our results to an Apache Jena Fuseki endpoint (Apache Software Foundation 2022). We then wrote some example SPARQL queries against the datasets we had created during the workshop. When our queries did not return the federated results we expected across the datasets, we found further mapping inconsistencies, which we promptly corrected before republishing. This gave us the desired results across the datasets.









Figure 3. SPARQL Query.
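A query of this general kind can be sketched as follows. It is an illustration under assumptions and not the query shown in the figure: the endpoint URL and the dataset name "nordic" are made up (3030 is Fuseki's default port), the function names are hypothetical, and the triple patterns assume places modeled as crm:E53_Place with names attached through crm:P1_is_identified_by, as in the Linked Art profile.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint; "nordic" is a made-up dataset name.
ENDPOINT = "http://localhost:3030/nordic/sparql"

# Namespace of the CIDOC-CRM ontology.
CRM = "http://www.cidoc-crm.org/cidoc-crm/"


def build_place_query(limit: int = 10) -> str:
    """Return a SPARQL query listing places with their names and types."""
    return f"""PREFIX crm: <{CRM}>
SELECT ?place ?name ?type WHERE {{
  ?place a crm:E53_Place ;
         crm:P1_is_identified_by ?appellation .
  ?appellation crm:P190_has_symbolic_content ?name .
  OPTIONAL {{ ?place crm:P2_has_type ?type . }}
}}
LIMIT {limit}"""


def build_request(endpoint: str, query: str) -> Request:
    """Prepare a SPARQL Protocol GET request asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


if __name__ == "__main__":
    request = build_request(ENDPOINT, build_place_query(limit=5))
    print(request.full_url)
    # To run it against a live endpoint:
    # from urllib.request import urlopen
    # print(urlopen(request).read().decode("utf-8"))
```

Since every dataset is mapped to the same classes and properties, one pattern like this should match places from all of them. A dataset that returns nothing is then a hint that its mapping deviates from the shared model.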


Further, we discussed that it would be helpful to point to the same things across the datasets. Most of our datasets use internal vocabularies that could be mapped to common ones, for example place types in the Getty TGN thesaurus and Wikidata/TGN for places. It would also be beneficial to expand our experiment to the complete datasets, to republish them in an endpoint for querying, and possibly to use them in a front-end application like Sampo-UI (Ikkala et al. 2021).
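Such an alignment could be recorded as SKOS mapping statements. The sketch below is only an illustration of the idea: the alignment table, all example.org URLs, and the function name to_ntriples are hypothetical, and in practice the external URLs would be identifiers from the Getty vocabularies or Wikidata.

```python
# SKOS mapping property for concepts that are similar enough to link.
SKOS_CLOSE_MATCH = "http://www.w3.org/2004/02/skos/core#closeMatch"

# Hypothetical alignment table: internal concept URL -> external concept URL.
ALIGNMENT = {
    "https://example.org/placetype/town": "https://example.org/external/inhabited-place",
    "https://example.org/placetype/farm": "https://example.org/external/farm",
}


def to_ntriples(alignment: dict) -> list:
    """Serialise each alignment pair as one N-Triples statement."""
    return [
        f"<{internal}> <{SKOS_CLOSE_MATCH}> <{external}> ."
        for internal, external in sorted(alignment.items())
    ]


if __name__ == "__main__":
    for line in to_ntriples(ALIGNMENT):
        print(line)
```

Loading statements like these alongside the datasets would let a query ask for a shared place type and reach the records that only carry the internal one.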


We are thankful to all the participants for their enthusiasm and the work done during the workshop. We're happy that we could work through examples from dataset to modeling to publishing and querying diverse data. Since the workshop, some of the participants have continued working on their own datasets, and the Icelandic Saga Map already has a proof-of-concept REST API that serves JSON-LD for some of its entities, based on the model from our workshop.


[1] Work in Norse World stands for «a text preserved in one or more sources that the data are collected from». For more information, see 'Work and related metadata'.


Acknowledgements

The computations were performed on the Norwegian Research and Education Cloud (NREC), using resources provided by the University of Bergen and the University of Oslo. Available at https://www.nrec.no.


References

Atescomp, 2022. AtesComp/rdf-transform. Available at https://github.com/AtesComp/rdf-transform


CIDOC CRM Special Interest Group, 2022. CIDOC-CRM. Available at https://www.cidoc-crm.org/


Apache Software Foundation, 2022. Apache Jena. Available at https://jena.apache.org/


Linked Art Community, 2022a. Model. Available at https://linked.art/model/


Linked Art Community, 2022b. Places. Available at https://linked.art/model/place/


Linked Art Community, 2022c. Object production and destruction. Available at Object Production and Destruction (linked.art)


Code For Science and Society, 2022. OpenRefine. Available at https://openrefine.org/


W3C, 2009. SKOS Simple Knowledge Organization System Reference. Available at https://www.w3.org/TR/skos-reference/


Ikkala, Esko, Hyvönen, Eero, Rantala, Heikki & Koho, Mikko, 2021. Sampo-UI: A full stack JavaScript framework for developing semantic portal user interfaces. Semantic Web 13, 1-16. DOI: 10.3233/SW-210428.


Getty, 2022. Getty vocabularies. Available at http://vocab.getty.edu/