CSV imports

CSV imports and mapping to RDF using SPARQL CONSTRUCT

CSV is a plain-text format for tabular data.

A CSV import is a combination of multiple resources:

  • File: the CSV file to be mapped to RDF and imported
  • Mapping query: a user-defined CONSTRUCT query that produces RDF

CSV import in LinkedDataHub consists of 2 steps:

  1. generic conversion creates an intermediary, generic CSV/RDF representation for each CSV row
  2. vocabulary conversion maps the CSV/RDF to the final RDF representation using the mapping query

The import process runs in the background, i.e. the import item is created before the process completes. Currently, the only way to determine when it completes is to refresh the import item and check its status (completed or failed). Upon successful completion, metadata such as the number of imported RDF triples is attached to the import item.
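Since completion can only be detected by re-checking the import item, a client script would typically poll. The following is a minimal sketch under stated assumptions, not a LinkedDataHub API: `wait_for_import` and `fetch_status` are hypothetical names, and `fetch_status` stands for whatever code re-reads the import item and returns its status ("completed", "failed", or `None` while still running).

```python
import time


def wait_for_import(fetch_status, interval=5.0, timeout=600.0, sleep=time.sleep):
    """Poll until the import reports "completed" or "failed".

    fetch_status is a caller-supplied (hypothetical) function that re-reads
    the import item and returns its current status as a string, or None
    while the import is still running.
    """
    waited = 0.0
    while True:
        status = fetch_status()
        if status in ("completed", "failed"):
            return status
        if waited >= timeout:
            raise TimeoutError("import did not finish within %s seconds" % timeout)
        sleep(interval)
        waited += interval


# Usage with a stand-in status source; replace it with a real lookup.
fake_statuses = iter([None, None, "completed"])
result = wait_for_import(lambda: next(fake_statuses), interval=0.01)
print(result)
```

Passing the status lookup in as a function keeps the polling loop independent of how the import item is actually retrieved.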

The mapping is done one row at a time, with each row resulting in a newly created document, which should be attached to the document hierarchy. The documents have to be URI resources. The server automatically assigns URIs to documents constructed in the default graph. Alternatively, the document graph can be specified explicitly using a GRAPH block in the CONSTRUCT template (a Jena-specific extension of SPARQL 1.1).

The resulting RDF data is validated against constraints during the import. Constraint violations, if any, are attached to the import item.

We use the following CSV data as a running example; the sections below show its conversion to RDF:

countryCode,latitude,longitude,name
AD,42.5,1.6,Andorra
AE,23.4,53.8,"United Arab Emirates"
AF,33.9,67.7,Afghanistan

Generic conversion

The data table is converted to a graph by treating rows as resources, columns as predicates, and cells as xsd:string literals. The approach is the same as CSV on the Web minimal mode.

@base <https://localhost:4443/> .

_:8228a149-8efe-448d-b15f-8abf92e7bd17
<#countryCode> "AD" ;
<#latitude> "42.5" ;
<#longitude> "1.6" ;
<#name> "Andorra" .

_:ec59dcfc-872a-4144-822b-9ad5e2c6149c
<#countryCode> "AE" ;
<#latitude> "23.4" ;
<#longitude> "53.8" ;
<#name> "United Arab Emirates" .

_:e8f2e8e9-3d02-4bf5-b4f1-4794ba5b52c9
<#countryCode> "AF" ;
<#latitude> "33.9" ;
<#longitude> "67.7" ;
<#name> "Afghanistan" .
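The generic conversion above can be approximated in a few lines of Python. This is only a sketch of the idea (rows become blank nodes, column names become fragment-identifier predicates relative to the base URI, cells become plain string literals). It is not LinkedDataHub's implementation, and the function name `generic_triples`, the random UUID blank node labels, and the skipping of empty cells are assumptions made for illustration.

```python
import csv
import io
import uuid


def generic_triples(csv_text, base):
    """Yield (subject, predicate, literal) tuples, one per non-empty cell.

    Hypothetical helper: each row becomes a blank node, each column header
    becomes a predicate <base#header>, each cell a plain string literal.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        subject = "_:" + str(uuid.uuid4())  # fresh blank node label per row
        for column, value in row.items():
            # assumption: skip surplus fields and empty cells
            if column is None or not value:
                continue
            yield (subject, "<" + base + "#" + column + ">", value)


data = """countryCode,latitude,longitude,name
AD,42.5,1.6,Andorra
AE,23.4,53.8,"United Arab Emirates"
AF,33.9,67.7,Afghanistan
"""

triples = list(generic_triples(data, "https://localhost:4443/"))
for s, p, o in triples[:4]:
    print(s, p, '"' + o + '"')
```

Every value stays a string at this stage; datatypes and vocabulary terms are only introduced by the mapping query.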

Vocabulary conversion

This step provides a semantic "lift" for the generic RDF output of the previous step by mapping it to classes and properties from specific vocabularies. It also connects instances in the imported data to the documents in LinkedDataHub's dataset.

These are the rules that hold for mapping queries:

  • BASE value is automatically set to the imported file's URI. Do not add an explicit BASE to the query.
  • $base binding is set to the value of the application's baseURI
  • use OPTIONAL for optional cell values
  • use BIND() to introduce new values and/or cast literals to the appropriate result datatype or URI
  • when building document URIs, use natural IDs from the input data (or UUIDs if there are no IDs) and remember to URI-encode them using encode_for_uri
  • use a GRAPH block in the CONSTRUCT template to construct triples for a specific document
  • construct dh:Container instances to create new container documents or dh:Item instances to create new item documents. dct:title values are mandatory for documents.
  • if you're constructing descriptions of non-information resources (e.g. things, concepts), assign them URIs with fragment identifiers (e.g. #this) and pair them with item documents using the foaf:primaryTopic property
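To make the URI-related rules above concrete, here is a small Python sketch of the URI shapes they lead to. In a mapping query the same result is produced with SPARQL functions such as encode_for_uri; the helpers `document_uri` and `thing_uri` are hypothetical names used only for illustration, and the container URI is taken from the running example.

```python
import uuid
from urllib.parse import quote


def document_uri(container_uri, natural_id=None):
    """Build an item document URI under the given container (hypothetical helper).

    Uses the natural ID when one exists, otherwise a random UUID, and
    percent-encodes it so that spaces or slashes cannot break the URI.
    """
    slug = natural_id if natural_id else str(uuid.uuid4())
    # safe="" also escapes "/", similar in spirit to SPARQL's encode_for_uri
    return container_uri + quote(slug, safe="") + "/"


def thing_uri(doc_uri, fragment="this"):
    """URI of the non-information resource described by the document."""
    return doc_uri + "#" + fragment


container = "https://localhost:4443/countries/"
doc = document_uri(container, "AD")
print(doc)
print(thing_uri(doc))
print(document_uri(container, "United Arab Emirates"))
```

Keeping the document URI and the fragment URI separate mirrors the pairing of an item document with its primary topic.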

We are planning to provide a UI-based mapping tool in the future.

Example

In this example, each CSV row produces a country resource (typed as <http://dbpedia.org/ontology/Country>) paired with its item (document):

PREFIX  geo:  <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX  dh:   <https://www.w3.org/ns/ldt/document-hierarchy#>
PREFIX  dct:  <http://purl.org/dc/terms/>
PREFIX  rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX  xsd:  <http://www.w3.org/2001/XMLSchema#>
PREFIX  foaf: <http://xmlns.com/foaf/0.1/>
PREFIX  sioc: <http://rdfs.org/sioc/ns#>

CONSTRUCT
  {
    ?item a dh:Item ;
        sioc:has_container ?container ;
        dct:title ?name ;
        dh:slug ?countryCode ;
        foaf:primaryTopic ?country .
    ?country a <http://dbpedia.org/ontology/Country> ;
        dct:identifier ?countryCode ;
        geo:lat ?lat ;
        geo:long ?long ;
        dct:title ?name .
  }
WHERE
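  {
    # blank node for the item document; the server assigns its URI
    BIND(bnode() AS ?item)
    # parent container, built from the application's base URI
    BIND(uri(concat(str($base), "countries/")) AS ?container)

    # one match per row of the generic CSV/RDF
    ?country  <#countryCode>  ?countryCode ;
              <#latitude>     ?latString ;
              <#longitude>    ?longString ;
              <#name>         ?name .

    # cast the string cells to numeric literals
    BIND(xsd:float(?latString) AS ?lat)
    BIND(xsd:float(?longString) AS ?long)
  }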

When the import is complete, you should be able to see the imported documents as children of the ${base}countries/ container.

The result of our mapping:

PREFIX  geo:  <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX  dh:   <https://www.w3.org/ns/ldt/document-hierarchy#>
PREFIX  dct:  <http://purl.org/dc/terms/>
PREFIX  rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX  xsd:  <http://www.w3.org/2001/XMLSchema#>
PREFIX  foaf: <http://xmlns.com/foaf/0.1/>
PREFIX  sioc: <http://rdfs.org/sioc/ns#>

<https://localhost:4443/countries/AD/> a dh:Item ;
    sioc:has_container <https://localhost:4443/countries/> ;
    dct:title "Andorra" ;
    dh:slug "AD" ;
    foaf:primaryTopic <https://localhost:4443/countries/AD/#id459bdd90-a309-49f9-92b2-1b9b5d110471> .

<https://localhost:4443/countries/AD/#id459bdd90-a309-49f9-92b2-1b9b5d110471> a <http://dbpedia.org/ontology/Country> ;
    dct:identifier "AD" ;
    geo:lat 42.5 ;
    geo:long 1.6 ;
    dct:title "Andorra" .

<https://localhost:4443/countries/AE/> a dh:Item ;
    sioc:has_container <https://localhost:4443/countries/> ;
    dct:title "United Arab Emirates" ;
    dh:slug "AE" ;
    foaf:primaryTopic <https://localhost:4443/countries/AE/#id7ad9b80b-8fbf-4696-92fa-61facf6c2066> .

<https://localhost:4443/countries/AE/#id7ad9b80b-8fbf-4696-92fa-61facf6c2066> a <http://dbpedia.org/ontology/Country> ;
    dct:identifier "AE" ;
    geo:lat 23.4 ;
    geo:long 53.8 ;
    dct:title "United Arab Emirates" .

<https://localhost:4443/countries/AF/> a dh:Item ;
    sioc:has_container <https://localhost:4443/countries/> ;
    dct:title "Afghanistan" ;
    dh:slug "AF" ;
    foaf:primaryTopic <https://localhost:4443/countries/AF/#id5de2fd91-158a-47d8-a302-d1af205fe59f> .

<https://localhost:4443/countries/AF/#id5de2fd91-158a-47d8-a302-d1af205fe59f> a <http://dbpedia.org/ontology/Country> ;
    dct:identifier "AF" ;
    geo:lat 33.9 ;
    geo:long 67.7 ;
    dct:title "Afghanistan" .

If you are ready to import some CSV, see our step-by-step tutorial on creating a CSV import.