CSV imports

CSV imports and mapping to RDF using SPARQL CONSTRUCT

If you are ready to import some CSV, see our step-by-step tutorial on creating a CSV import.

A data import is a combination of 3 resources:

  • File: an uploaded file holding the data to be converted to RDF and imported, such as a CSV or RDF file
  • Mapping: the CONSTRUCT query that produces RDF
  • Target container: the container to which converted items will be POSTed, against which they are skolemized, and whose children they become

The import process runs in the background, i.e. the import item is created before the process completes. Currently the only way to determine when it completes is to refresh the import item and check its import status (completed/failed). Upon successful completion, metadata such as the number of imported RDF triples is attached to the import item.
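The refresh-and-check cycle can be automated with a simple polling loop. The following is a minimal sketch only: fetch_status is a hypothetical caller-supplied function standing in for whatever request refreshes the import item and reads its status, and the interval and attempt limit are arbitrary assumptions. Only the two terminal statuses named above (completed/failed) are taken from this page.

```python
import time
from typing import Callable


def wait_for_import(fetch_status: Callable[[], str],
                    interval_seconds: float = 5.0,
                    max_attempts: int = 60) -> str:
    """Poll until the import reports a terminal status.

    fetch_status is a hypothetical function that refreshes the import
    item and returns its current status as a string.
    """
    for attempt in range(max_attempts):
        status = fetch_status()
        # "completed" and "failed" are the terminal statuses
        if status in ("completed", "failed"):
            return status
        # do not sleep after the last attempt
        if attempt < max_attempts - 1:
            time.sleep(interval_seconds)
    raise TimeoutError("import did not finish within the polling budget")
```

Passing the status lookup in as a function keeps the loop independent of how the import item is actually retrieved.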

The converted RDF is validated against constraints before import. Constraint violations, if any, are attached to the import item.

Import CSV

CSV is a plain-text format for tabular data. CSV import in LinkedDataHub consists of 2 steps:

  1. generic conversion
  2. vocabulary conversion

We provide a running example of CSV data; the following sections show its conversion to RDF:

countryCode,latitude,longitude,name
AD,42.5,1.6,Andorra
AE,23.4,53.8,"United Arab Emirates"
AF,33.9,67.7,Afghanistan

Generic conversion

The data table is converted to a graph by treating rows as resources, columns as predicates, and cells as xsd:string literals. The approach is the same as CSV on the Web minimal mode.

@base <https://linkeddatahub.com/demo/city-graph/> .

[]
  <#countryCode> "AD" ;
  <#latitude> "42.5" ;
  <#longitude> "1.6" ;
  <#name> "Andorra" .

[]
  <#countryCode> "AE" ;
  <#latitude> "23.4" ;
  <#longitude> "53.8" ;
  <#name> "United Arab Emirates" .

[]
  <#countryCode> "AF" ;
  <#latitude> "33.9" ;
  <#longitude> "67.7" ;
  <#name> "Afghanistan" .
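The generic conversion can be approximated in a few lines of Python. This is an illustrative sketch, not LinkedDataHub's implementation: rows_to_turtle is a hypothetical helper, and it assumes the first CSV row is a header naming the columns. Each row becomes a blank node, each column a predicate relative to the base URI, and each non-empty cell a plain string literal.

```python
import csv
import io
from urllib.parse import quote


def rows_to_turtle(csv_text: str, base: str) -> str:
    """Convert CSV text to Turtle: one blank node per row,
    one predicate per column, every cell a string literal."""
    reader = csv.DictReader(io.StringIO(csv_text))
    blocks = []
    for row in reader:
        pairs = []
        for column, cell in row.items():
            # skip surplus cells (no column name) and empty or missing cells
            if column is None or not isinstance(cell, str) or cell == "":
                continue
            escaped = (cell.replace("\\", "\\\\")
                           .replace('"', '\\"')
                           .replace("\n", "\\n")
                           .replace("\r", "\\r"))
            # percent-encode the column name so it is safe in a fragment
            pairs.append(f'  <#{quote(column, safe="")}> "{escaped}"')
        if pairs:
            blocks.append("[]\n" + " ;\n".join(pairs) + " .")
    return f"@base <{base}> .\n\n" + "\n\n".join(blocks) + "\n"
```

Calling rows_to_turtle with the example CSV and the base URI https://linkeddatahub.com/demo/city-graph/ yields output of the same shape as the Turtle above.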

Vocabulary conversion

This step provides a semantic "lift" for the generic RDF output of the previous step by mapping it to classes and properties from specific vocabularies. It also connects instances in the imported data to the documents in LinkedDataHub's dataset.

The mapping is a user-defined SPARQL CONSTRUCT query which transforms one row at a time. In this case we produce a SKOS concept paired with its item (document) for each country:

PREFIX  nsdd:  <ns/domain/default#>
PREFIX  ns:   <ns#>
PREFIX  apl:  <https://w3id.org/atomgraph/linkeddatahub/domain#>
PREFIX  geo:  <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX  dh:   <https://www.w3.org/ns/ldt/document-hierarchy/domain#>
PREFIX  dct:  <http://purl.org/dc/terms/>
PREFIX  rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX  xsd:  <http://www.w3.org/2001/XMLSchema#>
PREFIX  foaf: <http://xmlns.com/foaf/0.1/>
PREFIX  sioc: <http://rdfs.org/sioc/ns#>

CONSTRUCT
{
    ?item a nsdd:Item ;
        sioc:has_container ?this ;
        dct:title ?name ;
        dh:slug ?countryCode ;
        foaf:primaryTopic ?country .
    ?country a ns:Country ;
        foaf:isPrimaryTopicOf ?item ;
        dct:identifier ?countryCode ;
        geo:lat ?lat ;
        geo:long ?long ;
        dct:title ?name .
}
WHERE
{
    # mint a fresh blank node for the item (document) of each row
    BIND(bnode() AS ?item)
    # match the generic RDF produced from one CSV row
    ?country  <#countryCode>  ?countryCode ;
              <#latitude>     ?latString ;
              <#longitude>    ?longString ;
              <#name>         ?name .
    # cast the string cells to numeric literals
    BIND(xsd:float(?latString) AS ?lat)
    BIND(xsd:float(?longString) AS ?long)
}

These are the rules that hold for mapping queries:

  • BASE value is set to the application's base URI
  • ?this binding is set to the value of the target container
  • produce items (documents) and pair them with topic resources using foaf:primaryTopic/foaf:isPrimaryTopicOf properties
  • use OPTIONAL for optional cell values
  • use BIND() to introduce new values and/or cast literals to the appropriate result datatype or URI
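To make the rules above concrete, the following Python sketch mimics what a mapping does to a single generically converted row. It is an analogy only, not how LinkedDataHub evaluates mappings: map_row, the dictionary layout, and the "_:item"/"_:country" labels are hypothetical. For illustration it treats latitude and longitude as optional cells, which the example query does not do.

```python
from typing import Optional


def cast_float(value: Optional[str]) -> Optional[float]:
    """Counterpart of BIND(xsd:float(?cell) AS ?number); None if the cell is absent."""
    return float(value) if value not in (None, "") else None


def map_row(row: dict, container: str) -> dict:
    """Map one generic row to an item (document) paired with its topic."""
    country = {
        "a": "ns:Country",
        "foaf:isPrimaryTopicOf": "_:item",
        "dct:identifier": row["countryCode"],
        "dct:title": row["name"],
    }
    # optional cells: emit the property only when the cell has a value,
    # which is what wrapping the triple pattern in OPTIONAL achieves
    for source, target in (("latitude", "geo:lat"), ("longitude", "geo:long")):
        number = cast_float(row.get(source))
        if number is not None:
            country[target] = number
    item = {
        "a": "nsdd:Item",
        "sioc:has_container": container,  # the value bound to ?this
        "dct:title": row["name"],
        "dh:slug": row["countryCode"],
        "foaf:primaryTopic": "_:country",
    }
    return {"item": item, "country": country}
```

A row without a longitude cell produces a country with no geo:long entry instead of failing, which mirrors the effect of OPTIONAL in the query.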

Blank node resources in the output will be skolemized depending on their RDF types.

We are planning to provide a UI-based mapping tool in the future.

The result of our mapping (only the first resource is shown):

_:item a <https://linkeddatahub.com/demo/city-graph/ns/domain/default#Item> ;
    dct:title "Andorra" ;
    dh:slug "AD" ;
    foaf:primaryTopic _:country .

_:country a <https://linkeddatahub.com/demo/city-graph/ns#Country> ;
    foaf:isPrimaryTopicOf _:item ;
    dct:identifier "AD" ;
    geo:lat 42.5 ;
    geo:long 1.6 ;
    dct:title "Andorra" .