CSV imports
CSV imports and mapping to RDF using SPARQL CONSTRUCT
CSV is a plain-text format for tabular data.
A CSV import is a combination of multiple resources:
- File: the CSV file to be mapped to RDF and imported
- Mapping query: a user-defined CONSTRUCT query that produces RDF
CSV import in LinkedDataHub consists of 2 steps:
- generic conversion creates an intermediary, generic CSV/RDF representation for each CSV row
- vocabulary conversion maps the CSV/RDF to the final RDF representation using the mapping query
The import process runs in the background, i.e. the import item is created before the process completes. Currently the only way to determine when it completes is to refresh the import item and check the import status (completed/failed). Upon successful completion, metadata such as the number of imported RDF triples is attached to the import.
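Since refreshing the import item is currently the only way to detect completion, a client could wrap that refresh in a small polling loop. The following is a minimal sketch, not part of LinkedDataHub's API: `fetch_status` is a hypothetical caller-supplied function that re-reads the import item and returns its status (`completed`, `failed`, or anything else while the import is still running), and the interval and timeout defaults are arbitrary.

```python
import time


def wait_for_import(fetch_status, interval=5.0, timeout=600.0, sleep=time.sleep):
    """Poll until the import reports a final status or the timeout expires.

    fetch_status is a hypothetical callable supplied by the caller; how it
    retrieves the import item is outside the scope of this sketch.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status in ("completed", "failed"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("import did not finish within the timeout")
        sleep(interval)
```

The `sleep` parameter is injectable only so that the loop can be exercised without real delays.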
The mapping is done one row at a time, with each row resulting in a newly created document, which should be attached to the document hierarchy. The documents have to be URI resources. The server will automatically assign URIs to the documents constructed in the default graph. Alternatively, it is possible to explicitly specify the document graph using a GRAPH block in the CONSTRUCT template (which is a Jena-specific extension of SPARQL 1.1).
The resulting RDF data is validated against constraints in the process. Constraint violations, if any, are attached to the import item.
The following sections use a running example of CSV data to show each stage of the conversion to RDF:
countryCode,latitude,longitude,name
AD,42.5,1.6,Andorra
AE,23.4,53.8,"United Arab Emirates"
AF,33.9,67.7,Afghanistan
Generic conversion
The data table is converted to a graph by treating rows as resources, columns as predicates, and cells as xsd:string literals. The approach is the same as the CSV on the Web minimal mode.
@base <https://localhost:4443/> .

_:8228a149-8efe-448d-b15f-8abf92e7bd17 <#countryCode> "AD" ;
    <#latitude> "42.5" ;
    <#longitude> "1.6" ;
    <#name> "Andorra" .

_:ec59dcfc-872a-4144-822b-9ad5e2c6149c <#countryCode> "AE" ;
    <#latitude> "23.4" ;
    <#longitude> "53.8" ;
    <#name> "United Arab Emirates" .

_:e8f2e8e9-3d02-4bf5-b4f1-4794ba5b52c9 <#countryCode> "AF" ;
    <#latitude> "33.9" ;
    <#longitude> "67.7" ;
    <#name> "Afghanistan" .
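The generic conversion can be approximated in a few lines of Python. This is an illustrative sketch of the row/column/cell rule described above, not LinkedDataHub's implementation: the function name `generic_conversion`, the tuple representation of triples, and the use of random UUIDs as blank node labels are assumptions made for the example.

```python
import csv
import io
import uuid


def generic_conversion(csv_text, base):
    """Return (subject, predicate, object) tuples: one blank node per CSV row,
    one predicate per column, and every cell kept as a plain string."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"_:{uuid.uuid4()}"  # fresh blank node label for this row
        for column, cell in row.items():
            predicate = f"<{base}#{column}>"  # column name becomes a fragment URI
            triples.append((subject, predicate, cell))
    return triples


CSV_TEXT = """countryCode,latitude,longitude,name
AD,42.5,1.6,Andorra
AE,23.4,53.8,"United Arab Emirates"
AF,33.9,67.7,Afghanistan
"""

triples = generic_conversion(CSV_TEXT, "https://localhost:4443/")
```

Note that the numeric-looking cells such as `42.5` stay strings at this stage; casting them to typed values is left to the mapping query in the next step.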
Vocabulary conversion
This step provides a semantic "lift" for the generic RDF output of the previous step by mapping it to classes and properties from specific vocabularies. It also connects instances in the imported data to the documents in LinkedDataHub's dataset.
These are the rules that hold for mapping queries:
- the BASE value is automatically set to the imported file's URI. Do not add an explicit BASE to the query.
- the $base binding is set to the value of the application's base URI
- use OPTIONAL for optional cell values
- use BIND() to introduce new values and/or cast literals to the appropriate result datatype or URI
- when building document URIs, use natural IDs from the input data (or UUIDs if there are no IDs) and remember to URI-encode them using encode_for_uri
- use a GRAPH block in the CONSTRUCT template to construct triples for a specific document
- construct dh:Container instances to create new container documents or dh:Item instances to create new item documents. dct:title values are mandatory for documents.
- if you're constructing descriptions of non-information resources (e.g. thing, concept), assign them URIs with a fragment identifier (e.g. #this) and pair them with item documents using the foaf:primaryTopic property
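To illustrate why natural IDs need URI-encoding, here is a small Python sketch of what encode_for_uri does to a value before it becomes part of a document URI. The helper `document_uri` is hypothetical, and the trailing-slash URI pattern is an assumption taken from the example output on this page; `urllib.parse.quote` with `safe=""` is used as a stand-in because, like encode_for_uri, it percent-encodes everything except unreserved characters.

```python
from urllib.parse import quote


def document_uri(container_uri, natural_id):
    """Build a child document URI from a container URI and a natural ID.

    safe="" makes quote() percent-encode reserved characters such as "/"
    and spaces, so the ID cannot introduce extra path segments.
    """
    return f"{container_uri}{quote(natural_id, safe='')}/"


simple = document_uri("https://localhost:4443/countries/", "AD")
tricky = document_uri("https://localhost:4443/countries/", "a/b c")
```

Without the encoding step, an ID such as `a/b c` would yield a URI with an extra path segment and an illegal space.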
We are planning to provide a UI-based mapping tool in the future.
Example
In this example we produce a country resource (typed as a DBpedia Country) paired with its item (document) for each row:
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX dh: <https://www.w3.org/ns/ldt/document-hierarchy#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX sioc: <http://rdfs.org/sioc/ns#>

CONSTRUCT
{
    ?item a dh:Item ;
        sioc:has_container ?container ;
        dct:title ?name ;
        dh:slug ?countryCode ;
        foaf:primaryTopic ?country .
    ?country a <http://dbpedia.org/ontology/Country> ;
        dct:identifier ?countryCode ;
        geo:lat ?lat ;
        geo:long ?long ;
        dct:title ?name .
}
WHERE
{
    BIND(bnode() AS ?item)
    BIND(uri(concat(str($base), "countries/")) AS ?container)
    ?country <#countryCode> ?countryCode ;
        <#latitude> ?latString ;
        <#longitude> ?longString ;
        <#name> ?name .
    BIND(xsd:float(?latString) AS ?lat)
    BIND(xsd:float(?longString) AS ?long)
}
When the import is complete, you should be able to see the imported documents as children of the ${base}countries/ container.
The result of our mapping:
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX dh: <https://www.w3.org/ns/ldt/document-hierarchy#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX sioc: <http://rdfs.org/sioc/ns#>

<https://localhost:4443/countries/AD/> a dh:Item ;
    sioc:has_container <https://localhost:4443/countries/> ;
    dct:title "Andorra" ;
    dh:slug "AD" ;
    foaf:primaryTopic <https://localhost:4443/countries/AD/#id459bdd90-a309-49f9-92b2-1b9b5d110471> .

<https://localhost:4443/countries/AD/#id459bdd90-a309-49f9-92b2-1b9b5d110471> a <http://dbpedia.org/ontology/Country> ;
    dct:identifier "AD" ;
    geo:lat 42.5 ;
    geo:long 1.6 ;
    dct:title "Andorra" .

<https://localhost:4443/countries/AE/> a dh:Item ;
    sioc:has_container <https://localhost:4443/countries/> ;
    dct:title "United Arab Emirates" ;
    dh:slug "AE" ;
    foaf:primaryTopic <https://localhost:4443/countries/AE/#id7ad9b80b-8fbf-4696-92fa-61facf6c2066> .

<https://localhost:4443/countries/AE/#id7ad9b80b-8fbf-4696-92fa-61facf6c2066> a <http://dbpedia.org/ontology/Country> ;
    dct:identifier "AE" ;
    geo:lat 23.4 ;
    geo:long 53.8 ;
    dct:title "United Arab Emirates" .

<https://localhost:4443/countries/AF/> a dh:Item ;
    sioc:has_container <https://localhost:4443/countries/> ;
    dct:title "Afghanistan" ;
    dh:slug "AF" ;
    foaf:primaryTopic <https://localhost:4443/countries/AF/#id5de2fd91-158a-47d8-a302-d1af205fe59f> .

<https://localhost:4443/countries/AF/#id5de2fd91-158a-47d8-a302-d1af205fe59f> a <http://dbpedia.org/ontology/Country> ;
    dct:identifier "AF" ;
    geo:lat 33.9 ;
    geo:long 67.7 ;
    dct:title "Afghanistan" .

If you are ready to import some CSV, see our step-by-step tutorial on creating a CSV import.