CSV imports
CSV imports and mapping to RDF using SPARQL CONSTRUCT
If you are ready to import some CSV, see our step-by-step tutorial on creating a CSV import.
A data import is a combination of 3 resources:
- File
  - An uploaded file holding the data to be converted to RDF and imported, such as a CSV or RDF file
- Mapping
  - The CONSTRUCT query that produces RDF
- Target container
  - The container to which converted items are POSTed; the items are skolemized against it and become its children
The import process runs in the background, i.e. the import item is created before the process completes. Currently the only way to determine when it completes is to refresh the import item and check the import status (completed/failed). Upon successful completion, metadata such as the number of imported RDF triples is attached to the import.
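Because the status has to be checked by re-reading the import item, a client typically ends up with a small polling loop. The following is only a sketch of that idea: `fetch_status` is a hypothetical callable (not part of LinkedDataHub) that refreshes the import item and returns its status as a string, and the interval and attempt limit are arbitrary assumptions.

```python
import time
from typing import Callable


def wait_for_import(fetch_status: Callable[[], str],
                    interval: float = 2.0,
                    max_attempts: int = 30) -> str:
    """Poll until the import reports a terminal status.

    fetch_status is a hypothetical callable that re-reads the import item
    and returns its current status; how it does so is up to the caller.
    """
    terminal = {"completed", "failed"}  # the two statuses named in the text
    for _ in range(max_attempts):
        status = fetch_status()
        if status in terminal:
            return status
        # Any other value is treated as "still running" (assumption)
        time.sleep(interval)
    raise TimeoutError("import did not finish within the polling budget")
```

Passing the status lookup in as a function keeps the loop independent of how the import item is actually retrieved.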
The converted RDF is validated against constraints before import. Constraint violations, if any, are attached to the import item.
Import CSV
CSV is a plain-text format for tabular data. CSV import in LinkedDataHub consists of 2 steps:
- Generic conversion
- Vocabulary conversion
We use the following CSV data as a running example; its RDF conversion is shown in the following sections:
countryCode,latitude,longitude,name
AD,42.5,1.6,Andorra
AE,23.4,53.8,"United Arab Emirates"
AF,33.9,67.7,Afghanistan
Generic conversion
The data table is converted to a graph by treating rows as resources, columns as predicates, and cells as xsd:string literals. The approach is the same as CSV on the Web minimal mode.
@base <https://linkeddatahub.com/demo/city-graph/> .

_:8228a149-8efe-448d-b15f-8abf92e7bd17 <#countryCode> "AD" ;
    <#latitude> "42.5" ;
    <#longitude> "1.6" ;
    <#name> "Andorra" .

_:ec59dcfc-872a-4144-822b-9ad5e2c6149c <#countryCode> "AE" ;
    <#latitude> "23.4" ;
    <#longitude> "53.8" ;
    <#name> "United Arab Emirates" .

_:e8f2e8e9-3d02-4bf5-b4f1-4794ba5b52c9 <#countryCode> "AF" ;
    <#latitude> "33.9" ;
    <#longitude> "67.7" ;
    <#name> "Afghanistan" .
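To make the row/column/cell rule concrete, here is a minimal Python sketch of the same idea using only the standard library. It is an illustration, not LinkedDataHub's actual converter: the function name `csv_to_generic_triples` is hypothetical, triples are represented as plain tuples, and skipping empty cells is an assumption.

```python
import csv
import io
import uuid

# Base URI taken from the example output above
BASE = "https://linkeddatahub.com/demo/city-graph/"


def csv_to_generic_triples(text: str, base: str = BASE) -> list:
    """Return (subject, predicate, object) tuples: one blank node per row,
    one predicate per column, every cell kept as a plain string."""
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = f"_:{uuid.uuid4()}"  # fresh blank node label for this row
        for column, cell in row.items():
            if not cell:
                continue  # assumption: an empty cell produces no triple
            triples.append((subject, f"<{base}#{column}>", cell))
    return triples


sample = (
    "countryCode,latitude,longitude,name\n"
    "AD,42.5,1.6,Andorra\n"
    'AE,23.4,53.8,"United Arab Emirates"\n'
)
triples = csv_to_generic_triples(sample)
```

Each row contributes four tuples that share one blank node subject, and the quoted cell is read as a single value.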
Vocabulary conversion
This step provides a semantic "lift" for the generic RDF output of the previous step by mapping it to classes and properties from specific vocabularies. It also connects instances in the imported data to the documents in LinkedDataHub's dataset.
The mapping is a user-defined SPARQL CONSTRUCT query which transforms one row at a time. In this case we produce an ns:Country resource paired with its item (document) for each country:
PREFIX nsdd: <ns/domain/default#>
PREFIX ns: <ns#>
PREFIX apl: <https://w3id.org/atomgraph/linkeddatahub/domain#>
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX dh: <https://www.w3.org/ns/ldt/document-hierarchy/domain#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX sioc: <http://rdfs.org/sioc/ns#>
CONSTRUCT
{
    # the item (document) created in the target container
    ?item a nsdd:Item ;
        sioc:has_container ?this ;
        dct:title ?name ;
        dh:slug ?countryCode ;
        foaf:primaryTopic ?country .
    # the country resource described by the item
    ?country a ns:Country ;
        foaf:isPrimaryTopicOf ?item ;
        dct:identifier ?countryCode ;
        geo:lat ?lat ;
        geo:long ?long ;
        dct:title ?name .
}
WHERE
{
    # mint a new blank node for the item of each row
    BIND(bnode() AS ?item)
    # match the generic properties of the row
    ?country <#countryCode> ?countryCode ;
        <#latitude> ?latString ;
        <#longitude> ?longString ;
        <#name> ?name .
    # cast the coordinate strings to xsd:float
    BIND(xsd:float(?latString) AS ?lat)
    BIND(xsd:float(?longString) AS ?long)
}
These are the rules that hold for mapping queries:
- BASE value is set to the application's base URI
- ?this binding is set to the value of the target container
- produce items (documents) and pair them with topic resources using foaf:primaryTopic/foaf:isPrimaryTopicOf properties
- use OPTIONAL for optional cell values
- use BIND() to introduce new values and/or cast literals to the appropriate result datatype or URI
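The rules above can be mimicked in a few lines of Python to show what the mapping does to a single row. This is a sketch for illustration only: `map_row` is a hypothetical helper, triples are plain tuples with prefixed names as strings, and, unlike the query above, it treats the coordinate columns as optional in order to show the effect of OPTIONAL combined with a cast.

```python
def map_row(row: dict, container: str) -> list:
    """Mimic the mapping for one generic row (a dict of column -> string cell)."""
    item, country = "_:item", "_:country"
    triples = [
        (item, "rdf:type", "nsdd:Item"),
        (item, "sioc:has_container", container),  # plays the role of ?this
        (item, "dct:title", row["name"]),
        (item, "dh:slug", row["countryCode"]),
        (item, "foaf:primaryTopic", country),      # item -> topic
        (country, "rdf:type", "ns:Country"),
        (country, "foaf:isPrimaryTopicOf", item),  # topic -> item
        (country, "dct:identifier", row["countryCode"]),
        (country, "dct:title", row["name"]),
    ]
    # Like OPTIONAL followed by BIND(xsd:float(...)): emit a coordinate
    # only when the cell is present, and cast it from string to float.
    for column, predicate in (("latitude", "geo:lat"), ("longitude", "geo:long")):
        cell = row.get(column)
        if cell:
            triples.append((country, predicate, float(cell)))
    return triples


# The container URI below is a made-up placeholder
CONTAINER = "<https://example.org/countries/>"
full = map_row(
    {"countryCode": "AD", "latitude": "42.5", "longitude": "1.6", "name": "Andorra"},
    CONTAINER,
)
partial = map_row({"countryCode": "AD", "name": "Andorra"}, CONTAINER)
```

A row without coordinates still yields the item and the country; only the two coordinate triples are missing from the result.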
Blank node resources in the output will be skolemized depending on their RDF types.
We are planning to provide a UI-based mapping tool in the future.
The result of our mapping (only the first resource is shown):
_:item a <https://linkeddatahub.com/demo/city-graph/ns/domain/default#Item> ;
    dct:title "Andorra" ;
    dh:slug "AD" ;
    foaf:primaryTopic _:country .

_:country a <https://linkeddatahub.com/demo/city-graph/ns#Country> ;
    foaf:isPrimaryTopicOf _:item ;
    dct:identifier "AD" ;
    geo:lat "42.5"^^xsd:float ;
    geo:long "1.6"^^xsd:float ;
    dct:title "Andorra" .