
AtomGraph / CSV2RDF

Streaming, transforming, SPARQL-based CSV to RDF converter. Apache-2.0 license.

Programming Languages

Java, Dockerfile, Shell

Projects that are alternatives of or similar to CSV2RDF

LinkedDataHub
The Knowledge Graph notebook. Apache license.
Stars: ✭ 150 (+212.5%)
Mutual labels:  linked-data, sparql, rdf, semantic-web, knowledge-graph
Processor
Ontology-driven Linked Data processor and server for SPARQL backends. Apache License.
Stars: ✭ 54 (+12.5%)
Mutual labels:  linked-data, sparql, rdf, semantic-web, knowledge-graph
LD-Connect
LD Connect is a Linked Data portal for IOS Press in collaboration with the STKO Lab at UC Santa Barbara.
Stars: ✭ 0 (-100%)
Mutual labels:  linked-data, sparql, rdf, semantic-web, knowledge-graph
Semanticmediawiki
🔗 Semantic MediaWiki turns MediaWiki into a knowledge management platform with query and export capabilities
Stars: ✭ 359 (+647.92%)
Mutual labels:  linked-data, sparql, rdf, semantic-web, knowledge-graph
sparql-micro-service
SPARQL micro-services: A lightweight approach to query Web APIs with SPARQL
Stars: ✭ 22 (-54.17%)
Mutual labels:  linked-data, sparql, rdf, semantic-web
semantic-python-overview
(subjective) overview of projects which are related both to python and semantic technologies (RDF, OWL, Reasoning, ...)
Stars: ✭ 406 (+745.83%)
Mutual labels:  sparql, rdf, semantic-web, knowledge-graph
awesome-ontology
A curated list of ontology things
Stars: ✭ 73 (+52.08%)
Mutual labels:  linked-data, rdf, semantic-web, knowledge-graph
Web Client
Generic Linked Data browser and UX component framework. Apache license.
Stars: ✭ 105 (+118.75%)
Mutual labels:  linked-data, rdf, semantic-web, knowledge-graph
Hypergraphql
GraphQL interface for querying and serving linked data on the Web.
Stars: ✭ 112 (+133.33%)
Mutual labels:  linked-data, sparql, rdf, semantic-web
Rdf4j
Eclipse RDF4J: scalable RDF for Java
Stars: ✭ 242 (+404.17%)
Mutual labels:  linked-data, sparql, rdf, semantic-web
Nspm
🤖 Neural SPARQL Machines for Knowledge Graph Question Answering.
Stars: ✭ 156 (+225%)
Mutual labels:  linked-data, sparql, rdf, knowledge-graph
OLGA
an Ontology SDK
Stars: ✭ 36 (-25%)
Mutual labels:  sparql, rdf, semantic-web, knowledge-graph
everything
The semantic desktop search engine
Stars: ✭ 22 (-54.17%)
Mutual labels:  sparql, rdf, semantic-web, knowledge-graph
YALC
🕸 YALC: Yet Another LOD Cloud (registry of Linked Open Datasets).
Stars: ✭ 14 (-70.83%)
Mutual labels:  linked-data, rdf, semantic-web, open-data
matcha
🍵 SPARQL-like DSL for querying in memory Linked Data Models
Stars: ✭ 18 (-62.5%)
Mutual labels:  linked-data, sparql, rdf
corese
Software platform implementing and extending the standards of the Semantic Web.
Stars: ✭ 55 (+14.58%)
Mutual labels:  sparql, rdf, semantic-web
semagrow
A SPARQL query federator of heterogeneous data sources
Stars: ✭ 27 (-43.75%)
Mutual labels:  linked-data, sparql, rdf
twinql
A graph query language for the semantic web
Stars: ✭ 17 (-64.58%)
Mutual labels:  linked-data, rdf, semantic-web
ont-api
ONT-API (OWL-API over Apache Jena)
Stars: ✭ 20 (-58.33%)
Mutual labels:  sparql, rdf, semantic-web

CSV2RDF

Streaming, transforming CSV to RDF converter

Reads CSV/TSV data as generic CSV/RDF, transforms each row using SPARQL CONSTRUCT or DESCRIBE, and streams the output triples. The generic CSV/RDF format is based on the minimal mode of Generating RDF from Tabular Data on the Web.

Such a transformation-based approach enables:

  • building resource URIs on the fly
  • fixing/remapping datatypes
  • mapping different groups of values to different RDF structures
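
The first two points can be seen directly in the mapping query of the example further below; the relevant fragment looks like this (?id and ?spaces_string are variables bound from CSV columns):

BIND(URI(CONCAT(STR(<>), ?id)) AS ?parking)  # build the resource URI from the base URI and an ID value
BIND(xsd:integer(?spaces_string) AS ?spaces) # remap a string value to xsd:integer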

CSV2RDF differs from tarql in how the mapping queries use graph patterns in the WHERE clause. tarql queries operate on a table of bindings (provided as an implicit VALUES block) in which CSV column names become variable names. CSV2RDF instead generates an intermediary RDF graph for each CSV row (using column names as relative-URI properties) that the WHERE patterns explicitly match against.
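
As an illustration, consider a hypothetical row with columns id,name, values 1,Alice and the base URI https://localhost/. In the generic CSV/RDF form the row would yield an intermediary graph roughly like the following (the blank-node subject is an assumption; the relative-URI properties follow the same pattern as the example output further below):

_:row1 <https://localhost/#id>   "1" ;
       <https://localhost/#name> "Alice" .

The WHERE clause then matches patterns such as ?row <#id> ?id against this per-row graph, which is exactly what the example query below does with its <#FID>, <#name> etc. patterns.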

Build

mvn clean install

This produces an executable JAR file, target/csv2rdf-2.0.0-jar-with-dependencies.jar, with the dependency libraries included.

Usage

The CSV data is read from stdin; the resulting RDF data is written to stdout.

CSV2RDF is available as a .jar as well as a Docker image atomgraph/csv2rdf (recommended).

Parameters:

  • query-file - a text file containing a SPARQL 1.1 CONSTRUCT query string
  • base - the base URI for the data (it also becomes the BASE URI of the SPARQL query). The property namespace is constructed by appending # to the base URI.

Options:

  • -d, --delimiter - value delimiter character, by default ,.
  • --max-chars-per-column - max characters per column value, by default 4096
  • --input-charset - CSV input encoding, by default UTF-8
  • --output-charset - RDF output encoding, by default UTF-8

Note that delimiter characters can have a special meaning in the shell. Therefore, always enclose them in single quotes (e.g. ';') when executing CSV2RDF from the shell.
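
For instance, converting a semicolon-delimited file could look roughly like the sketch below (data.csv, query.rq and data.nt are placeholder names, and the assumption that options follow the two positional arguments has not been verified against the CLI):

cat data.csv | java -jar csv2rdf-2.0.0-jar-with-dependencies.jar query.rq https://localhost/ --delimiter ';' > data.nt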

If you want to retrieve the raw CSV/RDF output, use the identity transform query CONSTRUCT WHERE { ?s ?p ?o }.
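
For example (a sketch; identity.rq and raw.nt are placeholder filenames), running the identity transform against the CSV from the example below would dump the untransformed generic CSV/RDF triples, with <#postDistrict>, <#roadCode> etc. as properties and the raw cell values as plain literals:

echo 'CONSTRUCT WHERE { ?s ?p ?o }' > identity.rq
cat parking-facilities.csv | java -jar csv2rdf-2.0.0-jar-with-dependencies.jar identity.rq https://localhost/ > raw.nt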

Example

CSV data in parking-facilities.csv:

postDistrict,roadCode,houseNumber,name,FID,long,lat,address,postcode,parkingSpace,owner,parkingType,information
1304 København K,24,5,Adelgade 5 p_hus.0,p_hus.0,12.58228733,55.68268042,Adelgade 5,1304,92,Privat,P-Kælder,"Adelgade 5-7, Q-park."

CONSTRUCT query in parking-facilities.rq:

PREFIX schema:     <https://schema.org/> 
PREFIX geo:        <http://www.w3.org/2003/01/geo/wgs84_pos#> 
PREFIX xsd:        <http://www.w3.org/2001/XMLSchema#> 
PREFIX rdf:        <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

CONSTRUCT
{
    ?parking a schema:ParkingFacility ;
        geo:lat ?lat ;
        geo:long ?long ;
        schema:name ?name ;
        schema:streetAddress ?address ;
        schema:postalCode ?postcode ;
        schema:maximumAttendeeCapacity ?spaces ;
        schema:additionalProperty ?parkingType ;
        schema:comment ?information ;
        schema:identifier ?id .
}
WHERE
{
    ?parkingRow <#FID> ?id ;
        <#name> ?name ;
        <#address> ?address ;
        <#lat> ?lat_string ;
        <#postcode> ?postcode ;
        <#parkingSpace> ?spaces_string ;
        <#parkingType> ?parkingType ;
        <#information> ?information ;
        <#long> ?long_string . 

    BIND(URI(CONCAT(STR(<>), ?id)) AS ?parking) # building URI from base URI and ID
    BIND(xsd:integer(?spaces_string) AS ?spaces)
    BIND(xsd:float(?lat_string) AS ?lat)
    BIND(xsd:float(?long_string) AS ?long)
}

Java execution from shell:

cat parking-facilities.csv | java -jar csv2rdf-2.0.0-jar-with-dependencies.jar parking-facilities.rq https://localhost/ > parking-facilities.ttl

Alternatively, Docker execution from shell:

cat parking-facilities.csv | docker run -i -a stdin -a stdout -a stderr -v "$(pwd)/parking-facilities.rq":/tmp/parking-facilities.rq atomgraph/csv2rdf /tmp/parking-facilities.rq https://localhost/ > parking-facilities.ttl

Note that when using Docker you need to:

  • bind stdin/stdout/stderr streams
  • mount the query file into the container, and use the file path within the container as query-file

Output in parking-facilities.ttl:

<https://localhost/p_hus.0> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://schema.org/ParkingFacility> .
<https://localhost/p_hus.0> <http://www.w3.org/2003/01/geo/wgs84_pos#long> "12.58228733"^^<http://www.w3.org/2001/XMLSchema#float> .
<https://localhost/p_hus.0> <https://schema.org/identifier> "p_hus.0" .
<https://localhost/p_hus.0> <https://schema.org/additionalProperty> "P-Kælder" .
<https://localhost/p_hus.0> <https://schema.org/comment> "Adelgade 5-7, Q-park." .
<https://localhost/p_hus.0> <https://schema.org/postalCode> "1304" .
<https://localhost/p_hus.0> <http://www.w3.org/2003/01/geo/wgs84_pos#lat> "55.68268042"^^<http://www.w3.org/2001/XMLSchema#float> .
<https://localhost/p_hus.0> <https://schema.org/streetAddress> "Adelgade 5" .
<https://localhost/p_hus.0> <https://schema.org/name> "Adelgade 5 p_hus.0" .
<https://localhost/p_hus.0> <https://schema.org/maximumAttendeeCapacity> "92"^^<http://www.w3.org/2001/XMLSchema#integer> .

Query examples

More mapping query examples can be found in LinkedDataHub's city-graph demo app.

Performance

Largest dataset tested so far: 2.8 GB / 3,709,725 rows of CSV converted to 21.7 GB / 151,348,939 triples in under 27 minutes. Hardware: an x64 Windows 10 PC with an Intel Core i5-7200U 2.5 GHz CPU and 16 GB RAM.
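That works out to roughly 2,300 CSV rows and 93,000 output triples per second on that machine.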
