bloomberg / pycsvw

License: Apache-2.0
A tool to read CSV files with CSVW metadata and transform them into other formats.

Programming Languages

python

Projects that are alternatives to, or similar to, pycsvw

rdf-parser-csvw
CSV on the Web parser
Stars: ✭ 15 (-53.12%)
Mutual labels:  rdf, csvw
V2
Minimalist and opinionated feed reader
Stars: ✭ 3,239 (+10021.88%)
Mutual labels:  rdf
Owmeta
Unified, simple data access python library for data & facts about C. elegans anatomy
Stars: ✭ 134 (+318.75%)
Mutual labels:  rdf
Comunica
📬 A knowledge graph querying framework for JavaScript
Stars: ✭ 183 (+471.88%)
Mutual labels:  rdf
Kbpedia
KBPedia Knowledge Graph & Knowledge Ontology (KKO)
Stars: ✭ 149 (+365.63%)
Mutual labels:  rdf
Dotnetrdf
dotNetRDF is a powerful and flexible API for working with RDF and SPARQL in .Net environments
Stars: ✭ 199 (+521.88%)
Mutual labels:  rdf
Hypergraphql
GraphQL interface for querying and serving linked data on the Web.
Stars: ✭ 120 (+275%)
Mutual labels:  rdf
LDWizard
A generic framework for simplifying the creation of linked data.
Stars: ✭ 17 (-46.87%)
Mutual labels:  data-transformation
Rdflib Jsonld
JSON-LD parser and serializer plugins for RDFLib (Python 2.6+)
Stars: ✭ 250 (+681.25%)
Mutual labels:  rdf
Grafter
Linked Data & RDF Manufacturing Tools in Clojure
Stars: ✭ 174 (+443.75%)
Mutual labels:  rdf
Open Semantic Etl
Python based Open Source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (Entity Extraction & Named Entity Recognition) & data enrichment (annotation) pipelines & ingestor to Solr or Elastic search index & linked data graph database
Stars: ✭ 165 (+415.63%)
Mutual labels:  rdf
Server.js
A Triple Pattern Fragments server for Node.js
Stars: ✭ 149 (+365.63%)
Mutual labels:  rdf
Rdf4j
Eclipse RDF4J: scalable RDF for Java
Stars: ✭ 242 (+656.25%)
Mutual labels:  rdf
Digitalbuildings
Digital Buildings (ontology and SDK) currently being used by Google internally to manage our own buildings.
Stars: ✭ 139 (+334.38%)
Mutual labels:  rdf
pyfuseki
A library that uses Python to connect and manipulate Jena Fuseki, which provides sync and async methods.
Stars: ✭ 22 (-31.25%)
Mutual labels:  rdf
Akutan
A distributed knowledge graph store
Stars: ✭ 1,616 (+4950%)
Mutual labels:  rdf
Nspm
🤖 Neural SPARQL Machines for Knowledge Graph Question Answering.
Stars: ✭ 156 (+387.5%)
Mutual labels:  rdf
Ontowiki
Semantic data wiki as well as Linked Data publishing engine
Stars: ✭ 192 (+500%)
Mutual labels:  rdf
optimus
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Stars: ✭ 1,351 (+4121.88%)
Mutual labels:  data-transformation
trio
Datatype agnostic triple store & query engine API
Stars: ✭ 78 (+143.75%)
Mutual labels:  rdf

pycsvw

A Python implementation of a variant of the W3C CSV on the Web (CSVW) recommendation, intended primarily for efficient RDF and JSON generation from a CSV file and its metadata. The supported variant adds a few features, mostly for specifying RDF output as an ordered container, and imposes the restrictions listed below.

Features:

  1. A cell can be given an rdf:List-valued object. See "rdf:List valued objects for a cell" for details.
  2. For the data types time, date and dateTime, any cell value recognized by the dateutil parser is accepted.

Restrictions:

  1. CSV metadata can be specified only through a separate JSON file.
  2. Only minimal_mode is supported.
  3. The CSV file must have exactly one header row.
  4. The "format" attribute is ignored for every data type except "boolean". For XSD data types, each cell value must be a valid XSD value. The exceptions are date, time and dateTime, for which any value recognized by the dateutil parser is accepted.
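
Restriction 4 can be made concrete with a small sketch. In the CSVW recommendation, the "format" of a boolean has the shape true-value|false-value (for example Y|N). Without a format, the XSD lexical forms true, false, 1 and 0 apply. The function below is a hypothetical illustration of that rule, not pycsvw's code, and the name parse_boolean_cell is an assumption.

```python
def parse_boolean_cell(value, fmt=None):
    """Interpret a cell as a boolean, optionally using a CSVW 'format'.

    Hypothetical helper for illustration; not part of pycsvw.
    """
    if fmt is not None:
        true_value, sep, false_value = fmt.partition("|")
        if sep and true_value and false_value:
            if value == true_value:
                return True
            if value == false_value:
                return False
            raise ValueError("%r matches neither side of format %r" % (value, fmt))
        # The recommendation says to warn about a malformed format and
        # proceed as if none was given; this sketch silently falls through.

    # Default XSD lexical forms for xsd:boolean.
    if value in ("true", "1"):
        return True
    if value in ("false", "0"):
        return False
    raise ValueError("%r is not a valid xsd:boolean value" % value)


if __name__ == "__main__":
    print(parse_boolean_cell("Y", "Y|N"))
    print(parse_boolean_cell("false"))
```

For any other data type, a value that is not already in valid XSD form would have to be normalized before conversion, since "format" is not consulted.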

All outputs are generated in UTF-8 encoding.

For implementation details, see details.

Usage

$ pycsvw --help
Usage: pycsvw [OPTIONS]

  Command line interface for pycsvw.

Options:
  --csv-url TEXT        URL of the CSVW
  --csv-path TEXT       System path to the CSVW
  --metadata-url TEXT   URL of the CSVW metadata
  --metadata-path TEXT  System path to the CSVW metadata
  --json-dest TEXT      Destination of the JSON file to generate
  --rdf-dest TEXT...    Pair of format and destination path of RDF e.g.
                        'turtle out.ttl'
  --temp-dir TEXT       Use as the temporary folder for (intermediate) nt
                        serialization
  --riot-path TEXT      The path to the riot command e.g.
                        '/usr/bin/jena/bin/riot'
  --help                Show this message and exit.
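
The options above can also be assembled from a script. The following is a minimal sketch that builds the argument list for a local CSV file, a local metadata file and one RDF output, using only flags shown in the help text. The helper names build_pycsvw_command and run_pycsvw are assumptions for illustration; the file paths are those of the project's tree-ops-ext test example.

```python
import shlex
import shutil
import subprocess


def build_pycsvw_command(csv_path, metadata_path, rdf_format, rdf_dest):
    """Build the argv list for one pycsvw run (hypothetical helper).

    --rdf-dest takes two values: the RDF format and the destination path.
    """
    return [
        "pycsvw",
        "--csv-path", csv_path,
        "--metadata-path", metadata_path,
        "--rdf-dest", rdf_format, rdf_dest,
    ]


def run_pycsvw(cmd):
    """Run the command if pycsvw is installed; raise if it is missing or fails."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError("%s not found on PATH" % cmd[0])
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    cmd = build_pycsvw_command(
        "tests/examples/tree-ops-ext.csv",
        "tests/examples/tree-ops-ext.csv-metadata.json",
        "turtle",
        "test.ttl",
    )
    # Print the shell-quoted command; call run_pycsvw(cmd) to execute it.
    print(shlex.join(cmd))
```

Passing the arguments as a list avoids shell quoting problems when paths contain spaces.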

Example run

pycsvw --csv-path tests/examples/tree-ops-ext.csv --metadata-path tests/examples/tree-ops-ext.csv-metadata.json --rdf-dest turtle test.ttl

generates a test.ttl containing:

@prefix schema: <http://schema.org/> .
@prefix rr:    <http://www.w3.org/ns/r2rml#> .
@prefix grddl: <http://www.w3.org/2003/g/data-view#> .
@prefix wdr:   <http://www.w3.org/2007/05/powder#> .
@prefix duv:   <https://www.w3.org/TR/vocab-duv#> .
@prefix owl:   <http://www.w3.org/2002/07/owl#> .
@prefix xhv:   <http://www.w3.org/1999/xhtml/vocab#> .
@prefix xsd:   <http://www.w3.org/2001/XMLSchema#> .
@prefix dqv:   <http://www.w3.org/ns/dqv#> .
@prefix skos:  <http://www.w3.org/2004/02/skos/core#> .
@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rif:   <http://www.w3.org/2007/rif#> .
@prefix sd:    <http://www.w3.org/ns/sparql-service-description#> .
@prefix qb:    <http://purl.org/linked-data/cube#> .
@prefix oa:    <http://www.w3.org/ns/oa#> .
@prefix ma:    <http://www.w3.org/ns/ma-ont#> .
@prefix xml:   <http://www.w3.org/XML/1998/namespace> .
@prefix og:    <http://ogp.me/ns#> .
@prefix rdfa:  <http://www.w3.org/ns/rdfa#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix dcat:  <http://www.w3.org/ns/dcat#> .
@prefix wrds:  <http://www.w3.org/2007/05/powder-s#> .
@prefix prov:  <http://www.w3.org/ns/prov#> .
@prefix foaf:  <http://xmlns.com/foaf/0.1/> .
@prefix csvw:  <http://www.w3.org/ns/csvw#> .
@prefix sioc:  <http://rdfs.org/sioc/ns#> .
@prefix dctypes: <http://purl.org/dc/dcmitype/> .
@prefix cc:    <http://creativecommons.org/ns#> .
@prefix rev:   <http://purl.org/stuff/rev#> .
@prefix void:  <http://rdfs.org/ns/void#> .
@prefix skosxl: <http://www.w3.org/2008/05/skos-xl#> .
@prefix org:   <http://www.w3.org/ns/org#> .
@prefix vcard: <http://www.w3.org/2006/vcard/ns#> .
@prefix gr:    <http://purl.org/goodrelations/v1#> .
@prefix dc11:  <http://purl.org/dc/elements/1.1/> .
@prefix as:    <https://www.w3.org/ns/activitystreams#> .
@prefix ical:  <http://www.w3.org/2002/12/cal/icaltzd#> .
@prefix rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix v:     <http://rdf.data-vocabulary.org/#> .
@prefix ldp:   <http://www.w3.org/ns/ldp#> .
@prefix ctag:  <http://commontag.org/ns#> .
@prefix dc:    <http://purl.org/dc/terms/> .

<http://example.org/tree-ops-ext#gid-6>
        <http://example.org/tree-ops-ext.csv#comments>
                " included bark" , "cavity or decay" , " trunk decay" , " root decay" , " codominant leaders" , " large leader or limb decay" , "  beware of BEES" , " previous failure root damage" ;
        <http://example.org/tree-ops-ext.csv#dbh>
                29 ;
        <http://example.org/tree-ops-ext.csv#inventory_date>
                "2010-06-01"^^xsd:date ;
        <http://example.org/tree-ops-ext.csv#kml>
                "<Point><coordinates>-122.156299,37.441151</coordinates></Point>"^^rdf:XMLLiteral ;
        <http://example.org/tree-ops-ext.csv#on_street>
                "ADDISON AV" ;
        <http://example.org/tree-ops-ext.csv#protected>
                true ;
        <http://example.org/tree-ops-ext.csv#species>
                "Robinia pseudoacacia" ;
        <http://example.org/tree-ops-ext.csv#trim_cycle>
                "Large Tree Routine Prune"@en .

<http://example.org/tree-ops-ext#gid-2>
        <http://example.org/tree-ops-ext.csv#dbh>
                11 ;
        <http://example.org/tree-ops-ext.csv#inventory_date>
                "2010-06-02"^^xsd:date ;
        <http://example.org/tree-ops-ext.csv#kml>
                "<Point><coordinates>-122.156749,37.440958</coordinates></Point>"^^rdf:XMLLiteral ;
        <http://example.org/tree-ops-ext.csv#on_street>
                "EMERSON ST" ;
        <http://example.org/tree-ops-ext.csv#protected>
                false ;
        <http://example.org/tree-ops-ext.csv#species>
                "Liquidambar styraciflua" ;
        <http://example.org/tree-ops-ext.csv#trim_cycle>
                "Large Tree Routine Prune"@en .

<http://example.org/tree-ops-ext#gid-1>
        <http://example.org/tree-ops-ext.csv#dbh>
                11 ;
        <http://example.org/tree-ops-ext.csv#inventory_date>
                "2010-10-18"^^xsd:date ;
        <http://example.org/tree-ops-ext.csv#kml>
                "<Point><coordinates>-122.156485,37.440963</coordinates></Point>"^^rdf:XMLLiteral ;
        <http://example.org/tree-ops-ext.csv#on_street>
                "ADDISON AV" ;
        <http://example.org/tree-ops-ext.csv#protected>
                false ;
        <http://example.org/tree-ops-ext.csv#species>
                "Celtis australis" ;
        <http://example.org/tree-ops-ext.csv#trim_cycle>
                "Large Tree Routine Prune"@en .