SmartDataAnalytics / RdfProcessingToolkit

Licence: other
Command line interface based RDF processing toolkit to run sequences of SPARQL statements ad-hoc on RDF datasets, streams of bindings and streams of named graphs with support for processing JSON, CSV and XML using function extensions

Programming Languages

java
68154 projects - #9 most used programming language
shell
77523 projects

Projects that are alternatives of or similar to RdfProcessingToolkit

cognipy
In-memory Graph Database and Knowledge Graph with Natural Language Interface, compatible with Pandas
Stars: ✭ 31 (+63.16%)
Mutual labels:  rdf, jena
ont-api
ONT-API (OWL-API over Apache Jena)
Stars: ✭ 20 (+5.26%)
Mutual labels:  rdf, jena
rdf-delta
A system to propagate changes between RDF Datasets
Stars: ✭ 44 (+131.58%)
Mutual labels:  rdf, jena
sparql-proxy
SPARQL-proxy: provides cache, job control, and logging for any SPARQL endpoint
Stars: ✭ 26 (+36.84%)
Mutual labels:  rdf
amazon-neptune-csv-to-rdf-converter
Amazon Neptune CSV to RDF Converter is a tool for Amazon Neptune that converts property graphs stored as comma separated values into RDF graphs.
Stars: ✭ 27 (+42.11%)
Mutual labels:  rdf
bioportal web ui
A Rails application for biological ontologies
Stars: ✭ 20 (+5.26%)
Mutual labels:  rdf
shex
ShEx language issues, including new features for e.g. ShEx2.1
Stars: ✭ 24 (+26.32%)
Mutual labels:  rdf
ont-api
ONT-API (OWL-API over Apache Jena)
Stars: ✭ 31 (+63.16%)
Mutual labels:  jena
visualisation-lab
An experimental visualisation workbench built using Svelte
Stars: ✭ 17 (-10.53%)
Mutual labels:  rdf
calamus
A JSON-LD Serialization Library for Python
Stars: ✭ 21 (+10.53%)
Mutual labels:  rdf
fibo
The Financial Industry Business Ontology (FIBO) defines the sets of things that are of interest in financial business applications and the ways that those things can relate to one another. In this way, FIBO can give meaning to any data (e.g., spreadsheets, relational databases, XML documents) that describe the business of finance.
Stars: ✭ 204 (+973.68%)
Mutual labels:  rdf
morph-kgc
Powerful RDF Knowledge Graph Generation with [R2]RML Mappings
Stars: ✭ 77 (+305.26%)
Mutual labels:  rdf
synbiohub
Web application enabling users and software to browse, upload, and share synthetic biology designs
Stars: ✭ 56 (+194.74%)
Mutual labels:  rdf
RDForm
Create and edit RDF data in a HTML form
Stars: ✭ 16 (-15.79%)
Mutual labels:  rdf
semantic-python-overview
(subjective) overview of projects which are related both to python and semantic technologies (RDF, OWL, Reasoning, ...)
Stars: ✭ 406 (+2036.84%)
Mutual labels:  rdf
pyHDT
Read and query HDT documents with ease in Python
Stars: ✭ 12 (-36.84%)
Mutual labels:  rdf
matcha
🍵 SPARQL-like DSL for querying in memory Linked Data Models
Stars: ✭ 18 (-5.26%)
Mutual labels:  rdf
Processor
Ontology-driven Linked Data processor and server for SPARQL backends. Apache License.
Stars: ✭ 54 (+184.21%)
Mutual labels:  rdf
ontobio
python library for working with ontologies and ontology associations
Stars: ✭ 104 (+447.37%)
Mutual labels:  rdf
corese
Software platform implementing and extending the standards of the Semantic Web.
Stars: ✭ 55 (+189.47%)
Mutual labels:  rdf

RDF Processing Toolkit

RDF/SPARQL workflows on the command line made easy. The toolkit provides the following commands for running SPARQL queries on triple-based and quad-based data:

  • sparql-integrate: Ad-hoc querying and transformation of datasets, featuring SPARQL extensions for CSV, XML and JSON processing, plus JSON output that makes building bash pipelines a breeze.
  • ngs: Processor for named graph streams (ngs), which processes collections of named graphs in a streaming fashion. Process huge datasets without running into memory issues.
  • sbs: Processor for SPARQL binding streams (sbs), which processes SPARQL result sets in a streaming fashion. Its most prominent use is aggregating the output of an ngs map operation.

See the documentation for the supported SPARQL extensions, which includes many examples.

Example Usage

  • sparql-integrate lets you load multiple RDF files and run multiple queries on them in a single invocation. Prefixes from a snapshot of prefix.cc are predefined, and the SELECT keyword of SPARQL is optional, which makes scripting less verbose. The --jq flag enables JSON output for interoperability with the jq tool.
sparql-integrate loadFile.rdf update.sparql loadAnotherFile.rdf query.sparql

sparql-integrate --jq file.ttl '?s { ?s a foaf:Person }' | jq '.[].s'
  • ngs provides well-known bash tooling such as head, tail and wc, adapted to named graphs instead of lines of text.
# Group triples into named graphs by consecutive subject, then count the triples in each named graph
cat file.ttl | ngs subjects | ngs map --sparql 'CONSTRUCT { ?s eg:triples ?c } { SELECT ?s (COUNT(*) AS ?c) { ?s ?p ?o } GROUP BY ?s }'

# Count number of named graphs
ngs wc file.trig

# Output the first 3 graphs produced by another command
./produce-graphs.sh | ngs head -n 3
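
To try the sparql-integrate --jq example above on known input, the following sketch writes a tiny Turtle file and runs the same query against it. The file name, the example.org resources and the temporary directory are assumptions made for illustration; the script only invokes sparql-integrate and jq if both are on the PATH, and otherwise prints the command it would have run.

```shell
#!/bin/sh
# Sketch: build a tiny input file and run the README's --jq query against it.
# File names and sample data are illustrative, not part of the toolkit.

workdir="$(mktemp -d)"
data="$workdir/people.ttl"

# Two resources; only <http://example.org/alice> is typed as foaf:Person.
cat > "$data" <<'EOF'
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

<http://example.org/alice> a foaf:Person ; foaf:name "Alice" .
<http://example.org/doc1> a foaf:Document .
EOF

# Same query as in the README: SELECT is omitted and the foaf: prefix
# is expected to come from the predefined prefix.cc snapshot.
query='?s { ?s a foaf:Person }'

if command -v sparql-integrate >/dev/null 2>&1 && command -v jq >/dev/null 2>&1; then
  sparql-integrate --jq "$data" "$query" | jq '.[].s'
else
  echo "sparql-integrate or jq not found; would run:"
  echo "  sparql-integrate --jq $data '$query' | jq '.[].s'"
fi
```

Keeping the query in a single-quoted shell variable avoids accidental expansion of ?s or the braces by the shell.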

Example Use Cases

  • Lodservatory, which implements SPARQL endpoint monitoring, uses these tools in a script that is called from a git action.
  • Linked Sparql Queries provides tools to RDFize SPARQL query logs and run benchmarks on the resulting RDF. The triples related to a query represent an instance of a sophisticated domain model and are grouped in a named graph. Depending on the input size, one can end up with millions of named graphs describing queries, amounting to billions of triples. With ngs one can easily extract complete samples of the queries' models without leaving a related triple behind.
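
The sampling described in the second use case can be sketched with ngs head, which cuts a stream at named-graph boundaries rather than at line boundaries. The helper name sample_graphs and the NGS_CMD override are hypothetical and exist only in this sketch; the file names in the usage comment are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: keep the first N named graphs of a stream read from stdin.
# NGS_CMD can be overridden, e.g. to point at a wrapper script; it defaults to ngs.
sample_graphs() {
  n="$1"
  case "$n" in
    ''|*[!0-9]*)
      echo "usage: sample_graphs N  (N must be a non-negative integer)" >&2
      return 2
      ;;
  esac
  # ngs head works on whole named graphs, so each sampled graph stays complete.
  "${NGS_CMD:-ngs}" head -n "$n"
}

# Example call (placeholder file names):
#   cat queries.trig | sample_graphs 1000 > sample.trig
```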

Building

The build requires Maven.

mvn clean install

The all-in-one jar is built in the rdf-processing-toolkit-bundle folder; it is the same jar file that is available in the Releases section.

java -cp rdf-processing-toolkit-bundle/target/rdf-processing-toolkit-bundle-VERSION-jar-with-dependencies.jar rpt
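
Because the jar name contains the version, a small wrapper can resolve it by glob instead of typing VERSION by hand. This is a minimal sketch: find_rpt_jar, run_rpt and the RPT_HOME variable (the checkout root, defaulting to the current directory) are hypothetical names, not part of the toolkit.

```shell
#!/bin/sh
# Sketch: locate the bundle jar produced by "mvn clean install".
# find_rpt_jar and run_rpt are hypothetical helper names.
find_rpt_jar() {
  root="${1:-.}"
  for jar in "$root"/rdf-processing-toolkit-bundle/target/rdf-processing-toolkit-bundle-*-jar-with-dependencies.jar; do
    # An unmatched glob stays literal, so check that the file really exists.
    if [ -f "$jar" ]; then
      printf '%s\n' "$jar"
      return 0
    fi
  done
  echo "no bundle jar found under $root; run 'mvn clean install' first" >&2
  return 1
}

# Run the rpt entry point with whatever arguments are passed through.
run_rpt() {
  rpt_jar="$(find_rpt_jar "${RPT_HOME:-.}")" || return 1
  java -cp "$rpt_jar" rpt "$@"
}
```

If several versions have been built, the first match in glob order is used; cleaning the target folder avoids that ambiguity.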

Installing the Debian packages can be easily accomplished using:

sudo dpkg -i $(find . -name "rdf-processing-toolkit*.deb")

The bare-metal approach is to start the tool manually from the rdf-processing-toolkit-cli/target folder using:

java -cp ".:lib/*" "-Dloader.main=org.aksw.rdf_processing_toolkit.cli.main.MainCliRdfProcessingToolkit" "org.springframework.boot.loader.PropertiesLauncher" "your" "args"
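
The command above uses a classpath relative to the target folder, so it has to be run from inside that folder. The sketch below wraps it in a function that builds the equivalent absolute classpath, so it can be called from any directory. The function name rpt_cli and the RPT_HOME and JAVA_CMD variables are assumptions of this sketch; the main class and launcher are copied from the command above.

```shell
#!/bin/sh
# Hypothetical wrapper around the bare-metal command.
# RPT_HOME is assumed to be the checkout root (default: current directory);
# JAVA_CMD lets you pick a specific java binary (default: java).
rpt_cli() {
  target="${RPT_HOME:-.}/rdf-processing-toolkit-cli/target"
  if [ ! -d "$target/lib" ]; then
    echo "missing $target/lib; build the project first" >&2
    return 1
  fi
  # "$target:$target/lib/*" mirrors ".:lib/*" but does not depend on the
  # current directory. The quoted * is expanded by java, not by the shell.
  "${JAVA_CMD:-java}" -cp "$target:$target/lib/*" \
    "-Dloader.main=org.aksw.rdf_processing_toolkit.cli.main.MainCliRdfProcessingToolkit" \
    "org.springframework.boot.loader.PropertiesLauncher" "$@"
}
```

Since the wrapper does not change directory, relative file arguments keep referring to the caller's working directory.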

License

The source code of this repo is published under the Apache License Version 2.0. Dependencies may be licensed under different terms. When in doubt please refer to the licenses of the dependencies declared in the pom.xml files.
