RDFLib / rdflib-hdt

Licence: MIT license
A Store back-end for rdflib to allow for reading and querying HDT documents

Programming Languages

C++, Python, shell

Projects that are alternatives of or similar to rdflib-hdt

cognipy
In-memory Graph Database and Knowledge Graph with Natural Language Interface, compatible with Pandas
Stars: ✭ 31 (+72.22%)
Mutual labels:  sparql, rdf
ontobio
python library for working with ontologies and ontology associations
Stars: ✭ 104 (+477.78%)
Mutual labels:  sparql, rdf
pyHDT
Read and query HDT documents with ease in Python
Stars: ✭ 12 (-33.33%)
Mutual labels:  rdf, hdt
graph-explorer
Graph Explorer can be used to explore RDF graphs in SPARQL endpoints or on the web.
Stars: ✭ 30 (+66.67%)
Mutual labels:  sparql, rdf
semicon
A collection of icons for the Semantic Web and Linked Open Data world.
Stars: ✭ 20 (+11.11%)
Mutual labels:  rdf, store
semagrow
A SPARQL query federator of heterogeneous data sources
Stars: ✭ 27 (+50%)
Mutual labels:  sparql, rdf
amazon-neptune-csv-to-rdf-converter
Amazon Neptune CSV to RDF Converter is a tool for Amazon Neptune that converts property graphs stored as comma separated values into RDF graphs.
Stars: ✭ 27 (+50%)
Mutual labels:  sparql, rdf
skos-play
SKOS-Play allows to print SKOS files in HTML or PDF. It also embeds xls2rdf to generate RDF from Excel.
Stars: ✭ 58 (+222.22%)
Mutual labels:  sparql, rdf
matcha
🍵 SPARQL-like DSL for querying in memory Linked Data Models
Stars: ✭ 18 (+0%)
Mutual labels:  sparql, rdf
corese
Software platform implementing and extending the standards of the Semantic Web.
Stars: ✭ 55 (+205.56%)
Mutual labels:  sparql, rdf
joinup-dev
The Joinup project moved to https://git.fpfis.eu/ec-europa/digit-joinup-reference
Stars: ✭ 41 (+127.78%)
Mutual labels:  sparql, rdf
viziquer
Tool for Search in Structured Semantic Data
Stars: ✭ 12 (-33.33%)
Mutual labels:  sparql, rdf
everything
The semantic desktop search engine
Stars: ✭ 22 (+22.22%)
Mutual labels:  sparql, rdf
mobi
Mobi is a decentralized, federated, and distributed graph data platform for teams and communities to publish and discover data, data models, and analytics that are instantly consumable.
Stars: ✭ 41 (+127.78%)
Mutual labels:  sparql, rdf
sparklis
Sparklis is a query builder in natural language that allows people to explore and query SPARQL endpoints with all the power of SPARQL and without any knowledge of SPARQL.
Stars: ✭ 28 (+55.56%)
Mutual labels:  sparql, rdf
sparql-proxy
SPARQL-proxy: provides cache, job control, and logging for any SPARQL endpoint
Stars: ✭ 26 (+44.44%)
Mutual labels:  sparql, rdf
tentris
Tentris is a tensor-based RDF triple store with SPARQL support.
Stars: ✭ 34 (+88.89%)
Mutual labels:  sparql, rdf
SolRDF
An RDF plugin for Solr
Stars: ✭ 115 (+538.89%)
Mutual labels:  sparql, rdf
Processor
Ontology-driven Linked Data processor and server for SPARQL backends. Apache License.
Stars: ✭ 54 (+200%)
Mutual labels:  sparql, rdf
semantic-python-overview
(subjective) overview of projects which are related both to python and semantic technologies (RDF, OWL, Reasoning, ...)
Stars: ✭ 406 (+2155.56%)
Mutual labels:  sparql, rdf

rdflib-hdt

A Store back-end for rdflib to allow for reading and querying HDT documents.

Online Documentation

Requirements

  • Python version 3.6.4 or higher
  • pip
  • gcc/clang with C++11 support
  • Python development headers

You should have the Python.h header available on your system.
For example, for Python 3.6, install the python3.6-dev package on Debian/Ubuntu systems.
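
One way to check for the header before installing is to ask the interpreter where it expects its include files. This is a minimal sketch using only the standard library; the helper name has_python_headers is made up for illustration and is not part of rdflib-hdt.

```python
import os
import sysconfig


def has_python_headers() -> bool:
    """Return True if Python.h exists in this interpreter's include directory."""
    include_dir = sysconfig.get_paths()["include"]
    return os.path.isfile(os.path.join(include_dir, "Python.h"))


if __name__ == "__main__":
    print("Python.h found" if has_python_headers() else "Python.h missing")
```

If the check reports the header as missing, installing the development package for your Python version should provide it.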

Installation

Installation using pipenv or a virtualenv is strongly advised!

PyPI installation (recommended)

# you can install using pip
pip install rdflib-hdt

# or you can use pipenv
pipenv install rdflib-hdt

Manual installation

Requirement: pipenv

git clone https://github.com/Callidon/pyHDT
cd pyHDT/
./install.sh

Getting started

You can use the rdflib-hdt library in two modes: as an rdflib Graph or as a raw HDT document.

Graph usage (recommended)

from rdflib import Graph
from rdflib_hdt import HDTStore
from rdflib.namespace import FOAF

# Load an HDT file. Missing indexes are generated automatically.
# You can provide the index file by putting it in the same directory as the HDT file.
store = HDTStore("test.hdt")

# Display some metadata about the HDT document itself
print(f"Number of RDF triples: {len(store)}")
print(f"Number of subjects: {store.nb_subjects}")
print(f"Number of predicates: {store.nb_predicates}")
print(f"Number of objects: {store.nb_objects}")
print(f"Number of shared subject-object: {store.nb_shared}")

# Create an RDFlib Graph with the HDT document as a backend
graph = Graph(store=store)

# Fetch all triples that match { ?s foaf:name ?o }
# Use None to indicate variables
for s, p, o in graph.triples((None, FOAF("name"), None)):
  print(s, p, o)
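
To make the role of None concrete, here is a toy sketch of wildcard matching over an in-memory list. It only illustrates the semantics of a triple pattern; it is not how HDTStore evaluates patterns. Plain strings stand in for RDF terms, and the match function and example.org data are hypothetical.

```python
from typing import Iterable, Iterator, Optional, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]

FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
FOAF_KNOWS = "http://xmlns.com/foaf/0.1/knows"


def match(triples: Iterable[Triple], pattern: Pattern) -> Iterator[Triple]:
    """Yield triples that equal the pattern at every non-None position."""
    for triple in triples:
        if all(want is None or want == got for want, got in zip(pattern, triple)):
            yield triple


# Hypothetical data, used only for this illustration.
data = [
    ("http://example.org/alice", FOAF_NAME, "Alice"),
    ("http://example.org/alice", FOAF_KNOWS, "http://example.org/bob"),
    ("http://example.org/bob", FOAF_NAME, "Bob"),
]

# None in the subject and object positions behaves like ?s and ?o.
names = list(match(data, (None, FOAF_NAME, None)))
for s, p, o in names:
    print(s, p, o)
```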

Using the RDFlib API, you can also execute SPARQL queries over an HDT document. If you do so, we recommend that you first call the optimize_sparql function, which optimizes the RDFlib SPARQL query engine for HDT documents.

from rdflib import Graph
from rdflib_hdt import HDTStore, optimize_sparql

# Calling this function optimizes the RDFlib SPARQL engine for HDT documents
optimize_sparql()

graph = Graph(store=HDTStore("test.hdt"))

# You can execute SPARQL queries using the regular RDFlib API
qres = graph.query("""
  PREFIX foaf: <http://xmlns.com/foaf/0.1/>
  SELECT ?name ?friend WHERE {
    ?a foaf:knows ?b.
    ?a foaf:name ?name.
    ?b foaf:name ?friend.
  }""")

for row in qres:
  print(f"{row.name} knows {row.friend}")

HDT Document usage

from rdflib_hdt import HDTDocument
from rdflib.namespace import FOAF

# Load an HDT file. Missing indexes are generated automatically.
# You can provide the index file by putting it in the same directory as the HDT file.
document = HDTDocument("test.hdt")

# Display some metadata about the HDT document itself
print(f"Number of RDF triples: {document.total_triples}")
print(f"Number of subjects: {document.nb_subjects}")
print(f"Number of predicates: {document.nb_predicates}")
print(f"Number of objects: {document.nb_objects}")
print(f"Number of shared subject-object: {document.nb_shared}")

# Fetch all triples that match { ?s foaf:name ?o }
# Use None to indicate variables
triples, cardinality = document.search((None, FOAF("name"), None))

print(f"Cardinality of (?s foaf:name ?o): {cardinality}")
for s, p, o in triples:
  print(s, p, o)

# The search also supports limit and offset
triples, cardinality = document.search((None, FOAF("name"), None), limit=10, offset=100)
# etc ...
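
The limit and offset arguments lend themselves to paging through large result sets. The sketch below assumes only what the example above shows, namely that a search call returns a (triples, cardinality) pair. The iter_pages helper and the list-backed fake_search are hypothetical stand-ins so the snippet runs without an HDT file.

```python
from typing import Callable, Iterator, List, Tuple

Triple = Tuple[str, str, str]
# Assumed shape of a search call: (limit, offset) -> (triples, cardinality)
SearchFn = Callable[[int, int], Tuple[Iterator[Triple], int]]


def iter_pages(search: SearchFn, page_size: int) -> Iterator[List[Triple]]:
    """Yield successive pages of results until a page comes back empty."""
    offset = 0
    while True:
        triples, _cardinality = search(page_size, offset)
        page = list(triples)
        if not page:
            return
        yield page
        offset += len(page)


# Stand-in for document.search(pattern, limit=..., offset=...), backed by a list.
DATA = [(f"s{i}", "p", f"o{i}") for i in range(25)]


def fake_search(limit: int, offset: int) -> Tuple[Iterator[Triple], int]:
    return iter(DATA[offset:offset + limit]), len(DATA)


pages = list(iter_pages(fake_search, page_size=10))
print([len(page) for page in pages])
```

With a real document, the search argument could be a small wrapper such as lambda limit, offset: document.search(pattern, limit=limit, offset=offset).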

An HDT document also supports evaluating joins over a set of triple patterns.

from rdflib_hdt import HDTDocument
from rdflib import Variable
from rdflib.namespace import FOAF

document = HDTDocument("test.hdt")

# find the names of two entities that know each other
tp_a = (Variable("a"), FOAF("knows"), Variable("b"))
tp_b = (Variable("a"), FOAF("name"), Variable("name"))
tp_c = (Variable("b"), FOAF("name"), Variable("friend"))
query = {tp_a, tp_b, tp_c}

iterator = document.search_join(query)
print(f"Estimated join cardinality: {len(iterator)}")

# Join results are produced as ResultRow, like in the RDFlib SPARQL API
for row in iterator:
  print(f"{row.name} knows {row.friend}")
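
For readers unfamiliar with joins over triple patterns, the following toy nested-loop join shows what such a query computes. It is a sketch for illustration only and says nothing about how search_join is implemented. Variables are written as strings starting with "?", and the helper names and data are hypothetical.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def is_var(term: str) -> bool:
    """In this toy model, a variable is any term that starts with '?'."""
    return term.startswith("?")


def unify(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend binding so that pattern equals triple, or return None on conflict."""
    result = dict(binding)
    for want, got in zip(pattern, triple):
        if is_var(want):
            # Bind the variable, or check it against its existing value.
            if result.setdefault(want, got) != got:
                return None
        elif want != got:
            return None
    return result


def join(data: List[Triple], patterns: List[Triple]) -> Iterator[Binding]:
    """Evaluate the patterns left to right with a nested-loop join."""
    def step(index: int, binding: Binding) -> Iterator[Binding]:
        if index == len(patterns):
            yield binding
            return
        for triple in data:
            extended = unify(patterns[index], triple, binding)
            if extended is not None:
                yield from step(index + 1, extended)

    return step(0, {})


# Hypothetical data, used only for this illustration.
data = [
    ("alice", "knows", "bob"),
    ("alice", "name", "Alice"),
    ("bob", "name", "Bob"),
]
patterns = [("?a", "knows", "?b"), ("?a", "name", "?name"), ("?b", "name", "?friend")]

rows = list(join(data, patterns))
for row in rows:
    print(f"{row['?name']} knows {row['?friend']}")
```

Each shared variable (here ?a and ?b) forces the patterns to agree on the same term, which is what turns three independent searches into a join.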

Handling non-UTF-8 strings in Python

If the HDT document was encoded with a non-UTF-8 encoding, the previous code will not work correctly and will raise a UnicodeDecodeError. For background on how C++ strings are converted to Python str, see the pybind11 documentation on string conversions.

To handle this, the HDT document API also provides bytes-returning variants of its methods:

  • search_triples_bytes(...) returns an iterator of triples as (py::bytes, py::bytes, py::bytes)
  • search_join_bytes(...) returns an iterator of sets of solution mappings as py::set(py::bytes, py::bytes)
  • convert_tripleid_bytes(...) returns a triple as (py::bytes, py::bytes, py::bytes)
  • convert_id_bytes(...) returns a py::bytes

Parameters and documentation are the same as for the standard versions.

from rdflib_hdt import HDTDocument

document = HDTDocument("test.hdt")
it = document.search_triples_bytes("", "", "")

for s, p, o in it:
  print(s, p, o) # prints b'...', b'...', b'...'
  # now decode the terms, or handle any error
  try:
    s, p, o = s.decode('UTF-8'), p.decode('UTF-8'), o.decode('UTF-8')
  except UnicodeDecodeError as err:
    # try another codec, ignore the error, etc.
    pass
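
The "try another codec" step can be factored into a small helper. This is a minimal sketch using only the standard library; decode_term and its default codec list are assumptions made for illustration, not part of the library.

```python
from typing import Sequence


def decode_term(raw: bytes, encodings: Sequence[str] = ("utf-8", "latin-1")) -> str:
    """Decode raw with the first codec in encodings that succeeds."""
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Last resort: substitute U+FFFD for any bytes that cannot be decoded.
    return raw.decode("utf-8", errors="replace")


print(decode_term("café".encode("utf-8")))
print(decode_term(b"caf\xe9"))  # not valid UTF-8, falls back to latin-1
```

Note that latin-1 accepts any byte sequence, so it acts as a catch-all here. If the document actually uses a different encoding, the result will decode without error but may contain the wrong characters, so put the most likely codecs first.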