Callidon / pyHDT

Licence: MIT license
Read and query HDT documents with ease in Python

Programming languages: C++, Python, shell

pyHDT

pyHDT is joining the RDFlib family as part of the rdflib 6.0 release! The development continues at rdflib-hdt, and this repository is going into archive.

Read and query HDT documents with ease in Python

Online Documentation

Requirements

  • Python version 3.6.4 or higher
  • pip
  • gcc/clang with C++11 support
  • Python development headers

You should have the Python.h header available on your system.
For example, for Python 3.6, install the python3.6-dev package on Debian/Ubuntu systems.

Then, install the pybind11 library:

pip install pybind11
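
To check whether the Python.h header mentioned above is visible to the interpreter you plan to build with, a small script along these lines may help. It is only a sketch: the function names python_header_path and has_python_headers are illustrative and not part of pyHDT, and the check assumes the headers live in the include directory reported by the standard sysconfig module.

```python
import os
import sysconfig


def python_header_path():
    """Return the path where Python.h is expected for the running interpreter."""
    include_dir = sysconfig.get_paths()["include"]
    return os.path.join(include_dir, "Python.h")


def has_python_headers():
    """Return True if Python.h exists at the expected location."""
    return os.path.isfile(python_header_path())


print("Expected header location:", python_header_path())
print("Python.h found:", has_python_headers())
```

If the header is reported as missing, installing the development package for your Python version (as described above) is the usual fix.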

Installation

Installation in a virtualenv is strongly advised!

Pip install (recommended)

pip install hdt

Manual installation

git clone https://github.com/Callidon/pyHDT
cd pyHDT/
./install.sh

Getting started

from hdt import HDTDocument

# Load an HDT file.
# Missing indexes are generated automatically; pass False as the second argument to disable this
document = HDTDocument("test.hdt")

# Display some metadata about the HDT document itself
print("nb triples: %i" % document.total_triples)
print("nb subjects: %i" % document.nb_subjects)
print("nb predicates: %i" % document.nb_predicates)
print("nb objects: %i" % document.nb_objects)
print("nb shared subject-object: %i" % document.nb_shared)

# Fetch all triples that match { ?s ?p ?o }
# Use empty strings ("") to indicate variables
triples, cardinality = document.search_triples("", "", "")

print("cardinality of { ?s ?p ?o }: %i" % cardinality)
for triple in triples:
  print(triple)

# Search also supports limit and offset
triples, cardinality = document.search_triples("", "", "", limit=10, offset=100)
# etc ...
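
The limit and offset arguments shown above can be combined to walk through a large result set page by page. The following is a minimal sketch, assuming only the search_triples call shape used in this section; the helper name iter_pages and its page_size parameter are illustrative and not part of the pyHDT API.

```python
def iter_pages(document, subject="", predicate="", obj="", page_size=100):
    """Yield lists of triples, one page at a time, using limit and offset.

    `document` is expected to expose search_triples(s, p, o, limit=..., offset=...)
    returning (iterator, cardinality), as HDTDocument does in the examples above.
    """
    if page_size <= 0:
        raise ValueError("page_size must be a positive integer")
    offset = 0
    while True:
        triples, _cardinality = document.search_triples(
            subject, predicate, obj, limit=page_size, offset=offset
        )
        page = list(triples)
        if not page:
            break  # nothing left to read
        yield page
        if len(page) < page_size:
            break  # a short page means the last page was reached
        offset += page_size
```

With an HDTDocument loaded as above, for page in iter_pages(document, page_size=10) would then iterate over the matching triples ten at a time.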

Handling non-UTF-8 strings in Python

If the HDT document was encoded with a non-UTF-8 encoding, the previous code will not work correctly and will raise a UnicodeDecodeError. The pybind11 documentation has more details on how C++ strings are converted to Python str.

To handle this, the API of the HDT document is doubled with bytes variants:

  • search_triples_bytes(...) returns an iterator of triples as (py::bytes, py::bytes, py::bytes)
  • search_join_bytes(...) returns an iterator of sets of solution mappings as py::set(py::bytes, py::bytes)
  • convert_tripleid_bytes(...) returns a triple as (py::bytes, py::bytes, py::bytes)
  • convert_id_bytes(...) returns a py::bytes

Parameters and documentation are the same as for the standard versions.

from hdt import HDTDocument

# Load an HDT file.
# Missing indexes are generated automatically; pass False as the second argument to disable this
document = HDTDocument("test.hdt")
# Same return shape as search_triples: an iterator and a cardinality
triples, cardinality = document.search_triples_bytes("", "", "")

for s, p, o in triples:
  print(s, p, o) # prints b'...', b'...', b'...'
  # now decode the terms, or handle any error
  try:
    s, p, o = s.decode('UTF-8'), p.decode('UTF-8'), o.decode('UTF-8')
  except UnicodeDecodeError as err:
    # try another codec
    pass
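
One way to "try another codec" is to attempt a list of encodings in order. The helper below is a sketch rather than part of pyHDT: the name decode_term and the default encoding list are assumptions, and the right fallback encodings depend on how your HDT file was produced.

```python
def decode_term(raw, encodings=("utf-8", "latin-1")):
    """Decode a bytes term by trying each encoding in order.

    Note that latin-1 accepts any byte sequence, so it always succeeds;
    place stricter codecs before it.
    """
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # No codec matched: keep the term, replacing undecodable bytes with U+FFFD
    return raw.decode("utf-8", errors="replace")
```

Inside the loop above, s, p, o = decode_term(s), decode_term(p), decode_term(o) would then replace the try/except block.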