o19s / skipchunk

Licence: MIT license
Extracts a latent knowledge graph from text and indexes/queries it in Elasticsearch or Solr

Skipchunk

Easy search autosuggest with NLP magic.

Out of the box, it provides hassle-free autosuggest for any corpus, built from scratch, plus latent knowledge graph extraction and exploration.

Install

    pip install skipchunk
    python -m spacy download 'en_core_web_lg'
    python -m nltk.downloader wordnet

You also need to have Solr or Elasticsearch installed and running somewhere!

The currently supported Solr version is 8.4.1, but it might work on other versions.

The currently supported Elasticsearch version is 7.6.2, but it might work on other versions.

Use It!

See the ./example/ folder for an end-to-end example that loads the OpenSource Connections (OSC) blog:

Solr

Start Solr first! Skipchunk doesn't work with SolrCloud yet, but we're working on it. For now, you'll need to start Solr using skipchunk's solr_home directory.

Then run this: python solr-blog-example.py

Elasticsearch

Start Elasticsearch first!

Then run this: python elasticsearch-blog-example.py

Features

  • Identifies and groups the noun phrases and verb phrases in a corpus
  • Indexes these phrases in Solr or Elasticsearch for a really good out-of-the-box autosuggest
  • Structures the phrases as a graph so that concept-relationship-concept can be easily found
  • Meant to handle batched updates as part of a full stack search platform

Library API

Engine configuration

You need an engine_config, as a dict, to create a Skipchunk instance. The dict must contain the following entries:

  • host (the fully qualified URL of the engine web API endpoint)
  • name (the name of the graph)
  • path (the on-disk location of stateful data that will be kept)
  • engine_name (either "solr" or "elasticsearch")

Solr engine config example

    engine_config_solr = {
        "host":"http://localhost:8983/solr/",
        "name":"osc-blog",
        "path":"./skipchunk_data",
        "engine_name":"solr"
    }

Elasticsearch engine config example

    engine_config_elasticsearch = {
        "host":"http://localhost:9200/",
        "name":"osc-blog",
        "path":"./skipchunk_data",
        "engine_name":"elasticsearch"
    }
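
Before handing a config to Skipchunk, it can help to check that the four entries listed above are present. The following is a minimal sketch of such a check. The helper `check_engine_config` and its messages are hypothetical and not part of the skipchunk library.

```python
# Hypothetical helper, not part of skipchunk: checks the entries described above.
REQUIRED_KEYS = ("host", "name", "path", "engine_name")
SUPPORTED_ENGINES = ("solr", "elasticsearch")


def check_engine_config(config):
    """Return a list of problems found in an engine_config dict (empty if none)."""
    problems = []
    for key in REQUIRED_KEYS:
        if key not in config:
            problems.append(f"missing entry: {key}")
    engine = config.get("engine_name")
    if engine is not None and engine not in SUPPORTED_ENGINES:
        problems.append(f"unsupported engine_name: {engine!r}")
    host = config.get("host", "")
    if host and not host.startswith(("http://", "https://")):
        problems.append("host should be a fully qualified URL")
    return problems


engine_config_solr = {
    "host": "http://localhost:8983/solr/",
    "name": "osc-blog",
    "path": "./skipchunk_data",
    "engine_name": "solr",
}

print(check_engine_config(engine_config_solr))  # prints []
```

An empty list means the dict has the expected shape. It does not confirm that the engine is actually reachable at the given host.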

Skipchunk Initialization

When initializing Skipchunk, you will need to provide the constructor with the following parameters:

  • engine_config (the dict containing search engine connection details)
  • spacy_model="en_core_web_lg" (the spacy model to use to parse text)
  • minconceptlength=1 (the minimum number of words that can appear in a noun phrase)
  • maxconceptlength=3 (the maximum number of words that can appear in a noun phrase)
  • minpredicatelength=1 (the minimum number of words that can appear in a verb phrase)
  • maxpredicatelength=3 (the maximum number of words that can appear in a verb phrase)
  • minlabels=1 (the number of times a concept/predicate must appear before it is recognized and kept. The lower this number, the more concepts will be kept - so be careful with large content sets!)
  • cache_documents=False
  • cache_pickle=False
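
The parameters above can be collected in one place and sanity-checked before use. This sketch reads the values shown in the list as defaults, which is an assumption. The helper `skipchunk_options` is hypothetical, and the final constructor call is shown only as a comment because its exact import path is not stated here.

```python
# Values copied from the parameter list above; treating them as defaults is an assumption.
DEFAULTS = {
    "spacy_model": "en_core_web_lg",
    "minconceptlength": 1,
    "maxconceptlength": 3,
    "minpredicatelength": 1,
    "maxpredicatelength": 3,
    "minlabels": 1,
    "cache_documents": False,
    "cache_pickle": False,
}


def skipchunk_options(**overrides):
    """Merge overrides into the listed values and check the length ranges."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown options: {sorted(unknown)}")
    options = {**DEFAULTS, **overrides}
    for kind in ("concept", "predicate"):
        low = options[f"min{kind}length"]
        high = options[f"max{kind}length"]
        if not 1 <= low <= high:
            raise ValueError(f"need 1 <= min{kind}length <= max{kind}length")
    if options["minlabels"] < 1:
        raise ValueError("minlabels must be at least 1")
    return options


# Shorter noun phrases, and keep only phrases seen at least five times.
options = skipchunk_options(maxconceptlength=2, minlabels=5)

# Assumed usage, not verified against the library:
# s = Skipchunk(engine_config, **options)
```

Raising minlabels is the main lever for large content sets, since it discards rarely seen concepts and predicates.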

Skipchunk Methods

  • tuplize(filename=source,fields=['title','content',...]) (Produces a list of (text,document) tuples ready for processing by the enrichment.)
  • enrich(tuples) (Enriching can take a long time if you provide lots of text. Consider batching 10k docs at a time.)
  • save (Saves to pickle)
  • load (Loads from pickle)
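
The batching advice above can be followed with a small generic helper. This is a sketch only: `batches` is a hypothetical function, the sample tuples are invented stand-ins for what tuplize is described as producing, and calling enrich once per batch is an assumption about how the library is meant to be driven.

```python
from itertools import islice


def batches(items, size=10_000):
    """Yield successive lists of at most `size` items from any iterable."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Invented stand-in for the (text, document) tuples described above.
tuples = [(f"text {i}", {"id": i}) for i in range(25)]

sizes = [len(batch) for batch in batches(tuples, size=10)]
print(sizes)  # prints [10, 10, 5]

# Assumed usage, not verified against the library:
# for batch in batches(tuples):
#     s.enrich(batch)
```

Because the helper accepts any iterable, the full tuple list does not need to be held in memory if the source is a generator.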

Graph API

After enrichment, you can then index the graph into the engine:

  • index(skipchunk:Skipchunk) (Updates the knowledge graph in the search engine)
  • delete (Deletes a knowledge graph - be careful!)

After indexing, you can call these methods to get autocompleted concepts or walk the knowledge graph:

  • conceptVerbConcepts(concept:str,verb:str,mincount=1,limit=100) -> list (Accepts a concept and a verb to find the concepts appearing in the same context)
  • conceptsNearVerb(verb:str,mincount=1,limit=100) -> list (Accepts a verb to find the concepts appearing in the same context)
  • verbsNearConcept(concept:str,mincount=1,limit=100) -> list (Accepts a concept to find the verbs appearing in the same context)
  • suggestConcepts(prefix:str,build=False) -> list (Suggests a list of concepts given a prefix)
  • suggestPredicates(prefix:str,build=False) -> list (Suggests a list of predicates given a prefix)
  • summarize(mincount=1,limit=100) -> list (Summarizes a core)
  • graph(subject:str,objects=5,branches=10) -> list (Gets the subject-predicate-object neighborhood graph for a subject)
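
To make the idea of concepts and verbs "appearing in the same context" concrete, here is a toy in-memory illustration. It is not how Skipchunk works internally, since the library queries Solr or Elasticsearch. The triples, the function names, and the (name, count) return shape are all invented for this sketch.

```python
from collections import Counter

# Invented (concept, verb, concept) triples, for illustration only.
TRIPLES = [
    ("search engine", "index", "document"),
    ("search engine", "index", "document"),
    ("search engine", "rank", "result"),
    ("user", "enter", "query"),
    ("analyzer", "index", "token"),
]


def concepts_near_verb(triples, verb, mincount=1, limit=100):
    """Count the concepts that occur on either side of the given verb."""
    counts = Counter()
    for subject, predicate, obj in triples:
        if predicate == verb:
            counts[subject] += 1
            counts[obj] += 1
    kept = [(c, n) for c, n in counts.most_common() if n >= mincount]
    return kept[:limit]


def verbs_near_concept(triples, concept, mincount=1, limit=100):
    """Count the verbs that occur in a triple containing the given concept."""
    counts = Counter(p for s, p, o in triples if concept in (s, o))
    kept = [(v, n) for v, n in counts.most_common() if n >= mincount]
    return kept[:limit]


print(verbs_near_concept(TRIPLES, "search engine"))  # prints [('index', 2), ('rank', 1)]
```

In this toy version, mincount drops rarely seen neighbours and limit caps the result size, which mirrors the role those two parameters appear to play in the methods listed above.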

Credits

Developed by Max Irwin, OpenSource Connections https://opensourceconnections.com

All the blog posts contained in the example directory are copyright OpenSource Connections, and may not be used or redistributed without permission.
