willf / Inverted_index

Licence: bsd-2-clause
A simple in-memory inverted index in Python

Projects that are alternatives to or similar to Inverted_index

Teemo
A Domain Name & Email Address Collection Tool
Stars: ✭ 595 (+4858.33%)
Mutual labels:  search-engine
Minisearch
Tiny and powerful JavaScript full-text search engine for browser and Node
Stars: ✭ 737 (+6041.67%)
Mutual labels:  search-engine
Dawnlightsearch
A Linux version of Everything Search Engine.
Stars: ✭ 26 (+116.67%)
Mutual labels:  search-engine
Meta
A Modern C++ Data Sciences Toolkit
Stars: ✭ 600 (+4900%)
Mutual labels:  search-engine
Riot
Open-source, distributed, simple, and efficient search engine written in Go. Warning: this is V1, a beta version with high memory consumption; V2 will be a complete rewrite.
Stars: ✭ 6,025 (+50108.33%)
Mutual labels:  search-engine
Relevancyfeedback
Dice.com's relevancy feedback solr plugin created by Simon Hughes (Dice). Contains request handlers for doing MLT style recommendations, conceptual search, semantic search and personalized search
Stars: ✭ 19 (+58.33%)
Mutual labels:  search-engine
Filemasta
A search application to explore, discover and share online files
Stars: ✭ 571 (+4658.33%)
Mutual labels:  search-engine
Yub
yub.js - A command-line for the web
Stars: ✭ 10 (-16.67%)
Mutual labels:  search-engine
Bertsearch
Elasticsearch with BERT for advanced document search.
Stars: ✭ 684 (+5600%)
Mutual labels:  search-engine
Blast
Blast is a full text search and indexing server, written in Go, built on top of Bleve.
Stars: ✭ 934 (+7683.33%)
Mutual labels:  search-engine
Manticoresearch
Database for search
Stars: ✭ 610 (+4983.33%)
Mutual labels:  search-engine
Search cop
Search engine like fulltext query support for ActiveRecord
Stars: ✭ 660 (+5400%)
Mutual labels:  search-engine
Covid 19 Bert Researchpapers Semantic Search
BERT semantic search engine for searching literature research papers for coronavirus covid-19 in google colab
Stars: ✭ 23 (+91.67%)
Mutual labels:  search-engine
Typesense
Fast, typo tolerant, fuzzy search engine for building delightful search experiences ⚡ 🔍 ✨ An Open Source alternative to Algolia and an Easier-to-Use alternative to ElasticSearch.
Stars: ✭ 8,644 (+71933.33%)
Mutual labels:  search-engine
Censys Ruby
Ruby API client for the Censys internet-wide network-scan search engine
Stars: ✭ 8 (-33.33%)
Mutual labels:  search-engine
Warez
All your base are belong to us!
Stars: ✭ 584 (+4766.67%)
Mutual labels:  search-engine
Funpyspidersearchengine
Word2vec per-user personalized search + Scrapy2.3.0 (data crawling) + ElasticSearch7.9.1 (data storage and external RESTful API) + Django3.1.1 search
Stars: ✭ 782 (+6416.67%)
Mutual labels:  search-engine
Infinispan
Infinispan is an open source data grid platform and highly scalable NoSQL cloud data store.
Stars: ✭ 862 (+7083.33%)
Mutual labels:  search-engine
Better Search
Better Search WordPress plugin
Stars: ✭ 9 (-25%)
Mutual labels:  search-engine
Search Ui
🔍 A set of UI components to build a fully customized search!
Stars: ✭ 24 (+100%)
Mutual labels:  search-engine

Inverted Index

A simple in-memory inverted index system, with a modest query language.

import inverted_index

i = inverted_index.Index()
# Indexing the same document id twice adds the new tokens to that document.
i.index(1, "this is the day they give babies away with half a pound of tea")
i.index(1, "if you know any ladies who need any babies just send them round to ")
i.index(2, "babies are born in the circle of the sun")
results, err = i.query("babies")
print(results)  # {1, 2}
results, err = i.query("babies AND ladies")
print(results)  # {1}
# A custom tokenizer can normalize text, here by lowercasing before splitting.
i.index(3, "WHERE ARE THE BABIES", tokenizer=lambda s: s.lower().split())
results, err = i.query("babies")
print(results)  # {1, 2, 3}
# Removing a document drops it from all later query results.
i.unindex(3)
results, err = i.query("babies")
print(results)  # {1, 2}

Any hashable object can serve as the "document", and a tokenizer can be supplied to split the text being indexed. The add_token and add_tokens methods index individual tokens directly.
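To illustrate the idea, here is a minimal sketch of how an in-memory inverted index can map tokens to sets of hashable documents. It is not this project's actual code: the class name TinyIndex, the lookup method, the argument order of add_token and add_tokens, and the whitespace default tokenizer are all assumptions made for the example.

```python
from collections import defaultdict


class TinyIndex:
    """Hypothetical minimal inverted index (not the library's implementation)."""

    def __init__(self):
        # Maps each token to the set of documents that contain it.
        self.postings = defaultdict(set)

    def add_token(self, document, token):
        # Assumed argument order: document first, then token.
        self.postings[token].add(document)

    def add_tokens(self, document, tokens):
        for token in tokens:
            self.add_token(document, token)

    def index(self, document, text, tokenizer=str.split):
        # Assumption: the default tokenizer splits on whitespace.
        self.add_tokens(document, tokenizer(text))

    def unindex(self, document):
        # Remove the document from every posting set, dropping empty ones.
        for token in list(self.postings):
            self.postings[token].discard(document)
            if not self.postings[token]:
                del self.postings[token]

    def lookup(self, token):
        # Return a copy so callers cannot mutate the index.
        return set(self.postings.get(token, set()))


idx = TinyIndex()
# Documents only need to be hashable: a tuple, a string, and an int all work.
idx.index(("doc", 1), "babies are born in the circle of the sun")
idx.index("doc-2", "WHERE ARE THE BABIES", tokenizer=lambda s: s.lower().split())
idx.add_token(3, "babies")
print(idx.lookup("babies") == {("doc", 1), "doc-2", 3})
```

Because documents are stored in sets, they must be hashable, which is why tuples, strings, and integers work as document keys but lists would not.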

The query language is very simple: it understands AND, OR, NOT, and parentheses. For example:

term OR term
term AND term OR term
(term AND term) OR term
NOT term
NOT term AND (term OR term)

AND, OR, and NOT have equal precedence, so use parentheses to disambiguate.
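To see why grouping matters, the operators can be thought of as set operations on posting sets. The sketch below uses plain Python sets with made-up tokens and document ids; it does not call the library, and it makes no claim about how the library evaluates an ungrouped query.

```python
# Hypothetical posting sets: token -> ids of documents containing it.
postings = {
    "tea": {1, 2},
    "babies": {1, 3},
    "sun": {3, 4},
}
all_docs = {1, 2, 3, 4}  # NOT needs the full set of indexed documents

tea, babies, sun = postings["tea"], postings["babies"], postings["sun"]

# (tea AND babies) OR sun
left_grouped = (tea & babies) | sun
# tea AND (babies OR sun)
right_grouped = tea & (babies | sun)
# NOT tea
not_tea = all_docs - tea

print(left_grouped == {1, 3, 4})
print(right_grouped == {1})
print(not_tea == {3, 4})
```

The two groupings of the same three terms return different documents, so an explicit pair of parentheses removes any doubt about which one a query means.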

I'm pretty sure you don't want to use this in production code :)

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner.