bvaughn / Js Search

Licence: MIT
JS Search is an efficient, client-side search library for JavaScript and JSON objects

Projects that are alternatives of or similar to Js Search

Js Worker Search
JavaScript client-side search API with web-worker support
Stars: ✭ 345 (-82.03%)
Mutual labels:  search, indexing, database, performance
Filemasta
A search application to explore, discover and share online files
Stars: ✭ 571 (-70.26%)
Mutual labels:  search, indexing, database
Spimedb
EXPLORE & EDIT REALITY
Stars: ✭ 14 (-99.27%)
Mutual labels:  search, database
Deepdatabase
A relational database engine using B+ tree indexing
Stars: ✭ 32 (-98.33%)
Mutual labels:  indexing, database
Powa Web
PoWA user interface
Stars: ✭ 66 (-96.56%)
Mutual labels:  database, performance
Euclidesdb
A multi-model machine learning feature embedding database
Stars: ✭ 615 (-67.97%)
Mutual labels:  search, database
Libmdbx
One of the fastest embeddable key-value ACID database without WAL. libmdbx surpasses the legendary LMDB in terms of reliability, features and performance.
Stars: ✭ 729 (-62.03%)
Mutual labels:  database, performance
Pgtune
Pgtune - tuning PostgreSQL config by your hardware
Stars: ✭ 1,078 (-43.85%)
Mutual labels:  database, performance
Pgm Index
🏅State-of-the-art learned data structure that enables fast lookup, predecessor, range searches and updates in arrays of billions of items using orders of magnitude less space than traditional indexes
Stars: ✭ 499 (-74.01%)
Mutual labels:  indexing, database
Pumpkindb
Immutable Ordered Key-Value Database Engine
Stars: ✭ 1,219 (-36.51%)
Mutual labels:  indexing, database
Search And Replace
A simple search for find strings in your WordPress database and replace the string.
Stars: ✭ 76 (-96.04%)
Mutual labels:  search, database
Objectbox C
ObjectBox C and C++: super-fast database for objects and structs
Stars: ✭ 91 (-95.26%)
Mutual labels:  database, performance
Manticoresearch
Database for search
Stars: ✭ 610 (-68.23%)
Mutual labels:  search, database
Hypopg
Hypothetical Indexes for PostgreSQL
Stars: ✭ 594 (-69.06%)
Mutual labels:  indexing, database
Active record doctor
Identify database issues before they hit production.
Stars: ✭ 865 (-54.95%)
Mutual labels:  database, performance
Ansible Role Memcached
Ansible Role - Memcached
Stars: ✭ 54 (-97.19%)
Mutual labels:  database, performance
Pg stat kcache
Gather statistics about physical disk access and CPU consumption done by backends.
Stars: ✭ 106 (-94.48%)
Mutual labels:  database, performance
Orientdb
OrientDB is the most versatile DBMS supporting Graph, Document, Reactive, Full-Text and Geospatial models in one Multi-Model product. OrientDB can run distributed (Multi-Master), supports SQL, ACID Transactions, Full-Text indexing and Reactive Queries. OrientDB Community Edition is Open Source using a liberal Apache 2 license.
Stars: ✭ 4,394 (+128.85%)
Mutual labels:  database, performance
Maghead
The fastest pure PHP database framework with a powerful static code generator, supports horizontal scale up, designed for PHP7
Stars: ✭ 483 (-74.84%)
Mutual labels:  database, performance
Cetus
Cetus is a high performance middleware that provides transparent routing between your application and any backend MySQL Servers.
Stars: ✭ 1,199 (-37.55%)
Mutual labels:  database, performance

Installation | Overview | Tokenization | Stemming | Stop Words | Search Index | Index Strategy

Js Search: client-side search library

Js Search enables efficient client-side searches of JavaScript and JSON objects. It is ES5 compatible and does not require jQuery or any other third-party libraries.

Js Search began as a lightweight implementation of Lunr JS, offering runtime performance improvements and a smaller file size. It has since expanded to include a rich feature set, supporting stemming, stop words, and TF-IDF ranking.

Here are some JS Perf benchmarks comparing the two search libraries. (Thanks to olivernn for tweaking the Lunr side for a better comparison!)

If you're looking for a simpler, web-worker-optimized JS search utility, check out js-worker-search.

Installation

You can install using either Bower or NPM like so:

npm install js-search
bower install js-search

Overview

At a high level, you configure Js Search by telling it which fields to index for searching and then adding the objects to be searched.

For example, a simple use of JS Search would be as follows:

import * as JsSearch from 'js-search';

var theGreatGatsby = {
  isbn: '9781597226769',
  title: 'The Great Gatsby',
  author: {
    name: 'F. Scott Fitzgerald'
  },
  tags: ['book', 'inspirational']
};
var theDaVinciCode = {
  isbn: '0307474275',
  title: 'The DaVinci Code',
  author: {
    name: 'Dan Brown'
  },
  tags: ['book', 'mystery']
};
var angelsAndDemons = {
  isbn: '074349346X',
  title: 'Angels & Demons',
  author: {
    name: 'Dan Brown',
  },
  tags: ['book', 'mystery']
};

// 'isbn' is the field that uniquely identifies each document
var search = new JsSearch.Search('isbn');
search.addIndex('title');
search.addIndex(['author', 'name']); // nested field: author.name
search.addIndex('tags');

search.addDocuments([theGreatGatsby, theDaVinciCode, angelsAndDemons]);

search.search('The');     // [theGreatGatsby, theDaVinciCode]
search.search('scott');   // [theGreatGatsby]
search.search('dan');     // [angelsAndDemons, theDaVinciCode]
search.search('mystery'); // [angelsAndDemons, theDaVinciCode]
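
To make the field configuration above more concrete, here is a minimal sketch of how a field given either as a string or as an array path (such as ['author', 'name']) could be resolved against a document. The helper name getFieldValue is hypothetical and is not part of Js Search; the library's internal lookup may work differently.

```javascript
// Hypothetical helper (not a Js Search API): resolves a field name or a
// nested path such as ['author', 'name'] against a plain object.
function getFieldValue(doc, field) {
  var path = Array.isArray(field) ? field : [field];
  var value = doc;
  for (var i = 0; i < path.length; i++) {
    if (value === null || value === undefined) {
      return undefined; // the path does not exist on this document
    }
    value = value[path[i]];
  }
  return value;
}

var book = {
  isbn: '0307474275',
  title: 'The DaVinci Code',
  author: { name: 'Dan Brown' }
};

console.log(getFieldValue(book, 'title'));            // The DaVinci Code
console.log(getFieldValue(book, ['author', 'name'])); // Dan Brown
```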

Tokenization

Tokenization is the process of breaking text (e.g. sentences) into smaller, searchable tokens (e.g. words or parts of words). Js Search provides a basic tokenizer that should work well for English but you can provide your own like so:

search.tokenizer = {
  tokenize( text /* string */ ) {
    // Convert text to an Array of strings and return the Array
  }
};
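
As an illustration, a filled-in tokenizer might look like the sketch below. The splitting rule (break on anything that is not a letter, digit, or apostrophe) is an assumption chosen for the example and is not the behavior of the built-in tokenizer.

```javascript
// Hypothetical custom tokenizer: splits on any run of characters that are
// not letters, digits, or apostrophes, and drops empty strings.
var wordTokenizer = {
  tokenize: function (text) {
    return text
      .split(/[^A-Za-z0-9']+/)
      .filter(function (token) {
        return token.length > 0;
      });
  }
};

// With Js Search loaded, it would be assigned as shown above:
// search.tokenizer = wordTokenizer;

console.log(wordTokenizer.tokenize('Angels & Demons, by Dan Brown'));
// [ 'Angels', 'Demons', 'by', 'Dan', 'Brown' ]
```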

Stemming

Stemming is the process of reducing search tokens to their root (or "stem") so that searches for different forms of a word will still yield results. For example "search", "searching" and "searched" can all be reduced to the stem "search".

Js Search does not implement its own stemming library but it does support stemming through the use of third-party libraries.

To enable stemming, use the StemmingTokenizer like so:

var stemmer = require('porter-stemmer').stemmer;

search.tokenizer = new JsSearch.StemmingTokenizer(
  stemmer, // Function should accept a string param and return a string
  new JsSearch.SimpleTokenizer()
);
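
Any function that takes a string and returns a string can serve as the stemmer. The toy function below only shows that contract: it strips a few English suffixes and is far cruder than a real Porter stemmer. The name naiveStemmer and its suffix list are assumptions made for this sketch.

```javascript
// Toy stemmer for illustration only; this is not the Porter algorithm.
function naiveStemmer(token) {
  var suffixes = ['ing', 'ed', 's'];
  for (var i = 0; i < suffixes.length; i++) {
    var suffix = suffixes[i];
    // Keep at least three characters so short words such as "red" survive.
    if (token.length - suffix.length >= 3 &&
        token.slice(-suffix.length) === suffix) {
      return token.slice(0, -suffix.length);
    }
  }
  return token;
}

// With Js Search loaded, it could be passed in place of a library stemmer:
// search.tokenizer = new JsSearch.StemmingTokenizer(
//   naiveStemmer, new JsSearch.SimpleTokenizer());

console.log(['search', 'searching', 'searched'].map(naiveStemmer));
// [ 'search', 'search', 'search' ]
```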

Stop Words

Stop words are very common (e.g. a, an, and, the, of) and are often not semantically meaningful. By default Js Search does not filter these words, but filtering can be enabled by using the StopWordsTokenizer like so:

search.tokenizer = new JsSearch.StopWordsTokenizer(
  new JsSearch.SimpleTokenizer()
);

By default Js Search uses a slightly modified version of the Google History stop words listed on www.ranks.nl/stopwords. You can modify this list of stop words by adding or removing values from the JsSearch.StopWordsMap object like so:

JsSearch.StopWordsMap.the = false; // Do not treat "the" as a stop word
JsSearch.StopWordsMap.bob = true;  // Treat "bob" as a stop word

Note that stop words are lower case and so using a case-sensitive sanitizer may prevent some stop words from being removed.
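
The case-sensitivity caveat can be seen in a small standalone sketch. The stopWords map and removeStopWords function below are hypothetical stand-ins written for this example, not the library's own objects.

```javascript
// Hypothetical stand-in for a stop-word map; a real list is much larger.
// Object.create(null) avoids accidental matches on inherited keys such as
// "constructor".
var stopWords = Object.create(null);
stopWords.a = true;
stopWords.of = true;
stopWords.the = true;

// Drops every token that is flagged as a stop word.
function removeStopWords(tokens) {
  return tokens.filter(function (token) {
    return stopWords[token] !== true;
  });
}

console.log(removeStopWords(['the', 'great', 'gatsby']));
// [ 'great', 'gatsby' ]

// Tokens that were not lower-cased no longer match the lower-case entries.
console.log(removeStopWords(['The', 'great', 'gatsby']));
// [ 'The', 'great', 'gatsby' ]
```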

Configuring the search index

There are two search indices packaged with js-search.

Term frequency–inverse document frequency (or TF-IDF) is a numeric statistic intended to reflect how important a word (or set of words) is to a document within a corpus. The TF-IDF value increases proportionally to the number of times a word appears in the document but is offset by the frequency of the word in the corpus. This helps to adjust for the fact that some words (e.g. and, or, the) appear more frequently than others.
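
The sketch below works through the idea on a tiny corpus. It uses one common smoothed variant, idf = 1 + ln(N / (1 + df)), which is an assumption made for this example; the exact weighting used by Js Search is not stated here and may differ. All function names are hypothetical.

```javascript
// Number of times `term` appears in one tokenized document.
function termFrequency(term, tokens) {
  return tokens.filter(function (token) {
    return token === term;
  }).length;
}

// Smoothed inverse document frequency (assumed variant, see lead-in).
function inverseDocumentFrequency(term, corpus) {
  var containing = corpus.filter(function (tokens) {
    return tokens.indexOf(term) !== -1;
  }).length;
  return 1 + Math.log(corpus.length / (1 + containing));
}

function tfIdf(term, tokens, corpus) {
  return termFrequency(term, tokens) * inverseDocumentFrequency(term, corpus);
}

var corpus = [
  ['the', 'great', 'gatsby'],
  ['the', 'davinci', 'code'],
  ['angels', 'demons']
];

// "the" appears in two of three documents, so it scores lower than the
// rarer "gatsby" even though each occurs once in the first document.
console.log(tfIdf('the', corpus[0], corpus));    // 1
console.log(tfIdf('gatsby', corpus[0], corpus)); // about 1.405
```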

By default Js Search supports TF-IDF ranking but this can be disabled for performance reasons if it is not required. You can specify an alternate ISearchIndex implementation in order to disable TF-IDF, like so:

// default
search.searchIndex = new JsSearch.TfIdfSearchIndex();

// Search index capable of returning results matching a set of tokens
// but without any meaningful rank or order.
search.searchIndex = new JsSearch.UnorderedSearchIndex();

Configuring the index strategy

There are three index strategies packaged with js-search.

PrefixIndexStrategy indexes for prefix searches. (e.g. the term "cat" is indexed as "c", "ca", and "cat" allowing prefix search lookups).

AllSubstringsIndexStrategy indexes for all substrings. In other words, "c", "ca", "cat", "a", "at", and "t" all match "cat".

ExactWordIndexStrategy indexes for exact word matches. For example "bob" will match "bob jones" (but "bo" will not).

By default Js Search supports prefix indexing but this is configurable. You can specify an alternate IIndexStrategy implementation in order to disable prefix indexing, like so:

// default
search.indexStrategy = new JsSearch.PrefixIndexStrategy();

// this index strategy is built for all-substring matches.
search.indexStrategy = new JsSearch.AllSubstringsIndexStrategy();

// this index strategy is built for exact word matches.
search.indexStrategy = new JsSearch.ExactWordIndexStrategy();
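
To show what each strategy indexes, here is a minimal standalone sketch of the three token expansions described above. The function names are hypothetical and the code is not taken from the library; it only reproduces the "cat" examples.

```javascript
// Prefix expansion: "cat" -> "c", "ca", "cat".
function prefixes(token) {
  var result = [];
  for (var end = 1; end <= token.length; end++) {
    result.push(token.slice(0, end));
  }
  return result;
}

// All-substrings expansion, with duplicates removed (e.g. for "aa").
function allSubstrings(token) {
  var seen = Object.create(null);
  var result = [];
  for (var start = 0; start < token.length; start++) {
    for (var end = start + 1; end <= token.length; end++) {
      var part = token.slice(start, end);
      if (!seen[part]) {
        seen[part] = true;
        result.push(part);
      }
    }
  }
  return result;
}

// Exact-word expansion: only the whole token is indexed.
function exactWord(token) {
  return [token];
}

console.log(prefixes('cat'));      // [ 'c', 'ca', 'cat' ]
console.log(allSubstrings('cat')); // [ 'c', 'ca', 'cat', 'a', 'at', 't' ]
console.log(exactWord('cat'));     // [ 'cat' ]
```

The trade-off is index size: a token of length n produces n prefix entries but up to n(n + 1) / 2 substring entries, so the all-substrings expansion costs noticeably more memory for long tokens.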