All Projects → sajari → Fuzzy

sajari / Fuzzy

Licence: mit
Spell checking and fuzzy search suggestion written in Go

Programming Languages

go
31211 projects - #10 most used programming language

Projects that are alternatives of or similar to Fuzzy

Hallelujahim
hallelujahIM(哈利路亚 英文输入法) is an intelligent English input method with auto-suggestions and spell check features, Mac only.
Stars: ✭ 1,334 (+360%)
Mutual labels:  spell-check, autocomplete
Email Autocomplete
A jQuery plugin that suggests and autocompletes the domain in email fields.
Stars: ✭ 265 (-8.62%)
Mutual labels:  autocomplete
autocompletex
redis autocomplete for elixir
Stars: ✭ 22 (-92.41%)
Mutual labels:  autocomplete
angular-ng-autocomplete
NPM package for Angular: https://www.npmjs.com/package/angular-ng-autocomplete
Stars: ✭ 123 (-57.59%)
Mutual labels:  autocomplete
vim-hugo-helper
A small Vim plugin with a set of helpers for Hugo https://gohugo.io
Stars: ✭ 82 (-71.72%)
Mutual labels:  spell-check
Material Ui Superselectfield
multiselection autocomplete dropdown component for Material-UI
Stars: ✭ 260 (-10.34%)
Mutual labels:  autocomplete
Elasticsearch-Autocomplete-API-Sample
Building Autocomplete API with Completion Suggester in ASP.NET Core sample project
Stars: ✭ 20 (-93.1%)
Mutual labels:  autocomplete
Autocomplete
Accessible autocomplete component for vanilla JavaScript and Vue.
Stars: ✭ 277 (-4.48%)
Mutual labels:  autocomplete
Graphql For Vscode
GraphQL syntax highlighting, linting, auto-complete, and more!
Stars: ✭ 265 (-8.62%)
Mutual labels:  autocomplete
AndroidIDE
AndroidIDE is an IDE for Android to develop full featured Android apps on Android smartphones.
Stars: ✭ 98 (-66.21%)
Mutual labels:  autocomplete
django-select2
This is a Django integration for Select2
Stars: ✭ 73 (-74.83%)
Mutual labels:  autocomplete
react-dadata-box
React component for use DaData service API (suggestions)
Stars: ✭ 25 (-91.38%)
Mutual labels:  autocomplete
Coc Flutter
flutter support for (Neo)vim
Stars: ✭ 259 (-10.69%)
Mutual labels:  autocomplete
idea-php-advanced-autocomplete
Plugin for PhpStorm IDE. Adds auto-completion support for various built-in PHP functions, where parameter is a string literal.
Stars: ✭ 57 (-80.34%)
Mutual labels:  autocomplete
Picocli
Picocli is a modern framework for building powerful, user-friendly, GraalVM-enabled command line apps with ease. It supports colors, autocompletion, subcommands, and more. In 1 source file so apps can include as source & avoid adding a dependency. Written in Java, usable from Groovy, Kotlin, Scala, etc.
Stars: ✭ 3,286 (+1033.1%)
Mutual labels:  autocomplete
mongoose-graphql-pagination
GraphQL cursor pagination (Relay-like) for Mongoose models.
Stars: ✭ 29 (-90%)
Mutual labels:  autocomplete
macOS-global-autocomplete
📃 System-wide autocompleting that learns what you type and works in any app! (also slightly scary maybe don't use this...)
Stars: ✭ 26 (-91.03%)
Mutual labels:  autocomplete
Graphqurl
curl for GraphQL with autocomplete, subscriptions and GraphiQL. Also a dead-simple universal javascript GraphQL client.
Stars: ✭ 3,012 (+938.62%)
Mutual labels:  autocomplete
Selectmenu
Simple, easily and diversity menu solution
Stars: ✭ 284 (-2.07%)
Mutual labels:  autocomplete
React Native Mentions
Mentions textbox for React Native. Works on both ios and android. 🐳
Stars: ✭ 277 (-4.48%)
Mutual labels:  autocomplete

Fuzzy

Build Status

Fuzzy is a very fast spell checker and query suggester written in Golang.

Motivation:

  • Sajari uses very large queries (hundreds of words) but needs to respond sub-second to these queries where possible. Common spell check algorithms are quite slow or very resource intensive.
  • The aim was to achieve spell checks in sub 100usec per word (10,000 / second single core) with at least 60% accuracy and multi-language support.
  • Currently we see sub 40usec per word and ~70% accuracy for a Levenshtein distance of 2 chars on a 2012 macbook pro (english test set comes from Peter Norvig's article, see http://norvig.com/spell-correct.html).
  • A 500 word query can be spell checked in ~0.02 sec / cpu cores, which is good enough for us.

Notes:

  • It is currently executed as a single goroutine per lookup, so undoubtedly this could be much faster using multiple cores, but currently the speed is quite good.
  • Accuracy is hit slightly because several correct words don't appear at all in the training text (data/big.txt).
  • Fuzzy is a "Symmetric Delete Spelling Corrector", which relates to some blogs by Wolf Garbe at Faroo.com (see http://blog.faroo.com/2012/06/07/improved-edit-distance-based-spelling-correction/)

Config:

  • Generally no config is required, but you can tweak the model for your application.
  • "threshold" is the trigger point when a word becomes popular enough to build lookup keys for it. Setting this to "1" means any instance of a given word makes it a legitimate spelling. This typically corrects the most errors, but can also cause false positives if incorrect spellings exist in the training data. It also causes a much larger index to be built. By default this is set to 4.
  • "depth" is the Levenshtein distance the model builds lookup keys for. For spelling correction, a setting of "2" is typically very good. At a distance of "3" the potential number of words is much, much larger, but adds little benefit to accuracy. For query prediction a larger number can be useful, but again is much more expensive. A depth of "1" and threshold of "1" for the 1st Norvig test set gives ~70% correction accuracy at ~5usec per check (e.g. ~200kHz), for many applications this will be good enough. At depths > 2, the false positives begin to hurt the accuracy.

Future improvements:

  • Make some of the expensive processes concurrent.
  • Add spelling checks for different languages. If you have misspellings in different languages please add them or send to us.
  • Allow the term-score map to be read from an external term set (e.g. integrating this currently may double up on keeping a term count).
  • Currently there is no method to delete lookup keys, so potentially this may cause bloating over time if the dictionary changes signficantly.
  • Add right to left deletion beyond Levenshtein config depth (e.g. don't process all deletes accept for query predictors).

Usage:

  • Below is some example code showing how to use the package.
  • An example showing how to train with a static set of words is contained in the fuzzy_test.go file, which uses the "big.text" file to create an english dictionary.
  • To integrate with your application (e.g. custom dictionary / word popularity), use the single word and multiword training functions shown in the example below. Each time you add a new instance of a given word, pass it to this function. The model will keep a count and
  • We haven't tested with other langauges, but this should work fine. Please let us know how you go? [email protected]
package main 

import(
	"github.com/sajari/fuzzy"
	"fmt"
)

func main() {
	model := fuzzy.NewModel()

	// For testing only, this is not advisable on production
	model.SetThreshold(1)

	// This expands the distance searched, but costs more resources (memory and time). 
	// For spell checking, "2" is typically enough, for query suggestions this can be higher
	model.SetDepth(5)

	// Train multiple words simultaneously by passing an array of strings to the "Train" function
	words := []string{"bob", "your", "uncle", "dynamite", "delicate", "biggest", "big", "bigger", "aunty", "you're"}
	model.Train(words)
	
	// Train word by word (typically triggered in your application once a given word is popular enough)
	model.TrainWord("single")

	// Check Spelling
	fmt.Println("\nSPELL CHECKS")
	fmt.Println("	Deletion test (yor) : ", model.SpellCheck("yor"))
	fmt.Println("	Swap test (uncel) : ", model.SpellCheck("uncel"))
	fmt.Println("	Replace test (dynemite) : ", model.SpellCheck("dynemite"))
	fmt.Println("	Insert test (dellicate) : ", model.SpellCheck("dellicate"))
	fmt.Println("	Two char test (dellicade) : ", model.SpellCheck("dellicade"))

	// Suggest completions
	fmt.Println("\nQUERY SUGGESTIONS")
	fmt.Println("	\"bigge\". Did you mean?: ", model.Suggestions("bigge", false))
	fmt.Println("	\"bo\". Did you mean?: ", model.Suggestions("bo", false))
	fmt.Println("	\"dyn\". Did you mean?: ", model.Suggestions("dyn", false))

	// Autocomplete suggestions
	suggested, _ := model.Autocomplete("bi")
	fmt.Printf("	\"bi\". Suggestions: %v", suggested)

}
Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].