justinwilaby / spellchecker-wasm

Licence: MIT license
SpellcheckerWasm is an extremely fast spellchecker for WebAssembly based on SymSpell

Spellchecker + WebAssembly

When you absolutely, positively have to have the fastest spellchecker in the room, accept no substitutes.

  • Fast - Based on SymSpell v6.6 with bigram support.
  • Plug and play - Ready to go out of the box (batteries included).

Spellcheck-wasm is an extremely fast spellchecker for WebAssembly. It ships with tooling for Worker threads, so a single word or a very large document can be processed quickly without native Node plugins. Sub-millisecond benchmarks bring near-native speeds to spellchecking in Node.

Spellcheck-wasm uses a zero-dependency Rust port of the extremely popular SymSpell engine with several optimizations for WebAssembly.

Electron
Node
Browsers
Workers
Cli

Installation

npm i spellchecker-wasm

As an interactive CLI

npm i -g spellchecker-wasm

Then use spellcheck to enter interactive mode. For supported arguments, run spellcheck --help.

Usage in Electron

// Within the preload script of your BrowserWindow instance
const { webFrame } = require('electron');
const { SpellcheckerWasm }  = require('spellchecker-wasm');

const wasmPath = require.resolve('spellchecker-wasm/lib/spellchecker-wasm.wasm');
const dictionaryLocation = require.resolve('spellchecker-wasm/lib/frequency_dictionary_en_82_765.txt');
const spellchecker = new SpellcheckerWasm();

spellchecker.prepareSpellchecker(wasmPath, dictionaryLocation)
    .then(() => {
        let suggestions;
        spellchecker.resultsHandler = results => {
            suggestions = results;
        };

        webFrame.setSpellCheckProvider('en-US', {
            spellCheck(words, callback) {
                const misspelledWords = [];
                words.forEach(word => {
                    spellchecker.checkSpelling(word); // synchronous
                    if (suggestions.length) {
                        misspelledWords.push(word);
                    }
                });
                callback(misspelledWords);
            }
        })
    })

Usage in Node

const { SpellcheckerWasm } = require('spellchecker-wasm');

const wasmPath = require.resolve('spellchecker-wasm/lib/spellchecker-wasm.wasm');
const dictionaryLocation = require.resolve('spellchecker-wasm/lib/frequency_dictionary_en_82_765.txt');
// Optional bigram support for compound lookups - add only when needed
const bigramLocation = require.resolve('spellchecker-wasm/lib/frequency_bigramdictionary_en_243_342.txt');

const spellchecker = new SpellcheckerWasm(resultHandler);
spellchecker.prepareSpellchecker(wasmPath, dictionaryLocation, bigramLocation)
    .then(() => {
        ['tiss', 'gves', 'practiclly', 'instent', 'relevent', 'resuts'].forEach(word => spellchecker.checkSpelling(word));
        spellchecker.checkSpellingCompound('tiss cheks th entir sentance');
    });

// Function declarations are hoisted, so resultHandler can be passed to the constructor above.
function resultHandler(results) {
    // Results are given in the same order they are sent.
    // The most relevant results are ordered lower in the results index.
    // process.stdout.write expects a string, so the terms are joined first.
    process.stdout.write(results.map(r => r.term).join(', ') + '\n');
}
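
Because results are delivered in the order the words were sent, each result set can be matched back to its word with a simple first-in, first-out queue. The sketch below illustrates that idea under some assumptions: `createOrderedChecker` is a hypothetical helper, not part of the package, and `fakeSpellchecker` is a stand-in object so the snippet runs without the package installed. It assumes the handler can be set through the `resultsHandler` property, as in the Electron example, and that `checkSpelling` invokes it synchronously.

```javascript
// Pairs each word with its results, relying on results arriving in send order.
// createOrderedChecker is a hypothetical helper, not part of spellchecker-wasm.
function createOrderedChecker(spellchecker) {
    const pending = [];   // words sent but not yet answered (FIFO)
    const answered = [];  // { word, terms } pairs, in send order

    spellchecker.resultsHandler = results => {
        const word = pending.shift();
        answered.push({ word, terms: results.map(r => r.term) });
    };

    return {
        check(word) {
            // Queue the word first: the handler may fire before checkSpelling returns.
            pending.push(word);
            spellchecker.checkSpelling(word);
        },
        answered
    };
}

// Stand-in for a prepared SpellcheckerWasm instance (assumption for illustration).
// It only knows two corrections and reports no results for anything else.
const fakeSpellchecker = {
    resultsHandler: null,
    corrections: { tiss: ['this'], gves: ['gives'] },
    checkSpelling(word) {
        const terms = this.corrections[word] || [];
        this.resultsHandler(terms.map(term => ({ term })));
    }
};

const checker = createOrderedChecker(fakeSpellchecker);
['tiss', 'gves', 'this'].forEach(word => checker.check(word));
console.log(checker.answered);
```

With a real instance, `fakeSpellchecker` would be replaced by the object returned after `prepareSpellchecker` resolves.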

Usage as a Node Worker

const { SpellcheckerWasm } = require('../lib/nodejs/SpellcheckerWasm.js');

const wasmPath = require.resolve('spellchecker-wasm/lib/spellchecker-wasm.wasm');
const dictionaryLocation = require.resolve('spellchecker-wasm/lib/frequency_dictionary_en_82_765.txt');
// Optional bigram support for compound lookups - add only when needed
const bigramLocation = require.resolve('spellchecker-wasm/lib/frequency_bigramdictionary_en_243_342.txt');

let resultHandler = (results) => {process.stdout.write(results.map(r => r.term) + '\n');};
let spellcheckerWasm = new SpellcheckerWasm(resultHandler);
spellcheckerWasm.prepareSpellchecker(wasmPath, dictionaryLocation, bigramLocation)
    .then(() => {
        process.stdout.write('Ready\n');
        process.stdin.on('data', data => {
            spellcheckerWasm.checkSpelling('' + data);
        });
    })
    .catch((e) => {
        process.stdout.write(`Error initializing the SpellChecker\n${e}\n`);
    });

Usage in the Browser

import { SpellcheckerWasm } from 'spellchecker-wasm/lib/browser/index.js';

let resultHandler = (results) => console.log("Results : ", results.map(result => result.term));

async function initializeSpellchecker() {
    const wasm = await fetch('spellchecker-wasm/lib/spellchecker-wasm.wasm');
    const dictionary = await fetch('spellchecker-wasm/lib/frequency_dictionary_en_82_765.txt');
    const bigramLocation = await fetch('spellchecker-wasm/lib/frequency_bigramdictionary_en_243_342.txt'); // Optional

    const spellchecker = new SpellcheckerWasm(resultHandler);
    await spellchecker.prepareSpellchecker(wasm, dictionary, bigramLocation);
    return spellchecker;
}

initializeSpellchecker().then(spellchecker => {
    ['tiss', 'gves', 'practiclly', 'instent', 'relevent', 'resuts'].forEach(word => spellchecker.checkSpelling(word));
    spellchecker.checkSpellingCompound('tiss cheks th entir sentance');
});

Common use cases

Differentiating between a correct word and a word with no suggestions

By default, the spellchecker will return no results for both 'there' and 'thereeeee'. The former is correct and so will not produce suggestions. The latter word is obviously a mistake, but its distance from any word in the dictionary is greater than the maxEditDistance.

To distinguish between the two, use the includeUnknown option:

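let lastResults;
const resultsHandler = results => {
    lastResults = results;
};
// Attach the handler so that lastResults is updated on every lookup
spellchecker.resultsHandler = resultsHandler;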

spellchecker.checkSpelling('there');
// lastResults.length === 0

spellchecker.checkSpelling('thereeeee');
// lastResults.length === 0

spellchecker.checkSpelling('thereeeee', {
    includeUnknown: true,
    maxEditDistance: 2,
    verbosity: 2,
    includeSelf: false
});
// lastResults.length === 1
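
The two lookups above can be combined into one classification step. The following is a minimal sketch, not part of the package: `classifyWord` is a hypothetical helper, and `stubSpellchecker` is a stand-in that mimics the behaviour described in this section so the snippet runs on its own. It assumes that a correct word still yields no results when includeUnknown is set, and that `checkSpelling` calls the handler synchronously.

```javascript
// Classifies a word as 'correct', 'misspelled' or 'unknown' using up to two lookups.
// classifyWord is a hypothetical helper, not part of spellchecker-wasm.
function classifyWord(spellchecker, word) {
    let lastResults = [];
    // Note: this replaces any handler previously set on the instance.
    spellchecker.resultsHandler = results => { lastResults = results; };

    // First lookup with default options: any suggestion means a known misspelling.
    spellchecker.checkSpelling(word);
    if (lastResults.length > 0) {
        return 'misspelled';
    }

    // Second lookup: with includeUnknown, an unrecognised word is reported back.
    spellchecker.checkSpelling(word, {
        includeUnknown: true,
        maxEditDistance: 2,
        verbosity: 2,
        includeSelf: false
    });
    return lastResults.length > 0 ? 'unknown' : 'correct';
}

// Stand-in for a prepared SpellcheckerWasm instance (assumption for illustration).
const stubSpellchecker = {
    resultsHandler: null,
    dictionary: new Set(['there', 'their']),
    corrections: { thier: ['their'] },
    checkSpelling(word, options = {}) {
        let terms = [];
        if (!this.dictionary.has(word)) {
            terms = this.corrections[word] || (options.includeUnknown ? [word] : []);
        }
        this.resultsHandler(terms.map(term => ({ term })));
    }
};

console.log(classifyWord(stubSpellchecker, 'there'));
console.log(classifyWord(stubSpellchecker, 'thier'));
console.log(classifyWord(stubSpellchecker, 'thereeeee'));
```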

Allowing for deeper word searches

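The maxEditDistance option sets the largest edit distance at which a dictionary word is still offered as a suggestion. Its default is 2, so a word such as cofvvvfee, which is further than that from any dictionary entry, returns no suggestions.

Raising maxEditDistance remedies this, provided the dictionary was prepared with a large enough dictionaryEditDistance (see the caveat below):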

let lastResults;
const resultsHandler = results => {
    lastResults = results;
};
// Attach the handler so that lastResults is updated on every lookup
spellchecker.resultsHandler = resultsHandler;

spellchecker.checkSpelling('cofvvvfee');
// lastResults.length === 0

spellchecker.checkSpelling('cofvvvfee', {
    includeUnknown: false,
    maxEditDistance: 4,
    verbosity: 1,
    includeSelf: false
});
// lastResults.length === 1, lastResults[0].term --> 'coffee'
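
To see why the default limit of 2 rejects these inputs, it can help to compute the edit distance by hand. The function below is a plain Levenshtein distance written for illustration only; it is not the library's implementation (SymSpell-style engines typically also count adjacent transpositions as a single edit), and `editDistance` is a hypothetical name. For the two words used in this section, which differ from their targets only by extra letters, the result is the same either way.

```javascript
// Plain Levenshtein distance: the minimum number of single-character
// insertions, deletions and substitutions needed to turn a into b.
function editDistance(a, b) {
    // previous[j] = distance between the first i-1 characters of a and the first j of b
    let previous = Array.from({ length: b.length + 1 }, (_, j) => j);
    for (let i = 1; i <= a.length; i++) {
        const current = [i]; // turning i characters into an empty string takes i deletions
        for (let j = 1; j <= b.length; j++) {
            const substitution = previous[j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1);
            current[j] = Math.min(previous[j] + 1, current[j - 1] + 1, substitution);
        }
        previous = current;
    }
    return previous[b.length];
}

console.log(editDistance('cofvvvfee', 'coffee')); // 3
console.log(editDistance('thereeeee', 'there'));  // 4
```

Both distances are above 2, which is why neither word produces suggestions with the default settings.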

Caveat: the maxEditDistance parameter passed to checkSpelling must be less than or equal to the dictionaryEditDistance parameter of prepareSpellchecker. For example:

// BAD!
await spellchecker.prepareSpellchecker(wasmPath, dictionaryLocation); // Default value of dictionaryEditDistance is 2
let lastResults;
const resultsHandler = results => {
    lastResults = results;
};
spellchecker.resultsHandler = resultsHandler;
spellchecker.checkSpelling('cofvvvfee', {
    includeUnknown: false,
    maxEditDistance: 4,
    verbosity: 1,
    includeSelf: false
});
// ERROR! maxEditDistance (4) is greater than dictionaryEditDistance (2)

// Good (use instead of the BAD snippet above, not in addition to it)
await spellchecker.prepareSpellchecker(wasmPath, dictionaryLocation, null, {countThreshold: 2, dictionaryEditDistance: 4});
let lastResults;
const resultsHandler = results => {
    lastResults = results;
};
spellchecker.resultsHandler = resultsHandler;
spellchecker.checkSpelling('cofvvvfee', {
    includeUnknown: false,
    maxEditDistance: 4,
    verbosity: 1,
    includeSelf: false
});
// lastResults.length === 1
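
One way to catch this mistake early is to validate the options before each lookup. The guard below is a minimal sketch under stated assumptions: `withEditDistanceCheck` is a hypothetical helper, not part of the package, and the caller is expected to pass in the dictionaryEditDistance that was used with prepareSpellchecker. Both defaults of 2 follow the values given in this section.

```javascript
// Throws if maxEditDistance exceeds the dictionaryEditDistance the dictionary
// was prepared with; otherwise returns the options unchanged.
// withEditDistanceCheck is a hypothetical helper, not part of spellchecker-wasm.
function withEditDistanceCheck(options, dictionaryEditDistance = 2) {
    const { maxEditDistance = 2 } = options;
    if (maxEditDistance > dictionaryEditDistance) {
        throw new RangeError(
            `maxEditDistance (${maxEditDistance}) must not exceed ` +
            `dictionaryEditDistance (${dictionaryEditDistance})`
        );
    }
    return options;
}

const deepOptions = { includeUnknown: false, maxEditDistance: 4, verbosity: 1, includeSelf: false };

try {
    withEditDistanceCheck(deepOptions); // dictionary prepared with the default of 2
} catch (e) {
    console.log(e.message);
}

// Passes once the dictionary is prepared with dictionaryEditDistance: 4
console.log(withEditDistanceCheck(deepOptions, 4) === deepOptions);
```

The checked options can then be passed straight to checkSpelling, for example `spellchecker.checkSpelling('cofvvvfee', withEditDistanceCheck(deepOptions, 4))`.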

Controlling the amount and ordering of returned suggestions

The verbosity parameter of checkSpelling controls how many suggestions are returned and how they are ordered. Its supported values are:

verbosity:
    0: (top) returns only the suggestion with the highest term frequency among those at the smallest edit distance found.
    1: (closest) returns all suggestions at the smallest edit distance found, ordered by term frequency.
    2: (all) returns all suggestions within maxEditDistance, ordered by edit distance, then by term frequency.
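
The three modes can be pictured as different cuts of one sorted candidate list. The sketch below illustrates the selection rules described above; it is not the library's code. `selectSuggestions` is a hypothetical function, and the `distance` and `count` field names on each candidate are assumptions made for the example. The candidates are assumed to be already limited to maxEditDistance.

```javascript
// Applies the verbosity rules to a list of candidate suggestions.
// Each candidate is assumed to look like { term, distance, count }.
function selectSuggestions(candidates, verbosity) {
    // Order by edit distance (ascending), then by term frequency (descending).
    const sorted = [...candidates].sort(
        (a, b) => a.distance - b.distance || b.count - a.count
    );
    if (sorted.length === 0 || verbosity === 2) {
        return sorted; // 2 (all): everything, in sorted order
    }
    // Keep only the candidates that share the smallest edit distance.
    const closest = sorted.filter(c => c.distance === sorted[0].distance);
    // 0 (top): the most frequent of the closest; 1 (closest): all of the closest.
    return verbosity === 0 ? closest.slice(0, 1) : closest;
}

// Made-up candidates for illustration only.
const candidates = [
    { term: 'these', distance: 2, count: 50 },
    { term: 'tips', distance: 1, count: 10 },
    { term: 'this', distance: 1, count: 100 }
];

console.log(selectSuggestions(candidates, 0).map(c => c.term)); // [ 'this' ]
console.log(selectSuggestions(candidates, 1).map(c => c.term)); // [ 'this', 'tips' ]
console.log(selectSuggestions(candidates, 2).map(c => c.term)); // [ 'this', 'tips', 'these' ]
```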

Building from source

Prerequisites

This project requires Rust v1.30+ because it supports the wasm32-unknown-unknown target out of the box.

Install Rust:

curl https://sh.rustup.rs -sSf | sh

Install the stable compiler and switch to it.

rustup install stable
rustup default stable

Install the wasm32-unknown-unknown target.

rustup target add wasm32-unknown-unknown --toolchain stable

Install node with npm then run the following command from the project root.

npm install

Install the wasm-bindgen-cli tool

cargo install wasm-bindgen-cli

The project can now be built using:

npm run build

The artifacts from the build will be located in the /lib directory.
