
The Hunspell binding for NodeJS that exposes as much of Hunspell as possible and also adds new features. Hunspell is a first class spellcheck library used by Google, Apple, and Mozilla.

Stars: ✭ 229 (-38.11%)

Mutual labels: spellcheck

Nlprule

A fast, low-resource Natural Language Processing and Text Correction library written in Rust.

Stars: ✭ 309 (-16.49%)

Mutual labels: spellcheck

identypo

identypo is a Go static analysis tool to find typos in identifiers (functions, function calls, variables, constants, type declarations, packages, labels).

Stars: ✭ 26 (-92.97%)

Mutual labels: spellcheck

SymSpellCppPy

Fast SymSpell written in C++ and exposed to Python via pybind11

Stars: ✭ 28 (-92.43%)

Mutual labels: spellcheck

voikko-rs

Rust bindings for the Voikko library

Stars: ✭ 16 (-95.68%)

Mutual labels: spellcheck

LinSpell

Fast approximate string search & spelling correction

Stars: ✭ 52 (-85.95%)

Mutual labels: spellcheck

Emacs-LanguageTool.el

LanguageTool suggestions integrated within Emacs

Stars: ✭ 44 (-88.11%)

Mutual labels: spellcheck

neuspell

NeuSpell: A Neural Spelling Correction Toolkit

Stars: ✭ 524 (+41.62%)

Mutual labels: spellcheck

hanspell

A Korean spell checker that uses the web services of Daum Corp. and the Pusan National University AI Lab / Nara Infotech Co., Ltd.

Stars: ✭ 72 (-80.54%)

Mutual labels: spellcheck

flake8-spellcheck

❄️ Spellcheck variables, classnames, comments, docstrings etc

Stars: ✭ 71 (-80.81%)

Mutual labels: spellcheck

yaspeller-ci

Fast spelling check for Travis CI

Stars: ✭ 60 (-83.78%)

Mutual labels: spellcheck

Pyspellchecker

Pure Python Spell Checking http://pyspellchecker.readthedocs.io/en/latest/

Stars: ✭ 336 (-9.19%)

Mutual labels: spellcheck

Proofreader

Simple text proofreader based on 'write-good' (hemingway-app-like suggestions) and 'nodehun' (spelling).

Stars: ✭ 285 (-22.97%)

Mutual labels: spellcheck

wellspell.addin

R Package - Quick Spellcheck Addin for RStudio

Stars: ✭ 22 (-94.05%)

Mutual labels: spellcheck


JamSpell

JamSpell is a spell checking library with the following features:

accurate - it considers a word's surroundings (context) for better correction
fast - nearly 5K words per second
multi-language - it's written in C++ and available for many languages through SWIG bindings

Colab example

JamSpellPro

jamspell.com - check out the new JamSpell version with the following features:

Improved accuracy (catboost gradient boosted decision trees candidates ranking model)
Splits merged words
Pre-trained models for many languages (small, medium, large) for:
en, ru, de, fr, it, es, tr, uk, pl, nl, pt, hi, no
Ability to add words / sentences at runtime
Fine-tuning / additional training
Memory optimization for training large models
Static dictionary support
Built-in Java, C#, Ruby support
Windows support

Content

Benchmarks
Usage
- Python
- C++
- Other languages
- HTTP API
Train

Benchmarks

	Errors	Top 7 Errors	Fix Rate	Top 7 Fix Rate	Broken	Speed (words/second)
JamSpell	3.25%	1.27%	79.53%	84.10%	0.64%	4854
Norvig	7.62%	5.00%	46.58%	66.51%	0.69%	395
Hunspell	13.10%	10.33%	47.52%	68.56%	7.14%	163
Dummy	13.14%	13.14%	0.00%	0.00%	0.00%	-

The model was trained on 300K Wikipedia sentences + 300K news sentences (English); 95% was used for training and 5% for evaluation. An error model was used to generate errored text from the original text. The JamSpell corrector was compared with Norvig's corrector, Hunspell, and a dummy corrector (no corrections).
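The 95% / 5% split can be sketched in a few lines of Python. This is only an illustration: the function name `split_sentences`, the shuffling step and the fixed seed are assumptions, not a description of how the JamSpell authors split their data.

```python
import random


def split_sentences(sentences, eval_fraction=0.05, seed=0):
    """Shuffle sentences and split them into (train, evaluation) lists.

    Hypothetical helper: shuffling with a fixed seed is an assumption
    made here so that the split is reproducible.
    """
    shuffled = list(sentences)
    random.Random(seed).shuffle(shuffled)
    eval_size = int(len(shuffled) * eval_fraction)
    # The first eval_size sentences are held out, the rest are used for training.
    return shuffled[eval_size:], shuffled[:eval_size]


# Placeholder sentences stand in for the real corpus.
train, evaluation = split_sentences([f"sentence {i}" for i in range(1000)])
print(len(train), len(evaluation))
```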

We used the following metrics:

Errors - percent of words with errors after spell checker processed
Top 7 Errors - percent of words missing in the top 7 candidates
Fix Rate - percent of errored words fixed by spell checker
Top 7 Fix Rate - percent of errored words fixed by one of top7 candidates
Broken - percent of non-errored words broken by spell checker
Speed - number of words per second
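As a rough illustration of how Errors, Fix Rate and Broken relate to each other, the sketch below computes them from three aligned word lists. It is not the project's evaluate.py: the function name `spelling_metrics` and the one-to-one word alignment are assumptions, and the Top 7 variants are left out.

```python
def spelling_metrics(original, errored, corrected):
    """Compute Errors, Fix Rate and Broken (in percent) for aligned word lists.

    original  - the reference words
    errored   - the same words after artificial errors were injected
    corrected - the spell checker's output for the errored words
    """
    if not (len(original) == len(errored) == len(corrected)):
        raise ValueError("word lists must be aligned and of equal length")

    total = len(original)
    errors_after = errored_total = fixed = clean_total = broken = 0
    for orig, err, corr in zip(original, errored, corrected):
        if corr != orig:
            errors_after += 1          # still wrong after correction
        if err != orig:
            errored_total += 1         # word had an injected error
            if corr == orig:
                fixed += 1             # ...and the checker repaired it
        else:
            clean_total += 1           # word was correct to begin with
            if corr != orig:
                broken += 1            # ...but the checker damaged it

    def pct(part, whole):
        return 100.0 * part / whole if whole else 0.0

    return {
        "errors": pct(errors_after, total),
        "fix_rate": pct(fixed, errored_total),
        "broken": pct(broken, clean_total),
    }


# Made-up data: one of two errors is fixed, one of four clean words is broken.
original = ["i", "am", "the", "best", "spell", "checker"]
errored = ["i", "am", "the", "begt", "spell", "cherken"]
corrected = ["i", "an", "the", "best", "spell", "chicken"]
metrics = spelling_metrics(original, errored, corrected)
print(metrics)
```

The same relationship holds for the first table above: with 13.14% errored words, a 79.53% fix rate and 0.64% broken words, the remaining error rate is about 13.14 × (1 − 0.7953) + 86.86 × 0.0064 ≈ 3.25%.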

To check that the model is not overfitted to Wikipedia + news, we also evaluated it on the text of "The Adventures of Sherlock Holmes":

	Errors	Top 7 Errors	Fix Rate	Top 7 Fix Rate	Broken	Speed (words per second)
JamSpell	3.56%	1.27%	72.03%	79.73%	0.50%	5524
Norvig	7.60%	5.30%	35.43%	56.06%	0.45%	647
Hunspell	9.36%	6.44%	39.61%	65.77%	2.95%	284
Dummy	11.16%	11.16%	0.00%	0.00%	0.00%	-

More details on reproducing these results are available in the "Train" section.

Usage

Python

Install swig3 (usually it is in your distro package manager)
Install jamspell:

pip install jamspell

Download or train language model
Use it:

import jamspell

corrector = jamspell.TSpellCorrector()
corrector.LoadLangModel('en.bin')

corrector.FixFragment('I am the begt spell cherken!')
# u'I am the best spell checker!'

corrector.GetCandidates(['i', 'am', 'the', 'begt', 'spell', 'cherken'], 3)
# (u'best', u'beat', u'belt', u'bet', u'bent', ... )

corrector.GetCandidates(['i', 'am', 'the', 'begt', 'spell', 'cherken'], 5)
# (u'checker', u'chicken', u'checked', u'wherein', u'coherent', ...)
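To collect candidates for every position in a sentence, the `GetCandidates(words, position)` call shown above can be wrapped in a small loop. The helper `candidates_per_word` and the `FakeCorrector` stand-in are hypothetical; the stand-in only mimics the call shape so that the sketch runs without a language model, and its return values are made up.

```python
def candidates_per_word(corrector, words):
    """Return {position: [candidates]} for each word, using GetCandidates."""
    return {
        position: list(corrector.GetCandidates(words, position))
        for position in range(len(words))
    }


class FakeCorrector:
    """Stand-in with the same GetCandidates(words, position) shape.

    Not part of jamspell; used only so this sketch runs without a model.
    """

    _table = {"begt": ("best", "beat"), "cherken": ("checker", "chicken")}

    def GetCandidates(self, words, position):
        word = words[position]
        # Assumption for the stand-in: unknown words are returned unchanged.
        return self._table.get(word, (word,))


words = ["i", "am", "the", "begt", "spell", "cherken"]
result = candidates_per_word(FakeCorrector(), words)
print(result[3], result[5])
```

With the real library, `FakeCorrector()` would be replaced by a `jamspell.TSpellCorrector()` on which `LoadLangModel('en.bin')` has been called.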

C++

Add jamspell and contrib dirs to your project
Use it:

#include <jamspell/spell_corrector.hpp>

int main(int argc, const char** argv) {

    NJamSpell::TSpellCorrector corrector;
    corrector.LoadLangModel("model.bin");

    corrector.FixFragment(L"I am the begt spell cherken!");
    // L"I am the best spell checker!"

    // Candidates for the word at index 3 ("begt")
    corrector.GetCandidates({L"i", L"am", L"the", L"begt", L"spell", L"cherken"}, 3);
    // "best", "beat", "belt", "bet", "bent", ...

    // Candidates for the word at index 5 ("cherken")
    corrector.GetCandidates({L"i", L"am", L"the", L"begt", L"spell", L"cherken"}, 5);
    // "checker", "chicken", "checked", "wherein", "coherent", ...
    return 0;
}

Other languages

You can generate extensions for other languages by following the SWIG tutorial. The SWIG interface file is jamspell.i. Pull requests with build scripts are welcome.

HTTP API

Install cmake
Clone and build jamspell (it includes http server):

git clone https://github.com/bakwc/JamSpell.git
cd JamSpell
mkdir build
cd build
cmake ..
make

Download or train language model
Run http server:

./web_server/web_server en.bin localhost 8080

GET Request example:

$ curl "http://localhost:8080/fix?text=I am the begt spell cherken"
I am the best spell checker

POST Request example

$ curl -d "I am the begt spell cherken" http://localhost:8080/fix
I am the best spell checker
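The GET request can also be issued from Python with only the standard library. This is a sketch under assumptions: the helper names `build_fix_url` and `fix_text` are hypothetical, and it assumes the server accepts a percent-encoded `text` parameter and replies with UTF-8 text.

```python
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def build_fix_url(text, host="localhost", port=8080):
    """Build the /fix URL with the text percent-encoded (spaces become %20)."""
    query = urlencode({"text": text}, quote_via=quote)
    return f"http://{host}:{port}/fix?{query}"


def fix_text(text, host="localhost", port=8080):
    """Send the GET request; requires a running web_server instance."""
    with urlopen(build_fix_url(text, host, port)) as response:
        return response.read().decode("utf-8")


url = build_fix_url("I am the begt spell cherken")
print(url)
```

Only `build_fix_url` runs without a server; `fix_text` needs the HTTP server started as shown above.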

Candidate example

curl "http://localhost:8080/candidates?text=I am the begt spell cherken"
# or
curl -d "I am the begt spell cherken" http://localhost:8080/candidates

{
    "results": [
        {
            "candidates": [
                "best",
                "beat",
                "belt",
                "bet",
                "bent",
                "beet",
                "beit"
            ],
            "len": 4,
            "pos_from": 9
        },
        {
            "candidates": [
                "checker",
                "chicken",
                "checked",
                "wherein",
                "coherent",
                "cheered",
                "cherokee"
            ],
            "len": 7,
            "pos_from": 20
        }
    ]
}

Here pos_from is the zero-based position of the misspelled word's first letter in the text, and len is the length of the misspelled word.
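A minimal sketch of how pos_from and len can be used to apply the first candidate for each misspelled word is given below. The function name `apply_top_candidates` is hypothetical, the sample response is a shortened copy of the one above, and the offsets are assumed to count characters.

```python
import json


def apply_top_candidates(text, response):
    """Replace each misspelled span with its first candidate.

    Spans are processed from right to left so that earlier offsets stay
    valid when a replacement changes the length of the text.
    """
    fixed = text
    spans = sorted(response["results"], key=lambda r: r["pos_from"], reverse=True)
    for item in spans:
        if not item["candidates"]:
            continue  # nothing to substitute for this span
        start = item["pos_from"]
        end = start + item["len"]
        fixed = fixed[:start] + item["candidates"][0] + fixed[end:]
    return fixed


# Shortened copy of the /candidates response shown above.
sample = json.loads("""
{
    "results": [
        {"candidates": ["best", "beat", "belt"], "len": 4, "pos_from": 9},
        {"candidates": ["checker", "chicken", "checked"], "len": 7, "pos_from": 20}
    ]
}
""")

fixed_text = apply_top_candidates("I am the begt spell cherken", sample)
print(fixed_text)
```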

Train

To train a custom model:

Install cmake
Clone and build jamspell:

git clone https://github.com/bakwc/JamSpell.git
cd JamSpell
mkdir build
cd build
cmake ..
make

Prepare a UTF-8 text file with sentences to train on (e.g. sherlockholmes.txt) and another file with the language alphabet (e.g. alphabet_en.txt)
Train model:

./main/jamspell train ../test_data/alphabet_en.txt ../test_data/sherlockholmes.txt model_sherlock.bin

To evaluate the spell checker, you can use the evaluate/evaluate.py script:

python evaluate/evaluate.py -a alphabet_file.txt -jsp your_model.bin -mx 50000 your_test_data.txt

You can use evaluate/generate_dataset.py to generate your train/test data. It supports txt files, the Leipzig Corpora Collection format and fb2 books.

Download models

Here are a few simple models. They were trained on 300K news + 300K Wikipedia sentences. We strongly recommend training your own model on at least a few million sentences to achieve better quality. See the Train section above.

en.tar.gz (35Mb)
fr.tar.gz (31Mb)
ru.tar.gz (38Mb)
