barrust / Pyspellchecker

Licence: mit

Pure Python Spell Checking http://pyspellchecker.readthedocs.io/en/latest/

Programming Languages

python

139335 projects - #7 most used programming language

Labels

spellcheck

Projects that are alternatives to or similar to Pyspellchecker

flake8-spellcheck

❄️ Spellcheck variables, classnames, comments, docstrings etc

Stars: ✭ 71 (-78.87%)

Mutual labels: spellcheck

identypo

identypo is a Go static analysis tool to find typos in identifiers (functions, function calls, variables, constants, type declarations, packages, labels).

Stars: ✭ 26 (-92.26%)

Mutual labels: spellcheck

SymSpellCppPy

Fast SymSpell written in C++ and exposed to Python via pybind11

Stars: ✭ 28 (-91.67%)

Mutual labels: spellcheck

neuspell

NeuSpell: A Neural Spelling Correction Toolkit

Stars: ✭ 524 (+55.95%)

Mutual labels: spellcheck

LinSpell

Fast approximate strings search & spelling correction

Stars: ✭ 52 (-84.52%)

Mutual labels: spellcheck

yaspeller-ci

Fast spelling check for Travis CI

Stars: ✭ 60 (-82.14%)

Mutual labels: spellcheck

Php Spellchecker

🐘🎓📝 PHP Library providing an easy way to spellcheck multiple sources of text by many spellcheckers

Stars: ✭ 213 (-36.61%)

Mutual labels: spellcheck

Proofreader

Simple text proofreader based on 'write-good' (hemingway-app-like suggestions) and 'nodehun' (spelling).

Stars: ✭ 285 (-15.18%)

Mutual labels: spellcheck

check-spelling

Spelling checker action

Stars: ✭ 139 (-58.63%)

Mutual labels: spellcheck

wellspell.addin

R Package - Quick Spellcheck Addin for RStudio

Stars: ✭ 22 (-93.45%)

Mutual labels: spellcheck

WordSegmentationDP

Word Segmentation with Dynamic Programming

Stars: ✭ 18 (-94.64%)

Mutual labels: spellcheck

spellchecker-wasm

SpellcheckerWasm is an extremely fast spellchecker for WebAssembly based on SymSpell

Stars: ✭ 46 (-86.31%)

Mutual labels: spellcheck

ispell-lt

Lithuanian spellchecking dictionary

Stars: ✭ 26 (-92.26%)

Mutual labels: spellcheck

spell

Spelling correction and string segmentation written in Go

Stars: ✭ 24 (-92.86%)

Mutual labels: spellcheck

cyberdic

An auxiliary spellcheck dictionary that corresponds with the Bishop Fox Cybersecurity Style Guide

Stars: ✭ 63 (-81.25%)

Mutual labels: spellcheck

Nodehun

The Hunspell binding for NodeJS that exposes as much of Hunspell as possible and also adds new features. Hunspell is a first class spellcheck library used by Google, Apple, and Mozilla.

Stars: ✭ 229 (-31.85%)

Mutual labels: spellcheck

contextualSpellCheck

✔️Contextual word checker for better suggestions

Stars: ✭ 274 (-18.45%)

Mutual labels: spellcheck

Nlprule

A fast, low-resource Natural Language Processing and Text Correction library written in Rust.

Stars: ✭ 309 (-8.04%)

Mutual labels: spellcheck

hanspell

A Korean spell checker that uses the web services of Daum Corp. and the Pusan National University AI Lab / Nara Infotech Co.

Stars: ✭ 72 (-78.57%)

Mutual labels: spellcheck

Emacs-LanguageTool.el

LanguageTool suggestions integrated within Emacs

Stars: ✭ 44 (-86.9%)

Mutual labels: spellcheck


pyspellchecker

.. image:: https://img.shields.io/badge/license-MIT-blue.svg
    :target: https://opensource.org/licenses/MIT/
    :alt: License
.. image:: https://img.shields.io/github/release/barrust/pyspellchecker.svg
    :target: https://github.com/barrust/pyspellchecker/releases
    :alt: GitHub release
.. image:: https://github.com/barrust/pyspellchecker/workflows/Python%20package/badge.svg
    :target: https://github.com/barrust/pyspellchecker/actions?query=workflow%3A%22Python+package%22
    :alt: Build Status
.. image:: https://codecov.io/gh/barrust/pyspellchecker/branch/master/graph/badge.svg?token=OdETiNgz9k
    :target: https://codecov.io/gh/barrust/pyspellchecker
    :alt: Test Coverage
.. image:: https://badge.fury.io/py/pyspellchecker.svg
    :target: https://badge.fury.io/py/pyspellchecker
    :alt: PyPi Package
.. image:: http://pepy.tech/badge/pyspellchecker
    :target: http://pepy.tech/count/pyspellchecker
    :alt: Downloads

Pure Python Spell Checking based on `Peter Norvig's <https://norvig.com/spell-correct.html>`__ blog post on setting up a simple spell checking algorithm.

It uses a `Levenshtein Distance <https://en.wikipedia.org/wiki/Levenshtein_distance>`__ algorithm to find permutations within an edit distance of 2 from the original word. It then compares all permutations (insertions, deletions, replacements, and transpositions) to known words in a word frequency list. Words that appear more often in the frequency list are more likely to be the correct result.
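
The candidate-generation and ranking steps described above can be sketched in plain Python. This is a minimal illustration in the spirit of Norvig's post, not the library's actual implementation; the helper names ``edits1`` and ``best_guess`` and the tiny ``WORD_COUNTS`` table are made up for this example, and only a distance of 1 is shown.

.. code:: python

    import string

    # Toy frequency table (hypothetical counts, for illustration only)
    WORD_COUNTS = {"happening": 120, "happen": 300, "opening": 80}


    def edits1(word, alphabet=string.ascii_lowercase):
        """Return every string that is one edit away from `word`."""
        splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
        deletes = [left + right[1:] for left, right in splits if right]
        transposes = [left + right[1] + right[0] + right[2:]
                      for left, right in splits if len(right) > 1]
        replaces = [left + c + right[1:]
                    for left, right in splits if right for c in alphabet]
        inserts = [left + c + right for left, right in splits for c in alphabet]
        return set(deletes + transposes + replaces + inserts)


    def best_guess(word):
        """Pick the most frequent known word within one edit of `word`."""
        if word in WORD_COUNTS:
            return word
        known = [w for w in edits1(word) if w in WORD_COUNTS]
        # Fall back to the input when no known word is close enough
        return max(known, key=WORD_COUNTS.get) if known else word


    print(best_guess("hapening"))

Ranking by raw count is the simplest possible choice; it favours common words and knows nothing about context.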

pyspellchecker supports multiple languages including English, Spanish, German, French, Portuguese, and Russian. For information on how the dictionaries were created and how they can be updated and improved, please see the Dictionary Creation and Updating section of the readme!

pyspellchecker supports Python 3.

pyspellchecker allows for the setting of the Levenshtein Distance (up to two) to check. For longer words, it is highly recommended to use a distance of 1 and not the default 2. See the quickstart to find how one can change the distance parameter.
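
To get a feel for why a distance of 1 is recommended for longer words, the sketch below counts how many strings lie within one and two edits of a short and a longer word. It is a rough, standard-library-only illustration: ``edits1`` and ``edits2`` are hypothetical helpers written for this example, not the library's own methods, and the alphabet is assumed to be lowercase ASCII.

.. code:: python

    import string


    def edits1(word, alphabet=string.ascii_lowercase):
        """All strings one insertion, deletion, replacement, or transposition away."""
        splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
        result = set()
        for left, right in splits:
            if right:
                result.add(left + right[1:])                         # deletion
                result.update(left + c + right[1:] for c in alphabet)  # replacement
            if len(right) > 1:
                result.add(left + right[1] + right[0] + right[2:])   # transposition
            result.update(left + c + right for c in alphabet)        # insertion
        return result


    def edits2(word):
        """All strings reachable by applying two single edits in a row."""
        return {e2 for e1 in edits1(word) for e2 in edits1(e1)}


    for word in ("cat", "category"):
        print(word, len(edits1(word)), len(edits2(word)))

The distance-2 set is far larger than the distance-1 set, and both grow with word length, so each lookup for a long word costs more time and memory at distance 2.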

Installation

The easiest method to install is using pip:

.. code:: bash

    pip install pyspellchecker

To install from source:

.. code:: bash

    git clone https://github.com/barrust/pyspellchecker.git
    cd pyspellchecker
    python setup.py install

For Python 2.7 support, install `release 0.5.6 <https://github.com/barrust/pyspellchecker/releases/tag/v0.5.6>`__, but note that no future updates will support Python 2.

.. code:: bash

    pip install pyspellchecker==0.5.6

Quickstart

After installation, using pyspellchecker should be fairly straightforward:

.. code:: python

    from spellchecker import SpellChecker

    spell = SpellChecker()

    # find those words that may be misspelled
    misspelled = spell.unknown(['something', 'is', 'hapenning', 'here'])

    for word in misspelled:
        # Get the one `most likely` answer
        print(spell.correction(word))

        # Get a list of `likely` options
        print(spell.candidates(word))

If the Word Frequency list is not to your liking, you can add additional text to generate a more appropriate list for your use case.

.. code:: python

    from spellchecker import SpellChecker

    spell = SpellChecker()  # loads default word frequency list
    spell.word_frequency.load_text_file('./my_free_text_doc.txt')

    # if I just want to make sure some words are not flagged as misspelled
    spell.word_frequency.load_words(['microsoft', 'apple', 'google'])
    spell.known(['microsoft', 'google'])  # will return both now!

If the words that you wish to check are long, it is recommended to reduce the distance to 1. This can be accomplished either when initializing the spell check class or after the fact.

.. code:: python

    from spellchecker import SpellChecker

    spell = SpellChecker(distance=1)  # set at initialization

    # do some work on longer words

    spell.distance = 2  # set the distance parameter back to the default

Non-English Dictionaries

pyspellchecker supports several default dictionaries as part of the default package. Each is simple to use when initializing the dictionary:

.. code:: python

    from spellchecker import SpellChecker

    english = SpellChecker()  # the default is English (language='en')
    spanish = SpellChecker(language='es')  # use the Spanish Dictionary
    russian = SpellChecker(language='ru')  # use the Russian Dictionary

The currently supported dictionaries are:

* English - 'en'
* Spanish - 'es'
* French - 'fr'
* Portuguese - 'pt'
* German - 'de'
* Russian - 'ru'

Dictionary Creation and Updating

The creation of the dictionaries is, unfortunately, not an exact science. I have provided a script that, given a text file of sentences (in this case from `OpenSubtitles <http://opus.nlpl.eu/OpenSubtitles2018.php>`__), generates a word frequency list based on the words found within the text. The script then attempts to clean up the word frequency list by, for example, removing words with invalid characters (usually from other languages), removing low-count terms (likely misspellings), and enforcing language-specific rules where available (such as no more than one accent per word in Spanish). It then removes any words that appear on a list of known words to exclude. Finally, it adds words that are known to be missing or that were removed for having too low a frequency.

The script can be found here: ``scripts/build_dictionary.py``. The original word frequency list parsed from OpenSubtitles can be found in the ``scripts/data/`` folder along with each language's include and exclude text files.
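
The clean-up steps described in this section can be pictured with a small sketch. This is not the project's ``scripts/build_dictionary.py``; the function name ``build_frequency``, the ``min_count`` threshold, and the sample include and exclude sets are assumptions chosen purely for illustration, and language-specific rules are left out.

.. code:: python

    import re
    from collections import Counter


    def build_frequency(sentences, alphabet, min_count=2, include=(), exclude=()):
        """Build a cleaned word-frequency table from an iterable of sentences."""
        counts = Counter()
        for sentence in sentences:
            counts.update(re.findall(r"\w+", sentence.lower()))

        allowed = set(alphabet)
        cleaned = {
            word: count
            for word, count in counts.items()
            if set(word) <= allowed      # drop words with invalid characters
            and count >= min_count       # drop rare terms (possible misspellings)
            and word not in exclude      # drop words listed for removal
        }
        for word in include:             # add back words known to be missing
            cleaned.setdefault(word, min_count)
        return cleaned


    sentences = ["the cat sat", "the cat ran", "teh dog ran 42 times"]
    freq = build_frequency(
        sentences,
        alphabet="abcdefghijklmnopqrstuvwxyz",
        include={"dog"},
        exclude={"ran"},
    )
    print(freq)

Here ``teh`` and ``42`` are filtered out, ``ran`` is removed by the exclude set, and ``dog`` is restored by the include set even though it appears only once.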

Any help in updating and maintaining the dictionaries would be greatly appreciated. To contribute, start a `discussion <https://github.com/barrust/pyspellchecker/discussions>`__ on GitHub or open a pull request that updates the include and exclude files.

Additional Methods

`On-line documentation <http://pyspellchecker.readthedocs.io/en/latest/>`__ is available; below is the cliff-notes version of some of the available functions:

``correction(word)``: Returns the most probable result for the misspelled word

``candidates(word)``: Returns a set of possible candidates for the misspelled word

``known([words])``: Returns those words that are in the word frequency list

``unknown([words])``: Returns those words that are not in the frequency list

``word_probability(word)``: The frequency of the given word out of all words in the frequency list
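
As a rough mental model of how the lookup methods relate to the word frequency list, consider the standard-library sketch below. It is not the library's code: ``TinyChecker`` and its sample words are hypothetical, and the real methods may differ in details such as return types and case handling.

.. code:: python

    from collections import Counter


    class TinyChecker:
        """Toy stand-in that mimics the lookups described above."""

        def __init__(self, words):
            self.freq = Counter(words)             # word -> number of occurrences
            self.total = sum(self.freq.values())   # total number of word occurrences

        def known(self, words):
            return {w for w in words if w in self.freq}

        def unknown(self, words):
            return {w for w in words if w not in self.freq}

        def word_probability(self, word):
            # Share of all occurrences that belong to `word` (0.0 if unseen)
            return self.freq[word] / self.total


    checker = TinyChecker(["the", "the", "cat", "sat"])
    print(checker.known(["the", "dog"]))
    print(checker.unknown(["the", "dog"]))
    print(checker.word_probability("the"))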

The following are less likely to be needed by the user but are available:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``edit_distance_1(word)``: Returns a set of all strings at a Levenshtein Distance of one based on the alphabet of the selected language

``edit_distance_2(word)``: Returns a set of all strings at a Levenshtein Distance of two based on the alphabet of the selected language

Credits

* `Peter Norvig <https://norvig.com/spell-correct.html>`__ blog post on setting up a simple spell checking algorithm
* P Lison and J Tiedemann, 2016, OpenSubtitles2016: Extracting Large Parallel Corpora from Movie and TV Subtitles. In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016)

