wolfgarbe / Symspellcompound

SymSpellCompound: compound aware automatic spelling correction

Projects that are alternatives of or similar to Symspellcompound

LinSpell
Fast approximate strings search & spelling correction
Stars: ✭ 52 (-14.75%)
Mutual labels:  spellcheck, fuzzy-search, levenshtein, spell-check
Symspellpy
Python port of SymSpell
Stars: ✭ 420 (+588.52%)
Mutual labels:  fuzzy-search, spellcheck, levenshtein, spell-check
Symspell
SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm
Stars: ✭ 1,976 (+3139.34%)
Mutual labels:  fuzzy-search, spellcheck, levenshtein, spell-check
spellchecker-wasm
SpellcheckerWasm is an extremely fast spellchecker for WebAssembly based on SymSpell
Stars: ✭ 46 (-24.59%)
Mutual labels:  spellcheck, levenshtein, spell-check
SymSpellCppPy
Fast SymSpell written in C++ and exposed to Python via pybind11
Stars: ✭ 28 (-54.1%)
Mutual labels:  spellcheck, fuzzy-search, spell-check
Did you mean
The gem that has been saving people from typos since 2014
Stars: ✭ 1,786 (+2827.87%)
Mutual labels:  spellcheck, spell-check
Dspellcheck
Notepad++ Spell-checking Plug-in
Stars: ✭ 144 (+136.07%)
Mutual labels:  spellcheck, spell-check
Dictionaries
Hunspell dictionaries in UTF-8
Stars: ✭ 591 (+868.85%)
Mutual labels:  spellcheck, spell-check
Fuzzball.js
Easy to use and powerful fuzzy string matching, port of fuzzywuzzy.
Stars: ✭ 225 (+268.85%)
Mutual labels:  fuzzy-search, levenshtein
Wecantspell.hunspell
A port of Hunspell v1 for .NET and .NET Standard
Stars: ✭ 61 (+0%)
Mutual labels:  spellcheck, spell-check
Jellyfish
🎐 a python library for doing approximate and phonetic matching of strings.
Stars: ✭ 1,571 (+2475.41%)
Mutual labels:  fuzzy-search, levenshtein
levenshtein.c
Levenshtein algorithm in C
Stars: ✭ 77 (+26.23%)
Mutual labels:  fuzzy-search, levenshtein
Spelling
Tools for Spell Checking in R
Stars: ✭ 82 (+34.43%)
Mutual labels:  spellcheck, spell-check
Hunspell
The most popular spellchecking library.
Stars: ✭ 1,196 (+1860.66%)
Mutual labels:  spellcheck, spell-check
Misspell Fixer
Simple tool for fixing common misspellings, typos in source code
Stars: ✭ 154 (+152.46%)
Mutual labels:  spellcheck, spell-check
Pylanguagetool
Python Library and CLI for the LanguageTool JSON API
Stars: ✭ 62 (+1.64%)
Mutual labels:  spellcheck, spell-check
WordSegmentationDP
Word Segmentation with Dynamic Programming
Stars: ✭ 18 (-70.49%)
Mutual labels:  spellcheck, spell-check
spell
Spelling correction and string segmentation written in Go
Stars: ✭ 24 (-60.66%)
Mutual labels:  spellcheck, spell-check
check-spelling
Spelling checker action
Stars: ✭ 139 (+127.87%)
Mutual labels:  spellcheck, spell-check
Ugrep
🔍NEW ugrep v3.1: ultra fast grep with interactive query UI and fuzzy search: search file systems, source code, text, binary files, archives (cpio/tar/pax/zip), compressed files (gz/Z/bz2/lzma/xz/lz4), documents and more. A faster, user-friendly and compatible grep replacement.
Stars: ✭ 626 (+926.23%)
Mutual labels:  fuzzy-search

SymSpellCompound

SymSpellCompound has been integrated into SymSpell. Please visit the SymSpell repository!


Compound aware automatic spelling correction

SymSpellCompound supports compound aware automatic spelling correction of multi-word input strings.
It is built on top of SymSpell's Symmetric Delete spelling correction algorithm, which SymSpell describes as 1 million times faster.

1. Compound splitting & decompounding

SymSpell assumes that every input string is a single term. SymSpellCompound supports compound splitting / decompounding and handles three cases:

  1. a mistakenly inserted space within a correct word, leading to two incorrect terms
  2. a mistakenly omitted space between two correct words, leading to one incorrect combined term
  3. multiple input terms with or without spelling errors

Splitting errors, concatenation errors, substitution errors, transposition errors, deletion errors and insertion errors can be mixed within the same word.
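
The three cases above can be illustrated with a small sketch. This is not the SymSpellCompound implementation: the toy dictionary `FREQ`, the helper names `edit_distance`, `best_single` and `correct_compound`, the greedy left-to-right order, and the use of plain Levenshtein distance are all assumptions made for illustration.

```python
# Hypothetical toy dictionary: word -> frequency (made-up counts).
FREQ = {"where": 100, "is": 200, "the": 500, "love": 50,
        "inspired": 10, "him": 80, "and": 300}


def edit_distance(a, b):
    """Plain Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def best_single(term, max_dist=2):
    """Closest dictionary word within max_dist; ties go to the higher frequency."""
    dist, _, word = min((edit_distance(term, w), -f, w) for w, f in FREQ.items())
    return word if dist <= max_dist else None


def correct_compound(text):
    """Greedy sketch: handle each token with the first case that applies."""
    tokens = text.lower().split()
    out, i = [], 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in FREQ:                       # already a known word
            out.append(tok)
            i += 1
            continue
        # Case 1: a space was inserted inside a word, so merge with the next token.
        if i + 1 < len(tokens) and tok + tokens[i + 1] in FREQ:
            out.append(tok + tokens[i + 1])
            i += 2
            continue
        # Case 2: a space was omitted, so split into two known words.
        split = next(((tok[:k], tok[k:]) for k in range(1, len(tok))
                      if tok[:k] in FREQ and tok[k:] in FREQ), None)
        if split:
            out.extend(split)
            i += 1
            continue
        # Case 3: ordinary single-term correction; keep the token if nothing is close.
        out.append(best_single(tok) or tok)
        i += 1
    return " ".join(out)


print(correct_compound("whereis th elove ins pired him"))
```

The sketch only tries exact merges and splits, so it cannot repair a token that contains both a spacing error and a spelling error at once; handling such mixed errors is what the description above attributes to SymSpellCompound.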

2. Automatic spelling correction

  • Large document collections make manual correction infeasible and require unsupervised, fully-automatic spelling correction.
  • In conventional spelling correction of a single token, the user is presented with spelling correction suggestions.
    For automatic spelling correction of long multi-word text, the algorithm itself has to make an educated choice.
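
The difference between presenting suggestions and choosing automatically can be sketched as follows. The ranking rule (smaller edit distance first, then higher frequency), the toy dictionary `FREQ`, and the function names `lev`, `suggestions` and `auto_correct` are assumptions for illustration, not the project's actual ranking.

```python
from functools import lru_cache

# Hypothetical toy dictionary: word -> frequency (made-up counts).
FREQ = {"secret": 120, "sacred": 40, "select": 90,
        "plan": 300, "plane": 150, "plant": 110}


@lru_cache(maxsize=None)
def lev(a, b):
    """Recursive Levenshtein distance; fine for short words."""
    if not a or not b:
        return len(a) + len(b)
    return min(lev(a[1:], b) + 1,                      # delete from a
               lev(a, b[1:]) + 1,                      # insert into a
               lev(a[1:], b[1:]) + (a[0] != b[0]))     # substitute or match


def suggestions(term, max_dist=2):
    """Interactive mode: a ranked list for the user to pick from."""
    ranked = sorted((lev(term, w), -f, w) for w, f in FREQ.items())
    return [w for d, _, w in ranked if d <= max_dist]


def auto_correct(text):
    """Automatic mode: take the top-ranked suggestion for every token."""
    out = []
    for tok in text.lower().split():
        ranked = suggestions(tok)
        out.append(ranked[0] if ranked else tok)  # keep unknown tokens as-is
    return " ".join(out)


print(suggestions("plam"))          # candidates a user would choose between
print(auto_correct("sekret plam"))  # the algorithm chooses by itself
```

In this sketch the automatic mode simply trusts the first-ranked candidate, which is why the quality of the ranking matters much more than in the interactive case.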

Examples:

- whereis th elove hehad dated forImuch of thepast who couqdn'tread in sixthgrade and ins pired him
+ where is the love he had dated for much of the past who couldn't read in sixth grade and inspired him  (9 edits)

- in te dhird qarter oflast jear he hadlearned ofca sekretplan y iran
+ in the third quarter of last year he had learned of a secret plan by iran  (10 edits)

- the bigjest playrs in te strogsommer film slatew ith plety of funn
+ the biggest players in the strong summer film slate with plenty of fun  (9 edits)

- Can yu readthis messa ge despite thehorible sppelingmsitakes
+ can you read this message despite the horrible spelling mistakes  (9 edits)

Performance

0.2 milliseconds / word
5000 words / second (single core on 2012 MacBook Pro)
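
The two figures are the same measurement expressed two ways. A minimal sketch of the conversion, with a hypothetical helper `correction_time_seconds` for estimating how long a collection of a given size would take at that rate (the collection size is an assumed example, not a benchmark):

```python
MS_PER_WORD = 0.2  # figure quoted above

# 1000 ms per second divided by 0.2 ms per word gives words per second.
words_per_second = 1000 / MS_PER_WORD


def correction_time_seconds(word_count, ms_per_word=MS_PER_WORD):
    """Estimated single-core time to correct word_count words."""
    return word_count * ms_per_word / 1000


print(words_per_second)                    # 5000.0
print(correction_time_seconds(1_000_000))  # about 200 seconds for one million words
```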

Applications

  • Query correction (10–15% of queries contain misspelled terms),
  • Chatbots,
  • OCR post-processing,
  • Automated proofreading.