Symspell - SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm
Stars: ✭ 1,976 (+3774.51%)
spellchecker-wasm - SpellcheckerWasm is an extremely fast spellchecker for WebAssembly based on SymSpell
Stars: ✭ 46 (-9.8%)
SymSpellCppPy - Fast SymSpell written in C++ and exposed to Python via pybind11
Stars: ✭ 28 (-45.1%)
LinSpell - Fast approximate string search & spelling correction
Stars: ✭ 52 (+1.96%)
spell - Spelling correction and string segmentation written in Go
Stars: ✭ 24 (-52.94%)
Textdistance - Compute distance between sequences. 30+ algorithms, pure Python implementation, common interface, optional external libs usage.
Stars: ✭ 2,575 (+4949.02%)
spacy hunspell - ✏️ Hunspell extension for spaCy 2.0.
Stars: ✭ 94 (+84.31%)
Nkocr - 🔎📝 A module for performing specific OCR on food products and nutritional tables.
Stars: ✭ 15 (-70.59%)
Java String Similarity - Implementation of various string similarity and distance algorithms: Levenshtein, Jaro-Winkler, n-Gram, Q-Gram, Jaccard index, Longest Common Subsequence edit distance, cosine similarity ...
Stars: ✭ 2,403 (+4611.76%)
ckipnlp - CKIP CoreNLP Toolkits
Stars: ✭ 92 (+80.39%)
Levenshtein - The Levenshtein Python C extension module contains functions for fast computation of Levenshtein distance and string similarity
Stars: ✭ 38 (-25.49%)
sentencepiece - R package for Byte Pair Encoding / Unigram modelling based on Sentencepiece
Stars: ✭ 22 (-56.86%)
neuspell - NeuSpell: A Neural Spelling Correction Toolkit
Stars: ✭ 524 (+927.45%)
viconf - My (n)Vim config files
Stars: ✭ 18 (-64.71%)
seqalign pathing - Rust implementation of sequence alignment / Levenshtein distance by A* acceleration of the DP algorithm
Stars: ✭ 17 (-66.67%)
affinegap - 📐 A Cython implementation of the affine gap string distance
Stars: ✭ 57 (+11.76%)
sentencepiece-jni - Java JNI wrapper for SentencePiece: unsupervised text tokenizer for Neural Network-based text generation.
Stars: ✭ 26 (-49.02%)
sheldon - Very Simple Erlang Spell Checker
Stars: ✭ 63 (+23.53%)
polyleven - Fast Levenshtein Distance Library for Python 3
Stars: ✭ 37 (-27.45%)
ceja - PySpark phonetic and string matching algorithms
Stars: ✭ 24 (-52.94%)
Angry-Reviewer - Style corrector for academic writing and scientific papers at angryreviewer.com
Stars: ✭ 69 (+35.29%)
sqlite-spellfix - Loadable spellfix1 extension for SQLite as a Python package
Stars: ✭ 13 (-74.51%)
eddie - No description or website provided.
Stars: ✭ 18 (-64.71%)
sylbreak - Syllable segmentation tool for Myanmar language (Burmese) by Ye.
Stars: ✭ 44 (-13.73%)
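Monpa - MONPA (罔拍) is a multi-task model that provides Traditional Chinese word segmentation, part-of-speech tagging, and named entity recognition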
Stars: ✭ 203 (+298.04%)
hanzi-tools - Converts from Chinese characters to pinyin, between simplified and traditional, and does word segmentation.
Stars: ✭ 69 (+35.29%)
edits.cr - Edit distance algorithms including Jaro, Damerau-Levenshtein, and Optimal Alignment
Stars: ✭ 16 (-68.63%)
dnn-lstm-word-segment - Chinese word segmentation based on deep learning and an LSTM neural network
Stars: ✭ 24 (-52.94%)
hunspell - High-Performance Stemmer, Tokenizer, and Spell Checker for R
Stars: ✭ 101 (+98.04%)
Pycantonese - Cantonese Linguistics and NLP in Python
Stars: ✭ 147 (+188.24%)
SynThai - Thai Word Segmentation and Part-of-Speech Tagging with Deep Learning
Stars: ✭ 41 (-19.61%)
hunspell-asm - WebAssembly-based JavaScript bindings for the hunspell spellchecker
Stars: ✭ 60 (+17.65%)
esapp - An unsupervised Chinese word segmentation tool.
Stars: ✭ 13 (-74.51%)
Toiro - A comparison tool of Japanese tokenizers
Stars: ✭ 95 (+86.27%)
skt - Sanskrit compound segmentation using a seq2seq model
Stars: ✭ 21 (-58.82%)
Han Segment - A Chinese word segmentation system based on a hidden Markov model and forward maximum matching
Stars: ✭ 17 (-66.67%)
text2text - Text2Text: Cross-lingual natural language processing and generation toolkit
Stars: ✭ 188 (+268.63%)
mingw-w64-texmacs - TeXmacs for Windows (built in an MSys2/Mingw32 environment)
Stars: ✭ 21 (-58.82%)
Lac - Baidu NLP: word segmentation, part-of-speech tagging, named entity recognition, word importance
Stars: ✭ 2,792 (+5374.51%)
Kiwi - Kiwi (intelligent Korean morphological analyzer)
Stars: ✭ 107 (+109.8%)
stringdistance - A fuzzy matching string distance library for Scala and Java that includes Levenshtein distance, Jaro distance, Jaro-Winkler distance, Dice coefficient, N-Gram similarity, Cosine similarity, Jaccard similarity, Longest common subsequence, Hamming distance, and more.
Stars: ✭ 60 (+17.65%)
Cws - Source code for an ACL 2016 paper on Chinese word segmentation
Stars: ✭ 81 (+58.82%)
codeprep - A toolkit for pre-processing large source code corpora
Stars: ✭ 39 (-23.53%)
Youtokentome - Unsupervised text tokenizer focused on computational efficiency
Stars: ✭ 728 (+1327.45%)
deep-spell-checkr - Keras implementation of character-level sequence-to-sequence learning for spelling correction
Stars: ✭ 65 (+27.45%)
ka GE.spell - Georgian orthographic dictionary (ქართული ორთოგრაფიული ლექსიკონი): Georgian Spell Checking Dictionary
Stars: ✭ 24 (-52.94%)
Pytorch-NLU - Pytorch-NLU, a Chinese text classification and sequence labeling toolkit. It supports multi-class and multi-label classification of long and short Chinese texts, and sequence labeling tasks such as Chinese named entity recognition, part-of-speech tagging, and word segmentation.
Stars: ✭ 151 (+196.08%)
grammarify - Grammarify is an npm package that safely cleans up text that has misspellings, improper capitalization, and lexical illusions, among other things.
Stars: ✭ 43 (-15.69%)