Implementation of various string similarity and distance algorithms: Levenshtein, Jaro-Winkler, n-Gram, Q-Gram, Jaccard index, Longest Common Subsequence edit distance, cosine similarity ...

Stars: ✭ 2,403 (+4611.76%)

Mutual labels: levenshtein-distance, damerau-levenshtein

ckipnlp

CKIP CoreNLP Toolkits

Stars: ✭ 92 (+80.39%)

Mutual labels: word-segmentation

Levenshtein

The Levenshtein Python C extension module contains functions for fast computation of Levenshtein distance and string similarity

Stars: ✭ 38 (-25.49%)

Mutual labels: levenshtein-distance

sentencepiece

R package for Byte Pair Encoding / Unigram modelling based on Sentencepiece

Stars: ✭ 22 (-56.86%)

Mutual labels: word-segmentation

neuspell

NeuSpell: A Neural Spelling Correction Toolkit

Stars: ✭ 524 (+927.45%)

Mutual labels: spelling-correction

viconf

My (n)Vim config files

Stars: ✭ 18 (-64.71%)

Mutual labels: spellchecker

pytorch Joint-Word-Segmentation-and-POS-Tagging

Paper: A Simple and Effective Neural Model for Joint Word Segmentation and POS Tagging

Stars: ✭ 37 (-27.45%)

Mutual labels: word-segmentation

seqalign pathing

Rust implementation of sequence alignment / Levenshtein distance using A* acceleration of the DP algorithm

Stars: ✭ 17 (-66.67%)

Mutual labels: levenshtein-distance

affinegap

📐 A Cython implementation of the affine gap string distance

Stars: ✭ 57 (+11.76%)

Mutual labels: levenshtein-distance

sentencepiece-jni

Java JNI wrapper for SentencePiece: unsupervised text tokenizer for Neural Network-based text generation.

Stars: ✭ 26 (-49.02%)

Mutual labels: word-segmentation

sheldon

Very Simple Erlang Spell Checker

Stars: ✭ 63 (+23.53%)

Mutual labels: spelling-correction

levenshtein.c

Levenshtein algorithm in C

Stars: ✭ 77 (+50.98%)

Mutual labels: levenshtein-distance

FastFuzzyStringMatcherDotNet

A BK tree implementation for fast fuzzy string matching

Stars: ✭ 23 (-54.9%)

Mutual labels: levenshtein-distance

polyleven

Fast Levenshtein Distance Library for Python 3

Stars: ✭ 37 (-27.45%)

Mutual labels: levenshtein-distance

ceja

PySpark phonetic and string matching algorithms

Stars: ✭ 24 (-52.94%)

Mutual labels: damerau-levenshtein

Angry-Reviewer

Style corrector for academic writing and scientific papers at angryreviewer.com

Stars: ✭ 69 (+35.29%)

Mutual labels: spellchecker

sqlite-spellfix

Loadable spellfix1 extension for SQLite as a Python package

Stars: ✭ 13 (-74.51%)

Mutual labels: spelling-correction

sentences-similarity-cluster

Calculate similarity of sentences & Cluster the result.

Stars: ✭ 14 (-72.55%)

Mutual labels: levenshtein-distance

ocr-machine-learning

OCR Machine Learning in Python

Stars: ✭ 42 (-17.65%)

Mutual labels: spelling-correction

eddie

No description or website provided.

Stars: ✭ 18 (-64.71%)

Mutual labels: damerau-levenshtein

sylbreak

Syllable segmentation tool for Myanmar language (Burmese) by Ye.

Stars: ✭ 44 (-13.73%)

Mutual labels: word-segmentation

Monpa

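MONPA (罔拍) is a multi-task model that provides Traditional Chinese word segmentation, part-of-speech tagging, and named entity recognition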

Stars: ✭ 203 (+298.04%)

Mutual labels: word-segmentation

hanzi-tools

Converts from Chinese characters to pinyin, between simplified and traditional, and does word segmentation.

Stars: ✭ 69 (+35.29%)

Mutual labels: word-segmentation

word tokenize

Vietnamese Word Tokenize

Stars: ✭ 45 (-11.76%)

Mutual labels: word-segmentation

edits.cr

Edit distance algorithms inc. Jaro, Damerau-Levenshtein, and Optimal Alignment

Stars: ✭ 16 (-68.63%)

Mutual labels: damerau-levenshtein

dnn-lstm-word-segment

Chinese Word Segmentation Based on Deep Learning and LSTM Neural Networks

Stars: ✭ 24 (-52.94%)

Mutual labels: word-segmentation

hunspell

High-Performance Stemmer, Tokenizer, and Spell Checker for R

Stars: ✭ 101 (+98.04%)

Mutual labels: spellchecker

Pycantonese

Cantonese Linguistics and NLP in Python

Stars: ✭ 147 (+188.24%)

Mutual labels: word-segmentation

SynThai

Thai Word Segmentation and Part-of-Speech Tagging with Deep Learning

Stars: ✭ 41 (-19.61%)

Mutual labels: word-segmentation

hunspell-asm

WebAssembly-based JavaScript bindings for the hunspell spellchecker

Stars: ✭ 60 (+17.65%)

Mutual labels: spellchecker

esapp

An unsupervised Chinese word segmentation tool.

Stars: ✭ 13 (-74.51%)

Mutual labels: word-segmentation

kotlin-java-spellchecker

Simple spellcheckers in Java and Kotlin

Stars: ✭ 13 (-74.51%)

Mutual labels: spellchecker

Toiro

A comparison tool of Japanese tokenizers

Stars: ✭ 95 (+86.27%)

Mutual labels: word-segmentation

skt

Sanskrit compound segmentation using seq2seq model

Stars: ✭ 21 (-58.82%)

Mutual labels: word-segmentation

Han Segment

A Chinese word segmentation system based on hidden Markov models and forward maximum matching

Stars: ✭ 17 (-66.67%)

Mutual labels: word-segmentation

text2text

Text2Text: Cross-lingual natural language processing and generation toolkit

Stars: ✭ 188 (+268.63%)

Mutual labels: levenshtein-distance

mingw-w64-texmacs

TeXmacs for Windows (built in an MSys2/Mingw32 environment)

Stars: ✭ 21 (-58.82%)

Mutual labels: spellchecker

Lac

Baidu NLP: word segmentation, part-of-speech tagging, named entity recognition, word importance

Stars: ✭ 2,792 (+5374.51%)

Mutual labels: word-segmentation

Intelligent Document Finder

Document Search Engine Tool

Stars: ✭ 45 (-11.76%)

Mutual labels: spellchecker

Kiwi

Kiwi (an intelligent Korean morphological analyzer)

Stars: ✭ 107 (+109.8%)

Mutual labels: word-segmentation

stringdistance

A fuzzy matching string distance library for Scala and Java that includes Levenshtein distance, Jaro distance, Jaro-Winkler distance, Dice coefficient, N-Gram similarity, Cosine similarity, Jaccard similarity, Longest common subsequence, Hamming distance, and more.

Stars: ✭ 60 (+17.65%)

Mutual labels: levenshtein-distance

Cws

Source code for an ACL 2016 paper on Chinese word segmentation

Stars: ✭ 81 (+58.82%)

Mutual labels: word-segmentation

codeprep

A toolkit for pre-processing large source code corpora

Stars: ✭ 39 (-23.53%)

Mutual labels: word-segmentation

Youtokentome

Unsupervised text tokenizer focused on computational efficiency

Stars: ✭ 728 (+1327.45%)

Mutual labels: word-segmentation

deep-spell-checkr

Keras implementation of character-level sequence-to-sequence learning for spelling correction

Stars: ✭ 65 (+27.45%)

Mutual labels: spelling-correction

vscode-languagetool-linter

A from-scratch redesign of LanguageTool integration for VS Code.

Stars: ✭ 72 (+41.18%)

Mutual labels: spellchecker

ka GE.spell

Georgian Spell Checking Dictionary (Georgian orthographic dictionary)

Stars: ✭ 24 (-52.94%)

Mutual labels: spelling-correction

Pytorch-NLU

Pytorch-NLU, a Chinese text classification and sequence labeling toolkit. It supports multi-class and multi-label classification of long and short Chinese text, and sequence labeling tasks such as Chinese named entity recognition, part-of-speech tagging, and word segmentation.

Stars: ✭ 151 (+196.08%)

Mutual labels: word-segmentation

grammarify

Grammarify is an npm package that safely cleans up text that has misspellings, improper capitalization, and lexical illusions, among other things.

Stars: ✭ 43 (-15.69%)

Mutual labels: spelling-correction

