Han Segment: A Chinese word segmentation system based on a hidden Markov model and forward maximum matching
Stars: ✭ 17 (-56.41%)
Nagisa: A Japanese tokenizer based on recurrent neural networks
Stars: ✭ 260 (+566.67%)
objc-runtime-CN: Objective-C Runtime Analysis
Stars: ✭ 28 (-28.21%)
Symspell: SymSpell: 1 million times faster spelling correction & fuzzy search through the Symmetric Delete spelling correction algorithm
Stars: ✭ 1,976 (+4966.67%)
customized-symspell: Java port of SymSpell: 1 million times faster through the Symmetric Delete spelling correction algorithm
Stars: ✭ 51 (+30.77%)
Ekphrasis: Ekphrasis is a text processing tool geared towards text from social networks such as Twitter or Facebook. It performs tokenization, word normalization, word segmentation (for splitting hashtags), and spell correction, using word statistics from two large corpora (English Wikipedia and Twitter, 330 million English tweets).
Stars: ✭ 433 (+1010.26%)
group-transformer: Official code for Group-Transformer (Scale down Transformer by Grouping Features for a Lightweight Character-level Language Model, COLING-2020).
Stars: ✭ 21 (-46.15%)
LNEx: 📍 🏢 🏦 🏣 🏪 🏬 LNEx: Location Name Extractor
Stars: ✭ 21 (-46.15%)
Lac: Baidu NLP: word segmentation, part-of-speech tagging, named entity recognition, and word importance
Stars: ✭ 2,792 (+7058.97%)
MSR2021-ProgramRepair: Code for our paper "Applying CodeBERT for Automated Program Repair of Java Simple Bugs", accepted to MSR 2021.
Stars: ✭ 26 (-33.33%)
spell: Spelling correction and string segmentation written in Go
Stars: ✭ 24 (-38.46%)
Toiro: A comparison tool for Japanese tokenizers
Stars: ✭ 95 (+143.59%)
IndRNN pytorch: Independently Recurrent Neural Networks (IndRNN) implemented in PyTorch.
Stars: ✭ 112 (+187.18%)
Pythainlp: Thai Natural Language Processing in Python.
Stars: ✭ 582 (+1392.31%)
SynThai: Thai Word Segmentation and Part-of-Speech Tagging with Deep Learning
Stars: ✭ 41 (+5.13%)
android-source-codes: ⚙️ Code analysis of common Android projects and components.
Stars: ✭ 59 (+51.28%)
hashformers: Hashformers is a framework for hashtag segmentation with transformers.
Stars: ✭ 18 (-53.85%)
Darts: Differentiable architecture search for convolutional and recurrent networks
Stars: ✭ 3,463 (+8779.49%)
UETsegmenter: A toolkit for Vietnamese word segmentation
Stars: ✭ 60 (+53.85%)
Pytorch-NLU: A Chinese text classification and sequence labeling toolkit. It supports multi-class and multi-label classification of long and short Chinese texts, as well as sequence labeling tasks such as Chinese named entity recognition, part-of-speech tagging, and word segmentation.
Stars: ✭ 151 (+287.18%)
theano-recurrence: Recurrent Neural Networks (RNN, GRU, LSTM) and their Bidirectional versions (BiRNN, BiGRU, BiLSTM) for word & character level language modelling in Theano
Stars: ✭ 40 (+2.56%)
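Monpa: MONPA (罔拍) is a multi-task model that provides Traditional Chinese word segmentation, part-of-speech tagging, and named entity recognition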
Stars: ✭ 203 (+420.51%)
spiral: A Python 3 module that provides functions for splitting identifiers found in source code files.
Stars: ✭ 37 (-5.13%)
tape-neurips2019: Tasks Assessing Protein Embeddings (TAPE), a set of five biologically relevant semi-supervised learning tasks spread across different domains of protein biology. (DEPRECATED)
Stars: ✭ 117 (+200%)
Pycantonese: Cantonese Linguistics and NLP in Python
Stars: ✭ 147 (+276.92%)
JPlag: Detecting Software Plagiarism and Collusion since 1996.
Stars: ✭ 674 (+1628.21%)
Kiwi: Kiwi (an intelligent Korean morphological analyzer)
Stars: ✭ 107 (+174.36%)
rust-code-analysis: Library to analyze and collect metrics on source code
Stars: ✭ 171 (+338.46%)
Cws: Source code for an ACL 2016 paper on Chinese word segmentation
Stars: ✭ 81 (+107.69%)
Youtokentome: Unsupervised text tokenizer focused on computational efficiency
Stars: ✭ 728 (+1766.67%)
SZZUnleashed: An implementation of the SZZ algorithm, i.e., an approach to identify bug-introducing commits.
Stars: ✭ 90 (+130.77%)
Sentencepiece: Unsupervised text tokenizer for Neural Network-based text generation.
Stars: ✭ 5,540 (+14105.13%)
skt: Sanskrit compound segmentation using a seq2seq model
Stars: ✭ 21 (-46.15%)
Symspellpy: Python port of SymSpell
Stars: ✭ 420 (+976.92%)
get-source: Fetch source-mapped sources. Peek by file, line, column. Node & browsers. Sync & async.
Stars: ✭ 26 (-33.33%)
Vncorenlp: A Vietnamese natural language processing toolkit (NAACL 2018)
Stars: ✭ 354 (+807.69%)
referit3d: Code accompanying our ECCV-2020 paper on 3D Neural Listeners.
Stars: ✭ 59 (+51.28%)
Jumanpp: Juman++ (a Morphological Analyzer Toolkit)
Stars: ✭ 254 (+551.28%)
esapp: An unsupervised Chinese word segmentation tool.
Stars: ✭ 13 (-66.67%)
ckipnlp: CKIP CoreNLP Toolkits
Stars: ✭ 92 (+135.9%)
youtokentome-ruby: High performance unsupervised text tokenization for Ruby
Stars: ✭ 17 (-56.41%)
Babler: Data Collection System For NLP/Speech Recognition
Stars: ✭ 21 (-46.15%)
SymSpellCppPy: Fast SymSpell written in C++ and exposed to Python via pybind11
Stars: ✭ 28 (-28.21%)
pytorch-translm: An implementation of a transformer-based language model for sentence rewriting tasks such as summarization, simplification, and grammatical error correction.
Stars: ✭ 22 (-43.59%)
hanzi-tools: Converts from Chinese characters to pinyin, between simplified and traditional, and does word segmentation.
Stars: ✭ 69 (+76.92%)
mozolm: MozoLM: A language model (LM) serving library
Stars: ✭ 32 (-17.95%)
dnn-lstm-word-segment: Chinese word segmentation based on deep learning and an LSTM neural network
Stars: ✭ 24 (-38.46%)
sentencepiece: R package for Byte Pair Encoding / Unigram modelling based on Sentencepiece
Stars: ✭ 22 (-43.59%)
rnn darts fastai: Implement Differentiable Architecture Search (DARTS) for RNN with fastai
Stars: ✭ 21 (-46.15%)
deepblast: Neural Networks for Protein Sequence Alignment
Stars: ✭ 29 (-25.64%)
sylbreak: Syllable segmentation tool for Myanmar language (Burmese) by Ye.
Stars: ✭ 44 (+12.82%)
sentencepiece-jni: Java JNI wrapper for SentencePiece: unsupervised text tokenizer for Neural Network-based text generation.
Stars: ✭ 26 (-33.33%)
lingua-go: 👄 The most accurate natural language detection library for Go, suitable for long and short text alike
Stars: ✭ 684 (+1653.85%)