
65 open-source projects that are alternatives to, or similar to, codeprep

Han Segment
A Chinese word segmentation system based on a hidden Markov model and forward maximum matching
Stars: ✭ 17 (-56.41%)
Mutual labels:  word-segmentation
Nagisa
A Japanese tokenizer based on recurrent neural networks
Stars: ✭ 260 (+566.67%)
Mutual labels:  word-segmentation
objc-runtime-CN
Objective-C Runtime Analysis
Stars: ✭ 28 (-28.21%)
Mutual labels:  source-code-analysis
Symspell
SymSpell: 1 million times faster spelling correction & fuzzy search through the Symmetric Delete spelling correction algorithm
Stars: ✭ 1,976 (+4966.67%)
Mutual labels:  word-segmentation
customized-symspell
Java port of SymSpell: 1 million times faster through the Symmetric Delete spelling correction algorithm
Stars: ✭ 51 (+30.77%)
Mutual labels:  word-segmentation
word tokenize
Vietnamese Word Tokenize
Stars: ✭ 45 (+15.38%)
Mutual labels:  word-segmentation
Ekphrasis
Ekphrasis is a text processing tool geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags) and spell correction, using word statistics from two large corpora (English Wikipedia and 330 million English tweets).
Stars: ✭ 433 (+1010.26%)
Mutual labels:  word-segmentation
group-transformer
Official code for Group-Transformer (Scale down Transformer by Grouping Features for a Lightweight Character-level Language Model, COLING-2020).
Stars: ✭ 21 (-46.15%)
Mutual labels:  language-modeling
rakutenma-python
Rakuten MA (Python version)
Stars: ✭ 15 (-61.54%)
Mutual labels:  word-segmentation
LNEx
📍 🏢 🏦 🏣 🏪 🏬 LNEx: Location Name Extractor
Stars: ✭ 21 (-46.15%)
Mutual labels:  language-modeling
Lac
Baidu NLP: word segmentation, part-of-speech tagging, named entity recognition, and word importance
Stars: ✭ 2,792 (+7058.97%)
Mutual labels:  word-segmentation
MSR2021-ProgramRepair
Code for our paper "Applying CodeBERT for Automated Program Repair of Java Simple Bugs", accepted at MSR 2021.
Stars: ✭ 26 (-33.33%)
spell
Spelling correction and string segmentation written in Go
Stars: ✭ 24 (-38.46%)
Mutual labels:  word-segmentation
Toiro
A comparison tool of Japanese tokenizers
Stars: ✭ 95 (+143.59%)
Mutual labels:  word-segmentation
IndRNN pytorch
Independently Recurrent Neural Networks (IndRNN) implemented in PyTorch.
Stars: ✭ 112 (+187.18%)
Mutual labels:  language-modeling
Pythainlp
Thai Natural Language Processing in Python.
Stars: ✭ 582 (+1392.31%)
Mutual labels:  word-segmentation
SynThai
Thai Word Segmentation and Part-of-Speech Tagging with Deep Learning
Stars: ✭ 41 (+5.13%)
Mutual labels:  word-segmentation
Bert Multitask Learning
BERT for Multitask Learning
Stars: ✭ 380 (+874.36%)
Mutual labels:  word-segmentation
android-source-codes
⚙️ Code analysis of common Android projects and components.
Stars: ✭ 59 (+51.28%)
Mutual labels:  source-code-analysis
hashformers
Hashformers is a framework for hashtag segmentation with transformers.
Stars: ✭ 18 (-53.85%)
Mutual labels:  word-segmentation
Darts
Differentiable architecture search for convolutional and recurrent networks
Stars: ✭ 3,463 (+8779.49%)
Mutual labels:  language-modeling
UETsegmenter
A toolkit for Vietnamese word segmentation
Stars: ✭ 60 (+53.85%)
Mutual labels:  word-segmentation
WordSegmentationDP
Word Segmentation with Dynamic Programming
Stars: ✭ 18 (-53.85%)
Mutual labels:  word-segmentation
Pytorch-NLU
Pytorch-NLU is a Chinese text classification and sequence labeling toolkit. It supports multi-class and multi-label classification of long and short Chinese texts, as well as sequence labeling tasks such as Chinese named entity recognition, part-of-speech tagging, and word segmentation.
Stars: ✭ 151 (+287.18%)
Mutual labels:  word-segmentation
theano-recurrence
Recurrent Neural Networks (RNN, GRU, LSTM) and their Bidirectional versions (BiRNN, BiGRU, BiLSTM) for word & character level language modelling in Theano
Stars: ✭ 40 (+2.56%)
Mutual labels:  language-modeling
Monpa
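MONPA (罔拍) is a multi-task model that provides Traditional Chinese word segmentation, part-of-speech tagging, and named entity recognition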
Stars: ✭ 203 (+420.51%)
Mutual labels:  word-segmentation
spiral
A Python 3 module that provides functions for splitting identifiers found in source code files.
Stars: ✭ 37 (-5.13%)
tape-neurips2019
Tasks Assessing Protein Embeddings (TAPE), a set of five biologically relevant semi-supervised learning tasks spread across different domains of protein biology. (DEPRECATED)
Stars: ✭ 117 (+200%)
Mutual labels:  language-modeling
Pycantonese
Cantonese Linguistics and NLP in Python
Stars: ✭ 147 (+276.92%)
Mutual labels:  word-segmentation
JPlag
Detecting Software Plagiarism and Collusion since 1996.
Stars: ✭ 674 (+1628.21%)
Mutual labels:  source-code-analysis
Kiwi
Kiwi (Intelligent Korean Morphological Analyzer)
Stars: ✭ 107 (+174.36%)
Mutual labels:  word-segmentation
rust-code-analysis
Library to analyze and collect metrics on source code
Stars: ✭ 171 (+338.46%)
Mutual labels:  source-code-analysis
Cws
Source code for an ACL 2016 paper on Chinese word segmentation
Stars: ✭ 81 (+107.69%)
Mutual labels:  word-segmentation
pytorch Joint-Word-Segmentation-and-POS-Tagging
Paper: A Simple and Effective Neural Model for Joint Word Segmentation and POS Tagging
Stars: ✭ 37 (-5.13%)
Mutual labels:  word-segmentation
Youtokentome
Unsupervised text tokenizer focused on computational efficiency
Stars: ✭ 728 (+1766.67%)
Mutual labels:  word-segmentation
SZZUnleashed
An implementation of the SZZ algorithm, i.e., an approach to identify bug-introducing commits.
Stars: ✭ 90 (+130.77%)
Sentencepiece
Unsupervised text tokenizer for Neural Network-based text generation.
Stars: ✭ 5,540 (+14105.13%)
Mutual labels:  word-segmentation
skt
Sanskrit compound segmentation using seq2seq model
Stars: ✭ 21 (-46.15%)
Mutual labels:  word-segmentation
Symspellpy
Python port of SymSpell
Stars: ✭ 420 (+976.92%)
Mutual labels:  word-segmentation
get-source
Fetch source-mapped sources. Peek by file, line, column. Node & browsers. Sync & async.
Stars: ✭ 26 (-33.33%)
Mutual labels:  source-code-analysis
Vncorenlp
A Vietnamese natural language processing toolkit (NAACL 2018)
Stars: ✭ 354 (+807.69%)
Mutual labels:  word-segmentation
referit3d
Code accompanying our ECCV-2020 paper on 3D Neural Listeners.
Stars: ✭ 59 (+51.28%)
Mutual labels:  language-modeling
Jumanpp
Juman++ (a Morphological Analyzer Toolkit)
Stars: ✭ 254 (+551.28%)
Mutual labels:  word-segmentation
esapp
An unsupervised Chinese word segmentation tool.
Stars: ✭ 13 (-66.67%)
Mutual labels:  word-segmentation
cws-tensorflow
A TensorFlow-based Chinese word segmentation model
Stars: ✭ 25 (-35.9%)
Mutual labels:  word-segmentation
ckipnlp
CKIP CoreNLP Toolkits
Stars: ✭ 92 (+135.9%)
Mutual labels:  word-segmentation
youtokentome-ruby
High performance unsupervised text tokenization for Ruby
Stars: ✭ 17 (-56.41%)
Mutual labels:  word-segmentation
Babler
Data Collection System For NLP/Speech Recognition
Stars: ✭ 21 (-46.15%)
Mutual labels:  language-modeling
SymSpellCppPy
Fast SymSpell written in C++ and exposed to Python via pybind11
Stars: ✭ 28 (-28.21%)
Mutual labels:  word-segmentation
pytorch-translm
An implementation of transformer-based language model for sentence rewriting tasks such as summarization, simplification, and grammatical error correction.
Stars: ✭ 22 (-43.59%)
Mutual labels:  language-modeling
hanzi-tools
Converts from Chinese characters to pinyin, between simplified and traditional, and does word segmentation.
Stars: ✭ 69 (+76.92%)
Mutual labels:  word-segmentation
mozolm
MozoLM: A language model (LM) serving library
Stars: ✭ 32 (-17.95%)
Mutual labels:  language-modeling
dnn-lstm-word-segment
Chinese Word Segmentation Based on Deep Learning and LSTM Neural Networks
Stars: ✭ 24 (-38.46%)
Mutual labels:  word-segmentation
sentencepiece
R package for Byte Pair Encoding / Unigram modelling based on Sentencepiece
Stars: ✭ 22 (-43.59%)
Mutual labels:  word-segmentation
rnn darts fastai
Implement Differentiable Architecture Search (DARTS) for RNN with fastai
Stars: ✭ 21 (-46.15%)
Mutual labels:  language-modeling
deepblast
Neural Networks for Protein Sequence Alignment
Stars: ✭ 29 (-25.64%)
Mutual labels:  language-modeling
sylbreak
Syllable segmentation tool for Myanmar language (Burmese) by Ye.
Stars: ✭ 44 (+12.82%)
Mutual labels:  word-segmentation
sentencepiece-jni
Java JNI wrapper for SentencePiece: unsupervised text tokenizer for Neural Network-based text generation.
Stars: ✭ 26 (-33.33%)
Mutual labels:  word-segmentation
linguistics problems
Natural language processing in examples and games
Stars: ✭ 23 (-41.03%)
Mutual labels:  language-modeling
lingua-go
👄 The most accurate natural language detection library for Go, suitable for long and short text alike
Stars: ✭ 684 (+1653.85%)
Mutual labels:  language-modeling
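Many of the word-segmentation projects listed above (for example WordSegmentationDP, or the hashtag splitting done by Ekphrasis and hashformers) address the same core problem: splitting a string with no spaces into words. A common baseline is dynamic programming over a unigram language model, choosing the split with the highest total log-probability. The sketch below is a minimal illustration of that idea, not the implementation used by any of these projects; the word counts, the `segment` function and the unknown-word penalty are all hypothetical choices made for the example.

```python
import math

# Toy unigram counts (hypothetical data, for illustration only).
WORD_COUNTS = {
    "open": 50, "source": 40, "opensource": 5,
    "word": 60, "segmentation": 20, "is": 100, "fun": 30,
}
TOTAL = sum(WORD_COUNTS.values())
MAX_WORD_LEN = max(len(w) for w in WORD_COUNTS)


def word_log_prob(word: str) -> float:
    """Log-probability of a word under the unigram model.

    Unknown words get a penalty that grows with their length (an assumed
    heuristic), so long unknown chunks are discouraged.
    """
    if word in WORD_COUNTS:
        return math.log(WORD_COUNTS[word] / TOTAL)
    return math.log(1.0 / (TOTAL * 10 ** len(word)))


def segment(text: str) -> list[str]:
    """Return the most probable split of `text` using dynamic programming.

    best[i] is the best log-probability of any segmentation of text[:i];
    back[i] is the start index of the last word in that segmentation.
    """
    n = len(text)
    best = [0.0] + [-math.inf] * n
    back = [0] * (n + 1)
    for end in range(1, n + 1):
        # Only consider candidate words up to the longest dictionary entry.
        for start in range(max(0, end - MAX_WORD_LEN), end):
            score = best[start] + word_log_prob(text[start:end])
            if score > best[end]:
                best[end] = score
                back[end] = start
    # Follow the back-pointers from the end of the string to recover words.
    words = []
    end = n
    while end > 0:
        start = back[end]
        words.append(text[start:end])
        end = start
    return words[::-1]


print(segment("wordsegmentationisfun"))  # ['word', 'segmentation', 'is', 'fun']
print(segment("opensource"))             # ['open', 'source']
```

With these toy counts, "open" + "source" scores higher than the single rarer entry "opensource", so the DP prefers the two-word split; real tools derive such counts from large corpora.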
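Han Segment, near the top of this list, names forward maximum matching as one of its techniques. In its textbook form this is a greedy, dictionary-based method: scan left to right and, at each position, take the longest dictionary word that matches, falling back to a single character. The following is a minimal sketch of that textbook version under assumed inputs (the `forward_max_match` function and the small dictionaries are hypothetical), not Han Segment's own code.

```python
def forward_max_match(text: str, dictionary: set[str], max_len: int) -> list[str]:
    """Greedy left-to-right segmentation.

    At each position, try the longest candidate (up to max_len characters)
    first and shrink it until it is in the dictionary; a single character
    is always accepted so the scan never gets stuck.
    """
    words = []
    i = 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in dictionary:
                words.append(candidate)
                i += length
                break
    return words


print(forward_max_match("中文分词系统", {"中文", "分词", "系统", "中文分词"}, 4))
# ['中文分词', '系统']
print(forward_max_match("研究生命", {"研究", "研究生", "生命"}, 3))
# ['研究生', '命']
```

The second call shows the usual weakness of greedy matching: it commits to the longest match "研究生" (graduate student) and is left with a stray "命", where "研究" + "生命" (research + life) is the intended reading.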