PaddleTokenizer: A DNN-based Chinese word segmentation engine (tokenizer) implemented with PaddlePaddle
Stars: ✭ 14 (-80.56%)
Word2VecAndTsne: Scripts demonstrating how to train a Word2Vec model and reduce its vector space
Stars: ✭ 45 (-37.5%)
Tokenizers: Fast, Consistent Tokenization of Natural Language Text
Stars: ✭ 161 (+123.61%)
greeb: Greeb is a simple Unicode-aware regexp-based tokenizer.
Stars: ✭ 16 (-77.78%)
Works For Me: Collection of developer toolkits
Stars: ✭ 131 (+81.94%)
Text-Classification-LSTMs-PyTorch: The aim of this repository is to show a baseline model for text classification by implementing an LSTM-based model in PyTorch. To give a better understanding of the model, it uses a Tweets dataset provided by Kaggle.
Stars: ✭ 45 (-37.5%)
ark-pixel-font: Open source Pan-CJK pixel font
Stars: ✭ 1,767 (+2354.17%)
Somajo: A tokenizer and sentence splitter for German and English web and social media texts.
Stars: ✭ 85 (+18.06%)
AiSpace: Better practices for deep learning model development and deployment for TensorFlow 2.0
Stars: ✭ 28 (-61.11%)
lexertk: C++ Lexer Toolkit Library (LexerTk) https://www.partow.net/programming/lexertk/index.html
Stars: ✭ 26 (-63.89%)
Js Tokens: Tiny JavaScript tokenizer.
Stars: ✭ 166 (+130.56%)
eslint-config-mingelz: A shared ESLint configuration with complete Chinese comments.
Stars: ✭ 15 (-79.17%)
Lex: Replaced by foonathan/lexy
Stars: ✭ 137 (+90.28%)
Chevrotain: Parser Building Toolkit for JavaScript
Stars: ✭ 1,795 (+2393.06%)
Kadot: Kadot, the unsupervised natural language processing library.
Stars: ✭ 108 (+50%)
say-it: TTS in the command line. Pronounces the Chinese and English words you type in.
Stars: ✭ 19 (-73.61%)
Hippo: PHP standards checker.
Stars: ✭ 82 (+13.89%)
next-qrcode: React hooks for generating QR codes for your next React apps.
Stars: ✭ 87 (+20.83%)
rime-wugniu zaonhe: Shanghai Wu pinyin input scheme · Rime input schemas for Shanghai Dialects
Stars: ✭ 20 (-72.22%)
String Calc: PHP calculator library for mathematical terms (expressions) passed as strings
Stars: ✭ 60 (-16.67%)
Greynir: The greynir.is natural language processing website for Icelandic
Stars: ✭ 47 (-34.72%)
MixPoet: Source code of MixPoet: Diverse Poetry Generation via Learning Controllable Mixed Latent Space (AAAI 2020)
Stars: ✭ 141 (+95.83%)
Bitextor: Bitextor generates translation memories from multilingual websites.
Stars: ✭ 168 (+133.33%)
grasp: Essential NLP & ML, short & fast pure Python code
Stars: ✭ 58 (-19.44%)
Query Translator: Query Translator is a search query translator with AST representation
Stars: ✭ 165 (+129.17%)
Udpipe: R package for Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing Based on the UDPipe Natural Language Processing Toolkit
Stars: ✭ 160 (+122.22%)
tensorflow-chatbot-chinese: Web chatbot | TensorFlow implementation of a seq2seq model with Bahdanau attention and Word2Vec pretrained embeddings
Stars: ✭ 50 (-30.56%)
Tokenizer: Fast and customizable text tokenization library with BPE and SentencePiece support
Stars: ✭ 132 (+83.33%)
chinese-learner: A desktop web application for learning Mandarin Chinese and its character stroke order.
Stars: ✭ 22 (-69.44%)
Fugashi: A Cython MeCab wrapper for fast, pythonic Japanese tokenization and morphological analysis.
Stars: ✭ 125 (+73.61%)
Syntok: Text tokenization and sentence segmentation (segtok v2)
Stars: ✭ 123 (+70.83%)
dialectID siam: Dialect identification using a Siamese network
Stars: ✭ 15 (-79.17%)
Tokenizer: Source code tokenizer
Stars: ✭ 119 (+65.28%)
anki-maobi: máobĭ (毛笔) is an Anki add-on to create cards with writing quizzes for Hanzi (Chinese characters)
Stars: ✭ 42 (-41.67%)
Megamark: 😻 Markdown with easy tokenization, a fast highlighter, and a lean HTML sanitizer
Stars: ✭ 100 (+38.89%)
NLPDataAugmentation: Chinese NLP Data Augmentation, BERT Contextual Augmentation
Stars: ✭ 94 (+30.56%)
Djurl: Simple yet helpful library for writing Django URLs in an easy, short, and intuitive way.
Stars: ✭ 85 (+18.06%)
Roy VnTokenizer: Vietnamese tokenizer (Maximum Matching and CRF)
Stars: ✭ 49 (-31.94%)
Sentence Splitter: Text-to-sentence splitter using a heuristic algorithm by Philipp Koehn and Josh Schroeder.
Stars: ✭ 82 (+13.89%)
suika: Suika 🍉 is a Japanese morphological analyzer written in pure Ruby
Stars: ✭ 31 (-56.94%)
Wirb: Ruby Object Inspection for IRB
Stars: ✭ 69 (-4.17%)
sinling: A collection of NLP tools for Sinhalese (සිංහල).
Stars: ✭ 38 (-47.22%)
Thot: Thot toolkit for statistical machine translation
Stars: ✭ 53 (-26.39%)
Tokenizer: A tokenizer for Icelandic text
Stars: ✭ 27 (-62.5%)
Py Nltools: A collection of basic Python modules for spoken natural language processing
Stars: ✭ 46 (-36.11%)
Vanhiupun.github.io: 🏖️ Vanhiupun's Awesome Site ==> another theme for elegant writers with modern flat style and beautiful night/dark mode.
Stars: ✭ 57 (-20.83%)
ime.vim: A Vim input method engine
Stars: ✭ 74 (+2.78%)
word2vec-movies: Bag of Words Meets Bags of Popcorn in Python 3 (Chinese tutorial)
Stars: ✭ 54 (-25%)