437 open source projects that are alternatives to or similar to chinese-tokenizer

PaddleTokenizer
A Chinese word segmentation engine based on a deep neural network, implemented with PaddlePaddle
Stars: ✭ 14 (-80.56%)
Mutual labels:  tokenizer, chinese
Word2VecAndTsne
Scripts demo-ing how to train a Word2Vec model and reduce its vector space
Stars: ✭ 45 (-37.5%)
Mutual labels:  words
Tokenizers
Fast, Consistent Tokenization of Natural Language Text
Stars: ✭ 161 (+123.61%)
Mutual labels:  tokenizer
Japanesetokenizers
Aims to make JapaneseTokenizer as easy to use as possible
Stars: ✭ 120 (+66.67%)
Mutual labels:  tokenizer
greeb
Greeb is a simple Unicode-aware regexp-based tokenizer.
Stars: ✭ 16 (-77.78%)
Mutual labels:  tokenizer
date-extractor
Extract dates from text
Stars: ✭ 58 (-19.44%)
Mutual labels:  chinese
Works For Me
Collection of developer toolkits
Stars: ✭ 131 (+81.94%)
Mutual labels:  tokenizer
Text-Classification-LSTMs-PyTorch
This repository shows a baseline model for text classification, implemented as an LSTM-based model in PyTorch. To make the model easier to understand, it uses a Tweets dataset provided by Kaggle.
Stars: ✭ 45 (-37.5%)
Mutual labels:  tokenizer
ark-pixel-font
Open source Pan-CJK pixel font
Stars: ✭ 1,767 (+2354.17%)
Mutual labels:  chinese
Somajo
A tokenizer and sentence splitter for German and English web and social media texts.
Stars: ✭ 85 (+18.06%)
Mutual labels:  tokenizer
Cols Agent Tasks
Colin's ALM Corner Custom Build Tasks
Stars: ✭ 70 (-2.78%)
Mutual labels:  tokenizer
AiSpace
AiSpace: Better practices for deep learning model development and deployment For Tensorflow 2.0
Stars: ✭ 28 (-61.11%)
Mutual labels:  chinese
lexertk
C++ Lexer Toolkit Library (LexerTk) https://www.partow.net/programming/lexertk/index.html
Stars: ✭ 26 (-63.89%)
Mutual labels:  tokenizer
Js Tokens
Tiny JavaScript tokenizer.
Stars: ✭ 166 (+130.56%)
Mutual labels:  tokenizer
eslint-config-mingelz
A shared ESLint configuration with complete Chinese comments.
Stars: ✭ 15 (-79.17%)
Mutual labels:  chinese
Lex
Replaced by foonathan/lexy
Stars: ✭ 137 (+90.28%)
Mutual labels:  tokenizer
exhentai-tags-chinese-translation
Chinese translations of all E-Hentai/ExHentai tags
Stars: ✭ 273 (+279.17%)
Mutual labels:  chinese
Chevrotain
Parser Building Toolkit for JavaScript
Stars: ✭ 1,795 (+2393.06%)
Mutual labels:  tokenizer
ModernSecurityProtectionGuide
Modern Security Protection Guide
Stars: ✭ 72 (+0%)
Mutual labels:  chinese
Kadot
Kadot, the unsupervised natural language processing library.
Stars: ✭ 108 (+50%)
Mutual labels:  tokenizer
say-it
TTS in command line -- Pronounce the Chinese and English words you typed in.
Stars: ✭ 19 (-73.61%)
Mutual labels:  chinese
Hippo
PHP standards checker.
Stars: ✭ 82 (+13.89%)
Mutual labels:  tokenizer
next-qrcode
React hooks for generating QRCode for your next React apps.
Stars: ✭ 87 (+20.83%)
Mutual labels:  chinese
rime-wugniu zaonhe
Shanghai Wu romanization input scheme · Rime input schemas for Shanghai dialects
Stars: ✭ 20 (-72.22%)
Mutual labels:  chinese
String Calc
PHP calculator library for mathematical terms (expressions) passed as strings
Stars: ✭ 60 (-16.67%)
Mutual labels:  tokenizer
Greynir
The greynir.is natural language processing website for Icelandic
Stars: ✭ 47 (-34.72%)
Mutual labels:  tokenizer
MixPoet
Source code of MixPoet: Diverse Poetry Generation via Learning Controllable Mixed Latent Space (AAAI 2020)
Stars: ✭ 141 (+95.83%)
Mutual labels:  chinese
Email-newsletter-RSS
A collection of email 📧 newsletters, RSS feeds, and news
Stars: ✭ 1,225 (+1601.39%)
Mutual labels:  chinese
SIMCSE unsup
PyTorch implementation of unsupervised SimCSE for Chinese
Stars: ✭ 113 (+56.94%)
Mutual labels:  chinese
embedding study
A study of generating character embeddings with Chinese pretrained models, testing how BERT and ELMo perform on Chinese
Stars: ✭ 94 (+30.56%)
Mutual labels:  chinese
Bitextor
Bitextor generates translation memories from multilingual websites.
Stars: ✭ 168 (+133.33%)
Mutual labels:  tokenizer
grasp
Essential NLP & ML, short & fast pure Python code
Stars: ✭ 58 (-19.44%)
Mutual labels:  tokenizer
Query Translator
Query Translator is a search query translator with AST representation
Stars: ✭ 165 (+129.17%)
Mutual labels:  tokenizer
vocascan-frontend
A highly configurable vocabulary trainer
Stars: ✭ 26 (-63.89%)
Mutual labels:  words
Udpipe
R package for Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing Based on the UDPipe Natural Language Processing Toolkit
Stars: ✭ 160 (+122.22%)
Mutual labels:  tokenizer
tensorflow-chatbot-chinese
Web chatbot | TensorFlow implementation of a seq2seq model with Bahdanau attention and Word2Vec pretrained embeddings
Stars: ✭ 50 (-30.56%)
Mutual labels:  chinese
Tokenizer
Fast and customizable text tokenization library with BPE and SentencePiece support
Stars: ✭ 132 (+83.33%)
Mutual labels:  tokenizer
chinese-learner
A desktop web application for learning Mandarin Chinese and its character stroke order.
Stars: ✭ 22 (-69.44%)
Mutual labels:  chinese
Fugashi
A Cython MeCab wrapper for fast, pythonic Japanese tokenization and morphological analysis.
Stars: ✭ 125 (+73.61%)
Mutual labels:  tokenizer
discussion
Records issues and content related to Fanhuaji (繁化姬)
Stars: ✭ 33 (-54.17%)
Mutual labels:  chinese
Syntok
Text tokenization and sentence segmentation (segtok v2)
Stars: ✭ 123 (+70.83%)
Mutual labels:  tokenizer
dialectID siam
Dialect identification using Siamese network
Stars: ✭ 15 (-79.17%)
Mutual labels:  words
Tokenizer
Source code tokenizer
Stars: ✭ 119 (+65.28%)
Mutual labels:  tokenizer
anki-maobi
máobĭ (毛笔) is an Anki add-on to create cards with writing quizzes for Hanzi (Chinese characters)
Stars: ✭ 42 (-41.67%)
Mutual labels:  chinese
Megamark
😻 Markdown with easy tokenization, a fast highlighter, and a lean HTML sanitizer
Stars: ✭ 100 (+38.89%)
Mutual labels:  tokenizer
NLPDataAugmentation
Chinese NLP Data Augmentation, BERT Contextual Augmentation
Stars: ✭ 94 (+30.56%)
Mutual labels:  chinese
Djurl
Simple yet helpful library for writing Django URLs in an easy, short and intuitive way.
Stars: ✭ 85 (+18.06%)
Mutual labels:  tokenizer
Roy VnTokenizer
Vietnamese tokenizer (Maximum Matching and CRF)
Stars: ✭ 49 (-31.94%)
Mutual labels:  tokenizer
Sentence Splitter
Text to sentence splitter using heuristic algorithm by Philipp Koehn and Josh Schroeder.
Stars: ✭ 82 (+13.89%)
Mutual labels:  tokenizer
suika
Suika 🍉 is a Japanese morphological analyzer written in pure Ruby
Stars: ✭ 31 (-56.94%)
Mutual labels:  tokenizer
Wirb
Ruby Object Inspection for IRB
Stars: ✭ 69 (-4.17%)
Mutual labels:  tokenizer
sinling
A collection of NLP tools for Sinhalese (සිංහල).
Stars: ✭ 38 (-47.22%)
Mutual labels:  tokenizer
Thot
Thot toolkit for statistical machine translation
Stars: ✭ 53 (-26.39%)
Mutual labels:  tokenizer
Tokenizer
A tokenizer for Icelandic text
Stars: ✭ 27 (-62.5%)
Mutual labels:  tokenizer
Py Nltools
A collection of basic python modules for spoken natural language processing
Stars: ✭ 46 (-36.11%)
Mutual labels:  tokenizer
Robot Arm Write Chinese
Writing Chinese brush calligraphy with the uArm Swift Pro robotic arm
Stars: ✭ 57 (-20.83%)
Mutual labels:  chinese
Vanhiupun.github.io
🏖️ Vanhiupun's Awesome Site ==> another theme for elegant writers with modern flat style and beautiful night/dark mode.
Stars: ✭ 57 (-20.83%)
Mutual labels:  chinese
ime.vim
A Vim input method engine
Stars: ✭ 74 (+2.78%)
Mutual labels:  chinese
word2vec-movies
Bag of Words Meets Bags of Popcorn in Python 3, a Chinese-language tutorial
Stars: ✭ 54 (-25%)
Mutual labels:  chinese
Sublime-Fanhuaji
Sublime Text plugin for Fanhuaji (繁化姬)
Stars: ✭ 48 (-33.33%)
Mutual labels:  chinese