437 open source projects that are alternatives to or similar to chinese-tokenizer

Email-newsletter-RSS
Email 📧 newsletter RSS roundup News
Stars: ✭ 1,225 (+1601.39%)
Mutual labels:  chinese
Lisp Esque Language
💠The Lel programming language
Stars: ✭ 24 (-66.67%)
Mutual labels:  tokenizer
SIMCSE unsup
PyTorch implementation of unsupervised SimCSE for Chinese
Stars: ✭ 113 (+56.94%)
Mutual labels:  chinese
Natasha
Solves basic Russian NLP tasks, API for lower level Natasha projects
Stars: ✭ 788 (+994.44%)
Mutual labels:  tokenizer
embedding study
Learning character embeddings generated by Chinese pretrained models; tests how BERT and ELMo perform on Chinese
Stars: ✭ 94 (+30.56%)
Mutual labels:  chinese
Soynlp
A Python library for Korean natural language processing. Provides word extraction, tokenization, part-of-speech tagging, and preprocessing.
Stars: ✭ 613 (+751.39%)
Mutual labels:  tokenizer
Bitextor
Bitextor generates translation memories from multilingual websites.
Stars: ✭ 168 (+133.33%)
Mutual labels:  tokenizer
Tokenizer
A small library for converting tokenized PHP source code into XML (and potentially other formats)
Stars: ✭ 4,770 (+6525%)
Mutual labels:  tokenizer
grasp
Essential NLP & ML, short & fast pure Python code
Stars: ✭ 58 (-19.44%)
Mutual labels:  tokenizer
Smoothnlp
An NLP Toolset With A Focus on Explainable Inference
Stars: ✭ 435 (+504.17%)
Mutual labels:  tokenizer
Query Translator
Query Translator is a search query translator with AST representation
Stars: ✭ 165 (+129.17%)
Mutual labels:  tokenizer
Moo
Optimised tokenizer/lexer generator! 🐄 Uses /y for performance. Moo.
Stars: ✭ 434 (+502.78%)
Mutual labels:  tokenizer
vocascan-frontend
A highly configurable vocabulary trainer
Stars: ✭ 26 (-63.89%)
Mutual labels:  words
Jflex
The fast scanner generator for Java™ with full Unicode support
Stars: ✭ 380 (+427.78%)
Mutual labels:  tokenizer
Udpipe
R package for Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing Based on the UDPipe Natural Language Processing Toolkit
Stars: ✭ 160 (+122.22%)
Mutual labels:  tokenizer
Friso
High-performance Chinese tokenizer with both GBK and UTF-8 charset support, based on the MMSEG algorithm and written in ANSI C. Fully modular and easily embedded in other programs such as MySQL, PostgreSQL, and PHP.
Stars: ✭ 313 (+334.72%)
Mutual labels:  tokenizer
tensorflow-chatbot-chinese
Web chatbot | TensorFlow implementation of a seq2seq model with Bahdanau attention and Word2Vec pretrained embeddings
Stars: ✭ 50 (-30.56%)
Mutual labels:  chinese
Sacremoses
Python port of Moses tokenizer, truecaser and normalizer
Stars: ✭ 293 (+306.94%)
Mutual labels:  tokenizer
Tokenizer
Fast and customizable text tokenization library with BPE and SentencePiece support
Stars: ✭ 132 (+83.33%)
Mutual labels:  tokenizer
pascal-interpreter
A simple interpreter for a large subset of the Pascal language, written for educational purposes
Stars: ✭ 21 (-70.83%)
Mutual labels:  tokenizer
chinese-learner
A desktop web application for learning Mandarin Chinese and its character stroke order.
Stars: ✭ 22 (-69.44%)
Mutual labels:  chinese
Hebrew-Tokenizer
A very simple Python tokenizer for Hebrew text.
Stars: ✭ 16 (-77.78%)
Mutual labels:  tokenizer
Fugashi
A Cython MeCab wrapper for fast, pythonic Japanese tokenization and morphological analysis.
Stars: ✭ 125 (+73.61%)
Mutual labels:  tokenizer
Sharpmath
A small .NET math library.
Stars: ✭ 36 (-50%)
Mutual labels:  tokenizer
discussion
Records issues and content related to 繁化姬 (Fanhuaji)
Stars: ✭ 33 (-54.17%)
Mutual labels:  chinese
bredon
A modern CSS value compiler in JavaScript
Stars: ✭ 39 (-45.83%)
Mutual labels:  tokenizer
Syntok
Text tokenization and sentence segmentation (segtok v2)
Stars: ✭ 123 (+70.83%)
Mutual labels:  tokenizer
mystem-scala
Morphological analyzer `mystem` (Russian language) wrapper for JVM languages
Stars: ✭ 21 (-70.83%)
Mutual labels:  tokenizer
dialectID siam
Dialect identification using Siamese network
Stars: ✭ 15 (-79.17%)
Mutual labels:  words
ilmulti
Tooling to play around with multilingual machine translation for Indian Languages.
Stars: ✭ 19 (-73.61%)
Mutual labels:  tokenizer
Tokenizer
Source code tokenizer
Stars: ✭ 119 (+65.28%)
Mutual labels:  tokenizer
liblex
C library for Lexical Analysis
Stars: ✭ 25 (-65.28%)
Mutual labels:  tokenizer
anki-maobi
máobĭ (毛笔) is an Anki add-on to create cards with writing quizzes for Hanzi (Chinese characters)
Stars: ✭ 42 (-41.67%)
Mutual labels:  chinese
berserker
Berserker - BERt chineSE woRd toKenizER
Stars: ✭ 17 (-76.39%)
Mutual labels:  tokenizer
Megamark
😻 Markdown with easy tokenization, a fast highlighter, and a lean HTML sanitizer
Stars: ✭ 100 (+38.89%)
Mutual labels:  tokenizer
elasticsearch-plugins
Some native scoring script plugins for elasticsearch
Stars: ✭ 30 (-58.33%)
Mutual labels:  tokenizer
NLPDataAugmentation
Chinese NLP Data Augmentation, BERT Contextual Augmentation
Stars: ✭ 94 (+30.56%)
Mutual labels:  chinese
farasapy
A Python implementation of Farasa toolkit
Stars: ✭ 69 (-4.17%)
Mutual labels:  tokenizer
Djurl
Simple yet helpful library for writing Django URLs in an easy, short, and intuitive way.
Stars: ✭ 85 (+18.06%)
Mutual labels:  tokenizer
psr2r-sniffer
A PSR-2-R code sniffer and code-style auto-correction-tool - including many useful additions
Stars: ✭ 32 (-55.56%)
Mutual labels:  tokenizer
Roy VnTokenizer
Vietnamese tokenizer (Maximum Matching and CRF)
Stars: ✭ 49 (-31.94%)
Mutual labels:  tokenizer
hunspell
High-Performance Stemmer, Tokenizer, and Spell Checker for R
Stars: ✭ 101 (+40.28%)
Mutual labels:  tokenizer
Sentence Splitter
Text-to-sentence splitter using a heuristic algorithm by Philipp Koehn and Josh Schroeder.
Stars: ✭ 82 (+13.89%)
Mutual labels:  tokenizer
lindera
A morphological analysis library.
Stars: ✭ 226 (+213.89%)
Mutual labels:  tokenizer
suika
Suika 🍉 is a Japanese morphological analyzer written in pure Ruby
Stars: ✭ 31 (-56.94%)
Mutual labels:  tokenizer
gd-tokenizer
A small Godot project with a tokenizer written in GDScript.
Stars: ✭ 34 (-52.78%)
Mutual labels:  tokenizer
Wirb
Ruby Object Inspection for IRB
Stars: ✭ 69 (-4.17%)
Mutual labels:  tokenizer
xontrib-output-search
Get identifiers, paths, URLs and words from the previous command output and use them for the next command in xonsh shell.
Stars: ✭ 26 (-63.89%)
Mutual labels:  tokenizer
sinling
A collection of NLP tools for Sinhalese (සිංහල).
Stars: ✭ 38 (-47.22%)
Mutual labels:  tokenizer
Shan Shui Inf
Procedurally generated Chinese landscape painting.
Stars: ✭ 3,168 (+4300%)
Mutual labels:  chinese
Thot
Thot toolkit for statistical machine translation
Stars: ✭ 53 (-26.39%)
Mutual labels:  tokenizer
Chinese text normalization
Chinese text normalization for speech processing
Stars: ✭ 242 (+236.11%)
Mutual labels:  chinese
Tokenizer
A tokenizer for Icelandic text
Stars: ✭ 27 (-62.5%)
Mutual labels:  tokenizer
Py Nltools
A collection of basic Python modules for spoken natural language processing
Stars: ✭ 46 (-36.11%)
Mutual labels:  tokenizer
Vanhiupun.github.io
🏖️ Vanhiupun's Awesome Site ==> another theme for elegant writers with modern flat style and beautiful night/dark mode.
Stars: ✭ 57 (-20.83%)
Mutual labels:  chinese
ime.vim
A Vim input method engine
Stars: ✭ 74 (+2.78%)
Mutual labels:  chinese
word2vec-movies
Bag of Words Meets Bags of Popcorn in Python 3, Chinese tutorial
Stars: ✭ 54 (-25%)
Mutual labels:  chinese
Sublime-Fanhuaji
Sublime Text plugin for 繁化姬 (Fanhuaji)
Stars: ✭ 48 (-33.33%)
Mutual labels:  chinese
hanzi-pinyin-font
Chinese font displaying Hanzi (汉字) characters with their transliteration/pronunciation (Pīnyīn).
Stars: ✭ 79 (+9.72%)
Mutual labels:  chinese
Nlp Js Tools French
POS tagger, lemmatizer, and stemmer for the French language in JavaScript
Stars: ✭ 32 (-55.56%)
Mutual labels:  tokenizer