Natasha
Solves basic Russian NLP tasks; provides an API for lower-level Natasha projects
Stars: ✭ 788 (+994.44%)
Soynlp
A Python library for Korean natural language processing. Provides word extraction, tokenization, part-of-speech tagging, and preprocessing.
Stars: ✭ 613 (+751.39%)
Bitextor
Bitextor generates translation memories from multilingual websites.
Stars: ✭ 168 (+133.33%)
Tokenizer
A small library for converting tokenized PHP source code into XML (and potentially other formats)
Stars: ✭ 4,770 (+6525%)
grasp
Essential NLP & ML, short & fast pure Python code
Stars: ✭ 58 (-19.44%)
Smoothnlp
An NLP toolset focused on explainable NLP techniques and explainable inference
Stars: ✭ 435 (+504.17%)
Query Translator
Query Translator is a search query translator with AST representation
Stars: ✭ 165 (+129.17%)
Moo
Optimised tokenizer/lexer generator! 🐄 Uses the /y (sticky) regex flag for performance. Moo.
Stars: ✭ 434 (+502.78%)
Jflex
The fast scanner generator for Java™ with full Unicode support
Stars: ✭ 380 (+427.78%)
Udpipe
R package for tokenization, part-of-speech tagging, lemmatization, and dependency parsing, based on the UDPipe natural language processing toolkit
Stars: ✭ 160 (+122.22%)
Friso
A high-performance Chinese tokenizer written in ANSI C and based on the MMSEG algorithm, supporting both the GBK and UTF-8 charsets. Its fully modular implementation makes it easy to embed in other programs such as MySQL, PostgreSQL, and PHP.
Stars: ✭ 313 (+334.72%)
tensorflow-chatbot-chinese
Web chatbot | TensorFlow implementation of a seq2seq model with Bahdanau attention and pretrained Word2Vec embeddings
Stars: ✭ 50 (-30.56%)
Sacremoses
Python port of Moses tokenizer, truecaser and normalizer
Stars: ✭ 293 (+306.94%)
Tokenizer
Fast and customizable text tokenization library with BPE and SentencePiece support
Stars: ✭ 132 (+83.33%)
pascal-interpreter
A simple interpreter for a large subset of the Pascal language, written for educational purposes
Stars: ✭ 21 (-70.83%)
chinese-learner
A desktop web application for learning Mandarin Chinese and its character stroke order.
Stars: ✭ 22 (-69.44%)
Hebrew-Tokenizer
A very simple Python tokenizer for Hebrew text.
Stars: ✭ 16 (-77.78%)
Fugashi
A Cython MeCab wrapper for fast, pythonic Japanese tokenization and morphological analysis.
Stars: ✭ 125 (+73.61%)
Sharpmath
A small .NET math library.
Stars: ✭ 36 (-50%)
bredon
A modern CSS value compiler in JavaScript
Stars: ✭ 39 (-45.83%)
Syntok
Text tokenization and sentence segmentation (segtok v2)
Stars: ✭ 123 (+70.83%)
mystem-scala
A wrapper for `mystem`, the Russian-language morphological analyzer, for JVM languages
Stars: ✭ 21 (-70.83%)
dialectID siam
Dialect identification using a Siamese network
Stars: ✭ 15 (-79.17%)
ilmulti
Tooling to play around with multilingual machine translation for Indian Languages.
Stars: ✭ 19 (-73.61%)
Tokenizer
Source code tokenizer
Stars: ✭ 119 (+65.28%)
liblex
C library for lexical analysis
Stars: ✭ 25 (-65.28%)
anki-maobi
máobĭ (毛笔) is an Anki add-on to create cards with writing quizzes for Hanzi (Chinese characters)
Stars: ✭ 42 (-41.67%)
berserker
Berserker - BERt chineSE woRd toKenizER
Stars: ✭ 17 (-76.39%)
Megamark
😻 Markdown with easy tokenization, a fast highlighter, and a lean HTML sanitizer
Stars: ✭ 100 (+38.89%)
NLPDataAugmentation
Chinese NLP Data Augmentation, BERT Contextual Augmentation
Stars: ✭ 94 (+30.56%)
farasapy
A Python implementation of the Farasa toolkit
Stars: ✭ 69 (-4.17%)
Djurl
A simple yet helpful library for writing Django URLs in an easy, short, and intuitive way.
Stars: ✭ 85 (+18.06%)
psr2r-sniffer
A PSR-2-R code sniffer and code-style auto-correction tool, including many useful additions
Stars: ✭ 32 (-55.56%)
Roy VnTokenizer
Vietnamese tokenizer (Maximum Matching and CRF)
Stars: ✭ 49 (-31.94%)
hunspell
High-Performance Stemmer, Tokenizer, and Spell Checker for R
Stars: ✭ 101 (+40.28%)
Sentence Splitter
Text to sentence splitter using heuristic algorithm by Philipp Koehn and Josh Schroeder.
Stars: ✭ 82 (+13.89%)
lindera
A morphological analysis library.
Stars: ✭ 226 (+213.89%)
suika
Suika 🍉 is a Japanese morphological analyzer written in pure Ruby
Stars: ✭ 31 (-56.94%)
gd-tokenizer
A small Godot project with a tokenizer written in GDScript.
Stars: ✭ 34 (-52.78%)
Wirb
Ruby Object Inspection for IRB
Stars: ✭ 69 (-4.17%)
xontrib-output-search
Get identifiers, paths, URLs and words from the previous command output and use them for the next command in xonsh shell.
Stars: ✭ 26 (-63.89%)
sinling
A collection of NLP tools for Sinhalese (සිංහල).
Stars: ✭ 38 (-47.22%)
Shan Shui Inf
Procedurally generated Chinese landscape painting.
Stars: ✭ 3,168 (+4300%)
Thot
Thot toolkit for statistical machine translation
Stars: ✭ 53 (-26.39%)
Tokenizer
A tokenizer for Icelandic text
Stars: ✭ 27 (-62.5%)
Py Nltools
A collection of basic Python modules for spoken natural language processing
Stars: ✭ 46 (-36.11%)
Vanhiupun.github.io
🏖️ Vanhiupun's Awesome Site: another theme for elegant writers, with a modern flat style and a beautiful night/dark mode.
Stars: ✭ 57 (-20.83%)
ime.vim
A Vim input method engine
Stars: ✭ 74 (+2.78%)
word2vec-movies
Bag of Words Meets Bags of Popcorn in Python 3 (Chinese-language tutorial)
Stars: ✭ 54 (-25%)
hanzi-pinyin-font
A Chinese font that displays Hanzi (汉字) characters together with their transliteration/pronunciation (Pīnyīn).
Stars: ✭ 79 (+9.72%)
Nlp Js Tools French
POS tagger, lemmatizer, and stemmer for the French language in JavaScript
Stars: ✭ 32 (-55.56%)