Kagome: Self-contained Japanese Morphological Analyzer written in pure Go
Stars: ✭ 554 (+361.67%)
Lfuzzer: Fuzzing Parsers with Tokens
Stars: ✭ 28 (-76.67%)
Moo: Optimised tokenizer/lexer generator! 🐄 Uses /y for performance. Moo.
Stars: ✭ 434 (+261.67%)
pascal-interpreter: A simple interpreter for a large subset of the Pascal language, written for educational purposes
Stars: ✭ 21 (-82.5%)
Yomichan: Japanese pop-up dictionary extension for Chrome and Firefox.
Stars: ✭ 464 (+286.67%)
Talismane: NLP framework with a sentence detector, tokeniser, POS tagger and dependency parser
Stars: ✭ 38 (-68.33%)
Friso: High-performance Chinese tokenizer with GBK and UTF-8 charset support, based on the MMSEG algorithm and developed in ANSI C. Fully modular and easily embedded in other programs such as MySQL, PostgreSQL, PHP, etc.
Stars: ✭ 313 (+160.83%)
Sentence Splitter: Text-to-sentence splitter using the heuristic algorithm by Philipp Koehn and Josh Schroeder.
Stars: ✭ 82 (-31.67%)
Snl Compiler: SNL (Small Nested Language) compiler. Maven, JUnit, tokenizer, lexer, syntax parser. Compiler principles, lexical analysis, syntax analysis.
Stars: ✭ 19 (-84.17%)
Greynir: The greynir.is natural language processing website for Icelandic
Stars: ✭ 47 (-60.83%)
Smoothnlp: Focused on explainable NLP technology. An NLP Toolset With A Focus on Explainable Inference
Stars: ✭ 435 (+262.5%)
Djurl: Simple yet helpful library for writing Django URLs in an easy, short and intuitive way.
Stars: ✭ 85 (-29.17%)
Jflex: The fast scanner generator for Java™ with full Unicode support
Stars: ✭ 380 (+216.67%)
Nlp Js Tools French: POS tagger, lemmatizer and stemmer for the French language in JavaScript
Stars: ✭ 32 (-73.33%)
Sacremoses: Python port of Moses tokenizer, truecaser and normalizer
Stars: ✭ 293 (+144.17%)
Languagepod101 Scraper: Python scraper for Language Pods such as Japanesepod101.com 👹 🗾 🍣 Compatible with Japanese, Chinese, French, German, Italian, Korean, Portuguese, Russian, Spanish and many more! ✨
Stars: ✭ 104 (-13.33%)
KanaQuiz: A simple app to quiz the user on identifying Japanese characters.
Stars: ✭ 19 (-84.17%)
cang-jie: Chinese tokenizer for tantivy, based on jieba-rs
Stars: ✭ 48 (-60%)
Wirb: Ruby Object Inspection for IRB
Stars: ✭ 69 (-42.5%)
Mustard: 🌭 Mustard is a Swift library for tokenizing strings when splitting by whitespace doesn't cut it.
Stars: ✭ 689 (+474.17%)
PaddleTokenizer: A deep neural network (DNN) Chinese tokenizer (word segmentation engine) implemented with PaddlePaddle
Stars: ✭ 14 (-88.33%)
Oseti: Dictionary-based sentiment analysis for Japanese
Stars: ✭ 49 (-59.17%)
text2text: Text2Text, a cross-lingual natural language processing and generation toolkit
Stars: ✭ 188 (+56.67%)
Tokenizer: A small library for converting tokenized PHP source code into XML (and potentially other formats)
Stars: ✭ 4,770 (+3875%)
Somajo: A tokenizer and sentence splitter for German and English web and social media texts.
Stars: ✭ 85 (-29.17%)
Open Korean Text: Open Korean Text Processor - An Open-source Korean Text Processor
Stars: ✭ 438 (+265%)
Py Nltools: A collection of basic Python modules for spoken natural language processing
Stars: ✭ 46 (-61.67%)
Ekphrasis: Ekphrasis is a text processing tool geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags) and spell correction, using word statistics from 2 big corpora (English Wikipedia and Twitter, the latter with 330 million English tweets).
Stars: ✭ 433 (+260.83%)
Kadot: Kadot, the unsupervised natural language processing library.
Stars: ✭ 108 (-10%)
Php Parser: 🌿 NodeJS PHP Parser - extract AST or tokens (PHP5 and PHP7)
Stars: ✭ 400 (+233.33%)
Sharpmath: A small .NET math library.
Stars: ✭ 36 (-70%)
Lexmachine: Lex machinery for Go.
Stars: ✭ 335 (+179.17%)
Hippo: PHP standards checker.
Stars: ✭ 82 (-31.67%)
Sentences: A multilingual command line sentence tokenizer in Golang
Stars: ✭ 293 (+144.17%)
Omnicat Bayes: Naive Bayes text classification implementation as an OmniCat classifier strategy. (#ruby #naivebayes)
Stars: ✭ 30 (-75%)
Jumanpp: Juman++ (a Morphological Analyzer Toolkit)
Stars: ✭ 254 (+111.67%)
Tokenizer: Source code tokenizer
Stars: ✭ 119 (-0.83%)
ArabicProcessingCog: A Python package that does stemming, tokenization, sentence breaking, segmentation, normalization and POS tagging for the Arabic language.
Stars: ✭ 19 (-84.17%)
unofficial-jisho-api: Encapsulates the official Jisho.org API and also provides kanji, example, and stroke diagram search.
Stars: ✭ 88 (-26.67%)
Hebrew-Tokenizer: A very simple Python tokenizer for Hebrew text.
Stars: ✭ 16 (-86.67%)
ebe-dataset: Evidence-based Explanation Dataset (AACL-IJCNLP 2020)
Stars: ✭ 16 (-86.67%)
Megamark: 😻 Markdown with easy tokenization, a fast highlighter, and a lean HTML sanitizer
Stars: ✭ 100 (-16.67%)
madomagiOOP: 👨💻♐ OOP learning with anime magical girls. (Learning object-oriented programming with magical girls) 🧙
Stars: ✭ 17 (-85.83%)
Natasha: Solves basic Russian NLP tasks, API for lower level Natasha projects
Stars: ✭ 788 (+556.67%)
String Calc: PHP calculator library for mathematical terms (expressions) passed as strings
Stars: ✭ 60 (-50%)
Janome: Japanese morphological analysis engine written in pure Python
Stars: ✭ 630 (+425%)
Scattertext: Beautiful visualizations of how language differs among document types.
Stars: ✭ 1,722 (+1335%)
Topokanji: Topologically ordered lists of kanji for effective learning
Stars: ✭ 108 (-10%)
The Tab Of Words: A minimal Chrome / Firefox extension to help you learn Japanese words in each new tab.
Stars: ✭ 94 (-21.67%)
Thot: Thot toolkit for statistical machine translation
Stars: ✭ 53 (-55.83%)
Soynlp: A Python library for Korean natural language processing. Provides word extraction, tokenization, part-of-speech tagging and preprocessing.
Stars: ✭ 613 (+410.83%)