Kagome: Self-contained Japanese Morphological Analyzer written in pure Go
Stars: ✭ 554 (+1687.1%)
simplemma: Simple multilingual lemmatizer for Python, especially useful for speed and efficiency
Stars: ✭ 32 (+3.23%)
Jumanpp: Juman++ (a Morphological Analyzer Toolkit)
Stars: ✭ 254 (+719.35%)
Nlp Js Tools French: POS tagger, lemmatizer and stemmer for the French language in JavaScript
Stars: ✭ 32 (+3.23%)
Greynir: The greynir.is natural language processing website for Icelandic
Stars: ✭ 47 (+51.61%)
Works For Me: Collection of developer toolkits
Stars: ✭ 131 (+322.58%)
greeb: Greeb is a simple Unicode-aware regexp-based tokenizer.
Stars: ✭ 16 (-48.39%)
Mustard: 🌭 Mustard is a Swift library for tokenizing strings when splitting by whitespace doesn't cut it.
Stars: ✭ 689 (+2122.58%)
Open Korean Text: Open Korean Text Processor - An Open-source Korean Text Processor
Stars: ✭ 438 (+1312.9%)
String Calc: PHP calculator library for mathematical terms (expressions) passed as strings
Stars: ✭ 60 (+93.55%)
Lex: Replaced by foonathan/lexy
Stars: ✭ 137 (+341.94%)
Talismane: NLP framework (sentence detector, tokeniser, pos-tagger and dependency parser)
Stars: ✭ 38 (+22.58%)
Roy VnTokenizer: Vietnamese tokenizer (Maximum Matching and CRF)
Stars: ✭ 49 (+58.06%)
Lfuzzer: Fuzzing Parsers with Tokens
Stars: ✭ 28 (-9.68%)
Chevrotain: Parser Building Toolkit for JavaScript
Stars: ✭ 1,795 (+5690.32%)
Snl Compiler: SNL (Small Nested Language) Compiler. Maven jUnit Tokenizer Lexer Syntax Parser. Compiler principles, lexical analysis, syntax analysis
Stars: ✭ 19 (-38.71%)
yap: Yet Another (natural language) Parser
Stars: ✭ 40 (+29.03%)
Soynlp: A Python library for Korean natural language processing. Provides word extraction, tokenization, part-of-speech tagging, and preprocessing.
Stars: ✭ 613 (+1877.42%)
Kadot: Kadot, the unsupervised natural language processing library.
Stars: ✭ 108 (+248.39%)
Js Tokens: Tiny JavaScript tokenizer.
Stars: ✭ 166 (+435.48%)
Ekphrasis: Ekphrasis is a text processing tool geared towards text from social networks such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags) and spell correction, using word statistics from 2 big corpora (English Wikipedia and Twitter, 330 million English tweets).
Stars: ✭ 433 (+1296.77%)
Somajo: A tokenizer and sentence splitter for German and English web and social media texts.
Stars: ✭ 85 (+174.19%)
Php Parser: 🌿 NodeJS PHP Parser - extract AST or tokens (PHP5 and PHP7)
Stars: ✭ 400 (+1190.32%)
Lexmachine: Lex machinery for Go.
Stars: ✭ 335 (+980.65%)
Wirb: Ruby Object Inspection for IRB
Stars: ✭ 69 (+122.58%)
Udpipe: R package for Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing Based on the UDPipe Natural Language Processing Toolkit
Stars: ✭ 160 (+416.13%)
Thot: Thot toolkit for statistical machine translation
Stars: ✭ 53 (+70.97%)
Py Nltools: A collection of basic Python modules for spoken natural language processing
Stars: ✭ 46 (+48.39%)
Tokenizer: Fast and customizable text tokenization library with BPE and SentencePiece support
Stars: ✭ 132 (+325.81%)
Sharpmath: A small .NET math library.
Stars: ✭ 36 (+16.13%)
Tokenizer: A tokenizer for Icelandic text
Stars: ✭ 27 (-12.9%)
Omnicat Bayes: Naive Bayes text classification implementation as an OmniCat classifier strategy. (#ruby #naivebayes)
Stars: ✭ 30 (-3.23%)
Fugashi: A Cython MeCab wrapper for fast, pythonic Japanese tokenization and morphological analysis.
Stars: ✭ 125 (+303.23%)
sinling: A collection of NLP tools for Sinhalese (සිංහල).
Stars: ✭ 38 (+22.58%)
Syntok: Text tokenization and sentence segmentation (segtok v2)
Stars: ✭ 123 (+296.77%)
Natasha: Solves basic Russian NLP tasks, API for lower level Natasha projects
Stars: ✭ 788 (+2441.94%)
Quantitative-Big-Imaging-2018: (Latest semester at https://github.com/kmader/Quantitative-Big-Imaging-2019) The material for the Quantitative Big Imaging course at ETHZ for the Spring Semester 2018
Stars: ✭ 50 (+61.29%)
Tokenizer: Source code tokenizer
Stars: ✭ 119 (+283.87%)
Hippo: PHP standards checker.
Stars: ✭ 82 (+164.52%)
Sentences: A multilingual command line sentence tokenizer in Golang
Stars: ✭ 293 (+845.16%)
Tokenizer: A small library for converting tokenized PHP source code into XML (and potentially other formats)
Stars: ✭ 4,770 (+15287.1%)
Bitextor: Bitextor generates translation memories from multilingual websites.
Stars: ✭ 168 (+441.94%)
Smoothnlp: Focused on explainable NLP technology. An NLP Toolset With A Focus on Explainable Inference
Stars: ✭ 435 (+1303.23%)
Megamark: 😻 Markdown with easy tokenization, a fast highlighter, and a lean HTML sanitizer
Stars: ✭ 100 (+222.58%)
Moo: Optimised tokenizer/lexer generator! 🐄 Uses /y for performance. Moo.
Stars: ✭ 434 (+1300%)
lexertk: C++ Lexer Toolkit Library (LexerTk) https://www.partow.net/programming/lexertk/index.html
Stars: ✭ 26 (-16.13%)
Jflex: The fast scanner generator for Java™ with full Unicode support
Stars: ✭ 380 (+1125.81%)
Djurl: Simple yet helpful library for writing Django URLs in an easy, short and intuitive way.
Stars: ✭ 85 (+174.19%)
Friso: High performance Chinese tokenizer with both GBK and UTF-8 charset support, based on the MMSEG algorithm and developed in ANSI C. Completely modular implementation that can be easily embedded in other programs, like MySQL, PostgreSQL, PHP, etc.
Stars: ✭ 313 (+909.68%)
Query Translator: Query Translator is a search query translator with AST representation
Stars: ✭ 165 (+432.26%)
Sacremoses: Python port of Moses tokenizer, truecaser and normalizer
Stars: ✭ 293 (+845.16%)
Sentence Splitter: Text to sentence splitter using a heuristic algorithm by Philipp Koehn and Josh Schroeder.
Stars: ✭ 82 (+164.52%)
zeyrek: Python morphological analyzer for the Turkish language. Partial port of ZemberekNLP.
Stars: ✭ 36 (+16.13%)
Text-Classification-LSTMs-PyTorch: This repository shows a baseline model for text classification by implementing an LSTM-based model in PyTorch. To make the model easier to understand, it uses a Tweets dataset provided by Kaggle.
Stars: ✭ 45 (+45.16%)
grasp: Essential NLP & ML, short & fast pure Python code
Stars: ✭ 58 (+87.1%)