Bitextor: Bitextor generates translation memories from multilingual websites.
Query Translator: Query Translator is a search query translator with AST representation
Tokenizers: Fast, Consistent Tokenization of Natural Language Text
Udpipe: R package for Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing Based on the UDPipe Natural Language Processing Toolkit
Lex: Replaced by foonathan/lexy
Tokenizer: Fast and customizable text tokenization library with BPE and SentencePiece support
Fugashi: A Cython MeCab wrapper for fast, pythonic Japanese tokenization and morphological analysis.
Syntok: Text tokenization and sentence segmentation (segtok v2)
Kadot: Kadot, the unsupervised natural language processing library.
Megamark: 😻 Markdown with easy tokenization, a fast highlighter, and a lean HTML sanitizer
Somajo: A tokenizer and sentence splitter for German and English web and social media texts.
Djurl: Simple yet helpful library for writing Django urls in an easy, short and intuitive way.
Hippo: PHP standards checker.
Sentence Splitter: Text to sentence splitter using a heuristic algorithm by Philipp Koehn and Josh Schroeder.
Wirb: Ruby Object Inspection for IRB
String Calc: PHP calculator library for mathematical terms (expressions) passed as strings
Thot: Thot toolkit for statistical machine translation
Greynir: The greynir.is natural language processing website for Icelandic
Py Nltools: A collection of basic python modules for spoken natural language processing
Talismane: NLP framework: sentence detector, tokeniser, pos-tagger and dependency parser
Omnicat Bayes: Naive Bayes text classification implementation as an OmniCat classifier strategy. (#ruby #naivebayes)
Snl Compiler: SNL (Small Nested Language) Compiler. Maven jUnit Tokenizer Lexer Syntax Parser. Compiler principles, lexical analysis, syntax analysis.
Natasha: Solves basic Russian NLP tasks, API for lower level Natasha projects
Mustard: 🌭 Mustard is a Swift library for tokenizing strings when splitting by whitespace doesn't cut it.
Soynlp: A Python library for Korean natural language processing. Provides word extraction, tokenization, part-of-speech tagging, and preprocessing.
Kagome: Self-contained Japanese Morphological Analyzer written in pure Go
Tokenizer: A small library for converting tokenized PHP source code into XML (and potentially other formats)
Open Korean Text: Open Korean Text Processor - An Open-source Korean Text Processor
Smoothnlp: Focused on explainable NLP techniques | An NLP Toolset With A Focus on Explainable Inference
Ekphrasis: Ekphrasis is a text processing tool, geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags) and spell correction, using word statistics from two large corpora (English Wikipedia and Twitter, the latter with 330 million English tweets).
Moo: Optimised tokenizer/lexer generator! 🐄 Uses /y for performance. Moo.
Php Parser: 🌿 NodeJS PHP Parser - extract AST or tokens (PHP5 and PHP7)
Jflex: The fast scanner generator for Java™ with full Unicode support
Friso: High performance Chinese tokenizer with both GBK and UTF-8 charset support, based on the MMSEG algorithm and developed in ANSI C. Fully modular in implementation, it can be easily embedded in other programs such as MySQL, PostgreSQL, and PHP.
Sentences: A multilingual command line sentence tokenizer in Golang
Sacremoses: Python port of Moses tokenizer, truecaser and normalizer
Jumanpp: Juman++ (a Morphological Analyzer Toolkit)
pascal-interpreter: A simple interpreter for a large subset of the Pascal language, written for educational purposes
ArabicProcessingCog: A Python package that does stemming, tokenization, sentence breaking, segmentation, normalization, and POS tagging for the Arabic language.
cang-jie: Chinese tokenizer for tantivy, based on jieba-rs
PaddleTokenizer: A deep neural network Chinese word segmentation engine (tokenizer) implemented with PaddlePaddle
text2text: Text2Text: Cross-lingual natural language processing and generation toolkit
bredon: A modern CSS value compiler in JavaScript
simplemma: Simple multilingual lemmatizer for Python, especially useful for speed and efficiency
mystem-scala: Morphological analyzer `mystem` (Russian language) wrapper for JVM languages
tokenizer: Tokenize CSS according to the CSS Syntax