Stringi - THE String Processing Package for R (with ICU)
Regex Automata - A low-level regular expression library that uses deterministic finite automata.
Rust Unic - UNIC: Unicode and Internationalization Crates for Rust
Sd - Intuitive find & replace CLI (sed alternative)
Fastnlp - fastNLP: A modularized and extensible NLP framework, currently still in incubation.
Textvec - Text vectorization tool designed to outperform TF-IDF on classification tasks
Nlpre - Python library for Natural Language Preprocessing (NLPre)
Jaconv - Pure-Python Japanese character interconverter for Hiragana, Katakana, Hankaku (half-width), and Zenkaku (full-width) characters
Japanese.js - Utility collection for Japanese text processing: hiraganize, katakanize, and romanize.
Xioc - Extract indicators of compromise from text, including "escaped" ones.
Browsecloud - A web app to create and browse text visualizations for automated customer listening.
Stanza Old - Stanford NLP group's shared Python tools.
Tmtoolkit - Text mining and topic modeling toolkit for Python with parallel processing support
Prenlp - Preprocessing library for natural language processing
Konoha - 🌿 An easy-to-use Japanese text processing tool that lets you switch tokenizers with only small code changes.
Libasciidoc - A Golang library for processing AsciiDoc files.
Colibri Core - Colibri Core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipgrams (i.e., patterns with one or more gaps, of either fixed or dynamic size) in a quick and memory-efficient way. At its core is the tool ``colibri-patternmodeller``, which lets you build, view, manipulate, and query pattern models.
Bpl - Binary Processing Language
Mtp - Multi-lingual Text Processing
Nostril - Nonsense String Evaluator
Node Rake - A Node.js implementation of the Rapid Automatic Keyword Extraction (RAKE) algorithm.
Kefirbb - A flexible Java text processor: a parser, translator, and converter for BBCode, HTML, Textile, and Markdown.
Ter - Text Expression Runner: readable and easy-to-use text expressions
Pipeit - A text transformation, conversion, cleansing, and extraction tool.
Lingua Franca - Mycroft's multilingual text parsing and formatting library
Pyparsing - Python library for creating PEG parsers
Qp Trie Rs - An idiomatic and fast QP-trie implementation in pure Rust.
Fxt - A large-scale feature extraction tool for text-based machine learning
Concise Ipython Notebooks For Deep Learning - IPython notebooks that solve problems such as classification, segmentation, and generation with recent deep learning algorithms on publicly available text and image datasets.
Chr - 🔤 Lightweight R package for manipulating [string] characters
Gohn - Hatena Notation (はてな記法) Parser written in Go
Whatlanggo - Natural language detection library for Go
Python Nameparser - A simple Python module for parsing human names into their individual components
Diff Match Patch - A high-performance library, available in multiple languages, for manipulating plain text.
Open Korean Text - Open Korean Text Processor, an open-source Korean text processor
Ekphrasis - A text processing tool geared towards text from social networks such as Twitter and Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags), and spell correction, using word statistics from two large corpora: English Wikipedia and 330 million English tweets.
Pynlpl - PyNLPl, pronounced "pineapple", is a Python library for Natural Language Processing. It contains modules for common and less common NLP tasks. PyNLPl can be used for basic tasks such as extracting n-grams and frequency lists and building simple language models, and it also provides more complex data types and algorithms. It includes parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL), as well as clients for various NLP-specific servers. Most notably, PyNLPl features an extensive library for working with FoLiA XML (Format for Linguistic Annotation).
Bsed - Simple SQL-like syntax on top of Perl text processing.
Artificial Adversary - 🗣️ Tool to generate adversarial text examples and test machine learning models against them
Textpipe - Clean and extract metadata from text
ArabicProcessingCog - A Python package for stemming, tokenization, sentence breaking, segmentation, normalization, and POS tagging of Arabic text.
daachorse - 🐎 A fast implementation of the Aho-Corasick algorithm using the compact double-array data structure.
Text-Analysis - Explanations of textual analysis tools in Python, including preprocessing, skip-gram (word2vec), and topic modeling.
NLP-tools - Useful Python NLP tools (evaluation, GUI interface, tokenization)
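Several entries in this list (Colibri Core, PyNLPl) work with n-grams and skipgrams. The following is a minimal pure-Python sketch of the concept only, not the API of any listed library. The function names `ngrams` and `skipgrams` and the `{*}` gap marker are hypothetical choices for illustration, and only fixed-size gaps (one marker per skipped token) are modeled.

```python
from collections import Counter
from itertools import combinations

# Hypothetical gap marker for this sketch; one marker stands for one skipped token.
GAP = "{*}"


def ngrams(tokens, n):
    """Return every contiguous n-gram of `tokens` as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def skipgrams(tokens, n):
    """Return length-n patterns in which one or more interior positions are gaps.

    The first and last tokens of each window are always kept, so every
    skipgram is anchored at both ends; windows shorter than 3 have no
    interior and yield no skipgrams.
    """
    interior = range(1, n - 1)
    patterns = []
    for gram in ngrams(tokens, n):
        for k in range(1, n - 1):  # how many interior tokens to skip
            for gaps in combinations(interior, k):
                patterns.append(
                    tuple(GAP if i in gaps else tok for i, tok in enumerate(gram))
                )
    return patterns


if __name__ == "__main__":
    tokens = "to be or not to be".split()
    print(Counter(ngrams(tokens, 2)).most_common(1))  # [(('to', 'be'), 2)]
    print(skipgrams(tokens, 3))
```

Counting the n-grams with `Counter` produces a simple frequency list. For example, `("to", "be")` occurs twice in "to be or not to be". Dynamic-size gaps, which Colibri Core also supports, would need a different pattern representation than the one-marker-per-token scheme used here.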
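Jaconv and Japanese.js both convert between Hiragana and Katakana. The sketch below covers only the simplest case. In Unicode, the Katakana letters U+30A1 to U+30F6 sit exactly 0x60 code points above the Hiragana letters U+3041 to U+3096, so conversion reduces to an offset. The function names are hypothetical and are not either library's API. Hankaku/Zenkaku (half-width/full-width) conversion needs a lookup table and is not covered.

```python
# Basic Hiragana letters: U+3041 (ぁ) through U+3096 (ゖ).
HIRAGANA_FIRST = 0x3041
HIRAGANA_LAST = 0x3096
# The matching Katakana letters (U+30A1..U+30F6) are offset by 0x60.
KANA_OFFSET = 0x60


def hiragana_to_katakana(text: str) -> str:
    """Shift each basic Hiragana letter into the Katakana block; leave everything else."""
    return "".join(
        chr(ord(ch) + KANA_OFFSET) if HIRAGANA_FIRST <= ord(ch) <= HIRAGANA_LAST else ch
        for ch in text
    )


def katakana_to_hiragana(text: str) -> str:
    """Inverse mapping; Katakana-only marks such as the long-vowel mark ー are kept as-is."""
    first, last = HIRAGANA_FIRST + KANA_OFFSET, HIRAGANA_LAST + KANA_OFFSET
    return "".join(
        chr(ord(ch) - KANA_OFFSET) if first <= ord(ch) <= last else ch
        for ch in text
    )


if __name__ == "__main__":
    print(hiragana_to_katakana("ひらがな"))  # ヒラガナ
    print(katakana_to_hiragana("カタカナ"))  # かたかな
```

Characters outside the kana ranges, such as Latin letters, punctuation, and the prolonged sound mark, pass through unchanged. This mirrors the behavior you would generally want when converting mixed Japanese text.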