Top 110 text-processing open source projects

Regex Automata
A low level regular expression library that uses deterministic finite automata.
Rust Unic
UNIC: Unicode and Internationalization Crates for Rust
Sd
Intuitive find & replace CLI (sed alternative)
Textvec
Text vectorization tool to outperform TFIDF for classification tasks
Nlpre
Python library for Natural Language Preprocessing (NLPre)
Jaconv
Pure-Python Japanese character interconverter for Hiragana, Katakana, Hankaku and Zenkaku
Japanese.js
Util collection for Japanese text processing. Hiraganize, Katakanize, and Romanize.
Xioc
Extract indicators of compromise from text, including "escaped" ones.
Browsecloud
A web app to create and browse text visualizations for automated customer listening.
Tmtoolkit
Text Mining and Topic Modeling Toolkit for Python with parallel processing power
Prenlp
Preprocessing Library for Natural Language Processing
Konoha
🌿 An easy-to-use Japanese Text Processing tool, which makes it possible to switch tokenizers with small changes of code.
Libasciidoc
A Golang library for processing Asciidoc files.
Textcluster
Preprocessing module for short-text clustering
Colibri Core
Colibri Core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipgrams (i.e., patterns with one or more gaps, either of fixed or dynamic size) in a quick and memory-efficient way. At the core is the tool ``colibri-patternmodeller``, which allows you to build, view, manipulate and query pattern models.
Command Line Text Processing
⚡ From finding text to search and replace, from sorting to beautifying text and more 🎨
Bpl
Binary Processing Language
Mtp
Multi-lingual Text Processing
Nostril
Nostril: Nonsense String Evaluator
Node Rake
A NodeJS implementation of the Rapid Automatic Keyword Extraction algorithm.
Kefirbb
A flexible Java text processor. BB, BBCode, BB-code, HTML, Textile, Markdown, parser, translator, converter.
Unix Text Commands
Unix Text Processing Command Reference
Virastar
Cleaning-up Persian Texts!
Ter
Text Expression Runner – Readable and easy to use text expressions
Javascript Text Expander
Expands texts as you type, naturally
Go Search Replace
🚀 Search & replace URLs in WordPress SQL files.
Pipeit
PipeIt is a text transformation, conversion, cleansing and extraction tool.
Lingua Franca
Mycroft's multilingual text parsing and formatting library
Pyparsing
Python library for creating PEG parsers
Qp Trie Rs
An idiomatic and fast QP-trie implementation in pure Rust.
Fxt
A large scale feature extraction tool for text-based machine learning
Concise Ipython Notebooks For Deep Learning
IPython notebooks for solving problems such as classification, segmentation, and generation using the latest deep learning algorithms on various publicly available text and image datasets.
Chr
🔤 Lightweight R package for manipulating [string] characters
Gohn
Hatena Notation (はてな記法) Parser written in Go
Whatlanggo
Natural language detection library for Go
Python Nameparser
A simple Python module for parsing human names into their individual components
Diff Match Patch
Diff Match Patch is a high-performance library in multiple languages that manipulates plain text.
Open Korean Text
Open Korean Text Processor - An Open-source Korean Text Processor
Ekphrasis
Ekphrasis is a text processing tool geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags) and spell correction, using word statistics from two large corpora (English Wikipedia and Twitter, 330 million English tweets).
Pynlpl
PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build simple language models. There are also more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL). There are also clients to interface with various NLP-specific servers. PyNLPl most notably features a very extensive library for working with FoLiA XML (Format for Linguistic Annotation).
Aho Corasick
A fast implementation of Aho-Corasick in Rust.
Bsed
Simple SQL-like syntax on top of Perl text processing.
Textpipe
Textpipe: clean and extract metadata from text
ArabicProcessingCog
A Python package that does stemming, tokenization, sentence breaking, segmentation, normalization, and POS tagging for the Arabic language.
daachorse
🐎 A fast implementation of the Aho-Corasick algorithm using the compact double-array data structure.
NLP-tools
Useful python NLP tools (evaluation, GUI interface, tokenization)