Colibri core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipgrams (i.e. patterns with one or more gaps, either of fixed or dynamic size) in a quick and memory-efficient way. At the core is the tool ``colibri-patternmodeller``, which allows you to build, view, manipulate and query pattern models.

✭ 112

python library nlp text-processing corpus linguistics

Command Line Text Processing

⚡ From finding text to search and replace, from sorting to beautifying text and more 🎨

✭ 9,771

shell python linux command-line regex ebook text-processing grep

Bpl

Binary Processing Language

✭ 103

go golang language text-processing

Text classification

Text Classification Algorithms: A Survey

✭ 1,276

python deep-learning convolutional-neural-networks text-classification recurrent-neural-networks nlp-machine-learning text-processing logistic-regression random-forest decision-trees

Mtp

Multi-lingual Text Processing

✭ 87

text-processing

Nostril

Nostril: Nonsense String Evaluator

✭ 86

python inference text-processing source-code detector

Ios11 Visionframework

Vision Framework IOS WWDC 2017

✭ 85

swift ios learning face-detection text-processing machine image-analysis text-detection

Node Rake

A NodeJS implementation of the Rapid Automatic Keyword Extraction algorithm.

✭ 85

javascript nodejs text-processing

Kefirbb

A flexible Java text processor. BB, BBCode, BB-code, HTML, Textile, Markdown, parser, translator, converter.

✭ 83

java html markdown converter text-processing

Unix Text Commands

Unix Text Processing Command Reference

✭ 78

nlp command-line unix reference text-processing

Virastar

Cleaning-up Persian Texts!

✭ 77

javascript text-processing persian

Ter

Text Expression Runner – Readable and easy to use text expressions

✭ 67

rust linux text-processing

Applied Text Mining In Python

Repo for Applied Text Mining in Python (coursera) by University of Michigan

✭ 59

python jupyter-notebook nlp classification pandas text-classification regex text-mining text-processing

Javascript Text Expander

Expands texts as you type, naturally

✭ 58

javascript text-processing text-analysis

Go Search Replace

🚀 Search & replace URLs in WordPress SQL files.

✭ 57

go golang wordpress text-processing

Pipeit

PipeIt is a text transformation, conversion, cleansing and extraction tool.

✭ 57

go text-mining text-processing

Lingua Franca

Mycroft's multilingual text parsing and formatting library

✭ 51

python hacktoberfest library natural-language-processing text-processing

Pyparsing

Python library for creating PEG parsers

✭ 1,052

python python3 python2 parsing text-processing parser-combinators

Qp Trie Rs

An idiomatic and fast QP-trie implementation in pure Rust.

✭ 47

rust data-structures text-processing trie bytes

Fxt

A large scale feature extraction tool for text-based machine learning

✭ 25

machine-learning information-retrieval feature-extraction text-processing

Concise Ipython Notebooks For Deep Learning

IPython notebooks for solving problems such as classification, segmentation, and generation using the latest deep learning algorithms on various publicly available text and image datasets.

✭ 23

jupyter-notebook image-processing deep-neural-networks deeplearning image-classification text-classification image-segmentation autoencoder snapshot word-embeddings text-processing text-generation

Text Mining

Text Mining in Python

✭ 18

python jupyter-notebook text-classification text-mining text-processing

Chr

🔤 Lightweight R package for manipulating [string] characters

✭ 18

r rstats r-package regex extract text-processing strings string-manipulation character

Gohn

Hatena Notation (はてな記法) Parser written in Go

✭ 17

go parser text-processing

Whatlanggo

Natural language detection library for Go

✭ 479

go language nlp text-processing

Python Nameparser

A simple Python module for parsing human names into their individual components

✭ 462

python text-processing

Diff Match Patch

Diff Match Patch is a high-performance library in multiple languages that manipulates plain text.

✭ 4,910

python objective-c dart c++ c# java diff text-processing patch match difference

Open Korean Text

Open Korean Text Processor - An Open-source Korean Text Processor

✭ 438

scala natural-language-processing korean text-processing tokenizer

Ekphrasis

Ekphrasis is a text processing tool, geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags) and spell correction, using word statistics from 2 big corpora (English Wikipedia, Twitter - 330mil English tweets).

✭ 433

python nlp text-processing tokenizer nlp-library word-segmentation

Pynlpl

PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build simple language models. There are also more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL). There are also clients to interface with various NLP-specific servers. PyNLPl most notably features a very extensive library for working with FoLiA XML (Format for Linguistic Annotation).

✭ 426

python machine-learning library nlp natural-language-processing text-processing nlp-library linguistics

Aho Corasick

A fast implementation of Aho-Corasick in Rust.

✭ 424

rust search text-processing finite-state-machine

Bsed

Simple SQL-like syntax on top of Perl text processing.

✭ 414

python perl awk csv text-processing grep

Artificial Adversary

🗣️ Tool to generate adversarial text examples and test machine learning models against them