TextDatasetCleaner: 🔬 Cleaning junk out of datasets (normalization, preprocessing)
Stars: ✭ 27 (-87.32%)
Dan Jurafsky Chris Manning Nlp: My solution to the Natural Language Processing course taught by Dan Jurafsky and Chris Manning in Winter 2012.
Stars: ✭ 124 (-41.78%)
Prenlp: Preprocessing Library for Natural Language Processing
Stars: ✭ 130 (-38.97%)
Pyparsing: Python library for creating PEG parsers
Stars: ✭ 1,052 (+393.9%)
Textvec: Text vectorization tool to outperform TFIDF for classification tasks
Stars: ✭ 167 (-21.6%)
Command Line Text Processing: ⚡ From finding text to search and replace, from sorting to beautifying text and more 🎨
Stars: ✭ 9,771 (+4487.32%)
Python Nameparser: A simple Python module for parsing human names into their individual components
Stars: ✭ 462 (+116.9%)
Pynlpl: PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build simple language models. There are also more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL). There are also clients to interface with various NLP-specific servers. PyNLPl most notably features a very extensive library for working with FoLiA XML (Format for Linguistic Annotation).
Stars: ✭ 426 (+100%)
Node Rake: A NodeJS implementation of the Rapid Automatic Keyword Extraction algorithm.
Stars: ✭ 85 (-60.09%)
Stanza Old: Stanford NLP group's shared Python tools.
Stars: ✭ 142 (-33.33%)
Ter: Text Expression Runner – Readable and easy to use text expressions
Stars: ✭ 67 (-68.54%)
Fastnlp: fastNLP: A Modularized and Extensible NLP Framework. Currently still in incubation.
Stars: ✭ 2,441 (+1046.01%)
Pipeit: PipeIt is a text transformation, conversion, cleansing and extraction tool.
Stars: ✭ 57 (-73.24%)
Libasciidoc: A Golang library for processing Asciidoc files.
Stars: ✭ 129 (-39.44%)
Fxt: A large scale feature extraction tool for text-based machine learning
Stars: ✭ 25 (-88.26%)
Regex Automata: A low level regular expression library that uses deterministic finite automata.
Stars: ✭ 203 (-4.69%)
Gohn: Hatena Notation (はてな記法) Parser written in Go
Stars: ✭ 17 (-92.02%)
Textcluster: Preprocessing module for short-text clustering
Stars: ✭ 115 (-46.01%)
Open Korean Text: Open Korean Text Processor - An Open-source Korean Text Processor
Stars: ✭ 438 (+105.63%)
Jaconv: Pure-Python Japanese character interconverter for Hiragana, Katakana, Hankaku and Zenkaku
Stars: ✭ 157 (-26.29%)
Bsed: Simple SQL-like syntax on top of Perl text processing.
Stars: ✭ 414 (+94.37%)
Textpipe: clean and extract metadata from text
Stars: ✭ 284 (+33.33%)
Browsecloud: A web app to create and browse text visualizations for automated customer listening.
Stars: ✭ 143 (-32.86%)
Kefirbb: A flexible Java text processor. BB, BBCode, BB-code, HTML, Textile, Markdown, parser, translator, converter.
Stars: ✭ 83 (-61.03%)
Sd: Intuitive find & replace CLI (sed alternative)
Stars: ✭ 2,755 (+1193.43%)
Virastar: Cleaning up Persian texts!
Stars: ✭ 77 (-63.85%)
Tmtoolkit: Text Mining and Topic Modeling Toolkit for Python with parallel processing power
Stars: ✭ 135 (-36.62%)
Stringi: THE String Processing Package for R (with ICU)
Stars: ✭ 204 (-4.23%)
Go Search Replace: 🚀 Search & replace URLs in WordPress SQL files.
Stars: ✭ 57 (-73.24%)
Konoha: 🌿 An easy-to-use Japanese text processing tool that makes it possible to switch tokenizers with small code changes.
Stars: ✭ 130 (-38.97%)
Lingua Franca: Mycroft's multilingual text parsing and formatting library
Stars: ✭ 51 (-76.06%)
Text Detector: Tool which allows you to detect and translate text.
Stars: ✭ 173 (-18.78%)
Qp Trie Rs: An idiomatic and fast QP-trie implementation in pure Rust.
Stars: ✭ 47 (-77.93%)
Padatious: A neural network intent parser
Stars: ✭ 124 (-41.78%)
Concise Ipython Notebooks For Deep Learning: IPython notebooks for solving problems like classification, segmentation, and generation using the latest deep learning algorithms on different publicly available text and image datasets.
Stars: ✭ 23 (-89.2%)
rake-rs: Multilingual implementation of the RAKE algorithm for Rust
Stars: ✭ 30 (-85.92%)
Chr: 🔤 Lightweight R package for manipulating [string] characters
Stars: ✭ 18 (-91.55%)
Cogcomp Nlpy: CogComp's light-weight Python NLP annotators
Stars: ✭ 115 (-46.01%)
Whatlanggo: Natural language detection library for Go
Stars: ✭ 479 (+124.88%)
Nlpre: Python library for Natural Language Preprocessing (NLPre)
Stars: ✭ 158 (-25.82%)
Diff Match Patch: Diff Match Patch is a high-performance library in multiple languages that manipulates plain text.
Stars: ✭ 4,910 (+2205.16%)
Colibri Core: Colibri core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipgrams (i.e. patterns with one or more gaps, either of fixed or dynamic size) in a quick and memory-efficient way. At the core is the tool ``colibri-patternmodeller``, which allows you to build, view, manipulate and query pattern models.
Stars: ✭ 112 (-47.42%)
Ekphrasis: Ekphrasis is a text processing tool, geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags) and spell correction, using word statistics from 2 big corpora (English Wikipedia, Twitter - 330 million English tweets).
Stars: ✭ 433 (+103.29%)
Rust Unic: UNIC: Unicode and Internationalization Crates for Rust
Stars: ✭ 189 (-11.27%)
Aho Corasick: A fast implementation of Aho-Corasick in Rust.
Stars: ✭ 424 (+99.06%)
Bpl: Binary Processing Language
Stars: ✭ 103 (-51.64%)
Artificial Adversary: 🗣️ Tool to generate adversarial text examples and test machine learning models against them
Stars: ✭ 348 (+63.38%)
Japanese.js: Util collection for Japanese text processing. Hiraganize, Katakanize, and Romanize.
Stars: ✭ 150 (-29.58%)
ArabicProcessingCog: A Python package that does stemming, tokenization, sentence breaking, segmentation, normalization, and POS tagging for the Arabic language.
Stars: ✭ 19 (-91.08%)
Mtp: Multi-lingual Text Processing
Stars: ✭ 87 (-59.15%)
text-analysis: Weaving analytical stories from text data
Stars: ✭ 12 (-94.37%)
Xioc: Extract indicators of compromise from text, including "escaped" ones.
Stars: ✭ 148 (-30.52%)