learn perl oneliners: Example-based guide for text processing with Perl from the command line
Stars: ✭ 63 (-86.36%)
s3-concat: Concatenate Amazon S3 files remotely using flexible patterns
Stars: ✭ 32 (-93.07%)
Hr: Easy Access to Uppercase H
Stars: ✭ 56 (-87.88%)
frog: Frog is an integration of memory-based natural language processing (NLP) modules developed for Dutch. All NLP modules are based on Timbl, the Tilburg memory-based learning software package.
Stars: ✭ 70 (-84.85%)
ConTexto: Python library for text mining and NLP
Stars: ✭ 43 (-90.69%)
stringx: Drop-in replacements for base R string functions powered by stringi
Stars: ✭ 14 (-96.97%)
fuzzychinese: A small package to fuzzy match Chinese words
Stars: ✭ 50 (-89.18%)
Text-Analysis: Explaining textual analysis tools in Python, including preprocessing, Skip Gram (word2vec), and topic modelling.
Stars: ✭ 48 (-89.61%)
SuperCombinators: [Deprecated] A Swift parser combinator framework
Stars: ✭ 19 (-95.89%)
cinje: A Pythonic and ultra-fast template engine DSL.
Stars: ✭ 26 (-94.37%)
SuffixTree: Optimized implementation of a suffix tree in Python using Ukkonen's algorithm.
Stars: ✭ 38 (-91.77%)
WeTextProcessing: Text Normalization & Inverse Text Normalization
Stars: ✭ 213 (-53.9%)
deduce: de-identification method for Dutch medical text
Stars: ✭ 40 (-91.34%)
ArabicProcessingCog: A Python package that does stemming, tokenization, sentence breaking, segmentation, normalization, and POS tagging for the Arabic language.
Stars: ✭ 19 (-95.89%)
vi-rs: Vietnamese Input Method library
Stars: ✭ 69 (-85.06%)
andaluh-js: Transliterate Spanish (español) spelling to Andalusian (andaluz) spelling proposals using JavaScript
Stars: ✭ 22 (-95.24%)
teanaps: An open-source Python library for natural language processing and text analysis.
Stars: ✭ 91 (-80.3%)
Aho Corasick: A fast implementation of Aho-Corasick in Rust.
Stars: ✭ 424 (-8.23%)
dif: 'dif' is a Linux preprocessing front end to gvimdiff/meld/kompare
Stars: ✭ 18 (-96.1%)
TRUNAJOD2.0: An easy-to-use library to extract indices from texts.
Stars: ✭ 18 (-96.1%)
frangipanni: Program to convert lines of text into a tree structure.
Stars: ✭ 1,176 (+154.55%)
support-tickets-classification: This case study shows how to create a model for text analysis and classification and deploy it as a web service in the Azure cloud in order to automatically classify support tickets. This project is a proof of concept made by Microsoft (Commercial Software Engineering team) in collaboration with Endava http://endava.com/en
Stars: ✭ 142 (-69.26%)
sliceslice-rs: A fast implementation of single-pattern substring search using SIMD acceleration.
Stars: ✭ 66 (-85.71%)
Textrude: Code generation from YAML/JSON/CSV models via SCRIBAN templates
Stars: ✭ 79 (-82.9%)
nlcli: Natural language interface for the command line.
Stars: ✭ 21 (-95.45%)
text-analysis: Weaving analytical stories from text data
Stars: ✭ 12 (-97.4%)
typ3r.js: 🍟 [Library] dA aNn0Y1Ng t3Xt g3NeRa7or
Stars: ✭ 22 (-95.24%)
finglish: A Finglish to Persian converter.
Stars: ✭ 60 (-87.01%)
Textpipe: clean and extract metadata from text
Stars: ✭ 284 (-38.53%)
sova-tts-tps: NLP-preprocessor for the SOVA-TTS project
Stars: ✭ 44 (-90.48%)
hck: A sharp cut(1) clone.
Stars: ✭ 542 (+17.32%)
Pynlpl: PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build simple language models. There are also more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL). There are also clients to interface with various NLP-specific servers. PyNLPl most notably features a very extensive library for working with FoLiA XML (Format for Linguistic Annotation).
Stars: ✭ 426 (-7.79%)
s3-utils: Utilities and tools based around Amazon S3 to provide convenience APIs in a CLI
Stars: ✭ 45 (-90.26%)
TextDatasetCleaner: 🔬 Cleaning datasets of junk (normalization, preprocessing)
Stars: ✭ 27 (-94.16%)
python-mecab: A repository to bind mecab for Python 3.5+. Uses neither swig nor pybind. (No longer maintained)
Stars: ✭ 27 (-94.16%)
daachorse: 🐎 A fast implementation of the Aho-Corasick algorithm using the compact double-array data structure.
Stars: ✭ 75 (-83.77%)
Emotion-recognition-from-tweets: A comprehensive approach to recognizing emotion (sentiment) in a given tweet, using supervised machine learning.
Stars: ✭ 17 (-96.32%)
pwsh-prelude: PowerShell “standard” library for supercharging your productivity. Provides a powerful cross-platform scripting environment enabling efficient analysis and sustainable science in myriad contexts.
Stars: ✭ 26 (-94.37%)
text2video: Text to Video Generation Problem
Stars: ✭ 28 (-93.94%)
Open Korean Text: Open Korean Text Processor - An Open-source Korean Text Processor
Stars: ✭ 438 (-5.19%)
estratto: Parsing fixed-width file content made easy
Stars: ✭ 12 (-97.4%)
Compare-UserJS: PowerShell script for comparing user.js (or prefs.js) files.
Stars: ✭ 79 (-82.9%)
NLP-tools: Useful Python NLP tools (evaluation, GUI, tokenization)
Stars: ✭ 39 (-91.56%)
syn: the thesaurus
Stars: ✭ 45 (-90.26%)
lingua-go: 👄 The most accurate natural language detection library for Go, suitable for long and short text alike
Stars: ✭ 684 (+48.05%)
Text-Classification-LSTMs-PyTorch: The aim of this repository is to show a baseline model for text classification by implementing an LSTM-based model in PyTorch. To provide a better understanding of the model, it uses a Tweets dataset provided by Kaggle.
Stars: ✭ 45 (-90.26%)
Bsed: Simple SQL-like syntax on top of Perl text processing.
Stars: ✭ 414 (-10.39%)
perke: A keyphrase extractor for Persian
Stars: ✭ 60 (-87.01%)
textstat: Ruby gem to calculate statistics from text to determine readability, complexity and grade level of a particular corpus.
Stars: ✭ 25 (-94.59%)
r4strings: Handling Strings in R
Stars: ✭ 39 (-91.56%)
hama-py: 🦛 Python library for Korean (Hangul) text processing. Python Korean Morphological Analyzer
Stars: ✭ 16 (-96.54%)
Diff Match Patch: Diff Match Patch is a high-performance library in multiple languages that manipulates plain text.
Stars: ✭ 4,910 (+962.77%)
Ekphrasis: Ekphrasis is a text processing tool, geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags) and spell correction, using word statistics from two big corpora (English Wikipedia and Twitter, 330 million English tweets).
Stars: ✭ 433 (-6.28%)
Artificial Adversary: 🗣️ Tool to generate adversarial text examples and test machine learning models against them
Stars: ✭ 348 (-24.68%)
text: Qiniu Text Processing Libraries for Go
Stars: ✭ 25 (-94.59%)