This repository presents a baseline model for text classification, implemented as an LSTM-based model in PyTorch. To illustrate how the model works, it uses a Tweets dataset provided by Kaggle.

Stars: ✭ 45 (-21.05%)

Mutual labels: text-mining, text-processing

Cogcomp Nlpy

CogComp's light-weight Python NLP annotators

Stars: ✭ 115 (+101.75%)

Mutual labels: text-mining, text-processing

estratto

Parsing the content of fixed-width files made easy

Stars: ✭ 12 (-78.95%)

Mutual labels: text-mining, text-processing

TRUNAJOD2.0

An easy-to-use library to extract indices from texts.

Stars: ✭ 18 (-68.42%)

Mutual labels: text-mining, text-processing

Artificial Adversary

🗣️ Tool to generate adversarial text examples and test machine learning models against them

Stars: ✭ 348 (+510.53%)

Mutual labels: text-mining, text-processing

Textcluster

Short text clustering preprocessing module (Short text cluster)

Stars: ✭ 115 (+101.75%)

Mutual labels: text-mining, text-processing

teanaps

An open-source Python library for natural language processing and text analysis.

Stars: ✭ 91 (+59.65%)

Mutual labels: text-mining, text-processing

text-analysis

Weaving analytical stories from text data

Stars: ✭ 12 (-78.95%)

Mutual labels: text-mining, text-processing

TextDatasetCleaner

🔬 Cleaning junk out of datasets (normalization, preprocessing)

Stars: ✭ 27 (-52.63%)

Mutual labels: text-mining, text-processing

support-tickets-classification

This case study shows how to create a model for text analysis and classification and deploy it as a web service in Azure cloud in order to automatically classify support tickets. This project is a proof of concept made by Microsoft (Commercial Software Engineering team) in collaboration with Endava http://endava.com/en

Stars: ✭ 142 (+149.12%)

Mutual labels: text-mining, text-processing

Open Korean Text

Open Korean Text Processor - An Open-source Korean Text Processor

Stars: ✭ 438 (+668.42%)

Mutual labels: text-processing

Ekphrasis

Ekphrasis is a text processing tool geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags), and spell correction, using word statistics from two large corpora (English Wikipedia and Twitter, the latter with 330 million English tweets).

Stars: ✭ 433 (+659.65%)

Mutual labels: text-processing

Pyshorttextcategorization

Various Algorithms for Short Text Mining

Stars: ✭ 429 (+652.63%)

Mutual labels: text-mining

Aho Corasick

A fast implementation of Aho-Corasick in Rust.

Stars: ✭ 424 (+643.86%)

Mutual labels: text-processing

Tidytext

Text mining using tidy tools ✨📄✨

Stars: ✭ 975 (+1610.53%)

Mutual labels: text-mining

Chr

🔤 Lightweight R package for manipulating [string] characters

Stars: ✭ 18 (-68.42%)

Mutual labels: text-processing

Open Semantic Search

Open Source research tool to search, browse, analyze and explore large document collections by Semantic Search Engine and Open Source Text Mining & Text Analytics platform (Integrates ETL for document processing, OCR for images & PDF, named entity recognition for persons, organizations & locations, metadata management by thesaurus & ontologies, search user interface & search apps for fulltext search, faceted search & knowledge graph)

Stars: ✭ 386 (+577.19%)

Mutual labels: text-mining

Text mining resources

Resources for learning about Text Mining and Natural Language Processing

Stars: ✭ 358 (+528.07%)

Mutual labels: text-mining

Diff Match Patch

Diff Match Patch is a high-performance library in multiple languages that manipulates plain text.

Stars: ✭ 4,910 (+8514.04%)

Mutual labels: text-processing

Concise Ipython Notebooks For Deep Learning

IPython notebooks for solving problems such as classification, segmentation, and generation using the latest deep learning algorithms on various publicly available text and image datasets.

Stars: ✭ 23 (-59.65%)

Mutual labels: text-processing

Gsoc2018 3gm

💫 Automated codification of Greek Legislation with NLP

Stars: ✭ 36 (-36.84%)

Mutual labels: text-mining

Autophrase

AutoPhrase: Automated Phrase Mining from Massive Text Corpora

Stars: ✭ 835 (+1364.91%)

Mutual labels: text-mining

Graphbrain

Language, Knowledge, Cognition

Stars: ✭ 294 (+415.79%)

Mutual labels: text-mining

Pynlpl

PyNLPl, pronounced 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build simple language models. There are also more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL). There are also clients to interface with various NLP-specific servers. PyNLPl most notably features a very extensive library for working with FoLiA XML (Format for Linguistic Annotation).

Stars: ✭ 426 (+647.37%)

Mutual labels: text-processing

Bagofconcepts

Python implementation of bag-of-concepts

Stars: ✭ 18 (-68.42%)

Mutual labels: text-mining

Bsed

Simple SQL-like syntax on top of Perl text processing.

Stars: ✭ 414 (+626.32%)

Mutual labels: text-processing

Pyparsing

Python library for creating PEG parsers

Stars: ✭ 1,052 (+1745.61%)

Mutual labels: text-processing

Rmdl

RMDL: Random Multimodel Deep Learning for Classification

Stars: ✭ 375 (+557.89%)

Mutual labels: text-mining

Gohn

Hatena Notation (はてな記法) Parser written in Go

Stars: ✭ 17 (-70.18%)

Mutual labels: text-processing

Metasra Pipeline

MetaSRA: normalized sample-specific metadata for the Sequence Read Archive

Stars: ✭ 33 (-42.11%)

Mutual labels: text-mining

Rake Nltk

Python implementation of the Rapid Automatic Keyword Extraction algorithm using NLTK.

Stars: ✭ 793 (+1291.23%)

Mutual labels: text-mining

Rplos

R client for the PLoS Journals API

Stars: ✭ 289 (+407.02%)

Mutual labels: text-mining

Textract

extract text from any document. no muss. no fuss.

Stars: ✭ 3,165 (+5452.63%)

Mutual labels: text-mining

Textpipe

Textpipe: clean and extract metadata from text

Stars: ✭ 284 (+398.25%)

Mutual labels: text-processing

Lingua Franca

Mycroft's multilingual text parsing and formatting library

Stars: ✭ 51 (-10.53%)

Mutual labels: text-processing

Qp Trie Rs

An idiomatic and fast QP-trie implementation in pure Rust.

Stars: ✭ 47 (-17.54%)

Mutual labels: text-processing

Uc Davis Cs Exams Analysis

📈 Regression and Classification with UC Davis student quiz data and exam data

Stars: ✭ 33 (-42.11%)

Mutual labels: text-mining

Nlp In Practice

Starter code to solve real world text data problems. Includes: Gensim Word2Vec, phrase embeddings, Text Classification with Logistic Regression, word count with pyspark, simple text preprocessing, pre-trained embeddings and more.

Stars: ✭ 790 (+1285.96%)

Mutual labels: text-mining

2018 Machinelearning Lectures Esa

Machine Learning Lectures at the European Space Agency (ESA) in 2018

Stars: ✭ 280 (+391.23%)

Mutual labels: text-mining

Textmining

Python text mining system (Research of Text Mining System)

Stars: ✭ 268 (+370.18%)

Mutual labels: text-mining

Text2vec

Fast vectorization, topic modeling, distances and GloVe word embeddings in R.

Stars: ✭ 715 (+1154.39%)

Mutual labels: text-mining

Nlpython

This repository contains code for Natural Language Processing in Python. All of the code relates to my book, "Python Natural Language Processing".

Stars: ✭ 265 (+364.91%)

Mutual labels: text-mining

ArabicProcessingCog

A Python package that performs stemming, tokenization, sentence breaking, segmentation, normalization, and POS tagging for the Arabic language.

Stars: ✭ 19 (-66.67%)

Mutual labels: text-processing

Tidy Text Mining

Manuscript of the book "Tidy Text Mining with R" by Julia Silge and David Robinson

Stars: ✭ 961 (+1585.96%)

Mutual labels: text-mining

Bigartm

Fast topic modeling platform