This repository provides a baseline model for text classification: an LSTM-based model implemented in PyTorch. To make the model easier to understand, it is demonstrated on a Tweets dataset provided by Kaggle.

Stars: ✭ 45 (+181.25%)

Mutual labels: text-mining

TextDatasetCleaner

🔬 Cleaning junk out of datasets (normalization, preprocessing)

Stars: ✭ 27 (+68.75%)

Mutual labels: text-mining

perke

A keyphrase extractor for Persian

Stars: ✭ 60 (+275%)

Mutual labels: text-mining

textreadr

Tools to uniformly read in text data including semi-structured transcripts

Stars: ✭ 65 (+306.25%)

Mutual labels: text-mining

Answerable

Recommendation system for Stack Overflow unanswered questions

Stars: ✭ 13 (-18.75%)

Mutual labels: text-mining

thrones2vec

Using Word2Vec to explore semantic similarities between the entities of "A Song of Ice and Fire" ("Game of Thrones").

Stars: ✭ 27 (+68.75%)

Mutual labels: text-mining

koshort

(deprecated) 🐱 koshort is a Python package for Korean internet spoken language crawling and processing... or maybe Korean domestic cat.

Stars: ✭ 62 (+287.5%)

Mutual labels: text-mining

JoSH

[KDD 2020] Hierarchical Topic Mining via Joint Spherical Tree and Text Embedding

Stars: ✭ 55 (+243.75%)

Mutual labels: text-mining

pathvalidate

A Python library to sanitize/validate a string such as filenames/file-paths/etc.

Stars: ✭ 139 (+768.75%)

Mutual labels: sanitization

civicmine

Text mining cancer biomarkers for the CIVIC database

Stars: ✭ 19 (+18.75%)

Mutual labels: text-mining

Sanitize

Ruby HTML and CSS sanitizer.

Stars: ✭ 1,940 (+12025%)

Mutual labels: sanitization

odinson

Odinson is a powerful and highly optimized open-source framework for rule-based information extraction. Odinson couples a simple, yet powerful pattern language that can operate over multiple representations of text, with a runtime system that operates in near real time.

Stars: ✭ 59 (+268.75%)

Mutual labels: text-mining

Govalidator

[Go] Package of validators and sanitizers for strings, numerics, slices and structs

Stars: ✭ 5,163 (+32168.75%)

Mutual labels: sanitization

misinfo

📊 Tools to Perform ‘Misinformation’ Analysis on a Text Corpus (wrapper for methods in https://github.com/PDXBek/Misinformation)

Stars: ✭ 17 (+6.25%)

Mutual labels: text-mining

Udacity-Data-Analyst-Nanodegree

Repository for the projects needed to complete the Data Analyst Nanodegree.

Stars: ✭ 31 (+93.75%)

Mutual labels: text-mining

restaurant-finder-featureReviews

A Flask web application that helps users retrieve key restaurant information and feature-based reviews (generated by applying a market-basket model, the Apriori algorithm, and NLP to user reviews).

Stars: ✭ 21 (+31.25%)

Mutual labels: text-mining

Quran-and-Arabic-Language-Repository

Projects & Libraries related to Quran & Arabic Language

Stars: ✭ 26 (+62.5%)

Mutual labels: text-mining

lda2vec

lda2vec, from the paper "Mixing Dirichlet Topic Models and Word Embeddings to Make lda2vec" (https://arxiv.org/abs/1605.02019)

Stars: ✭ 27 (+68.75%)

Mutual labels: text-mining

VERSE

Vancouver Event and Relation System for Extraction