The aim of this repository is to show a baseline model for text classification by implementing an LSTM-based model in PyTorch. To give a better understanding of the model, it uses a Tweets dataset provided by Kaggle.

Stars: ✭ 45 (+200%)

Mutual labels: text-mining, text-processing

SparseLSH

A Locality Sensitive Hashing (LSH) library with an emphasis on large, high-dimensional datasets.

Stars: ✭ 127 (+746.67%)

Mutual labels: text-mining, data-mining

Textract

extract text from any document. no muss. no fuss.

Stars: ✭ 3,165 (+21000%)

Mutual labels: text-mining, data-mining

TextDatasetCleaner

🔬 Cleaning datasets of junk (normalization, preprocessing)

Stars: ✭ 27 (+80%)

Mutual labels: text-mining, text-processing

support-tickets-classification

This case study shows how to create a model for text analysis and classification and deploy it as a web service in the Azure cloud to automatically classify support tickets. This project is a proof of concept made by Microsoft (Commercial Software Engineering team) in collaboration with Endava http://endava.com/en

Stars: ✭ 142 (+846.67%)

Mutual labels: text-mining, text-processing

Applied Text Mining In Python

Repo for Applied Text Mining in Python (Coursera) by the University of Michigan

Stars: ✭ 59 (+293.33%)

Mutual labels: text-mining, text-processing

Text mining resources

Resources for learning about Text Mining and Natural Language Processing

Stars: ✭ 358 (+2286.67%)

Mutual labels: text-mining, data-mining

Rmdl

RMDL: Random Multimodel Deep Learning for Classification

Stars: ✭ 375 (+2400%)

Mutual labels: text-mining, data-mining

Text-Analysis

Explaining textual analysis tools in Python. Including Preprocessing, Skip Gram (word2vec), and Topic Modelling.

Stars: ✭ 48 (+220%)

Mutual labels: text-mining, text-processing

Textcluster

Short text clustering preprocessing module

Stars: ✭ 115 (+666.67%)

Mutual labels: text-mining, text-processing

Tadw

An implementation of "Network Representation Learning with Rich Text Information" (IJCAI '15).

Stars: ✭ 43 (+186.67%)

Mutual labels: text-mining, data-mining

Pipeit

PipeIt is a text transformation, conversion, cleansing and extraction tool.

Stars: ✭ 57 (+280%)

Mutual labels: text-mining, text-processing

Gwu data mining

Materials for GWU DNSC 6279 and DNSC 6290.

Stars: ✭ 217 (+1346.67%)

Mutual labels: text-mining, data-mining

Pyss3

A Python package implementing a new machine learning model for text classification with visualization tools for Explainable AI

Stars: ✭ 191 (+1173.33%)

Mutual labels: text-mining, data-mining

Qminer

Analytic platform for real-time large-scale streams containing structured and unstructured data.

Stars: ✭ 206 (+1273.33%)

Mutual labels: text-mining, data-mining

text-analysis

Weaving analytical stories from text data

Stars: ✭ 12 (-20%)

Mutual labels: text-mining, text-processing

estratto

Parsing fixed-width file content made easy

Stars: ✭ 12 (-20%)

Mutual labels: text-mining, text-processing

iis

Information Inference Service of the OpenAIRE system

Stars: ✭ 16 (+6.67%)

Mutual labels: text-mining, data-mining

tensorflow-ml-nlp-tf2

Practice materials for "Natural Language Processing Starting with TensorFlow 2 and Machine Learning (from Logistic Regression to BERT and GPT3)"

Stars: ✭ 245 (+1533.33%)

Mutual labels: korean-text-processing, korean-nlp

deduce

Deduce: de-identification method for Dutch medical text

Stars: ✭ 40 (+166.67%)

Mutual labels: text-mining, text-processing

hangul-search-js

🇰🇷 Simple Korean text search module

Stars: ✭ 22 (+46.67%)

Mutual labels: korean-text-processing, korean-nlp

PyKOMORAN

(Beta) PyKOMORAN is a Python wrapper for KOMORAN built with Py4J.

Stars: ✭ 38 (+153.33%)

Mutual labels: korean-text-processing, korean-nlp

tf-idf-python

Term frequency–inverse document frequency for Chinese novels/documents, implemented in Python.

Stars: ✭ 98 (+553.33%)

Mutual labels: text-mining, data-mining

Awesome-DataScience-Cheatsheets

Collection of cheatsheets for data science, machine learning and deep learning :).

Stars: ✭ 48 (+220%)

Mutual labels: data-mining

Tencent2017 Final Rank28 code

Rank 28 code for the first Tencent Social Ads College Algorithm Contest (2017)

Stars: ✭ 85 (+466.67%)

Mutual labels: data-mining

4chanMarkovText

Text Generation using Markov Chains fed by 4chan APIs

Stars: ✭ 28 (+86.67%)

Mutual labels: data-mining

kmeans

A simple implementation of the K-means (and Bisecting K-means) clustering algorithms in Python

Stars: ✭ 18 (+20%)

Mutual labels: data-mining

kasthack.osp

Generator of raw VK user dumps.

Stars: ✭ 15 (+0%)

Mutual labels: data-mining

CS259D Notes HW cn

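These notes expand on the papers and lecture notes covered in the CS 259D course; reading the original papers and lecture notes is recommended.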

Stars: ✭ 63 (+320%)

Mutual labels: data-mining

pathpy

pathpy is an open-source Python package for the modeling and analysis of pathways and temporal networks using higher-order and multi-order graphical models

Stars: ✭ 124 (+726.67%)

Mutual labels: data-mining

civicmine

Text mining cancer biomarkers for the CIVIC database

Stars: ✭ 19 (+26.67%)

Mutual labels: text-mining

Guten-gutter

Strips boilerplate from Project Gutenberg text files

Stars: ✭ 16 (+6.67%)

Mutual labels: text-mining

Network-Embedding-Resources

Network Embedding Survey and Resources

Stars: ✭ 43 (+186.67%)

Mutual labels: data-mining

KoEDA

Korean Easy Data Augmentation

Stars: ✭ 62 (+313.33%)

Mutual labels: korean-nlp

g2pK

g2pK: g2p module for Korean

Stars: ✭ 137 (+813.33%)

Mutual labels: korean-nlp

genie

Genie: A Fast and Robust Hierarchical Clustering Algorithm (this R package has now been superseded by genieclust)

Stars: ✭ 21 (+40%)

Mutual labels: data-mining

KoELECTRA-Pipeline

Transformers Pipeline with KoELECTRA

Stars: ✭ 37 (+146.67%)

Mutual labels: korean-nlp

lda2vec

Mixing Dirichlet Topic Models and Word Embeddings to Make lda2vec from this paper https://arxiv.org/abs/1605.02019

Stars: ✭ 27 (+80%)

Mutual labels: text-mining

BLUELAY

Searches online paste sites for certain search terms which can indicate a possible data breach.

Stars: ✭ 24 (+60%)

Mutual labels: data-mining

hck

A sharp cut(1) clone.

Stars: ✭ 542 (+3513.33%)

Mutual labels: text-processing

andaluh-js

Transliterate español (Spanish) spelling to andaluz proposals using JavaScript

Stars: ✭ 22 (+46.67%)

Mutual labels: text-processing

learning2hash.github.io

Website for "A survey of learning to hash for Computer Vision" https://learning2hash.github.io

Stars: ✭ 14 (-6.67%)

Mutual labels: text-mining

FSCNMF

An implementation of "Fusing Structure and Content via Non-negative Matrix Factorization for Embedding Information Networks".

Stars: ✭ 16 (+6.67%)

Mutual labels: data-mining

restaurant-finder-featureReviews

Build a Flask web application to help users retrieve key restaurant information and feature-based reviews (generated by applying market-basket model – Apriori algorithm and NLP on user reviews).

Stars: ✭ 21 (+40%)

Mutual labels: text-mining

machine learning in python

Demo of basic machine learning models in Python with Jupyter Notebook

Stars: ✭ 16 (+6.67%)

Mutual labels: data-mining

DataCon

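🏆 DataCon Big Data Security Analysis Competition: champion source code for 2019 Track 2 (malicious code detection) and third-place source code for 2020 Track 5 (malicious code analysis)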

Stars: ✭ 69 (+360%)

Mutual labels: data-mining

stringx

Drop-in replacements for base R string functions powered by stringi

Stars: ✭ 14 (-6.67%)

Mutual labels: text-processing

Hefei ECG TOP1

"Hefei High-Tech Cup" ECG Human-Machine Intelligence Competition: ECG abnormal event prediction, TOP1 solution

Stars: ✭ 109 (+626.67%)

Mutual labels: data-mining

textdigester

TextDigester: document summarization Java library

Stars: ✭ 23 (+53.33%)

Mutual labels: text-mining

ipo-miner

IPO Investment via Text Mining.

Stars: ✭ 20 (+33.33%)

Mutual labels: text-mining

pwsh-prelude

PowerShell “standard” library for supercharging your productivity. Provides a powerful cross-platform scripting environment enabling efficient analysis and sustainable science in myriad contexts.

Stars: ✭ 26 (+73.33%)

Mutual labels: text-processing

1-60 of 530 similar projects
