
pesoto / Text-Analysis

Licence: other
Explaining textual analysis tools in Python, including preprocessing, Skip Gram (word2vec), and topic modelling.

Programming Languages

Jupyter Notebook: 11,667 projects
Python: 139,335 projects (#7 most used programming language)

Projects that are alternatives of or similar to Text-Analysis

lda2vec
Mixing Dirichlet Topic Models and Word Embeddings to Make lda2vec, from the paper https://arxiv.org/abs/1605.02019
Stars: ✭ 27 (-43.75%)
Mutual labels:  text-mining, word2vec, word-embeddings, lda
text-mining-corona-articles
Text Mining for Indonesian Online News Articles About Corona
Stars: ✭ 15 (-68.75%)
Mutual labels:  text-mining, word2vec, web-scraping
PyLDA
A Latent Dirichlet Allocation implementation in Python.
Stars: ✭ 51 (+6.25%)
Mutual labels:  lda, latent-dirichlet-allocation, gibbs-sampling
kwx
BERT, LDA, and TFIDF based keyword extraction in Python
Stars: ✭ 33 (-31.25%)
Mutual labels:  text-mining, lda, latent-dirichlet-allocation
Chameleon recsys
Source code of CHAMELEON - A Deep Learning Meta-Architecture for News Recommender Systems
Stars: ✭ 202 (+320.83%)
Mutual labels:  word2vec, word-embeddings, lstm
Text2vec
Fast vectorization, topic modeling, distances and GloVe word embeddings in R.
Stars: ✭ 715 (+1389.58%)
Mutual labels:  text-mining, word2vec, word-embeddings
Scattertext
Beautiful visualizations of how language differs among document types.
Stars: ✭ 1,722 (+3487.5%)
Mutual labels:  text-mining, word2vec, word-embeddings
Shallowlearn
An experiment in re-implementing supervised learning models based on shallow neural network approaches (e.g. fastText), with some additional exclusive features and a nice API. Written in Python and fully compatible with scikit-learn.
Stars: ✭ 196 (+308.33%)
Mutual labels:  text-mining, word2vec, word-embeddings
NMFADMM
A sparsity aware implementation of "Alternating Direction Method of Multipliers for Non-Negative Matrix Factorization with the Beta-Divergence" (ICASSP 2014).
Stars: ✭ 39 (-18.75%)
Mutual labels:  word2vec, lda, word-embedding
models-by-example
By-hand code for models and algorithms. An update to the 'Miscellaneous-R-Code' repo.
Stars: ✭ 43 (-10.42%)
Mutual labels:  expectation-maximization, gradient-descent
NTUA-slp-nlp
💻Speech and Natural Language Processing (SLP & NLP) Lab Assignments for ECE NTUA
Stars: ✭ 19 (-60.42%)
Mutual labels:  word2vec, word-embeddings
fsauor2018
Fine-grained sentiment analysis of Chinese reviews using an LSTM network with self-attention.
Stars: ✭ 36 (-25%)
Mutual labels:  word2vec, lstm
sentiment-analysis-of-tweets-in-russian
Sentiment analysis of tweets in Russian using Convolutional Neural Networks (CNN) with Word2Vec embeddings.
Stars: ✭ 51 (+6.25%)
Mutual labels:  word2vec, word-embeddings
support-tickets-classification
This case study shows how to create a model for text analysis and classification and deploy it as a web service in Azure cloud in order to automatically classify support tickets. This project is a proof of concept made by Microsoft (Commercial Software Engineering team) in collaboration with Endava http://endava.com/en
Stars: ✭ 142 (+195.83%)
Mutual labels:  text-mining, text-processing
wikidata-corpus
Train Wikidata with word2vec for word embedding tasks
Stars: ✭ 109 (+127.08%)
Mutual labels:  word2vec, word-embeddings
TextDatasetCleaner
🔬 Cleaning datasets of junk text (normalization, preprocessing).
Stars: ✭ 27 (-43.75%)
Mutual labels:  text-mining, text-processing
SentimentAnalysis
Sentiment Analysis: Deep Bi-LSTM+attention model
Stars: ✭ 32 (-33.33%)
Mutual labels:  word-embeddings, lstm
restaurant-finder-featureReviews
Build a Flask web application that helps users retrieve key restaurant information and feature-based reviews (generated by applying the Apriori market-basket algorithm and NLP to user reviews).
Stars: ✭ 21 (-56.25%)
Mutual labels:  text-mining, web-scraping
SWDM
SIGIR 2017: Embedding-based query expansion for weighted sequential dependence retrieval model
Stars: ✭ 35 (-27.08%)
Mutual labels:  word2vec, word-embeddings
word embedding
Sample code for training Word2Vec and FastText on a wiki corpus, plus their pretrained word embeddings.
Stars: ✭ 21 (-56.25%)
Mutual labels:  word2vec, word-embeddings

Text-Analysis

This is not a module for large-scale use, but rather a set of scripts that explain popular methodologies in text analysis, including web scraping, preprocessing, Skip Gram (word2vec), and topic modelling.

1. Web Scraping

How can I download text data from a website algorithmically using Python? How do I store the data in a CSV file for later use?

Web_Scraping.py: explains how to download movie quotes and store the data neatly in a table using the Pandas Python module.
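
As a rough sketch of that workflow (not the repo's script), the example below uses requests, BeautifulSoup, and pandas; the URL and the CSS selectors for the quote and author elements are hypothetical placeholders.

```python
# Minimal scrape-then-tabulate sketch (not Web_Scraping.py).
# The URL and CSS selectors below are hypothetical placeholders.
import requests
from bs4 import BeautifulSoup
import pandas as pd

URL = "https://example.com/movie-quotes"  # placeholder page

response = requests.get(URL, timeout=10)
response.raise_for_status()
soup = BeautifulSoup(response.text, "html.parser")

# Assume each quote sits in <div class="quote"> with <span class="text"> and <span class="author">.
rows = []
for block in soup.select("div.quote"):
    text = block.select_one("span.text")
    author = block.select_one("span.author")
    if text and author:
        rows.append({"quote": text.get_text(strip=True),
                     "author": author.get_text(strip=True)})

# Store the scraped rows neatly in a table and persist to CSV for later use.
df = pd.DataFrame(rows)
df.to_csv("quotes.csv", index=False)
print(df.head())
```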

2. Preprocessing

How are documents and words represented in Python? How can I clean text in Python by removing unnecessary words and adjusting for infrequent words?

Text_Preprocessing.py: explains common ways of representing text data in Python (one-hot encoded vectors and TF-IDF weights) and of cleaning it (stopword removal and lowercasing).
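
For a compressed illustration of those steps, the sketch below leans on scikit-learn's vectorizers instead of the from-scratch code in Text_Preprocessing.py; the three sample sentences are made up for the example.

```python
# Preprocessing sketch using scikit-learn (not the repo's from-scratch code).
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = [
    "The striker scored a late goal in the football match.",
    "The central bank raised interest rates again.",
    "Fans celebrated the football club's league title.",
]

# Lowercasing and English stopword removal happen inside the vectorizers.
# binary=True gives one-hot style document vectors (word present / absent).
onehot = CountVectorizer(lowercase=True, stop_words="english", binary=True)
X_onehot = onehot.fit_transform(docs)
print(onehot.get_feature_names_out())
print(X_onehot.toarray())

# TF-IDF down-weights words that appear in many documents and
# up-weights words that are distinctive for a particular document.
tfidf = TfidfVectorizer(lowercase=True, stop_words="english")
X_tfidf = tfidf.fit_transform(docs)
print(X_tfidf.toarray().round(2))
```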

3. EM-Algorithm

How can I discover the topics of documents, i.e. how much one article is about sports, another about business, and so on?

EM_Algorithm.py: explains how to estimate a distribution using the EM-Algorithm. This is a precursor to the topic modelling example.
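
To make the E-step/M-step loop concrete before reading the script, here is an independent toy example that fits a two-component one-dimensional Gaussian mixture with EM in NumPy; the data and starting values are arbitrary.

```python
# Toy EM for a two-component 1-D Gaussian mixture (independent of EM_Algorithm.py).
import numpy as np

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-2.0, 1.0, 200), rng.normal(3.0, 1.0, 200)])

# Initial guesses for mixing weights, means, and variances.
pi = np.array([0.5, 0.5])
mu = np.array([-1.0, 1.0])
var = np.array([1.0, 1.0])

def gaussian_pdf(x, mean, variance):
    return np.exp(-0.5 * (x - mean) ** 2 / variance) / np.sqrt(2 * np.pi * variance)

for _ in range(50):
    # E-step: responsibility of each component for each data point.
    likelihood = np.stack([pi[k] * gaussian_pdf(data, mu[k], var[k]) for k in range(2)], axis=1)
    resp = likelihood / likelihood.sum(axis=1, keepdims=True)

    # M-step: re-estimate parameters from the soft assignments.
    nk = resp.sum(axis=0)
    pi = nk / len(data)
    mu = (resp * data[:, None]).sum(axis=0) / nk
    var = (resp * (data[:, None] - mu) ** 2).sum(axis=0) / nk

print("weights:", pi.round(2), "means:", mu.round(2), "variances:", var.round(2))
```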

4. Gibbs Sampling

How can I discover the topics of documents, i.e. how much one article is about sports, another about business, and so on?

Gibbs_Sampling.py: explains how Gibbs sampling works in the context of topic modelling.
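
The sketch below is a toy collapsed Gibbs sampler for LDA on a three-document corpus, meant only to show the count-update loop; the corpus, hyperparameters, and number of sweeps are illustrative and unrelated to Gibbs_Sampling.py.

```python
# Toy collapsed Gibbs sampler for LDA (independent of Gibbs_Sampling.py).
import numpy as np

docs = [
    ["football", "goal", "match", "team"],
    ["bank", "market", "stocks", "profit"],
    ["team", "match", "stocks", "bank"],
]
vocab = sorted({w for d in docs for w in d})
word_id = {w: i for i, w in enumerate(vocab)}

K, V, D = 2, len(vocab), len(docs)
alpha, beta = 0.1, 0.01
rng = np.random.default_rng(1)

# Random initial topic assignment for every token, plus the count tables.
z = [[rng.integers(K) for _ in doc] for doc in docs]
doc_topic = np.zeros((D, K))
topic_word = np.zeros((K, V))
topic_total = np.zeros(K)
for d, doc in enumerate(docs):
    for i, w in enumerate(doc):
        k = z[d][i]
        doc_topic[d, k] += 1
        topic_word[k, word_id[w]] += 1
        topic_total[k] += 1

for _ in range(200):  # Gibbs sweeps
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k, v = z[d][i], word_id[w]
            # Remove the current token's counts before resampling its topic.
            doc_topic[d, k] -= 1
            topic_word[k, v] -= 1
            topic_total[k] -= 1
            # Full conditional: p(topic | all other assignments).
            p = (doc_topic[d] + alpha) * (topic_word[:, v] + beta) / (topic_total + V * beta)
            k = rng.choice(K, p=p / p.sum())
            z[d][i] = k
            doc_topic[d, k] += 1
            topic_word[k, v] += 1
            topic_total[k] += 1

# Estimated document-topic proportions, e.g. "how much is doc 0 about topic 0?"
print((doc_topic + alpha) / (doc_topic + alpha).sum(axis=1, keepdims=True))
```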

5. Skip Gram

How can I find which words in my documents are related to each other syntactically and semantically? How does a basic neural network work?

Skip_Gram.py: explains how the Skip Gram model from Mikolov et al. works (with gradient descent and no negative sampling).
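
As a reference point, here is a compact NumPy sketch of skip-gram trained with a full softmax and plain gradient descent (no negative sampling); the tiny corpus, embedding size, and learning rate are toy values, not those used in Skip_Gram.py.

```python
# Compact skip-gram with full softmax and gradient descent (toy example).
import numpy as np

corpus = "the dog chased the cat the cat chased the mouse".split()
vocab = sorted(set(corpus))
word_id = {w: i for i, w in enumerate(vocab)}
V, dim, window, lr = len(vocab), 10, 2, 0.05

# (center, context) training pairs from a sliding window.
pairs = []
for i, w in enumerate(corpus):
    for j in range(max(0, i - window), min(len(corpus), i + window + 1)):
        if j != i:
            pairs.append((word_id[w], word_id[corpus[j]]))

rng = np.random.default_rng(0)
W_in = rng.normal(scale=0.1, size=(V, dim))   # input (center word) embeddings
W_out = rng.normal(scale=0.1, size=(dim, V))  # output (context word) weights

for epoch in range(200):
    for center, context in pairs:
        h = W_in[center]                 # hidden layer = center word embedding
        scores = h @ W_out               # unnormalized scores over the vocabulary
        probs = np.exp(scores - scores.max())
        probs /= probs.sum()             # softmax over all words (no negative sampling)
        grad = probs.copy()
        grad[context] -= 1.0             # d(loss)/d(scores) for cross-entropy
        grad_h = W_out @ grad            # gradient w.r.t. the center embedding
        grad_out = np.outer(h, grad)     # gradient w.r.t. the output weights
        # Gradient descent updates for both weight matrices.
        W_in[center] -= lr * grad_h
        W_out -= lr * grad_out

print("embedding for 'cat':", W_in[word_id["cat"]].round(2))
```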

6. Long Short Term Memory

How can I develop a language model with memory? How does backpropagation (gradient descent) through time work?

LSTM_Tutorial.py: explains the backpropagation of an LSTM model. Extends the code from Nicolas Jimenez to train a language model with memory.
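
To show the quantities that backpropagation through time differentiates, here is a minimal NumPy forward pass through a single LSTM cell with toy dimensions; it is independent of both LSTM_Tutorial.py and Nicolas Jimenez's code.

```python
# Forward pass of one LSTM cell over a short sequence (toy dimensions).
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
input_dim, hidden_dim, steps = 4, 3, 5

# One weight matrix per gate, acting on the concatenated [h_prev, x_t].
Wf, Wi, Wc, Wo = (rng.normal(scale=0.1, size=(hidden_dim, hidden_dim + input_dim)) for _ in range(4))
bf = bi = bc = bo = np.zeros(hidden_dim)

h = np.zeros(hidden_dim)  # hidden state
c = np.zeros(hidden_dim)  # cell state ("memory")

xs = rng.normal(size=(steps, input_dim))
for x in xs:
    z = np.concatenate([h, x])
    f = sigmoid(Wf @ z + bf)          # forget gate: what to erase from memory
    i = sigmoid(Wi @ z + bi)          # input gate: what to write to memory
    c_tilde = np.tanh(Wc @ z + bc)    # candidate memory content
    o = sigmoid(Wo @ z + bo)          # output gate: what to expose as h
    c = f * c + i * c_tilde           # update the cell state
    h = o * np.tanh(c)                # new hidden state
    print(h.round(3))
```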

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].