jeremygrace / amazon-reviews

Licence: other
Sentiment Analysis & Topic Modeling with Amazon Reviews

Programming Languages: Jupyter Notebook, Python

Projects that are alternatives to or similar to amazon-reviews

Quick-Data-Science-Experiments-2017
Quick-Data-Science-Experiments
Stars: ✭ 19 (-26.92%)
Mutual labels:  logistic-regression, lda, nmf
NMFADMM
A sparsity aware implementation of "Alternating Direction Method of Multipliers for Non-Negative Matrix Factorization with the Beta-Divergence" (ICASSP 2014).
Stars: ✭ 39 (+50%)
Mutual labels:  topic-modeling, lda, nmf
Sarcasm Detection
Detecting Sarcasm on Twitter using both traditional machine learning and deep learning techniques.
Stars: ✭ 73 (+180.77%)
Mutual labels:  sentiment-analysis, topic-modeling
Learning Social Media Analytics With R
This repository contains code and bonus content which will be added from time to time for the book "Learning Social Media Analytics with R" by Packt
Stars: ✭ 102 (+292.31%)
Mutual labels:  sentiment-analysis, topic-modeling
text-analysis
Weaving analytical stories from text data
Stars: ✭ 12 (-53.85%)
Mutual labels:  sentiment-analysis, topic-modeling
LinLP
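Practical natural language processing in Python, including new-word discovery, topic models, hidden Markov model part-of-speech tagging, Word2Vec, and sentiment analysis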
Stars: ✭ 43 (+65.38%)
Mutual labels:  sentiment-analysis, lda
Text mining resources
Resources for learning about Text Mining and Natural Language Processing
Stars: ✭ 358 (+1276.92%)
Mutual labels:  sentiment-analysis, topic-modeling
Amazon Product Recommender System
Sentiment analysis on Amazon Review Dataset available at http://snap.stanford.edu/data/web-Amazon.html
Stars: ✭ 158 (+507.69%)
Mutual labels:  sentiment-analysis, logistic-regression
Familia
A Toolkit for Industrial Topic Modeling
Stars: ✭ 2,499 (+9511.54%)
Mutual labels:  topic-modeling, lda
Machine-Learning-Models
In this repository I implement machine learning methods ranging from simple to complex, aiming for template-style code.
Stars: ✭ 30 (+15.38%)
Mutual labels:  logistic-regression, lda
TopicsExplorer
Explore your own text collection with a topic model – without prior knowledge.
Stars: ✭ 53 (+103.85%)
Mutual labels:  topic-modeling, lda
PlanSum
[AAAI2021] Unsupervised Opinion Summarization with Content Planning
Stars: ✭ 25 (-3.85%)
Mutual labels:  sentiment-analysis, amazon
converse
Conversational text Analysis using various NLP techniques
Stars: ✭ 147 (+465.38%)
Mutual labels:  sentiment-analysis, topic-modeling
Weibo Analyst
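Social media (Weibo) comment analysis toolbox in Chinese. Features: 1. Weibo comment scraping; 2. word segmentation and keyword extraction; 3. word clouds and word-frequency statistics; 4. sentiment analysis; 5. topic clustering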
Stars: ✭ 430 (+1553.85%)
Mutual labels:  sentiment-analysis, lda
Ldagibbssampling
Open Source Package for Gibbs Sampling of LDA
Stars: ✭ 218 (+738.46%)
Mutual labels:  topic-modeling, lda
Textclf
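TextClf: a text classification framework based on PyTorch/Sklearn that includes logistic regression, SVM, TextCNN, TextRNN, TextRCNN, DRNN, DPCNN, BERT, and other models; data processing, model training, and testing can all be done through simple configuration.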
Stars: ✭ 105 (+303.85%)
Mutual labels:  sentiment-analysis, logistic-regression
hlda
Gibbs sampler for the Hierarchical Latent Dirichlet Allocation topic model
Stars: ✭ 138 (+430.77%)
Mutual labels:  topic-modeling, lda
Lda Topic Modeling
A PureScript, browser-based implementation of LDA topic modeling.
Stars: ✭ 91 (+250%)
Mutual labels:  topic-modeling, lda
Sttm
Short Text Topic Modeling, JAVA
Stars: ✭ 100 (+284.62%)
Mutual labels:  topic-modeling, lda
Topic-Modeling-Workshop-with-R
A workshop on analyzing topic modeling (LDA, CTM, STM) using R
Stars: ✭ 51 (+96.15%)
Mutual labels:  topic-modeling, lda

amazon-review

Product Category:

Sports & Outdoors Reviews


Dataset:

The data was stored on AWS (S3) for storage and quick access:


![aws-s3](sentiment-topic_modeling/img/amazon_aws-s3.png)

Source: UC San Diego dataset repository, curated by Julian McAuley

Dataset size: 296,337 rows × 10 columns

| Column | Description |
| --- | --- |
| asin | Product ID |
| summary | Title of the review |
| reviewText | Written review |
| overall | Rating, 1-5 stars |
| reviewerID | Reviewer ID |
| reviewerName | Reviewer's name (no standard format) |
| helpful | Helpfulness rating of the review |
| reviewTime | Review date (YYYY-MM-DD) |
| unixReviewTime | Time of the review (Unix time) |
| pos_neg | (1) Positive for an overall rating of 4-5 or (2) Negative for 1-3 |
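A minimal sketch of how the derived `pos_neg` and `reviewTime` columns could be produced, assuming each raw review is a strict JSON object on its own line containing the `overall` and `unixReviewTime` fields listed above. The helper names, the sample record (including its `asin`), and the string labels `"positive"`/`"negative"` are hypothetical illustrations, not the project's actual code or label encoding.

```python
import json
from datetime import datetime, timezone


def label_sentiment(overall: float) -> str:
    """Map a 1-5 star rating to a sentiment label (hypothetical label strings)."""
    if not 1 <= overall <= 5:
        raise ValueError(f"rating out of range: {overall}")
    # 4-5 stars -> positive; 1-3 stars -> negative
    return "positive" if overall >= 4 else "negative"


def prepare_review(line: str) -> dict:
    """Parse one JSON-lines record and add the derived columns (assumed input format)."""
    record = json.loads(line)
    record["pos_neg"] = label_sentiment(record["overall"])
    # Normalize the review date to YYYY-MM-DD from the Unix timestamp, in UTC.
    record["reviewTime"] = datetime.fromtimestamp(
        record["unixReviewTime"], tz=timezone.utc
    ).strftime("%Y-%m-%d")
    return record


# Hypothetical sample record for illustration only.
sample = ('{"asin": "B000EXAMPLE", "overall": 5.0, "summary": "Great bottle", '
          '"reviewText": "Keeps water cold.", "unixReviewTime": 1388534400}')
rec = prepare_review(sample)
print(rec["pos_neg"], rec["reviewTime"])  # -> positive 2014-01-01
```

Deriving `reviewTime` from `unixReviewTime` keeps one authoritative timestamp and avoids parsing whatever date string format the raw dump uses.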

Project Outline:

Business motivation :
To stay competitive and keep innovating, e-commerce sites and outfitters must attract and retain loyal customers. One particularly effective approach in recent years has been to build personalized recommendation engines into their platforms or interfaces. Identifying the specific topics and sentiments associated with sports and outdoors products is essential for building such a recommendation engine. The main aim of this project is to apply the concepts covered in class to a specific domain.

Problem formulation :

  1. Explore and process the data to gain basic insights and prepare it for modeling.

  2. Find the classification model that works best with the data.

  3. Understand the topics and words that describe the broad categories of Sports and Outdoors products sold on Amazon.

  4. Given the data and model performance, determine the best course of action going forward.

Approach:

  • EDA

  • Preprocessing

  • Model data

    1. Classification / Sentiment Analysis

      • Logistic Regression
      • Multinomial Naive Bayes
    2. Clustering / Topic Modeling

      • Nonnegative Matrix Factorization (NMF)
      • Latent Dirichlet Allocation (LDA)
  • Summarize findings and propose further work
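For the preprocessing step, here is a small standard-library sketch of the two feature types compared in this project: raw term counts (the idea behind scikit-learn's `CountVectorizer`) and TF-IDF weights. The regex tokenizer, the tiny stop-word list, and the sample reviews are assumptions for illustration; the project's actual preprocessing may differ. The IDF formula mirrors scikit-learn's default smoothed IDF, without its L2 row normalization.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real pipelines use a fuller list).
STOP_WORDS = {"a", "an", "and", "the", "is", "it", "this", "to", "of", "for", "with"}


def tokenize(text: str) -> list[str]:
    """Lowercase, keep alphabetic tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def term_counts(docs: list[str]) -> list[Counter]:
    """Bag-of-words counts per document (similar in spirit to CountVectorizer)."""
    return [Counter(tokenize(d)) for d in docs]


def tfidf(docs: list[str]) -> list[dict[str, float]]:
    """Weight raw counts by smoothed inverse document frequency.

    idf = ln((1 + n) / (1 + df)) + 1, where df is the number of documents
    containing the term. No row normalization is applied here.
    """
    counts = term_counts(docs)
    n = len(docs)
    # Iterating a Counter yields each distinct term once, so this counts documents.
    df = Counter(term for c in counts for term in c)
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
    return [{t: tf * idf[t] for t, tf in c.items()} for c in counts]


# Hypothetical sample reviews.
docs = ["The tent is great for camping",
        "This rifle scope is great",
        "Great water bottle for camping"]
weights = tfidf(docs)
print(weights[0])
```

A term that appears in every review (here `great`) gets the minimum IDF of 1, while rarer, more distinctive terms are up-weighted, which is one reason TF-IDF features can surface more specific topic words than raw counts.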

Conclusion:

  • Surprisingly, the data is heavily biased and imbalanced toward shooting sports. Since these activities were not meant to be a main focus of the end product, more and different data is needed to build a model suited to the end goal.

  • Aside from the data itself, here is a summary of the modeling results:

Classification Summary:
* Logistic Regression (with CountVectorizer) performed best, with an F1 score of 94%.
* Multinomial NB with TF-IDF was a close second, with an F1 score of 92%.
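To make the classification step concrete, here is a from-scratch Multinomial Naive Bayes with Laplace smoothing plus a binary F1 score, using only the standard library. This is a swapped-in illustration, not the project's scikit-learn pipeline: it works on raw token counts rather than the TF-IDF features the reported NB model used, and the README does not say how its F1 scores were averaged, so `f1_score` here simply scores one chosen positive class. Function names and the toy training data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit log class priors and Laplace-smoothed log word likelihoods.

    docs: list of token lists; labels: list of class labels (same length).
    """
    class_docs = Counter(labels)
    word_counts = defaultdict(Counter)
    for tokens, y in zip(docs, labels):
        word_counts[y].update(tokens)
    vocab = {w for c in word_counts.values() for w in c}
    priors = {y: math.log(n / len(labels)) for y, n in class_docs.items()}
    likelihoods = {}
    for y, counts in word_counts.items():
        total = sum(counts.values()) + alpha * len(vocab)
        likelihoods[y] = {w: math.log((counts[w] + alpha) / total) for w in vocab}
    return priors, likelihoods


def predict_nb(model, tokens):
    """Return the class with the highest log posterior score."""
    priors, likelihoods = model
    scores = {}
    for y, prior in priors.items():
        # Words never seen in training are not in the vocabulary and are skipped.
        scores[y] = prior + sum(likelihoods[y][w] for w in tokens if w in likelihoods[y])
    return max(scores, key=scores.get)


def f1_score(y_true, y_pred, positive):
    """Binary F1 for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data.
train = [["love", "tent"], ["great", "bottle"], ["broke", "fast"], ["poor", "quality"]]
y = ["positive", "positive", "negative", "negative"]
model = train_multinomial_nb(train, y)
print(predict_nb(model, ["great", "tent"]))  # -> positive
```

Working in log space keeps the product of many small word probabilities from underflowing, and the smoothing term `alpha` keeps a word unseen in one class from zeroing out that class entirely.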

Clustering Summary:
* With term-frequency features, NMF and LDA performed about the same, and the results were only mediocre.
* NMF with TF-IDF performed best: it produced no obscure topics, and the model even correctly associated one topic's words with a specific brand (Nalgene).
* LDA with TF-IDF mostly retrieved unrelated words; however, it returned the most specific and unique words of all the models (e.g. brand names).
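A minimal sketch of NMF topic modeling: factor a nonnegative document-term matrix V (counts or TF-IDF weights) into document-topic weights W and topic-term weights H using Lee and Seung's multiplicative updates for the squared Frobenius error, then read each row of H as a topic. Plain Python lists keep it dependency-free; it is an assumed stand-in for whatever library the project used (scikit-learn's `NMF`, for instance, defaults to a coordinate-descent solver). The vocabulary, matrix, and helper names are invented for illustration.

```python
import random


def matmul(A, B):
    """Multiply an n x p matrix by a p x m matrix (lists of rows)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def nmf(V, k, iters=200, eps=1e-9, seed=0):
    """Factor nonnegative V (docs x terms) into W (docs x k) and H (k x terms)."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() for _ in range(k)] for _ in range(n)]
    H = [[rng.random() for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), elementwise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * nu / (de + eps) for h, nu, de in zip(hr, nr, dr)]
             for hr, nr, dr in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T), elementwise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * nu / (de + eps) for w, nu, de in zip(wr, nr, dr)]
             for wr, nr, dr in zip(W, num, den)]
    return W, H


def top_terms(H, vocab, n=3):
    """Return the n highest-weighted terms for each topic (row of H)."""
    return [[vocab[j] for j in sorted(range(len(row)), key=lambda j: -row[j])[:n]]
            for row in H]


# Hypothetical toy document-term matrix (4 reviews x 6 terms).
vocab = ["tent", "camping", "stove", "bottle", "water", "nalgene"]
V = [
    [2, 2, 1, 0, 0, 0],
    [4, 4, 2, 0, 0, 0],
    [0, 0, 0, 1, 2, 2],
    [0, 0, 0, 2, 4, 4],
]
W, H = nmf(V, k=2)
print(top_terms(H, vocab))
```

Because every update multiplies by a nonnegative ratio, W and H stay nonnegative, which is what lets the rows of H be read as additive groups of words.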

Further work:

  • Build a web scraper with Beautiful Soup to gather more relevant, unbiased reviews from a sports retailer such as REI or Dick's Sporting Goods.

  • Focus on categorizing by sports and outdoor activities to build a better classification model that excludes shooting sports.

  • Incorporate word2vec or lda2vec.
