
haoopeng / Cnn Yelp Challenge 2016 Sentiment Classification

IPython notebooks for training a word-level Convolutional Neural Network model for sentiment classification on the Yelp Challenge 2016 review dataset.

Labels

jupyter-notebook deep-learning artificial-intelligence sentiment-classification cnn-model

Projects that are alternatives of or similar to Cnn Yelp Challenge 2016 Sentiment Classification

Nlp Tutorial

A list of NLP(Natural Language Processing) tutorials

Stars: ✭ 1,188 (+1020.75%)

Mutual labels: jupyter-notebook, sentiment-classification

60 days rl challenge

Chinese version of 60_Days_RL_Challenge

Stars: ✭ 92 (-13.21%)

Mutual labels: artificial-intelligence, jupyter-notebook

Mit Deep Learning

Tutorials, assignments, and competitions for MIT Deep Learning related courses.

Stars: ✭ 8,912 (+8307.55%)

Mutual labels: artificial-intelligence, jupyter-notebook

Computervision Recipes

Best Practices, code samples, and documentation for Computer Vision.

Stars: ✭ 8,214 (+7649.06%)

Mutual labels: artificial-intelligence, jupyter-notebook

Tia

Your Advanced Twitter stalking tool

Stars: ✭ 98 (-7.55%)

Mutual labels: jupyter-notebook, sentiment-classification

Brihaspati

Collection of various implementations and Codes in Machine Learning, Deep Learning and Computer Vision ✨💥

Stars: ✭ 53 (-50%)

Mutual labels: artificial-intelligence, jupyter-notebook

Ai Dl Enthusiasts Meetup

AI & Deep Learning Enthusiasts Meetup Project & Study Sessions

Stars: ✭ 90 (-15.09%)

Mutual labels: artificial-intelligence, jupyter-notebook

Gaze Estimation

A deep learning based gaze estimation framework implemented with PyTorch

Stars: ✭ 33 (-68.87%)

Mutual labels: artificial-intelligence, jupyter-notebook

Rlai Exercises

Exercise Solutions for Reinforcement Learning: An Introduction [2nd Edition]

Stars: ✭ 97 (-8.49%)

Mutual labels: artificial-intelligence, jupyter-notebook

Person remover

People removal in images using Pix2Pix and YOLO.

Stars: ✭ 96 (-9.43%)

Mutual labels: artificial-intelligence, jupyter-notebook

Machine Learning From Scratch

Succinct Machine Learning algorithm implementations from scratch in Python, solving real-world problems (Notebooks and Book). Examples of Logistic Regression, Linear Regression, Decision Trees, K-means clustering, Sentiment Analysis, Recommender Systems, Neural Networks and Reinforcement Learning.

Stars: ✭ 42 (-60.38%)

Mutual labels: artificial-intelligence, jupyter-notebook

Deep Image Analogy Pytorch

Visual Attribute Transfer through Deep Image Analogy in PyTorch!

Stars: ✭ 100 (-5.66%)

Mutual labels: artificial-intelligence, jupyter-notebook

Coursera Natural Language Processing Specialization

Programming assignments from all courses in the Coursera Natural Language Processing Specialization offered by deeplearning.ai.

Stars: ✭ 39 (-63.21%)

Mutual labels: artificial-intelligence, jupyter-notebook

Notebooks

Some notebooks

Stars: ✭ 53 (-50%)

Mutual labels: artificial-intelligence, jupyter-notebook

True artificial intelligence

True AI (artificial intelligence)

Stars: ✭ 38 (-64.15%)

Mutual labels: artificial-intelligence, jupyter-notebook

Phormatics

Using A.I. and computer vision to build a virtual personal fitness trainer. (Most Startup-Viable Hack - HackNYU2018)

Stars: ✭ 79 (-25.47%)

Mutual labels: artificial-intelligence, jupyter-notebook

Deep Learning Experiments

Notes and experiments to understand deep learning concepts

Stars: ✭ 883 (+733.02%)

Mutual labels: artificial-intelligence, jupyter-notebook

Particle Filter Prototype

Particle Filter Implementations in Python and C++, with lecture notes and visualizations

Stars: ✭ 29 (-72.64%)

Mutual labels: artificial-intelligence, jupyter-notebook

Ds With Pysimplegui

Data science and Machine Learning GUI programs/ desktop apps with PySimpleGUI package

Stars: ✭ 93 (-12.26%)

Mutual labels: artificial-intelligence, jupyter-notebook

Recommenders

Best Practices on Recommendation Systems

Stars: ✭ 11,818 (+11049.06%)

Mutual labels: artificial-intelligence, jupyter-notebook


CNN-yelp-challenge-2016-sentiment-classification

This repository trains a word-level Convolutional Neural Network model for sentiment classification on the Yelp Challenge 2016 dataset using standard deep learning packages.

The task is defined on the yelp_academic_dataset_review.json file (5 million rows) from the challenge. Two of its fields are used: "stars" and "text". The "text" field is the customer's raw review text, and the "stars" field is the customer's rating for that review, ranging from 1 to 5.
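As a rough illustration of how these two fields might be pulled out of the file, here is a minimal sketch using only the Python standard library. It assumes the file stores one JSON object per line; the function name reviews_to_csv is hypothetical and this is not the code in json-csv.py.

```python
import csv
import json


def reviews_to_csv(json_path, csv_path):
    """Copy the "stars" and "text" fields of each JSON line into a CSV file.

    Assumption: the input holds one JSON object per line.
    Returns the number of review rows written.
    """
    written = 0
    with open(json_path, encoding="utf-8") as src, \
         open(csv_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        writer.writerow(["stars", "text"])  # header row
        for line in src:
            line = line.strip()
            if not line:  # skip blank lines
                continue
            record = json.loads(line)
            # Missing fields become empty cells, so they can be dropped later.
            writer.writerow([record.get("stars"), record.get("text")])
            written += 1
    return written
```

Calling reviews_to_csv("yelp_academic_dataset_review.json", "review.csv") would then produce a two-column file with the same name the preprocessing script uses.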

The model architecture is described in the Components section. For the first layer, I experimented with both word2vec embeddings and Keras' built-in embedding layer.

In order to train the model in a reasonable time, I randomly sampled 1 million data points and ended up with 399850 samples after removing missing values. The class distribution of this subset is shown in Table 1.

Table 1: Class distribution of the sampled subset
Stars	1	2	3	4	5
Reviews	46906	34283	50678	106067	161916
Share	11.7%	8.6%	12.7%	26.5%	40.5%

I applied the model to a binary classification task and a multi-class classification task.

In the binary setting, reviews with more than 2 stars are regarded as positive samples, and all others as negative ones. After 2 epochs of training, the model reached 77.91% training accuracy and 77.61% validation accuracy (see the Components section).

For the multi-class classification task, it achieved ~40% validation accuracy after 1 epoch of training. The result is shown in train_multi_class.ipynb.
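The labeling rule can be sketched in a few lines of Python. The per-star counts are copied from the class distribution table above; the function names and the threshold parameter are illustrative and do not come from the notebooks.

```python
# Review counts per star rating, copied from the class distribution table.
STAR_COUNTS = {1: 46906, 2: 34283, 3: 50678, 4: 106067, 5: 161916}


def to_binary_label(stars, threshold=2):
    """Return 1 (positive) if stars > threshold, else 0 (negative)."""
    return 1 if stars > threshold else 0


def class_shares(counts):
    """Return the fraction of reviews that fall into each binary class."""
    total = sum(counts.values())
    per_label = {0: 0, 1: 0}
    for stars, n in counts.items():
        per_label[to_binary_label(stars)] += n
    return {label: n / total for label, n in per_label.items()}


shares = class_shares(STAR_COUNTS)
print(f"negative: {shares[0]:.1%}, positive: {shares[1]:.1%}")
# prints: negative: 20.3%, positive: 79.7%
```

Under this rule roughly four in five sampled reviews are positive, which is a useful reference point when reading the accuracy figures.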

Feel free to continue my work, and let me know if you obtain better results!

Requirements

Keras: pip install keras (1.0.3)
Theano: pip install theano (0.8.0.dev0)

Components

This repository contains the following components:

json-csv.py: The data preprocessing script. It converts the yelp_academic_dataset_review.json file to a csv file (named review.csv).
Word2VecUtility.py: Borrowed from Kaggle's word2vec tutorial. It segments a sentence into a word list or a sentence list.
word2vec_model.ipynb: Trains a word2vec model on the review data. Each word is represented by a 300-dimensional vector. The trained model is named 300features_40minwords_10context.
train_with_word2vec_embedding.ipynb: Trains a 1D CNN for sentiment classification using the word2vec embedding (the embedded dataset has a shape of (N, 50, 300), see the Details section). Unfortunately, my machine was unable to finish the training stage due to memory issues, so I switched to Keras' built-in embedding layer instead.
train_keras_embedding.ipynb: Trains a model similar to the previous one. The only difference is the embedding layer. The architecture of this model is: Embedding layer - Dropout - Convolution1D - MaxPooling1D - Fully connected layer - Dropout - ReLU activation - Sigmoid (with binary cross-entropy loss). It was trained on 319880 samples and validated on 79970 samples (train acc: 0.7791 and val_acc: 0.7761 after 2 epochs of training).
train_multi_class.ipynb: Trains a multi-class classification model with the same architecture on the same subset. It achieved ~40% validation accuracy after 1 epoch of training.
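
The layer sequence listed for train_keras_embedding.ipynb could look roughly like the sketch below. It is written against the current Keras API (the repository targets Keras 1.0.3, where some layer names differ), and every hyperparameter (vocabulary size, filter count, kernel size, dropout rates, optimizer) is an assumption chosen for illustration, not a value taken from the notebook. The Flatten layer and the ReLU inside the convolution are also assumptions, added to make the sketch a complete model.

```python
import keras
from keras import layers

# Illustrative assumptions, not the notebook's actual settings.
VOCAB_SIZE = 5000 + 3   # 5000 words plus three reserved indices
MAX_LENGTH = 50         # words per review after padding/truncation
EMBED_DIM = 300         # size of each word vector


def build_model():
    """Embedding - Dropout - Conv1D - MaxPooling1D - Dense - Dropout - ReLU - Sigmoid."""
    model = keras.Sequential([
        keras.Input(shape=(MAX_LENGTH,), dtype="int32"),
        layers.Embedding(VOCAB_SIZE, EMBED_DIM),   # trained end-to-end
        layers.Dropout(0.25),
        layers.Conv1D(filters=100, kernel_size=3, activation="relu"),
        layers.MaxPooling1D(pool_size=2),
        layers.Flatten(),                          # (steps, filters) -> vector
        layers.Dense(100),                         # fully connected layer
        layers.Dropout(0.25),
        layers.Activation("relu"),
        layers.Dense(1, activation="sigmoid"),     # probability of "positive"
    ])
    model.compile(loss="binary_crossentropy", optimizer="adam",
                  metrics=["accuracy"])
    return model
```

For the 5-class variant, one plausible change is to replace the last layer with a 5-unit softmax and use a categorical cross-entropy loss; the notebook may differ.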

Details

To train CNN models on textual data, we need to represent each sample as a 2-D matrix (just as when training CNN models on images). There are many ways to do this. In this task, I tried two approaches: (i) using the word2vec embedding and (ii) using Keras' built-in embedding layer.

Word2vec embedding

With a word2vec model, we can transform each review into a fixed-length sequence of words, using strategies such as truncating and padding, and represent each word by its word vector.

For example, we can set max_length = 50 (the maximum number of words per review) and the word2vec vocabulary size to 5000. The indices of words in the word2vec model are all increased by 3 because 0, 1 and 2 are reserved for special purposes. Specifically, reviews with fewer than 50 words are padded with 0 at the beginning, and longer reviews are truncated to keep only the first 50 words. Every review begins with index 1, and every word outside the vocabulary is replaced by index 2. Next, for each review, we map its words to their corresponding word vectors. In the end, each review is represented as a (50, 300) matrix.
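
The indexing scheme just described can be sketched in plain Python. The function names, the vocab_rank dictionary (word to 0-based rank in the vocabulary) and the choice to count the start marker toward max_length are assumptions made for illustration; the notebooks may handle these details differently.

```python
PAD, START, OOV = 0, 1, 2   # the three reserved indices
INDEX_OFFSET = 3            # real words start at index 3


def encode_review(words, vocab_rank, max_length=50):
    """Turn a list of words into exactly max_length integer indices."""
    ids = [START]
    for word in words:
        rank = vocab_rank.get(word)
        ids.append(OOV if rank is None else rank + INDEX_OFFSET)
    ids = ids[:max_length]                         # keep the first max_length
    return [PAD] * (max_length - len(ids)) + ids   # pad at the beginning


def embed_review(ids, vectors, dim):
    """Look up a vector per index; indices without a vector map to zeros."""
    zero = [0.0] * dim
    return [vectors.get(i, zero) for i in ids]


vocab_rank = {"good": 0, "food": 1}
ids = encode_review(["good", "food", "here"], vocab_rank, max_length=6)
print(ids)  # prints: [0, 0, 1, 3, 4, 2]
```

With max_length = 50 and 300-dimensional vectors, embed_review returns 50 rows of 300 values each, matching the (50, 300) matrix above.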

Keras embedding layer

This is similar to the previous case, where we use word2vec to represent a review as a matrix. One can think of the Keras embedding layer as a word embedding that is trained end-to-end with the rest of the network (it is the first layer in the model architecture).

Replication

To replicate my results, please download the dataset and run json-csv.py and word2vec_model.ipynb to sample the exact 399850 reviews that I used in this task. Run train_keras_embedding.ipynb to train a CNN model using the Keras embedding layer. You can also run train_with_word2vec_embedding.ipynb if you want to use the word2vec embedding (you need to train a word2vec model beforehand). Make sure you have the sampled dataset before you train the model; alternatively, feel free to experiment on all 5 million reviews :)

If you would like to predict the 5-class star rating of each review, see my experiments in train_multi_class.ipynb.


Stars: ✭ 106
