Deep Learning Final Project Repo

Colby Wise | Mike Alvarino | Richard Dewey @ Columbia.edu

Project Overview:

In this research paper we apply the methodology outlined in the arXiv working paper: ”Joint Deep Modeling of Users and Items Using Reviews for Recommendation” for rating prediction of movies using the Amazon Instant Video data set and GloVe.6B 50 dimensional word embeddings. Of the data set there are only 18,000 text reviews. The approach used in this paper models users and items jointly using review text in two cooperative neural networks.

Before attempting to train the networks as provided in this repository, the user must preprocess the amazon instant video dataset. The goal of this process is for one data point to contain all of the users reviews (excluding the review for the current movie), all of the movie's reviews (excluding that written by the current user), and the associated rating. We have provided some notebooks and examples in the Preprocessing directory that may be useful.

Because one of the primary goals of our project was to explore the effectiveness of different sequential data modeling neural network layers, it was natural to split the code base into three different source files corresponding with the three different architectures we analyzed.

Data Utilized:

Amazon Instant Video 5-core via Julian McAuley @ UCSD. Available as of 11/27/17 URL: http://jmcauley.ucsd.edu/data/amazon/
Global Vectors for Word Representation (GloVe) version: 6B.50d.txt via J.Pennington, R. Socher, C.Manning @ Stanford Available as of 11/27/17 URL: https://nlp.stanford.edu/projects/glove/

Environment:

requirements.txt included for reference of packages used.

Source Code:

DeepCoNN-CNN.ipynb - re-implementation of the paper
DeepCoNN-GRU.ipynb - joint model with GRU instead of CNN
DeepCoNN-LSTM.ipynb - joint model with LSTM instead of CNN
Custom Functions.py - utility functions implemented

Model	Training Time	Test MSE
CNN	12 min 53 s	1.48519089265
CNN 100	17 min 45 s	0.854974748883
CNN Dropout	12 min 1 s	1.13791756289
CNN Dropout 100	19 min 12 s	1.12168053715
LSTM	1 hr 54 min 30 s	1.53920110091
LSTM 100	2 hr 3 min 46 s	1.3328432198
LSTM Dropout	2 hr 4 min 7 s	1.11078817165
LSTM Dropout 100	2 hr 0 min 20 s	1.47418677544
GRU	1 hr 33 min 27 s	1.07871009008
GRU 100	1 hr 49 min 52 s	1.12462539877
GRU Dropout	1 hr 28 min 49 s	1.21747808816
GRU Dropout 100	1 hr 43 min 7 s	1.82918500817

Cheap and reliable Node.js hosting starts at $3/month, and $1/month static HTML hosting