
sujitpal / Eeap Examples

Code for Document Similarity on Reuters dataset using the Embed, Encode, Attend, Predict recipe

Projects that are alternatives of or similar to Eeap Examples

Coursework
summer school coursework
Stars: ✭ 249 (-1.19%)
Mutual labels:  jupyter-notebook
Deep Learning Book
Repository for "Introduction to Artificial Neural Networks and Deep Learning: A Practical Guide with Applications in Python"
Stars: ✭ 2,705 (+973.41%)
Mutual labels:  jupyter-notebook
Genomic Ulmfit
ULMFiT for Genomic Sequence Data
Stars: ✭ 250 (-0.79%)
Mutual labels:  jupyter-notebook
Pointrend Pytorch
A PyTorch implementation of PointRend: Image Segmentation as Rendering
Stars: ✭ 249 (-1.19%)
Mutual labels:  jupyter-notebook
Ai Projects
Artificial Intelligence projects, documentation and code.
Stars: ✭ 250 (-0.79%)
Mutual labels:  jupyter-notebook
Whirlwindtourofpython
The Jupyter Notebooks behind my OReilly report, "A Whirlwind Tour of Python"
Stars: ✭ 3,002 (+1091.27%)
Mutual labels:  jupyter-notebook
Drowsiness detection
Stars: ✭ 250 (-0.79%)
Mutual labels:  jupyter-notebook
Machine Learning Online 2018
ML Online Course Repository. Course videos on online.codingblocks.com
Stars: ✭ 252 (+0%)
Mutual labels:  jupyter-notebook
Effective python notebook
Stars: ✭ 251 (-0.4%)
Mutual labels:  jupyter-notebook
Tutorials
Stars: ✭ 252 (+0%)
Mutual labels:  jupyter-notebook
Ml sagemaker studies
Case studies, examples, and exercises for learning to deploy ML models using AWS SageMaker.
Stars: ✭ 249 (-1.19%)
Mutual labels:  jupyter-notebook
Pytorch Exercise
Practical Exercise Codes for PyTorch
Stars: ✭ 250 (-0.79%)
Mutual labels:  jupyter-notebook
Team Learning Program
Mainly stores materials for the "Programming, Data Structures and Algorithms" track of Datawhale's team learning program.
Stars: ✭ 247 (-1.98%)
Mutual labels:  jupyter-notebook
Pysolar
Pysolar is a collection of Python libraries for simulating the irradiation of any point on earth by the sun. It includes code for extremely precise ephemeris calculations.
Stars: ✭ 249 (-1.19%)
Mutual labels:  jupyter-notebook
Shared
Shared Blogs and Notebooks
Stars: ✭ 252 (+0%)
Mutual labels:  jupyter-notebook
Mixup Generator
An implementation of "mixup: Beyond Empirical Risk Minimization"
Stars: ✭ 250 (-0.79%)
Mutual labels:  jupyter-notebook
Nbdev
Create delightful python projects using Jupyter Notebooks
Stars: ✭ 3,061 (+1114.68%)
Mutual labels:  jupyter-notebook
Selene
a framework for training sequence-level deep learning networks
Stars: ✭ 252 (+0%)
Mutual labels:  jupyter-notebook
Coastsat
Global shoreline mapping tool from satellite imagery
Stars: ✭ 252 (+0%)
Mutual labels:  jupyter-notebook
Modern practical nlp
This course covers how you can use NLP to do stuff.
Stars: ✭ 252 (+0%)
Mutual labels:  jupyter-notebook

eeap-examples

Introduction

This repository contains some examples of applying the Embed, Encode, Attend, Predict (EEAP) recipe for building Deep Learning pipelines, proposed by Matthew Honnibal, creator of the spaCy NLP toolkit.

I also gave a talk about this at PyData Seattle 2017.

Code is in Python. All models are built using the awesome Keras library. Supporting code uses NLTK and Scikit-Learn.

The examples use 4 custom Attention layers, also available here as a Python include file. The examples themselves are written as Jupyter notebooks.

A good complete implementation of attention can be found here.

Data

Please refer to data/README.md for instructions on how to download the data necessary to run these examples.

Examples

Document Classification Task

The document classification task attempts to build a classification model for documents by treating each document as a sequence of sentences, and each sentence as a sequence of words. We start with the bag of words approach, computing a document embedding as the average of its sentence embeddings, and a sentence embedding as the average of its word embeddings. Next we build a hierarchical model: a sentence model that builds sentence embeddings using a bidirectional LSTM, embedded within a document model that builds document embeddings by encoding the sentence model's output with another bidirectional LSTM. Finally we add attention layers at each level (sentence and document model). Our final model is depicted in the figure below:

The models were run against the Reuters 20 Newsgroups data in order to classify a given document into one of 20 classes. The chart below shows the results of running these different experiments. The interesting value here is the test set accuracy, but we have shown training and validation set accuracies as well for completeness.

As you can see, the accuracy rises from about 71% for the bag of words model to about 82% for the hierarchical model that incorporates the Matrix Vector Attention models.
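
The bag of words baseline described at the start of this section can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the notebook code: the function names and the toy vectors are made up here, and the real examples use pretrained word embeddings inside Keras models.

```python
import numpy as np

def sentence_embedding(word_vectors):
    """Average the word vectors of one sentence: (num_words, dim) -> (dim,)."""
    return np.mean(word_vectors, axis=0)

def document_embedding(sentences):
    """Average the sentence embeddings of one document -> (dim,)."""
    return np.mean([sentence_embedding(s) for s in sentences], axis=0)

# Toy document (hypothetical data): two sentences with 3 and 2 words,
# each word represented by a 4-dimensional vector.
doc = [
    np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]]),
    np.array([[2.0, 2.0, 2.0, 2.0],
              [0.0, 0.0, 0.0, 0.0]]),
]
doc_vec = document_embedding(doc)
print(doc_vec)
```

Note that every sentence contributes equally to the document vector regardless of its length. The hierarchical and attention models replace these plain averages with learned encoders and learned weights.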


Document Similarity Task

The Document Similarity task uses a nested model similar to the one in the document classification task, where the sentence model generates a sentence embedding from a sequence of word embeddings, and a document model embeds the sentence model to generate a document embedding. A pair of such networks is set up to produce document vectors from the two documents being compared, and the concatenated vector is fed into a fully connected network to predict a binary (similar / not similar) outcome.
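
The data flow of this paired setup can be illustrated with a forward pass in plain NumPy. This is a rough sketch under loud assumptions: mean pooling stands in for the nested LSTM encoders, the dense layer is a single untrained sigmoid unit, and all names (encode_document, predict_similar, DIM) are hypothetical. The actual models in the notebooks are built with Keras.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8  # hypothetical size of word and document vectors

def encode_document(sentences):
    """Stand-in for the nested sentence/document encoder (mean pooling)."""
    return np.mean([np.mean(s, axis=0) for s in sentences], axis=0)

# One dense unit with sigmoid output; weights are random and untrained.
W = rng.normal(scale=0.1, size=(2 * DIM,))
b = 0.0

def predict_similar(doc_left, doc_right):
    """Encode both documents, concatenate the vectors, and classify."""
    merged = np.concatenate([encode_document(doc_left),
                             encode_document(doc_right)])
    logit = merged @ W + b
    return 1.0 / (1.0 + np.exp(-logit))

# Two toy documents given as lists of (num_words, DIM) sentence matrices.
doc_a = [rng.normal(size=(5, DIM)), rng.normal(size=(3, DIM))]
doc_b = [rng.normal(size=(4, DIM))]
prob = predict_similar(doc_a, doc_b)
print(prob)
```

The output is a probability that the pair is similar; in a trained network the encoder and the dense layer would be learned jointly from the labeled pairs.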

The dataset for this was manufactured from the Reuters 20 Newsgroups dataset. TF-IDF vectors were generated for all 10,000 test set documents, and the similarity between all pairs of these vectors was calculated. Pairs in the top 5 percent by similarity were selected as the positive set and pairs in the bottom 5 percent as the negative set. Even so, there does not appear to be much differentiation: similarity values differed by only about 0.2 between the two sets. A 1% sample was then drawn from each set to make the training set for this network.
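
The pair selection procedure described above can be sketched as follows. This is a small self-contained approximation, not the code used to build the dataset: it uses a simple whitespace tokenizer, a smoothed IDF formula chosen here for illustration, cosine similarity as the similarity measure (an assumption), a toy corpus, and a wider percentile cut so the toy example yields pairs. The final 1% sampling step is omitted.

```python
import itertools
import math
from collections import Counter

import numpy as np

def tfidf_matrix(docs):
    """Build L2-normalized TF-IDF rows using a whitespace tokenizer."""
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    index = {t: i for i, t in enumerate(vocab)}
    df = Counter(t for toks in tokenized for t in set(toks))
    n = len(docs)
    X = np.zeros((n, len(vocab)))
    for row, toks in enumerate(tokenized):
        for term, count in Counter(toks).items():
            idf = math.log((1 + n) / (1 + df[term])) + 1.0  # smoothed IDF
            X[row, index[term]] = count * idf
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # guard against empty documents
    return X / norms

def label_pairs(docs, pct=5.0):
    """Return (positives, negatives): top and bottom pct of pair similarities."""
    X = tfidf_matrix(docs)
    sims = X @ X.T  # cosine similarity, since rows are unit length
    pairs = list(itertools.combinations(range(len(docs)), 2))
    scores = np.array([sims[i, j] for i, j in pairs])
    hi = np.percentile(scores, 100.0 - pct)
    lo = np.percentile(scores, pct)
    positives = [p for p, s in zip(pairs, scores) if s >= hi]
    negatives = [p for p, s in zip(pairs, scores) if s <= lo]
    return positives, negatives

# Hypothetical toy corpus; the real data has thousands of documents.
docs = [
    "oil prices rise on supply fears",
    "oil prices fall as supply grows",
    "team wins the cup final",
    "the cup final ends in a draw",
    "new graphics card released",
    "oil supply fears ease",
]
positives, negatives = label_pairs(docs, pct=20.0)
print(positives, negatives)
```

With a sparse vocabulary many pairs have similarity exactly zero, so the negative set can be much larger than the nominal percentile suggests.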

We built two models, one without attention at either the sentence or document layer, and one with attention on the document layer. Results are shown below:


Sentence Similarity Task

The Sentence Similarity task uses the Semantic Similarity Task dataset from 2012. The objective is to score a pair of sentences on a continuous similarity scale from 0 to 5. We build a regression network as shown below. Our loss function is Mean Squared Error and our optimizer is RMSprop. Evaluation is done by computing the RMSE between the label similarities and the network predictions on the test set. In addition, we compute the Pearson and Spearman (rank) correlations between the labels and predictions on the test set.
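
The three evaluation metrics mentioned above can be computed as in the following NumPy sketch. The function names and the sample labels and predictions are made up for illustration; the notebooks may well use library routines instead.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between labels and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def pearson(x, y):
    """Pearson correlation coefficient."""
    return float(np.corrcoef(x, y)[0, 1])

def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values))
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical similarity labels (0 to 5) and model predictions.
labels = [0.0, 1.5, 3.0, 4.0, 5.0]
preds = [0.5, 1.0, 3.5, 3.0, 4.5]
print(rmse(labels, preds), pearson(labels, preds), spearman(labels, preds))
```

RMSE penalizes the absolute size of the errors, while the two correlations only measure how well the predictions track the labels, linearly (Pearson) or in rank order (Spearman).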

Our baseline is a hierarchical network that computes an encoding for each sentence in the pair, where the encodings without attention are used to generate the prediction. We compare the baseline to the Matrix Matrix dot attention proposed by Parikh et al., where the inputs are scaled to [-1, 1] (MM-dot(s)), and then to an unscaled version of it (MM-dot). Finally, we introduce two new attention implementations based on a description on this TensorFlow NMT page: an additive attention (MM-add) proposed by Bahdanau et al., and a multiplicative attention (MM-mult) proposed by Luong et al. Both operate on the encoder outputs without scaling via tanh. Results are shown below. As can be seen, MM-add and MM-mult result in lower RMSE and generally higher Pearson and Spearman correlations than the baseline.
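
To make the difference between the scoring functions concrete, here is a NumPy sketch of the commonly cited forms of dot, multiplicative (Luong) and additive (Bahdanau) attention between two encoded sequences. This is not the code of the custom Keras layers in this repository, which may differ in detail; the weight shapes, the names W, W1, W2, v, and the random inputs are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dot_scores(A, B):
    """Dot attention: score[i, j] = A[i] . B[j]."""
    return A @ B.T

def multiplicative_scores(A, B, W):
    """Multiplicative attention: score[i, j] = A[i] W B[j]."""
    return A @ W @ B.T

def additive_scores(A, B, W1, W2, v):
    """Additive attention: score[i, j] = v . tanh(A[i] W1 + B[j] W2)."""
    hidden = np.tanh((A @ W1)[:, None, :] + (B @ W2)[None, :, :])
    return hidden @ v

def attend(scores, B):
    """Turn scores into weights over B's rows and return the weighted sums."""
    weights = softmax(scores, axis=1)
    return weights @ B

rng = np.random.default_rng(0)
m, n, d, k = 3, 4, 5, 6          # sequence lengths, encoding size, hidden size
A = rng.normal(size=(m, d))      # encoder outputs for sentence 1
B = rng.normal(size=(n, d))      # encoder outputs for sentence 2
W = rng.normal(size=(d, d))
W1 = rng.normal(size=(d, k))
W2 = rng.normal(size=(d, k))
v = rng.normal(size=(k,))

for scores in (dot_scores(A, B),
               multiplicative_scores(A, B, W),
               additive_scores(A, B, W1, W2, v)):
    print(attend(scores, B).shape)
```

All three variants produce an (m, n) score matrix and differ only in how a pair of encoder outputs is compared; in a trained layer W, W1, W2 and v would be learned parameters.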
