
siddk / entity-network

Licence: other
Tensorflow implementation of "Tracking the World State with Recurrent Entity Networks" [https://arxiv.org/abs/1612.03969] by Henaff, Weston, Szlam, Bordes, and LeCun.

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to entity-network

Ner Lstm
Named Entity Recognition using multilayered bidirectional LSTM
Stars: ✭ 532 (+817.24%)
Mutual labels:  recurrent-neural-networks, embeddings
datastories-semeval2017-task6
Deep-learning model presented in "DataStories at SemEval-2017 Task 6: Siamese LSTM with Attention for Humorous Text Comparison".
Stars: ✭ 20 (-65.52%)
Mutual labels:  recurrent-neural-networks, embeddings
Deep News Summarization
News summarization using sequence to sequence model with attention in TensorFlow.
Stars: ✭ 167 (+187.93%)
Mutual labels:  recurrent-neural-networks, tensorflow-models
relation-network
Tensorflow Implementation of Relation Networks for the bAbI QA Task, detailed in "A Simple Neural Network Module for Relational Reasoning" [https://arxiv.org/abs/1706.01427] by Santoro et al.
Stars: ✭ 45 (-22.41%)
Mutual labels:  embeddings, tensorflow-models
Probabilistic-RNN-DA-Classifier
Probabilistic Dialogue Act Classification for the Switchboard Corpus using an LSTM model
Stars: ✭ 22 (-62.07%)
Mutual labels:  recurrent-neural-networks, embeddings
MAX-Text-Summarizer
Generate a summarized description of a body of text
Stars: ✭ 27 (-53.45%)
Mutual labels:  tensorflow-models
TF-Model-Deploy-Tutorial
A tutorial exploring multiple approaches to deploy a trained TensorFlow (or Keras) model or multiple models for prediction.
Stars: ✭ 51 (-12.07%)
Mutual labels:  tensorflow-models
stanford-cs231n-assignments-2020
This repository contains my solutions to the assignments for Stanford's CS231n "Convolutional Neural Networks for Visual Recognition" (Spring 2020).
Stars: ✭ 84 (+44.83%)
Mutual labels:  recurrent-neural-networks
info-retrieval
Information Retrieval in High Dimensional Data (class deliverables)
Stars: ✭ 33 (-43.1%)
Mutual labels:  embeddings
deep-scite
🚣 A simple recommendation engine (by way of convolutions and embeddings) written in TensorFlow
Stars: ✭ 20 (-65.52%)
Mutual labels:  embeddings
VarCLR
VarCLR: Variable Semantic Representation Pre-training via Contrastive Learning
Stars: ✭ 30 (-48.28%)
Mutual labels:  embeddings
deep-learning
Assignmends done for Udacity's Deep Learning MOOC with Vincent Vanhoucke
Stars: ✭ 94 (+62.07%)
Mutual labels:  recurrent-neural-networks
Deep-Learning-Tensorflow
Gathers Tensorflow deep learning models.
Stars: ✭ 50 (-13.79%)
Mutual labels:  recurrent-neural-networks
deep-char-cnn-lstm
Deep Character CNN LSTM Encoder with Classification and Similarity Models
Stars: ✭ 20 (-65.52%)
Mutual labels:  embeddings
bruno
a deep recurrent model for exchangeable data
Stars: ✭ 34 (-41.38%)
Mutual labels:  recurrent-neural-networks
towhee
Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.
Stars: ✭ 821 (+1315.52%)
Mutual labels:  embeddings
VariationalNeuralAnnealing
A variational implementation of classical and quantum annealing using recurrent neural networks for the purpose of solving optimization problems.
Stars: ✭ 21 (-63.79%)
Mutual labels:  recurrent-neural-networks
SpeakerDiarization RNN CNN LSTM
Speaker Diarization is the problem of separating speakers in an audio. There could be any number of speakers and final result should state when speaker starts and ends. In this project, we analyze given audio file with 2 channels and 2 speakers (on separate channels).
Stars: ✭ 56 (-3.45%)
Mutual labels:  recurrent-neural-networks
Archived-SANSA-ML
SANSA Machine Learning Layer
Stars: ✭ 39 (-32.76%)
Mutual labels:  embeddings
navec
Compact high quality word embeddings for Russian language
Stars: ✭ 118 (+103.45%)
Mutual labels:  embeddings

Recurrent Entity Networks

Tensorflow/TFLearn Implementation of "Tracking the World State with Recurrent Entity Networks" by Henaff et al.

Punchline

By building a set of disparate memory cells, each responsible for different concepts, entities, or other content, Recurrent Entity Networks (EntNets) are able to efficiently and robustly maintain a “world-state” - one that can be updated easily and effectively with the influx of new information.

Furthermore, one can either let the EntNet cell keys vary freely, or seed them with specific embeddings, thereby forcing the model to track a given set of entities/objects/locations and allowing for easy interpretation of the underlying decision-making process.

Results

Implementation results are as follows (graphs of training/validation loss will be added later). Some of the tasks are fairly computationally intensive, so it might take a while to get benchmark results.

Note that for efficiency, training was stopped once validation accuracy passed a threshold of 95%. This differs from the method used in the paper, which runs each task for 200 epochs and reports the best model across 10 different runs. The number of runs, epochs to converge, and final train/validation/test accuracies (best on validation over different runs) for this repository relative to the paper results are as follows:

Note that the italics above indicate examples of overfitting, and that those rows come from single runs of the model - this is probably why multiple runs are necessary. If this continues to happen, I'll look into ways to better regularize the network (via dropout, for example).

The bold above denotes a failure to converge. I'm not sure why this is happening, but I'll note that Jim Fleming reports the same sort of issue in his implementation.

Additionally, plots of the training/validation loss and accuracies through training can be found in eval/qa_id, where id is the id of the task at hand. As an example, here is the plot for Task 1 - Single Supporting Fact's training:

[Plot: training/validation loss and accuracy for Task 1 - Single Supporting Fact]

Components

Entity Networks consist of three separate components:

  1. An Input Encoder, which takes the input sequence at a given time step and encodes it into a fixed-size vector representation $s_t$

  2. The Dynamic Memory (the core of the model), which keeps a disparate set of memory cells, each with a different vector key (the location) and a hidden state memory (the content)

  3. The Output Module, which takes the hidden states and applies a series of transformations to generate the output $y$.

A breakdown of the components is as follows:

Input Encoder: Takes the input from the environment (e.g. a sentence from a story) and maps it to a fixed-size state vector $s_t$.

This repository (like the paper) utilizes a learned multiplicative mask, where each word embedding of the sentence is multiplied element-wise with a position-specific mask vector, and the results are summed together to form $s_t$.

Alternatively, one could just as easily imagine an LSTM or CNN encoder to generate this initial input.
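
As a rough illustration of the multiplicative-mask encoding, here is a minimal NumPy sketch. The dimensions and variable names are made up for illustration; the repository implements this with trainable TensorFlow variables rather than random arrays.

```python
import numpy as np

# Illustrative sketch of the learned multiplicative mask encoder.
# Shapes and names are assumptions, not the repo's actual code.
sentence_len, embed_dim = 7, 100

embeddings = np.random.randn(sentence_len, embed_dim)  # word embeddings e_i
mask = np.random.randn(sentence_len, embed_dim)        # learned per-position mask vectors f_i

# s_t = sum_i (f_i * e_i): each embedding is multiplied element-wise with its
# position-specific mask, then everything is summed into one fixed-size vector.
s_t = np.sum(mask * embeddings, axis=0)                # shape: (embed_dim,)
print(s_t.shape)  # (100,)
```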

Dynamic Memory: The core of the model; consists of a series of key vectors $w_j$ and memory (hidden state) vectors $h_j$.

The keys and state vectors function similarly to how the program keys and program embeddings function in the NPI/NTM - the keys represent locations, while the memories hold content. Only the content (the memories) gets updated at inference time, with the influx of new information.

Furthermore, one can seed and fix the key vectors so that they reflect certain words/entities - the paper does this by fixing the key vectors to specific word embeddings and using a simple BoW state encoding. This repository currently only supports random key vector seeds.
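
For intuition, here is a hedged sketch of the difference between random keys (what this repository does) and keys tied to word embeddings (what the paper does for interpretability). The names `embedding_matrix` and `entity_ids` are illustrative stand-ins, not the repository's actual variables.

```python
import numpy as np

num_memories, embed_dim = 20, 100
vocab_size = 50

# Random key vectors (what this repository currently supports):
keys_random = np.random.randn(num_memories, embed_dim)

# Tied key vectors (the paper's interpretable variant): seed each key with the
# word embedding of the entity it should track, and keep it fixed during training.
embedding_matrix = np.random.randn(vocab_size, embed_dim)  # stand-in for learned word embeddings
entity_ids = np.arange(num_memories)                       # illustrative entity -> word-index mapping
keys_tied = embedding_matrix[entity_ids]                   # shape: (num_memories, embed_dim)
```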

The Dynamic Memory updates given an input $s_t$ are as follows - they are very similar to the GRU update equations (a code sketch follows the list):

  • $g_j \leftarrow \sigma(s_t^\top h_j + s_t^\top w_j)$ - Gating function; determines how much memory $j$ should be affected by the given input.
  • $\tilde{h}_j \leftarrow \phi(U h_j + V w_j + W s_t)$ - Candidate state update. $U$, $V$, $W$ are model parameters shared across all memory cells. The model can be simplified by constraining $U$, $V$, $W$ to be zero or the identity.
  • $h_j \leftarrow h_j + g_j \odot \tilde{h}_j$ - Gated update, the elementwise product of $g_j$ with $\tilde{h}_j$; dictates how much the given memory should be updated.
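
Below is a minimal NumPy sketch of one Dynamic Memory update for a single encoded input $s_t$. Dimensions are made up, tanh stands in for the paper's PReLU, and the final renormalization of each memory (the paper's forget mechanism) is included as the last step; the repository's TensorFlow cell may differ in details.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Illustrative shapes (assumptions, not the repo's configuration).
embed_dim, num_memories = 100, 20

s_t = np.random.randn(embed_dim)                     # encoded input at time t
keys = np.random.randn(num_memories, embed_dim)      # key vectors w_j (locations)
memories = np.random.randn(num_memories, embed_dim)  # hidden states h_j (content)

U = np.random.randn(embed_dim, embed_dim)            # shared parameters across all cells
V = np.random.randn(embed_dim, embed_dim)
W = np.random.randn(embed_dim, embed_dim)
phi = np.tanh                                        # stand-in for the paper's PReLU

for j in range(num_memories):
    g = sigmoid(s_t @ memories[j] + s_t @ keys[j])          # gate: content match + location match
    h_tilde = phi(U @ memories[j] + V @ keys[j] + W @ s_t)  # candidate state update
    memories[j] = memories[j] + g * h_tilde                 # gated write into memory j
    memories[j] /= np.linalg.norm(memories[j]) + 1e-8       # renormalize (acts as a forget mechanism)
```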

Output Module: The model's interface; takes in the memories and a query vector $q$, and transforms them into the required output.

It functions like a 1-hop Memory Network (Sukhbaatar et al.), building a soft attention weighting over the memories, then combining them and feeding the result through intermediate layers to produce the output.

The actual updates are as follows:

  • $p_j = \mathrm{softmax}(q^\top h_j)$ - Normalizes the memories into attention weights based on their similarity to the query.
  • $u = \sum_j p_j h_j$ - Weighted sum of the hidden states.
  • $y = R\,\phi(q + H u)$ - $R$ and $H$ are trainable model parameters. As long as some loss can be built from $y$, the entirety of the model is trainable via Backpropagation-Through-Time (BPTT).
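
And a matching NumPy sketch of the Output Module under the same illustrative assumptions (again, not the repository's TensorFlow code):

```python
import numpy as np

# Illustrative shapes (assumptions, not the repo's configuration).
embed_dim, num_memories, output_dim = 100, 20, 30

memories = np.random.randn(num_memories, embed_dim)  # hidden states h_j after reading the story
q = np.random.randn(embed_dim)                       # encoded query vector
R = np.random.randn(output_dim, embed_dim)           # trainable output parameters
H = np.random.randn(embed_dim, embed_dim)
phi = np.tanh                                        # stand-in for the paper's PReLU

scores = memories @ q                                # q^T h_j for each memory
p = np.exp(scores - scores.max())
p /= p.sum()                                         # softmax attention weights p_j

u = p @ memories                                     # weighted sum of hidden states
y = R @ phi(q + H @ u)                               # final output; any loss on y trains the model via BPTT
print(y.shape)  # (30,)
```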

Repository Structure

The directory is structured in the following way:

  • model/ - Model definition code, including the definition of the Dynamic Memory Cell.

  • preprocessor/ - Preprocessing code to load and vectorize the bAbI Tasks.

  • tasks/ - Raw bAbI Task files.

  • run.py - Core script for training and evaluating the Recurrent Entity Network.

References

Big shout-out to Jim Fleming for his initial Tensorflow Implementation - his Dynamic Memory Cell Implementation specifically made things a lot easier.

Reference: Jim Fleming's EntNet Memory Cell
