
hohoCode / textSimilarityConvNet

Licence: other
Semantic Similarity Measurement of Texts using Convolutional Neural Networks (He et al., EMNLP 2015)

Projects that are alternatives of or similar to textSimilarityConvNet

imagenet-autoencoder
Autoencoder trained on ImageNet Using Torch 7
Stars: ✭ 18 (-59.09%)
Mutual labels:  torch7
Suravi
Suravi is a small distribution of Ravi/Lua 5.3 with batteries such as cjson, lpeglabel, luasocket, penlight, torch7, luv, luaossl
Stars: ✭ 56 (+27.27%)
Mutual labels:  torch7
Semantic-Textual-Similarity
Natural Language Processing using NLTK and Spacy
Stars: ✭ 30 (-31.82%)
Mutual labels:  textual-similarity
Binary Human Pose Estimation
This code implements a demo of the Binarized Convolutional Landmark Localizers for Human Pose Estimation and Face Alignment with Limited Resources paper by Adrian Bulat and Georgios Tzimiropoulos.
Stars: ✭ 210 (+377.27%)
Mutual labels:  torch7
Dynamics
A Compositional Object-Based Approach to Learning Physical Dynamics
Stars: ✭ 159 (+261.36%)
Mutual labels:  torch7
Crnn
Convolutional Recurrent Neural Network (CRNN) for image-based sequence recognition.
Stars: ✭ 1,901 (+4220.45%)
Mutual labels:  torch7
Binary Face Alignment
Real time face alignment
Stars: ✭ 145 (+229.55%)
Mutual labels:  torch7
Face Alignment Training
Training code for the networks described in "How far are we from solving the 2D & 3D Face Alignment problem? (and a dataset of 230,000 3D facial landmarks)" paper.
Stars: ✭ 127 (+188.64%)
Mutual labels:  torch7
Nips16 ptn
Torch Implementation of NIPS'16 paper: Perspective Transformer Nets
Stars: ✭ 126 (+186.36%)
Mutual labels:  torch7
Pyramidnet
Torch implementation of the paper "Deep Pyramidal Residual Networks" (https://arxiv.org/abs/1610.02915).
Stars: ✭ 121 (+175%)
Mutual labels:  torch7
Human Pose Estimation
This repository implements a demo of the Human pose estimation via Convolutional Part Heatmap Regression paper.
Stars: ✭ 98 (+122.73%)
Mutual labels:  torch7
3d Resnets
3D ResNets for Action Recognition
Stars: ✭ 95 (+115.91%)
Mutual labels:  torch7
Torch Models
Stars: ✭ 65 (+47.73%)
Mutual labels:  torch7
Crayon
A language-agnostic interface to TensorBoard
Stars: ✭ 776 (+1663.64%)
Mutual labels:  torch7
2d And 3d Face Alignment
This repository implements a demo of the networks described in "How far are we from solving the 2D & 3D Face Alignment problem? (and a dataset of 230,000 3D facial landmarks)" paper.
Stars: ✭ 772 (+1654.55%)
Mutual labels:  torch7
Aten
ATen: A TENsor library for C++11
Stars: ✭ 578 (+1213.64%)
Mutual labels:  torch7
Vrn
👨 Code for "Large Pose 3D Face Reconstruction from a Single Image via Direct Volumetric CNN Regression"
Stars: ✭ 4,391 (+9879.55%)
Mutual labels:  torch7
Shake Shake
2.86% and 15.85% on CIFAR-10 and CIFAR-100
Stars: ✭ 291 (+561.36%)
Mutual labels:  torch7
dbcollection
A collection of popular datasets for deep learning.
Stars: ✭ 26 (-40.91%)
Mutual labels:  torch7

Multi-Perspective Convolutional Neural Networks for Modeling Textual Similarity

This repo contains the Torch implementation of multi-perspective convolutional neural networks for modeling textual similarity, described in the EMNLP 2015 paper by He et al.

This model does not require external resources such as WordNet or parsers, does not use sparse features, and achieves good accuracy on standard public datasets.

Installation and Dependencies

  • Please install the Torch deep learning library. We recommend the self-contained local installation, which includes all the packages our tool needs; simply follow the instructions here: https://github.com/torch/distro

  • Currently our tool runs only on CPUs, so we recommend using the Intel MKL library (or at least OpenBLAS) so that Torch runs much faster on CPUs (a small sketch for setting the CPU thread count follows this list).

  • Our tool also requires the GloVe embeddings from Stanford. Please run fetch_and_preprocess.sh to download and preprocess this data set (around 3 GB).
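
As a quick sanity check after installation, the snippet below is a minimal sketch (the thread count of 4 is just an example value) that confirms Torch loads and sets the number of CPU threads, which is what the MKL/OpenBLAS recommendation above speeds up:

require 'torch'

-- Confirm Torch loads and report/set the number of CPU (BLAS/OpenMP) threads.
print('default tensor type: ' .. torch.getdefaulttensortype())
print('threads before: ' .. torch.getnumthreads())
torch.setnumthreads(4)   -- example value; match your CPU core count
print('threads after:  ' .. torch.getnumthreads())

The GloVe download uses the standard text format (one word followed by its vector components per line). If you want to inspect or load the vectors yourself, a hedged sketch is below; the file path and dimension are assumptions, so use whatever location fetch_and_preprocess.sh produced on your machine:

require 'torch'

-- Sketch: read GloVe vectors ("word v1 v2 ... vN" per line) into a Lua table
-- mapping each word to a torch tensor. The path and dimension are placeholders.
local function load_glove(path, dim)
  local word2vec = {}
  for line in io.lines(path) do
    local parts = {}
    for tok in line:gmatch('%S+') do table.insert(parts, tok) end
    local vec = torch.zeros(dim)
    for i = 1, dim do vec[i] = tonumber(parts[i + 1]) end
    word2vec[parts[1]] = vec
  end
  return word2vec
end

-- local word2vec = load_glove('data/glove/glove.840B.300d.txt', 300)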

Running

  • Command to run (training, tuning, and testing are all included):
  • th trainSIC.lua or th trainMSRVID.lua

The tool will output Pearson scores and also write the predicted similarity score for each pair of test sentences into the predictions directory.
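
For reference, the Pearson score can be reproduced from any pair of predicted/gold score vectors. Below is a minimal sketch with placeholder tensors (these names and values are illustrative, not variables or outputs from the codebase):

require 'torch'

-- Pearson correlation between two 1-D tensors: center both series,
-- then take the normalized dot product.
local function pearson(x, y)
  local xc = x - x:mean()
  local yc = y - y:mean()
  return xc:dot(yc) / (xc:norm() * yc:norm())
end

-- Dummy example values, not real outputs of the tool.
local predictions = torch.Tensor({4.2, 1.1, 3.7, 2.5})
local gold        = torch.Tensor({4.5, 1.0, 3.9, 2.0})
print(pearson(predictions, gold))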

Adaptation to New Dataset

To run our model on your own dataset, first build the dataset in the following format and put it under the data folder (see the sketch after the list):

  • a.toks: sentence A, one sentence per line.
  • b.toks: sentence B, one sentence per line.
  • id.txt: sentence pair IDs, one per line.
  • sim.txt: semantic relatedness gold labels, which can be on any scale. For binary classification, the label set is {0, 1}.
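
As a rough illustration of the layout (not the project's actual loader in util/read_data.lua), the four files are parallel and line-aligned, so line i of each file describes pair i:

-- Hedged sketch: read the four parallel files for one split. The directory
-- name is a placeholder; pass wherever you put your data under the data folder.
local function read_split(dir)
  local function read_lines(path)
    local lines = {}
    for line in io.lines(path) do table.insert(lines, line) end
    return lines
  end

  local a   = read_lines(dir .. 'a.toks')   -- sentence A, one per line
  local b   = read_lines(dir .. 'b.toks')   -- sentence B, one per line
  local ids = read_lines(dir .. 'id.txt')   -- sentence pair IDs
  local sim = read_lines(dir .. 'sim.txt')  -- gold similarity labels
  assert(#a == #b and #a == #ids and #a == #sim,
         'all four files must have the same number of lines')

  local examples = {}
  for i = 1, #a do
    examples[i] = { id = ids[i], a = a[i], b = b[i], sim = tonumber(sim[i]) }
  end
  return examples
end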

Then build the vocabulary for your dataset, which writes vocab-cased.txt into your data folder:

$ python build_vocab.py
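
What this step produces is a case-preserving list of the unique tokens in your data, one per line. A rough Lua illustration of the same idea follows (this is only a sketch, not the actual build_vocab.py code; the directory name is a placeholder):

-- Rough illustration: collect unique (case-preserving) tokens from a.toks
-- and b.toks and write them, one per line, to vocab-cased.txt.
local function build_vocab(dir)
  local seen, vocab = {}, {}
  for _, fname in ipairs({ 'a.toks', 'b.toks' }) do
    for line in io.lines(dir .. fname) do
      for tok in line:gmatch('%S+') do
        if not seen[tok] then
          seen[tok] = true
          table.insert(vocab, tok)
        end
      end
    end
  end
  local out = io.open(dir .. 'vocab-cased.txt', 'w')
  for _, tok in ipairs(vocab) do out:write(tok, '\n') end
  out:close()
end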

Finally, change the training and model code slightly to process your dataset:

  • change util/read_data.lua to handle your data;
  • create a new piece of training code based on trainSIC.lua to read in your dataset;
  • change Conv.lua at lines 89-102 and 142-148 to handle your own task;
  • for more details, refer to issue castorini#6.

Then you should be able to run your training code.

Trained Model

We also provide a model that is already trained on the STS dataset, so it is easier if you just want to use the model and do not want to re-train the whole thing.

The trained model download link is HERE. The model file size is 500 MB. To use the trained model, simply use the code below:

require 'nn'   -- needed so the serialized model's modules can be deserialized

-- Load the trained model and put both sub-modules in evaluation mode.
modelTrained = torch.load("download_local_location/modelSTS.trained.th", 'ascii')
modelTrained.convModel:evaluate()
modelTrained.softMaxC:evaluate()

-- One (sentence_length x embedding_dimension) tensor per input sentence;
-- fill in the word embedding values yourself.
local linputs = torch.zeros(left_sentence_length, emd_dimension)
linputs = XassignEmbeddingValuesX
local rinputs = torch.zeros(right_sentence_length, emd_dimension)
rinputs = XassignEmbeddingValuesX

-- Forward pass: convolutional feature extractor, then the similarity classifier.
local part2 = modelTrained.convModel:forward({linputs, rinputs})
local output = modelTrained.softMaxC:forward(part2)

-- output holds log-probabilities over similarity classes 0..5 (hence exp());
-- take the expected value and normalize it to [0, 1].
local val = torch.range(0, 5, 1):dot(output:exp())
return val / 5

The output variable 'val' contains a similarity score in [0, 1]. The inputs linputs/rinputs are torch tensors, and you need to fill in the word embedding values for both.
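
One minimal way to fill those tensors, assuming you have a word-to-vector table such as the hypothetical word2vec from the GloVe sketch above (the helper name, whitespace tokenization, and the zero-vector fallback for unknown words are our own assumptions, not the project's code):

-- Hypothetical helper: turn a whitespace-tokenized sentence into an
-- (num_tokens x dim) tensor, row i holding the embedding of token i.
-- Unknown words are left as zero vectors in this sketch.
local function sentence_to_tensor(sentence, word2vec, dim)
  local toks = {}
  for tok in sentence:gmatch('%S+') do table.insert(toks, tok) end
  local t = torch.zeros(#toks, dim)
  for i, tok in ipairs(toks) do
    if word2vec[tok] then t[i]:copy(word2vec[tok]) end
  end
  return t
end

-- local linputs = sentence_to_tensor('a man is playing a guitar', word2vec, 300)
-- local rinputs = sentence_to_tensor('a person plays the guitar', word2vec, 300)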

Example Deployment Script with Our Trained Model

We provide an example deployment file, testDeployTrainedModel.lua, so it is easier for you to use our model directly. Run:

$ th testDeployTrainedModel.lua

This deployment file uses the trained model (assuming you have downloaded it from the link above) and generates scores for all test sentences of the SICK dataset. Please note that the trained model was not trained on SICK data.

Acknowledgement

We thank Kai Sheng Tai for providing the preprocessing code. We also thank the public data providers and the Torch developers.
