Quora-Question-Pairs
This repository contains the code for our submission to Kaggle's Quora Question Pairs competition, in which we ranked in the top 25%. A detailed report on the project can be found here.
Data
train.csv contains ~400k question pairs along with the corresponding label (duplicate or not), and test.csv contains ~2300k question pairs. Both files can be found here.
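To make the data layout concrete, here is a minimal sketch of reading labelled pairs with Python's standard csv module. It assumes the column names used by the Kaggle files (id, qid1, qid2, question1, question2, is_duplicate); the helper read_pairs and the two sample rows are made up for illustration and are not part of this repository.

```python
import csv
import io


def read_pairs(handle):
    """Yield (question1, question2, is_duplicate) from a train.csv-style file.

    Assumes the header contains question1, question2 and is_duplicate columns.
    """
    for row in csv.DictReader(handle):
        yield row["question1"], row["question2"], int(row["is_duplicate"])


# Tiny made-up sample in the assumed train.csv layout.
sample = io.StringIO(
    "id,qid1,qid2,question1,question2,is_duplicate\n"
    '0,1,2,"How do I learn Python?","What is the best way to learn Python?",1\n'
    '1,3,4,"What is a GRU?","How tall is Mount Everest?",0\n'
)

pairs = list(read_pairs(sample))
# Share of pairs labelled as duplicates in the sample.
duplicate_share = sum(label for _, _, label in pairs) / len(pairs)
print(len(pairs), duplicate_share)
```

For the real file, pass an open handle instead of the sample, for example open("input/train.csv", newline="", encoding="utf-8").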
Model Architecture
We use a Siamese neural network architecture with Gated Recurrent Units (GRUs), combined with traditional machine learning algorithms such as Random Forest, SVM and AdaBoost.
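The core idea of a Siamese network is that both questions pass through the same encoder with shared weights, and the two encodings are then compared. The NumPy sketch below illustrates this with an untrained GRU; it is not the model in this repository. The class GRUEncoder, the function siamese_features, the layer sizes and the choice of comparison features (absolute difference and element-wise product) are all assumptions made for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class GRUEncoder:
    """Minimal GRU mapping a sequence of word vectors to its final hidden state."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.hidden_dim = hidden_dim

        def init(rows, cols):
            # Small random weights; a real model would learn these.
            return rng.normal(0.0, 0.1, size=(rows, cols))

        # Update gate (z), reset gate (r) and candidate state (h) parameters.
        self.W_z, self.U_z = init(hidden_dim, input_dim), init(hidden_dim, hidden_dim)
        self.W_r, self.U_r = init(hidden_dim, input_dim), init(hidden_dim, hidden_dim)
        self.W_h, self.U_h = init(hidden_dim, input_dim), init(hidden_dim, hidden_dim)
        self.b_z = np.zeros(hidden_dim)
        self.b_r = np.zeros(hidden_dim)
        self.b_h = np.zeros(hidden_dim)

    def encode(self, sequence):
        """Run the GRU over a (length, input_dim) array and return the last state."""
        h = np.zeros(self.hidden_dim)
        for x in sequence:
            z = sigmoid(self.W_z @ x + self.U_z @ h + self.b_z)
            r = sigmoid(self.W_r @ x + self.U_r @ h + self.b_r)
            h_candidate = np.tanh(self.W_h @ x + self.U_h @ (r * h) + self.b_h)
            h = (1.0 - z) * h + z * h_candidate
        return h


def siamese_features(encoder, q1_vectors, q2_vectors):
    """Encode both questions with the SAME encoder and build comparison features."""
    h1 = encoder.encode(q1_vectors)
    h2 = encoder.encode(q2_vectors)
    return np.concatenate([np.abs(h1 - h2), h1 * h2])


# Random vectors stand in for the GloVe embeddings of two questions.
rng = np.random.default_rng(42)
question1 = rng.normal(size=(5, 50))  # 5 words, 50-dimensional embeddings
question2 = rng.normal(size=(8, 50))  # 8 words

encoder = GRUEncoder(input_dim=50, hidden_dim=16)
features = siamese_features(encoder, question1, question2)
print(features.shape)
```

Because the features are symmetric in the two questions, swapping question1 and question2 gives the same vector. A vector like this could be passed to a downstream classifier such as a Random Forest, which is one plausible way to combine the neural and traditional models.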
Running the model
First, place train.csv, test.csv (see the Data section above) and the pre-trained GloVe embeddings in the input folder. You can download the embeddings from here. Then run the bash script:
bash run_model.sh
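Before launching the script, it can help to confirm that the input folder is populated. The following is a small optional sketch, not something run_model.sh does itself; the helper missing_inputs is hypothetical. The GloVe file name depends on which embeddings you downloaded, so it is deliberately left out of the list.

```python
from pathlib import Path


def missing_inputs(input_dir, filenames):
    """Return the names from `filenames` that are not files inside `input_dir`."""
    folder = Path(input_dir)
    return [name for name in filenames if not (folder / name).is_file()]


# Add the name of your GloVe embeddings file to this list.
EXPECTED = ["train.csv", "test.csv"]

if __name__ == "__main__":
    missing = missing_inputs("input", EXPECTED)
    print("Missing from input/:", ", ".join(missing) if missing else "none")
```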
Contributors
Dependencies
- numpy
- pandas
- nltk
- scikit-learn (imported as sklearn)
- TensorFlow
Install them using pip.
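To check which of the dependencies are already available in your environment, a quick sketch like the one below can be used. It only looks up import names and does not import the packages; the helper missing_modules is hypothetical and not part of this repository.

```python
import importlib.util


def missing_modules(module_names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


# Import names of the dependencies listed above.
REQUIRED = ["numpy", "pandas", "nltk", "sklearn", "tensorflow"]

if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    print("Missing:", ", ".join(missing) if missing else "none")
```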
Note
- If you run into any issues running the code, please report them in the issue tracker.
- If you like this repo and find it useful, please consider ★ starring it (on top right of the page) :)