yuhsinliu1993 / Quora_QuestionPairs_DL

Licence: other
Kaggle Competition: Using deep learning to solve Quora's question pairs problem

Kaggle Competition: Quora Question Pairs Problem

Only a single model, ESIM, is implemented.

See the competition information at https://www.kaggle.com/c/quora-question-pairs
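
For orientation, the core idea of ESIM (reference 4) is a soft alignment between the two encoded sentences followed by an "enhancement" step that concatenates each token with its aligned counterpart, their difference, and their element-wise product. The following is a minimal NumPy sketch of that step only, not the code in this repository: the function names `soft_align` and `enhance` and the toy shapes are assumptions for illustration, and the BiLSTM encoders, pooling, and classifier of the full model are left out.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def soft_align(a, b):
    """Soft-align two encoded sentences (hypothetical helper).

    a: (len_a, dim) encoded tokens of question 1
    b: (len_b, dim) encoded tokens of question 2
    Returns (a_tilde, b_tilde) with the same shapes as a and b.
    """
    scores = a @ b.T                          # (len_a, len_b) dot-product similarity
    a_tilde = softmax(scores, axis=1) @ b     # each token of a as a weighted mix of b
    b_tilde = softmax(scores, axis=0).T @ a   # each token of b as a weighted mix of a
    return a_tilde, b_tilde


def enhance(x, x_tilde):
    """Concatenate [x; x_tilde; x - x_tilde; x * x_tilde] along the feature axis."""
    return np.concatenate([x, x_tilde, x - x_tilde, x * x_tilde], axis=1)


# Toy encodings standing in for BiLSTM outputs (random values, for shape checking only).
rng = np.random.default_rng(0)
a = rng.normal(size=(5, 8))   # 5 tokens, 8 features
b = rng.normal(size=(7, 8))   # 7 tokens, 8 features

a_tilde, b_tilde = soft_align(a, b)
m_a, m_b = enhance(a, a_tilde), enhance(b, b_tilde)
print(m_a.shape, m_b.shape)
```

The enhanced representations have four times the feature width of the inputs; in the paper they are passed to a second recurrent layer before pooling and classification.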

Framework

References

  1. A Decomposable Attention Model for Natural Language Inference (2016) proposed by Ankur P. Parikh, Oscar Täckström, Dipanjan Das, Jakob Uszkoreit.

  2. Reasoning about Entailment with Neural Attention (2016) proposed by Tim Rocktäschel et al.

  3. Neural Machine Translation by Jointly Learning to Align and Translate (2015) proposed by Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio.

  4. Enhanced LSTM for Natural Language Inference (2017) proposed by Qian Chen, Xiaodan Zhu, Zhenhua Ling, Si Wei, Hui Jiang, Diana Inkpen.

Prerequisites

Download spaCy's pre-trained GloVe embedding weights:

# out-of-the-box: download best-matching default model
$ python -m spacy download en

# download best-matching version of specific model for your spaCy installation
$ python -m spacy download en_core_web_md

Usage

To clean the input data and split it into training and validation sets, run:

$ bash clean.sh
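
What clean.sh does internally is not documented here. As a rough illustration of a clean-and-split step, the following Python sketch normalizes both questions, drops pairs with an empty side, and holds out a validation fraction. The cleaning rule, the `clean_and_split` helper, the `val_fraction` parameter, and the in-memory sample are all assumptions for illustration; the column names follow the competition's train.csv.

```python
import csv
import io
import random
import re


def clean_text(text):
    """Lowercase and collapse runs of whitespace (an assumed, minimal cleaning rule)."""
    return re.sub(r"\s+", " ", text.strip().lower())


def clean_and_split(rows, val_fraction=0.1, seed=0):
    """Clean both questions, drop pairs with an empty side, then shuffle and split."""
    cleaned = []
    for row in rows:
        q1 = clean_text(row["question1"])
        q2 = clean_text(row["question2"])
        if q1 and q2:
            cleaned.append({
                "question1": q1,
                "question2": q2,
                "is_duplicate": int(row["is_duplicate"]),
            })
    random.Random(seed).shuffle(cleaned)   # seeded so the split is reproducible
    n_val = int(len(cleaned) * val_fraction)
    return cleaned[n_val:], cleaned[:n_val]


# Tiny in-memory stand-in for the competition's train.csv (made-up rows).
sample = io.StringIO(
    "question1,question2,is_duplicate\n"
    "How do I learn Python?,What is the best way to learn  Python?,1\n"
    "What is AI?,,0\n"
    "Is tea healthy?,Is coffee healthy?,0\n"
)
train, val = clean_and_split(csv.DictReader(sample), val_fraction=0.5)
print(len(train), len(val))
```

Seeding the shuffle keeps the training and validation sets stable across runs, which makes validation scores comparable between experiments.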

To train a model with the default settings (epochs: 10, embedding size: 300, hidden units: 100, learning rate: 0.0004):

$ python run.py --mode=train --verbose --best_glove

To test a model:

$ python run.py --mode=eval

All options:

usage: run.py [-h] [--num_epochs NUM_EPOCHS] [--batch_size BATCH_SIZE]
              [--embedding_size EMBEDDING_SIZE] [--max_length MAX_LENGTH]
              [--seed SEED] [--input_data INPUT_DATA] [--test_data TEST_DATA]
              [--val_data VAL_DATA] [--num_classes NUM_CLASSES]
              [--num_hidden NUM_HIDDEN] [--num_unknown NUM_UNKNOWN]
              [--learning_rate LEARNING_RATE] [--keep_prob KEEP_PROB]
              [--best_glove] [--tree_truncate] [--verbose]
              [--load_model LOAD_MODEL] --mode MODE

optional arguments:
  -h, --help            show this help message and exit
  --num_epochs NUM_EPOCHS
                        Specify number of epochs
  --batch_size BATCH_SIZE
                        Specify number of batch size
  --embedding_size EMBEDDING_SIZE
                        Specify embedding size
  --max_length MAX_LENGTH
                        Specify the max length of input sentence
  --seed SEED           Specify seed for randomization
  --input_data INPUT_DATA
                        Specify the location of input data
  --test_data TEST_DATA
                        Specify the location of test data
  --val_data VAL_DATA   Specify the location of validation data
  --num_classes NUM_CLASSES
                        Specify the number of classes
  --num_hidden NUM_HIDDEN
                        Specify the number of hidden units in each rnn cell
  --num_unknown NUM_UNKNOWN
                        Specify the number of unknown words for putting in the
                        embedding matrix
  --learning_rate LEARNING_RATE
                        Specify learning rate
  --keep_prob KEEP_PROB
                        Specify the rate (between 0 and 1) of the units that
                        will keep during training
  --best_glove          Glove: using light version or best-matching version
  --tree_truncate       Specify whether do tree_truncate or not
  --verbose             Verbose on training
  --load_model LOAD_MODEL
                        Locate the path of the model
  --mode MODE           Specify mode: train or eval or predict
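
For readers who want to script around these options, here is a minimal sketch of how a subset of them could be declared with argparse. It is not the parser from run.py: the `build_parser` function is hypothetical, only a few flags are shown, and the defaults are taken from the default settings listed under Usage.

```python
import argparse


def build_parser():
    """Declare a subset of the options listed above (illustrative, not run.py itself)."""
    parser = argparse.ArgumentParser(prog="run.py")
    parser.add_argument("--num_epochs", type=int, default=10,
                        help="Specify number of epochs")
    parser.add_argument("--embedding_size", type=int, default=300,
                        help="Specify embedding size")
    parser.add_argument("--num_hidden", type=int, default=100,
                        help="Specify the number of hidden units in each rnn cell")
    parser.add_argument("--learning_rate", type=float, default=0.0004,
                        help="Specify learning rate")
    parser.add_argument("--best_glove", action="store_true",
                        help="Glove: using light version or best-matching version")
    parser.add_argument("--verbose", action="store_true",
                        help="Verbose on training")
    # --mode is the only required option in the usage string above.
    parser.add_argument("--mode", required=True,
                        help="Specify mode: train or eval or predict")
    return parser


# Equivalent to: python run.py --mode=train --verbose --best_glove
args = build_parser().parse_args(["--mode=train", "--verbose", "--best_glove"])
print(args.mode, args.num_epochs, args.embedding_size, args.num_hidden, args.learning_rate)
```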