Kaggle Competition: Quora Question Pairs Problem

This repository implements a single model only: ESIM (Enhanced LSTM for Natural Language Inference).

See the competition page for details: https://www.kaggle.com/c/quora-question-pairs
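As a rough illustration of what ESIM does with a question pair, the sketch below implements the soft-alignment ("local inference") step described in the ESIM paper, using plain NumPy. It is not taken from this repository: the function names `softmax` and `soft_align`, the toy shapes, and the random inputs standing in for BiLSTM outputs are all assumptions made for the example. It needs Python 3.5+ for the `@` operator.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def soft_align(a, b):
    """ESIM-style local inference for one sentence pair.

    a: (len_a, dim) encoded tokens of question 1
    b: (len_b, dim) encoded tokens of question 2
    Returns the enhanced representations m_a and m_b.
    """
    e = a @ b.T                          # (len_a, len_b) similarity scores
    a_tilde = softmax(e, axis=1) @ b     # each a_i attends over the tokens of b
    b_tilde = softmax(e, axis=0).T @ a   # each b_j attends over the tokens of a
    # Concatenate original, aligned, difference, and element-wise product.
    m_a = np.concatenate([a, a_tilde, a - a_tilde, a * a_tilde], axis=1)
    m_b = np.concatenate([b, b_tilde, b - b_tilde, b * b_tilde], axis=1)
    return m_a, m_b


rng = np.random.default_rng(0)
a = rng.normal(size=(5, 8))   # toy stand-in for the encoder output of question 1
b = rng.normal(size=(7, 8))   # toy stand-in for the encoder output of question 2
m_a, m_b = soft_align(a, b)
print(m_a.shape, m_b.shape)   # (5, 32) (7, 32)
```

In the full model these enhanced representations would be passed to a second recurrent layer, pooled, and classified. Those stages are omitted here.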

Framework

References

A Decomposable Attention Model for Natural Language Inference (2016), proposed by Ankur P. Parikh, Oscar Täckström, Dipanjan Das, and Jakob Uszkoreit.
Reasoning about Entailment with Neural Attention (2016), proposed by Tim Rocktäschel et al.
Neural Machine Translation by Jointly Learning to Align and Translate (2016), proposed by Dzmitry Bahdanau, KyungHyun Cho, and Yoshua Bengio.
Enhanced LSTM for Natural Language Inference (2017), proposed by Qian Chen, Xiaodan Zhu, Zhenhua Ling, Si Wei, Hui Jiang, and Diana Inkpen.

Prerequisites

Python 2.7 or 3+
NumPy
TensorFlow
Keras
spaCy

Download the spaCy pre-trained GloVe embedding weights

# out-of-the-box: download best-matching default model
$ python -m spacy download en

# download best-matching version of specific model for your spaCy installation
$ python -m spacy download en_core_web_md
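Once a spaCy model with vectors is installed, the vectors are typically stacked into an embedding matrix, with some rows reserved for out-of-vocabulary words (compare the `--num_unknown` option listed under Usage). The sketch below shows one possible way to do that with NumPy only. It is not this repository's code: `build_embedding_matrix`, `lookup`, the CRC32 bucketing of unknown words, and the toy vectors are all assumptions. With spaCy, the input dictionary could be filled from `nlp.vocab[word].vector`.

```python
import zlib

import numpy as np


def build_embedding_matrix(vocab_vectors, dim=300, num_unknown=100, seed=0):
    """Stack pre-trained vectors under a block of random rows for unknown words.

    vocab_vectors: dict mapping word -> 1-D vector of length `dim`
    Returns (matrix, word_to_row).
    """
    rng = np.random.RandomState(seed)
    # Rows [0, num_unknown) are random unit-length vectors shared by unseen words.
    unknown = rng.normal(size=(num_unknown, dim))
    unknown /= np.linalg.norm(unknown, axis=1, keepdims=True)
    words = sorted(vocab_vectors)
    known = np.array([vocab_vectors[w] for w in words], dtype="float32")
    known = known.reshape(len(words), dim)
    # Known words occupy the rows after the unknown block.
    word_to_row = {w: num_unknown + i for i, w in enumerate(words)}
    matrix = np.vstack([unknown.astype("float32"), known])
    return matrix, word_to_row


def lookup(word, word_to_row, num_unknown=100):
    """Row index for `word`; unseen words hash into the unknown block."""
    if word in word_to_row:
        return word_to_row[word]
    return zlib.crc32(word.encode("utf-8")) % num_unknown


# Toy 4-dimensional vectors instead of real 300-dimensional GloVe vectors.
toy = {"quora": np.ones(4), "question": np.zeros(4)}
matrix, word_to_row = build_embedding_matrix(toy, dim=4, num_unknown=3)
print(matrix.shape)  # (5, 4)
```

Hashing unseen words into a fixed number of random rows keeps the matrix size bounded while still giving different unknown words different vectors most of the time.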

Usage

To clean the input data and split it into training and validation sets, run:

$ bash clean.sh
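The contents of clean.sh are not shown here, so the following is only a minimal sketch of the splitting idea: a reproducible shuffle followed by a hold-out split. The function name `split_pairs`, the validation fraction, the seed, and the toy rows are hypothetical and are not taken from the script.

```python
import random


def split_pairs(rows, val_fraction=0.1, seed=42):
    """Shuffle rows reproducibly and split them into (train, validation)."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_val = int(len(rows) * val_fraction)
    return rows[n_val:], rows[:n_val]


# Toy rows shaped like (question1, question2, is_duplicate).
pairs = [("q%d a" % i, "q%d b" % i, i % 2) for i in range(20)]
train, val = split_pairs(pairs, val_fraction=0.2)
print(len(train), len(val))  # 16 4
```

Fixing the seed means the same rows end up in the validation set on every run, which keeps validation scores comparable across experiments.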

To train a model with the default settings (epochs: 10, embedding size: 300, hidden units: 100, learning rate: 0.0004):

$ python run.py --mode=train --verbose --best_glove

To test a model:

$ python run.py --mode=eval

All options:

usage: run.py [-h] [--num_epochs NUM_EPOCHS] [--batch_size BATCH_SIZE]
              [--embedding_size EMBEDDING_SIZE] [--max_length MAX_LENGTH]
              [--seed SEED] [--input_data INPUT_DATA] [--test_data TEST_DATA]
              [--val_data VAL_DATA] [--num_classes NUM_CLASSES]
              [--num_hidden NUM_HIDDEN] [--num_unknown NUM_UNKNOWN]
              [--learning_rate LEARNING_RATE] [--keep_prob KEEP_PROB]
              [--best_glove] [--tree_truncate] [--verbose]
              [--load_model LOAD_MODEL] --mode MODE

optional arguments:
  -h, --help            show this help message and exit
  --num_epochs NUM_EPOCHS
                        Specify number of epochs
  --batch_size BATCH_SIZE
                        Specify number of batch size
  --embedding_size EMBEDDING_SIZE
                        Specify embedding size
  --max_length MAX_LENGTH
                        Specify the max length of input sentence
  --seed SEED           Specify seed for randomization
  --input_data INPUT_DATA
                        Specify the location of input data
  --test_data TEST_DATA
                        Specify the location of test data
  --val_data VAL_DATA   Specify the location of validation data
  --num_classes NUM_CLASSES
                        Specify the number of classes
  --num_hidden NUM_HIDDEN
                        Specify the number of hidden units in each rnn cell
  --num_unknown NUM_UNKNOWN
                        Specify the number of unknown words for putting in the
                        embedding matrix
  --learning_rate LEARNING_RATE
                        Specify learning rate
  --keep_prob KEEP_PROB
                        Specify the rate (between 0 and 1) of the units that
                        will keep during training
  --best_glove          Glove: using light version or best-matching version
  --tree_truncate       Specify whether do tree_truncate or not
  --verbose             Verbose on training
  --load_model LOAD_MODEL
                        Locate the path of the model
  --mode MODE           Specify mode: train or eval or predict
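For readers who want to see how such a command line maps onto code, here is a minimal `argparse` sketch covering a subset of the options above. It is not the actual parser in run.py. The default values are the ones quoted in the training note (epochs 10, embedding size 300, hidden units 100, learning rate 0.0004); the argument types and the `build_parser` name are assumptions.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring part of the run.py help text."""
    parser = argparse.ArgumentParser(prog="run.py")
    parser.add_argument("--num_epochs", type=int, default=10,
                        help="Specify number of epochs")
    parser.add_argument("--embedding_size", type=int, default=300,
                        help="Specify embedding size")
    parser.add_argument("--num_hidden", type=int, default=100,
                        help="Specify the number of hidden units in each rnn cell")
    parser.add_argument("--learning_rate", type=float, default=0.0004,
                        help="Specify learning rate")
    parser.add_argument("--best_glove", action="store_true",
                        help="Glove: using light version or best-matching version")
    parser.add_argument("--verbose", action="store_true",
                        help="Verbose on training")
    parser.add_argument("--load_model", default=None,
                        help="Locate the path of the model")
    # --mode appears without brackets in the usage line, so it is required.
    parser.add_argument("--mode", required=True,
                        help="Specify mode: train or eval or predict")
    return parser


# Equivalent to: python run.py --mode=train --verbose --best_glove
args = build_parser().parse_args(["--mode=train", "--verbose", "--best_glove"])
print(args.mode, args.num_epochs, args.learning_rate)  # train 10 0.0004
```

Flags declared with `action="store_true"` take no value and default to `False`, which matches how `--verbose` and `--best_glove` are used in the training command.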


yuhsinliu1993 / Quora_QuestionPairs_DL
