Alternatives and detailed information of Quora-Paraphrase-Question-Identification

hengluchang / Quora-Paraphrase-Question-Identification

Licence: other

Paraphrase question identification using Feature Fusion Network (FFN).

Programming Languages

python

Projects that are alternatives of or similar to Quora-Paraphrase-Question-Identification

sklearn-feature-engineering

Feature engineering with sklearn

Stars: ✭ 114 (+500%)

Mutual labels: kaggle, feature-engineering

Home Credit Default Risk

Default risk prediction for Home Credit competition - Fast, scalable and maintainable SQL-based feature engineering pipeline

Stars: ✭ 68 (+257.89%)

Mutual labels: kaggle, feature-engineering

Open Solution Home Credit

Open solution to the Home Credit Default Risk challenge 🏡

Stars: ✭ 397 (+1989.47%)

Mutual labels: kaggle, feature-engineering

question-pair

A siamese LSTM to detect sentence/question pairs.

Stars: ✭ 25 (+31.58%)

Mutual labels: quora, quora-question-pairs

Nyaggle

Code for Kaggle and Offline Competitions

Stars: ✭ 209 (+1000%)

Mutual labels: kaggle, feature-engineering

fastknn

Fast k-Nearest Neighbors Classifier for Large Datasets

Stars: ✭ 64 (+236.84%)

Mutual labels: kaggle, feature-engineering

Kaggle Quora Question Pairs

Kaggle: Quora Question Pairs, 4th/3396 (https://www.kaggle.com/c/quora-question-pairs)

Stars: ✭ 705 (+3610.53%)

Mutual labels: kaggle, feature-engineering

Bike-Sharing-Demand-Kaggle

Top 5th percentile solution to the Kaggle knowledge problem - Bike Sharing Demand

Stars: ✭ 33 (+73.68%)

Mutual labels: kaggle, feature-engineering

Lightautoml

LAMA - automatic model creation framework

Stars: ✭ 196 (+931.58%)

Mutual labels: kaggle, feature-engineering

Machine Learning Workflow With Python

A comprehensive set of ML techniques with Python: define the problem, specify inputs and outputs, data collection, exploratory data analysis, data preprocessing, model design, training, evaluation

Stars: ✭ 157 (+726.32%)

Mutual labels: kaggle, feature-engineering

Kaggler

Code for Kaggle Data Science Competitions

Stars: ✭ 614 (+3131.58%)

Mutual labels: kaggle, feature-engineering

kaggle-berlin

Material of the Kaggle Berlin meetup group!

Stars: ✭ 36 (+89.47%)

Mutual labels: kaggle, feature-engineering

Kaggle Competitions

There are plenty of courses and tutorials that can help you learn machine learning from scratch but here in GitHub, I want to solve some Kaggle competitions as a comprehensive workflow with python packages. After reading, you can use this workflow to solve other real problems and use it as a template.

Stars: ✭ 86 (+352.63%)

Mutual labels: kaggle, feature-engineering

Data-Science

Using Kaggle Data and Real World Data for Data Science and prediction in Python, R, Excel, Power BI, and Tableau.

Stars: ✭ 15 (-21.05%)

Mutual labels: kaggle, feature-engineering

Kaggle-Quora-Question-Pairs

This is our team's solution report, which achieves top 10% (305/3307) in this competition.

Stars: ✭ 58 (+205.26%)

Mutual labels: kaggle, paraphrase-identification

kaggledatasets

Collection of Kaggle Datasets ready to use for Everyone (Looking for contributors)

Stars: ✭ 44 (+131.58%)

Mutual labels: kaggle

kaggle-camera-model-identification

Code for reproducing 2nd place solution for Kaggle competition IEEE's Signal Processing Society - Camera Model Identification

Stars: ✭ 64 (+236.84%)

Mutual labels: kaggle

Dog-Breed-Identification-Gluon

Kaggle dog breed classification (120 breeds), implemented in Gluon

Stars: ✭ 45 (+136.84%)

Mutual labels: kaggle

zca

ZCA whitening in python

Stars: ✭ 29 (+52.63%)

Mutual labels: feature-engineering

speech-recognition-transfer-learning

Speech command recognition DenseNet transfer learning from UrbanSound8k in keras tensorflow

Stars: ✭ 18 (-5.26%)

Mutual labels: kaggle


Paraphrase Question Identification using Feature Fusion Network

Identify question pairs that have the same meaning. The Feature Fusion Network (FFN) learns rich features not only from sentence representations but also from hand-crafted features.
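A minimal sketch of the fusion idea in plain Python, assuming each question is represented by the sum of its word vectors and that the fused input is a simple concatenation with the hand-crafted features. The toy embeddings, the function names, and the concatenation itself are illustrative assumptions; the actual architecture is described in the paper.

```python
# Toy 3-dimensional word vectors (hypothetical values, for illustration only).
TOY_EMBEDDINGS = {
    "how": [0.1, 0.0, 0.2],
    "learn": [0.3, 0.5, 0.1],
    "python": [0.7, 0.2, 0.4],
    "study": [0.2, 0.6, 0.1],
}
DIM = 3


def sentence_vector(question):
    """Sum the vectors of known words; unknown words contribute nothing."""
    total = [0.0] * DIM
    for word in question.lower().split():
        vec = TOY_EMBEDDINGS.get(word)
        if vec is not None:
            total = [t + v for t, v in zip(total, vec)]
    return total


def fuse(q1, q2, hand_crafted):
    """Concatenate both sentence vectors with the hand-crafted features."""
    return sentence_vector(q1) + sentence_vector(q2) + list(hand_crafted)


# Two hypothetical hand-crafted feature values for this pair.
fused = fuse("how learn python", "how study python", [0.67, 0.0])
print(len(fused))  # 3 + 3 + 2 = 8
```

In a real network the fused vector would feed further dense layers; here it only shows how the two feature sources end up in one input.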

For more detailed information, please see our project research paper: Paraphrase Question Identification using Feature Fusion Network.

Model architecture

Results

0.895 testing accuracy for FFN (trained for 100 epochs)

Requirements

Python 3.5 for running FFN
Python 2.7 for running Random Forest (RF) baseline

Package dependencies

RF baseline

scikit-learn 0.18
nltk
pandas

FFN

numpy 1.11
matplotlib 1.5
Keras 1.2
scikit-learn 0.18
h5py 2.6
hdf5 1.8
TensorFlow 0.10

How to run

$ git clone https://github.com/hengluchang/Quora-Paraphrase-Question-Identification

Run Random Forest baseline

Create a folder named "dataset".

$ cd Quora-Paraphrase-Question-Identification
$ mkdir -p dataset

Go to the Kaggle Quora Question Pairs website, download train.csv.zip and test.csv.zip, and unzip both. Place train.csv and test.csv in the dataset directory.
Create the 10 hand-crafted features (HCFs). This writes train_10features.csv and test_10features.csv to the dataset directory.

$ cd ..
$ python feature_gen.py ../dataset/train.csv ../dataset/test.csv
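
The 10 features themselves are not listed here, so the following is only a sketch of the kind of pairwise features such a script might compute. The three features and the function name `pair_features` are hypothetical and are not taken from feature_gen.py.

```python
def pair_features(q1, q2):
    """Return a few simple similarity features for a question pair."""
    w1 = q1.lower().split()
    w2 = q2.lower().split()
    s1, s2 = set(w1), set(w2)
    shared = s1 & s2
    union = s1 | s2
    return {
        # Absolute difference in word count.
        "len_diff": abs(len(w1) - len(w2)),
        # Number of distinct words that appear in both questions.
        "shared_words": len(shared),
        # Jaccard similarity of the two word sets (0.0 if both are empty).
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


features = pair_features(
    "How do I learn Python",
    "How can I learn Python quickly",
)
print(features)
```

Features like these are cheap to compute and can be written out as one row per question pair, which is the shape a tree-based baseline expects.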

Run the Random Forest baseline on these 10 HCFs. This should give roughly 0.84 testing accuracy.

$ python run_baseline.py ../dataset/train_10features.csv

Run Feature Fusion Network (FFN)

Download the required data (Google Drive link) into the directory you cloned.
Train FFN w/o HCF

$ python3 train_noHCF.py -i <QUESTION_PAIRS_FILE> -t <TEST_QUESTION_PAIRS_FILE> -g <GLOVE_FILE> -w <MODEL_WEIGHTS_FILE> -e <WORD_EMBEDDING_MATRIX_FILE> -n <NB_WORDS_DATA_FILE>

For instance:

$ python3 train_noHCF.py -i train_rebalanced.csv -t test.csv -g glove.840B.300d.txt -w question_pairs_weights_100epoch_test10_val10_dropout20_sumOP_noAVG_rebalanced.h5  -e word_embedding_matrix_trainANDtest_rebalanced.npy -n nb_words_trainANDtest_rebalanced.json
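
The `-g` and `-e` arguments suggest that the script builds a word embedding matrix from GloVe vectors. Below is a sketch of that step, assuming the standard GloVe text format (one token per line followed by its space-separated vector). The function names, the toy 3-dimensional vectors, and the word index are hypothetical and may differ from what train_noHCF.py does.

```python
import io


def load_glove(lines):
    """Parse GloVe text lines into a {word: vector} dictionary."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def build_embedding_matrix(word_index, vectors, dim):
    """Row i holds the vector for word index i; row 0 and unknown words stay zero."""
    matrix = [[0.0] * dim for _ in range(len(word_index) + 1)]
    for word, i in word_index.items():
        if word in vectors:
            matrix[i] = vectors[word]
    return matrix


# Stand-in for an open GloVe file such as glove.840B.300d.txt (toy 3-d vectors).
glove_file = io.StringIO("python 0.1 0.2 0.3\nlearn 0.4 0.5 0.6\n")
# Hypothetical tokenizer output: word -> integer index, starting at 1.
word_index = {"python": 1, "learn": 2, "quora": 3}

matrix = build_embedding_matrix(word_index, load_glove(glove_file), dim=3)
print(matrix)
```

Saving such a matrix once and reloading it avoids re-reading the large GloVe file on every run, which may be why the embedding matrix has its own file argument.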

Train FFN

$ python3 train_HCF.py -i <QUESTION_PAIRS_FILE> -t <TEST_QUESTION_PAIRS_FILE> -f <HCF_FILE> -g <GLOVE_FILE> -w <MODEL_WEIGHTS_FILE> -e <WORD_EMBEDDING_MATRIX_FILE> -n <NB_WORDS_DATA_FILE>

For instance:

$ python3 train_HCF.py -i train_rebalanced.csv -t test.csv -f train_rebalanced_10features.csv -g glove.840B.300d.txt -w question_pairs_weights_100epoch_test10_val10_dropout20_sumOP_noAVG_HCF_rebalanced.h5 -e word_embedding_matrix_trainANDtest_rebalanced.npy -n nb_words_trainANDtest_rebalanced.json
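
Flags like these can be parsed with the standard-library `argparse` module. This is a sketch only: how train_HCF.py actually reads its arguments is not shown here, and the destination names, help strings, and shortened file names in the example call are hypothetical.

```python
import argparse


def build_parser():
    """Build a parser mirroring the short flags of the training command."""
    parser = argparse.ArgumentParser(description="Train FFN with hand-crafted features")
    parser.add_argument("-i", dest="question_pairs_file", required=True,
                        help="training question pairs (CSV)")
    parser.add_argument("-t", dest="test_question_pairs_file", required=True,
                        help="test question pairs (CSV)")
    parser.add_argument("-f", dest="hcf_file", required=True,
                        help="hand-crafted features (CSV)")
    parser.add_argument("-g", dest="glove_file", required=True,
                        help="GloVe vectors (text)")
    parser.add_argument("-w", dest="model_weights_file", required=True,
                        help="where to store the model weights (HDF5)")
    parser.add_argument("-e", dest="word_embedding_matrix_file", required=True,
                        help="word embedding matrix file (.npy)")
    parser.add_argument("-n", dest="nb_words_data_file", required=True,
                        help="vocabulary size metadata (JSON)")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args([
    "-i", "train_rebalanced.csv",
    "-t", "test.csv",
    "-f", "train_rebalanced_10features.csv",
    "-g", "glove.840B.300d.txt",
    "-w", "weights.h5",
    "-e", "embedding_matrix.npy",
    "-n", "nb_words.json",
])
print(args.hcf_file)
```

Marking every flag as required makes a missing file argument fail immediately with a usage message rather than partway through training.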

Test FFN w/o HCF

$ python3 test_noHCF.py -i <QUESTION_PAIRS_FILE> -o <RESULT_FILE> -e <WORD_EMBEDDING_MATRIX_FILE> -n <NB_WORDS_DATA_FILE> -w <MODEL_WEIGHTS_FILE>

For instance:

$ python3 test_noHCF.py -i test.csv  -o result_question_pairs_weights_100epoch_test10_val10_dropout20_sumOP_noAVG_rebalanced.csv -e word_embedding_matrix_trainANDtest_rebalanced.npy -n nb_words_trainANDtest_rebalanced.json -w question_pairs_weights_100epoch_test10_val10_dropout20_sumOP_noAVG_rebalanced.h5

Test FFN

$ python3 test_HCF.py -i <QUESTION_PAIRS_FILE> -f <HCF_FILE> -o <RESULT_FILE> -e <WORD_EMBEDDING_MATRIX_FILE> -n <NB_WORDS_DATA_FILE> -w <MODEL_WEIGHTS_FILE>

For instance:

$ python3 test_HCF.py -i test.csv -f test_10features.csv -o result_question_pairs_weights_100epoch_test10_val10_dropout20_sumOP_noAVG_HCF_rebalanced.csv -e word_embedding_matrix_trainANDtest_rebalanced.npy -n nb_words_trainANDtest_rebalanced.json -w question_pairs_weights_100epoch_test10_val10_dropout20_sumOP_noAVG_HCF_rebalanced.h5

Reference

Keras model to identify Quora question pairs: most of the deep neural network script was borrowed from this project.
