KevinLiao159 / Quora

Licence: MIT license
Kaggle: Quora Insincere Questions Classification - detect toxic content to improve online conversations

Programming language: Python

Projects that are alternatives of or similar to Quora

open-solution-ship-detection
Open solution to the Airbus Ship Detection Challenge
Stars: ✭ 54 (+50%)
Mutual labels:  kaggle
Quora-Paraphrase-Question-Identification
Paraphrase question identification using Feature Fusion Network (FFN).
Stars: ✭ 19 (-47.22%)
Mutual labels:  kaggle
Bike-Sharing-Demand-Kaggle
Top 5th percentile solution to the Kaggle knowledge problem - Bike Sharing Demand
Stars: ✭ 33 (-8.33%)
Mutual labels:  kaggle
gender-unbiased BERT-based pronoun resolution
Source code for the ACL workshop paper and Kaggle competition by Google AI team
Stars: ✭ 42 (+16.67%)
Mutual labels:  kaggle
speech-recognition-transfer-learning
Speech command recognition DenseNet transfer learning from UrbanSound8k in keras tensorflow
Stars: ✭ 18 (-50%)
Mutual labels:  kaggle
lux-ai-2021
My published benchmark for a Kaggle Simulations Competition
Stars: ✭ 29 (-19.44%)
Mutual labels:  kaggle
kaggledatasets
Collection of Kaggle Datasets ready to use for Everyone (Looking for contributors)
Stars: ✭ 44 (+22.22%)
Mutual labels:  kaggle
Hello-Kaggle-Guide-KOR
Documentation for people who are new to Kaggle
Stars: ✭ 140 (+288.89%)
Mutual labels:  kaggle
PracticalMachineLearning
A collection of ML related stuff including notebooks, codes and a curated list of various useful resources such as books and softwares. Almost everything mentioned here is free (as speech not free food) or open-source.
Stars: ✭ 60 (+66.67%)
Mutual labels:  kaggle
kaggle redefining cancer treatment
Personalized Medicine: Redefining Cancer Treatment with deep learning
Stars: ✭ 21 (-41.67%)
Mutual labels:  kaggle
PyData-Pseudolabelling-Keynote
Accompanying notebook and sources to "A Guide to Pseudolabelling: How to get a Kaggle medal with only one model" (Dec. 2020 PyData Boston-Cambridge Keynote)
Stars: ✭ 23 (-36.11%)
Mutual labels:  kaggle
data-science-learning
📊 All of courses, assignments, exercises, mini-projects and books that I've done so far in the process of learning by myself Machine Learning and Data Science.
Stars: ✭ 32 (-11.11%)
Mutual labels:  kaggle
kuzushiji-recognition
Kuzushiji Recognition Kaggle 2019. Build a DL model to transcribe ancient Kuzushiji into contemporary Japanese characters. Opening the door to a thousand years of Japanese culture.
Stars: ✭ 16 (-55.56%)
Mutual labels:  kaggle
Data-Science-Projects
Data Science projects on various problem statements and datasets using Data Analysis, Machine Learning Algorithms, Deep Learning Algorithms, Natural Language Processing, Business Intelligence concepts by Python
Stars: ✭ 28 (-22.22%)
Mutual labels:  kaggle
How-to-score-0.8134-in-Titanic-Kaggle-Challenge
Solution of the Titanic Kaggle competition
Stars: ✭ 114 (+216.67%)
Mutual labels:  kaggle
fer
Facial Expression Recognition
Stars: ✭ 32 (-11.11%)
Mutual labels:  kaggle
rawr
Extract raw R code directly from webpages, including Github, Kaggle, Stack Overflow, and sites made using Blogdown.
Stars: ✭ 15 (-58.33%)
Mutual labels:  kaggle
intel-cervical-cancer
Team GuYuShiJie~'s 15th (top 2%) solution of cervix type classification in Kaggle 2017 competition, using PyTorch.
Stars: ✭ 19 (-47.22%)
Mutual labels:  kaggle
Deep-Learning-Experiments-implemented-using-Google-Colab
Colab Compatible FastAI notebooks for NLP and Computer Vision Datasets
Stars: ✭ 16 (-55.56%)
Mutual labels:  kaggle
ml-competition-template-titanic
Kaggle Titanic example
Stars: ✭ 51 (+41.67%)
Mutual labels:  kaggle

Kaggle Competition: Quora Insincere Questions Classification


Introduction

This competition is sponsored by Quora. The objective is to predict whether a question asked on Quora is sincere or not. This is a kernels-only competition with a two-hour runtime constraint.

An insincere question is defined as a question intended to make a statement rather than look for helpful answers. Some characteristics that can signify that a question is insincere:

  • has a non-neutral tone
  • is disparaging or inflammatory
  • isn't grounded in reality
  • uses sexual content

Submissions are evaluated on the F1 score between the predicted and the observed targets.
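As a rough illustration of the metric, the sketch below computes F1 from binary labels in plain Python. The `best_threshold` helper and the toy data are assumptions added for illustration: the source does not say how predicted probabilities were turned into 0/1 labels, and scanning a few candidate cut-offs on validation data is only one common way to do it.

```python
def f1_score(y_true, y_pred):
    """F1 = harmonic mean of precision and recall for the positive class (1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(y_true, probs, candidates):
    """Hypothetical helper: pick the candidate cut-off with the highest F1."""
    scored = [
        (f1_score(y_true, [int(p >= t) for p in probs]), t)
        for t in candidates
    ]
    best_f1, best_t = max(scored)
    return best_t, best_f1


# Toy validation data (made up, not from the competition).
y_true = [1, 0, 1, 1, 0, 0]
probs = [0.9, 0.4, 0.35, 0.8, 0.1, 0.6]

threshold, score = best_threshold(y_true, probs, [0.3, 0.5, 0.7])
print(f"best threshold={threshold}, F1={score:.3f}")
```

Because F1 ignores true negatives, it is sensitive to the cut-off when the positive class is rare, which is why the threshold is usually tuned rather than fixed at 0.5.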

Model Development

Data Science Workflow

I follow a standard workflow for model development: start with a simple linear model, then add complexity as needed. For the final submission, I use neural network models combined with ensembling. The steps of my model development were:

  1. Establish a strong baseline with the hybrid "NB-SVM" model (link to model V0)

  2. Try the tree-based model LightGBM (link to model V1)

  3. Try a blending model: "NB-SVM" + LightGBM (link to the blending model V11)

  4. Establish a baseline for the neural network model (link to model V2)

     • 1st layer: embedding layer without pretrained weights
     • 2nd layer: spatial dropout
     • 3rd layer: bidirectional LSTM
     • 4th layer: global max pooling 1D
     • 5th layer: output dense layer

  5. Try a neural network model with pretrained embedding weights. I used a neural network architecture very similar to the one above. The only changes are 1) adding text cleaning and 2) using pretrained word embedding weights

  6. Try LSTM with attention and GloVe word embeddings (link to model V40)

  7. Use both LSTM with attention and a Capsule Neural Network (CapsNet) (link to model V5)
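The "NB-SVM" baseline in the first step can be illustrated with a minimal sketch. It follows the commonly published formulation (Naive Bayes log-count ratios used to rescale bag-of-words features before a linear classifier) and is not taken from the linked model V0. The function names, the smoothing value `alpha=1.0`, and the toy questions are all assumptions for illustration.

```python
import math
from collections import Counter


def log_count_ratio(docs, labels, alpha=1.0):
    """Return {token: r} with r = log((p / |p|_1) / (q / |q|_1)).

    p and q are smoothed per-token document counts in the positive
    and negative class respectively.
    """
    vocab = sorted({tok for doc in docs for tok in doc.split()})
    pos, neg = Counter(), Counter()
    for doc, label in zip(docs, labels):
        # Binarized features: each token counts once per document.
        target = pos if label == 1 else neg
        target.update(set(doc.split()))
    p = {tok: alpha + pos[tok] for tok in vocab}
    q = {tok: alpha + neg[tok] for tok in vocab}
    p_norm, q_norm = sum(p.values()), sum(q.values())
    return {
        tok: math.log((p[tok] / p_norm) / (q[tok] / q_norm)) for tok in vocab
    }


def nb_features(doc, ratios):
    """Scale a document's binary bag-of-words vector by the log-count ratios."""
    present = set(doc.split())
    return {tok: r for tok, r in ratios.items() if tok in present}


# Toy data (made up): 1 = insincere, 0 = sincere.
docs = [
    "why are people so stupid",
    "how do i learn python",
    "why is everyone so stupid",
    "what is the best way to learn math",
]
labels = [1, 0, 1, 0]

ratios = log_count_ratio(docs, labels)
features = nb_features("why are you so stupid", ratios)
print(features)
```

Tokens seen mostly in insincere questions get a positive ratio and tokens seen mostly in sincere ones get a negative ratio. In a full pipeline, the rescaled features would then be passed to a linear classifier such as an SVM or logistic regression.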

Kaggle Public Leaderboard Ranking

Model       Public score   Public leaderboard rank
model V0    0.641          1600th (top 66%)
model V30   0.683          1075th (top 40%)
model V40   0.690          700th (top 28%)
model V5    0.697          91st (top 4%)

Reference

https://www.kaggle.com/fizzbuzz/beginner-s-guide-to-capsule-networks

https://www.kaggle.com/ashishpatel26/nlp-text-analytics-solution-quora

https://www.kaggle.com/gmhost/gru-capsule

https://www.kaggle.com/larryfreeman/toxic-comments-code-for-alexander-s-9872-model

https://www.kaggle.com/shujian/single-rnn-with-5-folds-snapshot-ensemble

https://www.kaggle.com/thebrownviking20/analyzing-quora-for-the-insinceres

https://www.kaggle.com/mjbahmani/a-data-science-framework-for-quora

https://www.kaggle.com/christofhenkel/how-to-preprocessing-when-using-embeddings

https://www.kaggle.com/sudalairajkumar/a-look-at-different-embeddings
