Projects that are alternatives to or similar to Quora

A collection of ML-related material, including notebooks, code and a curated list of useful resources such as books and software. Almost everything mentioned here is free (as in free speech, not free food) or open source.

kaggle redefining cancer treatment

Personalized Medicine: Redefining Cancer Treatment with deep learning

PyData-Pseudolabelling-Keynote

Accompanying notebook and sources to "A Guide to Pseudolabelling: How to get a Kaggle medal with only one model" (Dec. 2020 PyData Boston-Cambridge Keynote)

data-science-learning

📊 All the courses, assignments, exercises, mini-projects and books that I've completed so far while teaching myself Machine Learning and Data Science.

kuzushiji-recognition

Kuzushiji Recognition Kaggle 2019. Build a DL model to transcribe ancient Kuzushiji into contemporary Japanese characters. Opening the door to a thousand years of Japanese culture.

Data-Science-Projects

Data Science projects on various problem statements and datasets using Data Analysis, Machine Learning Algorithms, Deep Learning Algorithms, Natural Language Processing and Business Intelligence concepts, in Python

How-to-score-0.8134-in-Titanic-Kaggle-Challenge

Solution of the Titanic Kaggle competition

fer

Facial Expression Recognition

rawr

Extract raw R code directly from webpages, including GitHub, Kaggle, Stack Overflow, and sites made using Blogdown.

intel-cervical-cancer

Team GuYuShiJie~'s 15th-place (top 2%) solution to the cervix type classification task of the 2017 Kaggle competition, using PyTorch.

Deep-Learning-Experiments-implemented-using-Google-Colab

Colab-compatible fastai notebooks for NLP and computer vision datasets

ml-competition-template-titanic

Kaggle Titanic example

Kaggle Competition: Quora Insincere Questions Classification

Table of Contents

Introduction
Model Development
Kaggle Public LeaderBoard Ranking
Reference

Introduction

This competition is sponsored by Quora. The objective is to predict whether a question asked on Quora is sincere or not. This is a kernels-only competition with a two-hour runtime constraint.

An insincere question is defined as a question intended to make a statement rather than look for helpful answers. Some characteristics that can signify that a question is insincere:

has a non-neutral tone
is disparaging or inflammatory
isn't grounded in reality
uses sexual content

Submissions are evaluated on the F1 score between the predicted and the observed targets.
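For reference, here is a minimal pure-Python sketch of the binary F1 score, treating label 1 as "insincere". The function name f1_score and the toy labels are illustrative assumptions, not part of the competition kit; in practice a library implementation (for example scikit-learn's) would typically be used.

```python
def f1_score(y_true, y_pred):
    """Binary F1: harmonic mean of precision and recall for the positive class (1 = insincere)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # false negatives
    if tp == 0:
        return 0.0  # no correct positives: precision or recall is 0 (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy labels: 2 true positives, 1 false positive, 1 false negative.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
print(round(f1_score(y_true, y_pred), 3))  # prints 0.667
```

Because F1 is computed on hard 0/1 labels, a model that outputs probabilities needs a cutoff (0.5 in the simplest case) before it can be scored.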

Model Development

I have a standard workflow for model development: start with a simple linear model, add complexity only if needed, and eventually use neural network models combined through ensembling for the final submission. The steps of my model development were:

Establish a strong baseline with the hybrid "NB-SVM" model (link to model V0)
Try the tree-based model LightGBM (link to model V1)
Try a blending model: "NB-SVM" + LightGBM (link to the blending model V11)
Establish a baseline neural network model (link to model V2)

1st layer: embedding layer without pretrained weights
2nd layer: spatial dropout
3rd layer: bidirectional LSTM
4th layer: global max pooling 1D
5th layer: output dense layer
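To make the five layers of model V2 concrete, here is a minimal NumPy sketch of the forward pass with randomly initialised, untrained weights. It is not the author's kernel: the sizes VOCAB, EMBED, HIDDEN and DROP and all helper names are hypothetical, and a real model would be built and trained in a deep-learning framework. The dropout step follows the usual definition of spatial dropout, which zeroes whole embedding channels across all time steps rather than individual values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, not taken from the original model.
VOCAB, EMBED, HIDDEN, DROP = 1000, 32, 16, 0.2


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm(x, W, U, b):
    """Run one LSTM direction over x of shape (batch, time, features).

    W: (features, 4*HIDDEN), U: (HIDDEN, 4*HIDDEN), b: (4*HIDDEN,).
    Gate order in this sketch's packed weights: input, forget, candidate, output.
    """
    batch, steps, _ = x.shape
    h = np.zeros((batch, HIDDEN))
    c = np.zeros((batch, HIDDEN))
    outputs = []
    for t in range(steps):
        z = x[:, t] @ W + h @ U + b
        i, f, g, o = np.split(z, 4, axis=1)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # update cell state
        h = sigmoid(o) * np.tanh(c)                   # new hidden state
        outputs.append(h)
    return np.stack(outputs, axis=1)  # (batch, time, HIDDEN)


def init_lstm(n_in):
    return (rng.normal(0, 0.1, (n_in, 4 * HIDDEN)),
            rng.normal(0, 0.1, (HIDDEN, 4 * HIDDEN)),
            np.zeros(4 * HIDDEN))


# Randomly initialised parameters (the embedding is not pretrained).
embedding = rng.normal(0, 0.1, (VOCAB, EMBED))
fwd, bwd = init_lstm(EMBED), init_lstm(EMBED)
w_out, b_out = rng.normal(0, 0.1, (2 * HIDDEN, 1)), np.zeros(1)


def forward(token_ids, training=False):
    # 1st layer: embedding lookup -> (batch, time, EMBED)
    x = embedding[token_ids]
    # 2nd layer: spatial dropout drops whole channels for every time step
    # of a sample, rescaling the kept ones (inverted dropout).
    if training:
        keep = rng.random((x.shape[0], 1, EMBED)) >= DROP
        x = x * keep / (1.0 - DROP)
    # 3rd layer: bidirectional LSTM; the backward outputs are flipped back
    # into time order before concatenation -> (batch, time, 2*HIDDEN)
    h_fwd = lstm(x, *fwd)
    h_bwd = lstm(x[:, ::-1], *bwd)[:, ::-1]
    h = np.concatenate([h_fwd, h_bwd], axis=-1)
    # 4th layer: global max pooling over time -> (batch, 2*HIDDEN)
    pooled = h.max(axis=1)
    # 5th layer: dense output with sigmoid -> probability of "insincere"
    return sigmoid(pooled @ w_out + b_out)


batch = rng.integers(0, VOCAB, size=(4, 12))  # 4 padded questions of 12 token ids
probs = forward(batch, training=True)
print(probs.shape)  # (4, 1)
```

Global max pooling keeps, for each LSTM feature, its strongest activation anywhere in the question, so the pooled vector does not depend on the absolute position at which a feature fires.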

Try neural network models with pretrained embedding weights. I used a neural network architecture very similar to the one above; the only changes are 1) adding text cleaning and 2) using pretrained word embedding weights:

Neural network with GloVe word embeddings (link to model V30)
Neural network with Paragram word embeddings (link to model V31)
Neural network with FastText word embeddings (link to model V32)
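The two changes named in this step can be sketched as follows, under stated assumptions: the embedding files are plain text with one word followed by its vector per line (the layout of GloVe files and of FastText .vec files, the latter starting with a "count dimension" header line). The cleaning rule, the helper names clean_text, load_vectors and build_embedding_matrix, and the random initialisation of out-of-vocabulary rows are hypothetical choices for illustration, not the author's preprocessing.

```python
import re

import numpy as np


def clean_text(text):
    """Hypothetical minimal cleaning: lowercase and split punctuation into separate tokens."""
    text = text.lower()
    text = re.sub(r"([?!.,])", r" \1 ", text)
    return text.split()


def load_vectors(lines, dim):
    """Parse 'word v1 v2 ... v_dim' lines, skipping headers and malformed rows."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # e.g. the 'count dim' header of a .vec file
        vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors


def build_embedding_matrix(word_index, vectors, dim, seed=0):
    """Row i holds the pretrained vector of the word with index i.

    Words missing from the pretrained vocabulary keep random vectors drawn
    with the mean/std of the pretrained values; row 0 is reserved for padding.
    """
    stacked = np.stack(list(vectors.values()))
    mean, std = stacked.mean(), stacked.std()
    rng = np.random.default_rng(seed)
    matrix = rng.normal(mean, std, (len(word_index) + 1, dim)).astype(np.float32)
    matrix[0] = 0.0
    hits = 0
    for word, idx in word_index.items():
        vec = vectors.get(word)
        if vec is not None:
            matrix[idx] = vec
            hits += 1
    return matrix, hits / len(word_index)  # matrix and vocabulary coverage


# Toy in-memory "file" with a header line, standing in for a real embedding file.
sample = ["2 3", "what 0.1 0.2 0.3", "is 0.4 0.5 0.6", "? 0.7 0.8 0.9"]
vectors = load_vectors(sample, dim=3)
tokens = clean_text("What is Quora?")
word_index = {w: i + 1 for i, w in enumerate(dict.fromkeys(tokens))}
matrix, coverage = build_embedding_matrix(word_index, vectors, dim=3)
print(tokens)    # ['what', 'is', 'quora', '?']
print(coverage)  # 0.75
```

Coverage is a useful number to watch here: cleaning rules that split punctuation or normalise case change how many question tokens find a pretrained vector.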

Try an LSTM with attention and GloVe word embeddings (link to model V40)
Use both LSTM attention and a Capsule Neural Network (CapsNet) (link to model V5)
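The two new ingredients in V40 and V5 can be illustrated with NumPy: a feed-forward attention pooling step that turns LSTM outputs into one context vector through a learned weighted average over time steps, and the squash nonlinearity from capsule networks, which rescales a vector to a length below 1 while keeping its direction. Both functions are hypothetical sketches with random, untrained parameters; the exact attention variant and capsule routing used in the original kernels are not described here.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention_pool(h, w, b):
    """Feed-forward attention over time.

    h: LSTM outputs (batch, time, features); w: (features,); b: (time,).
    Returns context vectors (batch, features) and attention weights (batch, time).
    """
    scores = np.tanh(h @ w + b)                      # one score per time step
    weights = softmax(scores, axis=1)                # weights sum to 1 over time
    context = (weights[..., None] * h).sum(axis=1)   # weighted average of h
    return context, weights


def squash(s, axis=-1, eps=1e-7):
    """Capsule squash: keeps the direction of s, maps its length into [0, 1)."""
    sq_norm = (s ** 2).sum(axis=axis, keepdims=True)
    scale = sq_norm / (1.0 + sq_norm) / np.sqrt(sq_norm + eps)
    return scale * s


rng = np.random.default_rng(0)
h = rng.normal(size=(2, 5, 8))  # hypothetical: 2 questions, 5 steps, 8 features
context, weights = attention_pool(h, rng.normal(size=8), np.zeros(5))
print(context.shape, weights.shape)

v = squash(np.array([[3.0, 4.0]]))  # input length 5
print(np.linalg.norm(v))            # about 0.9615, i.e. 25/26
```

The attention weights also make the model somewhat inspectable: the time steps with the largest weights are the tokens the pooled vector draws on most.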

Kaggle Public LeaderBoard Ranking

Model	Public score (F1)	Public leaderboard rank
model V0	0.641	1600th (top 66%)
model V30	0.683	1075th (top 40%)
model V40	0.690	700th (top 28%)
model V5	0.697	91st (top 4%)

Reference

https://www.kaggle.com/fizzbuzz/beginner-s-guide-to-capsule-networks

https://www.kaggle.com/ashishpatel26/nlp-text-analytics-solution-quora

https://www.kaggle.com/gmhost/gru-capsule

https://www.kaggle.com/larryfreeman/toxic-comments-code-for-alexander-s-9872-model

https://www.kaggle.com/shujian/single-rnn-with-5-folds-snapshot-ensemble

https://www.kaggle.com/thebrownviking20/analyzing-quora-for-the-insinceres

https://www.kaggle.com/mjbahmani/a-data-science-framework-for-quora

https://www.kaggle.com/christofhenkel/how-to-preprocessing-when-using-embeddings

https://www.kaggle.com/sudalairajkumar/a-look-at-different-embeddings

KevinLiao159 / Quora
