AtsunoriFujita / Jigsaw-Unintended-Bias-in-Toxicity-Classification

Licence: other

7th Place Solution for Jigsaw Unintended Bias in Toxicity Classification on Kaggle

Labels

pytorch kaggle kaggle-competition jigsaw

Projects that are alternatives to or similar to Jigsaw-Unintended-Bias-in-Toxicity-Classification

Machine Learning Workflow With Python

A comprehensive set of ML techniques in Python: Define the Problem, Specify Inputs & Outputs, Data Collection, Exploratory Data Analysis, Data Preprocessing, Model Design, Training, Evaluation

Stars: ✭ 157 (+881.25%)

Mutual labels: kaggle, kaggle-competition

Kaggle | 14th place solution for TGS Salt Identification Challenge

Stars: ✭ 73 (+356.25%)

Mutual labels: kaggle, kaggle-competition

Data-Science-Hackathon-And-Competition

Grandmaster in MachineHack (3rd Rank Best) | Top 70 in AnalyticsVidya & Zindi | Expert at Kaggle | Hack AI

Stars: ✭ 165 (+931.25%)

Mutual labels: kaggle, kaggle-competition

TensorFlow implementation: U-Net and FCN with global convolution

Stars: ✭ 101 (+531.25%)

Mutual labels: kaggle, kaggle-competition

Bike-Sharing-Demand-Kaggle

Top 5th percentile solution to the Kaggle knowledge problem - Bike Sharing Demand

Stars: ✭ 33 (+106.25%)

Mutual labels: kaggle, kaggle-competition

Kaggle Airbnb Recruiting New User Bookings

2nd Place Solution in Kaggle Airbnb New User Bookings competition

Stars: ✭ 118 (+637.5%)

Mutual labels: kaggle, kaggle-competition

digit recognizer

CNN digit recognizer implemented in Keras Notebook, Kaggle/MNIST (0.995).

Stars: ✭ 27 (+68.75%)

Mutual labels: kaggle, kaggle-competition

My Journey In The Data Science World

📢 Ready to learn or review your knowledge!

Stars: ✭ 1,175 (+7243.75%)

Mutual labels: kaggle, kaggle-competition

open-solution-ship-detection

Open solution to the Airbus Ship Detection Challenge

Stars: ✭ 54 (+237.5%)

Mutual labels: kaggle, kaggle-competition

Facial Expression Recognition

Stars: ✭ 32 (+100%)

Mutual labels: kaggle, kaggle-competition

Deep Learning Boot Camp

A community run, 5-day PyTorch Deep Learning Bootcamp

Stars: ✭ 1,270 (+7837.5%)

Mutual labels: kaggle, kaggle-competition

Apartment-Interest-Prediction

Predict people's interest in renting specific NYC apartments. The challenge combines structured data, geolocalization, time data, free text and images.

Stars: ✭ 17 (+6.25%)

Mutual labels: kaggle, kaggle-competition

Kaggle Competitions

There are plenty of courses and tutorials that can help you learn machine learning from scratch, but here on GitHub I want to solve some Kaggle competitions as a comprehensive workflow with Python packages. After reading, you can use this workflow as a template to solve other real problems.

Stars: ✭ 86 (+437.5%)

Mutual labels: kaggle, kaggle-competition

Open Solution Toxic Comments

Open solution to the Toxic Comment Classification Challenge

Stars: ✭ 154 (+862.5%)

Mutual labels: kaggle, kaggle-competition

Kaggle Notebooks

Sample notebooks for Kaggle competitions

Stars: ✭ 77 (+381.25%)

Mutual labels: kaggle, kaggle-competition

StoreItemDemand

(117th place - Top 26%) Deep learning using Keras and Spark for the "Store Item Demand Forecasting" Kaggle competition.

Stars: ✭ 24 (+50%)

Mutual labels: kaggle, kaggle-competition

Kaggle Web Traffic Time Series Forecasting

Solution to Kaggle - Web Traffic Time Series Forecasting

Stars: ✭ 29 (+81.25%)

Mutual labels: kaggle, kaggle-competition

Ml competition platform

Kaggle-like machine learning competition platform

Stars: ✭ 42 (+162.5%)

Mutual labels: kaggle, kaggle-competition

histopathologic cancer detector

CNN histopathologic tumor identifier.

Stars: ✭ 26 (+62.5%)

Mutual labels: kaggle, kaggle-competition

Hello-Kaggle-Guide-KOR

A document for people who are new to Kaggle

Stars: ✭ 140 (+775%)

Mutual labels: kaggle, kaggle-competition

Jigsaw Unintended Bias in Toxicity Classification

This repository contains my code for the Kaggle competition.

7th Place Solution for Jigsaw Unintended Bias in Toxicity Classification

Team: Abhishek Thakur, Duy, R0seNb1att, atfujita

All models (team):
Public LB: 0.94729 (3rd)
Private LB: 0.94660 (7th)

Note: This repository contains only my models and only the training script.

My models (average of 5 models):
Public LB: 0.94719
Private LB: 0.94651
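A minimal sketch of what averaging the 5 models' predictions can look like, assuming equal weights and one probability for the main `target` column per test comment. The prediction values below are made up for illustration, and equal weighting is an assumption, since this README only says "averaging".

```python
import numpy as np

# Hypothetical test-set probabilities: 5 models (rows) x 4 comments (columns).
# Equal weighting is an assumption; the README does not state the blend weights.
preds = np.array([
    [0.1, 0.8, 0.3, 0.9],  # LSTM
    [0.2, 0.7, 0.3, 0.9],  # BERT-Base Uncased
    [0.1, 0.9, 0.2, 1.0],  # BERT-Base Cased
    [0.0, 0.8, 0.4, 0.8],  # BERT-Large Uncased (Whole Word Masking)
    [0.1, 0.8, 0.3, 0.9],  # BERT-Large Cased (Whole Word Masking)
])

blend = preds.mean(axis=0)  # one averaged score per comment
print(blend)
```

Each comment's blended score is simply the column mean, so here the four comments end up at roughly 0.1, 0.8, 0.3 and 0.9.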

Thanks to Abhishek and Duy's wonderful models and support, I was able to get better results.

Set up

The most important libraries are listed in requirements.txt.

Models

I created 5 models: one LSTM model and four BERT models.

LSTM
- Based on the Quora competition model
- Architecture: LSTM + GRU + Self Attention + Max pooling
- Word embeddings: concatenated GloVe and fastText vectors
- Optimizer: AdamW
- Train:
  - max_len = 220
  - n_splits = 10
  - batch_size = 512
  - train_epochs = 7
  - base_lr, max_lr = 0.0005, 0.003
  - Weight Decay = 0.0001
  - Learning schedule: CyclicLR
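The LSTM architecture and training settings above can be sketched in PyTorch roughly as follows. This is an assumed reconstruction, not the author's code: the bidirectional layers, the hidden size of 128, the additive form of the self-attention pooling, `step_size_up=100`, and the random stand-in embedding matrices are all assumptions. The 8 outputs are chosen to match the 1 main + 7 auxiliary columns consumed by `custom_loss` further down.

```python
import numpy as np
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab_size, max_len = 1000, 220

# Stand-ins for the real GloVe and fastText matrices (both assumed 300-d);
# concatenating them feature-wise gives one 600-d embedding per token.
glove_matrix = np.random.rand(vocab_size, 300).astype(np.float32)
fasttext_matrix = np.random.rand(vocab_size, 300).astype(np.float32)
embedding_matrix = np.concatenate([glove_matrix, fasttext_matrix], axis=1)


class AttentionPooling(nn.Module):
    """Additive attention over time steps (one common form of 'self attention' pooling)."""

    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, x):  # x: (batch, seq_len, dim)
        alpha = torch.softmax(self.score(x).squeeze(-1), dim=1)  # (batch, seq_len)
        return (x * alpha.unsqueeze(-1)).sum(dim=1)              # (batch, dim)


class LstmGruAttention(nn.Module):
    def __init__(self, embedding_matrix, hidden=128, num_outputs=8):
        super().__init__()
        self.embedding = nn.Embedding.from_pretrained(
            torch.from_numpy(embedding_matrix), freeze=True)
        emb_dim = embedding_matrix.shape[1]
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.gru = nn.GRU(2 * hidden, hidden, batch_first=True, bidirectional=True)
        self.attention = AttentionPooling(2 * hidden)
        self.out = nn.Linear(4 * hidden, num_outputs)

    def forward(self, tokens):
        x = self.embedding(tokens)         # (batch, seq_len, 600)
        x, _ = self.lstm(x)                # (batch, seq_len, 2 * hidden)
        x, _ = self.gru(x)                 # (batch, seq_len, 2 * hidden)
        att = self.attention(x)            # attention-pooled features
        max_pool, _ = torch.max(x, dim=1)  # max over time steps
        return self.out(torch.cat([att, max_pool], dim=1))


model = LstmGruAttention(embedding_matrix)
optimizer = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad], lr=0.0005, weight_decay=0.0001)
# Cycle the learning rate between base_lr and max_lr; cycle_momentum=False
# keeps Adam's betas fixed so only the learning rate changes.
scheduler = torch.optim.lr_scheduler.CyclicLR(
    optimizer, base_lr=0.0005, max_lr=0.003, step_size_up=100, cycle_momentum=False)

# One illustrative training step on random token ids and random targets
tokens = torch.randint(0, vocab_size, (4, max_len))
logits = model(tokens)                                   # (4, 8)
loss = nn.BCEWithLogitsLoss()(logits, torch.rand(4, 8))  # placeholder for custom_loss
optimizer.zero_grad()
loss.backward()
optimizer.step()
scheduler.step()  # CyclicLR is stepped once per batch, not once per epoch
```

Concatenating the attention-pooled vector with the max-pooled vector gives the output layer both a weighted summary of the sequence and the strongest activation of each feature across time steps.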
BERT
- The model is based on Yuval Reina's great kernel.
- My changes are to the loss function and preprocessing.
- I created 4 BERT models.
  - BERT-Base Uncased
  - BERT-Base Cased
  - BERT-Large Uncased (Whole Word Masking)
  - BERT-Large Cased (Whole Word Masking)
- Train:
  - max_len = 220
  - train samples = 1.7M, val samples = 0.1M
  - batch_size = 32 (Base), 4 (Large)
  - accumulation_steps = 1 (Base), 16 (Large)
  - train_epochs = 2
  - lr = 2e-5
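With `accumulation_steps = 16`, BERT-Large takes one optimizer step every 16 mini-batches of 4 samples, for an effective batch size of 64 (versus 32 for BERT-Base with no accumulation). A minimal sketch of such a loop, assuming a single linear layer over random 768-d features as a stand-in for BERT and plain Adam as the optimizer; neither choice is taken from this repository.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

batch_size = 4           # BERT-Large value from the list above
accumulation_steps = 16  # BERT-Large value from the list above
lr = 2e-5

model = nn.Linear(768, 8)  # stand-in for BERT plus its output head (assumption)
optimizer = torch.optim.Adam(model.parameters(), lr=lr)  # optimizer choice is an assumption
criterion = nn.BCEWithLogitsLoss()

num_batches = 64  # hypothetical number of mini-batches
optimizer_steps = 0
optimizer.zero_grad()
for i in range(num_batches):
    features = torch.randn(batch_size, 768)  # stand-in for BERT's pooled output
    targets = torch.rand(batch_size, 8)
    # Divide by accumulation_steps so the accumulated gradient is an average
    loss = criterion(model(features), targets) / accumulation_steps
    loss.backward()  # gradients add up in .grad until the next optimizer step
    if (i + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
        optimizer_steps += 1

effective_batch_size = batch_size * accumulation_steps
print(effective_batch_size, optimizer_steps)
```

Here 64 mini-batches produce only 4 parameter updates, each computed from 64 samples, which lets a memory-hungry model like BERT-Large train with a larger effective batch than fits on the GPU at once.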

What worked well

The loss function was very important in this competition.
In fact, all of the top teams used different loss functions.

My loss function is shown below.

```python
import numpy as np
import torch.nn as nn

# Assumes train_df is a pandas DataFrame holding the competition's train.csv
y_columns = ['target']

# Auxiliary targets: the raw toxicity score plus six toxicity subtypes
y_aux_train = train_df[['target', 'severe_toxicity', 'obscene',
                        'identity_attack', 'insult',
                        'threat',
                        'sexual_explicit'
                        ]]

y_aux_train = y_aux_train.fillna(0)

identity_columns = [
    'male', 'female', 'homosexual_gay_or_lesbian', 'christian', 'jewish',
    'muslim', 'black', 'white', 'psychiatric_or_mental_illness']

target = train_df['target'].values
identity = train_df[identity_columns].fillna(0).values

# Overall
weights = np.ones((len(train_df),)) / 4
# Subgroup: at least one identity column >= 0.5
weights += (identity >= 0.5).any(axis=1).astype(int) / 4
# Background Positive, Subgroup Negative
# (some identity column is < 0.5 for nearly every row, so in practice
#  this term up-weights almost every toxic comment)
weights += ((target >= 0.5) & (identity < 0.5).any(axis=1)).astype(int) / 4
# Background Negative, Subgroup Positive
weights += ((target < 0.5) & (identity >= 0.5).any(axis=1)).astype(int) / 4

# Column 0: binarized main target, column 1: per-sample loss weight
y_train = np.vstack([(target >= 0.5).astype(int), weights]).T

# Columns 2-8: the seven auxiliary targets
y_train = np.hstack([y_train, y_aux_train.values])


def custom_loss(data, targets):
    '''Weighted BCE on the main 'target' output plus unweighted BCE on the 7 auxiliary outputs'''
    # data column 0 vs. targets column 0, weighted by targets column 1
    bce_loss_1 = nn.BCEWithLogitsLoss(
        weight=targets[:, 1:2])(data[:, :1], targets[:, :1])
    # data columns 1-7 vs. targets columns 2-8
    aux_losses = [nn.BCEWithLogitsLoss()(data[:, i:i + 1], targets[:, i + 1:i + 2])
                  for i in range(1, 8)]
    return bce_loss_1 + sum(aux_losses)
```
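To make the sample weighting concrete, here is a small worked example on four hand-made rows. It applies the same four 1/4-weight rules as the code above, but through a hypothetical helper `sample_weights` and with only three (hypothetical) identity columns for brevity.

```python
import numpy as np


def sample_weights(target, identity):
    """Hypothetical helper applying the same four 1/4-weight rules as the loss code."""
    any_identity = (identity >= 0.5).any(axis=1)
    any_non_identity = (identity < 0.5).any(axis=1)
    weights = np.ones(len(target)) / 4                               # Overall
    weights += any_identity.astype(int) / 4                          # Subgroup
    weights += ((target >= 0.5) & any_non_identity).astype(int) / 4  # Background Positive, Subgroup Negative
    weights += ((target < 0.5) & any_identity).astype(int) / 4       # Background Negative, Subgroup Positive
    return weights


# Toy rows; the three identity columns stand in for e.g. male, female, muslim
target = np.array([0.0, 0.8, 0.1, 0.9])
identity = np.array([
    [0.0, 0.0, 0.0],  # non-toxic, no identity mentioned
    [0.0, 0.0, 0.0],  # toxic, no identity mentioned
    [0.9, 0.0, 0.0],  # non-toxic, mentions an identity
    [0.9, 0.0, 0.0],  # toxic, mentions an identity
])

weights = sample_weights(target, identity)
```

The weights come out as 0.25, 0.5, 0.75 and 0.75. Non-toxic comments that mention an identity get three times the base weight, and the last row also picks up the "Background Positive, Subgroup Negative" term, because its other identity columns are below 0.5.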

Stars: ✭ 16
