

ChenglongChen / Kaggle_The_Hunt_for_Prohibited_Content

Licence: other

4th Place Solution for The Hunt for Prohibited Content Competition on Kaggle (http://www.kaggle.com/c/avito-prohibited-content)


Labels

natural-language-processing kaggle kaggle-competition

Projects that are alternatives to or similar to Kaggle The Hunt for Prohibited Content

Jigsaw-Unintended-Bias-in-Toxicity-Classification

7th Place Solution for Jigsaw Unintended Bias in Toxicity Classification on Kaggle

Stars: ✭ 16 (-44.83%)

Mutual labels: kaggle, kaggle-competition

Kaggle | 14th place solution for TGS Salt Identification Challenge

Stars: ✭ 73 (+151.72%)

Mutual labels: kaggle, kaggle-competition

Data-Science-Hackathon-And-Competition

Grandmaster in MachineHack (3rd Rank Best) | Top 70 in AnalyticsVidya & Zindi | Expert at Kaggle | Hack AI

Stars: ✭ 165 (+468.97%)

Mutual labels: kaggle, kaggle-competition

Kaggle Airbnb Recruiting New User Bookings

2nd Place Solution in Kaggle Airbnb New User Bookings competition

Stars: ✭ 118 (+306.9%)

Mutual labels: kaggle, kaggle-competition

Bike-Sharing-Demand-Kaggle

Top 5th percentile solution to the Kaggle knowledge problem - Bike Sharing Demand

Stars: ✭ 33 (+13.79%)

Mutual labels: kaggle, kaggle-competition

Open Solution Toxic Comments

Open solution to the Toxic Comment Classification Challenge

Stars: ✭ 154 (+431.03%)

Mutual labels: kaggle, kaggle-competition

digit recognizer

CNN digit recognizer implemented in Keras Notebook, Kaggle/MNIST (0.995).

Stars: ✭ 27 (-6.9%)

Mutual labels: kaggle, kaggle-competition

Kaggle Notebooks

Sample notebooks for Kaggle competitions

Stars: ✭ 77 (+165.52%)

Mutual labels: kaggle, kaggle-competition

open-solution-ship-detection

Open solution to the Airbus Ship Detection Challenge

Stars: ✭ 54 (+86.21%)

Mutual labels: kaggle, kaggle-competition

Facial Expression Recognition

Stars: ✭ 32 (+10.34%)

Mutual labels: kaggle, kaggle-competition

Tensorflow implementation : U-net and FCN with global convolution

Stars: ✭ 101 (+248.28%)

Mutual labels: kaggle, kaggle-competition

Apartment-Interest-Prediction

Predict people's interest in renting specific NYC apartments. The challenge combines structured data, geolocation, time data, free text and images.

Stars: ✭ 17 (-41.38%)

Mutual labels: kaggle, kaggle-competition

Deep Learning Boot Camp

A community run, 5-day PyTorch Deep Learning Bootcamp

Stars: ✭ 1,270 (+4279.31%)

Mutual labels: kaggle, kaggle-competition

Machine Learning Workflow With Python

A comprehensive guide to ML techniques with Python: Define the Problem - Specify Inputs & Outputs - Data Collection - Exploratory Data Analysis - Data Preprocessing - Model Design - Training - Evaluation

Stars: ✭ 157 (+441.38%)

Mutual labels: kaggle, kaggle-competition

Kaggle Competitions

There are plenty of courses and tutorials that can help you learn machine learning from scratch, but here on GitHub I want to solve some Kaggle competitions as a comprehensive workflow with Python packages. After reading, you can use this workflow as a template to solve other real problems.

Stars: ✭ 86 (+196.55%)

Mutual labels: kaggle, kaggle-competition

StoreItemDemand

(117th place - Top 26%) Deep learning using Keras and Spark for the "Store Item Demand Forecasting" Kaggle competition.

Stars: ✭ 24 (-17.24%)

Mutual labels: kaggle, kaggle-competition

Ml competition platform

Kaggle-like machine learning competition platform

Stars: ✭ 42 (+44.83%)

Mutual labels: kaggle, kaggle-competition

My Journey In The Data Science World

📢 Ready to learn or review your knowledge!

Stars: ✭ 1,175 (+3951.72%)

Mutual labels: kaggle, kaggle-competition

histopathologic cancer detector

CNN histopathologic tumor identifier.

Stars: ✭ 26 (-10.34%)

Mutual labels: kaggle, kaggle-competition

Hello-Kaggle-Guide-KOR

A guide for people who are new to Kaggle

Stars: ✭ 140 (+382.76%)

Mutual labels: kaggle, kaggle-competition


Kaggle's The Hunt for Prohibited Content Competition

This repo holds the code I used to make my submission to Kaggle's The Hunt for Prohibited Content competition. This implementation scored 0.98527, ranking 4th out of 289 teams. (That entry is in the ./Submission folder.)

I initially entered this competition to familiarize myself with VW (Vowpal Wabbit) and with Linux and shell scripting (I used to be a Windows user), so the code provided here might not be as efficient or elegant as it could be.

Method

It uses logistic regression (LR) to build a classifier on a set of features, including:

BOW/tf-idf unigram and bigram features of the title, description, attributes, etc.
All the raw features, such as category, subcategory, price, etc.
Some cross-features between the above features (e.g., subcategory & price), which seem to help a lot.
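To make the feature list above concrete, here is a minimal sketch of turning one ad into a VW input line with unigram/bigram text features, raw features and one hand-built subcategory × price cross-feature. It is not the repo's generate_vw_file.py or ngram.py: the field names (`title`, `category`, `subcategory`, `price`), the namespace letters `t`, `r`, `x`, the log-scale price bucket and the helper names are all hypothetical. It only relies on VW's documented text format (`label importance |namespace features`, with -1/+1 labels for logistic loss), where the importance field is also where weights from something like generate_weighted_sample.py would go.

```python
import math
import re


def ngrams(tokens, n):
    """Return the n-grams of a token list, each joined with '_'."""
    return ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clean(token):
    """Replace characters VW treats as delimiters (whitespace, '|', ':')."""
    return re.sub(r"[\s|:]", "_", token)


def to_vw_line(label, ad, weight=1.0):
    """Build one VW example: '<label> <weight> |t ... |r ... |x ...'.

    label: +1 (blocked) or -1 (not blocked), as VW's logistic loss expects.
    ad:    dict with hypothetical keys title/category/subcategory/price.
    """
    words = [clean(w) for w in ad["title"].lower().split()]
    text_feats = words + ngrams(words, 2)        # unigrams + bigrams of the title
    price = float(ad["price"])
    price_bin = int(math.log1p(price))           # coarse log-scale bucket (assumption)
    raw_feats = [
        "cat=" + clean(ad["category"]),
        "subcat=" + clean(ad["subcategory"]),
        "price:%g" % price,                      # numeric feature name:value
    ]
    cross_feats = ["subcat_x_price=%s_%d" % (clean(ad["subcategory"]), price_bin)]
    return "%d %g |t %s |r %s |x %s" % (
        label, weight, " ".join(text_feats), " ".join(raw_feats), " ".join(cross_feats))


if __name__ == "__main__":
    ad = {"title": "Cheap phone for sale", "category": "Electronics",
          "subcategory": "Phones", "price": "1500"}
    print(to_vw_line(1, ad))
```

Building the cross-feature by hand keeps control over bucketing; VW can also generate pairwise namespace interactions itself with `-q`, which is an alternative if all pairs within two namespaces are wanted.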

I initially trained on the whole dataset, and later found some improvement by ensembling its ranking predictions with those of a model trained using only is_proven blocked ads and unblocked ads.
I tried all the cost functions provided in VW, i.e., log-loss, hinge loss, squared loss, and quantile loss, but found that log-loss gives consistently better results. Ensembling models trained with different losses didn't seem to buy me anything.
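The README does not say exactly how the ranking predictions of the two models were combined. One common option, sketched here as an assumption rather than the author's method, is a weighted average of normalized ranks: each model's scores are converted to ranks in [0, 1] (ties share their average rank), so models with differently scaled outputs can be blended. The function names and the weights are hypothetical.

```python
def to_ranks(scores):
    """Map raw scores to normalized ranks in [0, 1]; tied scores share the average rank."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Extend j over the run of tied scores starting at position i.
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg = (i + j) / 2.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return [r / (n - 1) if n > 1 else 0.0 for r in ranks]


def rank_average(pred_lists, weights):
    """Weighted average of each model's normalized ranks, per example."""
    rank_lists = [to_ranks(p) for p in pred_lists]
    total = float(sum(weights))
    return [sum(w * r[i] for w, r in zip(weights, rank_lists)) / total
            for i in range(len(pred_lists[0]))]


if __name__ == "__main__":
    full_model = [0.1, 0.9, 0.5]        # scores from the model trained on all data
    proven_model = [0.2, 0.3, 0.8]      # scores from the is_proven-only model
    print(rank_average([full_model, proven_model], [1, 1]))
```

Because the competition metric only depends on the ordering of ads, blending ranks rather than raw scores avoids one model dominating just because its outputs have a larger spread.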

Code layout

Main functions

run_all.sh : run everything in one shot
grid_search.sh : perform grid search and bagging (called by run_all.sh)
generate_vw_file.py: generate VW format training and testing data (called by run_all.sh)
generate_bagging_submission.py: generate final bagging submission (called by run_all.sh)
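grid_search.sh performs bagging, and generate_bootstrap.py generates the bootstrap samples, but the README does not spell out the details. As a minimal sketch of the general technique (not the repo's code): draw several bootstrap replicates of the training lines (sampling with replacement, same size as the original), train one model per replicate, then average the per-example predictions. The function names are hypothetical, and the training step itself (e.g. running vw on each replicate file) is left out.

```python
import random


def bootstrap_sample(lines, seed):
    """Draw len(lines) examples with replacement: one bootstrap replicate."""
    rng = random.Random(seed)
    return [lines[rng.randrange(len(lines))] for _ in range(len(lines))]


def average_predictions(pred_lists):
    """Average the per-example predictions of several bagged models."""
    n_models = len(pred_lists)
    return [sum(preds) / n_models for preds in zip(*pred_lists)]


if __name__ == "__main__":
    train_lines = ["1 |t a", "-1 |t b", "1 |t c"]
    # One replicate per seed; each would be written to disk and trained separately.
    replicates = [bootstrap_sample(train_lines, seed) for seed in range(3)]
    print(replicates[0])
    print(average_predictions([[1.0, 2.0], [3.0, 4.0]]))
```

Seeding each replicate makes the bagging run reproducible. Whether generate_bagging_submission.py averages raw scores or ranks is not stated; the rank-averaging idea above could be swapped in here as well.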

Helper functions

generate_submission.py: convert VW format prediction to Kaggle submission
generate_weighted_sample.py: convert training data into an importance-weighted version (used in grid search for the best sample weights)
generate_bootstrap.py: generate bootstrap samples (used in bagging)
APatK.py: compute AP@k (used in grid search)
ngram.py: construct n-grams
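APatK.py computes AP@k for the grid search. Its exact code is not shown here, so the following is a sketch of one common definition of average precision at k (the normalization by min(#relevant, k) used by the widely copied ml_metrics-style `apk`); the competition's official normalization may differ. `relevant` is the set of truly blocked ad ids and `ranked` is the list of ids sorted by predicted score, most likely blocked first.

```python
def apk(relevant, ranked, k):
    """Average precision at k.

    relevant: ids that are truly positive (e.g. blocked ads).
    ranked:   ids sorted by predicted score, most likely positive first.
    Normalized by min(len(relevant), k).
    """
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    score = 0.0
    seen = set()
    for i, item in enumerate(ranked[:k]):
        if item in relevant and item not in seen:
            hits += 1
            score += hits / (i + 1.0)   # precision at this cut-off
        seen.add(item)
    return score / min(len(relevant), k)


if __name__ == "__main__":
    # Hits at positions 1 and 3: (1/1 + 2/3) / 2
    print(apk([1, 2], [1, 3, 2], 3))
```

To score a validation fold, sort the item ids by predicted score in descending order and pass that list as `ranked`.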

Requirement

Vowpal Wabbit: I used the latest version of VW for all the training.
gensim: I used gensim for extracting tf-idf features.
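With gensim, tf-idf is typically computed by building a `corpora.Dictionary`, converting each tokenized document with `doc2bow`, and wrapping the corpus in `models.TfidfModel`. To our understanding, `TfidfModel`'s default weighting is the raw term count times log2(N / df), L2-normalized per document, with zero-weight terms dropped. The dependency-free sketch below mirrors that assumed default so the arithmetic is visible; it is an illustration, not the repo's feature-extraction code, and gensim's output may differ in detail across versions.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    weight = raw term count * log2(N / df), then L2-normalized per document.
    Terms that occur in every document get idf 0 and are dropped.
    """
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))   # document frequency
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        weights = {t: c * math.log2(n_docs / df[t]) for t, c in tf.items()}
        weights = {t: w for t, w in weights.items() if w > 0}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({t: w / norm for t, w in weights.items()} if norm else {})
    return vectors


if __name__ == "__main__":
    titles = [["phone", "sale"], ["phone", "car"]]
    # "phone" appears in both titles, so its idf is log2(2/2) = 0 and it is dropped.
    print(tfidf(titles))
```

The resulting term weights can be emitted as `term:weight` features in a VW namespace instead of plain binary BOW features.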

Instruction

Download the data from the competition website and put it all in the ./Data dir.
Put all the code in the ./Python dir.
Run bash ./Python/run_all.sh to create the CSV submission for Kaggle.

Discussion

It seems promising to train a separate model for each category, as discussed here.
Semi-supervised learning (SSL) proved useful for the winning team. The idea of SSL was also exploited in another competition, Kaggle's Greek Media Monitoring Multilabel Classification (WISE 2014), as shown here.


Stars: ✭ 29
