1st Place Solution for CrowdFlower Product Search Results Relevance Competition on Kaggle.



Kaggle_CrowdFlower

1st Place Solution for Search Results Relevance Competition on Kaggle

The best single model we obtained during the competition was an XGBoost model with a linear booster, which scored 0.69322 on the Public LB and 0.70768 on the Private LB. Our final winning submission was a median ensemble of the 35 best Public LB submissions. It scored 0.70807 on the Public LB and 0.72189 on the Private LB.
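As a rough illustration of what a median ensemble does, the sketch below takes the per-sample median across several prediction lists. It is not the code from this repository: the function name `median_ensemble` and the toy predictions are hypothetical, and it assumes every submission lists the same samples in the same order.

```python
from statistics import median


def median_ensemble(submissions):
    """Combine prediction lists by taking the per-sample median.

    `submissions` is a list of equally long lists, one per submission.
    (Hypothetical helper for illustration, not the repository's code.)
    """
    if not submissions:
        raise ValueError("need at least one submission")
    n_samples = len(submissions[0])
    if any(len(s) != n_samples for s in submissions):
        raise ValueError("all submissions must cover the same samples")
    # zip(*submissions) groups together the predictions made for each sample.
    return [median(preds) for preds in zip(*submissions)]


# Toy data: three submissions, four samples each (made-up values).
subs = [
    [1, 4, 2, 3],
    [2, 4, 2, 4],
    [1, 3, 3, 4],
]
ensemble = median_ensemble(subs)
print(ensemble)
```

With an odd number of submissions, such as the 35 used for the final submission, the median of each sample is always one of the submitted predictions, so no rounding step is needed.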

What's New

FlowChart

(Figure: FlowChart)

Documentation

See ./Doc/Kaggle_CrowdFlower_ChenglongChen.pdf for documentation.

Instructions

  • Download the data from the competition website and put all of it into the folder ./Data.
  • Run python ./Code/Feat/run_all.py to generate features. This will take a few hours.
  • Run python ./Code/Model/generate_best_single_model.py to generate the best single model submission. In our experience, it takes only a few trials to produce a model with the best or similar performance. See the training log in ./Output/Log/[Pre@solution]_[Feat@svd100_and_bow_Jun27]_[Model@reg_xgb_linear]_hyperopt.log for an example.
  • Run python ./Code/Model/generate_model_library.py to generate the model library. This is quite time-consuming, but you don't have to wait for the script to finish: you can run the next step once some models have been trained.
  • Run python ./Code/Model/generate_ensemble_submission.py to generate a submission via ensemble selection.
  • If you don't want to run the code, just submit the file in ./Output/Subm.
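
The scripted steps above can be chained from a small driver, sketched below. The script paths come from the list; everything else (the `PIPELINE` constant, `build_commands`, `run_pipeline`, and the dry-run default) is a hypothetical convenience and not part of the repository. It assumes it is launched from the repository root, that the data is already in ./Data, and that `python` on the PATH is an interpreter the project supports.

```python
import subprocess

# Script paths as given in the steps above, in the order they should run.
PIPELINE = [
    "./Code/Feat/run_all.py",
    "./Code/Model/generate_best_single_model.py",
    "./Code/Model/generate_model_library.py",
    "./Code/Model/generate_ensemble_submission.py",
]


def build_commands(python="python"):
    """Return one argument list per pipeline step."""
    return [[python, script] for script in PIPELINE]


def run_pipeline(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_commands():
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the pipeline as soon as a step fails.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_pipeline(dry_run=True)
```

Because this driver runs the steps strictly in sequence, it waits for the model library to finish before starting the ensemble step. To start the ensemble earlier, as the steps allow, run the last two scripts by hand instead.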