davidgasquez / kaggle-airbnb

Licence: MIT license
🌍 Where will a new guest book their first travel experience?

Programming Languages

Jupyter Notebook
11667 projects
python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to kaggle-airbnb

fast retraining
Show how to perform fast retraining with LightGBM in different business cases
Stars: ✭ 56 (+5.66%)
Mutual labels:  kaggle
kaggle-recruit-restaurant
🏆 Kaggle 8th place solution
Stars: ✭ 102 (+92.45%)
Mutual labels:  kaggle
awesome-kaggle-kernels
Compilation of good Kaggle Kernels.
Stars: ✭ 51 (-3.77%)
Mutual labels:  kaggle
StoreItemDemand
(117th place - Top 26%) Deep learning using Keras and Spark for the "Store Item Demand Forecasting" Kaggle competition.
Stars: ✭ 24 (-54.72%)
Mutual labels:  kaggle
Kaggle-Avito-NN
The 18th Place Solution to Avito Demand Prediction Challenge
Stars: ✭ 25 (-52.83%)
Mutual labels:  kaggle
AutoX
AutoX is an efficient automl tool, which is mainly aimed at data mining tasks with tabular data.
Stars: ✭ 431 (+713.21%)
Mutual labels:  kaggle
kaggler-tv-schedule
Kaggler TV
Stars: ✭ 54 (+1.89%)
Mutual labels:  kaggle
digit recognizer
CNN digit recognizer implemented in Keras Notebook, Kaggle/MNIST (0.995).
Stars: ✭ 27 (-49.06%)
Mutual labels:  kaggle
dku-kaggle-class
Dankook University SW-Centered University, 2020 Open Source SW Design: class schedule and lecture materials for the "Cracking Kaggle" course
Stars: ✭ 48 (-9.43%)
Mutual labels:  kaggle
Recruit-Restaurant-Visitor-Forecasting
6th place solution for Recruit-Restaurant-Visitor-Forecasting
Stars: ✭ 16 (-69.81%)
Mutual labels:  kaggle
Plant-Disease-Identification-using-CNN
Plant Disease Identification Using Convolutional Neural Network
Stars: ✭ 89 (+67.92%)
Mutual labels:  kaggle
kaggle-champs
Code for the CHAMPS Predicting Molecular Properties Kaggle competition
Stars: ✭ 49 (-7.55%)
Mutual labels:  kaggle
kaggle-dstl-satellite-imagery-feature-detection
6th place solution
Stars: ✭ 16 (-69.81%)
Mutual labels:  kaggle
Data-Science-Hackathon-And-Competition
Grandmaster in MachineHack (3rd Rank Best) | Top 70 in AnalyticsVidya & Zindi | Expert at Kaggle | Hack AI
Stars: ✭ 165 (+211.32%)
Mutual labels:  kaggle
Kaggle-Quora-Question-Pairs
This is our team's solution report, which achieves top 10% (305/3307) in this competition.
Stars: ✭ 58 (+9.43%)
Mutual labels:  kaggle
Data-Science
Using Kaggle Data and Real World Data for Data Science and prediction in Python, R, Excel, Power BI, and Tableau.
Stars: ✭ 15 (-71.7%)
Mutual labels:  kaggle
Quantitative-Big-Imaging-2018
(Latest semester at https://github.com/kmader/Quantitative-Big-Imaging-2019) The material for the Quantitative Big Imaging course at ETHZ for the Spring Semester 2018
Stars: ✭ 50 (-5.66%)
Mutual labels:  kaggle
argus-tgs-salt
Kaggle | 14th place solution for TGS Salt Identification Challenge
Stars: ✭ 73 (+37.74%)
Mutual labels:  kaggle
Kaggle-Cdiscount-Image-Classification-Challenge
No description or website provided.
Stars: ✭ 15 (-71.7%)
Mutual labels:  kaggle
kaggle-berlin
Material of the Kaggle Berlin meetup group!
Stars: ✭ 36 (-32.08%)
Mutual labels:  kaggle

Airbnb Kaggle Competition: New User Bookings

This repository contains the code developed for Airbnb's Kaggle competition. It is written in Python 3, partly as Jupyter Notebooks and partly as plain Python scripts.

The code produces predictions that score around 0.88090 on the public leaderboard, enough to place in the top 5% of participants (0.001 behind the best score), and 0.88509 on the private leaderboard (0.0018 behind the winner).
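For context, the leaderboard scores above are, to my understanding, NDCG@5 values; the metric is not named in this README, so treat that as an assumption. A minimal sketch of how such a score could be computed when each user has exactly one correct destination. `ndcg_at_k` and `mean_ndcg_at_k` are hypothetical helpers, not part of this repository:

```python
import math


def ndcg_at_k(predicted, actual, k=5):
    """NDCG@k for one user with a single correct label.

    With exactly one relevant item the ideal DCG is 1, so the score is
    1 / log2(position + 1) if the true label appears in the top k
    predictions (position counted from 1), and 0 otherwise.
    """
    for position, label in enumerate(predicted[:k], start=1):
        if label == actual:
            return 1.0 / math.log2(position + 1)
    return 0.0


def mean_ndcg_at_k(all_predicted, all_actual, k=5):
    """Average the per-user NDCG@k over all users."""
    scores = [ndcg_at_k(p, a, k) for p, a in zip(all_predicted, all_actual)]
    return sum(scores) / len(scores)
```

Under this definition, a correct first guess scores 1.0 and a correct second guess scores about 0.63, so the order of the predictions matters as much as their content.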

The entire run should take no more than 4 hours on a recent computer (thanks to the parallel preprocessing), though you may run into memory issues with less than 8 GB of RAM.

Feel free to contribute to the code or open an issue if you see something wrong.

Description

New users on Airbnb can book a place to stay in 34,000+ cities across 190+ countries. By accurately predicting where a new user will book their first travel experience, Airbnb can share more personalized content with their community, decrease the average time to first booking, and better forecast demand.

In this competition, the goal is to predict in which country a new user will make his or her first booking. There are 12 possible outcomes of the destination country and the datasets consist of a list of users with their demographics, web session records, and some summary statistics.
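Since a classifier for this task typically outputs one probability per destination, the prediction for a user can be obtained by ranking those probabilities. A minimal sketch, assuming up to five ranked countries are kept per user (an assumption, not stated above). `top_k_countries` is a hypothetical helper, and the labels and probabilities are illustrative only:

```python
def top_k_countries(probabilities, k=5):
    """Return the k most probable labels, most likely first.

    `probabilities` maps a destination label to its predicted probability.
    Ties are broken alphabetically so the output is deterministic.
    """
    ranked = sorted(probabilities.items(), key=lambda item: (-item[1], item[0]))
    return [label for label, _ in ranked[:k]]


# Illustrative probabilities for one user (made-up values).
example = {"NDF": 0.55, "US": 0.30, "other": 0.08, "FR": 0.04, "IT": 0.02, "GB": 0.01}
print(top_k_countries(example))
```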

Data

Due to the Competition Rules, the datasets cannot be shared. If you want to take a look at the data, head over to the competition page and download it.

You need to download the train_users_2.csv, test_users.csv and sessions.csv files and unzip them into the 'data' folder.

Note: Since the train users file is the one re-uploaded by the competition administrators, rename train_users_2.csv as train_users.csv.
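The renaming step can also be scripted. A small sketch, assuming the files were unzipped into a `data` folder relative to the working directory. `prepare_data_dir` is a hypothetical helper, not part of this repository:

```python
from pathlib import Path

EXPECTED_FILES = ("train_users.csv", "test_users.csv", "sessions.csv")


def prepare_data_dir(data_dir="data"):
    """Rename train_users_2.csv to train_users.csv and list missing files."""
    data_path = Path(data_dir)
    reuploaded = data_path / "train_users_2.csv"
    target = data_path / "train_users.csv"
    # Only rename when the re-uploaded file is present and the target is not.
    if reuploaded.exists() and not target.exists():
        reuploaded.rename(target)
    # Report which of the expected files are still absent.
    return [name for name in EXPECTED_FILES if not (data_path / name).exists()]
```

Calling `prepare_data_dir()` returns an empty list once all three files are in place.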

Main Ideas

  1. The provided datasets contain a lot of NaNs and other noisy values, so good preprocessing is the key to a good solution:

    • Replace -unknown- values with NaNs
    • Clean age values
    • Extract day, weekday, month, year from date_account_created and timestamp_first_active
    • Add number of missing values per user
    • General user session information:
      • Number of different values in action, action_type, action_detail and device_type
  2. This kind of classification task works well with tree-based methods. I used the xgboost library, with the Gradient Boosting Classifier it provides alongside scikit-learn, to predict the class probabilities.
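The preprocessing steps listed above can be sketched with pandas as follows. This is a minimal sketch rather than the repository's actual code: the function names, the valid age range (14 to 100), the `user_id` column, and the `YYYYMMDDHHMMSS` format of `timestamp_first_active` are all assumptions.

```python
import numpy as np
import pandas as pd


def preprocess_users(users):
    """Apply the user-level cleaning steps to a copy of `users`."""
    # Replace the '-unknown-' placeholder with NaN (returns a new frame).
    users = users.replace("-unknown-", np.nan)
    # Clean ages: the valid range used here (14 to 100) is an assumption.
    users.loc[(users["age"] < 14) | (users["age"] > 100), "age"] = np.nan
    # Count missing values per user before adding derived columns.
    users["nans"] = users.isna().sum(axis=1)
    # Parse both date columns (timestamp format is an assumption).
    created = pd.to_datetime(users["date_account_created"])
    active = pd.to_datetime(
        users["timestamp_first_active"].astype(str), format="%Y%m%d%H%M%S"
    )
    # Extract day, weekday, month and year from each date.
    for prefix, dates in (("created", created), ("active", active)):
        users[f"{prefix}_day"] = dates.dt.day
        users[f"{prefix}_weekday"] = dates.dt.weekday
        users[f"{prefix}_month"] = dates.dt.month
        users[f"{prefix}_year"] = dates.dt.year
    return users


def session_features(sessions):
    """Count the distinct values of each session column per user."""
    columns = ["action", "action_type", "action_detail", "device_type"]
    counts = sessions.groupby("user_id")[columns].nunique()
    return counts.add_prefix("n_unique_")
```

The session counts are indexed by `user_id`, so they can be joined onto the user table before training a tree-based model.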

Requirements

To replicate the findings and run the code in this repository, you will need the following Python packages:

Resources

  • XGBoost Documentation - A library designed and optimized for boosted (tree) algorithms.
  • Pattern Classification - Tutorials, examples, collections, and everything else that falls into the categories: pattern classification, machine learning, and data mining.

License

Copyright © 2015 David Gasquez. Licensed under the MIT license.
