
data-doctors / kaggle-house-prices-advanced-regression-techniques

Licence: other
Repository for the source code of the Kaggle competition "House Prices: Advanced Regression Techniques"

Programming Languages

Jupyter Notebook
11667 projects
Python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to kaggle-house-prices-advanced-regression-techniques

StoreItemDemand
(117th place - Top 26%) Deep learning using Keras and Spark for the "Store Item Demand Forecasting" Kaggle competition.
Stars: ✭ 24 (-35.14%)
Mutual labels:  regression, kaggle-competition
Recheck Web
recheck for web apps – change comparison tool with local Golden Masters, Git-like ignore syntax and "Unbreakable Selenium" tests.
Stars: ✭ 224 (+505.41%)
Mutual labels:  regression
Math Php
Powerful modern math library for PHP: features descriptive statistics and regressions; continuous and discrete probability distributions; linear algebra with matrices and vectors; numerical analysis; special mathematical functions; algebra
Stars: ✭ 2,009 (+5329.73%)
Mutual labels:  regression
Dynaml
Scala Library/REPL for Machine Learning Research
Stars: ✭ 195 (+427.03%)
Mutual labels:  regression
Machine learning
Study and implementation of the main Machine Learning algorithms in Jupyter Notebooks.
Stars: ✭ 161 (+335.14%)
Mutual labels:  regression
Lightautoml
LAMA - automatic model creation framework
Stars: ✭ 196 (+429.73%)
Mutual labels:  regression
Applied Ml
Code and Resources for "Applied Machine Learning"
Stars: ✭ 156 (+321.62%)
Mutual labels:  regression
Simple Statistics
simple statistics for node & browser javascript
Stars: ✭ 2,679 (+7140.54%)
Mutual labels:  regression
Deepfashion
Apparel detection using deep learning
Stars: ✭ 223 (+502.7%)
Mutual labels:  regression
Peroxide
Rust numeric library with R, MATLAB & Python syntax
Stars: ✭ 191 (+416.22%)
Mutual labels:  regression
Uci Ml Api
Simple API for UCI Machine Learning Dataset Repository (search, download, analyze)
Stars: ✭ 190 (+413.51%)
Mutual labels:  regression
Data Science Toolkit
Collection of stats, modeling, and data science tools in Python and R.
Stars: ✭ 169 (+356.76%)
Mutual labels:  regression
Morpheus Core
The foundational library of the Morpheus data science framework
Stars: ✭ 203 (+448.65%)
Mutual labels:  regression
Remixautoml
R package for automation of machine learning, forecasting, feature engineering, model evaluation, model interpretation, data generation, and recommenders.
Stars: ✭ 159 (+329.73%)
Mutual labels:  regression
Statistical Learning
Lecture slides and R sessions for Trevor Hastie and Rob Tibshirani's "Statistical Learning" Stanford course
Stars: ✭ 223 (+502.7%)
Mutual labels:  regression
Java Deep Learning Cookbook
Code for Java Deep Learning Cookbook
Stars: ✭ 156 (+321.62%)
Mutual labels:  regression
Correlation
🔗 Methods for Correlation Analysis
Stars: ✭ 192 (+418.92%)
Mutual labels:  regression
Image To 3d Bbox
Build a CNN network to predict 3D bounding box of car from 2D image.
Stars: ✭ 200 (+440.54%)
Mutual labels:  regression
Orange3
🍊 📊 💡 Orange: Interactive data analysis
Stars: ✭ 3,152 (+8418.92%)
Mutual labels:  regression
Margins
An R Port of Stata's 'margins' Command
Stars: ✭ 225 (+508.11%)
Mutual labels:  regression

kaggle-house-prices-advanced-regression-techniques

Repository for the source code of the Kaggle competition "House Prices: Advanced Regression Techniques"

Overview

Many factors influence the price a buyer is willing to pay for a house. Some are obvious; others are not. Either way, a rational approach facilitated by machine learning can be very useful in predicting house prices. A data set with 79 features (such as living area, number of rooms, and location) along with sale prices is provided for residential homes in Ames, Iowa. The challenge is to learn the relationship between the important features and the price, and to use it to predict the prices of a new set of houses.

Getting started

You can clone the repository from GitHub to your local machine using the following command (prerequisite: you need Git installed on your system):

$ git clone https://github.com/data-doctors/kaggle-house-prices-advanced-regression-techniques

Data

The data folder contains the original competition data.

Repository structure

01-eda: Exploratory data analysis

Plot distributions of the numerical features and examine their skewness
Plot the correlation matrix between the features
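For illustration, a minimal EDA sketch along these lines might look as follows; the file path matches the data folder above, but the specific plotting choices are assumptions rather than exactly what the notebooks do:

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Load the training data provided in the data folder.
train = pd.read_csv("data/train.csv")

# Skewness of the numerical features (Id is just an index, so drop it).
numeric = train.select_dtypes(include="number").drop(columns=["Id"])
print(numeric.skew().sort_values(ascending=False).head(10))

# Distribution of the target.
sns.histplot(train["SalePrice"], kde=True)
plt.title("SalePrice distribution")
plt.show()

# Correlation matrix between the numerical features.
plt.figure(figsize=(12, 10))
sns.heatmap(numeric.corr(), cmap="coolwarm", center=0)
plt.title("Feature correlation matrix")
plt.show()
```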

02-cleaning: Cleaning and preprocessing of data

Remove skewness of the target variable
Handle missing values in categorical features
Handle missing values in numerical features
Feature selection
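A minimal sketch of these cleaning steps might look like the following; the log transform of the target is standard for this competition, while the specific imputation strategies ("None" for categoricals, column medians for numericals) are assumptions for illustration:

```python
import numpy as np
import pandas as pd

train = pd.read_csv("data/train.csv")

# Remove skewness of the target by log-transforming SalePrice.
y = np.log1p(train["SalePrice"])
X = train.drop(columns=["Id", "SalePrice"])

# Categorical features: treat missing values as their own category.
cat_cols = X.select_dtypes(include="object").columns
X[cat_cols] = X[cat_cols].fillna("None")

# Numerical features: impute each column with its median.
num_cols = X.select_dtypes(include="number").columns
X[num_cols] = X[num_cols].fillna(X[num_cols].median())
```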

03-feature_engineering: Engineering new features

Some examples:

A total area feature was created by adding the basement area and the living area.
The bathroom counts were added together to create a new feature.
For numerical features with significant skewness, logarithms were taken to create new features.
Features that did not contribute significantly to predicting SalePrice were dropped.
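Continuing from the cleaning sketch above, these steps might look roughly like this; the column names are from the Ames data, but the 0.5 weighting of half baths and the 0.75 skewness threshold are assumptions, not taken from the repository:

```python
import numpy as np
from scipy.stats import skew

# Total area: basement area plus above-ground living area.
X["TotalArea"] = X["TotalBsmtSF"] + X["GrLivArea"]

# Combined bathroom count (weighting half baths by 0.5 is an assumption).
X["TotalBath"] = (X["FullBath"] + 0.5 * X["HalfBath"]
                  + X["BsmtFullBath"] + 0.5 * X["BsmtHalfBath"])

# Log-transform numerical features with significant skewness
# (the 0.75 threshold is an assumption).
num_cols = X.select_dtypes(include="number").columns
skewed = [c for c in num_cols if abs(skew(X[c])) > 0.75]
X[skewed] = np.log1p(X[skewed])
```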

04-modelling: Fitting different models on the cleaned data and predicting the house price on the test set

Training

Training all models (bulk training)

The hyperparameters of all the single models were optimized by maximizing the cross-validation score on the training set. To train all the models (kept in the models/tuning folder) in series, the following shell script can be executed:

```
$ ./run_all.sh
RMSE-xgb-CV(7)=0.15017262592+-0.0403780999289
RMSE-lgb-CV(7)=0.230416102431+-0.0982360336472
RMSE-rf-CV(7)=0.178752572944+-0.0495588133233
RMSE-et-CV(7)=0.177138296419+-0.0523244324721
RMSE-lasso-CV(7)=0.167043833535+-0.0590946122368
RMSE-ridge-CV(7)=0.16305872566+-0.0592719750453
RMSE-elasticnet-CV(7)=0.166431639245+-0.0591651827043
```

The optimized parameters were then plugged into the single models kept in the models/single folder.
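As an illustration of what one such tuning step might look like for a single model (the Ridge choice and the parameter grid here are assumptions; cv=7 matches the CV(7) scores above):

```python
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Maximize the cross-validation score over a parameter grid. RMSE is
# negated because scikit-learn always maximizes the scoring function.
grid = GridSearchCV(
    Ridge(),
    param_grid={"alpha": [0.1, 1.0, 10.0, 30.0, 100.0]},
    scoring="neg_root_mean_squared_error",
    cv=7,
)
grid.fit(X_encoded, y)  # X_encoded: hypothetical numerically encoded features
print(grid.best_params_, -grid.best_score_)
```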

Saving the submission of a single model

$ python models/single/model_xgb.py save
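The script itself is not reproduced here, but saving a submission for this competition typically amounts to something like the following sketch (model and X_test are hypothetical stand-ins for the fitted model and the encoded test features):

```python
import numpy as np
import pandas as pd

test = pd.read_csv("data/test.csv")

# Predict on the test set and invert the log transform of the target.
preds = np.expm1(model.predict(X_test))  # model, X_test: hypothetical

# The Id/SalePrice columns match the competition's submission format.
pd.DataFrame({"Id": test["Id"], "SalePrice": preds}).to_csv(
    "submission.csv", index=False)
```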

Scores

Best single models:

(CV: cross-validation RMSE on the training set; LB: Kaggle leaderboard RMSE)

Model                 | CV RMSE           | LB RMSE
DecisionTreeRegressor | 0.19013 ± 0.01304 | 0.18804
RandomForestRegressor | 0.14744 ± 0.00871 | 0.14623
ExtraTreesRegressor   | 0.13888 ± 0.01208 | 0.15194
XGBoost               | 0.12137 ± 0.01128 | 0.12317
LightGBM              | 0.20030 ± 0.01182 | 0.21416
Lasso                 | 0.11525 ± 0.01191 | 0.12091
Ridge                 | 0.11748 ± 0.01170 | 0.12263
ElasticNet            | 0.11364 ± 0.01677 | 0.11976
SVM                   | 0.19752 ± 0.01386 | 0.20416

Ensembling

We used 10 single models to predict the results individually. It is well established that stacking/blending the predictions of single models can improve the final result. Ideally, a few of the best-performing but least correlated models are selected for this purpose, rather than all of them.

Inside the 04-modelling/ensembling folder, the correlations and performances of the single models were explored using the corr-coeff notebook.
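A sketch of such a correlation check using cross-validated predictions (models is a hypothetical dict of name-to-estimator pairs; X and y as in the sketches above):

```python
import pandas as pd
from sklearn.model_selection import cross_val_predict

pred_df = pd.DataFrame({
    name: cross_val_predict(est, X, y, cv=7)
    for name, est in models.items()
})
# Pairs with correlation close to 1 add little diversity to a stack.
print(pred_df.corr())
```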

The five best-performing and least correlated models were selected and stacked together (using the 04-modelling/ensembling/stacking notebook) to make the final prediction.
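The notebook itself is not reproduced here, but a minimal out-of-fold stacking sketch might look like this (base_models is a hypothetical list of the five selected estimators; X, y, and X_test as above):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict

# Out-of-fold predictions of the base models become the input
# features of a simple linear meta-model.
oof = np.column_stack(
    [cross_val_predict(m, X, y, cv=7) for m in base_models])
meta = LinearRegression().fit(oof, y)

# At prediction time, feed the base models' test predictions to the meta-model.
test_features = np.column_stack(
    [m.fit(X, y).predict(X_test) for m in base_models])
final_pred = np.expm1(meta.predict(test_features))  # invert the log target
```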

Acknowledgments

The Ames Housing dataset was compiled by Dean De Cock for use in data science education. It is an incredible alternative for data scientists looking for a modernized and expanded version of the often-cited Boston Housing dataset.
