
buds-lab / ashrae-great-energy-predictor-3-solution-analysis

License: MIT
Analysis of the top five winning solutions of the ASHRAE Great Energy Predictor III competition

Programming Languages

Jupyter Notebook
Python

Projects that are alternatives to or similar to ashrae-great-energy-predictor-3-solution-analysis

awesome-energy-forecasting
A list of papers, code, and other resources
Stars: ✭ 31 (-29.55%)
Mutual labels:  energy-data, building-energy, building-energy-forecasting
scout
A tool for estimating the future energy use, carbon emissions, and capital and operating cost impacts of energy efficiency and demand flexibility technologies in the U.S. residential and commercial building sectors.
Stars: ✭ 34 (-22.73%)
Mutual labels:  energy-data, energy-consumption, building-energy
Jigsaw-Unintended-Bias-in-Toxicity-Classification
7th Place Solution for Jigsaw Unintended Bias in Toxicity Classification on Kaggle
Stars: ✭ 16 (-63.64%)
Mutual labels:  kaggle, kaggle-competition
histopathologic cancer detector
CNN histopathologic tumor identifier.
Stars: ✭ 26 (-40.91%)
Mutual labels:  kaggle, kaggle-competition
Kaggle The Hunt for Prohibited Content
4th Place Solution for The Hunt for Prohibited Content Competition on Kaggle (http://www.kaggle.com/c/avito-prohibited-content)
Stars: ✭ 29 (-34.09%)
Mutual labels:  kaggle, kaggle-competition
building-data-genome-project-2
Whole building non-residential hourly energy meter data from the Great Energy Predictor III competition
Stars: ✭ 112 (+154.55%)
Mutual labels:  energy-consumption, building-energy
digit recognizer
CNN digit recognizer implemented in Keras Notebook, Kaggle/MNIST (0.995).
Stars: ✭ 27 (-38.64%)
Mutual labels:  kaggle, kaggle-competition
Open Solution Toxic Comments
Open solution to the Toxic Comment Classification Challenge
Stars: ✭ 154 (+250%)
Mutual labels:  kaggle, kaggle-competition
open-solution-ship-detection
Open solution to the Airbus Ship Detection Challenge
Stars: ✭ 54 (+22.73%)
Mutual labels:  kaggle, kaggle-competition
Bike-Sharing-Demand-Kaggle
Top 5th percentile solution to the Kaggle knowledge problem - Bike Sharing Demand
Stars: ✭ 33 (-25%)
Mutual labels:  kaggle, kaggle-competition
Hello-Kaggle-Guide-KOR
A guide document for people who are new to Kaggle
Stars: ✭ 140 (+218.18%)
Mutual labels:  kaggle, kaggle-competition
StoreItemDemand
(117th place - Top 26%) Deep learning using Keras and Spark for the "Store Item Demand Forecasting" Kaggle competition.
Stars: ✭ 24 (-45.45%)
Mutual labels:  kaggle, kaggle-competition
Data-Science-Hackathon-And-Competition
Grandmaster in MachineHack (3rd Rank Best) | Top 70 in Analytics Vidhya & Zindi | Expert at Kaggle | Hack AI
Stars: ✭ 165 (+275%)
Mutual labels:  kaggle, kaggle-competition
argus-tgs-salt
Kaggle | 14th place solution for TGS Salt Identification Challenge
Stars: ✭ 73 (+65.91%)
Mutual labels:  kaggle, kaggle-competition
Machine Learning Workflow With Python
A comprehensive walkthrough of ML techniques with Python: define the problem, specify inputs & outputs, data collection, exploratory data analysis, data preprocessing, model design, training, and evaluation.
Stars: ✭ 157 (+256.82%)
Mutual labels:  kaggle, kaggle-competition
Apartment-Interest-Prediction
Predict people's interest in renting specific NYC apartments. The challenge combines structured data, geolocation, time data, free text, and images.
Stars: ✭ 17 (-61.36%)
Mutual labels:  kaggle, kaggle-competition
Segmentation
TensorFlow implementation: U-Net and FCN with global convolution
Stars: ✭ 101 (+129.55%)
Mutual labels:  kaggle, kaggle-competition
Kaggle Airbnb Recruiting New User Bookings
2nd Place Solution in Kaggle Airbnb New User Bookings competition
Stars: ✭ 118 (+168.18%)
Mutual labels:  kaggle, kaggle-competition
fer
Facial Expression Recognition
Stars: ✭ 32 (-27.27%)
Mutual labels:  kaggle, kaggle-competition
open-energy-view
View resource consumption trends, history, analysis, and insights.
Stars: ✭ 32 (-27.27%)
Mutual labels:  energy-data, energy-consumption

ASHRAE Great Energy Predictor III (GEPIII) - Top 5 Winning Solutions Explained

This repository contains the code and documentation of the top-5 winning solutions from the ASHRAE Great Energy Predictor III competition, held in late 2019 on the Kaggle platform. It also contains a comparative analysis of these solutions with respect to characteristics such as workflow, computation time, and score distribution across meter type, site, primary space usage, etc.

A video overview of the competition, presented by Clayton Miller of the BUDS Lab at the National University of Singapore, is available from the ASHRAE 2020 Online Conference.

Overview Publication

A full overview of the GEPIII competition can be found in the Science and Technology for the Built Environment journal (a preprint is available on arXiv).

To cite this competition or analysis:

Clayton Miller, Pandarasamy Arjunan, Anjukan Kathirgamanathan, Chun Fu, Jonathan Roth, June Young Park, Chris Balbach, Krishnan Gowri, Zoltan Nagy, Anthony D. Fontanini & Jeff Haberl (2020) The ASHRAE Great Energy Predictor III competition: Overview and results, Science and Technology for the Built Environment, DOI: 10.1080/23744731.2020.1795514

Overview Abstract

In late 2019, ASHRAE hosted the Great Energy Predictor III (GEPIII) machine learning competition on the Kaggle platform. This launch marked the third energy prediction competition from ASHRAE and the first since the mid-1990s. In this updated version, the competitors were provided with over 20 million points of training data from 2,380 energy meters collected for 1,448 buildings from 16 sources. This competition’s overall objective was to find the most accurate modeling solutions for the prediction of over 41 million private and public test data points. The competition had 4,370 participants, split across 3,614 teams from 94 countries who submitted 39,403 predictions. In addition to the top five winning workflows, the competitors publicly shared 415 reproducible online machine learning workflow examples (notebooks), including over 40 additional, full solutions. This paper gives a high-level overview of the competition preparation and dataset, competitors and their discussions, machine learning workflows and models generated, winners and their submissions, discussion of lessons learned, and competition outputs and next steps. The most popular and accurate machine learning workflows used large ensembles of mostly gradient boosting tree models, such as LightGBM. Similar to the first predictor competition, preprocessing of the data sets emerged as a key differentiator.
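
As a concrete illustration of the dominant approach mentioned above (gradient boosting tree models such as LightGBM trained on the meter data), here is a minimal sketch; it is not any winner's exact pipeline. The column names follow the Kaggle GEPIII dataset, while the file path and hyperparameter values are illustrative assumptions.

```python
# Minimal LightGBM sketch of the dominant workflow described above.
# Column names follow the Kaggle GEPIII dataset; the file path and
# hyperparameters are illustrative assumptions, not a winner's settings.
import lightgbm as lgb
import numpy as np
import pandas as pd

train = pd.read_csv("train.csv", parse_dates=["timestamp"])  # assumed local path
train["target"] = np.log1p(train["meter_reading"])  # competition metric is RMSLE
train["hour"] = train["timestamp"].dt.hour
train["weekday"] = train["timestamp"].dt.weekday
for col in ["building_id", "meter"]:
    train[col] = train[col].astype("category")  # LightGBM handles categoricals natively

features = ["building_id", "meter", "hour", "weekday"]
model = lgb.LGBMRegressor(n_estimators=1000, learning_rate=0.05, num_leaves=1024)
model.fit(train[features], train["target"])

pred_kwh = np.expm1(model.predict(train[features]))  # back to the original scale
```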

The data set from the competition has now been released as the Building Data Genome 2 project, which is outlined in a publication in Nature Scientific Data (preprint on arXiv).

Key takeaways:

  • Large ensembles of models are essential in the application of ML to building energy prediction at scale.
  • Gradient boosting tree models were the most common in these ensembles, especially LightGBM.
  • Pre-processing of the training data, including outlier removal, was the key differentiator among the top winners and was usually not an automated process (see the sketch below).
  • Many of the winners had engineering backgrounds and even previous experience in meter prediction.
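
To illustrate the pre-processing point above, here is a minimal sketch of one such manual cleaning step: dropping zero readings and long runs of constant values from each meter series. The 48-row threshold and the helper name are illustrative assumptions, not taken from any winning solution.

```python
# Minimal sketch of manual training-data cleaning: drop zero readings and
# long constant streaks per (building_id, meter) series. The 48-row
# threshold is an illustrative assumption, not a winner's value.
import pandas as pd

def drop_constant_streaks(df: pd.DataFrame, min_len: int = 48) -> pd.DataFrame:
    df = df.sort_values(["building_id", "meter", "timestamp"])
    # Label each run of identical consecutive readings within a meter series.
    run_id = (df.groupby(["building_id", "meter"])["meter_reading"]
                .transform(lambda s: s.ne(s.shift()).cumsum())
                .rename("run_id"))
    # Size of the run each row belongs to.
    run_len = (df.groupby(["building_id", "meter", run_id])["meter_reading"]
                 .transform("size"))
    return df[run_len < min_len]

train = pd.read_csv("train.csv", parse_dates=["timestamp"])  # assumed local path
clean = drop_constant_streaks(train[train["meter_reading"] > 0])
```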

Special credit goes to Dr. Samy Arjunan for organizing and replicating the solutions in this repository.

Detailed Reproduction of Solutions Overview

Instructions for fully reproducing each solution are found in the wiki for this repository, with other details given below.

The raw data for the top 5 winning solutions - code and docs (original submissions by the winners)

Explanatory Overview Videos from the Winners

The top five winning solutions can be understood through a series of explainer videos hosted here, including extended presentations at the ASHRAE 2020 Online Conference in June 2020. Potential users of these solutions should note that each winner gave advice on the trade-off between solution complexity and accuracy. These videos are also listed below individually for each solution.

Solutions Overview Details

First Ranked Solution

Second Ranked Solution

Third Ranked Solution

Fourth Ranked Solution

Fifth Ranked Solution

Solutions High Level Comparisons

| Final Rank | Team | Final Private Leaderboard Score | Preprocessing Strategy | Features Strategy | Modeling Strategy | Post-Processing Strategy |
|---|---|---|---|---|---|---|
| 1 | Matthew Motoki and Isamu Yamashita (Isamu and Matt) | 1.231 | Removed anomalies in meter data and imputed missing values in weather data | 28 features; extensively focused on feature engineering and selection | LightGBM, CatBoost, and MLP models trained on different subsets of the training and public data | Ensembled the model predictions using a weighted generalized mean |
| 2 | Rohan Rao, Anton Isakin, Yangguang Zang, and Oleg Knaub (cHa0s) | 1.232 | Visual analytics and manual inspection | Raw energy meter data, temporal features, building metadata, simple statistical features of weather data | XGBoost, LightGBM, CatBoost, and feed-forward neural network models trained on different subsets of the training set | Weighted mean (different weights for different meter types) |
| 3 | Xavier Capdepon (eagle4) | 1.234 | Eliminated 0s in the same period in the same site | 21 features including raw data, weather, and various metadata | Keras CNN, LightGBM, and CatBoost | Weighted average |
| 4 | Jun Yang (不用leakage上分太难了) | 1.235 | Deleted outliers during the training phase | 23 features including raw data, aggregates, weather lag features, and target encoding; features selected using sub-training sets | XGBoost (2-fold, 5-fold) and LightGBM (3-fold) | Ensembled the three models; weights determined using the leaked data |
| 5 | Tatsuya Sano, Minoru Tomioka, and Yuta Kobayashi (mma) | 1.237 | Dropped long streaks of constant values and zero target values | Target encoding using percentile and proportion, plus temporal features from the weather data | LightGBM in two steps: identify model parameters on a subset, then train on the whole set for each building | Weighted average |
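
The Rank-1 post-processing step above, a weighted generalized mean, combines the member models' predictions as (sum_i w_i * y_i^p)^(1/p). Here is a minimal sketch; the weights and exponent are illustrative assumptions, not the winners' tuned values.

```python
# Minimal sketch of ensembling with a weighted generalized mean:
# blend = (sum_i w_i * pred_i**p) ** (1/p). The weights and exponent
# below are illustrative assumptions, not the winners' tuned values.
import numpy as np

def weighted_generalized_mean(preds: np.ndarray, weights: np.ndarray, p: float) -> np.ndarray:
    """preds: (n_models, n_samples) array of non-negative predictions;
    weights: (n_models,) array summing to 1. p=1 gives a weighted mean."""
    return (weights[:, None] * preds ** p).sum(axis=0) ** (1.0 / p)

preds = np.array([[100.0, 50.0],   # model A
                  [120.0, 40.0],   # model B
                  [ 90.0, 60.0]])  # model C
blend = weighted_generalized_mean(preds, np.array([0.5, 0.3, 0.2]), p=0.65)
```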

Execution Time Comparison

| Solution | Preprocessing | Feature engineering | Training | Prediction | Ensembling | Total (minutes) |
|---|---|---|---|---|---|---|
| Rank 1 | 9 | 128 | 7440 | 708 | 35 | 8320 |
| Rank 2 | 36 | 24 | 1850 | 94 | 7 | 2011 |
| Rank 3 | 178 | 12 | 501 | 100 | 14 | 805 |
| Rank 4 | 40 | 7 | 85 | 46 | 6 | 184 |
| Rank 5 | 3 | 9 | 13 | 20 | 16 | 61 |

Note: all solutions were reproduced on an AWS EC2 instance (g4dn.4xlarge) using the AWS Deep Learning AMI. All times are in minutes.
