License: MIT



WeatherBench: A benchmark dataset for data-driven weather forecasting


If you are using this dataset, please cite:

Stephan Rasp, Peter D. Dueben, Sebastian Scher, Jonathan A. Weyn, Soukayna Mouatadid, and Nils Thuerey, 2020. WeatherBench: A benchmark dataset for data-driven weather forecasting. arXiv: https://arxiv.org/abs/2002.00469

This repository contains all the code for downloading and processing the data, as well as code for the baseline models in the paper.


Note! The data has been changed from the original release. Here is a list of changes:

  • New vertical levels. The levels used to be [1, 10, 100, 200, 300, 400, 500, 600, 700, 850, 1000] and are now [50, 100, 150, 200, 250, 300, 400, 500, 600, 700, 850, 925, 1000], for compatibility with CMIP output. The new levels include all of the old ones except [1, 10].
  • CMIP data. Regridded CMIP data of some variables was added. This is the historical simulation of the MPI-ESM-HR model.

If you have any questions about this dataset, please open a GitHub Issue in this repository!

Leaderboard

| Model | Z500 RMSE (3 / 5 days) [m2/s2] | T850 RMSE (3 / 5 days) [K] | Notes | Reference |
|---|---|---|---|---|
| Operational IFS | 154 / 334 | 1.36 / 2.03 | ECMWF physical model (10 km) | Rasp et al. 2020 |
| Rasp and Thuerey 2020 (direct/continuous) | 268 / 499 | 1.65 / 2.41 | ResNet with CMIP pretraining (5.625 deg) | Rasp and Thuerey 2020 |
| IFS T63 | 268 / 463 | 1.85 / 2.52 | Lower-resolution physical model (approx. 1.9 deg) | Rasp et al. 2020 |
| Weyn et al. 2020 (iterative) | 373 / 611 | 1.98 / 2.87 | U-Net with cube-sphere mapping (2 deg) | Weyn et al. 2020 |
| IFS T42 | 489 / 743 | 3.09 / 3.83 | Lower-resolution physical model (approx. 2.8 deg) | Rasp et al. 2020 |
| Weekly climatology | 816 | 3.50 | Climatology for each calendar week | Rasp et al. 2020 |
| Persistence | 936 / 1033 | 4.23 / 4.56 | | Rasp et al. 2020 |
| Climatology | 1075 | 5.51 | | Rasp et al. 2020 |

Quick start

You can follow the quickstart guide in this notebook or launch it directly from Binder.

Download the data

The data is hosted here with the following directory structure:

.
|-- 1.40625deg
|   |-- 10m_u_component_of_wind
|   |-- 10m_v_component_of_wind
|   |-- 2m_temperature
|   |-- constants
|   |-- geopotential
|   |-- old
|   |   `-- temperature
|   |-- potential_vorticity
|   |-- relative_humidity
|   |-- specific_humidity
|   |-- temperature
|   |-- toa_incident_solar_radiation
|   |-- total_cloud_cover
|   |-- total_precipitation
|   |-- u_component_of_wind
|   |-- v_component_of_wind
|   `-- vorticity
|-- 2.8125deg
|   |-- 10m_u_component_of_wind
|   |-- 10m_v_component_of_wind
|   |-- 2m_temperature
|   |-- constants
|   |-- geopotential
|   |-- potential_vorticity
|   |-- relative_humidity
|   |-- specific_humidity
|   |-- temperature
|   |-- toa_incident_solar_radiation
|   |-- total_cloud_cover
|   |-- total_precipitation
|   |-- u_component_of_wind
|   |-- v_component_of_wind
|   `-- vorticity
|-- 5.625deg
|   |-- 10m_u_component_of_wind
|   |-- 10m_v_component_of_wind
|   |-- 2m_temperature
|   |-- constants
|   |-- geopotential
|   |-- geopotential_500
|   |-- potential_vorticity
|   |-- relative_humidity
|   |-- specific_humidity
|   |-- temperature
|   |-- temperature_850
|   |-- toa_incident_solar_radiation
|   |-- total_cloud_cover
|   |-- total_precipitation
|   |-- u_component_of_wind
|   |-- v_component_of_wind
|   `-- vorticity
|-- baselines
|   `-- saved_models
|-- CMIP
|   `-- MPI-ESM
|       |-- 2.8125deg
|       |   |-- geopotential
|       |   |-- specific_humidity
|       |   |-- temperature
|       |   |-- u_component_of_wind
|       |   `-- v_component_of_wind
|       `-- 5.625deg
|           |-- geopotential
|           |-- specific_humidity
|           |-- temperature
|           |-- u_component_of_wind
|           `-- v_component_of_wind
|-- IFS_T42
|   `-- raw
|-- IFS_T63
|   `-- raw
`-- tigge
    |-- 1.40625deg
    |   |-- geopotential_500
    |   `-- temperature_850
    |-- 2.8125deg
    |   |-- geopotential_500
    |   `-- temperature_850
    `-- 5.625deg
        |-- 2m_temperature
        |-- geopotential_500
        |-- temperature_850
        `-- total_precipitation

To get started, download either the entire 5.625-degree dataset (175 GB) using

wget "https://dataserv.ub.tum.de/s/m1524895/download?path=%2F5.625deg&files=all_5.625deg.zip" -O all_5.625deg.zip

or just the single-level 500 hPa geopotential data using

wget "https://dataserv.ub.tum.de/s/m1524895/download?path=%2F5.625deg%2Fgeopotential_500&files=geopotential_500_5.625deg.zip" -O geopotential_500_5.625deg.zip

and then unzip the files using unzip <file>.zip. You can also use FTP or rsync to download the data; for instructions, follow the download link.
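The two wget commands above follow a common URL pattern, so downloads for other variables can be scripted. A minimal sketch (the `temperature_850` variable is used purely as an example of the pattern; check the server listing before bulk-downloading):

```shell
# Build the download URL for a variable/resolution pair, following
# the pattern of the two wget examples above.
base="https://dataserv.ub.tum.de/s/m1524895/download"
res="5.625deg"
var="temperature_850"   # example variable; verify it exists on the server
url="${base}?path=%2F${res}%2F${var}&files=${var}_${res}.zip"
echo "$url"
# wget "$url" -O "${var}_${res}.zip" && unzip "${var}_${res}.zip"
```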

Baselines and evaluation

IMPORTANT: The format of the predictions file is a NetCDF dataset with dimensions [init_time, lead_time, lat, lon]. Consult the notebooks for examples. You are strongly encouraged to format your predictions in the same way and then use the same evaluation functions to ensure consistent evaluation.
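As a sketch of that layout (using xarray; the coordinate values below are made up for illustration and are not prescribed by the benchmark), a predictions dataset might be constructed like this:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Toy grid and times -- values are illustrative only.
init_time = pd.date_range("2017-01-01", periods=2, freq="12h")
lead_time = np.array([72, 120])            # hours, e.g. 3- and 5-day forecasts
lat = np.linspace(-87.1875, 87.1875, 32)   # 5.625 deg grid
lon = np.arange(0, 360, 5.625)

preds = xr.Dataset(
    {"z": (["init_time", "lead_time", "lat", "lon"],
           np.zeros((len(init_time), len(lead_time), len(lat), len(lon))))},
    coords={"init_time": init_time, "lead_time": lead_time,
            "lat": lat, "lon": lon},
)
# preds.to_netcdf("predictions.nc")  # same dimension order the notebooks expect
```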

Baselines

The baselines are created using Jupyter notebooks in notebooks/. In all notebooks, the forecasts are saved as a NetCDF file in the predictions directory of the dataset.

CNN baselines

An example of how to load the data and train a CNN using Keras is given in notebooks/3-cnn-example.ipynb. In addition, a command-line script for training CNNs is provided in src/train_nn.py. The config files for the baseline CNNs in the paper are given in src/nn_configs/. To reproduce the results in the paper, run e.g. python -m src.train_nn -c src/nn_configs/fccnn_3d.yml.

Evaluation

Evaluation and comparison of the different baselines is done in notebooks/4-evaluation.ipynb. The scoring is done using the functions in src/score.py. The RMSE values for the baseline models are also saved in the predictions directory of the dataset. This is useful for plotting your own models alongside the baselines.
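The headline metric in the leaderboard is a latitude-weighted RMSE: grid cells shrink toward the poles, so each latitude row is weighted by the cosine of its latitude, normalized to a mean of one. A simplified numpy sketch of that computation (see src/score.py for the authoritative implementation):

```python
import numpy as np

def lat_weighted_rmse(forecast, truth, lat):
    """RMSE over arrays whose last two axes are (lat, lon),
    weighting each latitude row by cos(latitude)."""
    weights = np.cos(np.deg2rad(lat))
    weights = weights / weights.mean()       # normalize to mean 1
    se = (forecast - truth) ** 2
    weighted = se * weights[:, None]         # broadcast over the lat axis
    return np.sqrt(weighted.mean(axis=(-2, -1)))

# Toy check: a constant error of 2 everywhere gives RMSE 2,
# since the weighting has mean 1.
lat = np.linspace(-87.1875, 87.1875, 32)
f = np.full((32, 64), 3.0)
t = np.full((32, 64), 1.0)
print(lat_weighted_rmse(f, t, lat))  # ≈ 2.0
```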

Data processing

The dataset already contains the most important processed data. If you would like to download a different variable, regrid to a different resolution, or extract single levels from the 3D files, here is how to do that.

Downloading and processing the raw data from the ERA5 archive

The workflow to get to the processed data that ended up in the data repository above is:

  1. Download monthly files from the ERA5 archive (src/download.py)
  2. Regrid the raw data to the required resolutions (src/regrid.py)

The raw data is from the ERA5 reanalysis archive. Information on how to download the data can be found here and here.

Because downloading the data can take a long time (several weeks), the workflow is encoded using Snakemake. See the Snakefile and the configuration files for each variable in scripts/config_{variable}.yml. These files can be modified if additional variables are required. To execute Snakemake for a particular variable, type: snakemake -p -j 4 all --configfile scripts/config_toa_incident_solar_radiation.yml.
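As a rough illustration of what such a per-variable configuration might contain (the key names below are hypothetical; consult the actual scripts/config_{variable}.yml files for the real schema):

```yaml
# Hypothetical sketch of a per-variable Snakemake config --
# the real key names live in scripts/config_{variable}.yml.
variable: toa_incident_solar_radiation
years: [1979, 2018]
resolutions: [5.625, 2.8125, 1.40625]
```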

In addition to the time-dependent fields, the constant fields were downloaded and processed using scripts/download_and_regrid_constants.sh.

Downloading the TIGGE IFS baseline

To obtain the operational IFS baseline, we use the TIGGE Archive. Downloading the data for Z500 and T850 is done in scripts/download_tigge.py; regridding is done in scripts/convert_and_regrid_tigge.sh.

Regridding the T42 and T63 IFS baselines

The T42 and T63 baselines were created by Peter Dueben. The raw output can be found in the dataset. To regrid the data, scripts/convert_and_regrid_IFS_TXX.sh was used.

Downloading and regridding CMIP historical climate model data

To download historical climate model data, use the Snakemake file in snakemake_configs_CMIP. Here, we downloaded data from the MPI-ESM-HR model. To download other models, search for the download links on the CMIP website and modify the scripts accordingly.

Extracting single levels from 3D files

If you would like to extract a single level from the 3D data, e.g. 850 hPa temperature, you can use src/extract_level.py. This is useful to reduce the amount of data that needs to be loaded into RAM. An example usage would be: python extract_level.py --input_fns DATADIR/5.625deg/temperature/*.nc --output_dir OUTDIR --level 850
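The same extraction can be sketched directly with xarray (a simplified stand-in for what src/extract_level.py does, not its actual code; the in-memory dataset below replaces opening real files so the sketch is self-contained):

```python
import numpy as np
import xarray as xr

# Stand-in 3-D temperature data; in practice you would use
# xr.open_mfdataset("DATADIR/5.625deg/temperature/*.nc") instead.
ds = xr.Dataset(
    {"t": (["time", "level", "lat", "lon"],
           np.random.rand(4, 3, 32, 64).astype("float32"))},
    coords={"level": [500, 700, 850]},
)

t850 = ds["t"].sel(level=850)  # pick the single 850 hPa level
# t850.to_netcdf("OUTDIR/temperature_850_5.625deg.nc")
```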
