intake / intake-esm

Licence: Apache-2.0 license
An intake plugin for parsing an Earth System Model (ESM) catalog and loading assets into xarray datasets.

Programming Languages

python
Makefile

Intake-esm

Motivation

Computer simulations of the Earth’s climate and weather generate huge amounts of data. These data are often persisted on HPC systems or in the cloud across multiple data assets in a variety of formats (netCDF, Zarr, etc.). Finding, investigating, and loading these data assets into compute-ready data containers costs time and effort. Before loading and analyzing a specific data set, the data user needs to know which data sets are available and which attributes describe each of them.

Loading these assets into data array containers such as xarray datasets can be a daunting task because of the large number of files a user may be interested in. Intake-esm aims to address these issues by providing the functionality needed to search, discover, and load data.

Overview

intake-esm is a data cataloging utility built on top of intake, pandas, and xarray, and it's pretty awesome!

  • Opening an ESM catalog definition file: An ESM (Earth System Model) catalog file is a JSON file that conforms to the ESM Collection Specification. When provided a link/path to an ESM catalog file, intake-esm establishes a link to a database (CSV file) that contains data asset locations and associated metadata (e.g., which experiment and model they come from). The catalog JSON file can be stored on a local filesystem or hosted on a remote server.

    In [1]: import intake
    
    In [2]: import intake_esm
    
    In [3]: cat_url = intake_esm.tutorial.get_url("google_cmip6")
    
    In [4]: cat = intake.open_esm_datastore(cat_url)
    
    In [5]: cat
    Out[5]: <GOOGLE-CMIP6 catalog with 4 dataset(s) from 261 asset(s)>
  • Search and Discovery: intake-esm provides functionality to execute queries against the catalog:

    In [5]: cat_subset = cat.search(
       ...:     experiment_id=["historical", "ssp585"],
       ...:     table_id="Oyr",
       ...:     variable_id="o2",
       ...:     grid_label="gn",
       ...: )
    
    In [6]: cat_subset
    Out[6]: <GOOGLE-CMIP6 catalog with 4 dataset(s) from 261 asset(s)>
  • Access: when the user is satisfied with the results of their query, they can ask intake-esm to load data assets (netCDF/HDF files and/or Zarr stores) into xarray datasets:

    In [7]: dset_dict = cat_subset.to_dataset_dict(zarr_kwargs={"consolidated": True})

    --> The keys in the returned dictionary of datasets are constructed as follows:
            'activity_id.institution_id.source_id.experiment_id.table_id.grid_label'
    |███████████████████████████████████████████████████████████████| 100.00% [2/2 00:18<00:00]
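
The relationship between the catalog JSON file and its CSV table can be illustrated with the standard library alone. This is a minimal sketch, not intake-esm's own reader: the field names (`esmcat_version`, `id`, `catalog_file`, `attributes`, `assets`) are an assumed subset of the ESM Collection Specification, and the rows, bucket paths, and helper names (`write_catalog`, `read_asset_paths`) are hypothetical.

```python
import csv
import json
import tempfile
from pathlib import Path

# Hypothetical catalog rows: column names follow CMIP6 conventions, and the
# paths are placeholders that do not point at real stores.
ROWS = [
    {"experiment_id": "historical", "variable_id": "o2", "path": "gs://example-bucket/historical/o2"},
    {"experiment_id": "ssp585", "variable_id": "o2", "path": "gs://example-bucket/ssp585/o2"},
]

# Assumed subset of the fields an ESM catalog definition file carries.
DEFINITION = {
    "esmcat_version": "0.1.0",
    "id": "example-catalog",
    "description": "Illustrative catalog, not a real collection",
    "catalog_file": "example-catalog.csv",
    "attributes": [{"column_name": "experiment_id"}, {"column_name": "variable_id"}],
    "assets": {"column_name": "path", "format": "zarr"},
}


def write_catalog(directory):
    """Write the JSON definition and its CSV table; return the JSON path."""
    directory = Path(directory)
    with open(directory / DEFINITION["catalog_file"], "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(ROWS[0]))
        writer.writeheader()
        writer.writerows(ROWS)
    json_path = directory / "example-catalog.json"
    json_path.write_text(json.dumps(DEFINITION, indent=2))
    return json_path


def read_asset_paths(json_path):
    """Follow catalog_file from the JSON definition and list asset locations."""
    json_path = Path(json_path)
    definition = json.loads(json_path.read_text())
    csv_path = json_path.parent / definition["catalog_file"]
    asset_column = definition["assets"]["column_name"]
    with open(csv_path, newline="") as handle:
        return [row[asset_column] for row in csv.DictReader(handle)]


with tempfile.TemporaryDirectory() as tmp:
    print(read_asset_paths(write_catalog(tmp)))
```

The JSON file only describes the table (which column holds asset locations, which columns are searchable attributes); the asset inventory itself lives in the CSV file it points to.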

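The query and the dictionary keys shown above can be mimicked on plain Python dictionaries. The sketch below is an illustration under assumptions, not intake-esm's implementation: it assumes that a list of values matches any of them, that separate keyword arguments must all match, and that assets sharing the same key columns end up in one dataset. The rows, the `EXAMPLE-INST` and `EXAMPLE-MODEL` names, and the helpers `search` and `group_by_key` are hypothetical.

```python
from collections import defaultdict

# Columns joined with "." to form a dataset key, as printed by to_dataset_dict.
KEY_COLUMNS = [
    "activity_id", "institution_id", "source_id",
    "experiment_id", "table_id", "grid_label",
]


def make_row(activity, experiment, table, path):
    """Build one hypothetical catalog row; all values are placeholders."""
    return {
        "activity_id": activity, "institution_id": "EXAMPLE-INST",
        "source_id": "EXAMPLE-MODEL", "experiment_id": experiment,
        "table_id": table, "grid_label": "gn", "variable_id": "o2", "path": path,
    }


ROWS = [
    make_row("CMIP", "historical", "Oyr", "store-a"),
    make_row("CMIP", "historical", "Oyr", "store-b"),
    make_row("ScenarioMIP", "ssp585", "Oyr", "store-c"),
    make_row("CMIP", "historical", "Omon", "store-d"),
]


def search(rows, **query):
    """Keep rows that satisfy every column; a list matches any of its values."""
    def matches(row):
        for column, wanted in query.items():
            allowed = wanted if isinstance(wanted, (list, tuple, set)) else [wanted]
            if row.get(column) not in allowed:
                return False
        return True
    return [row for row in rows if matches(row)]


def group_by_key(rows, key_columns=KEY_COLUMNS):
    """Map each dotted dataset key to the asset paths that belong to it."""
    groups = defaultdict(list)
    for row in rows:
        key = ".".join(row[column] for column in key_columns)
        groups[key].append(row["path"])
    return dict(groups)


subset = search(
    ROWS,
    experiment_id=["historical", "ssp585"],
    table_id="Oyr",
    variable_id="o2",
    grid_label="gn",
)
for key, paths in group_by_key(subset).items():
    print(key, paths)
```

With these placeholder rows, the `Omon` asset is filtered out and the two `historical` stores share one key, which is why a catalog can report fewer datasets than assets.
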
See documentation for more information.

Installation

Intake-esm can be installed from PyPI with pip:

python -m pip install intake-esm

It is also available from conda-forge for conda installations:

conda install -c conda-forge intake-esm
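
After either command, one way to confirm that the package is visible to the current Python environment is to query its installed metadata. This is a small sketch using only the standard library; the helper name `installed_version` is hypothetical, and the distribution name `intake-esm` is taken from the install commands above.

```python
from importlib import metadata


def installed_version(distribution="intake-esm"):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
print(f"intake-esm {version}" if version else "intake-esm is not installed")
```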