pangeo-forge / pangeo-forge-recipes

License: Apache-2.0
Python library for building Pangeo Forge recipes.

pangeo-forge-recipes

NSF Award 2026932

pangeo-forge is an open-source tool for extracting, transforming, and loading (ETL) datasets. Its goal is to make it easy to extract datasets from traditional data repositories and deposit them into cloud object storage in an analysis-ready, cloud-optimized format.
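
To make the extract, transform, load flow described above concrete, here is a minimal sketch using only the Python standard library. It does not use the pangeo-forge-recipes API. Every name in it (`SOURCE_FILES`, `extract`, `transform`, `load`, `run_pipeline`) is hypothetical, the "repository" is an in-memory dict of JSON strings, and a plain dict stands in for a cloud object store. The only point it illustrates is the shape of the pipeline: each source file becomes one independently addressable chunk in the target store.

```python
import json

# Hypothetical stand-in for files in a traditional data repository:
# one small "file" per day, each holding temperatures in kelvin.
SOURCE_FILES = {
    "day_01.json": json.dumps({"time": "2020-01-01", "temperature": [280.1, 281.4]}),
    "day_02.json": json.dumps({"time": "2020-01-02", "temperature": [279.8, 282.0]}),
}


def extract(name):
    """Fetch one source file and parse it (here: read from the in-memory dict)."""
    return json.loads(SOURCE_FILES[name])


def transform(record):
    """Return the record's time stamp and its values converted to degrees Celsius."""
    celsius = [round(kelvin - 273.15, 2) for kelvin in record["temperature"]]
    return record["time"], celsius


def load(store, key, chunk):
    """Write one chunk to the target store under its own key."""
    store[f"temperature/{key}"] = json.dumps(chunk)


def run_pipeline(store):
    """Run extract -> transform -> load for every source file, in name order."""
    for name in sorted(SOURCE_FILES):
        key, chunk = transform(extract(name))
        load(store, key, chunk)
    return store


# A plain dict plays the role of a cloud object store (key -> serialized chunk).
object_store = run_pipeline({})
print(sorted(object_store))
```

Writing each chunk under its own key is what lets a reader fetch only the part of the dataset it needs. A real recipe would replace the dicts with remote files and an object-store bucket.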

pangeo-forge is inspired by conda-forge, a community-led collection of recipes for building Conda packages. We hope that pangeo-forge can play the same role for datasets.

Documentation

The official documentation covers pangeo-forge, its progress, and related subprojects.

Contributing

pangeo-forge is still early in development. There are several ways to contribute:

  1. Create a recipe for a dataset you are interested in
  2. Open an issue or pull request here or in any of the related subprojects (pangeo-smithy, staged-recipes)
  3. Check out the project roadmap

Get in touch

Discussions on pangeo-forge are generally hosted biweekly on Mondays at 7pm UTC via Whereby. More details on the scheduling of these meetings can be found here.

License

This project is licensed under the Apache License, Version 2.0.
