Aws Data Wrangler: Pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
Stars: ✭ 2,385 (+3626.56%)
Dataform: Dataform is a framework for managing SQL based data operations in BigQuery, Snowflake, and Redshift
Stars: ✭ 342 (+434.38%)
uptasticsearch: An Elasticsearch client tailored to data science workflows.
Stars: ✭ 47 (-26.56%)
Butterfree: A tool for building feature stores.
Stars: ✭ 126 (+96.88%)
versatile-data-kit: Versatile Data Kit (VDK) is an open source framework that enables anybody with basic SQL or Python knowledge to create their own data pipelines.
Stars: ✭ 144 (+125%)
etl manager: A Python package to create a database on the platform using our moj data warehousing framework
Stars: ✭ 14 (-78.12%)
Setl: A simple Spark-powered ETL framework that just works 🍺
Stars: ✭ 79 (+23.44%)
xarray-beam: Distributed Xarray with Apache Beam
Stars: ✭ 83 (+29.69%)
polygon-etl: ETL (extract, transform and load) tools for ingesting Polygon blockchain data to Google BigQuery and Pub/Sub
Stars: ✭ 53 (-17.19%)
etl: [READ-ONLY] PHP - ETL (Extract Transform Load) data processing library
Stars: ✭ 279 (+335.94%)
AirflowETL: Blog post on ETL pipelines with Airflow
Stars: ✭ 20 (-68.75%)
Benthos: Fancy stream processing made operationally mundane
Stars: ✭ 3,705 (+5689.06%)
gallia-core: A schema-aware Scala library for data transformation
Stars: ✭ 44 (-31.25%)
morph-kgc: Powerful RDF Knowledge Graph Generation with [R2]RML Mappings
Stars: ✭ 77 (+20.31%)
Pyspark Example Project: Example project implementing best practices for PySpark ETL jobs and applications.
Stars: ✭ 633 (+889.06%)
Sayn: Data processing and modelling framework for automating tasks (incl. Python & SQL transformations).
Stars: ✭ 79 (+23.44%)
beneath: Beneath is a serverless real-time data platform ⚡️
Stars: ✭ 65 (+1.56%)
Airbyte: Airbyte is an open-source EL(T) platform that helps you replicate your data in your warehouses, lakes and databases.
Stars: ✭ 4,919 (+7585.94%)
blockchain-etl-streaming: Streaming Ethereum and Bitcoin blockchain data to Google Pub/Sub or Postgres in Kubernetes
Stars: ✭ 57 (-10.94%)
hamilton: A scalable general purpose micro-framework for defining dataflows. You can use it to create dataframes, numpy matrices, python objects, ML models, etc.
Stars: ✭ 612 (+856.25%)
climate system: Notes and practicals for my "Physics of the Climate System" lecture
Stars: ✭ 13 (-79.69%)
hypothesis-gufunc: Extension to hypothesis for testing numpy general universal functions
Stars: ✭ 32 (-50%)
maxwell-sink: Consumes Maxwell-generated messages from Kafka and exports them to another MySQL database.
Stars: ✭ 16 (-75%)
carry: Python ETL (Extract-Transform-Load) tool / Data migration tool
Stars: ✭ 115 (+79.69%)
jobAnalytics and search: JobAnalytics system consumes data from multiple sources and provides valuable information to both job hunters and recruiters.
Stars: ✭ 25 (-60.94%)
mlbgameday: Multi-core processing of 'Gameday' data from Major League Baseball Advanced Media. Additional tools to parallelize large data sets and write them to a database.
Stars: ✭ 37 (-42.19%)
gcpy: Python toolkit for GEOS-Chem.
Stars: ✭ 34 (-46.87%)
xpublish: Publish Xarray Datasets via a REST API.
Stars: ✭ 86 (+34.38%)
openrefine-docker: OpenRefine is a free, open source power tool for working with messy data and improving it. This repository contains Dockerbuild files for automated builds.
Stars: ✭ 19 (-70.31%)
restee: Python package to call processed EE objects via the REST API to local data
Stars: ✭ 26 (-59.37%)
persistity: A persistence framework for game developers
Stars: ✭ 34 (-46.87%)
koza: Data transformation framework for LinkML data models
Stars: ✭ 21 (-67.19%)
dswarm: an open-source data management platform for knowledge workers (https://github.com/dswarm/dswarm-documentation/wiki)
Stars: ✭ 57 (-10.94%)
openrefine-client: The OpenRefine Python Client from Paul Makepeace provides a library for communicating with an OpenRefine server. This fork extends the command line interface (CLI) and is distributed as a convenient one-file-executable (Windows, Linux, Mac). It is also available via Docker Hub, PyPI and Binder.
Stars: ✭ 67 (+4.69%)
es2postgres: ElasticSearch to PostgreSQL loader
Stars: ✭ 18 (-71.87%)
clisops: Climate Simulation Operations
Stars: ✭ 17 (-73.44%)
oesophagus: Enterprise Grade Single-Step Streaming Data Infrastructure Setup. (Under Development)
Stars: ✭ 12 (-81.25%)
dflib: In-memory Java DataFrame library
Stars: ✭ 50 (-21.87%)
Addax: Addax is an open source universal ETL tool that supports most RDBMS and NoSQL databases, helping you transfer data from any one place to another.
Stars: ✭ 615 (+860.94%)
wxee: A Python interface between Earth Engine and xarray for processing time series data
Stars: ✭ 113 (+76.56%)
yt-channels-DS-AI-ML-CS: A comprehensive list of 180+ YouTube Channels for Data Science, Data Engineering, Machine Learning, Deep learning, Computer Science, programming, software engineering, etc.
Stars: ✭ 1,038 (+1521.88%)
astro: Astro allows rapid and clean development of {Extract, Load, Transform} workflows using Python and SQL, powered by Apache Airflow.
Stars: ✭ 79 (+23.44%)
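mydataharbor: 🇨🇳 MyDataHarbor is a distributed, highly scalable, high-performance, transaction-level data synchronization middleware for moving data from any data source to any other. It helps users synchronize massive volumes of data reliably, quickly, and stably, either as near-real-time incremental sync or as scheduled full sync. It is aimed mainly at real-time transaction systems and can also be used for big-data synchronization (the ETL domain).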
Stars: ✭ 28 (-56.25%)
aospy: Python package for automated analysis and management of gridded climate data
Stars: ✭ 80 (+25%)
lineage: Generate beautiful documentation for your data pipelines in markdown format
Stars: ✭ 16 (-75%)
PDAP-Scrapers: Code relating to scraping public police data.
Stars: ✭ 72 (+12.5%)
esmlab: Earth System Model Lab (esmlab). ⚠️⚠️ ESMLab functionality has been moved into <https://github.com/NCAR/geocat-comp>. ⚠️⚠️
Stars: ✭ 23 (-64.06%)
flox: Fast & furious GroupBy operations for dask.array
Stars: ✭ 42 (-34.37%)
DataEngineering: This repo contains commands that data engineers use in day to day work.
Stars: ✭ 47 (-26.56%)
viewflow: Viewflow is an Airflow-based framework that allows data scientists to create data models without writing Airflow code.
Stars: ✭ 110 (+71.88%)