MarsMars is a tensor-based unified framework for large-scale data computation which scales numpy, pandas, scikit-learn and Python functions.
Stars: ✭ 2,308 (+277.12%)
ButterfreeA tool for building feature stores.
Stars: ✭ 126 (-79.41%)
PantheraData-frames & arrays on Clojure
Stars: ✭ 168 (-72.55%)
AirflowETLBlog post on ETL pipelines with Airflow
Stars: ✭ 20 (-96.73%)
etlflowEtlFlow is an ecosystem of functional libraries in Scala based on ZIO for writing various different tasks, jobs on GCP and AWS.
Stars: ✭ 38 (-93.79%)
csvpluscsvplus extends the standard Go encoding/csv package with fluent interface, lazy stream operations, indices and joins.
Stars: ✭ 67 (-89.05%)
Aws Data WranglerPandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
Stars: ✭ 2,385 (+289.71%)
pyjanitorClean APIs for data cleaning. Python implementation of R package Janitor
Stars: ✭ 970 (+58.5%)
saddleSADDLE: Scala Data Library
Stars: ✭ 23 (-96.24%)
vixtractwww.vixtract.ru
Stars: ✭ 40 (-93.46%)
datalake-etl-pipelineSimplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
Stars: ✭ 39 (-93.63%)
ElandPython Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch
Stars: ✭ 235 (-61.6%)
covid-19Data ETL & Analysis on the global and Mexican datasets of the COVID-19 pandemic.
Stars: ✭ 14 (-97.71%)
gallia-coreA schema-aware Scala library for data transformation
Stars: ✭ 44 (-92.81%)
Baby Names AnalysisData ETL & Analysis on the dataset 'Baby Names from Social Security Card Applications - National Data'.
Stars: ✭ 557 (-8.99%)
DIRECTDIRECT, the Data Integration Run-time Execution Control Tool, is a data logistics framework that can be used to monitor, log, audit and control data integration / ETL processes.
Stars: ✭ 20 (-96.73%)
PyjanitorClean APIs for data cleaning. Python implementation of R package Janitor
Stars: ✭ 647 (+5.72%)
DaFlowApache-Spark based Data Flow(ETL) Framework which supports multiple read, write destinations of different types and also support multiple categories of transformation rules.
Stars: ✭ 24 (-96.08%)
Data-Wrangling-with-PythonSimplify your ETL processes with these hands-on data sanitation tips, tricks, and best practices
Stars: ✭ 90 (-85.29%)
etl managerA python package to create a database on the platform using our moj data warehousing framework
Stars: ✭ 14 (-97.71%)
ChoetlETL Framework for .NET / c# (Parser / Writer for CSV, Flat, Xml, JSON, Key-Value, Parquet, Yaml, Avro formatted files)
Stars: ✭ 372 (-39.22%)
BenthosFancy stream processing made operationally mundane
Stars: ✭ 3,705 (+505.39%)
EtlalchemyExtract, Transform, Load: Any SQL Database in 4 lines of Code.
Stars: ✭ 460 (-24.84%)
DatscanDatScan is an initiative to build an open-source CMS that will have the capability to solve any problem using data Analysis just with the help of various modules and a vast standardized module library
Stars: ✭ 13 (-97.88%)
beneathBeneath is a serverless real-time data platform ⚡️
Stars: ✭ 65 (-89.38%)
qweryA SQL-like language for performing ETL transformations.
Stars: ✭ 28 (-95.42%)
MetorikkuA simplified, lightweight ETL Framework based on Apache Spark
Stars: ✭ 361 (-41.01%)
DataformDataform is a framework for managing SQL based data operations in BigQuery, Snowflake, and Redshift
Stars: ✭ 342 (-44.12%)
Locopylocopy: Loading/Unloading to Redshift and Snowflake using Python.
Stars: ✭ 73 (-88.07%)
StetlStetl, Streaming ETL, is a lightweight geospatial processing and ETL framework written in Python.
Stars: ✭ 64 (-89.54%)
Pyetlpython ETL framework
Stars: ✭ 33 (-94.61%)
Getting StartedThis repository is a getting started guide to Singer.
Stars: ✭ 734 (+19.93%)
SetlA simple Spark-powered ETL framework that just works 🍺
Stars: ✭ 79 (-87.09%)
SaynData processing and modelling framework for automating tasks (incl. Python & SQL transformations).
Stars: ✭ 79 (-87.09%)
Aws Ecs AirflowRun Airflow in AWS ECS(Elastic Container Service) using Fargate tasks
Stars: ✭ 107 (-82.52%)
Openkettlewebui一款基于kettle的数据处理web调度控制平台,支持文档资源库和数据库资源库,通过web平台控制kettle数据转换,可作为中间件集成到现有系统中
Stars: ✭ 125 (-79.58%)
carryPython ETL(Extract-Transform-Load) tool / Data migration tool
Stars: ✭ 115 (-81.21%)
Pyspark Example ProjectExample project implementing best practices for PySpark ETL jobs and applications.
Stars: ✭ 633 (+3.43%)
Hale(Spatial) data harmonisation with hale studio (formerly HUMBOLDT Alignment Editor)
Stars: ✭ 84 (-86.27%)
TransformalizeConfigurable Extract, Transform, and Load
Stars: ✭ 125 (-79.58%)
HydrographA visual ETL development and debugging tool for big data
Stars: ✭ 144 (-76.47%)
EtlboxA lightweight ETL (extract, transform, load) library and data integration toolbox for .NET.
Stars: ✭ 203 (-66.83%)
BenderBender - Serverless ETL Framework
Stars: ✭ 171 (-72.06%)
Example Airflow DagsExample DAGs using hooks and operators from Airflow Plugins
Stars: ✭ 243 (-60.29%)
EngeznyEngezny is a python package that quickly generates all possible charts from your dataframe and saves them for you, and engezny is only supporting now uni-parameter visualization using the pie, bar and barh visualizations.
Stars: ✭ 25 (-95.92%)
AirbyteAirbyte is an open-source EL(T) platform that helps you replicate your data in your warehouses, lakes and databases.
Stars: ✭ 4,919 (+703.76%)
link-moveA model-driven dynamically-configurable framework to acquire data from external sources and save it to your database.
Stars: ✭ 32 (-94.77%)
etl[READ-ONLY] PHP - ETL (Extract Transform Load) data processing library
Stars: ✭ 279 (-54.41%)
datascienvdatascienv is package that helps you to setup your environment in single line of code with all dependency and it is also include pyforest that provide single line of import all required ml libraries
Stars: ✭ 53 (-91.34%)
morph-kgcPowerful RDF Knowledge Graph Generation with [R2]RML Mappings
Stars: ✭ 77 (-87.42%)
BETL-oldBETL. Meta data driven ETL generation using T-SQL
Stars: ✭ 17 (-97.22%)