waggle-dance: Hive federation service. Enables disparate tables to be concurrently accessed across multiple Hive deployments.
Stars: ✭ 194 (+424.32%)
Mutual labels: hive, metastore, hive-metastore
beekeeper: Service for automatically managing and cleaning up unreferenced data.
Stars: ✭ 43 (+16.22%)
Mutual labels: hive, metastore, hive-metastore
DaFlow: Apache Spark-based data flow (ETL) framework that supports multiple read and write destinations of different types as well as multiple categories of transformation rules.
Stars: ✭ 24 (-35.14%)
Mutual labels: hive, etl
Addax: Addax is an open source universal ETL tool that supports most RDBMS and NoSQL databases, helping you transfer data from any one place to another.
Stars: ✭ 615 (+1562.16%)
Mutual labels: hive, etl
AirflowETL: Blog post on ETL pipelines with Airflow.
Stars: ✭ 20 (-45.95%)
Mutual labels: etl, data-engineering
simple-ddl-parser: Simple DDL parser that converts SQL DDL files (HQL, TSQL, AWS Redshift, BigQuery, Snowflake, and other dialects) to JSON or a Python dict with full information about columns (types, defaults, primary keys, etc.) as well as table properties, types, domains, etc.
Stars: ✭ 76 (+105.41%)
Mutual labels: hive, ddls
apiary: Apiary provides modules which can be combined to create a federated cloud data lake.
Stars: ✭ 30 (-18.92%)
Mutual labels: hive, hive-metastore
web-click-flow: Offline log analysis of website clickstreams.
Stars: ✭ 14 (-62.16%)
Mutual labels: hive, etl
Wedatasphere: WeDataSphere is a financial-level, one-stop open-source suite for big data platforms. The source code of Scriptis and Linkis has already been released to the open-source community. WeDataSphere, Big Data Made Easy!
Stars: ✭ 372 (+905.41%)
Mutual labels: hive, etl
Pyetl: Python ETL framework.
Stars: ✭ 33 (-10.81%)
Mutual labels: hive, etl
Luigi Warehouse: A Luigi-powered analytics / warehouse stack.
Stars: ✭ 72 (+94.59%)
Mutual labels: hive, etl
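DataX-src: DataX is a widely used offline data synchronization tool/platform for heterogeneous data. It provides efficient data synchronization between heterogeneous data sources, including MySQL, Oracle, SqlServer, Postgre, HDFS, Hive, ADS, HBase, OTS, ODPS, and others.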
Stars: ✭ 21 (-43.24%)
Mutual labels: hive, etl
Airbyte: Airbyte is an open-source EL(T) platform that helps you replicate your data in your warehouses, lakes and databases.
Stars: ✭ 4,919 (+13194.59%)
Mutual labels: etl, data-engineering
Datax: DataX is an open source universal ETL tool that supports Cassandra, ClickHouse, DBF, Hive, InfluxDB, Kudu, MySQL, Oracle, Presto (Trino), PostgreSQL, and SQL Server.
Stars: ✭ 116 (+213.51%)
Mutual labels: hive, etl
Butterfree: A tool for building feature stores.
Stars: ✭ 126 (+240.54%)
Mutual labels: etl, data-engineering
Setl: A simple Spark-powered ETL framework that just works 🍺
Stars: ✭ 79 (+113.51%)
Mutual labels: etl, data-engineering
Aws Data Wrangler: Pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
Stars: ✭ 2,385 (+6345.95%)
Mutual labels: etl, data-engineering
qwery: A SQL-like language for performing ETL transformations.
Stars: ✭ 28 (-24.32%)
Mutual labels: hive, etl
Dataspherestudio: DataSphereStudio is a one-stop data application development and management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.
Stars: ✭ 1,195 (+3129.73%)
Mutual labels: hive, etl