beekeeper
Service for automatically managing and cleaning up unreferenced data
Stars: ✭ 43 (+16.22%)
waggle-dance
Hive federation service. Enables disparate tables to be concurrently accessed across multiple Hive deployments.
Stars: ✭ 194 (+424.32%)
beneath
Beneath is a serverless real-time data platform ⚡️
Stars: ✭ 65 (+75.68%)
AirflowETL
Blog post on ETL pipelines with Airflow
Stars: ✭ 20 (-45.95%)
Benthos
Fancy stream processing made operationally mundane
Stars: ✭ 3,705 (+9913.51%)
Aws Data Wrangler
Pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
Stars: ✭ 2,385 (+6345.95%)
blockchain-etl-streaming
Streaming Ethereum and Bitcoin blockchain data to Google Pub/Sub or Postgres in Kubernetes
Stars: ✭ 57 (+54.05%)
uptasticsearch
An Elasticsearch client tailored to data science workflows.
Stars: ✭ 47 (+27.03%)
Sayn
Data processing and modelling framework for automating tasks (incl. Python & SQL transformations).
Stars: ✭ 79 (+113.51%)
Butterfree
A tool for building feature stores.
Stars: ✭ 126 (+240.54%)
hamilton
A scalable, general-purpose micro-framework for defining dataflows. You can use it to create dataframes, NumPy matrices, Python objects, ML models, etc.
Stars: ✭ 612 (+1554.05%)
Dataspherestudio
DataSphereStudio is a one-stop data application development & management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.
Stars: ✭ 1,195 (+3129.73%)
Pyetl
Python ETL framework
Stars: ✭ 33 (-10.81%)
etl manager
A Python package to create a database on the platform using our moj data warehousing framework
Stars: ✭ 14 (-62.16%)
etl
[READ-ONLY] PHP - ETL (Extract Transform Load) data processing library
Stars: ✭ 279 (+654.05%)
morph-kgc
Powerful RDF Knowledge Graph Generation with [R2]RML Mappings
Stars: ✭ 77 (+108.11%)
gallia-core
A schema-aware Scala library for data transformation
Stars: ✭ 44 (+18.92%)
Luigi Warehouse
A Luigi-powered analytics / warehouse stack
Stars: ✭ 72 (+94.59%)
Dataform
Dataform is a framework for managing SQL-based data operations in BigQuery, Snowflake, and Redshift
Stars: ✭ 342 (+824.32%)
Eel Sdk
Big Data Toolkit for the JVM
Stars: ✭ 140 (+278.38%)
Setl
A simple Spark-powered ETL framework that just works 🍺
Stars: ✭ 79 (+113.51%)
DaFlow
Apache Spark-based data flow (ETL) framework that supports multiple read and write destinations of different types, as well as multiple categories of transformation rules.
Stars: ✭ 24 (-35.14%)
Addax
Addax is an open-source universal ETL tool that supports most of the RDBMSs and NoSQL databases on the planet, helping you transfer data from any one place to another.
Stars: ✭ 615 (+1562.16%)
versatile-data-kit
Versatile Data Kit (VDK) is an open source framework that enables anybody with basic SQL or Python knowledge to create their own data pipelines.
Stars: ✭ 144 (+289.19%)
Wedatasphere
WeDataSphere is a financial-grade, one-stop open-source suite for big data platforms. The source code of Scriptis and Linkis has already been released to the open-source community. WeDataSphere, Big Data Made Easy!
Stars: ✭ 372 (+905.41%)
apiary
Apiary provides modules which can be combined to create a federated cloud data lake
Stars: ✭ 30 (-18.92%)
polygon-etl
ETL (extract, transform and load) tools for ingesting Polygon blockchain data to Google BigQuery and Pub/Sub
Stars: ✭ 53 (+43.24%)
Pyspark Example Project
Example project implementing best practices for PySpark ETL jobs and applications.
Stars: ✭ 633 (+1610.81%)
Airbyte
Airbyte is an open-source EL(T) platform that helps you replicate your data in your warehouses, lakes and databases.
Stars: ✭ 4,919 (+13194.59%)
simple-ddl-parser
Simple DDL Parser to parse SQL (HQL, TSQL, AWS Redshift, BigQuery, Snowflake and other dialects) DDL files into JSON/Python dicts with full information about columns (types, defaults, primary keys, etc.) and table properties, types, domains, etc.
Stars: ✭ 76 (+105.41%)
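DataX-src
DataX is a widely used offline data synchronization tool/platform for heterogeneous data, providing efficient data synchronization among heterogeneous data sources including MySQL, Oracle, SqlServer, Postgre, HDFS, Hive, ADS, HBase, OTS, ODPS, etc.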
Stars: ✭ 21 (-43.24%)
qwery
A SQL-like language for performing ETL transformations.
Stars: ✭ 28 (-24.32%)
Datax
DataX is an open-source universal ETL tool that supports Cassandra, ClickHouse, DBF, Hive, InfluxDB, Kudu, MySQL, Oracle, Presto (Trino), PostgreSQL, SQL Server
Stars: ✭ 116 (+213.51%)
Avro Hadoop Starter
Example MapReduce jobs in Java, Hive, Pig, and Hadoop Streaming that work on Avro data.
Stars: ✭ 110 (+197.3%)
Hive
Lightweight and blazing fast key-value database written in pure Dart.
Stars: ✭ 2,681 (+7145.95%)
Haproxy Configs
80+ HAProxy Configs for Hadoop, Big Data, NoSQL, Docker, Elasticsearch, SolrCloud, HBase, MySQL, PostgreSQL, Apache Drill, Hive, Presto, Impala, Hue, ZooKeeper, SSH, RabbitMQ, Redis, Riak, Cloudera, OpenTSDB, InfluxDB, Prometheus, Kibana, Graphite, Rancher etc.
Stars: ✭ 106 (+186.49%)
Php Thrift Sql
A PHP library for connecting to Hive or Impala over Thrift
Stars: ✭ 107 (+189.19%)
vixtract
www.vixtract.ru
Stars: ✭ 40 (+8.11%)
Xsql
Unified SQL Analytics Engine Based on SparkSQL
Stars: ✭ 176 (+375.68%)
Ecency Mobile
Ecency Mobile - reimagined social blogging, contribute and get rewarded (for Android and iOS)
Stars: ✭ 103 (+178.38%)
Pyhive
Python interface to Hive and Presto. 🐝
Stars: ✭ 1,378 (+3624.32%)
Maha
A framework for rapid reporting API development, with out-of-the-box support for high-cardinality dimension lookups with Druid.
Stars: ✭ 101 (+172.97%)
Esteem Surfer
Ecency desktop (formerly known as Esteem Surfer) - reimagined desktop social wallet, contribute and get rewarded (for Windows, Mac, Linux)
Stars: ✭ 100 (+170.27%)
id3c
Data logistics system enabling real-time pathogen surveillance. Built for the Seattle Flu Study.
Stars: ✭ 21 (-43.24%)
thain
Thain is a distributed flow schedule platform.
Stars: ✭ 81 (+118.92%)
Springboot Templates
Spring Boot integration with Dubbo and Netty; NoSQL templates for Redis and MongoDB; MQ templates for Kafka, RocketMQ and RabbitMQ; Solr, SolrCloud and Elasticsearch query engines
Stars: ✭ 100 (+170.27%)
Linkis
Linkis helps easily connect to various back-end computation/storage engines (Spark, Python, TiDB...), exposes various interfaces (REST, JDBC, Java ...), with multi-tenancy, high performance, and resource control.
Stars: ✭ 2,323 (+6178.38%)
Bitalarm
An app to keep track of different cryptocurrencies, written in Dart + Flutter
Stars: ✭ 94 (+154.05%)
Repository
A personal learning knowledge base covering data warehouse modeling, real-time computing, big data, Java, algorithms, and more.
Stars: ✭ 92 (+148.65%)
airflow-dbt-python
A collection of Airflow operators, hooks, and utilities to elevate dbt to a first-class citizen of Airflow.
Stars: ✭ 111 (+200%)
Presto
The official home of the Presto distributed SQL query engine for big data
Stars: ✭ 12,957 (+34918.92%)
Wifi
A big data query and analysis system based on information captured over WiFi
Stars: ✭ 93 (+151.35%)