1712 open source projects that are alternatives to or similar to Metorikku

Spark With Python
Fundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (-58.45%)
Mutual labels:  sql, spark, big-data, distributed-computing
Setl
A simple Spark-powered ETL framework that just works 🍺
Stars: ✭ 79 (-78.12%)
Mutual labels:  spark, big-data, etl
Spark
Apache Spark - A unified analytics engine for large-scale data processing
Stars: ✭ 31,618 (+8658.45%)
Mutual labels:  sql, spark, big-data
datalake-etl-pipeline
Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
Stars: ✭ 39 (-89.2%)
Mutual labels:  big-data, etl, etl-framework
Spark Website
Apache Spark Website
Stars: ✭ 75 (-79.22%)
Mutual labels:  sql, spark, big-data
Hydrograph
A visual ETL development and debugging tool for big data
Stars: ✭ 144 (-60.11%)
Mutual labels:  big-data, etl, etl-framework
Geni
A Clojure dataframe library that runs on Spark
Stars: ✭ 152 (-57.89%)
Mutual labels:  spark, big-data, distributed-computing
Maha
A framework for rapid reporting API development, with out-of-the-box support for high-cardinality dimension lookups with Druid.
Stars: ✭ 101 (-72.02%)
Mutual labels:  sql, big-data
Mara Example Project 2
An example mini data warehouse for python project stats, template for new projects
Stars: ✭ 154 (-57.34%)
Mutual labels:  sql, etl
Sparkler
Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
Stars: ✭ 362 (+0.28%)
Mutual labels:  spark, big-data
Clickhouse
ClickHouse® is a free analytics DBMS for big data
Stars: ✭ 21,089 (+5741.83%)
Mutual labels:  sql, big-data
Locopy
locopy: Loading/Unloading to Redshift and Snowflake using Python.
Stars: ✭ 73 (-79.78%)
Mutual labels:  sql, etl
Sayn
Data processing and modelling framework for automating tasks (incl. Python & SQL transformations).
Stars: ✭ 79 (-78.12%)
Mutual labels:  sql, etl
Quicksql
A Flexible, Fast, Federated(3F) SQL Analysis Middleware for Multiple Data Sources
Stars: ✭ 1,821 (+404.43%)
Mutual labels:  sql, spark
Parquet Index
Spark SQL index for Parquet tables
Stars: ✭ 109 (-69.81%)
Mutual labels:  sql, spark
Presto Go Client
A Presto client for the Go programming language.
Stars: ✭ 183 (-49.31%)
Mutual labels:  sql, big-data
Bulk Writer
Provides guidance for fast ETL jobs, an IDataReader implementation for SqlBulkCopy (or the MySql or Oracle equivalents) that wraps an IEnumerable, and libraries for mapping entities to table columns.
Stars: ✭ 210 (-41.83%)
Mutual labels:  sql, etl
Linkis
Linkis helps easily connect to various back-end computation/storage engines (Spark, Python, TiDB...), exposes various interfaces (REST, JDBC, Java ...), with multi-tenancy, high performance, and resource control.
Stars: ✭ 2,323 (+543.49%)
Mutual labels:  sql, spark
link-move
A model-driven dynamically-configurable framework to acquire data from external sources and save it to your database.
Stars: ✭ 32 (-91.14%)
Mutual labels:  etl, etl-framework
vixtract
www.vixtract.ru
Stars: ✭ 40 (-88.92%)
Mutual labels:  etl, etl-framework
dislib
The Distributed Computing library for python implemented using PyCOMPSs programming model for HPC.
Stars: ✭ 39 (-89.2%)
Mutual labels:  big-data, distributed-computing
Sylph
Stream computing platform for bigdata
Stars: ✭ 362 (+0.28%)
Mutual labels:  sql, big-data
hamilton
A scalable general purpose micro-framework for defining dataflows. You can use it to create dataframes, numpy matrices, python objects, ML models, etc.
Stars: ✭ 612 (+69.53%)
Mutual labels:  etl, etl-framework
bigquery-kafka-connect
☁️ nodejs kafka connect connector for Google BigQuery
Stars: ✭ 17 (-95.29%)
Mutual labels:  big-data, etl
DaFlow
Apache Spark-based Data Flow (ETL) framework that supports multiple read and write destinations of different types and also supports multiple categories of transformation rules.
Stars: ✭ 24 (-93.35%)
Mutual labels:  etl, etl-framework
awesome-AI-kubernetes
❄️ 🐳 Awesome tools and libs for AI, Deep Learning, Machine Learning, Computer Vision, Data Science, Data Analytics and Cognitive Computing that are baked in the oven to be Native on Kubernetes and Docker with Python, R, Scala, Java, C#, Go, Julia, C++ etc
Stars: ✭ 95 (-73.68%)
Mutual labels:  big-data, spark
Kamu Cli
Next generation tool for decentralized exchange and transformation of semi-structured data
Stars: ✭ 69 (-80.89%)
Mutual labels:  sql, spark
Awesome Business Intelligence
Actively curated list of awesome BI tools. PRs welcome!
Stars: ✭ 1,157 (+220.5%)
Mutual labels:  sql, etl
Ether sql
A python library to push ethereum blockchain data into an sql database.
Stars: ✭ 41 (-88.64%)
Mutual labels:  sql, etl
cubetl
CubETL - Framework and tool for data ETL (Extract, Transform and Load) in Python (PERSONAL PROJECT / SELDOM MAINTAINED)
Stars: ✭ 21 (-94.18%)
Mutual labels:  etl, etl-framework
spark-acid
ACID Data Source for Apache Spark based on Hive ACID
Stars: ✭ 91 (-74.79%)
Mutual labels:  big-data, spark
frovedis
Framework of vectorized and distributed data analytics
Stars: ✭ 59 (-83.66%)
Mutual labels:  spark, distributed-computing
Calcite Avatica
Mirror of Apache Calcite - Avatica
Stars: ✭ 130 (-63.99%)
Mutual labels:  sql, big-data
Presto
The official home of the Presto distributed SQL query engine for big data
Stars: ✭ 12,957 (+3489.2%)
Mutual labels:  sql, big-data
Ethereum Etl
Python scripts for ETL (extract, transform and load) jobs for Ethereum blocks, transactions, ERC20 / ERC721 tokens, transfers, receipts, logs, contracts, internal transactions. Data is available in Google BigQuery https://goo.gl/oY5BCQ
Stars: ✭ 956 (+164.82%)
Mutual labels:  sql, etl
Xsql
Unified SQL Analytics Engine Based on SparkSQL
Stars: ✭ 176 (-51.25%)
Mutual labels:  sql, spark
Bitcoin Etl
ETL scripts for Bitcoin, Litecoin, Dash, Zcash, Doge, Bitcoin Cash. Available in Google BigQuery https://goo.gl/oY5BCQ
Stars: ✭ 174 (-51.8%)
Mutual labels:  sql, etl
Calcite
Apache Calcite
Stars: ✭ 2,816 (+680.06%)
Mutual labels:  sql, big-data
Linq2db
Linq to database provider.
Stars: ✭ 2,211 (+512.47%)
Mutual labels:  sql, etl
Datafuse
Datafuse is a free Cloud-Native Analytics DBMS (inspired by ClickHouse) implemented in Rust
Stars: ✭ 327 (-9.42%)
Mutual labels:  sql, distributed-computing
DIRECT
DIRECT, the Data Integration Run-time Execution Control Tool, is a data logistics framework that can be used to monitor, log, audit and control data integration / ETL processes.
Stars: ✭ 20 (-94.46%)
Mutual labels:  etl, etl-framework
pyspark-algorithms
PySpark Algorithms Book: https://www.amazon.com/dp/B07X4B2218/ref=sr_1_2
Stars: ✭ 72 (-80.06%)
Mutual labels:  big-data, distributed-computing
nebula
A distributed block-based data storage and compute engine
Stars: ✭ 127 (-64.82%)
Mutual labels:  big-data, distributed-computing
OpenKettleWebUI
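A Kettle-based web scheduling and control platform for data processing. It supports file repositories and database repositories, controls Kettle data transformations through the web platform, and can be integrated into existing systems as middleware.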
Stars: ✭ 138 (-61.77%)
Mutual labels:  etl, etl-framework
etlflow
EtlFlow is an ecosystem of functional libraries in Scala based on ZIO for writing various different tasks, jobs on GCP and AWS.
Stars: ✭ 38 (-89.47%)
Mutual labels:  etl, etl-framework
csvplus
csvplus extends the standard Go encoding/csv package with fluent interface, lazy stream operations, indices and joins.
Stars: ✭ 67 (-81.44%)
Mutual labels:  etl, etl-framework
redis-connect-dist
Real-Time Event Streaming & Change Data Capture
Stars: ✭ 21 (-94.18%)
Mutual labels:  etl, etl-framework
DataBridge.NET
Configurable data bridge for permanent ETL jobs
Stars: ✭ 16 (-95.57%)
Mutual labels:  etl, etl-framework
leaflet heatmap
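A simple visualization of Huzhou call data. Assuming the data volume is too large to draw a heatmap directly in the browser, the heatmap rendering step is moved to offline computation and analysis. The data is first computed in parallel with Apache Spark, Apache Spark then renders the heatmap, and leafletjs loads the OpenStreetMap layer and the heatmap layer to give good interactivity. The rendering is currently implemented with Apache Spark; perhaps because Apache Spark is not well suited to this kind of computation, or because I did not design the algorithm well, the parallel computation is slower than single-machine computation. The Apache Spark heatmap rendering and computation code is at https://github.com/yuanzhaokang/ParallelizeHeatmap.git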
Stars: ✭ 13 (-96.4%)
Mutual labels:  big-data, spark
qwery
A SQL-like language for performing ETL transformations.
Stars: ✭ 28 (-92.24%)
Mutual labels:  etl, etl-framework
bandar-log
Monitoring tool to measure flow throughput of data sources and processing components that are part of Data Ingestion and ETL pipelines.
Stars: ✭ 20 (-94.46%)
Mutual labels:  big-data, etl
basin
Basin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from your browser
Stars: ✭ 25 (-93.07%)
Mutual labels:  spark, etl
bigdata-fun
A complete (distributed) BigData stack, running in containers
Stars: ✭ 14 (-96.12%)
Mutual labels:  big-data, spark
Datavec
ETL Library for Machine Learning - data pipelines, data munging and wrangling
Stars: ✭ 272 (-24.65%)
Mutual labels:  spark, etl
Succinct
Enabling queries on compressed data.
Stars: ✭ 257 (-28.81%)
Mutual labels:  spark, big-data
Trino
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Stars: ✭ 4,581 (+1168.98%)
Mutual labels:  sql, big-data
Smooks
An extensible Java framework for building XML and non-XML streaming applications
Stars: ✭ 293 (-18.84%)
Mutual labels:  big-data, etl
Parquet Generator
Parquet file generator
Stars: ✭ 16 (-95.57%)
Mutual labels:  sql, spark
Phoenix
Mirror of Apache Phoenix
Stars: ✭ 867 (+140.17%)
Mutual labels:  sql, big-data
BETL-old
BETL. Meta data driven ETL generation using T-SQL
Stars: ✭ 17 (-95.29%)
Mutual labels:  etl, etl-framework
1-60 of 1712 similar projects