1712 open source projects that are alternatives to or similar to Metorikku

Eel Sdk
Big Data Toolkit for the JVM
Stars: ✭ 140 (-61.22%)
Mutual labels:  big-data, etl
Selinon
An advanced distributed task flow management on top of Celery
Stars: ✭ 237 (-34.35%)
Mutual labels:  big-data, distributed-computing
Moosefs
MooseFS – Open Source, Petabyte, Fault-Tolerant, Highly Performing, Scalable Network Distributed File System (Software-Defined Storage)
Stars: ✭ 1,025 (+183.93%)
Mutual labels:  big-data, distributed-computing
Delta
An open-source storage layer that brings scalable, ACID transactions to Apache Spark™ and big data workloads.
Stars: ✭ 3,903 (+981.16%)
Mutual labels:  spark, big-data
Bigdl
Building Large-Scale AI Applications for Distributed Big Data
Stars: ✭ 3,813 (+956.23%)
Mutual labels:  spark, big-data
Magellan
Geo Spatial Data Analytics on Spark
Stars: ✭ 507 (+40.44%)
Mutual labels:  spark, big-data
Bandar Log
Monitoring tool to measure flow throughput of data sources and processing components that are part of Data Ingestion and ETL pipelines.
Stars: ✭ 19 (-94.74%)
Mutual labels:  big-data, etl
Goodreads etl pipeline
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Stars: ✭ 793 (+119.67%)
Mutual labels:  spark, etl-framework
Spark Movie Lens
An on-line movie recommender using Spark, Python Flask, and the MovieLens dataset
Stars: ✭ 745 (+106.37%)
Mutual labels:  spark, big-data
Docker Spark Cluster
A Spark cluster setup running on Docker containers
Stars: ✭ 57 (-84.21%)
Mutual labels:  spark, big-data
Pyspark Example Project
Example project implementing best practices for PySpark ETL jobs and applications.
Stars: ✭ 633 (+75.35%)
Mutual labels:  spark, etl
Luigi Warehouse
A luigi powered analytics / warehouse stack
Stars: ✭ 72 (-80.06%)
Mutual labels:  spark, etl
Big Data Engineering Coursera Yandex
Big Data for Data Engineers Coursera Specialization from Yandex
Stars: ✭ 71 (-80.33%)
Mutual labels:  spark, big-data
Thrill
Thrill - An EXPERIMENTAL Algorithmic Distributed Big Data Batch Processing Framework in C++
Stars: ✭ 528 (+46.26%)
Mutual labels:  big-data, distributed-computing
Bigdataclass
Two-day workshop that covers how to use R to interact with databases and Spark
Stars: ✭ 110 (-69.53%)
Mutual labels:  spark, big-data
Bigdata Notes
Introductory guide to big data ⭐
Stars: ✭ 10,991 (+2944.6%)
Mutual labels:  spark, big-data
Feast
Feature Store for Machine Learning
Stars: ✭ 2,576 (+613.57%)
Mutual labels:  spark, big-data
Logisland
Scalable stream processing platform for advanced realtime analytics on top of Kafka and Spark. LogIsland also supports MQTT and Kafka Streams (Flink being in the roadmap). The platform does complex event processing and is suitable for time series analysis. A large set of valuable ready to use processors, data sources and sinks are available.
Stars: ✭ 97 (-73.13%)
Mutual labels:  spark, big-data
Sparkling Graph
SparklingGraph provides an easy-to-use set of features that let you process large-scale graphs using Spark and GraphX.
Stars: ✭ 139 (-61.5%)
Mutual labels:  spark, big-data
Js Spark
Realtime calculation distributed system. AKA distributed lodash
Stars: ✭ 187 (-48.2%)
Mutual labels:  spark, distributed-computing
Spark Py Notebooks
Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 1,338 (+270.64%)
Mutual labels:  spark, big-data
Data Accelerator
Data Accelerator for Apache Spark simplifies onboarding to Streaming of Big Data. It offers a rich, easy to use experience to help with creation, editing and management of Spark jobs on Azure HDInsights or Databricks while enabling the full power of the Spark engine.
Stars: ✭ 247 (-31.58%)
Mutual labels:  spark, big-data
Hyperspace
An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads.
Stars: ✭ 246 (-31.86%)
Mutual labels:  spark, big-data
Kyuubi
Kyuubi is a unified multi-tenant JDBC interface for large-scale data processing and analytics, built on top of Apache Spark
Stars: ✭ 363 (+0.55%)
Mutual labels:  sql, spark
Gimel
Big Data Processing Framework - Unified Data API or SQL on Any Storage
Stars: ✭ 216 (-40.17%)
Mutual labels:  spark, big-data
Datafusion
DataFusion has now been donated to the Apache Arrow project
Stars: ✭ 611 (+69.25%)
Mutual labels:  sql, spark
Beam
Apache Beam is a unified programming model for Batch and Streaming
Stars: ✭ 5,149 (+1326.32%)
Mutual labels:  sql, big-data
Sylph
Stream computing platform for bigdata
Stars: ✭ 362 (+0.28%)
Mutual labels:  sql, big-data
Hazelcast
Open-source distributed computation and storage platform
Stars: ✭ 4,662 (+1191.41%)
Mutual labels:  big-data, distributed-computing
Awesome Business Intelligence
Actively curated list of awesome BI tools. PRs welcome!
Stars: ✭ 1,157 (+220.5%)
Mutual labels:  sql, etl
Ether sql
A python library to push ethereum blockchain data into an sql database.
Stars: ✭ 41 (-88.64%)
Mutual labels:  sql, etl
basin
Basin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from your browser
Stars: ✭ 25 (-93.07%)
Mutual labels:  spark, etl
Ethereum Etl
Python scripts for ETL (extract, transform and load) jobs for Ethereum blocks, transactions, ERC20 / ERC721 tokens, transfers, receipts, logs, contracts, internal transactions. Data is available in Google BigQuery https://goo.gl/oY5BCQ
Stars: ✭ 956 (+164.82%)
Mutual labels:  sql, etl
Calcite Avatica
Mirror of Apache Calcite - Avatica
Stars: ✭ 130 (-63.99%)
Mutual labels:  sql, big-data
Parquet Index
Spark SQL index for Parquet tables
Stars: ✭ 109 (-69.81%)
Mutual labels:  sql, spark
bigdata-fun
A complete (distributed) BigData stack, running in containers
Stars: ✭ 14 (-96.12%)
Mutual labels:  big-data, spark
bandar-log
Monitoring tool to measure flow throughput of data sources and processing components that are part of Data Ingestion and ETL pipelines.
Stars: ✭ 20 (-94.46%)
Mutual labels:  big-data, etl
Bitcoin Etl
ETL scripts for Bitcoin, Litecoin, Dash, Zcash, Doge, Bitcoin Cash. Available in Google BigQuery https://goo.gl/oY5BCQ
Stars: ✭ 174 (-51.8%)
Mutual labels:  sql, etl
Linq2db
Linq to database provider.
Stars: ✭ 2,211 (+512.47%)
Mutual labels:  sql, etl
Bulk Writer
Provides guidance for fast ETL jobs, an IDataReader implementation for SqlBulkCopy (or the MySql or Oracle equivalents) that wraps an IEnumerable, and libraries for mapping entities to table columns.
Stars: ✭ 210 (-41.83%)
Mutual labels:  sql, etl
Linkis
Linkis helps easily connect to various back-end computation/storage engines (Spark, Python, TiDB...), exposes various interfaces (REST, JDBC, Java ...), with multi-tenancy, high performance, and resource control.
Stars: ✭ 2,323 (+543.49%)
Mutual labels:  sql, spark
DIRECT
DIRECT, the Data Integration Run-time Execution Control Tool, is a data logistics framework that can be used to monitor, log, audit and control data integration / ETL processes.
Stars: ✭ 20 (-94.46%)
Mutual labels:  etl, etl-framework
vixtract
www.vixtract.ru
Stars: ✭ 40 (-88.92%)
Mutual labels:  etl, etl-framework
pyspark-algorithms
PySpark Algorithms Book: https://www.amazon.com/dp/B07X4B2218/ref=sr_1_2
Stars: ✭ 72 (-80.06%)
Mutual labels:  big-data, distributed-computing
Phoenix
Mirror of Apache Phoenix
Stars: ✭ 867 (+140.17%)
Mutual labels:  sql, big-data
hamilton
A scalable general purpose micro-framework for defining dataflows. You can use it to create dataframes, numpy matrices, python objects, ML models, etc.
Stars: ✭ 612 (+69.53%)
Mutual labels:  etl, etl-framework
nebula
A distributed block-based data storage and compute engine
Stars: ✭ 127 (-64.82%)
Mutual labels:  big-data, distributed-computing
etlflow
EtlFlow is an ecosystem of functional libraries in Scala based on ZIO for writing various different tasks, jobs on GCP and AWS.
Stars: ✭ 38 (-89.47%)
Mutual labels:  etl, etl-framework
OpenKettleWebUI
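A kettle-based web scheduling and control platform for data processing. It supports file repositories and database repositories, controls kettle data transformations through the web platform, and can be integrated into existing systems as middleware.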
Stars: ✭ 138 (-61.77%)
Mutual labels:  etl, etl-framework
redis-connect-dist
Real-Time Event Streaming & Change Data Capture
Stars: ✭ 21 (-94.18%)
Mutual labels:  etl, etl-framework
DataBridge.NET
Configurable data bridge for permanent ETL jobs
Stars: ✭ 16 (-95.57%)
Mutual labels:  etl, etl-framework
leaflet heatmap
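A simple visualization of Huzhou call data. Assuming the data volume is too large for the browser to draw the heatmap directly, the heatmap drawing step is moved offline for computation and analysis. After the data is computed in parallel with Apache Spark, Apache Spark is also used to draw the heatmap, and leafletjs then loads the OpenStreetMap layer and the heatmap layer for good interactivity. With the drawing currently implemented in Apache Spark, the parallel computation is slower than single-machine computation, possibly because Apache Spark is not well suited to this kind of computation or because I did not design the algorithm well. The Apache Spark heatmap drawing and computation code is at https://github.com/yuanzhaokang/ParallelizeHeatmap.git .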
Stars: ✭ 13 (-96.4%)
Mutual labels:  big-data, spark
csvplus
csvplus extends the standard Go encoding/csv package with fluent interface, lazy stream operations, indices and joins.
Stars: ✭ 67 (-81.44%)
Mutual labels:  etl, etl-framework
Bender
Bender - Serverless ETL Framework
Stars: ✭ 171 (-52.63%)
Mutual labels:  etl, etl-framework
Etlbox
A lightweight ETL (extract, transform, load) library and data integration toolbox for .NET.
Stars: ✭ 203 (-43.77%)
Mutual labels:  etl, etl-framework
Parquet Generator
Parquet file generator
Stars: ✭ 16 (-95.57%)
Mutual labels:  sql, spark
Succinct
Enabling queries on compressed data.
Stars: ✭ 257 (-28.81%)
Mutual labels:  spark, big-data
aut
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Stars: ✭ 111 (-69.25%)
Mutual labels:  big-data, spark
qwery
A SQL-like language for performing ETL transformations.
Stars: ✭ 28 (-92.24%)
Mutual labels:  etl, etl-framework
Datavec
ETL Library for Machine Learning - data pipelines, data munging and wrangling
Stars: ✭ 272 (-24.65%)
Mutual labels:  spark, etl