818 open source projects that are alternatives to or similar to spark-acid

Delta

An open-source storage layer that brings scalable, ACID transactions to Apache Spark™ and big data workloads.

Stars: ✭ 3,903 (+4189.01%)

Mutual labels: big-data, spark, acid

Bigdata Notes

A beginner's guide to big data ⭐

Stars: ✭ 10,991 (+11978.02%)

Mutual labels: big-data, spark, hive

awesome-AI-kubernetes

❄️ 🐳 Awesome tools and libs for AI, Deep Learning, Machine Learning, Computer Vision, Data Science, Data Analytics and Cognitive Computing that are baked in the oven to be Native on Kubernetes and Docker with Python, R, Scala, Java, C#, Go, Julia, C++ etc

Stars: ✭ 95 (+4.4%)

Mutual labels: big-data, spark

Docker Spark Cluster

A Spark cluster setup running on Docker containers

Stars: ✭ 57 (-37.36%)

Mutual labels: big-data, spark

Rsparkling

RSparkling: Use H2O Sparkling Water from R (Spark + R + Machine Learning)

Stars: ✭ 65 (-28.57%)

Mutual labels: big-data, spark

Hadoop Docker

A Docker-based Hadoop development and test environment, including Hadoop, Hive, HBase, and Spark

Stars: ✭ 238 (+161.54%)

Mutual labels: spark, hive

Succinct

Enabling queries on compressed data.

Stars: ✭ 257 (+182.42%)

Mutual labels: big-data, spark

Zeppelin

Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.

Stars: ✭ 5,513 (+5958.24%)

Mutual labels: big-data, spark

H2o 3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

Stars: ✭ 5,656 (+6115.38%)

Mutual labels: big-data, spark

Maha

A framework for rapid reporting API development, with out-of-the-box support for high-cardinality dimension lookups with Druid.

Stars: ✭ 101 (+10.99%)

Mutual labels: big-data, hive

Sparkling Graph

SparklingGraph provides an easy-to-use set of features that let you process large-scale graphs using Spark and GraphX.

Stars: ✭ 139 (+52.75%)

Mutual labels: big-data, spark

Spark With Python

Fundamentals of Spark with Python (using PySpark), code examples

Stars: ✭ 150 (+64.84%)

Mutual labels: big-data, spark

Linkis

Linkis helps easily connect to various back-end computation/storage engines (Spark, Python, TiDB...), exposes various interfaces (REST, JDBC, Java ...), with multi-tenancy, high performance, and resource control.

Stars: ✭ 2,323 (+2452.75%)

Mutual labels: spark, hive

Bigdata docker

Big Data Ecosystem Docker

Stars: ✭ 161 (+76.92%)

Mutual labels: spark, hive

Mmlspark

Simple and Distributed Machine Learning

Stars: ✭ 2,899 (+3085.71%)

Mutual labels: big-data, spark

leaflet heatmap

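A simple visualization of call data from Huzhou. Assuming the data volume is too large for the browser to draw a heatmap directly, the heatmap drawing step is moved to offline computation and analysis. After Apache Spark computes the data in parallel, Apache Spark is also used to draw the heatmap, and leafletjs then loads the OpenStreetMap layer and the heatmap layer to achieve good interactivity. The drawing is currently implemented with Apache Spark; either Apache Spark is not well suited to this kind of computation or I did not design the algorithm well, because the parallel computation is slower than the single-machine computation. The Apache Spark heatmap drawing and computation code is at https://github.com/yuanzhaokang/ParallelizeHeatmap.git .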

Stars: ✭ 13 (-85.71%)

Mutual labels: big-data, spark

Listenbrainz Server

Server for the ListenBrainz project

Stars: ✭ 420 (+361.54%)

Mutual labels: big-data, spark

Data Science Ipython Notebooks

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

Stars: ✭ 22,048 (+24128.57%)

Mutual labels: big-data, spark

Spark

Apache Spark - A unified analytics engine for large-scale data processing

Stars: ✭ 31,618 (+34645.05%)

Mutual labels: big-data, spark

Spark Doc Zh

Chinese edition of the official Apache Spark documentation

Stars: ✭ 1,126 (+1137.36%)

Mutual labels: big-data, spark

swordfish

Open-source distributed workflow scheduling tool that also supports streaming tasks.

Stars: ✭ 35 (-61.54%)

Mutual labels: spark, hive

Logisland

Scalable stream processing platform for advanced realtime analytics on top of Kafka and Spark. LogIsland also supports MQTT and Kafka Streams (Flink is on the roadmap). The platform does complex event processing and is suitable for time series analysis. A large set of valuable ready-to-use processors, data sources and sinks is available.

Stars: ✭ 97 (+6.59%)

Mutual labels: big-data, spark

Spark On Lambda

Apache Spark on AWS Lambda

Stars: ✭ 137 (+50.55%)

Mutual labels: big-data, spark

Bigdataclass

Two-day workshop that covers how to use R to interact with databases and Spark

Stars: ✭ 110 (+20.88%)

Mutual labels: big-data, spark

Helicalinsight

Helical Insight software is the world’s first Open Source Business Intelligence framework, which helps you make sense of your data and make well-informed decisions.

Stars: ✭ 214 (+135.16%)

Mutual labels: big-data, hive

Sparkrdma

RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark

Stars: ✭ 215 (+136.26%)

Mutual labels: big-data, spark

Hyperspace

An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads.

Stars: ✭ 246 (+170.33%)

Mutual labels: big-data, spark

Spark Authorizer

A Spark SQL extension which provides SQL Standard Authorization for Apache Spark

Stars: ✭ 141 (+54.95%)

Mutual labels: spark, hive

Quicksql

A Flexible, Fast, Federated (3F) SQL Analysis Middleware for Multiple Data Sources

Stars: ✭ 1,821 (+1901.1%)

Mutual labels: spark, hive

Xsql

Unified SQL Analytics Engine Based on SparkSQL

Stars: ✭ 176 (+93.41%)

Mutual labels: spark, hive

Hadoopcryptoledger

Hadoop Crypto Ledger - Analyzing CryptoLedgers, such as Bitcoin Blockchain, on Big Data platforms, such as Hadoop/Spark/Flink/Hive

Stars: ✭ 126 (+38.46%)

Mutual labels: spark, hive

bigdata-fun

A complete (distributed) BigData stack, running in containers

Stars: ✭ 14 (-84.62%)

Mutual labels: big-data, spark

aut

The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.

Stars: ✭ 111 (+21.98%)

Mutual labels: big-data, spark

Trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

Stars: ✭ 4,581 (+4934.07%)

Mutual labels: big-data, hive

Cube.js

📊 Cube — Open-Source Analytics API for Building Data Apps

Stars: ✭ 11,983 (+13068.13%)

Mutual labels: spark, hive

Hive

Apache Hive

Stars: ✭ 4,031 (+4329.67%)

Mutual labels: big-data, hive

Metorikku

A simplified, lightweight ETL Framework based on Apache Spark

Stars: ✭ 361 (+296.7%)

Mutual labels: big-data, spark

Magellan

Geo Spatial Data Analytics on Spark

Stars: ✭ 507 (+457.14%)

Mutual labels: big-data, spark

Sparkler

Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.

Stars: ✭ 362 (+297.8%)

Mutual labels: big-data, spark

Sparkjni

A heterogeneous Apache Spark framework.

Stars: ✭ 11 (-87.91%)

Mutual labels: big-data, spark

Spark Movie Lens

An on-line movie recommender using Spark, Python Flask, and the MovieLens dataset

Stars: ✭ 745 (+718.68%)

Mutual labels: big-data, spark

Repository

A personal learning knowledge base covering data warehouse modeling, real-time computing, big data, Java, algorithms, and more.

Stars: ✭ 92 (+1.1%)

Mutual labels: spark, hive

Spark Py Notebooks

Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks

Stars: ✭ 1,338 (+1370.33%)

Mutual labels: big-data, spark

Setl

A simple Spark-powered ETL framework that just works 🍺

Stars: ✭ 79 (-13.19%)

Mutual labels: big-data, spark

Koalas

Koalas: pandas API on Apache Spark

Stars: ✭ 3,044 (+3245.05%)

Mutual labels: big-data, spark

Spark Website

Apache Spark Website

Stars: ✭ 75 (-17.58%)

Mutual labels: big-data, spark

Gaffer

A large-scale entity and relation database supporting aggregation of properties

Stars: ✭ 1,642 (+1704.4%)

Mutual labels: big-data, spark

Drill

Apache Drill is a distributed MPP query layer for self-describing data

Stars: ✭ 1,619 (+1679.12%)

Mutual labels: big-data, hive

Eel Sdk

Big Data Toolkit for the JVM

Stars: ✭ 140 (+53.85%)

Mutual labels: big-data, hive

Labs

Research on distributed system

Stars: ✭ 73 (-19.78%)

Mutual labels: big-data, spark

Geopyspark

GeoTrellis for PySpark

Stars: ✭ 167 (+83.52%)

Mutual labels: big-data, spark

Presto

The official home of the Presto distributed SQL query engine for big data

Stars: ✭ 12,957 (+14138.46%)

Mutual labels: big-data, hive

Gimel

Big Data Processing Framework - Unified Data API or SQL on Any Storage

Stars: ✭ 216 (+137.36%)

Mutual labels: big-data, spark

Geni

A Clojure dataframe library that runs on Spark

Stars: ✭ 152 (+67.03%)

Mutual labels: big-data, spark

Hadoop cookbook

Cookbook to install Hadoop 2.0+ using Chef

Stars: ✭ 82 (-9.89%)

Mutual labels: spark, hive

Hops Examples

Examples for Deep Learning/Feature Store/Spark/Flink/Hive/Kafka jobs and Jupyter notebooks on Hops

Stars: ✭ 84 (-7.69%)

Mutual labels: spark, hive

Big Data Engineering Coursera Yandex

Big Data for Data Engineers Coursera Specialization from Yandex

Stars: ✭ 71 (-21.98%)

Mutual labels: big-data, spark

Spark.jl

Julia binding for Apache Spark

Stars: ✭ 153 (+68.13%)

Mutual labels: big-data, spark

Data Accelerator

Data Accelerator for Apache Spark simplifies onboarding to Streaming of Big Data. It offers a rich, easy to use experience to help with creation, editing and management of Spark jobs on Azure HDInsights or Databricks while enabling the full power of the Spark engine.

Stars: ✭ 247 (+171.43%)

Mutual labels: big-data, spark

beekeeper

Service for automatically managing and cleaning up unreferenced data

Stars: ✭ 43 (-52.75%)

Mutual labels: big-data, hive

1-60 of 818 similar projects
