818 Open source projects that are alternatives of or similar to spark-acid

Delta
An open-source storage layer that brings scalable, ACID transactions to Apache Spark™ and big data workloads.
Stars: ✭ 3,903 (+4189.01%)
Mutual labels:  big-data, spark, acid
Bigdata Notes
A beginner's guide to big data ⭐
Stars: ✭ 10,991 (+11978.02%)
Mutual labels:  big-data, spark, hive
awesome-AI-kubernetes
❄️ 🐳 Awesome tools and libs for AI, Deep Learning, Machine Learning, Computer Vision, Data Science, Data Analytics and Cognitive Computing that are baked in the oven to be Native on Kubernetes and Docker with Python, R, Scala, Java, C#, Go, Julia, C++ etc
Stars: ✭ 95 (+4.4%)
Mutual labels:  big-data, spark
Docker Spark Cluster
A Spark cluster setup running on Docker containers
Stars: ✭ 57 (-37.36%)
Mutual labels:  big-data, spark
Rsparkling
RSparkling: Use H2O Sparkling Water from R (Spark + R + Machine Learning)
Stars: ✭ 65 (-28.57%)
Mutual labels:  big-data, spark
Hadoop Docker
A Docker-based Hadoop development and test environment, including Hadoop, Hive, HBase, and Spark
Stars: ✭ 238 (+161.54%)
Mutual labels:  spark, hive
Succinct
Enabling queries on compressed data.
Stars: ✭ 257 (+182.42%)
Mutual labels:  big-data, spark
Zeppelin
Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.
Stars: ✭ 5,513 (+5958.24%)
Mutual labels:  big-data, spark
H2o 3
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Stars: ✭ 5,656 (+6115.38%)
Mutual labels:  big-data, spark
Maha
A framework for rapid reporting API development; with out of the box support for high cardinality dimension lookups with druid.
Stars: ✭ 101 (+10.99%)
Mutual labels:  big-data, hive
Sparkling Graph
SparklingGraph provides an easy-to-use set of features for processing large-scale graphs using Spark and GraphX.
Stars: ✭ 139 (+52.75%)
Mutual labels:  big-data, spark
Spark With Python
Fundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (+64.84%)
Mutual labels:  big-data, spark
Linkis
Linkis helps easily connect to various back-end computation/storage engines(Spark, Python, TiDB...), exposes various interfaces(REST, JDBC, Java ...), with multi-tenancy, high performance, and resource control.
Stars: ✭ 2,323 (+2452.75%)
Mutual labels:  spark, hive
Bigdata docker
Big Data Ecosystem Docker
Stars: ✭ 161 (+76.92%)
Mutual labels:  spark, hive
Mmlspark
Simple and Distributed Machine Learning
Stars: ✭ 2,899 (+3085.71%)
Mutual labels:  big-data, spark
leaflet heatmap
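A simple visualization of Huzhou call data. Assuming the data volume is too large to draw the heatmap directly in the browser, the heatmap rendering step is moved to offline computation. After the data is computed in parallel with Apache Spark, the heatmap is also rendered with Apache Spark, and leafletjs then loads the OpenStreetMap layer and the heatmap layer for good interactivity. With the current Apache Spark rendering, parallel computation is slower than single-machine computation, possibly because Apache Spark is not well suited to this kind of computation or because I did not design the algorithm well. The Apache Spark heatmap rendering and computation code is at https://github.com/yuanzhaokang/ParallelizeHeatmap.git .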
Stars: ✭ 13 (-85.71%)
Mutual labels:  big-data, spark
Listenbrainz Server
Server for the ListenBrainz project
Stars: ✭ 420 (+361.54%)
Mutual labels:  big-data, spark
Data Science Ipython Notebooks
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Stars: ✭ 22,048 (+24128.57%)
Mutual labels:  big-data, spark
Spark
Apache Spark - A unified analytics engine for large-scale data processing
Stars: ✭ 31,618 (+34645.05%)
Mutual labels:  big-data, spark
Spark Doc Zh
Chinese translation of the official Apache Spark documentation
Stars: ✭ 1,126 (+1137.36%)
Mutual labels:  big-data, spark
swordfish
Open-source distributed workflow scheduling tool that also supports streaming tasks.
Stars: ✭ 35 (-61.54%)
Mutual labels:  spark, hive
Logisland
Scalable stream processing platform for advanced realtime analytics on top of Kafka and Spark. LogIsland also supports MQTT and Kafka Streams (Flink being in the roadmap). The platform does complex event processing and is suitable for time series analysis. A large set of valuable ready to use processors, data sources and sinks are available.
Stars: ✭ 97 (+6.59%)
Mutual labels:  big-data, spark
Spark On Lambda
Apache Spark on AWS Lambda
Stars: ✭ 137 (+50.55%)
Mutual labels:  big-data, spark
Bigdataclass
Two-day workshop that covers how to use R to interact with databases and Spark
Stars: ✭ 110 (+20.88%)
Mutual labels:  big-data, spark
Helicalinsight
Helical Insight software is the world’s first Open Source Business Intelligence framework, which helps you make sense of your data and make well-informed decisions.
Stars: ✭ 214 (+135.16%)
Mutual labels:  big-data, hive
Sparkrdma
RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark
Stars: ✭ 215 (+136.26%)
Mutual labels:  big-data, spark
Hyperspace
An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads.
Stars: ✭ 246 (+170.33%)
Mutual labels:  big-data, spark
Spark Authorizer
A Spark SQL extension which provides SQL Standard Authorization for Apache Spark
Stars: ✭ 141 (+54.95%)
Mutual labels:  spark, hive
Quicksql
A Flexible, Fast, Federated(3F) SQL Analysis Middleware for Multiple Data Sources
Stars: ✭ 1,821 (+1901.1%)
Mutual labels:  spark, hive
Xsql
Unified SQL Analytics Engine Based on SparkSQL
Stars: ✭ 176 (+93.41%)
Mutual labels:  spark, hive
Hadoopcryptoledger
Hadoop Crypto Ledger - Analyzing CryptoLedgers, such as Bitcoin Blockchain, on Big Data platforms, such as Hadoop/Spark/Flink/Hive
Stars: ✭ 126 (+38.46%)
Mutual labels:  spark, hive
bigdata-fun
A complete (distributed) BigData stack, running in containers
Stars: ✭ 14 (-84.62%)
Mutual labels:  big-data, spark
aut
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Stars: ✭ 111 (+21.98%)
Mutual labels:  big-data, spark
Trino
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Stars: ✭ 4,581 (+4934.07%)
Mutual labels:  big-data, hive
Cube.js
📊 Cube — Open-Source Analytics API for Building Data Apps
Stars: ✭ 11,983 (+13068.13%)
Mutual labels:  spark, hive
Hive
Apache Hive
Stars: ✭ 4,031 (+4329.67%)
Mutual labels:  big-data, hive
Metorikku
A simplified, lightweight ETL Framework based on Apache Spark
Stars: ✭ 361 (+296.7%)
Mutual labels:  big-data, spark
Magellan
Geo Spatial Data Analytics on Spark
Stars: ✭ 507 (+457.14%)
Mutual labels:  big-data, spark
Sparkler
Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
Stars: ✭ 362 (+297.8%)
Mutual labels:  big-data, spark
Sparkjni
A heterogeneous Apache Spark framework.
Stars: ✭ 11 (-87.91%)
Mutual labels:  big-data, spark
Spark Movie Lens
An on-line movie recommender using Spark, Python Flask, and the MovieLens dataset
Stars: ✭ 745 (+718.68%)
Mutual labels:  big-data, spark
Repository
A personal learning knowledge base covering data warehouse modeling, real-time computing, big data, Java, algorithms, and more.
Stars: ✭ 92 (+1.1%)
Mutual labels:  spark, hive
Spark Py Notebooks
Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 1,338 (+1370.33%)
Mutual labels:  big-data, spark
Setl
A simple Spark-powered ETL framework that just works 🍺
Stars: ✭ 79 (-13.19%)
Mutual labels:  big-data, spark
Koalas
Koalas: pandas API on Apache Spark
Stars: ✭ 3,044 (+3245.05%)
Mutual labels:  big-data, spark
Spark Website
Apache Spark Website
Stars: ✭ 75 (-17.58%)
Mutual labels:  big-data, spark
Gaffer
A large-scale entity and relation database supporting aggregation of properties
Stars: ✭ 1,642 (+1704.4%)
Mutual labels:  big-data, spark
Drill
Apache Drill is a distributed MPP query layer for self describing data
Stars: ✭ 1,619 (+1679.12%)
Mutual labels:  big-data, hive
Eel Sdk
Big Data Toolkit for the JVM
Stars: ✭ 140 (+53.85%)
Mutual labels:  big-data, hive
Labs
Research on distributed system
Stars: ✭ 73 (-19.78%)
Mutual labels:  big-data, spark
Geopyspark
GeoTrellis for PySpark
Stars: ✭ 167 (+83.52%)
Mutual labels:  big-data, spark
Presto
The official home of the Presto distributed SQL query engine for big data
Stars: ✭ 12,957 (+14138.46%)
Mutual labels:  big-data, hive
Gimel
Big Data Processing Framework - Unified Data API or SQL on Any Storage
Stars: ✭ 216 (+137.36%)
Mutual labels:  big-data, spark
Geni
A Clojure dataframe library that runs on Spark
Stars: ✭ 152 (+67.03%)
Mutual labels:  big-data, spark
Hadoop cookbook
Cookbook to install Hadoop 2.0+ using Chef
Stars: ✭ 82 (-9.89%)
Mutual labels:  spark, hive
Hops Examples
Examples for Deep Learning/Feature Store/Spark/Flink/Hive/Kafka jobs and Jupyter notebooks on Hops
Stars: ✭ 84 (-7.69%)
Mutual labels:  spark, hive
Big Data Engineering Coursera Yandex
Big Data for Data Engineers Coursera Specialization from Yandex
Stars: ✭ 71 (-21.98%)
Mutual labels:  big-data, spark
Spark.jl
Julia binding for Apache Spark
Stars: ✭ 153 (+68.13%)
Mutual labels:  big-data, spark
Data Accelerator
Data Accelerator for Apache Spark simplifies onboarding to Streaming of Big Data. It offers a rich, easy to use experience to help with creation, editing and management of Spark jobs on Azure HDInsights or Databricks while enabling the full power of the Spark engine.
Stars: ✭ 247 (+171.43%)
Mutual labels:  big-data, spark
beekeeper
Service for automatically managing and cleaning up unreferenced data
Stars: ✭ 43 (-52.75%)
Mutual labels:  big-data, hive