988 open source projects that are alternatives to or similar to Alluxio

Jumbune, an open source BigData APM & Data Quality Management Platform for Data Clouds. Enterprise feature offering is available at http://jumbune.com. More details of open source offering are at,

Stars: ✭ 64 (-98.81%)

Mutual labels: data-analysis, hadoop

Data Science Ipython Notebooks

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

Stars: ✭ 22,048 (+309.89%)

Mutual labels: spark, hadoop

Cube.js

📊 Cube — Open-Source Analytics API for Building Data Apps

Stars: ✭ 11,983 (+122.77%)

Mutual labels: spark, presto

Setl

A simple Spark-powered ETL framework that just works 🍺

Stars: ✭ 79 (-98.53%)

Mutual labels: spark, data-analysis

visions

Type System for Data Analysis in Python

Stars: ✭ 136 (-97.47%)

Mutual labels: spark, data-analysis

bigkube

Minikube for big data with Scala and Spark

Stars: ✭ 16 (-99.7%)

Mutual labels: spark, presto

God Of Bigdata

Focused on big data study and interview preparation; the road to big data mastery starts here. Flink/Spark/Hadoop/Hbase/Hive...

Stars: ✭ 6,008 (+11.69%)

Mutual labels: spark, hadoop

Bigdataguide

Big data learning: learn big data from scratch, including study videos for each stage of learning and interview materials

Stars: ✭ 817 (-84.81%)

Mutual labels: spark, hadoop

basin

Basin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from your browser

Stars: ✭ 25 (-99.54%)

Mutual labels: spark, hadoop

Interview Questions Collection

Interview questions organized by knowledge area, including C++, Java, Hadoop, machine learning, and more

Stars: ✭ 21 (-99.61%)

Mutual labels: spark, hadoop

Big Whale

Scheduling of offline tasks and monitoring of real-time tasks for Spark, Flink, and similar engines

Stars: ✭ 163 (-96.97%)

Mutual labels: spark, hadoop

aut

The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.

Stars: ✭ 111 (-97.94%)

Mutual labels: spark, hadoop

Spotify-Song-Recommendation-ML

UC Berkeley team's submission for RecSys Challenge 2018

Stars: ✭ 70 (-98.7%)

Mutual labels: spark, data-analysis

Trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

Stars: ✭ 4,581 (-14.84%)

Mutual labels: hadoop, presto

Courses

Quiz & Assignment of Coursera

Stars: ✭ 454 (-91.56%)

Mutual labels: data-analysis

Docker practice

Learn and understand Docker technologies, with real DevOps practice!

Stars: ✭ 19,768 (+267.5%)

Mutual labels: spark

Pandastable

Table analysis in Tkinter using pandas DataFrames.

Stars: ✭ 376 (-93.01%)

Mutual labels: data-analysis

Cdap

An open source framework for building data analytic applications.

Stars: ✭ 509 (-90.54%)

Mutual labels: spark

Presto Ethereum

Presto Ethereum Connector -- SQL on Ethereum

Stars: ✭ 450 (-91.63%)

Mutual labels: presto

Bap

Bayesian Analysis with Python (Second Edition)

Stars: ✭ 379 (-92.95%)

Mutual labels: data-analysis

Prettypandas

A Pandas Styler class for making beautiful tables

Stars: ✭ 376 (-93.01%)

Mutual labels: data-analysis

Tensorflowonspark

TensorFlowOnSpark brings TensorFlow programs to Apache Spark clusters.

Stars: ✭ 3,748 (-30.32%)

Mutual labels: spark

Cookbook 2nd Code

Code of the IPython Cookbook, Second Edition, by Cyrille Rossant, Packt Publishing 2018 [read-only repository]

Stars: ✭ 541 (-89.94%)

Mutual labels: data-analysis

Magellan

Geo Spatial Data Analytics on Spark

Stars: ✭ 507 (-90.57%)

Mutual labels: spark

Bigdataie

A collection of big data blog posts, written-test questions, tutorials, projects, and interview experiences

Stars: ✭ 445 (-91.73%)

Mutual labels: spark

Hive

Apache Hive

Stars: ✭ 4,031 (-25.06%)

Mutual labels: hadoop

Data Science

Collection of useful data science topics along with code and articles

Stars: ✭ 315 (-94.14%)

Mutual labels: data-analysis

Spark Structured Streaming Book

The Internals of Spark Structured Streaming

Stars: ✭ 371 (-93.1%)

Mutual labels: spark

Weibospider

⚡ A distributed crawler for Weibo, built with celery and requests.

Stars: ✭ 4,670 (-13.18%)

Mutual labels: data-analysis

High Performance Spark Examples

Examples for High Performance Spark

Stars: ✭ 436 (-91.89%)

Mutual labels: spark

Sparkmeasure

This is the development repository of SparkMeasure, a tool for performance troubleshooting of Apache Spark workloads. It simplifies the collection and analysis of Spark task metrics data.

Stars: ✭ 368 (-93.16%)

Mutual labels: spark

Sidekick

High Performance HTTP Sidecar Load Balancer

Stars: ✭ 366 (-93.2%)

Mutual labels: spark

Pydata Notebook

Chinese translation notes for Python for Data Analysis, Second Edition (2017)

Stars: ✭ 4,300 (-20.06%)

Mutual labels: data-analysis

Dataexplorer

Automate Data Exploration and Treatment

Stars: ✭ 362 (-93.27%)

Mutual labels: data-analysis

Kyuubi

Kyuubi is a unified multi-tenant JDBC interface for large-scale data processing and analytics, built on top of Apache Spark

Stars: ✭ 363 (-93.25%)

Mutual labels: spark

Qs ledger

Quantified Self Personal Data Aggregator and Data Analysis

Stars: ✭ 559 (-89.61%)

Mutual labels: data-analysis

Justenoughscalaforspark

A tutorial on the most important features and idioms of Scala that you need to use Spark's Scala APIs.

Stars: ✭ 538 (-90%)

Mutual labels: spark

Antd Umi Sys

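Enterprise BI system and data visualization platform. Main technologies: react, antd, umi, dva, es6, less, and others. Let's encourage each other and learn together; if you like it, please star ⭐.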

Stars: ✭ 503 (-90.65%)

Mutual labels: data-analysis

Jupyter pivottablejs

Drag’n’drop Pivot Tables and Charts for Jupyter/IPython Notebook, care of PivotTable.js

Stars: ✭ 428 (-92.04%)

Mutual labels: data-analysis

Articles

A repository for the source code, notebooks, data, files, and other assets used in the data science and machine learning articles on LearnDataSci

Stars: ✭ 350 (-93.49%)

Mutual labels: data-analysis

Metorikku

A simplified, lightweight ETL Framework based on Apache Spark

Stars: ✭ 361 (-93.29%)

Mutual labels: spark

Pandas Summary

An extension to pandas dataframes describe function.

Stars: ✭ 361 (-93.29%)

Mutual labels: data-analysis

Sparkler

Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.

Stars: ✭ 362 (-93.27%)

Mutual labels: spark

Awesome R

A curated list of awesome R packages, frameworks and software.

Stars: ✭ 4,858 (-9.69%)

Mutual labels: data-analysis

Iclr2020 Openreviewdata

Script that crawls meta data from ICLR OpenReview webpage. Tutorials on installing and using Selenium and ChromeDriver on Ubuntu.

Stars: ✭ 426 (-92.08%)

Mutual labels: data-analysis

Quantitative Notebooks

Educational notebooks on quantitative finance, algorithmic trading, financial modelling and investment strategy

Stars: ✭ 356 (-93.38%)

Mutual labels: data-analysis

Sparkstreaming

Real-time log analysis and statistics with Spark Streaming + Flume + Kafka + HBase + Hadoop + Zookeeper; data visualization with SpringBoot + Echarts

Stars: ✭ 349 (-93.51%)

Mutual labels: spark

Dji Firmware Tools

Tools for handling firmwares of DJI products, with focus on quadcopters.

Stars: ✭ 424 (-92.12%)

Mutual labels: spark

Oap

Optimized Analytics Package for Spark* Platform

Stars: ✭ 343 (-93.62%)

Mutual labels: spark

Sparklens

Qubole Sparklens tool for performance tuning Apache Spark

Stars: ✭ 345 (-93.59%)

Mutual labels: spark

Lopq

Training of Locally Optimized Product Quantization (LOPQ) models for approximate nearest neighbor search of high dimensional data in Python and Spark.

Stars: ✭ 530 (-90.15%)

Mutual labels: spark

Bigdata

💎🔥 Big data study notes

Stars: ✭ 488 (-90.93%)

Mutual labels: hadoop

Moonbox

Moonbox is a DVtaaS (Data Virtualization as a Service) Platform

Stars: ✭ 424 (-92.12%)

Mutual labels: spark

Scalnet

A Scala wrapper for Deeplearning4j, inspired by Keras. Scala + DL + Spark + GPUs