795 open source projects that are alternatives to or similar to big_data

Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations

Stars: ✭ 39 (+14.71%)

Mutual labels: big-data, hadoop, pyspark, spark-sql

Bigdata Notes

Getting-started guide to big data ⭐

Stars: ✭ 10,991 (+32226.47%)

Mutual labels: big-data, hadoop, bigdata, mapreduce

Sparkrdma

RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark

Stars: ✭ 215 (+532.35%)

Mutual labels: big-data, hadoop, bigdata

SparkProgrammingInScala

Apache Spark Course Material

Stars: ✭ 57 (+67.65%)

Mutual labels: big-data, bigdata, spark-sql

Spark With Python

Fundamentals of Spark with Python (using PySpark), code examples

Stars: ✭ 150 (+341.18%)

Mutual labels: big-data, hadoop, pyspark

pyspark-algorithms

PySpark Algorithms Book: https://www.amazon.com/dp/B07X4B2218/ref=sr_1_2

Stars: ✭ 72 (+111.76%)

Mutual labels: big-data, pyspark, mapreduce

bigdata-doc

Big data study notes, a learning roadmap, and a collection of technical case studies.

Stars: ✭ 37 (+8.82%)

Mutual labels: hadoop, bigdata, mapreduce

bigdatatutorial

Stars: ✭ 34 (+0%)

Mutual labels: hadoop, bigdata, spark-sql

Hadoop For Geoevent

ArcGIS GeoEvent Server sample Hadoop connector for storing GeoEvents in HDFS.

Stars: ✭ 5 (-85.29%)

Mutual labels: big-data, hadoop, bigdata

leaflet heatmap

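A simple visualization of Huzhou call data. Assuming the data volume is too large to draw the heatmap directly in the browser, the heatmap rendering step is moved offline for computation and analysis. After the data is computed in parallel with Apache Spark, Apache Spark is also used to render the heatmap, and leafletjs then loads the OpenStreetMap layer and the heatmap layer for good interactivity. The rendering is currently implemented with Apache Spark; either Apache Spark is not well suited to this kind of computation or I did not design the algorithm well, because the parallel computation is slower than single-machine computation. The Apache Spark heatmap rendering and computation code is at https://github.com/yuanzhaokang/ParallelizeHeatmap.git .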

Stars: ✭ 13 (-61.76%)

Mutual labels: big-data, hadoop, bigdata

Bigdata Interview

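🎯 🌟 [Big data interview questions] A collection of big data interview questions gathered from the web, together with my own summarized answers. Currently covers interview knowledge for the Hadoop/Hive/Spark/Flink/Hbase/Kafka/Zookeeper frameworks.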

Stars: ✭ 857 (+2420.59%)

Mutual labels: hadoop, bigdata, mapreduce

qs-hadoop

Learning the big data ecosystem

Stars: ✭ 18 (-47.06%)

Mutual labels: hadoop, bigdata, mapreduce

aut

The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.

Stars: ✭ 111 (+226.47%)

Mutual labels: big-data, hadoop, pyspark

Asakusafw

Asakusa Framework

Stars: ✭ 114 (+235.29%)

Mutual labels: big-data, hadoop, mapreduce

Big Data Engineering Coursera Yandex

Big Data for Data Engineers Coursera Specialization from Yandex

Stars: ✭ 71 (+108.82%)

Mutual labels: big-data, bigdata, mapreduce

Data Science Ipython Notebooks

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

Stars: ✭ 22,048 (+64747.06%)

Mutual labels: big-data, hadoop, mapreduce

Spark Py Notebooks

Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks

Stars: ✭ 1,338 (+3835.29%)

Mutual labels: big-data, bigdata, pyspark

Movies-Analytics-in-Spark-and-Scala

Data cleaning, pre-processing, and Analytics on a million movies using Spark and Scala.

Stars: ✭ 47 (+38.24%)

Mutual labels: big-data, hadoop, spark-sql

Drill

Apache Drill is a distributed MPP query layer for self describing data

Stars: ✭ 1,619 (+4661.76%)

Mutual labels: big-data, hadoop

Hdfs Shell

HDFS Shell is an HDFS manipulation tool to work with functions integrated in Hadoop DFS

Stars: ✭ 117 (+244.12%)

Mutual labels: big-data, hadoop

Gaffer

A large-scale entity and relation database supporting aggregation of properties

Stars: ✭ 1,642 (+4729.41%)

Mutual labels: big-data, hadoop

Big Data Study

🐳 big data study

Stars: ✭ 141 (+314.71%)

Mutual labels: big-data, bigdata

Calcite Avatica

Mirror of Apache Calcite - Avatica

Stars: ✭ 130 (+282.35%)

Mutual labels: big-data, hadoop

Presto

The official home of the Presto distributed SQL query engine for big data

Stars: ✭ 12,957 (+38008.82%)

Mutual labels: big-data, hadoop

Calcite

Apache Calcite

Stars: ✭ 2,816 (+8182.35%)

Mutual labels: big-data, hadoop

optimus

🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark

Stars: ✭ 1,351 (+3873.53%)

Mutual labels: bigdata, pyspark

twitter-archive-reader

Full featured TypeScript Twitter archive reader and browser

Stars: ✭ 43 (+26.47%)

Mutual labels: big-data, bigdata

v6.dooring.public

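A large-screen data visualization solution that provides a visual editing engine, helping individuals and enterprises easily customize their own large-screen visualization applications.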

Stars: ✭ 323 (+850%)

Mutual labels: big-data, bigdata

Griffon Vm

Griffon Data Science Virtual Machine

Stars: ✭ 128 (+276.47%)

Mutual labels: big-data, hadoop

Genie

Distributed Big Data Orchestration Service

Stars: ✭ 1,544 (+4441.18%)

Mutual labels: big-data, bigdata

Eel Sdk

Big Data Toolkit for the JVM

Stars: ✭ 140 (+311.76%)

Mutual labels: big-data, hadoop

clusterdock

clusterdock is a framework for creating Docker-based container clusters

Stars: ✭ 26 (-23.53%)

Mutual labels: big-data, hadoop

Bigdata Playground

A complete example of a big data application using: Kubernetes (kops/aws), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache Hbase, Apache Parquet, Apache Avro, Apache Storm, Twitter Api, MongoDB, NodeJS, Angular, GraphQL

Stars: ✭ 177 (+420.59%)

Mutual labels: big-data, hadoop

Spark R Notebooks

R on Apache Spark (SparkR) tutorials for Big Data analysis and Machine Learning as IPython / Jupyter notebooks

Stars: ✭ 109 (+220.59%)

Mutual labels: big-data, bigdata

Tennis Crystal Ball

Ultimate Tennis Statistics and Tennis Crystal Ball - Tennis Big Data Analysis and Prediction

Stars: ✭ 107 (+214.71%)

Mutual labels: big-data, bigdata

Clustering4Ever

C4E, a JVM friendly library written in Scala for both local and distributed (Spark) Clustering.

Stars: ✭ 126 (+270.59%)

Mutual labels: big-data, bigdata

check-engine

Data validation library for PySpark 3.0.0

Stars: ✭ 29 (-14.71%)

Mutual labels: big-data, pyspark

learning-hadoop-and-spark

Companion to Learning Hadoop and Learning Spark courses on LinkedIn Learning

Stars: ✭ 146 (+329.41%)

Mutual labels: hadoop, mapreduce

Aws Etl Orchestrator

A serverless architecture for orchestrating ETL jobs in arbitrarily-complex workflows using AWS Step Functions and AWS Lambda.

Stars: ✭ 245 (+620.59%)

Mutual labels: big-data, bigdata

gomrjob

gomrjob - a Go Framework for Hadoop Map Reduce Jobs

Stars: ✭ 39 (+14.71%)

Mutual labels: hadoop, mapreduce

Gimel

Big Data Processing Framework - Unified Data API or SQL on Any Storage

Stars: ✭ 216 (+535.29%)

Mutual labels: big-data, pyspark

awesome-coder-resources

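A refueling station on your programming journey! [Continuously updated... stars welcome, come back often...] [Contents: programming/learning/reading resources, open source projects, interview questions, websites, books, blogs, tutorials, and more]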

Stars: ✭ 54 (+58.82%)

Mutual labels: big-data, bigdata

spark-twitter-sentiment-analysis

Sentiment Analysis of a Twitter Topic with Spark Structured Streaming

Stars: ✭ 55 (+61.76%)

Mutual labels: pyspark, spark-sql

flokkr

Documentation placeholder and utilities for all the other containers.

Stars: ✭ 30 (-11.76%)

Mutual labels: hadoop, bigdata

dt-sql-parser

SQL Parsers for BigData, built with antlr4.

Stars: ✭ 135 (+297.06%)

Mutual labels: bigdata, spark-sql

Springboard-Data-Science-Immersive

No description or website provided.

Stars: ✭ 52 (+52.94%)

Mutual labels: hadoop, pyspark

GooglePlay-Web-Crawler

Mapreduce project by Hadoop, Nutch, AWS EMR, Pig, Tez, Hive

Stars: ✭ 18 (-47.06%)

Mutual labels: hadoop, mapreduce

HadoopDedup

🍉 Large-scale deduplication of massive data, based on Hadoop and HBase

Stars: ✭ 27 (-20.59%)

Mutual labels: big-data, mapreduce

iis

Information Inference Service of the OpenAIRE system

Stars: ✭ 16 (-52.94%)

Mutual labels: big-data, hadoop

web-click-flow

Offline log analysis of website clickstreams

Stars: ✭ 14 (-58.82%)

Mutual labels: hadoop, mapreduce

dockerfiles

Multi docker container images for main Big Data Tools. (Hadoop, Spark, Kafka, HBase, Cassandra, Zookeeper, Zeppelin, Drill, Flink, Hive, Hue, Mesos, ... )

Stars: ✭ 29 (-14.71%)

Mutual labels: hadoop, bigdata

Data-pipeline-project

Data pipeline project

Stars: ✭ 18 (-47.06%)

Mutual labels: hadoop, mapreduce

anovos

Anovos - An Open Source Library for Scalable feature engineering Using Apache-Spark

Stars: ✭ 77 (+126.47%)

Mutual labels: bigdata, pyspark

the-apache-ignite-book

All code samples, scripts and more in-depth examples for The Apache Ignite Book. Include Apache Ignite 2.6 or above