795 open source projects that are alternatives to or similar to big_data

MLBD

Materials for "Machine Learning on Big Data" course

Stars: ✭ 20 (-41.18%)

Mutual labels: big-data, mapreduce

Optimus

🚚 Agile Data Preparation Workflows made easy with dask, cudf, dask_cudf and pyspark

Stars: ✭ 986 (+2800%)

Mutual labels: bigdata, pyspark

Cascading

Cascading is a feature rich API for defining and executing complex and fault tolerant data processing flows locally or on a cluster. See https://github.com/Cascading/cascading for the release repository.

Stars: ✭ 318 (+835.29%)

Mutual labels: hadoop, mapreduce

Spline

Data Lineage Tracking And Visualization Solution

Stars: ✭ 306 (+800%)

Mutual labels: hadoop, bigdata

Mobius

C# and F# language binding and extensions to Apache Spark

Stars: ✭ 929 (+2632.35%)

Mutual labels: bigdata, mapreduce

Data Algorithms Book

MapReduce, Spark, Java, and Scala for Data Algorithms Book

Stars: ✭ 949 (+2691.18%)

Mutual labels: hadoop, mapreduce

Aws Etl Orchestrator

A serverless architecture for orchestrating ETL jobs in arbitrarily-complex workflows using AWS Step Functions and AWS Lambda.

Stars: ✭ 245 (+620.59%)

Mutual labels: big-data, bigdata

Repository

Personal learning knowledge base covering data warehouse modeling, real-time computing, big data, Java, algorithms, and more.

Stars: ✭ 92 (+170.59%)

Mutual labels: hadoop, mapreduce

Bigdataguide

Big data learning: learn big data from scratch, with study videos for each learning stage and interview materials

Stars: ✭ 817 (+2302.94%)

Mutual labels: hadoop, bigdata

meetups-archivos

Slides, code, and videos from the meetups, data science days, video calls, and workshops. Data Science Research is a non-profit organization that seeks to spread and decentralize knowledge of Data Science and Artificial Intelligence in Peru, giving opportunities to new talent through MeetUps, Workshops, and Semilleros …

Stars: ✭ 60 (+76.47%)

Mutual labels: big-data, bigdata

Hadoopcryptoledger

Hadoop Crypto Ledger - Analyzing CryptoLedgers, such as Bitcoin Blockchain, on Big Data platforms, such as Hadoop/Spark/Flink/Hive

Stars: ✭ 126 (+270.59%)

Mutual labels: hadoop, bigdata

Shifu

An end-to-end machine learning and data mining framework on Hadoop

Stars: ✭ 207 (+508.82%)

Mutual labels: hadoop, bigdata

Bigslice

A serverless cluster computing system for the Go programming language

Stars: ✭ 469 (+1279.41%)

Mutual labels: bigdata, mapreduce

Eel Sdk

Big Data Toolkit for the JVM

Stars: ✭ 140 (+311.76%)

Mutual labels: big-data, hadoop

Calcite

Apache Calcite

Stars: ✭ 2,816 (+8182.35%)

Mutual labels: big-data, hadoop

big-data-lite

Samples for the Oracle Big Data Lite VM

Stars: ✭ 41 (+20.59%)

Mutual labels: big-data, hadoop

learning-spark

Tidied-up Spark and Hadoop tutorials.

Stars: ✭ 28 (-17.65%)

Mutual labels: hadoop, bigdata

NiFi-Rule-engine-processor

Drools processor for Apache NiFi

Stars: ✭ 34 (+0%)

Mutual labels: big-data, bigdata

Mmlspark

Simple and Distributed Machine Learning

Stars: ✭ 2,899 (+8426.47%)

Mutual labels: big-data, pyspark

Tez

Apache Tez

Stars: ✭ 313 (+820.59%)

Mutual labels: big-data, hadoop

Uproot3

ROOT I/O in pure Python and NumPy.

Stars: ✭ 312 (+817.65%)

Mutual labels: big-data, bigdata

Ignite

Apache Ignite

Stars: ✭ 4,027 (+11744.12%)

Mutual labels: big-data, hadoop

hadoop-data-ingestion-tool

OLAP and ETL of Big Data

Stars: ✭ 17 (-50%)

Mutual labels: big-data, hadoop

Cortx

CORTX Community Object Storage is 100% open source object storage uniquely optimized for mass capacity storage devices.

Stars: ✭ 426 (+1152.94%)

Mutual labels: big-data, bigdata

Spark Movie Lens

An on-line movie recommender using Spark, Python Flask, and the MovieLens dataset

Stars: ✭ 745 (+2091.18%)

Mutual labels: big-data, bigdata

Kafka Connect Hdfs

Kafka Connect HDFS connector

Stars: ✭ 400 (+1076.47%)

Mutual labels: big-data, hadoop

Moosefs

MooseFS – Open Source, Petabyte, Fault-Tolerant, Highly Performing, Scalable Network Distributed File System (Software-Defined Storage)

Stars: ✭ 1,025 (+2914.71%)

Mutual labels: big-data, hadoop

Pyspark Setup Demo

Demo of PySpark and Jupyter Notebook with the Jupyter Docker Stacks

Stars: ✭ 24 (-29.41%)

Mutual labels: big-data, pyspark

spark-twitter-sentiment-analysis

Sentiment Analysis of a Twitter Topic with Spark Structured Streaming

Stars: ✭ 55 (+61.76%)

Mutual labels: pyspark, spark-sql

Spark R Notebooks

R on Apache Spark (SparkR) tutorials for Big Data analysis and Machine Learning as IPython / Jupyter notebooks

Stars: ✭ 109 (+220.59%)

Mutual labels: big-data, bigdata

Tennis Crystal Ball

Ultimate Tennis Statistics and Tennis Crystal Ball - Tennis Big Data Analysis and Prediction

Stars: ✭ 107 (+214.71%)

Mutual labels: big-data, bigdata

awesome-coder-resources

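A refueling station on your programming journey! [Continuously updated; stars welcome, come back often.] [Contents: programming, learning, and reading resources, open source projects, interview questions, websites, books, blogs, tutorials, and more.]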

Stars: ✭ 54 (+58.82%)

Mutual labels: big-data, bigdata

learning-hadoop-and-spark

Companion to the Learning Hadoop and Learning Spark courses on LinkedIn Learning

Stars: ✭ 146 (+329.41%)

Mutual labels: hadoop, mapreduce

Griffon Vm

Griffon Data Science Virtual Machine

Stars: ✭ 128 (+276.47%)

Mutual labels: big-data, hadoop

Hdfs Shell

HDFS Shell is an HDFS manipulation tool for working with the functions integrated in Hadoop DFS

Stars: ✭ 117 (+244.12%)

Mutual labels: big-data, hadoop

GooglePlay-Web-Crawler

MapReduce project using Hadoop, Nutch, AWS EMR, Pig, Tez, Hive

Stars: ✭ 18 (-47.06%)

Mutual labels: hadoop, mapreduce

dt-sql-parser

SQL Parsers for BigData, built with antlr4.

Stars: ✭ 135 (+297.06%)

Mutual labels: bigdata, spark-sql

Presto

The official home of the Presto distributed SQL query engine for big data

Stars: ✭ 12,957 (+38008.82%)

Mutual labels: big-data, hadoop

lectures-hse-spark

Scalable machine learning and big data analysis with Apache Spark

Stars: ✭ 20 (-41.18%)

Mutual labels: bigdata, mapreduce

iis

Information Inference Service of the OpenAIRE system

Stars: ✭ 16 (-52.94%)

Mutual labels: big-data, hadoop

SynapseML

Simple and Distributed Machine Learning

Stars: ✭ 3,355 (+9767.65%)

Mutual labels: big-data, pyspark

databricks-notebooks

Collection of Databricks and Jupyter Notebooks

Stars: ✭ 19 (-44.12%)

Mutual labels: pyspark, spark-sql

Big Data Study

🐳 big data study

Stars: ✭ 141 (+314.71%)

Mutual labels: big-data, bigdata

Gimel

Big Data Processing Framework - Unified Data API or SQL on Any Storage

Stars: ✭ 216 (+535.29%)

Mutual labels: big-data, pyspark

rastercube

rastercube is a python library for big data analysis of georeferenced time series data (e.g. MODIS NDVI)

Stars: ✭ 15 (-55.88%)

Mutual labels: big-data, hadoop

Bitcoin Value Predictor

[NOT MAINTAINED] Predicting Bitcoin price using time series analysis and sentiment analysis of tweets about Bitcoin

Stars: ✭ 91 (+167.65%)

Mutual labels: big-data, pyspark

Clustering4Ever

C4E, a JVM friendly library written in Scala for both local and distributed (Spark) Clustering.

Stars: ✭ 126 (+270.59%)

Mutual labels: big-data, bigdata

gomrjob

gomrjob - a Go Framework for Hadoop Map Reduce Jobs

Stars: ✭ 39 (+14.71%)

Mutual labels: hadoop, mapreduce

twitter-archive-reader

Full featured TypeScript Twitter archive reader and browser

Stars: ✭ 43 (+26.47%)

Mutual labels: big-data, bigdata

HadoopDedup

🍉 Large-scale deduplication of massive data based on Hadoop and HBase

Stars: ✭ 27 (-20.59%)

Mutual labels: big-data, mapreduce

dockerfiles

Multi docker container images for main Big Data Tools. (Hadoop, Spark, Kafka, HBase, Cassandra, Zookeeper, Zeppelin, Drill, Flink, Hive, Hue, Mesos, ... )

Stars: ✭ 29 (-14.71%)

Mutual labels: hadoop, bigdata

anovos

Anovos - An Open Source Library for Scalable feature engineering Using Apache-Spark

Stars: ✭ 77 (+126.47%)

Mutual labels: bigdata, pyspark

the-apache-ignite-book

All code samples, scripts and more in-depth examples for The Apache Ignite Book. Covers Apache Ignite 2.6 or above

Stars: ✭ 65 (+91.18%)

Mutual labels: hadoop, bigdata

optimus

🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark