795 open source projects that are alternatives to or similar to big_data

MLBD
Materials for "Machine Learning on Big Data" course
Stars: ✭ 20 (-41.18%)
Mutual labels:  big-data, mapreduce
Optimus
🚚 Agile Data Preparation Workflows made easy with dask, cudf, dask_cudf and pyspark
Stars: ✭ 986 (+2800%)
Mutual labels:  bigdata, pyspark
Cascading
Cascading is a feature-rich API for defining and executing complex, fault-tolerant data processing flows locally or on a cluster. See https://github.com/Cascading/cascading for the release repository.
Stars: ✭ 318 (+835.29%)
Mutual labels:  hadoop, mapreduce
Spline
Data Lineage Tracking And Visualization Solution
Stars: ✭ 306 (+800%)
Mutual labels:  hadoop, bigdata
Mobius
C# and F# language binding and extensions to Apache Spark
Stars: ✭ 929 (+2632.35%)
Mutual labels:  bigdata, mapreduce
Data Algorithms Book
MapReduce, Spark, Java, and Scala for Data Algorithms Book
Stars: ✭ 949 (+2691.18%)
Mutual labels:  hadoop, mapreduce
Aws Etl Orchestrator
A serverless architecture for orchestrating ETL jobs in arbitrarily-complex workflows using AWS Step Functions and AWS Lambda.
Stars: ✭ 245 (+620.59%)
Mutual labels:  big-data, bigdata
Repository
Personal learning knowledge base covering data warehouse modeling, real-time computing, big data, Java, algorithms, and more.
Stars: ✭ 92 (+170.59%)
Mutual labels:  hadoop, mapreduce
Bigdataguide
Big data learning: learn big data from scratch, including study videos and interview materials for each stage of learning
Stars: ✭ 817 (+2302.94%)
Mutual labels:  hadoop, bigdata
meetups-archivos
Slides, code, and videos from the meetups, data science days, video calls, and workshops. Data Science Research is a non-profit organization that seeks to spread and decentralize knowledge of Data Science and Artificial Intelligence in Peru, giving opportunities to new talent through MeetUps, Workshops, and Semilleros …
Stars: ✭ 60 (+76.47%)
Mutual labels:  big-data, bigdata
Hadoopcryptoledger
Hadoop Crypto Ledger - Analyzing CryptoLedgers, such as Bitcoin Blockchain, on Big Data platforms, such as Hadoop/Spark/Flink/Hive
Stars: ✭ 126 (+270.59%)
Mutual labels:  hadoop, bigdata
Shifu
An end-to-end machine learning and data mining framework on Hadoop
Stars: ✭ 207 (+508.82%)
Mutual labels:  hadoop, bigdata
Bigslice
A serverless cluster computing system for the Go programming language
Stars: ✭ 469 (+1279.41%)
Mutual labels:  bigdata, mapreduce
Eel Sdk
Big Data Toolkit for the JVM
Stars: ✭ 140 (+311.76%)
Mutual labels:  big-data, hadoop
Calcite
Apache Calcite
Stars: ✭ 2,816 (+8182.35%)
Mutual labels:  big-data, hadoop
big-data-lite
Samples for the Oracle Big Data Lite VM
Stars: ✭ 41 (+20.59%)
Mutual labels:  big-data, hadoop
learning-spark
Tidy up Spark and Hadoop tutorials.
Stars: ✭ 28 (-17.65%)
Mutual labels:  hadoop, bigdata
NiFi-Rule-engine-processor
Drools processor for Apache NiFi
Stars: ✭ 34 (+0%)
Mutual labels:  big-data, bigdata
Mmlspark
Simple and Distributed Machine Learning
Stars: ✭ 2,899 (+8426.47%)
Mutual labels:  big-data, pyspark
Tez
Apache Tez
Stars: ✭ 313 (+820.59%)
Mutual labels:  big-data, hadoop
Uproot3
ROOT I/O in pure Python and NumPy.
Stars: ✭ 312 (+817.65%)
Mutual labels:  big-data, bigdata
Ignite
Apache Ignite
Stars: ✭ 4,027 (+11744.12%)
Mutual labels:  big-data, hadoop
hadoop-data-ingestion-tool
OLAP and ETL of Big Data
Stars: ✭ 17 (-50%)
Mutual labels:  big-data, hadoop
Cortx
CORTX Community Object Storage is 100% open source object storage uniquely optimized for mass capacity storage devices.
Stars: ✭ 426 (+1152.94%)
Mutual labels:  big-data, bigdata
Spark Movie Lens
An on-line movie recommender using Spark, Python Flask, and the MovieLens dataset
Stars: ✭ 745 (+2091.18%)
Mutual labels:  big-data, bigdata
Kafka Connect Hdfs
Kafka Connect HDFS connector
Stars: ✭ 400 (+1076.47%)
Mutual labels:  big-data, hadoop
Moosefs
MooseFS – Open Source, Petabyte, Fault-Tolerant, Highly Performing, Scalable Network Distributed File System (Software-Defined Storage)
Stars: ✭ 1,025 (+2914.71%)
Mutual labels:  big-data, hadoop
Pyspark Setup Demo
Demo of PySpark and Jupyter Notebook with the Jupyter Docker Stacks
Stars: ✭ 24 (-29.41%)
Mutual labels:  big-data, pyspark
spark-twitter-sentiment-analysis
Sentiment Analysis of a Twitter Topic with Spark Structured Streaming
Stars: ✭ 55 (+61.76%)
Mutual labels:  pyspark, spark-sql
Spark R Notebooks
R on Apache Spark (SparkR) tutorials for Big Data analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 109 (+220.59%)
Mutual labels:  big-data, bigdata
Tennis Crystal Ball
Ultimate Tennis Statistics and Tennis Crystal Ball - Tennis Big Data Analysis and Prediction
Stars: ✭ 107 (+214.71%)
Mutual labels:  big-data, bigdata
awesome-coder-resources
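A refueling station on the programming road! [Continuously updated; stars welcome, come back and visit often.] [Contents: programming/learning/reading resources, open source projects, interview questions, websites, books, blogs, tutorials, and more]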
Stars: ✭ 54 (+58.82%)
Mutual labels:  big-data, bigdata
learning-hadoop-and-spark
Companion to the Learning Hadoop and Learning Spark courses on LinkedIn Learning
Stars: ✭ 146 (+329.41%)
Mutual labels:  hadoop, mapreduce
Griffon Vm
Griffon Data Science Virtual Machine
Stars: ✭ 128 (+276.47%)
Mutual labels:  big-data, hadoop
Hdfs Shell
HDFS Shell is an HDFS manipulation tool for working with functions integrated in Hadoop DFS
Stars: ✭ 117 (+244.12%)
Mutual labels:  big-data, hadoop
GooglePlay-Web-Crawler
MapReduce project using Hadoop, Nutch, AWS EMR, Pig, Tez, Hive
Stars: ✭ 18 (-47.06%)
Mutual labels:  hadoop, mapreduce
dt-sql-parser
SQL Parsers for BigData, built with antlr4.
Stars: ✭ 135 (+297.06%)
Mutual labels:  bigdata, spark-sql
Presto
The official home of the Presto distributed SQL query engine for big data
Stars: ✭ 12,957 (+38008.82%)
Mutual labels:  big-data, hadoop
lectures-hse-spark
Scalable machine learning and big data analysis with Apache Spark
Stars: ✭ 20 (-41.18%)
Mutual labels:  bigdata, mapreduce
iis
Information Inference Service of the OpenAIRE system
Stars: ✭ 16 (-52.94%)
Mutual labels:  big-data, hadoop
SynapseML
Simple and Distributed Machine Learning
Stars: ✭ 3,355 (+9767.65%)
Mutual labels:  big-data, pyspark
databricks-notebooks
Collection of Databricks and Jupyter Notebooks
Stars: ✭ 19 (-44.12%)
Mutual labels:  pyspark, spark-sql
Big Data Study
🐳 big data study
Stars: ✭ 141 (+314.71%)
Mutual labels:  big-data, bigdata
Gimel
Big Data Processing Framework - Unified Data API or SQL on Any Storage
Stars: ✭ 216 (+535.29%)
Mutual labels:  big-data, pyspark
rastercube
rastercube is a python library for big data analysis of georeferenced time series data (e.g. MODIS NDVI)
Stars: ✭ 15 (-55.88%)
Mutual labels:  big-data, hadoop
Bitcoin Value Predictor
[NOT MAINTAINED] Predicting Bitcoin price using time series analysis and sentiment analysis of tweets about Bitcoin
Stars: ✭ 91 (+167.65%)
Mutual labels:  big-data, pyspark
Clustering4Ever
C4E, a JVM friendly library written in Scala for both local and distributed (Spark) Clustering.
Stars: ✭ 126 (+270.59%)
Mutual labels:  big-data, bigdata
gomrjob
gomrjob - a Go Framework for Hadoop Map Reduce Jobs
Stars: ✭ 39 (+14.71%)
Mutual labels:  hadoop, mapreduce
twitter-archive-reader
Full featured TypeScript Twitter archive reader and browser
Stars: ✭ 43 (+26.47%)
Mutual labels:  big-data, bigdata
HadoopDedup
🍉 Large-scale massive data deduplication based on Hadoop and HBase
Stars: ✭ 27 (-20.59%)
Mutual labels:  big-data, mapreduce
dockerfiles
Multiple Docker container images for the main Big Data tools (Hadoop, Spark, Kafka, HBase, Cassandra, Zookeeper, Zeppelin, Drill, Flink, Hive, Hue, Mesos, ...)
Stars: ✭ 29 (-14.71%)
Mutual labels:  hadoop, bigdata
anovos
Anovos - An open source library for scalable feature engineering using Apache Spark
Stars: ✭ 77 (+126.47%)
Mutual labels:  bigdata, pyspark
the-apache-ignite-book
All code samples, scripts, and more in-depth examples for The Apache Ignite Book. Includes Apache Ignite 2.6 or above
Stars: ✭ 65 (+91.18%)
Mutual labels:  hadoop, bigdata
optimus
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Stars: ✭ 1,351 (+3873.53%)
Mutual labels:  bigdata, pyspark
Tdigest
t-Digest data structure in Python. Useful for percentiles and quantiles, including distributed environments like PySpark
Stars: ✭ 274 (+705.88%)
Mutual labels:  pyspark, mapreduce
data processing course
Some class materials for a data processing course using PySpark
Stars: ✭ 50 (+47.06%)
Mutual labels:  bigdata, pyspark
Uproot4
ROOT I/O in pure Python and NumPy.
Stars: ✭ 80 (+135.29%)
Mutual labels:  big-data, bigdata
Data-pipeline-project
Data pipeline project
Stars: ✭ 18 (-47.06%)
Mutual labels:  hadoop, mapreduce
gan deeplearning4j
Automatic feature engineering with Generative Adversarial Networks, using Deeplearning4j and Apache Spark.
Stars: ✭ 19 (-44.12%)
Mutual labels:  big-data, bigdata
pyspark-ML-in-Colab
PySpark in Google Colab: a simple machine learning (linear regression) model
Stars: ✭ 32 (-5.88%)
Mutual labels:  hadoop, pyspark