Top 164 bigdata open source projects

Every Single Day I Tldr
A daily digest of the articles or videos I've found interesting, that I want to share with you.
Aws Etl Orchestrator
A serverless architecture for orchestrating ETL jobs in arbitrarily-complex workflows using AWS Step Functions and AWS Lambda.
Dpark
Python clone of Spark, a MapReduce-like framework in Python
Simple It English
Simple-IT-English: smart wordbook from community for community
Hadoop Attack Library
A collection of pentest tools and resources targeting Hadoop environments
Tdengine
An open-source big data platform designed and optimized for the Internet of Things (IoT).
Node Hbase
Asynchronous HBase client for Node.js using REST
Sparkrdma
RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark
Flink Boot
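The Lazy Squirrel (懒松鼠) Flink-Boot scaffold lets Flink fully embrace the Spring ecosystem, so developers can write distributed stream-processing programs in the style of Java web development. It aims to lower the barrier to entry (no need to understand distributed computing theory or the details of the Flink framework) so that developers can quickly write business code. To make large projects built on the scaffold more agile, it integrates the Spring framework for bean management by default and bundles frameworks commonly used in microservice and web development, such as the MyBatis ORM framework, the Hibernate Validator validation framework, and the Spring Retry framework; see the scaffold's feature list for details.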
Shifu
An end-to-end machine learning and data mining framework on Hadoop
Javaorbigdata Interview
A collection of interview knowledge points for Java developers and big data developers
Awesome Learning
Hands-on source code repository: https://github.com/jast90/bigdata . Search for "Jast" on WeChat and follow the official account for the latest technical posts 😯.
Kotlin Spark Api
This project provides Kotlin bindings and several extensions for Apache Spark. We are looking to have this as a part of Apache Spark 3.x.
Flinkx
Based on Apache Flink; supports data synchronization/integration and streaming SQL computation.
Bigdata practice
Big data analysis and visualization in practice
Java Notes
☕️ Java basics 👫 Object-oriented thinking ✏️ Algorithms 📝 Operating systems ☁️ Networking 💾 Databases 🙊 Spring 💡 System architecture 🐘 Big data
Javainterview
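The most complete collection of Java technical knowledge points, plus Java source code analysis. Doing my part to contribute to open source.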
Athenacli
AthenaCLI is a CLI tool for AWS Athena service that can do auto-completion and syntax highlighting.
Avro
Apache Avro is a data serialization system.
Poli
An easy-to-use BI server built for SQL lovers. Power data analysis in SQL and gain faster business insights.
Ecommercerecommendsystem
Real-time big data product recommendation system. Front end: Vue + TypeScript + ElementUI; back end: Spring + Spark
Tipdm
TipDM modeling platform, an open-source data mining tool.
Fpart
Sort files and pack them into partitions
Volcano
A Cloud Native Batch System (Project under CNCF)
Hadoopcryptoledger
Hadoop Crypto Ledger - Analyzing CryptoLedgers, such as Bitcoin Blockchain, on Big Data platforms, such as Hadoop/Spark/Flink/Hive
Liteflow
liteflow is a distributed task-flow scheduling system built on task versions
Lambda Arch
Applying Lambda Architecture with Spark, Kafka, and Cassandra.
Books
Technical books and more
Flinkstreamsql
Extends real-time SQL on top of open-source Flink; mainly implements joins between streams and dimension tables, and supports all native Flink SQL syntax
Spark R Notebooks
R on Apache Spark (SparkR) tutorials for Big Data analysis and Machine Learning as IPython / Jupyter notebooks
Daudit
🌲 Configuration flaws detector for Hadoop, MongoDB, MySQL, and more!
Tennis Crystal Ball
Ultimate Tennis Statistics and Tennis Crystal Ball - Tennis Big Data Analysis and Prediction
Griddb
GridDB is a next-generation open source database that makes time series IoT and big data fast and easy.
Flink Notes
Flink study notes
Sparktutorial
Source code for James Lee's Apache Spark with Java course
Splash
Splash, a flexible Spark shuffle manager that supports user-defined storage backends for shuffle data storage and exchange
Spark Py Notebooks
Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Covid19 Market Waiting Times
A project to help people stand in line at the market as little as possible
Mnemonic
Apache Mnemonic - A non-volatile hybrid memory storage oriented library
Biglasso
biglasso: Extending Lasso Model Fitting to Big Data in R
Ignite Book Code Samples
All code samples, scripts, and more in-depth examples for the book "High Performance In-Memory Computing with Apache Ignite". Please use the repository "the-apache-ignite-book" for Ignite version 2.6 or above.
Bigdata File Viewer
A cross-platform (Windows, Mac, Linux) desktop application for viewing common big data binary formats such as Parquet, ORC, and Avro. Supports the local file system, HDFS, AWS S3, Azure Blob Storage, etc.
Mlsql
The Programming Language Designed For Big Data and AI
Athena Cli
Presto-like CLI tool for AWS Athena
Hudi Resources
A collection of Apache Hudi resources
Uproot4
ROOT I/O in pure Python and NumPy.
Cleanframes
type-class based data cleansing library for Apache Spark SQL
Apache Spark Hands On
Educational notes and hands-on problems with solutions for the Hadoop ecosystem
Big Data Engineering Coursera Yandex
Big Data for Data Engineers Coursera Specialization from Yandex