Top 164 bigdata open source projects

Reddit sse stream
A Server-Sent Events (SSE) stream that delivers Reddit comments and submissions to a client in near real time.
Optimus
🚚 Agile Data Preparation Workflows made easy with dask, cudf, dask_cudf and pyspark
Autocrawler
Google, Naver multiprocess image web crawler (Selenium)
Aws Auto Terminate Idle Emr
An AWS-based framework that uses AWS CloudWatch and an AWS Lambda function (a Python script built on Boto3) to terminate AWS EMR clusters that have been idle for a specified period of time.
Panther
Detect threats with log data and improve cloud security posture
Spark Streaming Monitoring With Lightning
Plot live stats as graphs from an Apache Spark application using Lightning-viz
Bigdata Interview
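🎯 🌟[Big data interview questions] Big data interview questions I collected online, shared together with summaries of my own answers. Currently covers interview knowledge for the Hadoop/Hive/Spark/Flink/Hbase/Kafka/Zookeeper frameworks.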
Mobius
C# and F# language binding and extensions to Apache Spark
Hadoop For Geoevent
ArcGIS GeoEvent Server sample Hadoop connector for storing GeoEvents in HDFS.
Bigdataguide
Big data learning: learn big data from scratch, with study videos and interview materials for each stage of learning
Kube Batch
A batch scheduler for Kubernetes for high-performance workloads, e.g. AI/ML, BigData, HPC
Coding Now
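Notes from my studies, along with eBooks I have read, video resources, and the blogs, websites, and tools I have collected and consider worthwhile. Covers the major big data components, Python machine learning and data analysis, Linux, operating systems, algorithms, networking, and more.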
Gearpump
Lightweight real-time big data streaming engine over Akka
Spark Movie Lens
An on-line movie recommender using Spark, Python Flask, and the MovieLens dataset
Vaex
Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualize and explore big tabular data at a billion rows per second 🚀
Cds
Data syncing in golang for ClickHouse.
Bigslice
A serverless cluster computing system for the Go programming language
Tensorbase
TensorBase BE is building a high performance, cloud neutral bigdata warehouse for SMEs fully in Rust.
Bigdataie
A collection of big data blogs, written test questions, tutorials, projects, and interview experiences
God Of Bigdata
Focused on big data learning and interviews; the road to becoming a big data god starts here. Flink/Spark/Hadoop/Hbase/Hive...
Circosjs
d3 library to build circular graphs
Cortx
CORTX Community Object Storage is 100% open source object storage uniquely optimized for mass capacity storage devices.
Big data architect skills
The skills a big data architect should master
Sidekick
High Performance HTTP Sidecar Load Balancer
Jigsaw
Jigsaw (七巧板) provides a set of web components based on Angular5/8/9+. Its main purpose is to help application developers build complex, highly interactive, and user-friendly web pages. Jigsaw supports the development of all applications of ZTE's Big Data product.
Datawave
DataWave is an ingest/query framework that leverages Apache Accumulo to provide fast, secure data access.
Api.rss
RSS as RESTful. This service allows you to transform an RSS feed into an awesome API.
Datafaker
Datafaker is a tool for generating large-scale test data and flow test data. It fakes data and inserts it into a variety of data sources.
Spline
Data Lineage Tracking And Visualization Solution
Janusgraph.cn
The Chinese community for the distributed graph database JanusGraph; everything about JanusGraph
Arvados
An open source platform for managing and analyzing biomedical big data
Ldetool
Code generator for fast log file parsers
Docker Spark Cluster
A simple Spark standalone cluster for your testing environment
Big Data Rosetta Code
Code snippets for solving common big data problems in various platforms. Inspired by Rosetta Code
DetEdit
A graphical user interface for annotating and editing events detected in long-term acoustic monitoring data
jigsaw-seed
This is the seed project for the Jigsaw (七巧板) component library (https://github.com/rdkmaster/jigsaw). All new apps are recommended to start from this project as their seed.
leaflet heatmap
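A simple visualization of Huzhou call data. Assuming the data volume is too large for the browser to draw a heatmap directly, the heatmap rendering step is moved to offline computation and analysis. The data is first computed in parallel with Apache Spark, the heatmap is then drawn with Apache Spark as well, and leafletjs loads the OpenStreetMap layer and the heatmap layer to achieve good interactivity. The rendering is currently implemented with Apache Spark; either because Apache Spark is not well suited to this kind of computation or because I did not design the algorithm well, the parallel computation is slower than single-machine computation. The Apache Spark heatmap rendering and computation code is at https://github.com/yuanzhaokang/ParallelizeHeatmap.git .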
proteic
Streaming and static data visualization for the modern web.
Spark-and-Kafka IoT-Data-Processing-and-Analytics
Final project for the IoT: Big Data Processing and Analytics class. Analyzes U.S. nationwide temperature from IoT sensors in real time
centurion
Kotlin Bigdata Toolkit
v6.dooring.public
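A large-screen data visualization solution that provides a visual editing engine, helping individuals and companies easily customize their own large-screen visualization applications.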
pulsar-user-group-loc-cn
Workspace for the China local user group.
room-renting
Scrape housing listings from Anjuke (安居客) with Python and visualize them with Amap (高德地图)
ETL-Starter-Kit
📁 Extract, Transform, Load (ETL) 👷 refers to a process in database usage and especially in data warehousing. This repository contains a starter kit featuring ETL related work.
bqv
The simplest tool to manage views of BigQuery.
vulkn
Love your Data. Love the Environment. Love VULKИ.
flokkr
Documentation placeholder and utilities for all the other containers.