awesome-bigdataA curated list of awesome big data frameworks, resources and other awesomeness.
Stars: ✭ 11,093 (+328.96%)
DparkPython clone of Spark, a MapReduce-like framework in Python
Stars: ✭ 2,668 (+3.17%)
Awesome BigdataA curated list of awesome big data frameworks, resources and other awesomeness.
Stars: ✭ 10,478 (+305.18%)
hudi-demosA collection of Apache Hudi demos to help beginners get started with Apache Hudi quickly
Stars: ✭ 63 (-97.56%)
GearpumpLightweight real-time big data streaming engine over Akka
Stars: ✭ 745 (-71.19%)
Tennis Crystal BallUltimate Tennis Statistics and Tennis Crystal Ball - Tennis Big Data Analysis and Prediction
Stars: ✭ 107 (-95.86%)
VolcanoA Cloud Native Batch System (Project under CNCF)
Stars: ✭ 2,114 (-18.25%)
SparktutorialSource code for James Lee's Apache Spark with Java course
Stars: ✭ 105 (-95.94%)
Go Kafka ExampleGolang Kafka consumer and producer example
Stars: ✭ 108 (-95.82%)
GriddbGridDB is a next-generation open source database that makes time series IoT and big data fast and easy.
Stars: ✭ 1,587 (-38.63%)
Awesome Streaminga curated list of awesome streaming frameworks, applications, etc
Stars: ✭ 1,879 (-27.34%)
SplashSplash, a flexible Spark shuffle manager that supports user-defined storage backends for shuffle data storage and exchange
Stars: ✭ 105 (-95.94%)
Neo4j StreamsNeo4j Kafka Integrations, Docs =>
Stars: ✭ 126 (-95.13%)
Spark Py NotebooksApache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 1,338 (-48.26%)
Awesome Single CellCommunity-curated list of software packages and data resources for single-cell, including RNA-seq, ATAC-seq, etc.
Stars: ✭ 1,937 (-25.1%)
RikoA Python stream processing engine modeled after Yahoo! Pipes
Stars: ✭ 1,571 (-39.25%)
MnemonicApache Mnemonic - A non-volatile hybrid memory storage oriented library
Stars: ✭ 91 (-96.48%)
Ignite Book Code SamplesAll code samples, scripts and more in-depth examples for the book high performance in-memory computing with Apache Ignite. Please use the repository "the-apache-ignite-book" for Ignite version 2.6 or above.
Stars: ✭ 86 (-96.67%)
GenieDistributed Big Data Orchestration Service
Stars: ✭ 1,544 (-40.29%)
MlsqlThe Programming Language Designed For Big Data and AI
Stars: ✭ 1,262 (-51.2%)
Spark.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
Stars: ✭ 1,721 (-33.45%)
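Flink Learningflink learning blog. http://www.54tianzhisheng.cn/ Covers Flink basics, concepts, principles, hands-on practice, performance tuning, source code analysis, and more. Includes learning examples for Flink Connector, Metrics, Library, DataStream API, Table API & SQL, and related topics, plus case studies of large Flink production projects (PV/UV, log storage, real-time deduplication of tens of billions of records, monitoring and alerting). You are welcome to support my column 《大数据实时计算引擎 Flink 实战与性能优化》 (Big Data Real-Time Compute Engine Flink: Practice and Performance Optimization).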
Stars: ✭ 11,378 (+339.98%)
GsfGrid Solutions Framework
Stars: ✭ 106 (-95.9%)
FpartSort files and pack them into partitions
Stars: ✭ 127 (-95.09%)
MediapipeCross-platform, customizable ML solutions for live and streaming media.
Stars: ✭ 15,338 (+493.12%)
LeofsThe LeoFS Storage System
Stars: ✭ 1,439 (-44.35%)
HadoopcryptoledgerHadoop Crypto Ledger - Analyzing CryptoLedgers, such as Bitcoin Blockchain, on Big Data platforms, such as Hadoop/Spark/Flink/Hive
Stars: ✭ 126 (-95.13%)
WayebWayeb is a Complex Event Processing and Forecasting (CEP/F) engine written in Scala.
Stars: ✭ 138 (-94.66%)
LogislandScalable stream processing platform for advanced realtime analytics on top of Kafka and Spark. LogIsland also supports MQTT and Kafka Streams (Flink is on the roadmap). The platform does complex event processing and is suitable for time series analysis. A large set of ready-to-use processors, data sources and sinks is available.
Stars: ✭ 97 (-96.25%)
Pulsar FlinkElastic data processing with Apache Pulsar and Apache Flink
Stars: ✭ 126 (-95.13%)
AvroApache Avro is a data serialization system.
Stars: ✭ 2,005 (-22.47%)
Biglassobiglasso: Extending Lasso Model Fitting to Big Data in R
Stars: ✭ 87 (-96.64%)
Liteflowliteflow is a distributed task-flow scheduling system built on task versioning
Stars: ✭ 112 (-95.67%)
Bigdata File ViewerA cross-platform (Windows, Mac, Linux) desktop application to view common bigdata binary formats such as Parquet, ORC, AVRO, etc. Supports local file system, HDFS, AWS S3, Azure Blob Storage, etc.
Stars: ✭ 86 (-96.67%)
Mara PipelinesA lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow
Stars: ✭ 1,841 (-28.81%)
Lambda ArchApplying Lambda Architecture with Spark, Kafka, and Cassandra.
Stars: ✭ 111 (-95.71%)
Athena CliPresto-like CLI tool for AWS Athena
Stars: ✭ 85 (-96.71%)
KsppA high-performance, real-time C++ Kafka streams framework (C++17)
Stars: ✭ 80 (-96.91%)
TwitworkMonitor twitter stream
Stars: ✭ 133 (-94.86%)
BooksTechnical books and more
Stars: ✭ 110 (-95.75%)
Uproot4ROOT I/O in pure Python and NumPy.
Stars: ✭ 80 (-96.91%)
MachineMachine is a workflow/pipeline library for processing data
Stars: ✭ 78 (-96.98%)
WallyDistributed Stream Processing
Stars: ✭ 1,461 (-43.5%)
Fs2 KafkaKafka client for functional streams for scala (fs2)
Stars: ✭ 75 (-97.1%)
Cleanframestype-class based data cleansing library for Apache Spark SQL
Stars: ✭ 75 (-97.1%)
SamsaraSamsara is a real-time analytics platform
Stars: ✭ 132 (-94.9%)
FlinkstreamsqlExtends the real-time SQL of open-source Flink; mainly implements joins between streams and dimension tables, and supports all native Flink SQL syntax
Stars: ✭ 1,682 (-34.96%)
Apache Spark Hands OnEducational notes, hands-on problems w/ solutions for the Hadoop ecosystem
Stars: ✭ 74 (-97.14%)