Books: Technical books, etc.
Stars: ✭ 110 (+243.75%)
Sparktutorial: Source code for James Lee's Apache Spark with Java course
Stars: ✭ 105 (+228.13%)
Hudi: Upserts, Deletes And Incremental Processing on Big Data.
Stars: ✭ 2,586 (+7981.25%)
Hadoopcryptoledger: Hadoop Crypto Ledger - Analyzing CryptoLedgers, such as Bitcoin Blockchain, on Big Data platforms, such as Hadoop/Spark/Flink/Hive
Stars: ✭ 126 (+293.75%)
Ignite Book Code Samples: All code samples, scripts and more in-depth examples for the book High Performance In-Memory Computing with Apache Ignite. Please use the repository "the-apache-ignite-book" for Ignite version 2.6 or above.
Stars: ✭ 86 (+168.75%)
Awesome Bigdata: A curated list of awesome big data frameworks, resources and other awesomeness.
Stars: ✭ 10,478 (+32643.75%)
Node Hbase: Asynchronous HBase client for Node.js using REST
Stars: ✭ 226 (+606.25%)
Spark Py Notebooks: Apache Spark & Python (PySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 1,338 (+4081.25%)
Fpart: Sort files and pack them into partitions
Stars: ✭ 127 (+296.88%)
Kotlin Spark Api: This project provides Kotlin bindings and several extensions for Apache Spark. We are looking to have this as a part of Apache Spark 3.x.
Stars: ✭ 183 (+471.88%)
Genie: Distributed Big Data Orchestration Service
Stars: ✭ 1,544 (+4725%)
Hadoop Attack Library: A collection of pentest tools and resources targeting Hadoop environments
Stars: ✭ 228 (+612.5%)
Spark R Notebooks: R on Apache Spark (SparkR) tutorials for Big Data analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 109 (+240.63%)
Nmflibrary: MATLAB library for non-negative matrix factorization (NMF), version 1.8.1
Stars: ✭ 153 (+378.13%)
Griddb: GridDB is a next-generation open source database that makes time series IoT and big data fast and easy.
Stars: ✭ 1,587 (+4859.38%)
Every Single Day I Tldr: A daily digest of the articles or videos I've found interesting and want to share with you.
Stars: ✭ 249 (+678.13%)
Avro: Apache Avro is a data serialization system.
Stars: ✭ 2,005 (+6165.63%)
Mnemonic: Apache Mnemonic - A non-volatile hybrid memory storage oriented library
Stars: ✭ 91 (+184.38%)
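Flink Boot: The Lazy Squirrel (懒松鼠) Flink-Boot scaffold lets Flink fully embrace the Spring ecosystem. Developers can write distributed stream-processing programs in the familiar Java web development style, so Lazy Squirrel makes crossing between the two fields simpler. Lazy Squirrel aims to let developers write business code quickly with a low learning curve, without needing to understand the theory of distributed computing or the details of the Flink framework. To further improve agility when building large projects with the scaffold, it integrates the Spring framework for bean management by default. It also bundles frameworks commonly used in microservice and web development to speed up development further, such as the MyBatis ORM framework, the Hibernate Validator validation framework and the Spring Retry retry framework. See the scaffold's feature list for details.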
Stars: ✭ 209 (+553.13%)
Mlsql: The Programming Language Designed For Big Data and AI
Stars: ✭ 1,262 (+3843.75%)
Spark: .NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
Stars: ✭ 1,721 (+5278.13%)
Uproot4: ROOT I/O in pure Python and NumPy.
Stars: ✭ 80 (+150%)
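Awesome Learning: Practice source code repository: https://github.com/jast90/bigdata. Search for "Jast" on WeChat and follow the official account for the latest technical articles 😯.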
Stars: ✭ 197 (+515.63%)
Volcano: A Cloud Native Batch System (Project under CNCF)
Stars: ✭ 2,114 (+6506.25%)
Simple It English: Simple-IT-English, a smart wordbook from the community for the community
Stars: ✭ 233 (+628.13%)
Liteflow: liteflow is a distributed task-flow scheduling system built on task versions.
Stars: ✭ 112 (+250%)
Flinkx: Based on Apache Flink. Supports data synchronization/integration and streaming SQL computation.
Stars: ✭ 2,651 (+8184.38%)
Lambda Arch: Applying Lambda Architecture with Spark, Kafka, and Cassandra.
Stars: ✭ 111 (+246.88%)
Flinkstreamsql: Extends the real-time SQL of open-source Flink. It mainly adds joins between streams and dimension tables, and supports all native Flink SQL syntax.
Stars: ✭ 1,682 (+5156.25%)
Java Notes: ☕️ Java basics 👫 Object-oriented thinking ✏️ Algorithms 📝 Operating systems ☁️ Networking 💾 Databases 🙊 Spring 💡 System architecture 🐘 Big data
Stars: ✭ 160 (+400%)
Daudit: 🌲 Configuration flaws detector for Hadoop, MongoDB, MySQL, and more!
Stars: ✭ 108 (+237.5%)
Tdengine: An open-source big data platform designed and optimized for the Internet of Things (IoT).
Stars: ✭ 17,434 (+54381.25%)
Tennis Crystal Ball: Ultimate Tennis Statistics and Tennis Crystal Ball - Tennis Big Data Analysis and Prediction
Stars: ✭ 107 (+234.38%)
Javainterview: The most comprehensive collection of Java technical knowledge, plus Java source code analysis. Doing our part for open source.
Stars: ✭ 154 (+381.25%)
codefoundry: Examples for gauravbytes.com
Stars: ✭ 57 (+78.13%)
Splash: A flexible Spark shuffle manager that supports user-defined storage backends for shuffle data storage and exchange
Stars: ✭ 105 (+228.13%)
Athenacli: AthenaCLI is a CLI tool for the AWS Athena service that can do auto-completion and syntax highlighting.
Stars: ✭ 151 (+371.88%)
Sparkrdma: RDMA-accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark
Stars: ✭ 215 (+571.88%)
Poli: An easy-to-use BI server built for SQL lovers. Power data analysis in SQL and gain faster business insights.
Stars: ✭ 1,850 (+5681.25%)
Biglasso: Extending Lasso Model Fitting to Big Data in R
Stars: ✭ 87 (+171.88%)
Aws Etl Orchestrator: A serverless architecture for orchestrating ETL jobs in arbitrarily-complex workflows using AWS Step Functions and AWS Lambda.
Stars: ✭ 245 (+665.63%)
Bigdata File Viewer: A cross-platform (Windows, macOS, Linux) desktop application for viewing common big data binary formats such as Parquet, ORC and Avro. Supports the local file system, HDFS, AWS S3, Azure Blob Storage, etc.
Stars: ✭ 86 (+168.75%)
Azure Event Hubs Spark: Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs
Stars: ✭ 140 (+337.5%)
Athena Cli: Presto-like CLI tool for AWS Athena
Stars: ✭ 85 (+165.63%)
Shifu: An end-to-end machine learning and data mining framework on Hadoop
Stars: ✭ 207 (+546.88%)
Twitwork: Monitor the Twitter stream
Stars: ✭ 133 (+315.63%)
optimus: 🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Stars: ✭ 1,351 (+4121.88%)
workflUX: An open-source, cloud-ready web application for simplified deployment of big data workflows.
Stars: ✭ 26 (-18.75%)
Dpark: Python clone of Spark, a MapReduce-like framework in Python
Stars: ✭ 2,668 (+8237.5%)
Tipdm: TipDM modeling platform, an open-source data mining tool.
Stars: ✭ 130 (+306.25%)