bigdata-fun: A complete (distributed) BigData stack, running in containers
Stars: ✭ 14 (-90.48%)
Zparkio: Boilerplate framework to use Spark and ZIO together.
Stars: ✭ 121 (-17.69%)
visions: Type System for Data Analysis in Python
Stars: ✭ 136 (-7.48%)
Iceberg: Iceberg is a table format for large, slow-moving tabular data
Stars: ✭ 393 (+167.35%)
Roffildlibrary: Library for MQL5 (MetaTrader) with Python, Java, Apache Spark, AWS
Stars: ✭ 63 (-57.14%)
Redash: Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.
Stars: ✭ 20,147 (+13605.44%)
Docker practice: Learn and understand Docker technologies, with real DevOps practice!
Stars: ✭ 19,768 (+13347.62%)
Spark-PMoF: Spark Shuffle Optimization with RDMA+AEP
Stars: ✭ 28 (-80.95%)
Data Algorithms Book: MapReduce, Spark, Java, and Scala for Data Algorithms Book
Stars: ✭ 949 (+545.58%)
Silex: something to help you spark
Stars: ✭ 61 (-58.5%)
Bigdl: Building Large-Scale AI Applications for Distributed Big Data
Stars: ✭ 3,813 (+2493.88%)
docker-spark: Apache Spark docker container image (Standalone mode)
Stars: ✭ 34 (-76.87%)
Spark: Apache Spark - A unified analytics engine for large-scale data processing
Stars: ✭ 31,618 (+21408.84%)
Flint: A Time Series Library for Apache Spark
Stars: ✭ 878 (+497.28%)
Bitcoin Value Predictor: [NOT MAINTAINED] Predicting Bitcoin price using time series analysis and sentiment analysis of tweets on Bitcoin
Stars: ✭ 91 (-38.1%)
yuzhouwan: Code Library for My Blog
Stars: ✭ 39 (-73.47%)
Abris: Avro SerDe for Apache Spark structured APIs.
Stars: ✭ 130 (-11.56%)
spark-util: low-level helpers for Apache Spark libraries and tests
Stars: ✭ 16 (-89.12%)
Tensorflowonspark: TensorFlowOnSpark brings TensorFlow programs to Apache Spark clusters.
Stars: ✭ 3,748 (+2449.66%)
Butterfree: A tool for building feature stores.
Stars: ✭ 126 (-14.29%)
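Flink Learning: flink learning blog. http://www.54tianzhisheng.cn/ Covers Flink basics, concepts, principles, hands-on practice, performance tuning, and source code analysis. Includes learning examples for Flink Connector, Metrics, Library, DataStream API, and Table API & SQL, plus case studies of large production Flink projects (PV/UV, log storage, real-time deduplication of tens of billions of records, monitoring and alerting). Everyone is welcome to support my column "Big Data Real-Time Computing Engine: Flink in Practice and Performance Optimization".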
Stars: ✭ 11,378 (+7640.14%)
Waimak: Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark.
Stars: ✭ 60 (-59.18%)
Wedatasphere: WeDataSphere is a financial-grade, one-stop open-source suite for big data platforms. The source code of Scriptis and Linkis has already been released to the open-source community. WeDataSphere, Big Data Made Easy!
Stars: ✭ 372 (+153.06%)
Ammonite Spark: Run spark calculations from Ammonite
Stars: ✭ 88 (-40.14%)
awesome-AI-kubernetes: ❄️ 🐳 Awesome tools and libs for AI, Deep Learning, Machine Learning, Computer Vision, Data Science, Data Analytics and Cognitive Computing that are baked in the oven to be Native on Kubernetes and Docker with Python, R, Scala, Java, C#, Go, Julia, C++ etc
Stars: ✭ 95 (-35.37%)
Mare: MaRe leverages the power of Docker and Spark to run and scale your serial tools in MapReduce fashion.
Stars: ✭ 11 (-92.52%)
Kinesis Sql: Kinesis Connector for Structured Streaming
Stars: ✭ 120 (-18.37%)
swordfish: Open-source distributed workflow scheduling tool that also supports streaming tasks.
Stars: ✭ 35 (-76.19%)
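Bigdata Interview: 🎯 🌟 [Big data interview questions] A collection of big-data interview questions gathered from the web, together with the author's own answer summaries. Currently covers interview questions and knowledge summaries for the Hadoop/Hive/Spark/Flink/Hbase/Kafka/Zookeeper frameworks.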
Stars: ✭ 857 (+482.99%)
Spark-Ar: Resources for Spark AR
Stars: ✭ 43 (-70.75%)
fastdata-cluster: Fast Data Cluster (Apache Cassandra, Kafka, Spark, Flink, YARN and HDFS with Vagrant and VirtualBox)
Stars: ✭ 20 (-86.39%)
Tiledb Vcf: Efficient variant-call data storage and retrieval library using the TileDB storage library.
Stars: ✭ 26 (-82.31%)
spark-stringmetric: Spark functions to run popular phonetic and string matching algorithms
Stars: ✭ 51 (-65.31%)
pyspark-cheatsheet: PySpark Cheat Sheet - example code to help you learn PySpark and develop apps faster
Stars: ✭ 115 (-21.77%)
Mobius: C# and F# language binding and extensions to Apache Spark
Stars: ✭ 929 (+531.97%)
dlsa: Distributed least squares approximation (dlsa) implemented with Apache Spark
Stars: ✭ 25 (-82.99%)
Cuesheet: A framework for writing Spark 2.x applications in a pretty way
Stars: ✭ 86 (-41.5%)
lineage: Generate beautiful documentation for your data pipelines in markdown format
Stars: ✭ 16 (-89.12%)
Pyspark Setup Demo: Demo of PySpark and Jupyter Notebook with the Jupyter Docker Stacks
Stars: ✭ 24 (-83.67%)
Ibis: A pandas-like deferred expression system, with first-class SQL support
Stars: ✭ 1,630 (+1008.84%)
Sparkmeasure: This is the development repository of SparkMeasure, a tool for performance troubleshooting of Apache Spark workloads. It simplifies the collection and analysis of Spark task metrics data.
Stars: ✭ 368 (+150.34%)
Sidekick: High Performance HTTP Sidecar Load Balancer
Stars: ✭ 366 (+148.98%)
Seldon Server: Machine Learning Platform and Recommendation Engine built on Kubernetes
Stars: ✭ 1,435 (+876.19%)
Zemberek Nlp Server: REST Docker server built on the Zemberek Turkish NLP Java library
Stars: ✭ 60 (-59.18%)
Kyuubi: Kyuubi is a unified multi-tenant JDBC interface for large-scale data processing and analytics, built on top of Apache Spark
Stars: ✭ 363 (+146.94%)
Metorikku: A simplified, lightweight ETL Framework based on Apache Spark
Stars: ✭ 361 (+145.58%)
Petastorm: Petastorm library enables single machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as Tensorflow, Pytorch, and PySpark and can be used from pure Python code.
Stars: ✭ 1,108 (+653.74%)