tpch-spark: TPC-H queries in Apache Spark SQL using the native DataFrames API
Stars: ✭ 63 (+75%)
trembita: Model complex data transformation pipelines easily
Stars: ✭ 44 (+22.22%)
Spark Fast Tests: Apache Spark testing helpers (dependency-free; works with ScalaTest, uTest, and MUnit)
Stars: ✭ 249 (+591.67%)
openverse-catalog: Identifies and collects data on CC-licensed content across web crawl data and public APIs.
Stars: ✭ 27 (-25%)
Hyperspace: An open-source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads.
Stars: ✭ 246 (+583.33%)
frovedis: A framework for vectorized and distributed data analytics
Stars: ✭ 59 (+63.89%)
Dpark: Python clone of Spark, a MapReduce-like framework in Python
Stars: ✭ 2,668 (+7311.11%)
awesome-AI-kubernetes: ❄️ 🐳 Awesome tools and libs for AI, Deep Learning, Machine Learning, Computer Vision, Data Science, Data Analytics, and Cognitive Computing that are baked in the oven to be native on Kubernetes and Docker with Python, R, Scala, Java, C#, Go, Julia, C++, etc.
Stars: ✭ 95 (+163.89%)
Covid19Tracker: A Robinhood-style COVID-19 🦠 Android tracking app for the US. Open source and built with Kotlin.
Stars: ✭ 65 (+80.56%)
Azure Event Hubs: ☁️ Cloud-scale telemetry ingestion from any stream of data with Azure Event Hubs
Stars: ✭ 233 (+547.22%)
ODSC India 2018: My presentation at ODSC India 2018 about Deep Learning with Apache Spark
Stars: ✭ 26 (-27.78%)
sparkar-volts: An extensive, non-reactive TypeScript framework that eases development in Spark AR
Stars: ✭ 15 (-58.33%)
Ruby Spark: Ruby wrapper for Apache Spark
Stars: ✭ 221 (+513.89%)
bigdata-fun: A complete (distributed) big data stack running in containers
Stars: ✭ 14 (-61.11%)
Spark Excel: A Spark plugin for reading Excel files via Apache POI
Stars: ✭ 216 (+500%)
experiments: Code examples for my blog posts
Stars: ✭ 21 (-41.67%)
Sparkrdma: RDMA-accelerated, high-performance, scalable, and efficient ShuffleManager plugin for Apache Spark
Stars: ✭ 215 (+497.22%)
kafka-compose: 🎼 Docker Compose files for various Kafka stacks
Stars: ✭ 32 (-11.11%)
Example Spark: Spark, Spark Streaming, and Spark SQL unit testing strategies
Stars: ✭ 205 (+469.44%)
splink: Implementation of the Fellegi-Sunter canonical model of record linkage in Apache Spark, including an EM algorithm to estimate parameters
Stars: ✭ 181 (+402.78%)
bigkube: Minikube for big data with Scala and Spark
Stars: ✭ 16 (-55.56%)
Spark Practice: Apache Spark (PySpark) practice on real data
Stars: ✭ 200 (+455.56%)
visualize-data-with-python: A Jupyter notebook that uses standard data science and data engineering techniques to analyze data from the 2017 flooding in Houston, TX.
Stars: ✭ 60 (+66.67%)
Scanns: A scalable nearest neighbor search library in Apache Spark
Stars: ✭ 190 (+427.78%)
docker-spark: Apache Spark Docker container image (standalone mode)
Stars: ✭ 34 (-5.56%)
Azuredatabricksbestpractices: Version 1 of technical best practices for Azure Databricks, based on real-world customer and technical SME input
Stars: ✭ 186 (+416.67%)
big data: A collection of tutorials on Hadoop, MapReduce, Spark, and Docker
Stars: ✭ 34 (-5.56%)
Roaringbitmap: A better compressed bitset in Java
Stars: ✭ 2,460 (+6733.33%)
smolder: HL7 Apache Spark data source
Stars: ✭ 33 (-8.33%)
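Sparkstreaming: 💥 🚀 Wraps Spark Streaming to adjust the batch time dynamically (computation runs whenever data arrives); 🚀 supports adding and removing topics at runtime; 🚀 wraps Spark Streaming 1.6 with kafka 010 to support SSL.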
Stars: ✭ 179 (+397.22%)
Spark: Firely's open-source FHIR server
Stars: ✭ 174 (+383.33%)
Deeplearning4j: Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for Keras, TensorFlow, and ONNX/PyTorch, a modular and tiny C++ library for running math code, and a Java-based math library on top of the core C++ library. Also includes SameDiff: a PyTorch/TensorFlow-like library for running deep learni…
Stars: ✭ 12,277 (+34002.78%)
SparkV: 🤖⚡ | The most POWERFUL multipurpose chat/meme bot that will boost the activity in your server.
Stars: ✭ 24 (-33.33%)
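wow-spark: 🔆 A Spark self-study handbook covering Spark Core, Spark SQL, Spark Streaming, spark-kafka, and Delta Lake, plus basic Scala exercises, along with source-code analyses (e.g., master and shuffle), summaries, and translations.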
Stars: ✭ 20 (-44.44%)
Geopyspark: GeoTrellis for PySpark
Stars: ✭ 167 (+363.89%)
Big Whale: Scheduling of offline (batch) jobs and monitoring of real-time jobs for Spark, Flink, and similar engines
Stars: ✭ 163 (+352.78%)
spark-demos: A collection of demo applications using Apache Spark
Stars: ✭ 15 (-58.33%)
Vue Info Card: Simple and beautiful card component with an elegant sparkline, for VueJS.
Stars: ✭ 159 (+341.67%)
datalake-etl-pipeline: Simplified ETL process in Hadoop using Apache Spark. Provides a complete ETL pipeline for a data lake, with SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
Stars: ✭ 39 (+8.33%)
opaque-sql: An encrypted data analytics platform
Stars: ✭ 169 (+369.44%)
prosto: Prosto is a data processing toolkit that radically changes how data is processed by relying heavily on functions and operations with functions; an alternative to map-reduce and join-groupby
Stars: ✭ 54 (+50%)
confluent-spark-avro: Spark UDFs to deserialize Avro messages with schemas stored in Schema Registry.
Stars: ✭ 18 (-50%)