spark-root: Apache Spark Data Source for ROOT File Format
oshinko-s2i: A place to put s2i images and utilities for Spark application builders for OpenShift
datalake-etl-pipeline: Simplified ETL process in Hadoop using Apache Spark. Includes a complete ETL pipeline for a data lake, with SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
gan deeplearning4j: Automatic feature engineering with Generative Adversarial Networks, using Deeplearning4j and Apache Spark
dockerfiles: Multiple Docker container images for the main big data tools (Hadoop, Spark, Kafka, HBase, Cassandra, Zookeeper, Zeppelin, Drill, Flink, Hive, Hue, Mesos, ...)
zingg: Scalable identity resolution, entity resolution, data mastering and deduplication using ML
the-apache-ignite-book: All code samples, scripts and more in-depth examples for The Apache Ignite Book. Includes Apache Ignite 2.6 or above
mongo-spark-jupyter: Docker environment that spins up a MongoDB replica set, Spark, and Jupyter Lab. Example code uses PySpark and the MongoDB Spark Connector.
osm4scala: Scala and Spark library focused on reading OpenStreetMap Pbf files.
spark-on-k8s-gcp-examples: Example Spark applications that run on Kubernetes and access GCP products, e.g., GCS, BigQuery, and Cloud PubSub
uberscriptquery: UberScriptQuery, a SQL-like DSL to make writing Spark jobs super easy
iis: Information Inference Service of the OpenAIRE system
epitweetr: ECDC Early warning tool using Twitter data
interview-refresh-java-bigdata: A one-stop repo for looking up code snippets on core Java concepts, SQL, data structures, and big data. It also includes interview questions asked in real life.
LSTM-TensorSpark: Implementation of an LSTM with TensorFlow, distributed on Apache Spark
openblockchain: {START HERE} Docker engine to roll your own openblockchain
rulegin: A lightweight rule engine system based on the JavaScript Engine, refactored from the open-source IoT project thingboard
algobox: Open Source algorithmic trading platform in Java / Python
almaren-framework: The Almaren Framework provides a simplified, consistent, minimalistic layer over Apache Spark while still allowing you to take advantage of native Apache Spark features. You can also combine it with standard Spark code.
T-Watch: Real Time Twitter Sentiment Analysis Product
pyspark-algorithms: PySpark Algorithms Book: https://www.amazon.com/dp/B07X4B2218/ref=sr_1_2
smart-data-lake: Smart Automation Tool for building modern Data Lakes and Data Pipelines
SANSA-Stack: Big Data RDF Processing and Analytics Stack built on Apache Spark and Apache Jena http://sansa-stack.github.io/SANSA-Stack/
CuBit: General-purpose, formally-verified, 64-bit operating system in SPARK/Ada for x86-64
generator-mitosis: A micro-service infrastructure generator based on Yeoman/Chatbot, Kubernetes/Docker Swarm, Traefik, Ansible, Jenkins, Spark, Hadoop, Kafka, etc.
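dpkb: A collection of big data material, including distributed storage engines, distributed compute engines, data warehouse construction, and more. Keywords: Hadoop, HBase, ES, Kudu, Hive, Presto, Spark, Flink, Kylin, ClickHouse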
Spark ALS: Recommendation algorithm implementations based on spark-ml, spark-mllib, and spark-streaming
spark-gdelt: Binding the GDELT universe in a Spark environment
nlp ryan: Study for Natural Language Processing & Deep Learning Framework
soda-spark: Soda Spark is a PySpark library that helps you test your data in Spark DataFrames
Clustering4Ever: C4E, a JVM-friendly library written in Scala for both local and distributed (Spark) clustering.