Silex: something to help you spark.
Waimak: an open-source framework that makes it easier to create complex data flows in Apache Spark.
Rumble: ⛈️ Rumble 1.11.0 "Banyan Tree" 🌳 for Apache Spark. Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...). No install required (just a jar to download). Declarative machine learning and more.
Awesome Pulsar: a curated list of Pulsar tools, integrations and resources.
Docker Hadoop: a Docker container with a full Hadoop cluster setup, including Spark and Zeppelin.
Utils4s: a collection of test cases and related reference materials gathered while working with Scala and Spark.
Spark Nkp: Natural Korean Processor for Apache Spark.
Awesome Recommendation Engine: a small project that brings together what I learned in the Big Data Expert course from formacionhadoop.com, showing how to combine Apache Spark Streaming, Kafka, MongoDB, and Spark machine learning algorithms.
Spark Tda: SparkTDA is a package for Apache Spark providing topological data analysis functionality.
Delta Architecture: streaming data changes to a data lake with a Debezium and Delta Lake pipeline.
Gatk: official code repository for GATK versions 4 and up.
Pixiedust: Python helper library for Jupyter Notebooks.
Snappydata: Project SnappyData, a memory-optimized analytics database based on Apache Spark™ and Apache Geode™. Stream, transact, analyze, and predict in one cluster.
Optimus: 🚚 agile data preparation workflows made easy with dask, cudf, dask_cudf and pyspark.
Vagrant Projects: Vagrant projects for various use cases with Spark, Zeppelin, IPython / Jupyter, and SparkR.
Sparkmagic: Jupyter magics and kernels for working with remote Spark clusters.
Pucket: bucketing and partitioning system for Parquet.
Heracles: high-performance HBase / Spark SQL engine.
Spark: Apache Spark, a unified analytics engine for large-scale data processing.
Flint: a time series library for Apache Spark.
Tedsds: a Turbofan Engine Degradation Simulation Data Set example in Apache Spark.
Live Log Analyzer Spark: a Spark application for analyzing Apache access logs and detecting anomalies, with an accompanying Medium article.
Urhox: Urho3D extension library.
Sparkling Titanic: training models with Apache Spark and PySpark for the Titanic Kaggle competition.
Mlfeature: feature engineering toolkit for Spark MLlib.
Mare: MaRe leverages the power of Docker and Spark to run and scale your serial tools in MapReduce fashion.
Sparkjni: a heterogeneous Apache Spark framework.
Bigdata Interview: 🎯 🌟 a collection of big data interview questions gathered from around the web, together with my own summarized answers. Currently covers the Hadoop/Hive/Spark/Flink/HBase/Kafka/ZooKeeper frameworks.
Dockerfiles: 50+ DockerHub public images for Docker & Kubernetes - Hadoop, Kafka, ZooKeeper, HBase, Cassandra, Solr, SolrCloud, Presto, Apache Drill, Nifi, Spark, Consul, Riak, TeamCity and DevOps tools - built on the major Linux distros: Alpine, CentOS, Debian, Fedora, Ubuntu.
Tiledb Vcf: efficient variant-call data storage and retrieval library built on the TileDB storage library.
Spark Swagger: Spark (http://sparkjava.com/) support for Swagger (https://swagger.io/).
Mobius: C# and F# language bindings and extensions to Apache Spark.
Digitrecognizer: a Java convolutional neural network example for handwritten digit recognition.
Kylo: Kylo is a data lake management software platform and framework for enabling scalable enterprise-class data lakes on big data technologies such as Teradata, Apache Spark and/or Hadoop. Licensed under Apache 2.0; contributed by Teradata Inc.