Spark: Apache Spark - A unified analytics engine for large-scale data processing
Stars: ✭ 31,618 (+51732.79%)
Silexstarter: Starter app based on the Silex framework with an MVC, modular architecture, a scaffold generator, and an admin panel
Stars: ✭ 11 (-81.97%)
Sparkmagic: Jupyter magics and kernels for working with remote Spark clusters
Stars: ✭ 954 (+1463.93%)
Chronicler: Scala toolchain for InfluxDB
Stars: ✭ 24 (-60.66%)
Spark Tda: SparkTDA is a package for Apache Spark providing Topological Data Analysis functionality.
Stars: ✭ 45 (-26.23%)
Live log analyzer spark: Spark application that analyzes Apache access logs and detects anomalies, with an accompanying Medium article.
Stars: ✭ 14 (-77.05%)
Docker Hadoop: A Docker container with a full Hadoop cluster setup with Spark and Zeppelin
Stars: ✭ 54 (-11.48%)
Dockerfiles: 50+ DockerHub public images for Docker & Kubernetes - Hadoop, Kafka, ZooKeeper, HBase, Cassandra, Solr, SolrCloud, Presto, Apache Drill, Nifi, Spark, Consul, Riak, TeamCity and DevOps tools built on the major Linux distros: Alpine, CentOS, Debian, Fedora, Ubuntu
Stars: ✭ 847 (+1288.52%)
Optimus: 🚚 Agile Data Preparation Workflows made easy with dask, cudf, dask_cudf and pyspark
Stars: ✭ 986 (+1516.39%)
Silex: Silex is a static website builder in the cloud.
Stars: ✭ 958 (+1470.49%)
Kylo: Kylo is a data lake management software platform and framework for enabling scalable enterprise-class data lakes on big data technologies such as Teradata, Apache Spark and/or Hadoop. Kylo is licensed under Apache 2.0. Contributed by Teradata Inc.
Stars: ✭ 916 (+1401.64%)
Awesome Recommendation Engine: A small project that brings together what I learned in the Big Data Expert course from formacionhadoop.com. It shows how to work with Apache Spark Streaming, Kafka, MongoDB, and Spark machine learning algorithms.
Stars: ✭ 47 (-22.95%)
Data Algorithms Book: MapReduce, Spark, Java, and Scala for Data Algorithms Book
Stars: ✭ 949 (+1455.74%)
Flint: A Time Series Library for Apache Spark
Stars: ✭ 878 (+1339.34%)
Sparkling Titanic: Training models with Apache Spark and PySpark for the Titanic Kaggle competition
Stars: ✭ 12 (-80.33%)
Rumble: ⛈️ Rumble 1.11.0 "Banyan Tree"🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more
Stars: ✭ 58 (-4.92%)
Sparkjni: A heterogeneous Apache Spark framework.
Stars: ✭ 11 (-81.97%)
Spark Swagger: Spark (http://sparkjava.com/) support for Swagger (https://swagger.io/)
Stars: ✭ 25 (-59.02%)
Spark Submit Ui: A Play Framework-based UI for submitting Spark applications
Stars: ✭ 53 (-13.11%)
Spark Flamegraph: Easy CPU Profiling for Apache Spark applications
Stars: ✭ 30 (-50.82%)
Awesome Pulsar: A curated list of Pulsar tools, integrations and resources.
Stars: ✭ 57 (-6.56%)
Pucket: Bucketing and partitioning system for Parquet
Stars: ✭ 29 (-52.46%)
Heracles: High performance HBase / Spark SQL engine
Stars: ✭ 27 (-55.74%)
Delta Architecture: Streaming data changes to a Data Lake with Debezium and Delta Lake pipeline
Stars: ✭ 43 (-29.51%)
Tedsds: Turbofan Engine Degradation Simulation Data Set example in Apache Spark
Stars: ✭ 14 (-77.05%)
Pulsar Spark: When Apache Pulsar meets Apache Spark
Stars: ✭ 55 (-9.84%)
Urhox: Urho3D extension library
Stars: ✭ 13 (-78.69%)
Gatk: Official code repository for GATK versions 4 and up
Stars: ✭ 1,002 (+1542.62%)
Mlfeature: Feature engineering toolkit for Spark MLlib.
Stars: ✭ 12 (-80.33%)
Mare: MaRe leverages the power of Docker and Spark to run and scale your serial tools in MapReduce fashion.
Stars: ✭ 11 (-81.97%)
Pixiedust: Python Helper library for Jupyter Notebooks
Stars: ✭ 998 (+1536.07%)
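Bigdata Interview: 🎯 🌟 [Big data interview questions] A collection of big data interview questions gathered online, with the author's own summarized answers. Currently covers interview topics for the Hadoop/Hive/Spark/Flink/Hbase/Kafka/Zookeeper frameworks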
Stars: ✭ 857 (+1304.92%)
Utils4s: A collection of test cases and related reference material compiled while working with Scala and Spark
Stars: ✭ 1,070 (+1654.1%)
Tiledb Vcf: Efficient variant-call data storage and retrieval library using the TileDB storage library.
Stars: ✭ 26 (-57.38%)
Snappydata: Project SnappyData - memory optimized analytics database, based on Apache Spark™ and Apache Geode™. Stream, Transact, Analyze, Predict in one cluster
Stars: ✭ 995 (+1531.15%)
Mobius: C# and F# language binding and extensions to Apache Spark
Stars: ✭ 929 (+1422.95%)
Cache Service Provider: A Cache Service Provider for Silex, using the doctrine/cache package
Stars: ✭ 23 (-62.3%)
Digitrecognizer: Java Convolutional Neural Network example for handwritten digit recognition
Stars: ✭ 23 (-62.3%)
Waimak: Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark.
Stars: ✭ 60 (-1.64%)
Zemberek Nlp Server: REST Docker server built on the Zemberek Turkish NLP Java library
Stars: ✭ 60 (-1.64%)
Spark Nkp: Natural Korean Processor for Apache Spark
Stars: ✭ 50 (-18.03%)
Vagrant Projects: Vagrant projects for various use-cases with Spark, Zeppelin, IPython / Jupyter, SparkR
Stars: ✭ 34 (-44.26%)