80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Function, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.

Stars: ✭ 406 (+88.84%)

Mutual labels: spark, hadoop

Orc

Apache ORC - the smallest, fastest columnar storage for Hadoop workloads

Stars: ✭ 389 (+80.93%)

Mutual labels: big-data, hadoop

Listenbrainz Server

Server for the ListenBrainz project

Stars: ✭ 420 (+95.35%)

Mutual labels: spark, big-data

Sparkle

Haskell on Apache Spark.

Stars: ✭ 419 (+94.88%)

Mutual labels: spark, apache-spark

Cleanframes

Type-class-based data cleansing library for Apache Spark SQL

Stars: ✭ 75 (-65.12%)

Mutual labels: spark, bigdata

Spark States

Custom state store providers for Apache Spark

Stars: ✭ 83 (-61.4%)

Mutual labels: spark, apache-spark

Logisland

Scalable stream processing platform for advanced real-time analytics on top of Kafka and Spark. LogIsland also supports MQTT and Kafka Streams (Flink is on the roadmap). The platform does complex event processing and is suitable for time series analysis. A large set of ready-to-use processors, data sources and sinks is available.

Stars: ✭ 97 (-54.88%)

Mutual labels: spark, big-data

Magellan

Geospatial Data Analytics on Spark

Stars: ✭ 507 (+135.81%)

Mutual labels: spark, big-data

Pdf

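Programming e-books: e-books and programming books covering C, C#, Docker, Elasticsearch, Git, Hadoop, HeadFirst, Java, Javascript, jvm, Kafka, Linux, Maven, MongoDB, MyBatis, MySQL, Netty, Nginx, Python, RabbitMQ, Redis, Scala, Solr, Spark, Spring, SpringBoot, SpringCloud, TCPIP, Tomcat, Zookeeper, artificial intelligence, big data, concurrent programming, databases, data mining, new interview questions, architecture design, algorithms, computer science, design patterns, software testing, refactoring and optimization, and more categories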

Stars: ✭ 12,009 (+5485.58%)

Mutual labels: spark, hadoop

Drill

Apache Drill is a distributed MPP query layer for self-describing data

Stars: ✭ 1,619 (+653.02%)

Mutual labels: big-data, hadoop

Dist Keras

Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark.

Stars: ✭ 613 (+185.12%)

Mutual labels: hadoop, apache-spark

Spark On K8s Operator

Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.

Stars: ✭ 1,780 (+727.91%)

Mutual labels: spark, apache-spark

Sparktutorial

Source code for James Lee's Apache Spark with Java course

Stars: ✭ 105 (-51.16%)

Mutual labels: spark, bigdata

Tennis Crystal Ball

Ultimate Tennis Statistics and Tennis Crystal Ball - Tennis Big Data Analysis and Prediction

Stars: ✭ 107 (-50.23%)

Mutual labels: big-data, bigdata

Goodreads etl pipeline

An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.

Stars: ✭ 793 (+268.84%)

Mutual labels: spark, apache-spark

Aliyun Emapreduce Datasources

Extended datasource support for Spark/Hadoop on Aliyun E-MapReduce.

Stars: ✭ 132 (-38.6%)

Mutual labels: spark, hadoop

Spark R Notebooks

R on Apache Spark (SparkR) tutorials for Big Data analysis and Machine Learning as IPython / Jupyter notebooks

Stars: ✭ 109 (-49.3%)

Mutual labels: big-data, bigdata

Bigdataclass

Two-day workshop that covers how to use R to interact with databases and Spark

Stars: ✭ 110 (-48.84%)

Mutual labels: spark, big-data

Genie

Distributed Big Data Orchestration Service

Stars: ✭ 1,544 (+618.14%)

Mutual labels: big-data, bigdata

Lambda Arch

Applying Lambda Architecture with Spark, Kafka, and Cassandra.

Stars: ✭ 111 (-48.37%)

Mutual labels: spark, bigdata

Xlearning Xdml

Extremely distributed machine learning

Stars: ✭ 113 (-47.44%)

Mutual labels: spark, hadoop

Movies-Analytics-in-Spark-and-Scala

Data cleaning, pre-processing, and analytics on a million movies using Spark and Scala.

Stars: ✭ 47 (-78.14%)

Mutual labels: big-data, hadoop

DaFlow

Apache Spark-based data flow (ETL) framework that supports multiple read and write destinations of different types, as well as multiple categories of transformation rules.

Stars: ✭ 24 (-88.84%)

Mutual labels: apache-spark, hadoop

Data Algorithms Book

MapReduce, Spark, Java, and Scala for Data Algorithms Book

Stars: ✭ 949 (+341.4%)

Mutual labels: spark, hadoop

Dataspherestudio

DataSphereStudio is a one-stop data application development and management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.

Stars: ✭ 1,195 (+455.81%)

Mutual labels: spark, hadoop
