Scalable stream processing platform for advanced real-time analytics on top of Kafka and Spark. LogIsland also supports MQTT and Kafka Streams (Flink is on the roadmap). The platform does complex event processing and is suitable for time series analysis. A large set of ready-to-use processors, data sources and sinks is available.

Stars: ✭ 97 (-52.68%)

Mutual labels: spark

Spark Kafka Writer

Write your Spark data to Kafka seamlessly

Stars: ✭ 175 (-14.63%)

Mutual labels: spark

Spark Dependencies

Spark job for dependency links

Stars: ✭ 82 (-60%)

Mutual labels: spark

Opaque

An encrypted data analytics platform

Stars: ✭ 129 (-37.07%)

Mutual labels: spark

Lehar

Visualize data using relative ordering

Stars: ✭ 81 (-60.49%)

Mutual labels: spark

Handyspark

HandySpark - bringing pandas-like capabilities to Spark dataframes

Stars: ✭ 158 (-22.93%)

Mutual labels: spark

Setl

A simple Spark-powered ETL framework that just works 🍺

Stars: ✭ 79 (-61.46%)

Mutual labels: spark

Bigdata Playground

A complete example of a big data application using: Kubernetes (kops/aws), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, NodeJS, Angular, GraphQL

Stars: ✭ 177 (-13.66%)

Mutual labels: spark-streaming

Bigdata docker

Big Data Ecosystem Docker

Stars: ✭ 161 (-21.46%)

Mutual labels: spark

Sparkling Graph

SparklingGraph provides an easy-to-use set of features that lets you process large-scale graphs using Spark and GraphX.

Stars: ✭ 139 (-32.2%)

Mutual labels: spark

Schemer

Schema registry for CSV, TSV, JSON, AVRO and Parquet schema. Supports schema inference and GraphQL API.

Stars: ✭ 97 (-52.68%)

Mutual labels: spark

Home

ApacheCN open-source organization: announcements, introduction, members, events, and contact channels

Stars: ✭ 1,199 (+484.88%)

Mutual labels: spark

Azuredatabricksbestpractices

Version 1 of Technical Best Practices of Azure Databricks based on real world Customer and Technical SME inputs

Stars: ✭ 186 (-9.27%)

Mutual labels: spark

Cleanframes

type-class based data cleansing library for Apache Spark SQL

Stars: ✭ 75 (-63.41%)

Mutual labels: spark

Airflow Pipeline

An Airflow docker image preconfigured to work well with Spark and Hadoop/EMR

Stars: ✭ 128 (-37.56%)

Mutual labels: spark

Dataspherestudio

DataSphereStudio is a one-stop data application development and management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.

Stars: ✭ 1,195 (+482.93%)

Mutual labels: spark

Learningapachespark

LearningApacheSpark

Stars: ✭ 155 (-24.39%)

Mutual labels: spark

Lpa Detector

Optimize and improve the Label propagation algorithm

Stars: ✭ 75 (-63.41%)

Mutual labels: spark

Spring Boot Quick

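🌿 Quick-start learning examples based on Spring Boot, integrating open-source frameworks the author has come across, such as RabbitMQ (delayed queues), Kafka, JPA, Redis, OAuth2, Swagger, JSP, Docker, Spring Batch, exception handling, log output, multi-module development, multi-environment packaging, caching, web crawlers, JWT, GraphQL, Dubbo, ZooKeeper, Async, and more 📌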

Stars: ✭ 1,819 (+787.32%)

Mutual labels: spark

Labs

Research on distributed system

Stars: ✭ 73 (-64.39%)

Mutual labels: spark

Spark

Firely's open source FHIR server

Stars: ✭ 174 (-15.12%)

Mutual labels: spark

Luigi Warehouse

A Luigi-powered analytics / warehouse stack

Stars: ✭ 72 (-64.88%)

Mutual labels: spark

Lift

The LinkedIn Fairness Toolkit (LiFT) is a Scala/Spark library that enables the measurement of fairness in large scale machine learning workflows.

Stars: ✭ 127 (-38.05%)

Mutual labels: spark

Usersessionbehaviorofflineanalysis

Sichuan University and 拓思爱诺 project for offline analysis of user session behavior data

Stars: ✭ 69 (-66.34%)

Mutual labels: spark

Movie recommend

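A Spark-based movie recommendation system, including a web crawler project, a website, an admin back-end system, and the Spark recommendation system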

Stars: ✭ 2,092 (+920.49%)

Mutual labels: spark-streaming

Kontextfrei

Writing application logic for Spark jobs that can be unit-tested without a SparkContext

Stars: ✭ 67 (-67.32%)

Mutual labels: spark

Hadoopcryptoledger

Hadoop Crypto Ledger - Analyzing CryptoLedgers, such as Bitcoin Blockchain, on Big Data platforms, such as Hadoop/Spark/Flink/Hive

Stars: ✭ 126 (-38.54%)

Mutual labels: spark

Rsparkling

RSparkling: Use H2O Sparkling Water from R (Spark + R + Machine Learning)

Stars: ✭ 65 (-68.29%)

Mutual labels: spark

Spark Practice

Apache Spark (PySpark) Practice on Real Data

Stars: ✭ 200 (-2.44%)

Mutual labels: spark

W2v

Word2Vec models with Twitter data using Spark. Blog:

Stars: ✭ 64 (-68.78%)

Mutual labels: spark

Scala Samples

Pieces of Scala code that explain Scala syntax and related topics, such as what you can do with them

Stars: ✭ 125 (-39.02%)

Mutual labels: spark

Pysparkgeoanalysis

🌐 Interactive Workshop on GeoAnalysis using PySpark

Stars: ✭ 63 (-69.27%)

Mutual labels: spark

Spark.jl

Julia binding for Apache Spark

Stars: ✭ 153 (-25.37%)

Mutual labels: spark

Roffildlibrary

Library for MQL5 (MetaTrader) with Python, Java, Apache Spark, AWS

Stars: ✭ 63 (-69.27%)

Mutual labels: spark

Spark Alchemy

Collection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive

Stars: ✭ 122 (-40.49%)

Mutual labels: spark

Waimak

Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark.

Stars: ✭ 60 (-70.73%)

Mutual labels: spark

Spark Nlp

State of the Art Natural Language Processing

Stars: ✭ 2,518 (+1128.29%)

Mutual labels: spark

Zparkio

Boilerplate framework to use Spark and ZIO together.

Stars: ✭ 121 (-40.98%)

Mutual labels: spark

Rumble

⛈️ Rumble 1.11.0 "Banyan Tree"🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more

Stars: ✭ 58 (-71.71%)

Mutual labels: spark

Streamline

StreamLine - Streaming Analytics

Stars: ✭ 151 (-26.34%)

Mutual labels: spark-streaming

Model Serving Tutorial

Code and presentation for Strata Model Serving tutorial

Stars: ✭ 57 (-72.2%)

Mutual labels: spark

Net.jgp.labs.spark

Apache Spark examples exclusively in Java

Stars: ✭ 55 (-73.17%)

Mutual labels: spark

Kotlin Spark Api

This project provides Kotlin bindings and several extensions for Apache Spark. We are looking to have this as a part of Apache Spark 3.x.

Stars: ✭ 183 (-10.73%)

Mutual labels: spark

Relation extraction

Relation extraction using deep learning (CNN)

Stars: ✭ 96 (-53.17%)

Mutual labels: spark

Spark Submit Ui

A Play Framework-based UI for submitting Spark applications

Stars: ✭ 53 (-74.15%)

Mutual labels: spark

Spark Ml Source Analysis

An analysis of the principles behind Spark ML algorithms and of their source-code implementations

Stars: ✭ 1,873 (+813.66%)

Mutual labels: spark

Isolation Forest

A Spark/Scala implementation of the isolation forest unsupervised outlier detection algorithm.

Stars: ✭ 139 (-32.2%)

Mutual labels: spark

Spark Py Notebooks

Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks

Stars: ✭ 1,338 (+552.68%)

Mutual labels: spark

Repository

Personal learning knowledge base covering data warehouse modeling, real-time computing, big data, Java, algorithms, and more.

Stars: ✭ 92 (-55.12%)

Mutual labels: spark

Linkis

Linkis makes it easy to connect to various back-end computation/storage engines (Spark, Python, TiDB...) and exposes various interfaces (REST, JDBC, Java...), with multi-tenancy, high performance, and resource control.

Stars: ✭ 2,323 (+1033.17%)

Mutual labels: spark

Quicksql

A Flexible, Fast, Federated (3F) SQL Analysis Middleware for Multiple Data Sources

Stars: ✭ 1,821 (+788.29%)

Mutual labels: spark

Spark Summit 2017 Sanfrancisco

Spark Summit 2017 San Francisco

Stars: ✭ 93 (-54.63%)

Mutual labels: spark

Big Data

🔧 Use dplyr to analyze Big Data 🐘