Ranlot / spark-streaming-visualize

Licence: other
Simple demonstration of how to build a complex real time machine learning visualization tool.

Programming Languages

Python
139335 projects - #7 most used programming language
Scala
5932 projects
Shell
77523 projects
HTML
75241 projects

Projects that are alternatives of or similar to spark-streaming-visualize

Data Accelerator
Data Accelerator for Apache Spark simplifies onboarding to Streaming of Big Data. It offers a rich, easy to use experience to help with creation, editing and management of Spark jobs on Azure HDInsights or Databricks while enabling the full power of the Spark engine.
Stars: ✭ 247 (+1443.75%)
Mutual labels:  apache-spark, streaming-data
Awesome Kafka
A list about Apache Kafka
Stars: ✭ 397 (+2381.25%)
Mutual labels:  apache-spark, streaming-data
SparkProgrammingInScala
Apache Spark Course Material
Stars: ✭ 57 (+256.25%)
Mutual labels:  apache-spark
connected-component
Map Reduce Implementation of Connected Component on Apache Spark
Stars: ✭ 68 (+325%)
Mutual labels:  apache-spark
MQL5-JSON-API
Metaquotes MQL5 - JSON - API
Stars: ✭ 183 (+1043.75%)
Mutual labels:  zeromq
re-gent
A Distributed Clojure agent for running remote functions
Stars: ✭ 18 (+12.5%)
Mutual labels:  zeromq
icicle
Icicle Streaming Query Language
Stars: ✭ 16 (+0%)
Mutual labels:  streaming-data
spark-sql-internals
The Internals of Spark SQL
Stars: ✭ 331 (+1968.75%)
Mutual labels:  apache-spark
leaflet heatmap
A simple visualization of Huzhou call data. Assuming the data volume is too large to render a heatmap directly in the browser, the heatmap-rendering step is moved offline for computation and analysis. Apache Spark is used to process the data in parallel and then to render the heatmap, after which leafletjs loads the OpenStreetMap layer and the heatmap layer for good interactivity. The rendering is currently implemented with Apache Spark; perhaps Spark is not well suited to this kind of computation, or my algorithm is poorly designed, since the parallel computation is slower than single-machine computation. The Apache Spark heatmap rendering and computation code is here: https://github.com/yuanzhaokang/ParallelizeHeatmap.git .
Stars: ✭ 13 (-18.75%)
Mutual labels:  apache-spark
ZMQ.jl
Julia interface to ZMQ
Stars: ✭ 114 (+612.5%)
Mutual labels:  zeromq
OpenLogReplicator
Open Source Oracle database CDC written purely in C++. Reads transactions directly from database redo log files and streams in JSON or Protobuf format to: Kafka, RocketMQ, flat file, network stream (plain TCP/IP or ZeroMQ)
Stars: ✭ 112 (+600%)
Mutual labels:  zeromq
spark-transformers
Spark-Transformers: Library for exporting Apache Spark MLLIB models to use them in any Java application with no other dependencies.
Stars: ✭ 39 (+143.75%)
Mutual labels:  apache-spark
zerorpc-dotnet
A .NET implementation of ZeroRPC
Stars: ✭ 21 (+31.25%)
Mutual labels:  zeromq
pyspark-cheatsheet
PySpark Cheat Sheet - example code to help you learn PySpark and develop apps faster
Stars: ✭ 115 (+618.75%)
Mutual labels:  apache-spark
net.jgp.books.spark.ch01
Spark in Action, 2nd edition - chapter 1 - Introduction
Stars: ✭ 72 (+350%)
Mutual labels:  apache-spark
spark-gradle-template
Apache Spark in your IDE with gradle
Stars: ✭ 39 (+143.75%)
Mutual labels:  apache-spark
richflow
A Node.js and JavaScript synchronous data pipeline processing, data sharing and stream processing library. Actionable & Transformable Pipeline data processing.
Stars: ✭ 17 (+6.25%)
Mutual labels:  streaming-data
spark-operator
Operator for managing the Spark clusters on Kubernetes and OpenShift.
Stars: ✭ 129 (+706.25%)
Mutual labels:  apache-spark
transit
Massively real-time city transit streaming application
Stars: ✭ 20 (+25%)
Mutual labels:  streaming-data
pravega-samples
Sample Applications for Pravega.
Stars: ✭ 43 (+168.75%)
Mutual labels:  streaming-data

"Real-time predictive analytics" has emerged as a topic of growing interest in the data science community. One factor contributing to the appeal of statistical learning methods based on live streaming data is the ability to generate models that react and adapt themselves to non-stationary data distribution in real time (as opposed to batch processing that needs to retrain models periodically).

While numerous implementations of online machine learning algorithms are publicly available, it is not always easy to find candid demonstrations of how to incorporate them into a lightweight real-time visualization platform. Among its many applications, such a tool would offer not only deeper insight into the dynamics of the models but also the ability to be alerted quickly when models start to misbehave.

The purpose of this project is to provide a simple demonstration of how one may "hack" together such a flow of data. One should regard this as a basic do-it-yourself toy tutorial for getting started rather than a complete real-world implementation.

  • For the sake of simplicity, we prepare a synthetic data set consisting of random points (y, x1, x2) which approximately satisfy the linear relationship y = c1 x1 + c2 x2 + noise, where the coefficients (c1, c2) and the intensity of the noise serve as control parameters (a minimal sketch of this data model follows below).
  • Adopting supervised learning terminology, one may refer to y as a label and to each instance of (x1, x2) as a feature vector. The objective then becomes to recover the values of the coefficients (c1, c2) given the feature vectors and their labels.
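
The repository generates its data from a bash script, but the underlying data model is easy to illustrate in a few lines of Python. In this sketch the coefficient values, the noise level, and the helper name make_batch are illustrative assumptions, not values taken from the project:

```python
import numpy as np

# Illustrative control parameters; the repository's own values live in its scripts.
c1, c2 = 1.5, -0.8   # "true" regression coefficients to be recovered
noise_level = 0.1    # intensity of the additive noise

def make_batch(n=60):
    """Generate n random points (y, x1, x2) with y = c1*x1 + c2*x2 + noise."""
    x = np.random.uniform(-1.0, 1.0, size=(n, 2))
    y = c1 * x[:, 0] + c2 * x[:, 1] + noise_level * np.random.randn(n)
    return np.column_stack([y, x])

print(make_batch(3))  # three sample rows in (y, x1, x2) order
```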

  • In order to mimic streaming data, one can generate batches of feature vectors and labels (60 at a time in our case) and save them as new HDFS files every second or so in a directory that the Spark Streaming application uses as an input source.

(You can do this by running the bash script dataStreamer.sh directly from the command line; a Python equivalent is sketched below.)
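
The project drives this step with dataStreamer.sh; for readers who prefer Python, here is a sketch of the same idea. The directory name, file-naming scheme, and parameter values are assumptions, not taken from the script:

```python
import os
import time
import numpy as np

WATCH_DIR = "/tmp/streaming-input"     # hypothetical; must match the directory Spark watches
c1, c2, noise_level = 1.5, -0.8, 0.1   # same illustrative parameters as above

os.makedirs(WATCH_DIR, exist_ok=True)
for i in range(600):  # stream one 60-point batch per second for ~10 minutes
    x = np.random.uniform(-1.0, 1.0, size=(60, 2))
    y = c1 * x[:, 0] + c2 * x[:, 1] + noise_level * np.random.randn(60)
    # File-based streaming sources only pick up files that appear atomically,
    # so write under a hidden temporary name first and then rename into place.
    tmp = os.path.join(WATCH_DIR, f".batch-{i}.tmp")
    np.savetxt(tmp, np.column_stack([y, x]), delimiter=",")
    os.rename(tmp, os.path.join(WATCH_DIR, f"batch-{i}.csv"))
    time.sleep(1)
```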

  • Every time a new batch of data is produced, the Spark application applies a least-squares minimizer (StreamingLinearRegressionWithSGD in our case), which updates the regression coefficients (c1, c2).

(For simplicity, you can do this by running linearPublisher.scala directly from your IDE; a PySpark equivalent is sketched below.)
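
The project's trainer lives in linearPublisher.scala (Scala); as an illustration of the same MLlib call, here is a minimal PySpark sketch. The step size, iteration count, and directory path are assumptions:

```python
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.mllib.regression import LabeledPoint, StreamingLinearRegressionWithSGD

sc = SparkContext("local[2]", "StreamingLinearRegressionDemo")
ssc = StreamingContext(sc, 1)  # one-second micro-batches

def parse(line):
    """Each line is 'y,x1,x2', as written by the data streamer above."""
    y, x1, x2 = (float(v) for v in line.split(","))
    return LabeledPoint(y, [x1, x2])

training = ssc.textFileStream("/tmp/streaming-input").map(parse)

model = StreamingLinearRegressionWithSGD(stepSize=0.5, numIterations=25)
model.setInitialWeights([0.0, 0.0])  # start the estimates of (c1, c2) at zero
model.trainOn(training)              # weights are refined on every new batch

# Print the latest (c1, c2) estimates on the driver after each batch.
training.foreachRDD(lambda rdd: print(model.latestModel().weights))

ssc.start()
ssc.awaitTermination()
```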

Of course, in a real-world scenario, generating real-time labels would probably have its own intrinsic ambiguities depending on the particular business you happen to operate in. Furthermore, the underlying data would not come from a simple bash script but from more sophisticated sources such as IoT devices or financial, weather, and social network feeds.

  • The final step consists of providing a real-time visualization of the model and of its history. This can be accomplished through the publish-subscribe messaging pattern using ZeroMQ. In our case, the Spark Streaming application acts as the publisher and communicates via a TCP socket with an HTTP web server, which acts as the subscriber and prepares a visual rendering of the dynamics of the model (localhost:5556).

(For this, you'll need to have started the Flask server by running flaskSubscriber.py; the messaging pattern is sketched below.)
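
In the project, the publisher lives inside the Scala streaming application and the subscriber inside flaskSubscriber.py; the sketch below shows the bare pyzmq pattern connecting the two ends. The JSON message format and function names are assumptions made for illustration; only the port (5556) comes from the description above:

```python
import json
import zmq

ctx = zmq.Context()

# --- publisher side (played by the Spark Streaming application) ---
pub = ctx.socket(zmq.PUB)
pub.bind("tcp://*:5556")  # the TCP port mentioned above

def publish_weights(weights):
    """Push the latest (c1, c2) estimates to all connected subscribers."""
    pub.send_string(json.dumps({"c1": weights[0], "c2": weights[1]}))

# --- subscriber side (played by the Flask web server) ---
sub = ctx.socket(zmq.SUB)
sub.connect("tcp://localhost:5556")
sub.setsockopt_string(zmq.SUBSCRIBE, "")  # no topic filter: receive everything

def receive_weights():
    """Block until the next model update arrives, then decode it."""
    return json.loads(sub.recv_string())
```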

  • The illustration provides a cartoon summary of the flow of data described above.

Disclaimer:

As stated at the beginning, this project is intended to be a demonstration / tutorial showing that a complex visualization system requiring the wiring together of many disparate technologies can be accomplished quite simply in a few lines of code. As such, no special care has been given to "portability" or "professionalism". Rather, the whole enterprise should be considered a "hack" that may (hopefully) be a source of inspiration for others.
