
yahoo / Streaming Benchmarks

License: Apache-2.0
Benchmarks for Low Latency (Streaming) solutions including Apache Storm, Apache Spark, Apache Flink, ...

Programming Languages

Java

Projects that are alternatives to or similar to Streaming Benchmarks

Ovenmediaengine
OvenMediaEngine (OME) is a streaming engine for real-time live broadcasting with sub-second latency.
Stars: ✭ 760 (+43.4%)
Mutual labels:  streaming, low-latency
Srs
SRS is a simple, high-efficiency, real-time video server that supports RTMP, WebRTC, HLS, HTTP-FLV, SRT, and GB28181.
Stars: ✭ 16,734 (+3057.36%)
Mutual labels:  low-latency, streaming
Neural sp
End-to-end ASR/LM implementation with PyTorch
Stars: ✭ 408 (-23.02%)
Mutual labels:  streaming
Sparta
Real Time Analytics and Data Pipelines based on Spark Streaming
Stars: ✭ 513 (-3.21%)
Mutual labels:  streaming
Butter Desktop
All the free parts of Popcorn Time
Stars: ✭ 4,329 (+716.79%)
Mutual labels:  streaming
Real Time Stock Market Prediction
The complete server-side architecture for real-time stock market prediction with machine learning, using TensorFlow.js to build the model and Kafka for real-time data streaming and pipelining.
Stars: ✭ 414 (-21.89%)
Mutual labels:  streaming
Goffmpeg
FFmpeg wrapper written in Go
Stars: ✭ 469 (-11.51%)
Mutual labels:  streaming
Parsec Sdk
Low latency, peer-to-peer, interactive game streaming.
Stars: ✭ 400 (-24.53%)
Mutual labels:  low-latency
Swell
Swell: API development tool that enables developers to test endpoints served over streaming technologies including Server-Sent Events (SSE), WebSockets, HTTP2, GraphQL, and gRPC.
Stars: ✭ 517 (-2.45%)
Mutual labels:  streaming
Hazelcast
Open-source distributed computation and storage platform
Stars: ✭ 4,662 (+779.62%)
Mutual labels:  low-latency
Pearplayer.js
A streaming media player supporting WebRTC, with hybrid multi-source, multi-protocol P2P-CDN delivery.
Stars: ✭ 512 (-3.4%)
Mutual labels:  streaming
Jocko
Kafka implemented in Golang with built-in coordination (No ZK dep, single binary install, Cloud Native)
Stars: ✭ 4,445 (+738.68%)
Mutual labels:  streaming
Fastbinaryencoding
Fast Binary Encoding is an ultra-fast, universal serialization solution for C++, C#, Go, Java, JavaScript, Kotlin, Python, Ruby, and Swift.
Stars: ✭ 421 (-20.57%)
Mutual labels:  low-latency
Scalecube Services
ScaleCube Services is a high-throughput, low-latency reactive microservices library built to scale. It features API gateways, service discovery, and service load balancing, and its architecture supports plug-and-play service communication modules. It is built for performant, low-latency real-time stream processing, and it is open and designed to accommodate change, with no sidecar in the form of a broker or anything else.
Stars: ✭ 482 (-9.06%)
Mutual labels:  low-latency
Livego
Go implementation of live streaming services
Stars: ✭ 411 (-22.45%)
Mutual labels:  streaming
Beam
Apache Beam is a unified programming model for Batch and Streaming
Stars: ✭ 5,149 (+871.51%)
Mutual labels:  streaming
Ramcloud
**No Longer Maintained** Official RAMCloud repo
Stars: ✭ 405 (-23.58%)
Mutual labels:  low-latency
Io
Dataset, streaming, and file system extensions maintained by TensorFlow SIG-IO
Stars: ✭ 427 (-19.43%)
Mutual labels:  streaming
Argus
Time series monitoring and alerting platform.
Stars: ✭ 468 (-11.7%)
Mutual labels:  low-latency
N3.js
Lightning fast, spec-compatible, streaming RDF for JavaScript
Stars: ✭ 521 (-1.7%)
Mutual labels:  streaming

Yahoo Streaming Benchmarks

Code licensed under the Apache 2.0 license. See LICENSE file for terms.

Background

At Yahoo we have adopted Apache Storm as our stream processing platform of choice. But that was in 2012, and the landscape has changed significantly since then. Because of this, we want to know what Storm is good at, where it needs to be improved, and what its limitations are compared to other systems, so we can recommend the best tool for the job to our customers. We started by looking for existing stream processing benchmarks to use for this evaluation, but all of them fell short in several fundamental areas; primarily, they did not test anything close to a real-world use case. So we decided to write a simple one ourselves. This is the first round of these tests. The tool here is not polished, and it covers only three tools and one specific use case. We hope to expand it in the future in terms of the tools tested, the variety of processing tested, and the metrics gathered.

Setup

We provide a script, stream-bench.sh, to set up and run the tests on a single node and to serve as an example of what to do when running the tests on a multi-node system. You also need Leiningen installed on your machines before you start the tests (e.g., on macOS you can install it with "brew install leiningen").

The script takes a list of operations to perform, and options are passed in through environment variables. The most significant of these are listed below; example invocations follow each list.

Operations

  • SETUP - downloads dependencies (Storm, Spark, Flink, Redis, and Kafka), cleans out any temp files, and compiles everything
  • STORM_TEST - Run the test using Storm on a single node
  • SPARK_TEST - Run the test using Spark on a single node
  • FLINK_TEST - Run the test using Flink on a single node
  • APEX_TEST - Run the test using Apex on a single node
  • STOP_ALL - if something goes wrong, stops all processes that were launched for the test
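
For example, to download the dependencies and then run the single-node Storm test, the operations can be chained in one invocation (a hypothetical example that assumes stream-bench.sh is executable and run from the repository root):

    ./stream-bench.sh SETUP STORM_TEST

If a run is interrupted, ./stream-bench.sh STOP_ALL cleans up any processes left behind.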

Environment Variables

  • STORM_VERSION - the version of Storm to compile and run against (default 0.10.0)
  • SPARK_VERSION - the version of Spark to compile and run against (default 1.5.1)
  • FLINK_VERSION - the version of Flink to compile and run against (default 0.10.1)
  • APEX_VERSION - the version of Apex to compile and run against (default 3.4.0)
  • LOAD - the number of messages per second to send to be processed (default 1000)
  • TEST_TIME - the number of seconds to run the test for (default 240)
  • LEIN - the location of the lein executable (default lein)
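
Environment variables are set on the command line in front of the script. For instance, a longer Flink run at a higher input load might look like this (the values are purely illustrative):

    LOAD=10000 TEST_TIME=600 ./stream-bench.sh SETUP FLINK_TEST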

The Test

The initial test is a simple advertising use case.

Ad events arrive through Kafka in JSON format. They are parsed into a more usable format, filtered for the ad view events that this processing cares about, and stripped of unneeded fields; new fields are then added by joining each event with campaign data stored in Redis. Finally, the ad views are aggregated by campaign and by time window and stored back into Redis, along with a timestamp indicating when they were updated.

Results

The current results we care about compare the latency that a particular processing system can deliver at a given input load. Running a test produces a few files: data/seen.txt and data/updated.txt. data/seen.txt contains the counts of events for each campaign and time window. data/updated.txt contains the latency, in milliseconds, between when the last event for a particular campaign window was emitted to Kafka and when that window was written into Redis.

References

Sanket Chintapalli, Derek Dagit, Bobby Evans, Reza Farivar, Thomas Graves, Mark Holderbaugh, Zhuo Liu, Kyle Nusbaum, Kishorkumar Patil, Boyang Jerry Peng, Paul Poulosky. "Benchmarking Streaming Computation Engines: Storm, Flink and Spark Streaming." First Annual Workshop on Emerging Parallel and Distributed Runtime Systems and Middleware. IEEE, 2016.
