mozilla / telemetry-streaming

Licence: other
Spark Streaming ETL jobs for Mozilla Telemetry

Programming languages: scala, shell

This repository is no longer in use at Mozilla! It was designed to run on our AWS-based telemetry infrastructure.

telemetry-streaming

Spark Streaming ETL jobs for Mozilla Telemetry

This service currently contains jobs that aggregate error data at 5-minute intervals. It is responsible for generating the internal-only error_aggregates and experiment_error_aggregates parquet tables at Mozilla.

Issue Tracking

Please file bugs related to the error aggregates streaming job in the Datasets: Error Aggregates component.

Deployment

The jobs defined in this repository are generally deployed as streaming jobs within our hosted Databricks account. Some are instead deployed as periodic batch jobs via Airflow, using wrappers codified in telemetry-airflow that spin up EMR clusters whose configuration is governed by emr-bootstrap-spark. Changes in production behavior that don't seem to correspond to changes in this repository's code could therefore be related to changes in those other projects.

Amplitude Event Configuration

Some of the jobs defined in telemetry-streaming transform telemetry events and republish them to Amplitude for further analysis. Events are filtered and transformed according to JSON configurations. If you're creating or updating such a schema, see:

Development

The recommended workflow is to edit the source code in your favorite editor and run the tests via sbt. Some common sbt invocations:

  • sbt test # run the basic set of tests (good enough for most purposes)
  • sbt "testOnly *ErrorAgg*" # run only the test classes whose names match *ErrorAgg*
  • sbt "testOnly *ErrorAgg* -- -z version" # run only the test classes whose names match *ErrorAgg*, limited to test cases with "version" in their names
  • sbt dockerComposeTest # run the docker compose tests (slow)
  • sbt "dockerComposeTest -tags:DockerComposeTag" # run only tests with DockerComposeTag (while using docker)
  • sbt scalastyle test:scalastyle # run linter
  • sbt ci # run the full set of continuous integration tests

Some tests need Kafka to run. To run them from an IDE, first start the test cluster:

sbt dockerComposeUp

or via plain docker-compose:

export DOCKER_KAFKA_HOST=$(./docker_setup.sh)
docker-compose -f docker/docker-compose.yml up

Shut the cluster down afterwards:

sbt dockerComposeStop
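
The start/run/stop sequence above can be wrapped so the cluster is stopped even when the tests fail. The following is only a sketch: the function name with_kafka_cluster and the SBT override variable are hypothetical and not part of telemetry-streaming. It assumes that sbt dockerComposeUp returns once the cluster has started.

```shell
# Hypothetical helper (not part of telemetry-streaming): run a command while
# the Kafka test cluster is up, then stop the cluster whatever the outcome.

# SBT may be overridden, e.g. SBT=echo for a dry run that only prints the steps.
SBT="${SBT:-sbt}"

with_kafka_cluster() {
  # Start the test cluster (same command as documented above).
  "$SBT" dockerComposeUp || return 1

  # Run the caller's command and remember its exit status.
  "$@"
  local rc=$?

  # Always attempt to stop the cluster, even if the command failed.
  "$SBT" dockerComposeStop

  # Report the exit status of the caller's command, not of the shutdown.
  return "$rc"
}
```

After sourcing the snippet, a call such as with_kafka_cluster sbt "testOnly *ErrorAgg*" would start the cluster, run the given tests, and stop the cluster. Setting SBT=echo prints the sbt steps instead of running them.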