
probe-scraper

Scrape Telemetry probe data from Firefox repositories.

This extracts per-version Telemetry probe data for Firefox and other Mozilla products from registry files like Histograms.json and Scalars.yaml. The data allows answering questions like "in which Firefox versions is this Telemetry probe present?". Probes defined outside of Histograms.json - like the CSS use counters - are also included in the output data.

The data is pulled from two different sources: the registry files in mozilla-central (for Firefox) and the git repositories of Glean-based products and libraries listed in repositories.yaml.

Probe Scraper outputs JSON to https://probeinfo.telemetry.mozilla.org. Effectively, this creates a REST API which can be used by downstream tools like mozilla-schema-generator and various data dictionary type applications (see below).

An OpenAPI reference to this API is available:

probeinfo API docs

A web tool to explore the Firefox-related data is available at probes.telemetry.mozilla.org. A similar view for Glean-based data is under development in the Glean Dictionary.

Adding a New Glean Repository

To scrape a git repository for probe definitions, an entry needs to be added in repositories.yaml. The exact format of the entry depends on whether you are adding an application or a library. See below for details.

Adding an application

For a given application, Glean metrics are emitted by the application itself, by any Glean-using libraries it depends on, and by the Glean library proper. Probe scraper therefore needs a way to find all of those dependencies to determine the full set of metrics emitted by the application.

To that end, each application should specify a dependencies parameter: a list of the Glean-using libraries the application uses. Each entry should be a library name as specified by that library's library_names parameter.

For Android applications, if you're not sure what the dependencies of the application are, you can run the following command at the root of the project folder:

$ ./gradlew :app:dependencies

See the full application schema documentation for descriptions of all the available parameters.
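As an illustration, a hypothetical application entry might look like the following. The application name, URL, and file paths are made up, and only a subset of parameters is shown; the schema documentation is authoritative:

```yaml
my-app:                            # hypothetical application name
  url: https://github.com/example/my-app
  notification_emails:
    - telemetry-alerts@example.com
  metrics_files:
    - app/metrics.yaml
  dependencies:
    # Each entry matches a name declared in some library's library_names.
    - glean-core
    - org.example:my-lib
```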

Adding a library

Probe scraper also needs a way to map dependencies back to an entry in the repositories.yaml file. Therefore, any libraries defined should also include their build-system-specific library names in the library_names parameter.

See the full library schema documentation for descriptions of all the available parameters.
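Similarly, a hypothetical library entry declaring its build-system-specific names might look like this (all values made up; consult the schema documentation for the real parameter set):

```yaml
my-lib:                            # hypothetical library name
  url: https://github.com/example/my-lib
  notification_emails:
    - telemetry-alerts@example.com
  metrics_files:
    - lib/metrics.yaml
  library_names:
    # Build-system-specific names, e.g. Maven coordinates for Android.
    - org.example:my-lib
```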

Developing the probe-scraper

You can choose to develop using the container, or locally. Developing in the container is slower, since every change triggers a container rebuild, but it ensures that your PR passes the CircleCI build/test phases.

Local development

Instead of installing the requirements into your global Python environment, you may wish to start by creating and activating a Python virtual environment. The .gitignore expects it to be called ENV or venv:

python -m venv venv
. venv/bin/activate

Install the requirements:

pip install -r requirements.txt
pip install -r test_requirements.txt
python setup.py develop

Run the tests. By default this does not run tests that require a web connection:

pytest tests/

To run all tests, including those that require a web connection:

pytest tests/ --run-web-tests

To test whether the code conforms to the style rules, you can run:

python -m black --check probe_scraper tests ./*.py
flake8 --max-line-length 100 probe_scraper tests ./*.py
yamllint repositories.yaml .circleci
python -m isort --profile black --check-only probe_scraper tests ./*.py

To render API documentation locally to index.html:

make apidoc

Developing using the container

Run tests in container. This does not run tests that require a web connection:

export COMMAND='pytest tests/'
make run

To run all tests, including those that require a web connection:

make test

To test whether the code conforms to the style rules, you can run:

make lint

Tests with Web Dependencies

Any tests that require a web connection to run should be marked with @pytest.mark.web_dependency.

These will not run by default, but will run on CI.
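For example, a test that fetches a live registry file could be marked like this (the test name and body are illustrative, not taken from the test suite):

```python
import pytest


@pytest.mark.web_dependency
def test_scrape_live_registry():
    # Hits the network, so it is skipped unless --run-web-tests is
    # passed (or the suite runs on CI).
    ...
```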

Performing a Dry-Run

Before opening a PR, it's good to test the code you wrote against the production data. You can restrict the run to a specific Firefox version using --firefox-version:

export COMMAND='python -m probe_scraper.runner --firefox-version 65 --dry-run'
make run

or locally via:

python -m probe_scraper.runner --firefox-version 65 --dry-run

Including --dry-run means emails will not be sent.

Additionally, you can test just on Glean repositories:

export COMMAND='python -m probe_scraper.runner --glean --dry-run'
make run

By default that will test against every Glean repository, which might take a while. If you want to test against just one (e.g. a new repository you're adding), you can use the --glean-repo argument to just test the repositories you care about:

export COMMAND='python -m probe_scraper.runner --glean --glean-repo glean-core --glean-repo glean-android --glean-repo burnham --dry-run'
make run

Replace burnham in the example above with your repository and its dependencies.

You can also do the dry-run locally:

python -m probe_scraper.runner --glean --glean-repo glean-core --glean-repo glean-android --glean-repo burnham --dry-run

Module overview

The module is built around the following data flow:

  • scrape registry files from mozilla-central and clone registry files from the configured git repositories
  • extract probe data from the files
  • transform probe data into output formats
  • save to disk

The code layout consists mainly of:

  • probe_scraper
    • runner.py - the central script, ties the other pieces together
    • scrapers
      • buildhub.py - pull build info from the BuildHub service
      • moz_central_scraper.py - loads probe registry files for multiple versions from mozilla-central
      • git_scraper.py - loads probe registry files from a git repository (no version or channel support yet, just per-commit)
    • parsers/ - extract probe data from the registry files
    • transform_*.py - transform the extracted raw data into output formats
  • tests/ - the unit tests
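The flow above can be sketched end to end as follows. All function names and data here are illustrative stand-ins, not the real probe_scraper API:

```python
def scrape():
    # scrapers/: fetch registry files from mozilla-central and git repositories
    return {"Histograms.json": {"A11Y_CONSUMERS": {"kind": "enumerated"}}}

def parse(registry_files):
    # parsers/: extract per-probe data from the registry files
    return {
        "histogram/" + name: definition
        for name, definition in registry_files["Histograms.json"].items()
    }

def transform(probes):
    # transform_*.py: shape the raw probe data into the output formats
    return {"all_probes": probes}

def save(output):
    # runner.py would serialize this to the output directory
    return output

result = save(transform(parse(scrape())))
```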

Accessing the data files

The processed probe data is serialized to the disk in a directory hierarchy starting from the provided output directory. The directory layout resembles a REST-friendly structure.

|-- product
    |-- general
    |-- revisions
    |-- channel (or "all")
        |-- ping type
            |-- probe type (or "all_probes")

For example, all the JSON probe data in the main ping for the Firefox Nightly channel can be accessed with the following path: firefox/nightly/main/all_probes. The probe data for all the channels (same product and ping) can be accessed instead using firefox/all/main/all_probes.

The root directory for the output generated from the scheduled job can be found at https://probeinfo.telemetry.mozilla.org/. All the probe data for Firefox coming from the main ping can be found at https://probeinfo.telemetry.mozilla.org/firefox/all/main/all_probes.
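Assuming the directory layout above maps directly to URL paths, building and fetching these URLs from Python might look like this (the helper below is a sketch, not part of probe-scraper):

```python
import json
from urllib.request import urlopen

BASE_URL = "https://probeinfo.telemetry.mozilla.org"

def probe_data_url(product, channel="all", ping="main", probe_type="all_probes"):
    # Mirrors the product/channel/ping-type/probe-type directory layout.
    return f"{BASE_URL}/{product}/{channel}/{ping}/{probe_type}"

url = probe_data_url("firefox", channel="nightly")
# Fetching the JSON requires a web connection:
# probes = json.load(urlopen(url))
```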

Accessing Glean metrics data

Glean data is generally laid out as follows:

|-- glean
    |-- repositories
    |-- general
    |-- repository-name
        |-- general
        |-- metrics

For example, the metrics data for a repository called fenix would be found at /glean/fenix/metrics. The time the data was last updated for that repository can be found at /glean/fenix/general.

A list of available repositories is at /glean/repositories.
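Assuming the same path-to-URL mapping as above, a small sketch for the Glean endpoints (again, a hypothetical helper, not probe-scraper code):

```python
BASE_URL = "https://probeinfo.telemetry.mozilla.org"

def glean_url(repository=None, endpoint="metrics"):
    # /glean/repositories lists all repos; /glean/<repo>/metrics and
    # /glean/<repo>/general hold the per-repository data.
    if repository is None:
        return f"{BASE_URL}/glean/repositories"
    return f"{BASE_URL}/glean/{repository}/{endpoint}"

fenix_metrics = glean_url("fenix")
```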
