All Projects → adblockradio → Adblockradio

adblockradio / Adblockradio

Licence: mpl-2.0
An adblocker for live radio streams and podcasts. Machine learning meets Shazam.

Programming Languages

javascript
184084 projects - #8 most used programming language
python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Adblockradio

Ai Study
人工智能学习资料超全整理,包含机器学习基础ML、深度学习基础DL、计算机视觉CV、自然语言处理NLP、推荐系统、语音识别、图神经网路、算法工程师面试题
Stars: ✭ 93 (-93.39%)
Mutual labels:  ai
Objectron
Objectron is a dataset of short, object-centric video clips. In addition, the videos also contain AR session metadata including camera poses, sparse point-clouds and planes. In each video, the camera moves around and above the object and captures it from different views. Each object is annotated with a 3D bounding box. The 3D bounding box describes the object’s position, orientation, and dimensions. The dataset contains about 15K annotated video clips and 4M annotated images in the following categories: bikes, books, bottles, cameras, cereal boxes, chairs, cups, laptops, and shoes
Stars: ✭ 1,352 (-3.91%)
Mutual labels:  ai
Xspotify
A modified Spotify client with DRM bypass.
Stars: ✭ 1,376 (-2.2%)
Mutual labels:  adblock
Tensor Safe
A Haskell framework to define valid deep learning models and export them to other frameworks like TensorFlow JS or Keras.
Stars: ✭ 96 (-93.18%)
Mutual labels:  ai
Ikbt
A python package to solve robot arm inverse kinematics in symbolic form
Stars: ✭ 97 (-93.11%)
Mutual labels:  ai
Text predictor
Char-level RNN LSTM text generator📄.
Stars: ✭ 99 (-92.96%)
Mutual labels:  ai
Webots
Webots Robot Simulator
Stars: ✭ 1,324 (-5.9%)
Mutual labels:  ai
Nuitrack Sdk
Nuitrack™ is a 3D tracking middleware developed by 3DiVi Inc.
Stars: ✭ 103 (-92.68%)
Mutual labels:  ai
Helix theory
螺旋论(theory of helix)—— “熵减机理论(可用来构建AGI、复杂性系统等)”
Stars: ✭ 98 (-93.03%)
Mutual labels:  ai
Opencage
A modding toolkit for Alien: Isolation that covers a wide range of game content and configurations.
Stars: ✭ 98 (-93.03%)
Mutual labels:  ai
Frigate
NVR with realtime local object detection for IP cameras
Stars: ✭ 1,329 (-5.54%)
Mutual labels:  ai
Happy Transformer
A package built on top of Hugging Face's transformer library that makes it easy to utilize state-of-the-art NLP models
Stars: ✭ 97 (-93.11%)
Mutual labels:  ai
Monkeys
A strongly-typed genetic programming framework for Python
Stars: ✭ 98 (-93.03%)
Mutual labels:  ai
Toeicbert
TOEIC(Test of English for International Communication) solving using pytorch-pretrained-BERT model.
Stars: ✭ 95 (-93.25%)
Mutual labels:  ai
Lexiconner
Lexicon-based Named Entity Recognition
Stars: ✭ 102 (-92.75%)
Mutual labels:  ai
Msnoise
A Python Package for Monitoring Seismic Velocity Changes using Ambient Seismic Noise | http://www.msnoise.org
Stars: ✭ 94 (-93.32%)
Mutual labels:  signal-processing
Atlasnetv2
This repository contains the source codes for the paper AtlasNet V2 - Learning Elementary Structures.
Stars: ✭ 99 (-92.96%)
Mutual labels:  ai
Intro To Deep Learning
A collection of materials to help you learn about deep learning
Stars: ✭ 103 (-92.68%)
Mutual labels:  ai
Adblock Iran
Ad blocking rules for websites
Stars: ✭ 103 (-92.68%)
Mutual labels:  adblock
Dopamine
Dopamine is a research framework for fast prototyping of reinforcement learning algorithms.
Stars: ✭ 9,681 (+588.06%)
Mutual labels:  ai

Adblock Radio (project archived)

Adblock Radio

A library to block ads on live radio streams and podcasts. Machine learning meets Shazam.

Engine of AdblockRadio.com. Demo standalone player available here.

Build status: CircleCI

Help the project grow: Donate using Liberapay

Overview

A technical discussion is available here.

Radio streams are downloaded in predictor.js with the module adblockradio/stream-tireless-baler. Podcasts are downloaded in predictor-file.js.

In both cases, audio is then decoded to single-channel, 22050 Hz PCM with ffmpeg.

Chunks of about one second of PCM audio are piped into two sub-modules:

  • a time-frequency analyser (predictor-ml/ml.js), that analyses spectral content with a neural network.
  • a fingerprint matcher (predictor-db/hotlist.js), that searches for exact occurrences of known ads, musics or jingles.

In post-processing.js, results are gathered for each audio segment and cleaned.

A Readable interface, Analyser, is exposed to the end user. It streams objects containing the audio itself and all analysis results.

On a regular laptop CPU and with the Python time-frequency analyser, computations run at 5-10X for files and at 10-20% usage for live stream.

Getting started

Installation

Mandatory prerequisites:

You need Node.js (>= v10.12.x, but < 11) and NPM. Download it here. Pro-tip: to manage several node versions on your platform, use NVM.

On Debian Stretch:

apt-get install -y git ssh tar gzip ca-certificates build-essential sqlite3 ffmpeg

Note: works on Jessie, but installing ffmpeg is a bit painful. See here and there.

Optional prerequisites:

For best performance (~2x speedup) you should choose to do part of the computations with Python. Additional prerequisites are the following: Python (tested with v2.7.9), Keras (tested with v2.0.8) and Tensorflow (tested with CPU v1.4.0 and GPU v1.3.0).

On Debian:

apt-get install python-dev portaudio19-dev
pip install python_speech_features h5py numpy scipy keras tensorflow zerorpc sounddevice psutil

Note: if you do not have pip follow these instructions to install it.

Then install this module:
git clone https://github.com/adblockradio/adblockradio.git
cd adblockradio
npm install

Testing

Validate your installation with the test suite:

npm test

Command-line demo

At startup and periodically during runtime, filter configuration files are automatically updated from adblockradio.com/models/:

  • a compatible machine-learning model (model.keras or model.json + group1-shard1of1), for the time-frequency analyser.
  • a fingerprint database (hotlist.sqlite), for the fingerprint matcher.

Live stream analysis

Run the demo on French RTL live radio stream:

node demo.js

Here is a sample output of the demo script, showing an ad detected:

{
	"gain": 74.63,
	"ml": {
		"class": "0-ads",
		"softmaxraw": [
			0.996,
			0.004,
			0
		],
		"softmax": [
			0.941,
			0.02,
			0.039
		],
		"slotsFuture": 4,
		"slotsPast": 5
	},
	"hotlist": {
		"class": "9-unsure",
		"file": null,
		"matches": 1,
		"total": 7
	},
	"class": "0-ads",
	"metadata": {
		"artist": "Laurent Ruquier",
		"title": "L'été des Grosses Têtes",
		"cover": "https://cdn-media.rtl.fr/cache/wQofzw9SfgHNHF1rqJA3lQ/60v73-2/online/image/2014/0807/7773631957_laurent-ruquier.jpg"
	},
	"streamInfo": {
		"url": "http://streaming.radio.rtl.fr/rtl-1-44-128",
		"favicon": "https://cdn-static.rtl.fr/versions/www/6.0.637/img/apple-touch-icon.png",
		"homepage": "http://www.rtl.fr/",
		"audioExt": "mp3"
	},
	"predictorStartTime": 1531150137583,
	"playTime": 1531150155250,
	"tBuffer": 15.98,
	"audio": ...
}

Podcast analysis

It is also possible to analyse radio recordings. Run the demo on a recording of French RTL radio, including ads, talk and music:

node demo-file.js

Gradual outputs are similar to those of live stream analysis. An additional post-processing specific to recordings hides the uncertainties in predictions and shows big chunks for each class, with time stamps in milliseconds, making it ready for slicing.

[
	{
		"class": "1-speech",
		"tStart": 0,
		"tEnd": 58500
	},
	{
		"class": "0-ads",
		"tStart": 58500,
		"tEnd": 125500
	},
	{
		"class": "1-speech",
		"tStart": 125500,
		"tEnd": 218000
	},
	{
		"class": "2-music",
		"tStart": 218000,
		"tEnd": 250500
	},
	{
		"class": "1-speech",
		"tStart": 250500,
		"tEnd": 472949
	}
]

Note that when analyzing audio files, you still need to provide the name of a radio stream, because the algorithm has to load acoustic parameters and DB of known samples. Analysis of podcasts not tied to a radio is not yet supported, but may possibly be in the future.

Documentation

Usage

Below is a simple usage example. More thorough usage examples are available in the tests:

  • file/podcast analysis: test/file.js
  • live stream analysis: test/online.js
  • record a live stream, analyse it later: test/offline.js
const { Analyser } = require("adblockradio");

const abr = new Analyser({
	country: "France",
	name: "RTL",
	config: {
		...
	}
});

abr.on("data", function(obj) {
	...
});
Property Description Default
country Country of the radio stream according to radio-browser.info None
name Name of the radio stream according to radio-browser.info None
file File to analyse (optional, analyse the live stream otherwise) None

Methods

Acoustic model and hotlist files are refreshed automatically on startup. If you plan to continuously run the algo for a long time, you can trigger manual updates. Note those methods are only available in live stream analysis mode.

Method Parameters Description
refreshPredictorMl None Manually refresh the ML model (live stream only)
refreshPredictorHotlist None Manually refresh the hotlist DB (live stream only)
refreshMetadata None Manually refresh the metadata scraper (live stream only)
stopDl None Stop Adblock Radio (live stream only)

Optional configuration

Properties marked with a * are meant to be used only with live radio stream analysis, not file analysis where they are ignored.

Scheduling

Property Description Default
predInterval Send stream status to listener every N seconds 1
saveDuration* If enabled, save audio file and metadata every N predInterval times 10
modelUpdatesInterval If enabled, update model files every N minutes 60

Switches

Property Description Periodicity Default
enablePredictorMl Perform machine learning inference predInterval true
JSPredictorMl Use tfjs instead of Python for ML inference (slower) false
enablePredictorHotlist Compute audio fingerprints and search them in a DB predInterval true
saveAudio* Save stream audio data in segments on hard drive saveDuration true
saveMetadata Save a JSON with predictions saveDuration true
fetchMetadata* Gather metadata from radio websites saveDuration true
modelUpdates Keep ML and hotlist files up to date modelUpdatesInterval true

Paths

Property Description Default
modelPath Directory where ML models and hotlist DBs are stored process.cwd() + '/model'
modelFile Path of ML file relative to modelPath country + '_' + name + '/model.keras'
hotlistFile Path of the hotlist DB relative to modelPath country + '_' + name + '/hotlist.sqlite'
saveAudioPath* Root folder where audio and metadata are saved process.cwd() + '/records'

Output

Readable streams constructed with Analyser emit objects with the following properties. Some properties are only available when doing live radio analysis. They are marked with a *. Other specific to file analysis are marked with **.

  • audio*: Buffer containing a chunk of original (compressed) audio data.

  • ml: null if not available, otherwise an object containing the results of the time-frequency analyser

    • softmaxraw: an array of three numbers representing the softmax between ads, speech and music.
    • softmax: same as softmaxraw, but smoothed in time with slotsFuture data points in the future and slotsPast data points in the past. Smoothing weights are defined by consts.MOV_AVG_WEIGHTS in post-processing.js.
    • class: either 0-ads, 1-speech, 2-music or 9-unsure. The classification according to softmax.
  • hotlist: null if not available, otherwise an object containing the results of the fingerprint matcher.

    • file: if class is not "9-unsure", the reference of the file recognized.
    • total: number of fingerprints computed for the given audio segment.
    • matches: number of matching fingerprints between the audio segment and the fingerprint database.
    • class: either 0-ads, 1-speech, 2-music, 3-jingles or 9-unsure if not enough matches have been found.
  • class: final prediction of the algorithm. Either 0-ads, 1-speech, 2-music, 3-jingles or 9-unsure.

  • metadata*: live metadata, fetched and parsed by the module adblockradio/webradio-metadata.

  • streamInfo*: static metadata about the stream. Contains stream url, favicon, bitrate in bytes / s, audio files extension audioExt (mp3 or aac) and homepage URL.

  • gain: a dB value representing the average volume of the stream. Useful if you wish to normalize the playback volume. Calculated by mlpredict.py.

  • tBuffer*: seconds of audio buffer. Calculated by adblockradio/stream-tireless-baler.

  • predictorStartTime*: timestamp of the algorithm startup. Useful to get the uptime.

  • playTime*: approximate timestamp of when the given audio is to be played. TODO check this.

  • tStart**: lower boundary of the time interval linked with the prediction (in milliseconds)

  • tEnd**: upper boundary of the time interval linked with the prediction (in milliseconds)

Supported radios

The list of supported radios is available here.

Note to developers

Integrations of this module are welcome. Suggestions are available here.

A standalone demo player for web browsers is available here.

License

See LICENSE file.

Your contribution to this project is welcome, but might be subject to a contributor's license agreement.

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].