
iwatake2222 / pico-loud_talking_detector

License: Apache-2.0
A tinyML system using a Raspberry Pi Pico and TensorFlow Lite for Microcontrollers to detect loud talking. It can be utilized to encourage people to eat quietly to prevent the spread of the coronavirus and help in the fight against COVID-19.

Programming Languages

Jupyter Notebook
11667 projects
C++
36643 projects - #6 most used programming language

Projects that are alternatives to or similar to pico-loud_talking_detector

TPU-MobilenetSSD
Edge TPU Accelerator / Multi-TPU + MobileNet-SSD v2 + Python + Async + LattePandaAlpha/RaspberryPi3/LaptopPC
Stars: ✭ 82 (+382.35%)
Mutual labels:  raspberrypi, tensorflowlite
CurrentSense-TinyML
Spying on Microcontrollers using Current Sensing and embedded TinyML models
Stars: ✭ 71 (+317.65%)
Mutual labels:  tensorflowlite, tinyml
wiringpi-tft-tool
TFT Command Line Tool for Raspberry Pi
Stars: ✭ 35 (+105.88%)
Mutual labels:  raspberrypi
rust-crosscompiler-arm
Docker images for Rust dedicated to cross compilation for ARM v6 and more
Stars: ✭ 48 (+182.35%)
Mutual labels:  raspberrypi
background-radiation-monitor
Monitor and record background radiation levels with a cheap detector and a Raspberry Pi.
Stars: ✭ 25 (+47.06%)
Mutual labels:  raspberrypi
Pigrow
Raspberry Pi Grow Box Control Software
Stars: ✭ 98 (+476.47%)
Mutual labels:  raspberrypi
genie-server
The home server version of Almond
Stars: ✭ 237 (+1294.12%)
Mutual labels:  raspberrypi
balena-plant-saver
We're building a plant monitor (and saver) - this is the early stage
Stars: ✭ 68 (+300%)
Mutual labels:  raspberrypi
buzzer music
RPi Pico / MicroPython library to play music through one or more buzzers; it can automatically replace chords with fast arpeggios to simulate polyphony with a single buzzer. Music can easily be taken from onlinesequencer.net
Stars: ✭ 23 (+35.29%)
Mutual labels:  raspberry-pi-pico
unity-excavator
Physical simulations on Unity
Stars: ✭ 20 (+17.65%)
Mutual labels:  tensorflowlite
blinkt
A Rust library for the Pimoroni Blinkt!, and any similar APA102 or SK9822 LED strips or boards, on a Raspberry Pi.
Stars: ✭ 18 (+5.88%)
Mutual labels:  raspberrypi
Realtek-USB-Wireless-Adapter-Drivers
Realtek USB Wireless Adapter Drivers [0bda:f179] (Kernel 4.15.x ~ 5.9.x)
Stars: ✭ 34 (+100%)
Mutual labels:  raspberrypi
comi
ComiGO: A simple, cross-platform manga reader.
Stars: ✭ 34 (+100%)
Mutual labels:  raspberrypi
packages
PiKVM Packages
Stars: ✭ 18 (+5.88%)
Mutual labels:  raspberrypi
Poke-Pi-Dex
Our deep learning for computer vision related project for nostalgic poke weebs (Sistemi digitali, Unibo).
Stars: ✭ 18 (+5.88%)
Mutual labels:  raspberrypi
pinetime-updater
Flash firmware to PineTime the friendly wired way with OpenOCD
Stars: ✭ 53 (+211.76%)
Mutual labels:  raspberrypi
motor-hat
Node Module to control Adafruits MotorHAT for the RaspberryPi
Stars: ✭ 28 (+64.71%)
Mutual labels:  raspberrypi
WebGPIO
A simple web UI for controlling the GPIO pins on a Raspberry Pi
Stars: ✭ 69 (+305.88%)
Mutual labels:  raspberrypi
Report-IP-hourly
📬 Report Linux IP by email hourly.
Stars: ✭ 43 (+152.94%)
Mutual labels:  raspberrypi
gladys-gateway
An End-to-End Encrypted Gateway to access Gladys from the internet
Stars: ✭ 17 (+0%)
Mutual labels:  raspberrypi

Loud Talking Detector in FRISK

  • This tinyML system uses a Raspberry Pi Pico and TensorFlow Lite for Microcontrollers to detect loud talking. It can be utilized to encourage people in restaurants/cafes to eat quietly, to prevent the spread of the coronavirus and help in the fight against COVID-19.
    • It detects "talking" when people talk loudly
    • It doesn't detect "talking" when people talk quietly or when the sound is not talking (e.g. noise or music)

00_doc/pic.jpg

YouTube

00_doc/youtube.jpg

System overview

  • Deep Learning Model
    • A deep learning model is created to classify 10 or 5 seconds of audio into two types of sound ("Talking", "Not Talking")
      • Change the values of CLIP_DURATION and kClipDuration to switch the clip duration (10 sec / 5 sec); see the sketch after this list
    • The model is converted to TensorFlow Lite for Microcontrollers format
    • The training runs on Google Colaboratory
  • Device
    • The model is deployed to a Raspberry Pi Pico
    • A microphone and a display are connected to the Raspberry Pi Pico
    • The Raspberry Pi Pico captures sound from the microphone, judges whether it's loud talking and outputs a result to the display
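
The clip-duration constants are tied to the feature-extraction settings inherited from the micro_speech example. The following is a minimal sketch of that relationship, assuming the constant names used by micro_model_settings.h; the actual names and values in this project may differ.

// Illustrative only: constant names follow the micro_speech example;
// the real values in this project may differ.
constexpr int kAudioSampleFrequency   = 16000;  // [Hz]
constexpr int kClipDuration           = 10;     // [sec]; set to 5 for the 5 sec model
constexpr int kFeatureSliceDurationMs = 30;     // window length per feature slice
constexpr int kFeatureSliceStrideMs   = 20;     // hop between feature slices
constexpr int kFeatureSliceSize       = 40;     // frequency bins per slice
// Number of slices that cover one clip
constexpr int kFeatureSliceCount =
    (kClipDuration * 1000 - kFeatureSliceDurationMs) / kFeatureSliceStrideMs + 1;
constexpr int kFeatureElementCount = kFeatureSliceSize * kFeatureSliceCount;

Under these assumed values, a 10 sec clip yields 499 feature slices and a 5 sec clip yields 249.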

system_overview.png

Sound Category

  • Not Talking:
    • Quiet
    • Voice from a distance
    • Music
    • Noise
    • Others
  • Talking:
    • Voice
    • Voice (more than one person)
    • Voice + Music
    • Voice + Noise

type_category.png

How to make

Components

Connections

00_doc/connections.txt

How to Build

git clone https://github.com/iwatake2222/pico-loud_talking_detector.git
cd pico-loud_talking_detector
git submodule update --init
cd pico-sdk && git submodule update --init && cd ..
mkdir build && cd build

# For Windows Visual Studio 2019 (Developer Command Prompt for VS 2019)
# cmake .. -G "NMake Makefiles" -DCMAKE_BUILD_TYPE=Debug -DPICO_DEOPTIMIZED_DEBUG=on
cmake .. -G "NMake Makefiles"
nmake

# For Windows MSYS2 (Run the following commands on MSYS2)
# cmake .. -G "MSYS Makefiles" -DCMAKE_BUILD_TYPE=Debug -DPICO_DEOPTIMIZED_DEBUG=on
cmake .. -G "MSYS Makefiles" 
make

How to Debug on PC

  • If you want to debug this project on a PC, run cmake in the pj_loud_talking_detector/pj_loud_talking_detector directory
  • The created project uses TestBuffer instead of a microphone. You can change the audio data by modifying the C array in test_audio_data.h
  • wave2array.py is useful for converting a wave file to a C array (see the illustrative layout after this list)
  • I tested on Visual Studio 2019; I'm not sure whether it works in other environments
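
The exact contents of test_audio_data.h are not shown here, but a typical wave-to-C-array conversion produces a header roughly like the following (the symbol names, sample rate, and size are illustrative assumptions, not the project's actual file).

// Illustrative layout of a generated test_audio_data.h; the actual symbol
// names and sample format may differ.
#include <cstdint>

constexpr int kTestAudioSampleRate = 16000;       // [Hz], assumed
constexpr int kTestAudioDataSize   = 16000 * 10;  // 10 sec of samples, assumed
const int16_t kTestAudioData[kTestAudioDataSize] = {
    0, 12, -35, 87, /* ... remaining PCM samples ... */
};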

About Deep Learning Model

How to Create a Deep Learning Model

  • Run the training script 01_script/training/train_micro_speech_model_talking_10sec.ipynb on Google Colaboratory. It takes around 10 hours to train the model using a GPU instance
  • The original script is https://colab.research.google.com/github/tensorflow/tensorflow/blob/master/tensorflow/lite/micro/examples/micro_speech/train/train_micro_speech_model.ipynb
  • I made some modifications:
    • Use my dataset
    • Mix noise manually:
      1. Prepare original data: [Talking]
      2. Mix background: [Talking, Talking + Background]
      3. Mix noise: [Talking, Talking + Background, Talking + Noise, Talking + Background + Noise]
    • Separate test data from training data completely
      • Some clips are generated from the same video, so data leakage may happen if I randomly separate data following the original script
    • Change the wanted word list from [yes, no] to [talking, not_talking]
    • Remove the "SILENCE" and "UNKNOWN" categories
      • Because "SILENCE" and "UNKNOWN" are part of "Not Talking"
    • Change clip duration from 1 sec to 10 sec
    • Increase training steps
  • Note: You cannot run the script as-is because the data download will fail. I don't share the dataset due to copyright restrictions

Dataset

  • Details
    • Talking:
      • Talk show from YouTube
      • TV show
    • Not Talking:
      • Background (for augmentation and "Not Talking")
        • Restaurant / Coffee shop ambience
      • Noise (for augmentation and "Not Talking")
        • White noise, pink noise
  • The number of data
    • 10 sec model
      • Talking: 22,144
      • Not Talking: 20,364
    • 5 sec model
      • Talking: 38,854
      • Not Talking: 36,557

Software Design

The original project is from https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/micro/examples/micro_speech.

Dataflow

dataflow.png

Modules

modules.png

  • AudioBuffer:
    • provides an interface to access stored audio data in a ring block buffer
    • has three implementations: ADC (for an analog mic connected to the ADC), PDM (for a PDM mic), and TestBuffer (a prepared data array). I use PDM in this project
  • RingBlockBuffer:
    • consists of several blocks. Each block is 512 bytes, which is equal to the DMA transfer size and to the buffer size of the PDM mic module
    • 512 bytes (32 msec @ 16 kHz) is also convenient for the FeatureProvider, which generates feature data from 30 msec of audio data at 20 msec intervals
  • AudioProvider:
    • copies data from the ring block buffer to the local buffer for the requested time
    • converts data from uint8_t to int16_t if needed
    • places the data at sequential memory addresses
  • FeatureProvider:
    • almost the same as the original code.
  • Judgement:
    • judges whether the captured sound is "talking" using the following conditions (see the sketch after this list):
      • current score of "talking" >= 0.8
      • average score of "talking" >= 0.6
      • decibel >= -20 [dB] (0 [dB] is the max value of input (65536))
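
A minimal sketch of how these three conditions could be combined, assuming illustrative names and a simple peak-based decibel calculation (this is not the project's actual code):

// Sketch of the Judgement logic described above; names and the dB
// computation are illustrative.
#include <cmath>
#include <cstdint>

bool IsLoudTalking(float current_talking_score, float average_talking_score,
                   int32_t peak_amplitude /* 0..65536 */) {
  if (peak_amplitude <= 0) return false;
  // 0 [dB] corresponds to the maximum input value (65536)
  const float decibel =
      20.0f * std::log10(static_cast<float>(peak_amplitude) / 65536.0f);
  return current_talking_score >= 0.8f &&
         average_talking_score >= 0.6f &&
         decibel >= -20.0f;
}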

Performance

                      10 sec model         5 sec model
Accuracy              96.1 [%]             94.8 [%]
Processing time
  - Total             554 [msec]           298 [msec]
  - Preprocess        61 [msec]            31 [msec]
  - Inference         455 [msec]           226 [msec]
  - Other             38 [msec]            41 [msec]
Power consumption     3.3 [V] x 21 [mA]    3.3 [V] x 22 [mA]

Note: Power consumption is measured without OLED (with OLED, it's around 26 [mA]).

Future works

order_call.png

  • This is a very tiny system (fitting in a FRISK case!), so it can be built into an order call system in restaurants to encourage customers to eat quietly, to prevent the spread of the coronavirus
  • Need to reduce power consumption
    • The current system continuously captures audio and runs inference. However, a quick response is not important in many cases, so the inference frequency can be decreased; once every several seconds or once a minute is probably enough (see the sketch after this list)
    • Alternatively, using an analog circuit to check the voice level and wake the Pico from sleep mode may be a good idea
  • Need to improve accuracy
    • So far, the training data is very limited (most of it is Japanese speech)
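
As a rough illustration of lowering the inference frequency, the main loop could simply sleep between inferences. sleep_ms() and stdio_init_all() come from the Pico SDK (pico/stdlib.h); RunInferenceAndShowResult() is a hypothetical stand-in for this project's capture/inference/display path.

// Illustrative only: run inference every few seconds instead of continuously.
#include "pico/stdlib.h"

static void RunInferenceAndShowResult() {
  // Placeholder for the existing pipeline:
  // AudioProvider -> FeatureProvider -> inference -> Judgement -> display
}

int main() {
  stdio_init_all();
  while (true) {
    RunInferenceAndShowResult();
    sleep_ms(5000);  // idle between inferences to save power
  }
}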
