Ant-Brain / EfficientWord-Net

License: Apache-2.0
One-shot learning-based hotword detection.

EfficientWord-Net

Python versions: 3.6, 3.7, 3.8, 3.9

Hotword detection based on one-shot learning

Home assistants require special phrases called hotwords to get activated (e.g. "ok google").

EfficientWord-Net is a hotword detection engine based on one-shot learning, inspired by FaceNet's Siamese network architecture. It works much like face recognition: it needs only a few samples of your own custom hotword to get going. No extra training or huge datasets are required! This lets developers add custom hotwords to their programs without effort or extra charges. Just like Google Assistant's hotword detector, the engine performs best when 3-4 hotword samples are collected directly from the user. This repository is the official implementation of EfficientWord-Net as a Python library, from the authors.
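The face-recognition analogy can be illustrated with a minimal sketch. It assumes each audio clip has already been converted into a fixed-length embedding vector and that similarity is scored with cosine similarity. The function names, the toy 3-dimensional vectors, and the scoring rule are illustrative assumptions, not the library's actual internals.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_reference_score(query: Sequence[float],
                         references: Sequence[Sequence[float]]) -> float:
    """Highest similarity between the query and any stored reference."""
    return max(cosine_similarity(query, ref) for ref in references)


def is_match(query: Sequence[float],
             references: Sequence[Sequence[float]],
             threshold: float = 0.9) -> bool:
    """A query matches when it is close enough to at least one reference."""
    return best_reference_score(query, references) >= threshold


# Hypothetical toy embeddings standing in for a few recorded hotword samples.
references = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]
print(is_match([0.95, 0.05, 0.0], references))  # close to the references
print(is_match([0.0, 1.0, 0.0], references))    # far from the references
```

Because matching only compares a new embedding against a handful of stored ones, adding a hotword in this scheme means storing new reference vectors rather than retraining a model.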

The library is written purely in Python and uses Google's TFLite implementation for faster real-time inference.

Demo of EfficientWord-Net in Pi

EfficientWord-Net.mp4

Access preprint

The research paper is currently under review at IEEE. A preprint is available, and the training code will be released publicly once the paper is published.

Python Version Requirements

This library works with Python versions 3.6 to 3.9.
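A quick way to confirm that an interpreter falls in this range before installing is sketched below. The helper name is_supported is hypothetical and not part of the library.

```python
import sys

# Supported range as documented in this README.
SUPPORTED_MIN = (3, 6)
SUPPORTED_MAX = (3, 9)


def is_supported(version=None) -> bool:
    """Return True if the (major, minor) version lies in the documented range."""
    major_minor = tuple((version or sys.version_info)[:2])
    return SUPPORTED_MIN <= major_minor <= SUPPORTED_MAX


if not is_supported():
    print(
        f"Warning: Python {sys.version_info.major}.{sys.version_info.minor} "
        "is outside the documented 3.6 to 3.9 range"
    )
```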

Dependencies Installation

Before running the pip installation command for the library, a few dependencies need to be installed manually.

The tflite package cannot be listed in requirements.txt, so it is installed automatically when the package is first initialized on the system.

The librosa package is not required for inference-only use; it is installed automatically the first time generate_reference is called.


Package Installation

Run the following pip command:

pip install EfficientWord-Net

Then import the library with:

import eff_word_net

Demo

After installing the packages, you can run the demo script built into the library (ensure you have a working mic).

Access the documentation at: https://ant-brain.github.io/EfficientWord-Net/

Command to run demo

python -m eff_word_net.engine

Generating Custom Wakewords

For any new hotword, the library needs information about the hotword. This information is read from a file called {wakeword}_ref.json. For example, for the wakeword 'alexa', the library needs a file called alexa_ref.json.

These files can be generated with the following procedure:

One needs to collect 4 to 10 distinct-sounding pronunciations of a given wakeword and put them into a separate folder that doesn't contain anything else.

Alternatively, the following command generates audio files for a given word using the IBM neural TTS demo API. Kindly don't overuse it, for our sake (lol).

python -m eff_word_net.ibm_generate

Finally, run this command. It will ask for the input folder's location (containing the audio files) and the output folder (where the _ref.json file will be stored).

python -m eff_word_net.generate_reference
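Before pointing generate_reference at a folder, it can help to check that the folder meets the requirements described above: 4 to 10 audio files and nothing else. The sketch below is a hypothetical helper, not part of the library, and the set of accepted file extensions is an assumption, since this README does not list the supported audio formats.

```python
from pathlib import Path

# Assumption: the accepted audio formats are not listed in this README.
AUDIO_EXTENSIONS = {".wav", ".mp3"}


def check_sample_folder(folder, min_count=4, max_count=10):
    """Return a list of problems found in the sample folder (empty if none)."""
    entries = sorted(Path(folder).iterdir())
    problems = []

    # Anything that is not a regular audio file counts as a stray entry.
    stray = [
        p.name for p in entries
        if not p.is_file() or p.suffix.lower() not in AUDIO_EXTENSIONS
    ]
    if stray:
        problems.append(f"unexpected entries: {', '.join(stray)}")

    audio_count = len(entries) - len(stray)
    if not min_count <= audio_count <= max_count:
        problems.append(
            f"found {audio_count} audio files, expected {min_count} to {max_count}"
        )
    return problems


# Example usage (the path is a placeholder):
# for problem in check_sample_folder("/path/to/hello_samples"):
#     print(problem)
```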

The path of the generated reference file needs to be passed to the HotwordDetector instance.

HotwordDetector(
        hotword="hello",
        reference_file="/full/path/name/of/hello_ref.json",
        threshold=0.9,        # min confidence required to consider a trigger
        relaxation_time=0.8,  # default value, in seconds
)

The relaxation_time parameter sets the minimum time between any two triggers; any potential trigger that occurs within relaxation_time of the previous one is cancelled.

The detector uses a sliding-window approach, which results in multiple triggers for a single utterance of a hotword. The relaxation_time parameter can be used to suppress these repeated triggers; in most cases the default of 0.8 s will do.
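The effect of relaxation_time can be sketched as a simple time gate. This is an illustration of the idea, not the library's code: the class name RelaxedTrigger is hypothetical, and it assumes that suppressed triggers do not restart the timer, which the README does not specify.

```python
class RelaxedTrigger:
    """Accept a trigger only if enough time has passed since the last accepted one."""

    def __init__(self, relaxation_time: float = 0.8):
        self.relaxation_time = relaxation_time
        self._last_accepted = None  # timestamp of the last accepted trigger

    def accept(self, timestamp: float) -> bool:
        too_soon = (
            self._last_accepted is not None
            and timestamp - self._last_accepted < self.relaxation_time
        )
        if too_soon:
            return False  # cancelled: still inside the relaxation period
        self._last_accepted = timestamp
        return True


# One utterance near t=0 s and another near t=2 s, each producing
# several raw triggers from overlapping windows.
gate = RelaxedTrigger(relaxation_time=0.8)
raw_triggers = [0.00, 0.25, 0.50, 2.00, 2.25]
accepted = [t for t in raw_triggers if gate.accept(t)]
print(accepted)  # [0.0, 2.0]
```

Each burst of raw triggers collapses to a single accepted trigger, which is the behaviour the parameter is meant to provide.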


Out of the box sample hotwords

The library ships with predefined embeddings for a few wakewords, such as Mycroft, Google, Firefox, Alexa, Mobile, and Siri. They are stored in the library installation directory, whose path is available in the following variable:

from eff_word_net import samples_loc
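To see which reference files a directory contains, it can be scanned for the {wakeword}_ref.json naming pattern described earlier. The helper below is a hypothetical sketch, not part of the library; it takes any directory path, so samples_loc can be passed in once the library is installed.

```python
from pathlib import Path


def list_reference_hotwords(directory):
    """Return the hotword names that have a <name>_ref.json file in directory."""
    suffix = "_ref.json"
    return sorted(
        p.name[:-len(suffix)] for p in Path(directory).glob("*" + suffix)
    )


# Example usage with the library installed:
# from eff_word_net import samples_loc
# print(list_reference_hotwords(samples_loc))
```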

Try your first single hotword detection script

import os
from eff_word_net.streams import SimpleMicStream
from eff_word_net.engine import HotwordDetector
from eff_word_net import samples_loc

mycroft_hw = HotwordDetector(
        hotword="Mycroft",
        reference_file = os.path.join(samples_loc,"mycroft_ref.json"),
    )

mic_stream = SimpleMicStream()
mic_stream.start_stream()

print("Say Mycroft ")
while True:
    frame = mic_stream.getFrame()
    result = mycroft_hw.scoreFrame(frame)
    if result is None:
        # no voice activity
        continue
    if result["match"]:
        print("Wakeword uttered", result["confidence"])

Detecting Multiple Hotwords from Audio Streams

The library provides a computation-friendly way to detect multiple hotwords in a given stream, instead of running scoreFrame() for each wakeword individually.

import os
from eff_word_net.streams import SimpleMicStream
from eff_word_net.engine import HotwordDetector, MultiHotwordDetector
from eff_word_net import samples_loc
print(samples_loc)

alexa_hw = HotwordDetector(
        hotword="Alexa",
        reference_file = os.path.join(samples_loc,"alexa_ref.json"),
    )

siri_hw = HotwordDetector(
        hotword="Siri",
        reference_file = os.path.join(samples_loc,"siri_ref.json"),
    )

mycroft_hw = HotwordDetector(
        hotword="mycroft",
        reference_file = os.path.join(samples_loc,"mycroft_ref.json"),
        activation_count=3
    )

multi_hw_engine = MultiHotwordDetector(
        detector_collection = [
            alexa_hw,
            siri_hw,
            mycroft_hw,
        ],
    )

mic_stream = SimpleMicStream()
mic_stream.start_stream()

print("Say Mycroft / Alexa / Siri")

while True :
    frame = mic_stream.getFrame()
    result = multi_hw_engine.findBestMatch(frame)
    if(None not in result):
        print(result[0],f",Confidence {result[1]:0.4f}")
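The selection step behind a best-match call can be sketched in isolation. The example above shows that the result is a pair whose entries are None when nothing matched; the sketch below mirrors that shape but is otherwise an assumption. The function find_best_match, its dictionary inputs, and the rule "highest score above its own threshold wins" are illustrative, not the library's implementation.

```python
from typing import Dict, Optional, Tuple


def find_best_match(
    scores: Dict[str, float],
    thresholds: Dict[str, float],
) -> Tuple[Optional[str], Optional[float]]:
    """Pick the hotword with the highest score that clears its own threshold.

    Returns (None, None) when no hotword clears its threshold.
    """
    best_name, best_score = None, None
    for name, score in scores.items():
        if score < thresholds[name]:
            continue  # below this hotword's minimum confidence
        if best_score is None or score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


# Hypothetical per-hotword confidences for a single audio frame.
thresholds = {"Alexa": 0.9, "Siri": 0.9, "Mycroft": 0.9}
result = find_best_match({"Alexa": 0.95, "Siri": 0.97, "Mycroft": 0.5}, thresholds)
if None not in result:
    print(result[0], f",Confidence {result[1]:0.4f}")
```

A plausible source of the computational saving is that per-frame audio processing can be done once and shared across all hotwords, leaving only this cheap comparison step per hotword. That reading is an assumption and is not confirmed by this README.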

Access documentation of the library from here : https://ant-brain.github.io/EfficientWord-Net/

Change notes from v0.1.1 to v0.2.2

Major changes replace the complex logic for handling multiple triggers per utterance with simpler logic and a simpler API for programmers.

Introduces breaking changes.

Limitations in Current model

  • Trained on single words, so it may behave bizarrely with phrases like "Hey xxx".
  • The audio processing window is limited to 1 second, so it will not work effectively for longer hotwords.
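The 1-second limit can be pictured with a sketch of how fixed-length windows slide over an audio buffer. Only the 1-second window length comes from this README; the 16 kHz sample rate, the 0.25-second hop, and the function name sliding_windows are assumptions made for illustration.

```python
def sliding_windows(samples, sample_rate=16000, window_sec=1.0, hop_sec=0.25):
    """Yield overlapping fixed-length windows over a sequence of audio samples."""
    window = int(sample_rate * window_sec)
    hop = int(sample_rate * hop_sec)
    if window <= 0 or hop <= 0:
        raise ValueError("window and hop must span at least one sample")
    # Stop at the last position where a full window still fits.
    for start in range(0, len(samples) - window + 1, hop):
        yield samples[start:start + window]


# Two seconds of silence at the assumed 16 kHz rate.
audio = [0.0] * 32000
windows = list(sliding_windows(audio))
print(len(windows), len(windows[0]))
```

Each window sees at most 1 second of audio, so a hotword that takes longer to say never fits entirely inside a single window. The overlap between consecutive windows is also why one utterance can produce several raw triggers.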

FAQ :

  • Hotword performance is bad: if you run into an issue like this, feel free to ask about it in discussions.

CONTRIBUTION:

  • If you have ideas to make the project better, feel free to ping us in discussions.
  • The current logmelcalc.tflite graph can convert only one audio frame to a log-mel spectrogram at a time. It would be a great help if TensorFlow gurus out there could help us with this.

TODO :

  • Add an audio file handler in streams. PRs are welcome.
  • Remove the librosa requirement to encourage generating reference files directly on edge devices.
  • Add more detailed documentation explaining the sliding window concept.

SUPPORT US:

Our hotword detector's performance is notably lower than Porcupine's. We have thought about better NN architectures for the engine and hope to outperform Porcupine. This has been our undergrad project, so your support and encouragement will motivate us to develop the engine further. If you loved this project, recommend it to your peers, give us a 🌟 on GitHub and a clap 👏 on Medium.

LICENSE: Apache License 2.0
