
Narasimha1997 / Wavenet Stt

Licence: gpl-3.0
An end-to-end speech recognition system with Wavenet. Built using C++ and python.



STT-Wavenet

A Python and C++ implementation of end-to-end, sentence-level speech recognition, based on DeepMind's research on audio processing and synthesis. It builds on WaveNet: A Generative Model for Raw Audio, in which DeepMind proposed a neural network architecture that generates human-like audio from text; the same architecture can also perform speech-to-text. This repo provides a speech-to-text implementation of WaveNet. The model takes a Mel spectrogram as input and produces text as output using WaveNet plus a beam search decoder.
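To make the decoding step concrete, here is a minimal pure-Python sketch of CTC prefix beam search, the kind of decoder the roadmap refers to as ctc_beam_search_decoder. It is not the decoder used in this repo. The function name `ctc_beam_search`, the toy alphabet `"_ab"` (index 0 standing for the CTC blank), and the probability table are assumptions made for illustration.

```python
from collections import defaultdict


def ctc_beam_search(probs, alphabet, beam_width=4, blank=0):
    """Decode per-frame symbol probabilities with CTC prefix beam search.

    probs: list of frames, each a list of probabilities, one per symbol.
    alphabet: maps a symbol index to a character; index `blank` is the CTC blank.
    """
    # For each prefix keep two masses: paths ending in blank, paths ending in a symbol.
    beams = {(): (1.0, 0.0)}
    for frame in probs:
        candidates = defaultdict(lambda: [0.0, 0.0])
        for prefix, (p_blank, p_non_blank) in beams.items():
            total = p_blank + p_non_blank
            for symbol, p in enumerate(frame):
                if symbol == blank:
                    # A blank keeps the prefix unchanged.
                    candidates[prefix][0] += total * p
                elif prefix and prefix[-1] == symbol:
                    # Repeated symbol: collapses unless a blank separated the two.
                    candidates[prefix][1] += p_non_blank * p
                    candidates[prefix + (symbol,)][1] += p_blank * p
                else:
                    candidates[prefix + (symbol,)][1] += total * p
        ranked = sorted(candidates.items(), key=lambda kv: kv[1][0] + kv[1][1], reverse=True)
        beams = {prefix: (pb, pnb) for prefix, (pb, pnb) in ranked[:beam_width]}
    best = max(beams, key=lambda prefix: sum(beams[prefix]))
    return "".join(alphabet[i] for i in best)


# Toy input: three frames over the symbols (blank, 'a', 'b').
frames = [
    [0.1, 0.8, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
]
print(ctc_beam_search(frames, alphabet="_ab"))
```

A larger `beam_width` keeps more candidate prefixes per frame, trading speed for a better chance of finding the most probable transcript.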

Wavenet STT

Modifying the architecture/making changes to the exporter:

Those who wish to modify or experiment with the wavenet architecture can go to the core directory. Refer to README.md.

Building C++ api (libtensorflow_cc)

To build the C++ API, you have to build TensorFlow from scratch, along with its dependencies, as a monolithic shared library. Also make sure the headers are properly exported. If you don't want to build TensorFlow, use the pre-built shared libraries from FloopCZ/tensorflow_cc. I recommend building TensorFlow from scratch, because the result is compiled for your hardware. Prebuilt shared libraries can cause segmentation faults and illegal-instruction errors, since they may have been compiled with a different version of gcc and with hardware optimizations that your processor lacks.
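One way to gauge the illegal-instruction risk before trying a prebuilt library is to compare the CPU's advertised instruction-set flags with the ones the binary was built for. The sketch below assumes x86 Linux, where /proc/cpuinfo has a `flags` line. The helper name `missing_cpu_flags` and the flag list `avx`, `avx2`, `fma` are assumptions; the flags a given prebuilt binary actually needs depend on how it was compiled.

```python
def missing_cpu_flags(cpuinfo_text, required):
    """Return the required CPU flags that are absent from /proc/cpuinfo-style text."""
    available = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            available.update(value.split())
    return sorted(set(required) - available)


if __name__ == "__main__":
    # Assumed flag list: adjust it to what the prebuilt binary was compiled for.
    assumed_required = ["avx", "avx2", "fma"]
    try:
        with open("/proc/cpuinfo") as handle:
            print("missing:", missing_cpu_flags(handle.read(), assumed_required))
    except FileNotFoundError:
        print("/proc/cpuinfo is not available on this platform")
```

An empty result only means the listed flags are present; it does not rule out problems caused by a different gcc version.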

Building tensorflow from scratch:
  1. Install bazel, clone the tensorflow repository, run the configure script, and answer its questions carefully.
  2. Build libtensorflow_cc:
    bazel build -c opt --config monolithic //tensorflow:libtensorflow_cc.so
    
  3. Export headers:
    bazel build -c opt --config monolithic //tensorflow:install_headers
    
Build the wavenet module
  1. Install pybind11 and python headers.
    sudo apt install python3-dev &&
    pip3 install pybind11
    
  2. Go to platform/buld_env
  3. Make sure you properly set these env variables:
    TENSORFLOW_CC_LIBS_PATH=${HOME}/Documents/installation/tensorflow/lib        #libtensorflow_cc.so* directory
    TENSORFLOW_CC_INCLUDE_PATH=${HOME}/Documents/installation/tensorflow/include #tensorflow headers directory
    
    #misc : keep them default
    present_dir=$(pwd)
    SST_SOURCE_DIR=${present_dir}/../cc/src 
    SST_INCLUDE_PATH=${present_dir}/../cc/include
    SST_PYTHON_PATH=${present_dir}/../wavenetsst/wavenetpy
    
  4. Build wavenet CPython module
    ./tensorflow_env.sh python
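A quick check of the two TensorFlow paths from step 3 can catch a wrong directory before a long build. This is only a sketch: it assumes the variables are exported in the shell that runs it (rather than only edited inside the build script), and the helper name `check_build_env` is hypothetical.

```python
import os


def check_build_env(env, required=("TENSORFLOW_CC_LIBS_PATH", "TENSORFLOW_CC_INCLUDE_PATH")):
    """Return a list of human-readable problems; an empty list means the check passed."""
    problems = []
    for name in required:
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
        elif not os.path.isdir(value):
            problems.append(f"{name} points to a missing directory: {value}")
    return problems


if __name__ == "__main__":
    issues = check_build_env(os.environ)
    for issue in issues:
        print(issue)
    if not issues:
        print("build environment looks usable")
```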
    
Using python wavenet module

The wavenet Python module wavenetpy is located at platform/wavenetstt. It requires the wavenet CPython shared library. Since a static build is yet to be implemented, the shared library links dynamically against libtensorflow_cc at runtime, so make sure you export a proper LD_LIBRARY_PATH.

export TENSORFLOW_CC_LIBS_PATH=${HOME}/Documents/installation/tensorflow/lib
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${TENSORFLOW_CC_LIBS_PATH}
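If the import later fails with a missing-library error, it can help to confirm that libtensorflow_cc is actually reachable through LD_LIBRARY_PATH. The sketch below does that from Python; the helper name `find_on_library_path` is hypothetical. On Linux with glibc the dynamic loader reads LD_LIBRARY_PATH when the process starts, so the variable has to be exported before Python is launched.

```python
import glob
import os


def find_on_library_path(pattern, library_path):
    """Return files matching `pattern` in the directories of a colon-separated path."""
    matches = []
    for directory in library_path.split(":"):
        if directory:  # skip empty entries such as a leading ':'
            matches.extend(sorted(glob.glob(os.path.join(directory, pattern))))
    return matches


if __name__ == "__main__":
    found = find_on_library_path("libtensorflow_cc.so*", os.environ.get("LD_LIBRARY_PATH", ""))
    print(found or "libtensorflow_cc.so* not found on LD_LIBRARY_PATH")
```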

Install the requirements (numpy and librosa):

pip3 install -r requirements.txt

Example: Running speech recognition

from wavenetpy import WavenetSTT

# Load the model from the exported .pb file
wavenet = WavenetSTT('../../pb/wavenet-stt.pb')

# Run inference on an audio file and print the result
result = wavenet.infer_on_file('test.wav')
print(result)
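
For more than one file, the same call can be wrapped in a small loop. The sketch below relies only on the `infer_on_file` method shown above; the helper `transcribe_files` is hypothetical, and `FakeSTT` is a stand-in so the snippet runs without the compiled module. What `infer_on_file` returns or raises for a bad file is not documented here, so the error handling is an assumption.

```python
def transcribe_files(stt, paths):
    """Call stt.infer_on_file(path) for each path.

    Returns (results, errors): two dicts keyed by path.
    """
    results, errors = {}, {}
    for path in paths:
        try:
            results[path] = stt.infer_on_file(path)
        except Exception as error:  # one bad file should not abort the batch
            errors[path] = str(error)
    return results, errors


class FakeSTT:
    """Stand-in for WavenetSTT so the sketch runs without the compiled module."""

    def infer_on_file(self, path):
        if not path.endswith(".wav"):
            raise ValueError("unsupported file: " + path)
        return "transcript of " + path


results, errors = transcribe_files(FakeSTT(), ["a.wav", "notes.txt"])
print(results)
print(errors)
```

With the real module, pass a `WavenetSTT` instance in place of `FakeSTT()`.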

TODO and Roadmap:
  1. Build a static library to avoid dynamic linking with libtensorflow_cc.
  2. We are using librosa for MFCC; the goal is to use a custom C++ implementation.
  3. Use a custom C++ ctc_beam_search_decoder, because it is not supported in TensorFlow Lite.
  4. Provide a Dockerfile.
  5. Add a TensorFlow Lite implementation for embedded devices and Android.
  6. Add TensorFlow.js support.
  7. Optimize the C++ code.
  8. Provide a CI/CD pipeline for the C++ build.
  9. Provide a way to access the MFCC pointer directly instead of copying memory. This is not possible as of now because of a limitation in the TensorFlow C++ API. In other words, add a custom memory allocator for tensors.

Contributor guide

We welcome contributors, especially beginners. Contributors can:

  1. Raise issues
  2. Suggest features
  3. Fix issues and bugs
  4. Implement features specified in TODO and Roadmap.

Acknowledgements
  1. DeepMind
  2. buriburisuri for providing the pretrained ckpt files and wavenet.py.
  3. kingscraft for the TensorFlow reference implementation.
  4. Stack Overflow and the TensorFlow docs