DemisEom / Specaugment

Licence: apache-2.0
An implementation of SpecAugment, introduced by Google Brain, with TensorFlow and PyTorch

SpecAugment

This is an implementation of SpecAugment, a speech data augmentation method that operates directly on the spectrogram, written with TensorFlow and PyTorch. SpecAugment was introduced by Google Brain [1]. The project is released under the Apache 2.0 license, so please feel free to use it in your own project. Enjoy!

How to use

First, you need Python 3 installed, along with TensorFlow.

Next, install the audio libraries the package needs to work properly. To install the required packages, run the following command:

pip3 install SpecAugment

Then run the specAugment.py program. It modifies the spectrogram in three ways: warping it along the time axis, masking blocks of consecutive frequency channels, and masking blocks of consecutive time steps of the utterance.
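The two masking steps can be illustrated with a minimal NumPy sketch. This is not the code in specAugment.py: the function name `mask_spectrogram`, its parameters, and their default values are assumptions made for illustration, and time warping is left out for brevity. The sketch assumes the spectrogram has shape (n_mels, n_frames), as returned by librosa, and that masked cells are filled with zeros.

```python
import numpy as np


def mask_spectrogram(mel, freq_mask_width=27, time_mask_width=100, rng=None):
    """Zero out one random frequency band and one random time band.

    Hypothetical helper for illustration only; names and defaults are
    assumptions, not this package's API.
    """
    rng = np.random.default_rng() if rng is None else rng
    masked = mel.copy()  # leave the caller's array untouched
    n_mels, n_frames = masked.shape

    # Frequency masking: draw a band height f, then a start row f0
    # such that the band [f0, f0 + f) fits inside the spectrogram.
    f = int(rng.integers(0, min(freq_mask_width, n_mels) + 1))
    f0 = int(rng.integers(0, n_mels - f + 1))
    masked[f0:f0 + f, :] = 0.0

    # Time masking: same idea along the frame axis.
    t = int(rng.integers(0, min(time_mask_width, n_frames) + 1))
    t0 = int(rng.integers(0, n_frames - t + 1))
    masked[:, t0:t0 + t] = 0.0
    return masked


# Stand-in for a real mel spectrogram: strictly positive random values.
rng = np.random.default_rng(0)
mel = rng.random((256, 400)) + 0.1
augmented = mask_spectrogram(mel, rng=rng)
print(augmented.shape)
```

Passing an explicit `rng` makes the augmentation reproducible, which helps when comparing training runs.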

Try SpecAugment on your own audio file

$ python3
>>> import librosa
>>> from specAugment import spec_augment_tensorflow
# If you use PyTorch, import spec_augment_pytorch instead of spec_augment_tensorflow.
>>> audio_path = "path/to/your_audio.wav"  # placeholder: replace with the path to your own file
>>> audio, sampling_rate = librosa.load(audio_path)
>>> mel_spectrogram = librosa.feature.melspectrogram(y=audio,
                                                     sr=sampling_rate,
                                                     n_mels=256,
                                                     hop_length=128,
                                                     fmax=8000)
>>> warped_masked_spectrogram = spec_augment_tensorflow.spec_augment(mel_spectrogram=mel_spectrogram)
>>> print(warped_masked_spectrogram)
'
[[1.54055389e-01 7.51822486e-01 7.29588015e-01 ... 1.03616300e-01
  1.04682689e-01 1.05411769e-01]
 [2.21608739e-01 1.38559084e-01 1.01564167e-01 ... 4.19907116e-02
  4.86430404e-02 5.27331798e-02]
 [3.62784019e-01 2.09934399e-01 1.79158230e-01 ... 2.42307431e-01
  3.18662338e-01 3.67405599e-01]
 ...
 [6.36117335e-07 8.06897948e-07 8.55346431e-07 ... 2.84445018e-07
  4.02975952e-07 5.57131738e-07]
 [6.27753429e-07 7.53681318e-07 8.13035033e-07 ... 1.35111146e-07
  2.74058225e-07 4.56901031e-07]
 [0.00000000e+00 7.48416680e-07 5.51771037e-07 ... 1.13901361e-07
  2.56365068e-07 4.43868592e-07]]
'
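To check what the augmentation did, one option is to look for rows and columns that were zeroed out entirely. The sketch below makes two assumptions: that the result has been converted to a 2-D NumPy array laid out as (n_mels, n_frames), and that masked cells are filled with zeros. The package may use a different layout or fill value, and `find_masked_bands` is a hypothetical helper, not part of SpecAugment.

```python
import numpy as np


def find_masked_bands(spec):
    """Return indices of frequency rows and time columns that are entirely zero."""
    spec = np.asarray(spec)
    masked_rows = np.flatnonzero(np.all(spec == 0, axis=1))
    masked_cols = np.flatnonzero(np.all(spec == 0, axis=0))
    return masked_rows, masked_cols


# Toy spectrogram with two masked frequency rows and one masked time column.
spec = np.ones((6, 8))
spec[2:4, :] = 0.0
spec[:, 5] = 0.0

rows, cols = find_masked_bands(spec)
print(rows)  # [2 3]
print(cols)  # [5]
```

A real spectrogram can also contain isolated zeros (for example in silent frames), so only fully zero rows or columns are reported here.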

For more examples of how to do specific tasks with SpecAugment, see the test code.

python spec_augment_test.py

The test code uses a sample from the LibriSpeech dataset.

[Figure: Example result of base spectrogram]

Reference

  1. https://arxiv.org/pdf/1904.08779.pdf