All Projects → grausof → Keras Sincnet

grausof / Keras Sincnet

Keras (tensorflow) implementation of SincNet (Mirco Ravanelli, Yoshua Bengio - https://github.com/mravanelli/SincNet)

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Keras Sincnet

Sincnet
SincNet is a neural architecture for efficiently processing raw audio samples.
Stars: ✭ 764 (+1525.53%)
Mutual labels:  artificial-intelligence, filtering, convolutional-neural-networks, cnn, speech-recognition, asr, speech-processing, audio, audio-processing, waveform
Nonautoreggenprogress
Tracking the progress in non-autoregressive generation (translation, transcription, etc.)
Stars: ✭ 118 (+151.06%)
Mutual labels:  artificial-intelligence, speech-recognition, speech-processing
Mongolian Speech Recognition
Mongolian speech recognition with PyTorch
Stars: ✭ 97 (+106.38%)
Mutual labels:  convolutional-neural-networks, speech-recognition, asr
Deep Learning With Python
Deep learning codes and projects using Python
Stars: ✭ 195 (+314.89%)
Mutual labels:  artificial-intelligence, convolutional-neural-networks, cnn
Audio Pretrained Model
A collection of Audio and Speech pre-trained models.
Stars: ✭ 61 (+29.79%)
Mutual labels:  speech-recognition, audio, audio-processing
Wav2letter
Speech Recognition model based off of FAIR research paper built using Pytorch.
Stars: ✭ 78 (+65.96%)
Mutual labels:  convolutional-neural-networks, speech-recognition, asr
Iresnet
Improved Residual Networks (https://arxiv.org/pdf/2004.04989.pdf)
Stars: ✭ 163 (+246.81%)
Mutual labels:  artificial-intelligence, convolutional-neural-networks, cnn
Automatic speech recognition
End-to-end Automatic Speech Recognition for Madarian and English in Tensorflow
Stars: ✭ 2,751 (+5753.19%)
Mutual labels:  cnn, speech-recognition, audio
Pyconv
Pyramidal Convolution: Rethinking Convolutional Neural Networks for Visual Recognition (https://arxiv.org/pdf/2006.11538.pdf)
Stars: ✭ 231 (+391.49%)
Mutual labels:  artificial-intelligence, convolutional-neural-networks, cnn
react-native-spokestack
Spokestack: give your React Native app a voice interface!
Stars: ✭ 53 (+12.77%)
Mutual labels:  speech-recognition, speech-processing, asr
spokestack-ios
Spokestack: give your iOS app a voice interface!
Stars: ✭ 27 (-42.55%)
Mutual labels:  speech-recognition, speech-processing, asr
Dtln
Tensorflow 2.x implementation of the DTLN real time speech denoising model. With TF-lite, ONNX and real-time audio processing support.
Stars: ✭ 147 (+212.77%)
Mutual labels:  speech-processing, audio, audio-processing
Image classifier
CNN image classifier implemented in Keras Notebook 🖼️.
Stars: ✭ 139 (+195.74%)
Mutual labels:  artificial-intelligence, convolutional-neural-networks, cnn
Transfer Learning Suite
Transfer Learning Suite in Keras. Perform transfer learning using any built-in Keras image classification model easily!
Stars: ✭ 212 (+351.06%)
Mutual labels:  artificial-intelligence, convolutional-neural-networks, cnn
Surfboard
Novoic's audio feature extraction library
Stars: ✭ 318 (+576.6%)
Mutual labels:  speech-processing, audio, audio-processing
Nmtpytorch
Sequence-to-Sequence Framework in PyTorch
Stars: ✭ 392 (+734.04%)
Mutual labels:  cnn, speech-recognition, asr
Blipkit
C library for creating the beautiful sound of old sound chips
Stars: ✭ 23 (-51.06%)
Mutual labels:  audio, waveform
Waveform Playlist
Multitrack Web Audio editor and player with canvas waveform preview. Set cues, fades and shift multiple tracks in time. Record audio tracks or provide audio annotations. Export your mix to AudioBuffer or WAV! Project inspired by Audacity.
Stars: ✭ 919 (+1855.32%)
Mutual labels:  audio, waveform
Cnnimageretrieval Pytorch
CNN Image Retrieval in PyTorch: Training and evaluating CNNs for Image Retrieval in PyTorch
Stars: ✭ 931 (+1880.85%)
Mutual labels:  convolutional-neural-networks, cnn
Deepmodels
TensorFlow Implementation of state-of-the-art models since 2012
Stars: ✭ 33 (-29.79%)
Mutual labels:  convolutional-neural-networks, cnn

SincNet (M. Ravanelli - Y. Bengio) implementation using Keras Functional Framework v2+

  • Models are converted from original torch networks.
  • It supports only Tensorflow backend
  • The cfg file is the same as the original code, but some parameters are not supported

SincNet

SincNet is a neural architecture for processing raw audio samples. It is a novel Convolutional Neural Network (CNN) that encourages the first convolutional layer to discover more meaningful filters. SincNet is based on parametrized sinc functions, which implement band-pass filters.

References

[1] Mirco Ravanelli, Yoshua Bengio, “Speaker Recognition from raw waveform with SincNet” Arxiv

Cites the authors

If you use this code or part of it, please cite the authors!

Mirco Ravanelli, Yoshua Bengio, “Speaker Recognition from raw waveform with SincNet” Arxiv

Prerequisites

  • Linux / Mac
  • Python 3.6/2.7
  • keras 2.1.6
  • Tensorflow 1.10.0
  • tqdm (pip install tqdm)
  • pysoundfile (pip install pysoundfile)

How to run a TIMIT experiment

Even though the code can be easily adapted to any speech dataset, in the following part of the documentation we provide an example based on the popular TIMIT dataset.

1. Run TIMIT data preparation.

This step is necessary to store a version of TIMIT in which start and end silences are removed and the amplitute of each speech utterance is normalized. To do it, run the following code:

python TIMIT_preparation.py $TIMIT_FOLDER $OUTPUT_FOLDER data_lists/TIMIT_all.scp

where:

  • $TIMIT_FOLDER is the folder of the original TIMIT corpus
  • $OUTPUT_FOLDER is the folder in which the normalized TIMIT will be stored
  • data_lists/TIMIT_all.scp is the list of the TIMIT files used for training/test the speaker id system.

2. Run the speaker id experiment.

  • Modify the [data] section of cfg/SincNet_TIMIT.cfg file according to your paths. In particular, modify the data_folder with the $OUTPUT_FOLDER specified during the TIMIT preparation. The other parameters of the config file belong to the following sections:
  1. [windowing], that defines how each sentence is splitted into smaller chunks.
  2. [cnn], that specifies the characteristics of the CNN architecture.
  3. [dnn], that specifies the characteristics of the fully-connected DNN architecture following the CNN layers.
  4. [class], that specify the softmax classification part.
  5. [optimization], that reports the main hyperparameters used to train the architecture.
  • Once setup the cfg file, you can run the speaker id experiments using the following command:

python train.py --cfg=cfg/SincNet_TIMIT.cfg

The network might take several hours to converge (depending on the speed of your GPU card).

3. Results.

The results are saved into the output_folder specified in the cfg file. In this folder, you can find a file (res.res) summarizing test accuracy. The model checkpoints/SincNet.hdf5 is the SincNet model saved after the last iteration. Tensorboard can be used to display the loss and accuracy on the train set with the following command:

tensorboard --logdir=output_folder/logs

Using the cfg file specified above, we obtain the following results:

epoch 0, acc_te=0.040379 acc_te_snt=0.059885
epoch 8, acc_te=0.366254 acc_te_snt=0.810245
epoch 16, acc_te=0.360024 acc_te_snt=0.782107
epoch 24, acc_te=0.439231 acc_te_snt=0.915584
epoch 32, acc_te=0.443118 acc_te_snt=0.914141
epoch 40, acc_te=0.449710 acc_te_snt=0.921356
epoch 48, acc_te=0.494441 acc_te_snt=0.955267
epoch 56, acc_te=0.479593 acc_te_snt=0.949495
epoch 64, acc_te=0.492353 acc_te_snt=0.961039
epoch 72, acc_te=0.496025 acc_te_snt=0.960317
....
epoch 280, acc_te=0.514994 acc_te_snt=0.972583
epoch 288, acc_te=0.508841 acc_te_snt=0.971861
epoch 296, acc_te=0.524953 acc_te_snt=0.975469
epoch 304, acc_te=0.507942 acc_te_snt=0.974026
epoch 312, acc_te=0.510259 acc_te_snt=0.974026
epoch 320, acc_te=0.503742 acc_te_snt=0.966089

WARNING: the results of this network are in terms of accuracy and not of error (as for the original network).

Where SincNet is implemented?

To take a look into the SincNet layer implementation you should open the file sincnet.py and read the class SincConv1D and the function sinc. The rest of the network is in the model.py file

How to use SincNet with a different dataset?

In this repository, we used the TIMIT dataset as a tutorial to show how SincNet works (as in the original code). With the current version of the code, you can easily use a different corpus. To do it you should provide in input the corpora-specific input files (in wav format) and your own labels. You should thus modify the paths into the *.scp files you find in the data_lists folder.

To assign to each sentence the right label, you also have modify the dictionary "TIMIT_labels.npy". The labels are specified within a python dictionary that contains sentence ids as keys (e.g., "si1027") and speaker_ids as values. Each speaker_id is an integer, ranging from 0 to N_spks-1. In the TIMIT dataset, you can easily retrieve the speaker id from the path (e.g., train/dr1/fcjf0/si1027.wav is the sentence_id "si1027" uttered by the speaker "fcjf0"). For other datasets, you should be able to retrieve in such a way this dictionary containing pairs of speaker and sentence ids.

You should then modify the config file (cfg/SincNet_TIMIT.cfg) according to your new paths. Remember also to change the field "class_lay=462" according to the number of speakers N_spks you have in your dataset.

References

[1] SincNet original code written in PyTorch by the autor (https://github.com/mravanelli/SincNet)

[2] Mirco Ravanelli, Yoshua Bengio, “Speaker Recognition from raw waveform with SincNet” Arxiv

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].