
skaws2003 / pytorch-mfcc

Licence: MIT license
A pytorch implementation of MFCC.

Projects that are alternatives of or similar to pytorch-mfcc

timit-preprocessor
Extract mfcc vectors and phones from TIMIT dataset
Stars: ✭ 14 (-53.33%)
Mutual labels:  mfcc
pyAudioProcessing
Audio feature extraction and classification
Stars: ✭ 165 (+450%)
Mutual labels:  mfcc
spafe
🔉 spafe: Simplified Python Audio Features Extraction
Stars: ✭ 310 (+933.33%)
Mutual labels:  mfcc
ConvolutionaNeuralNetworksToEnhanceCodedSpeech
In this work we propose two postprocessing approaches applying convolutional neural networks (CNNs) either in the time domain or the cepstral domain to enhance the coded speech without any modification of the codecs. The time domain approach follows an end-to-end fashion, while the cepstral domain approach uses analysis-synthesis with cepstral d…
Stars: ✭ 25 (-16.67%)
Mutual labels:  mfcc
sonopy
A simple audio feature extraction library
Stars: ✭ 72 (+140%)
Mutual labels:  mfcc
Aubio
a library for audio and music analysis
Stars: ✭ 2,601 (+8570%)
Mutual labels:  mfcc
Numpy Ml
Machine learning, in numpy
Stars: ✭ 11,100 (+36900%)
Mutual labels:  mfcc
scim
[WIP] Speech recognition toolbox written in Nim. Based on Arraymancer.
Stars: ✭ 17 (-43.33%)
Mutual labels:  mfcc
BasicsMusicalInstrumClassifi
Basics of Musical Instruments Classification using Machine Learning
Stars: ✭ 27 (-10%)
Mutual labels:  mfcc
Speaker-Identification
A program for automatic speaker identification using deep learning techniques.
Stars: ✭ 84 (+180%)
Mutual labels:  mfcc
DTW Digital Voice Recognition
Speech recognition of the digits 0-9 based on DTW and MFCC features. DTW, MFCC, speech recognition, Chinese and English data, endpoint detection, Digital Voice Recognition.
Stars: ✭ 28 (-6.67%)
Mutual labels:  mfcc
vamp-aubio-plugins
aubio plugins for Vamp
Stars: ✭ 38 (+26.67%)
Mutual labels:  mfcc

Note: torchaudio now officially supports MFCC (see the torchaudio documentation), so this library is no longer maintained.

MFCC (Mel Frequency Cepstral Coefficient) for PyTorch

Based on this repository, this project extends the MFCC function to PyTorch so that a backpropagation path can be established through it.
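To illustrate what a backpropagation path through the feature extraction means, here is a minimal sketch that is not this project's implementation. It covers only the first stages of an MFCC pipeline (framing, FFT, log energy) and omits the mel filterbank and DCT. All names and sizes are illustrative assumptions, and `torch.fft.rfft` requires a newer PyTorch (1.8 or later) than the minimum listed below.

```python
import torch

torch.manual_seed(0)
signal = torch.randn(8000, requires_grad=True)     # 1 s of dummy audio at 8 kHz

# Framing: 200-sample windows (25 ms) with an 80-sample step (10 ms)
frames = signal.unfold(0, 200, 80)                 # (num_frames, 200)

# Power spectrum; rfft zero-pads each frame to 512 points
spectrum = torch.fft.rfft(frames, n=512)           # (num_frames, 257)
power = (spectrum.real ** 2 + spectrum.imag ** 2) / 512

# Log energy per frame, with a small constant to avoid log(0)
log_energy = torch.log(power.sum(dim=1) + 1e-10)

# Any scalar loss on the features can be backpropagated to the waveform
loss = log_energy.mean()
loss.backward()
print(signal.grad.shape)
```

Because every step is a differentiable tensor operation, `signal.grad` is populated after `backward()`. This is what lets a feature extractor sit inside a model that is trained end to end.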

Dependencies

  • Python >= 3.5
  • PyTorch >= 1.0
  • numpy
  • librosa

Installation

git clone https://github.com/skaws2003/pytorch_mfcc.git

Parameters

Parameter      Description
samplerate     Sampling rate of the signal.
winlen         Length of the analysis window, in seconds. Defaults to 0.025 s.
winstep        Step between successive windows, in seconds. Defaults to 0.01 s.
numcep         Number of cepstral coefficients to return. Defaults to 13.
nfilt          Number of filters in the filterbank. Defaults to 26.
nfft           FFT size. Defaults to 512.
lowfreq        Lowest band edge of the mel filters (Hz). Defaults to 0.
highfreq       Highest band edge of the mel filters (Hz). Defaults to samplerate/2.
preemph        Coefficient of the pre-emphasis filter. 0 means no filter. Defaults to 0.97.
ceplifter      Lifter applied to the final cepstral coefficients. 0 means no lifter. Defaults to 22.
appendEnergy   If true, the zeroth cepstral coefficient is replaced with the log of the total frame energy.
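As a rough guide to what these parameters control, the sketch below works through the defaults at an 8000 Hz sampling rate. The formulas (zero-padding of the last partial frame, the 2595·log10(1 + f/700) mel scale, a first-order pre-emphasis filter, and a sinusoidal lifter) are common conventions and are assumed here; this project's exact implementation may differ. All function names are illustrative.

```python
import math
import numpy as np

samplerate, winlen, winstep = 8000, 0.025, 0.01
nfilt, numcep, preemph, ceplifter = 26, 13, 0.97, 22
lowfreq, highfreq = 0, samplerate / 2

# Window length and step converted from seconds to samples
frame_len = int(round(winlen * samplerate))     # 200 samples
frame_step = int(round(winstep * samplerate))   # 80 samples

def num_frames(num_samples):
    """Frame count, assuming the last partial frame is zero-padded."""
    if num_samples <= frame_len:
        return 1
    return 1 + math.ceil((num_samples - frame_len) / frame_step)

def hz2mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel2hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

# nfilt triangular filters need nfilt + 2 edge points, equally spaced in mel
mel_edges_hz = mel2hz(np.linspace(hz2mel(lowfreq), hz2mel(highfreq), nfilt + 2))

def preemphasis(signal, coeff=preemph):
    """y[0] = x[0]; y[n] = x[n] - coeff * x[n-1] for n >= 1."""
    return np.append(signal[0], signal[1:] - coeff * signal[:-1])

# Sinusoidal lifter weights, one per cepstral coefficient
n = np.arange(numcep)
lifter = 1.0 + (ceplifter / 2.0) * np.sin(np.pi * n / ceplifter)

print(frame_len, frame_step, num_frames(samplerate))   # 200 80 99
```

Under these assumptions, a one-second signal at 8000 Hz yields 99 frames, and the 26 filters span 0 Hz to 4000 Hz.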

Example use

import librosa
import torch
import pytorch_mfcc
import numpy


device = torch.device('cuda:0') if torch.cuda.is_available() else torch.device('cpu')     # Device
files = ['english.wav','english_crop.wav']      # Files to load

# Read files
signals = []
wav_lengths = []
sample_rate = 8000  # 8000 for the example file; 22050 or 44100 is more typical. Check your file's rate.

for f in files:
    signal,rate = librosa.load(f,sr=sample_rate,mono=True)    # Load wavefile. Be careful of the sampling rate.
    signals.append(signal)
    wav_lengths.append(len(signal))

# Pad signals with zeros, and make batch
max_length = max(wav_lengths)
signals_torch = []
for i in range(len(signals)):
    signal = torch.tensor(signals[i],dtype=torch.float32).to(device)
    zeros = torch.zeros(max_length - len(signal)).to(device)
    signal = torch.cat([signal,zeros])
    signals_torch.append(signal)
    
signal_batch = torch.stack(signals_torch)

# Now do mfcc
mfcc_layer = pytorch_mfcc.MFCC(samplerate=sample_rate).to(device)     # MFCC layer
val,mfcc_lengths = mfcc_layer(signal_batch,wav_lengths)       # Do mfcc

print(val.shape)
print(mfcc_lengths)
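Because the batch is zero-padded, frames past each signal's true length carry no information. The sketch below shows one way to mask them out before pooling. It uses NumPy stand-ins for the layer's outputs and assumes that `val` has the layout (batch, frames, numcep) and that `mfcc_lengths` holds the number of valid frames per signal. Neither assumption is confirmed by this README, so check the printed shapes first.

```python
import numpy as np

# Stand-ins for the layer's outputs (layout and meaning are assumptions)
val = np.random.randn(2, 99, 13)        # (batch, frames, numcep)
mfcc_lengths = np.array([99, 60])       # valid frames per signal

# Boolean mask that is True on valid frames and False on padding
frame_idx = np.arange(val.shape[1])
mask = frame_idx[None, :] < mfcc_lengths[:, None]    # (batch, frames)

# Zero out padded frames, then average over valid frames only
masked = val * mask[:, :, None]
mean_per_signal = masked.sum(axis=1) / mfcc_lengths[:, None]   # (batch, numcep)

print(mask.sum(axis=1))
```

Dividing by the true length rather than the padded length keeps the shorter signal's average from being diluted by zeros.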

References

Sample Source

The sample files english.wav and english_crop.wav were obtained with:

wget http://voyager.jpl.nasa.gov/spacecraft/audio/english.au
sox english.au -e signed-integer english.wav

Comments

Contributions are welcome. Please don't hesitate to open a pull request.
