In this work we propose two postprocessing approaches applying convolutional neural networks (CNNs) either in the time domain or the cepstral domain to enhance the coded speech without any modification of the codecs. The time domain approach follows an end-to-end fashion, while the cepstral domain approach uses analysis-synthesis with cepstral d…

Stars: ✭ 25 (-16.67%)

Mutual labels: mfcc

sonopy

A simple audio feature extraction library

Stars: ✭ 72 (+140%)

Mutual labels: mfcc

Aubio

a library for audio and music analysis

Stars: ✭ 2,601 (+8570%)

Mutual labels: mfcc

Numpy Ml

Machine learning, in numpy

Stars: ✭ 11,100 (+36900%)

Mutual labels: mfcc

scim

[WIP] Speech recognition toolbox written in Nim. Based on Arraymancer.

Stars: ✭ 17 (-43.33%)

Mutual labels: mfcc

BasicsMusicalInstrumClassifi

Basics of Musical Instruments Classification using Machine Learning

Stars: ✭ 27 (-10%)

Mutual labels: mfcc

Speaker-Identification

A program for automatic speaker identification using deep learning techniques.

Stars: ✭ 84 (+180%)

Mutual labels: mfcc

DTW Digital Voice Recognition

Speech recognition of the digits 0-9 based on DTW and MFCC features. DTW, MFCC, speech recognition, Chinese and English data, endpoint detection, Digital Voice Recognition.

Stars: ✭ 28 (-6.67%)

Mutual labels: mfcc

vamp-aubio-plugins

aubio plugins for Vamp

Stars: ✭ 38 (+26.67%)

Mutual labels: mfcc

The official torchaudio now supports MFCC, so this library will no longer be maintained.

MFCC (Mel Frequency Cepstral Coefficient) for PyTorch

Based on python_speech_features, this project extends its MFCC function to PyTorch so that a backpropagation path can be established through the feature extraction.
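To illustrate what a backpropagation path through feature extraction means, here is a minimal sketch, not the code of this library. It frames a signal, takes a log power spectrum with differentiable PyTorch operations, and checks that a gradient reaches the raw samples. The function name `log_power_spectrum` and the frame sizes are assumptions chosen for illustration (25 ms windows and 10 ms steps at 8000 Hz).

```python
import torch


def log_power_spectrum(signal, frame_len=200, frame_step=80, nfft=512):
    """Hypothetical helper: log power spectrum built only from differentiable ops."""
    # Slice the 1-D signal into overlapping frames: (num_frames, frame_len)
    frames = signal.unfold(0, frame_len, frame_step)
    # Real FFT of every frame, zero-padded to nfft: (num_frames, nfft // 2 + 1)
    spectrum = torch.fft.rfft(frames, n=nfft)
    # Power spectrum from real and imaginary parts
    power = (spectrum.real ** 2 + spectrum.imag ** 2) / nfft
    # Small constant keeps the logarithm finite for silent frames
    return torch.log(power + 1e-10)


torch.manual_seed(0)
signal = torch.randn(1600, requires_grad=True)  # 0.2 s of noise at 8000 Hz
features = log_power_spectrum(signal)

# Any scalar loss on the features can be backpropagated to the waveform
loss = features.mean()
loss.backward()

print(features.shape)
print(signal.grad.shape)
```

Because every step is a tensor operation tracked by autograd, `signal.grad` is populated after `backward()`. A feature extractor written with NumPy alone would break this chain.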

Dependency

Python >= 3.5
PyTorch >= 1.0
numpy
librosa

Installation

git clone https://github.com/skaws2003/pytorch_mfcc.git

Parameters

Parameter	Description
samplerate	Sample rate of the signal, in Hz.
winlen	Length of the analysis window, in seconds. Defaults to 0.025.
winstep	Step between successive windows, in seconds. Defaults to 0.01.
numcep	Number of cepstral coefficients to return. Defaults to 13.
nfilt	Number of filters in the filterbank. Defaults to 26.
nfft	FFT size. Defaults to 512.
lowfreq	Lowest band edge of the mel filters, in Hz. Defaults to 0.
highfreq	Highest band edge of the mel filters, in Hz. Defaults to samplerate/2.
preemph	Coefficient of the preemphasis filter; 0 disables the filter. Defaults to 0.97.
ceplifter	Lifter applied to the final cepstral coefficients; 0 disables the lifter. Defaults to 22.
appendEnergy	If true, the zeroth cepstral coefficient is replaced with the log of the total frame energy.
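The following sketch shows what several of these parameters usually control in an MFCC pipeline of the python_speech_features kind. It is an illustration under that assumption, not an excerpt from this library, and all function names in it are hypothetical. It covers the mel band edges set by `lowfreq`, `highfreq`, and `nfilt`, the preemphasis filter set by `preemph`, and the lifter weights set by `ceplifter`.

```python
import math


def hz_to_mel(hz):
    """Convert a frequency in Hz to the mel scale."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Convert a mel value back to Hz."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def filterbank_edges(samplerate, nfilt=26, lowfreq=0.0, highfreq=None):
    """Return nfilt + 2 band-edge frequencies in Hz, equally spaced in mel."""
    if highfreq is None:
        highfreq = samplerate / 2.0
    low_mel, high_mel = hz_to_mel(lowfreq), hz_to_mel(highfreq)
    step = (high_mel - low_mel) / (nfilt + 1)
    return [mel_to_hz(low_mel + i * step) for i in range(nfilt + 2)]


def preemphasis(signal, coeff=0.97):
    """Apply y[n] = x[n] - coeff * x[n-1]; coeff = 0 leaves the signal unchanged."""
    if not signal:
        return []
    return [signal[0]] + [
        signal[n] - coeff * signal[n - 1] for n in range(1, len(signal))
    ]


def lifter_weights(numcep=13, ceplifter=22):
    """Sinusoidal lifter weights; ceplifter = 0 means no liftering."""
    if ceplifter <= 0:
        return [1.0] * numcep
    return [
        1.0 + (ceplifter / 2.0) * math.sin(math.pi * n / ceplifter)
        for n in range(numcep)
    ]


edges = filterbank_edges(samplerate=8000)
print(len(edges), round(edges[0], 1), round(edges[-1], 1))
print(preemphasis([1.0, 1.0, 1.0]))
print([round(w, 2) for w in lifter_weights()][:4])
```

Each triangular filter spans three consecutive edges, which is why `nfilt` filters need `nfilt + 2` edge frequencies.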

Example use

import librosa
import torch
import pytorch_mfcc


device = torch.device('cuda:0') if torch.cuda.is_available() else torch.device('cpu')     # Device
files = ['english.wav','english_crop.wav']      # Files to load

# Read files
signals = []
wav_lengths = []
sample_rate = 8000  # 8000 for the example file, but normally it is 22050 or 44100. Check it and be careful.

for f in files:
    signal,rate = librosa.load(f,sr=sample_rate,mono=True)    # Load wavefile. Be careful of the sampling rate.
    signals.append(signal)
    wav_lengths.append(len(signal))

# Pad signals with zeros, and make batch
max_length = max(wav_lengths)
signals_torch = []
for i in range(len(signals)):
    signal = torch.tensor(signals[i],dtype=torch.float32).to(device)
    zeros = torch.zeros(max_length - len(signal)).to(device)
    signal = torch.cat([signal,zeros])
    signals_torch.append(signal)
    
signal_batch = torch.stack(signals_torch)

# Now do mfcc
mfcc_layer = pytorch_mfcc.MFCC(samplerate=sample_rate).to(device)     # MFCC layer
val,mfcc_lengths = mfcc_layer(signal_batch,wav_lengths)       # Do mfcc

print(val.shape)
print(mfcc_lengths)
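Because the batch is zero-padded, each signal has a different number of frames that carry real audio, which is presumably why lengths are passed in and returned. The sketch below estimates the frame count per signal from its sample count, following the framing rule used by python_speech_features. Whether `mfcc_lengths` matches these numbers exactly is an assumption, and `num_frames` and the sample counts are hypothetical.

```python
import math


def num_frames(num_samples, samplerate, winlen=0.025, winstep=0.01):
    """Hypothetical helper: frames needed to cover num_samples samples."""
    frame_len = int(round(winlen * samplerate))    # window length in samples
    frame_step = int(round(winstep * samplerate))  # hop size in samples
    if num_samples <= frame_len:
        return 1
    # One frame for the first window, plus one per step needed to reach the end
    return 1 + int(math.ceil((num_samples - frame_len) / frame_step))


sample_rate = 8000
example_lengths = [16000, 12000]  # illustrative sample counts, not the real files
expected_frames = [num_frames(n, sample_rate) for n in example_lengths]
print(expected_frames)
```

With 25 ms windows and 10 ms steps at 8000 Hz, a window is 200 samples and a step is 80 samples, so the shorter signal yields fewer valid frames than the padded batch width.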

References

DCT for PyTorch by Ziyang Hu
This project is based on python_speech_features by James Lyons

Sample Source

sample english.wav and english_crop.wav from:

wget http://voyager.jpl.nasa.gov/spacecraft/audio/english.au
sox english.au -e signed-integer english.wav

Comments

Contributions are welcome. Please don't hesitate to make a pull request.

