
daemon / pytorch-pcen

License: MIT
PyTorch reimplementation of per-channel energy normalization for audio.


Projects that are alternatives of or similar to pytorch-pcen

Gcc Nmf
Real-time GCC-NMF Blind Speech Separation and Enhancement
Stars: ✭ 231 (+188.75%)
Mutual labels:  speech
browser-apis
🦄 Cool & Fun Browser Web APIs 🥳
Stars: ✭ 21 (-73.75%)
Mutual labels:  speech
Multimodal-Gesture-Recognition-with-LSTMs-and-CTC
An end-to-end system that performs temporal recognition of gesture sequences using speech and skeletal input. The model combines three networks with a CTC output layer that recognises gestures from a continuous stream.
Stars: ✭ 25 (-68.75%)
Mutual labels:  speech
Tacotron pytorch
PyTorch implementation of Tacotron speech synthesis model.
Stars: ✭ 242 (+202.5%)
Mutual labels:  speech
Voice Gender
Gender recognition by voice and speech analysis
Stars: ✭ 248 (+210%)
Mutual labels:  speech
idear
🎙️ Handsfree Audio Development Interface
Stars: ✭ 84 (+5%)
Mutual labels:  speech
Source separation
Deep learning based speech source separation using Pytorch
Stars: ✭ 226 (+182.5%)
Mutual labels:  speech
txt2speech
Convert text to speech using Google Translate API
Stars: ✭ 38 (-52.5%)
Mutual labels:  speech
lectures-all
Central repository for all lectures on deep learning at UPC ETSETB TelecomBCN.
Stars: ✭ 46 (-42.5%)
Mutual labels:  speech
IMS-Toucan
Text-to-Speech Toolkit of the Speech and Language Technologies Group at the University of Stuttgart. Objectives of the development are simplicity, modularity, controllability and multilinguality.
Stars: ✭ 295 (+268.75%)
Mutual labels:  speech
Kerasdeepspeech
A Keras CTC implementation of Baidu's DeepSpeech for model experimentation
Stars: ✭ 245 (+206.25%)
Mutual labels:  speech
Wavegrad
Implementation of Google Brain's WaveGrad high-fidelity vocoder (paper: https://arxiv.org/pdf/2009.00713.pdf). First implementation on GitHub.
Stars: ✭ 245 (+206.25%)
Mutual labels:  speech
VQMIVC
Official implementation of VQMIVC: One-shot (any-to-any) Voice Conversion @ Interspeech 2021 + Online playing demo!
Stars: ✭ 278 (+247.5%)
Mutual labels:  speech
Lhotse
Stars: ✭ 236 (+195%)
Mutual labels:  speech
TF-Speech-Recognition-Challenge-Solution
Source code of the model used in Tensorflow Speech Recognition Challenge (https://www.kaggle.com/c/tensorflow-speech-recognition-challenge). The solution ranked in top 5% in private leaderboard.
Stars: ✭ 58 (-27.5%)
Mutual labels:  speech
Setk
Tools for Speech Enhancement integrated with Kaldi
Stars: ✭ 227 (+183.75%)
Mutual labels:  speech
Naver-AI-Hackathon-Speech
2019 Clova AI Hackathon : Speech - Rank 12 / Team Kai.Lib
Stars: ✭ 26 (-67.5%)
Mutual labels:  speech
wav2vec2-live
Live speech recognition using Facebook's wav2vec 2.0 model.
Stars: ✭ 205 (+156.25%)
Mutual labels:  speech
anycontrol
Voice control for your websites and applications
Stars: ✭ 53 (-33.75%)
Mutual labels:  speech
react-native-speech-bubble
💬 A speech bubble dialog component for React Native.
Stars: ✭ 50 (-37.5%)
Mutual labels:  speech

PyTorch-PCEN

Efficient PyTorch reimplementation of per-channel energy normalization with Mel spectrogram features.

Overview

Robustness to loudness differences between near- and far-field conditions is critical for high-quality speech recognition. Spectrogram energies differ dramatically between, say, shouting at arm's length and whispering from a distance, and this degrades model quality: the model itself must then be robust across a wide range of input levels. The log-compression step in the popular log-Mel transform partially addresses this issue by reducing the dynamic range of the audio; however, it ignores per-channel energy differences and is static by definition.
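
To make the dynamic-range point concrete, here is a toy NumPy illustration (the specific energy values are invented for the example):

```python
import numpy as np

# Two Mel-band energies differing by 40 dB (a factor of 10^4),
# e.g. whispering from a distance vs. shouting at arm's length.
quiet, loud = 1e-4, 1.0

linear_ratio = loud / quiet  # 10000x spread for the model to absorb
# Log compression turns the multiplicative spread into a small additive offset.
log_offset = np.log(loud + 1e-6) - np.log(quiet + 1e-6)  # roughly 9.2
```

After log compression, a four-order-of-magnitude energy ratio becomes an additive shift of about 9.2 nats, which is far easier for a downstream model to handle.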

Per-channel energy normalization (PCEN) addresses these problems. It replaces the log compression with a per-channel, trainable front-end, greatly improving model robustness in keyword spotting systems, all while being resource-efficient and easy to implement.
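
Concretely, PCEN combines a per-channel automatic-gain-control stage, driven by a first-order smoother, with root compression. A minimal NumPy sketch of the recurrence from Wang et al. follows; the hyperparameter values are typical settings from the paper, not necessarily this repository's defaults:

```python
import numpy as np

def pcen(E, s=0.025, alpha=0.98, delta=2.0, r=0.5, eps=1e-6):
    """Per-channel energy normalization of a (time, mel) energy matrix E.

    Smoother:  M[t] = (1 - s) * M[t-1] + s * E[t]        (per channel)
    Output:    PCEN[t] = (E[t] / (eps + M[t])**alpha + delta)**r - delta**r
    """
    M = np.empty_like(E)
    M[0] = E[0]  # initialize the smoother with the first frame
    for t in range(1, len(E)):
        M[t] = (1 - s) * M[t - 1] + s * E[t]
    return (E / (eps + M) ** alpha + delta) ** r - delta ** r

# Stand-in for Mel spectrogram energies: 100 frames, 40 bands.
mel_energies = np.abs(np.random.default_rng(0).normal(size=(100, 40))) + 1e-3
out = pcen(mel_energies)  # shape (100, 40), all values positive
```

Because the AGC divides each channel by its own smoothed energy, a constant gain applied to the input is largely cancelled out, which is exactly the loudness robustness the log-Mel transform lacks.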

Installation and Usage

  1. PyTorch and NumPy are required. LibROSA and matplotlib are required only for the example.
  2. To install via pip, run pip install git+https://github.com/daemon/pytorch-pcen. Otherwise, clone this repository and run python setup.py install.
  3. To run the example in the module, place a 16 kHz WAV file named yes.wav in the current directory. Then, run python -m pcen.pcen.

The following is a self-contained example for using a streaming PCEN layer:

import pcen
import torch

# 40 Mel bands; n_fft=480 and hop_length=160 give a 30 ms window and 10 ms shift at 16 kHz.
# trainable defaults to False; here we enable learning of the PCEN parameters.
transform = pcen.StreamingPCENTransform(n_mels=40, n_fft=480, hop_length=160, trainable=True)
audio = torch.empty(1, 16000).normal_(0, 0.1) # Gaussian noise

# 1600 is an arbitrary chunk size; this step is unnecessary but demonstrates the streaming nature
streaming_chunks = audio.split(1600, 1)
pcen_chunks = [transform(chunk) for chunk in streaming_chunks] # Transform each chunk
transform.reset() # Reset the persistent streaming state
pcen_ = torch.cat(pcen_chunks, 1)
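
The reset() call matters because the layer carries its per-channel smoother state across calls. The mechanism can be illustrated with a plain PyTorch first-order smoother (a conceptual sketch of stateful streaming, not this library's internals):

```python
import torch

def ema(E, s=0.025, state=None):
    """Smoother M[t] = (1 - s) * M[t-1] + s * E[t] along dim 1 of (batch, time, mel).

    `state` is the final M from the previous chunk; passing it back in lets the
    caller resume smoothing seamlessly, which is what a streaming PCEN layer does.
    """
    out = []
    M = E[:, 0] if state is None else state
    for t in range(E.size(1)):
        M = (1 - s) * M + s * E[:, t]
        out.append(M)
    return torch.stack(out, dim=1), M

E = torch.rand(1, 50, 40)   # (batch, time, mel)
full, _ = ema(E)            # process the whole signal in one shot

state = None                # process the same signal chunk by chunk
chunks = []
for chunk in E.split(10, dim=1):
    smoothed, state = ema(chunk, state=state)
    chunks.append(smoothed)
streamed = torch.cat(chunks, dim=1)

assert torch.allclose(full, streamed)  # chunked and one-shot results agree
```

Forgetting to reset the carried state between independent utterances would let one recording's energy statistics leak into the next, which is why the example above calls transform.reset() after finishing a signal.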

Citation

Wang, Yuxuan, Pascal Getreuer, Thad Hughes, Richard F. Lyon, and Rif A. Saurous. Trainable frontend for robust and far-field keyword spotting. In Acoustics, Speech and Signal Processing (ICASSP), 2017 IEEE International Conference on, pp. 5670-5674. IEEE, 2017.

@inproceedings{wang2017trainable,
  title={Trainable frontend for robust and far-field keyword spotting},
  author={Wang, Yuxuan and Getreuer, Pascal and Hughes, Thad and Lyon, Richard F and Saurous, Rif A},
  booktitle={Acoustics, Speech and Signal Processing (ICASSP), 2017 IEEE International Conference on},
  pages={5670--5674},
  year={2017},
  organization={IEEE}
}