Audio-WestlakeU / NBSS

Licence: other

The official repo of "Multi-channel Narrow-band Deep Speech Separation with Full-band Permutation Invariant Training", "Multichannel Speech Separation with Narrow-band Conformer" and "NBC2: Multichannel Speech Separation with Revised Narrow-band Conformer".

Programming Languages

python


Projects that are alternatives of or similar to NBSS

TASNET

Time-domain Audio Separation Network (IN PYTORCH)

Stars: ✭ 18 (-76.62%)

Mutual labels: speech, separation

pytorch-pcen

PyTorch reimplementation of per-channel energy normalization for audio.

Stars: ✭ 80 (+3.9%)

Mutual labels: speech

Wavegrad

Implementation of Google Brain's WaveGrad high-fidelity vocoder (paper: https://arxiv.org/pdf/2009.00713.pdf). First implementation on GitHub.

Stars: ✭ 245 (+218.18%)

Mutual labels: speech

IMS-Toucan

Text-to-Speech Toolkit of the Speech and Language Technologies Group at the University of Stuttgart. Objectives of the development are simplicity, modularity, controllability and multilinguality.

Stars: ✭ 295 (+283.12%)

Mutual labels: speech

lectures-all

Central repository for all lectures on deep learning at UPC ETSETB TelecomBCN.

Stars: ✭ 46 (-40.26%)

Mutual labels: speech

TF-Speech-Recognition-Challenge-Solution

Source code of the model used in Tensorflow Speech Recognition Challenge (https://www.kaggle.com/c/tensorflow-speech-recognition-challenge). The solution ranked in top 5% in private leaderboard.

Stars: ✭ 58 (-24.68%)

Mutual labels: speech

Kerasdeepspeech

A Keras CTC implementation of Baidu's DeepSpeech for model experimentation

Stars: ✭ 245 (+218.18%)

Mutual labels: speech

icassp2019-latex-template

ICASSP 2019 official Latex template

Stars: ✭ 21 (-72.73%)

Mutual labels: speech

wav2vec2-live

Live speech recognition using Facebook's wav2vec 2.0 model.

Stars: ✭ 205 (+166.23%)

Mutual labels: speech

react-native-speech-bubble

💬 A speech bubble dialog component for React Native.

Stars: ✭ 50 (-35.06%)

Mutual labels: speech

VQMIVC

Official implementation of VQMIVC: One-shot (any-to-any) Voice Conversion @ Interspeech 2021 + Online playing demo!

Stars: ✭ 278 (+261.04%)

Mutual labels: speech

browser-apis

🦄 Cool & Fun Browser Web APIs 🥳

Stars: ✭ 21 (-72.73%)

Mutual labels: speech

anycontrol

Voice control for your websites and applications

Stars: ✭ 53 (-31.17%)

Mutual labels: speech

Voice Gender

Gender recognition by voice and speech analysis

Stars: ✭ 248 (+222.08%)

Mutual labels: speech

ventib

📈 Ventib records your voice, transcribes it in realtime, and performs speech pattern analysis to give you objective statistics about how you speak.

Stars: ✭ 43 (-44.16%)

Mutual labels: speech

Speechbrain.github.io

The SpeechBrain project aims to build a novel speech toolkit fully based on PyTorch. With SpeechBrain users can easily create speech processing systems, ranging from speech recognition (both HMM/DNN and end-to-end), speaker recognition, speech enhancement, speech separation, multi-microphone speech processing, and many others.

Stars: ✭ 242 (+214.29%)

Mutual labels: speech

idear

🎙️ Handsfree Audio Development Interface

Stars: ✭ 84 (+9.09%)

Mutual labels: speech

Multimodal-Gesture-Recognition-with-LSTMs-and-CTC

An end-to-end system that performs temporal recognition of gesture sequences using speech and skeletal input. The model combines three networks with a CTC output layer that recognises gestures from a continuous stream.

Stars: ✭ 25 (-67.53%)

Mutual labels: speech

cape

Continuous Augmented Positional Embeddings (CAPE) implementation for PyTorch

Stars: ✭ 29 (-62.34%)

Mutual labels: speech

ASR-Audio-Data-Links

A list of publicly available audio data that anyone can download for ASR or other speech activities

Stars: ✭ 179 (+132.47%)

Mutual labels: speech


Multi-channel Narrow-band Deep Speech Separation

A multichannel speech separation method. The official repo of:
[1] Changsheng Quan, Xiaofei Li. Multi-channel Narrow-band Deep Speech Separation with Full-band Permutation Invariant Training. In ICASSP 2022.
[2] Changsheng Quan, Xiaofei Li. Multichannel Speech Separation with Narrow-band Conformer. In Interspeech 2022.
[3] Changsheng Quan, Xiaofei Li. NBC2: Multichannel Speech Separation with Revised Narrow-band Conformer. arXiv preprint arXiv:2212.02076.

Audio examples can be found at https://audio.westlake.edu.cn/Research/nbss.htm. More information about our group can be found at https://audio.westlake.edu.cn.

Requirements

pip install -r requirements.txt

# gpuRIR: follow the installation instructions at https://github.com/DavidDiazGuerra/gpuRIR

Generate RIRs

Generate room impulse responses (RIRs) using configs/rir_cfg_4.json; the generated RIRs are placed in dataset/rir_cfg_4.

python generate_rirs.py
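The generated multichannel RIRs are presumably combined with clean speech to form the training mixtures. The NumPy sketch below illustrates the basic idea: each dry source is convolved with its RIR at every microphone and the reverberant images are summed. The function names, array shapes, and truncation to the source length are assumptions for illustration only; this is not the repository's data pipeline, which may differ in normalisation, SNR handling, or the choice of separation target.

```python
from typing import Tuple

import numpy as np


def convolve_fft(x: np.ndarray, h: np.ndarray) -> np.ndarray:
    """Full linear convolution of two 1-D signals via zero-padded FFT."""
    n = len(x) + len(h) - 1
    nfft = 1 << (n - 1).bit_length()  # next power of two >= n
    y = np.fft.irfft(np.fft.rfft(x, nfft) * np.fft.rfft(h, nfft), nfft)
    return y[:n]


def simulate_mixture(sources: np.ndarray, rirs: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    """Illustrative sketch (hypothetical helper, not the NBSS data pipeline).

    sources: (num_spk, num_samples) dry speech signals
    rirs:    (num_spk, num_mics, rir_len) one RIR per source/microphone pair
    returns: mixture (num_mics, num_samples) and
             reverberant images (num_spk, num_mics, num_samples)
    """
    num_spk, num_samples = sources.shape
    _, num_mics, _ = rirs.shape
    images = np.zeros((num_spk, num_mics, num_samples))
    for s in range(num_spk):
        for m in range(num_mics):
            # cut the reverberant tail so every signal keeps the source length
            images[s, m] = convolve_fft(sources[s], rirs[s, m])[:num_samples]
    return images.sum(axis=0), images


# toy example: 2 speakers, 4 microphones, random decaying "RIRs"
rng = np.random.default_rng(0)
src = rng.standard_normal((2, 16000))
rir = rng.standard_normal((2, 4, 512)) * np.exp(-np.arange(512) / 100)
mix, imgs = simulate_mixture(src, rir)
print(mix.shape, imgs.shape)  # (4, 16000) (2, 4, 16000)
```

FFT-based convolution is used only to keep the sketch dependency-free and fast for long RIRs; `np.convolve` gives the same result.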

Train & Test

This project is built on the PyTorch Lightning package, in particular its command line interface (CLI), so we recommend getting familiar with the Lightning CLI first.

Train NBC2 on GPU 0 with the config file configs/NBC2_small.yaml or configs/NBC2_large.yaml (set the RIR and clean speech directories to your own paths before training).

# --data.batch_size="[train,val]": batch sizes for training and validation
python NBSSCLI.py fit --config=configs/NBC2_small.yaml \
 --data.batch_size="[2,2]" \
 --trainer.accumulate_grad_batches=1 \
 --trainer.gpus=0,
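With the Lightning CLI, arguments given after --config override the corresponding values in the YAML file, addressed by dotted keys such as data.batch_size. The sketch below imitates that merge on an already-loaded config dict, only to show which values end up in effect. apply_overrides and parse_value are hypothetical helpers, the base values in the example are made up rather than taken from configs/NBC2_small.yaml, and the real parsing (done by jsonargparse inside LightningCLI) additionally validates types.

```python
import ast
import copy
from typing import Any, Dict, List


def parse_value(text: str) -> Any:
    """Interpret '[2,2]' or '1' as Python literals; keep anything else as a string."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return text


def apply_overrides(config: Dict[str, Any], args: List[str]) -> Dict[str, Any]:
    """Hypothetical: return a copy of `config` with '--a.b.c=value' arguments merged in."""
    merged = copy.deepcopy(config)
    for arg in args:
        key, _, raw = arg.lstrip("-").partition("=")
        *parents, leaf = key.split(".")
        node = merged
        for name in parents:  # walk (or create) the nested sections
            node = node.setdefault(name, {})
        node[leaf] = parse_value(raw)  # command-line value wins over the file
    return merged


# made-up base values standing in for a loaded YAML config
base = {"data": {"batch_size": [4, 4]}, "trainer": {"accumulate_grad_batches": 2}}
cfg = apply_overrides(base, ["--data.batch_size=[2,2]", "--trainer.accumulate_grad_batches=1"])
print(cfg)  # {'data': {'batch_size': [2, 2]}, 'trainer': {'accumulate_grad_batches': 1}}
```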

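More GPUs can be used by listing their indexes in trainer.gpus, e.g. --trainer.gpus=0,1,2,3, (the trailing comma makes Lightning treat the value as a list of GPU indexes rather than a number of GPUs).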

The configs configs/NBC-fit.yaml and configs/NB-BLSTM-fit.yaml can be used in the same way to train and test NBC and NB-BLSTM, respectively. Remember, however, to adjust the number of utterances per training mini-batch. Since DDP is used for distributed training, the number of utterances in one mini-batch is

number of GPUs × number of utterances per dataloader (the training entry of data.batch_size) × accumulate_grad_batches

so the command above uses 1 × 2 × 1 = 2 utterances per mini-batch.
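The mini-batch arithmetic above can be written as a few small helpers, e.g. to pick accumulate_grad_batches so that the effective batch stays constant when the number of GPUs changes. All function names are hypothetical; train_batch_size stands for the training entry of --data.batch_size.

```python
from typing import List


def parse_gpu_indexes(value: str) -> List[int]:
    """Hypothetical: turn a trainer.gpus string such as '0,1,2,3,' into [0, 1, 2, 3]."""
    return [int(item) for item in value.split(",") if item.strip()]


def utterances_per_minibatch(num_gpus: int, train_batch_size: int, accumulate_grad_batches: int) -> int:
    """Utterances per optimizer step under DDP: every GPU runs its own dataloader."""
    return num_gpus * train_batch_size * accumulate_grad_batches


def accumulation_for(target: int, num_gpus: int, train_batch_size: int) -> int:
    """accumulate_grad_batches needed to reach `target` utterances per mini-batch."""
    per_step = num_gpus * train_batch_size
    if target % per_step:
        raise ValueError(f"{target} is not a multiple of {num_gpus} x {train_batch_size}")
    return target // per_step


gpus = parse_gpu_indexes("0,")
print(utterances_per_minibatch(len(gpus), 2, 1))            # 2, as in the command above
print(accumulation_for(8, num_gpus=1, train_batch_size=2))  # 4
```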

Resume training from a checkpoint:

python NBSSCLI.py fit --config=logs/NBSS/version_x/config.yaml \
 --data.batch_size="[2,2]" \
 --trainer.accumulate_grad_batches=1 \
 --trainer.gpus=0, \
 --ckpt_path=logs/NBSS/version_x/checkpoints/last.ckpt

where version_x should be replaced with the version you want to resume.
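When several runs exist, a helper like the one below can locate the most recent version directory that has a last.ckpt. It assumes the logs/NBSS/version_<n>/checkpoints/last.ckpt layout used in the command above; latest_checkpoint is a hypothetical helper, not part of the repository.

```python
import re
from pathlib import Path
from typing import Optional


def latest_checkpoint(log_dir: str = "logs/NBSS") -> Optional[Path]:
    """Hypothetical: return version_<n>/checkpoints/last.ckpt for the highest n that has one."""
    versions = []
    for path in Path(log_dir).glob("version_*"):
        match = re.fullmatch(r"version_(\d+)", path.name)
        if match and (path / "checkpoints" / "last.ckpt").is_file():
            versions.append((int(match.group(1)), path))  # compare numerically: 10 > 2
    if not versions:
        return None
    _, newest = max(versions, key=lambda item: item[0])
    return newest / "checkpoints" / "last.ckpt"


ckpt = latest_checkpoint()
if ckpt is not None:
    # the matching config.yaml sits next to the checkpoints directory
    print(f"--config={ckpt.parent.parent / 'config.yaml'} --ckpt_path={ckpt}")
```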

Test a trained model (note that the dataset generates different wavs with different seeds):

python NBSSCLI.py test --config=logs/NBSS/version_x/config.yaml \
 --ckpt_path=logs/NBSS/version_x/checkpoints/epochY_neg_si_sdrZ.ckpt \
 --trainer.gpus=0, \
 --data.seeds="{'train':null,'val':2,'test':3}" \
 --data.audio_time_len="['headtail 4', 'headtail 4', 'headtail 4']"

where headtail is the speech overlap mode; the other options are mid, full, and startend (see [3] for details).
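The checkpoint name epochY_neg_si_sdrZ.ckpt indicates that checkpoints are tagged with the negative scale-invariant SDR (SI-SDR), so a smaller Z presumably means a better model. For reference, the NumPy sketch below computes negative SI-SDR with permutation invariant training (PIT), the criterion named in the title of [1], assuming it is evaluated on full-band time-domain signals. It uses the standard zero-mean SI-SDR definition; the function names are hypothetical and this is not the repository's loss implementation.

```python
import itertools
from typing import Tuple

import numpy as np


def si_sdr(estimate: np.ndarray, reference: np.ndarray, eps: float = 1e-8) -> float:
    """Scale-invariant SDR in dB between two 1-D signals (means removed first)."""
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # project the estimate onto the reference to find the optimal scaling
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    noise = estimate - target
    return float(10 * np.log10((np.dot(target, target) + eps) / (np.dot(noise, noise) + eps)))


def pit_neg_si_sdr(estimates: np.ndarray, references: np.ndarray) -> Tuple[float, tuple]:
    """Negative SI-SDR averaged over speakers, minimised over speaker permutations.

    estimates, references: (num_spk, num_samples)
    returns (best loss, perm) where estimates[perm[i]] is matched to references[i]
    """
    num_spk = references.shape[0]
    best_loss, best_perm = np.inf, None
    for perm in itertools.permutations(range(num_spk)):
        loss = -np.mean([si_sdr(estimates[p], references[i]) for i, p in enumerate(perm)])
        if loss < best_loss:
            best_loss, best_perm = float(loss), perm
    return best_loss, best_perm


rng = np.random.default_rng(0)
refs = rng.standard_normal((2, 16000))
ests = refs[::-1] + 0.1 * rng.standard_normal((2, 16000))  # outputs in swapped order
loss, perm = pit_neg_si_sdr(ests, refs)
print(perm)  # (1, 0)
```

Enumerating all permutations is fine for two or three speakers; its cost grows factorially with the number of speakers.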

Module Version

See models/arch/NBSS.py for the module version of the model.
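As the paper titles indicate, the network works in narrow bands: each STFT frequency bin is treated as an independent sequence, processed by the same network, with that bin's multichannel STFT coefficients as input features. The NumPy sketch below only illustrates this reshaping, assuming the real and imaginary parts of all microphones are stacked into 2 × num_mics features per frame; the function names and the exact feature layout are assumptions, so consult models/arch/NBSS.py for what the model actually does.

```python
import numpy as np


def to_narrow_band(X: np.ndarray) -> np.ndarray:
    """(batch, mics, freqs, frames) complex STFT -> (batch*freqs, frames, 2*mics) real.

    Every frequency bin becomes its own sequence; features are the real and
    imaginary parts of all microphones at that bin (assumed layout).
    """
    batch, mics, freqs, frames = X.shape
    X = X.transpose(0, 2, 3, 1)                        # (batch, freqs, frames, mics)
    feats = np.concatenate([X.real, X.imag], axis=-1)  # (batch, freqs, frames, 2*mics)
    return feats.reshape(batch * freqs, frames, 2 * mics)


def from_narrow_band(Y: np.ndarray, batch: int, freqs: int) -> np.ndarray:
    """(batch*freqs, frames, 2*spks) real output -> (batch, spks, freqs, frames) complex."""
    _, frames, feat = Y.shape
    spks = feat // 2
    Y = Y.reshape(batch, freqs, frames, feat)
    Z = Y[..., :spks] + 1j * Y[..., spks:]             # (batch, freqs, frames, spks)
    return Z.transpose(0, 3, 1, 2)


# illustrative sizes: 1 utterance, 6 microphones, 257 bins, 100 frames
X = np.zeros((1, 6, 257, 100), dtype=complex)
print(to_narrow_band(X).shape)  # (257, 100, 12)
```

A sequence model with shared weights would map each (frames, 2*mics) sequence to (frames, 2*spks) outputs, and from_narrow_band regroups those per-bin outputs into separated spectrograms.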
