
kaituoxu / TasNet

Licence: other
A PyTorch implementation of Time-domain Audio Separation Network (TasNet) with Permutation Invariant Training (PIT) for speech separation.

Programming Languages

  • Python
  • Perl
  • Shell
  • Makefile

Projects that are alternatives to or similar to TasNet

audio source separation
An implementation of audio source separation tools.
Stars: ✭ 41 (-49.38%)
Mutual labels:  source-separation, audio-separation
UniSpeech
UniSpeech - Large Scale Self-Supervised Learning for Speech
Stars: ✭ 224 (+176.54%)
Mutual labels:  speech-separation
react-pits
Pitfalls in React
Stars: ✭ 29 (-64.2%)
Mutual labels:  pit
soundscape IR
Tools for soundscape information retrieval; this repository is a developing project. Please go to https://github.com/meil-brcas-org/soundscape_IR for full releases.
Stars: ✭ 23 (-71.6%)
Mutual labels:  source-separation
Paddle-Image-Models
A PaddlePaddle version image model zoo.
Stars: ✭ 131 (+61.73%)
Mutual labels:  pit
open-speech-corpora
💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies
Stars: ✭ 841 (+938.27%)
Mutual labels:  speech-separation
Singing-Voice-Separation-RNN
Singing-Voice Separation From Monaural Recordings Using Deep Recurrent Neural Networks
Stars: ✭ 44 (-45.68%)
Mutual labels:  source-separation
pitmp-maven-plugin
Maven plugin to handle multi module projects for PiTest
Stars: ✭ 36 (-55.56%)
Mutual labels:  pit
Voice-Separation-and-Enhancement
A framework for quick testing and comparing multi-channel speech enhancement and separation methods, such as DSB, MVDR, LCMV, GEVD beamforming and ICA, FastICA, IVA, AuxIVA, OverIVA, ILRMA, FastMNMF.
Stars: ✭ 60 (-25.93%)
Mutual labels:  speech-separation
awesome-speech-enhancement
A curated list of awesome Speech Enhancement papers, libraries, datasets, and other resources.
Stars: ✭ 48 (-40.74%)
Mutual labels:  speech-separation
wavenet
Audio source separation (mixture to vocal) using the Wavenet
Stars: ✭ 20 (-75.31%)
Mutual labels:  source-separation
mann-for-speech-separation
Neural Turing machine for source separation in Tensorflow
Stars: ✭ 18 (-77.78%)
Mutual labels:  speech-separation
speaker extraction
target speaker extraction and verification for multi-talker speech
Stars: ✭ 85 (+4.94%)
Mutual labels:  source-separation
Calculate-SNR-SDR
Script to calculate SNR and SDR using python
Stars: ✭ 76 (-6.17%)
Mutual labels:  speech-separation
DeepSeparation
Keras Implementation and Experiments with Deep Recurrent Neural Networks for Source Separation
Stars: ✭ 19 (-76.54%)
Mutual labels:  source-separation
shadow
shadow table.
Stars: ✭ 12 (-85.19%)
Mutual labels:  pit
Deep-Clustering-for-Speech-Separation
A PyTorch implementation of Deep Clustering: Discriminative Embeddings for Segmentation and Separation
Stars: ✭ 99 (+22.22%)
Mutual labels:  speech-separation
AMSS-Net
A PyTorch implementation of the paper: "AMSS-Net: Audio Manipulation on User-Specified Sources with Textual Queries" (ACM Multimedia 2021)
Stars: ✭ 19 (-76.54%)
Mutual labels:  source-separation
TASNET
Time-domain Audio Separation Network (IN PYTORCH)
Stars: ✭ 18 (-77.78%)
Mutual labels:  tasnet
rl singing voice
Unsupervised Representation Learning for Singing Voice Separation
Stars: ✭ 18 (-77.78%)
Mutual labels:  source-separation

TasNet: Time-domain Audio Separation Network

A PyTorch implementation of "TasNet: Time-Domain Audio Separation Network for Real-Time, Single-Channel Speech Separation", published at ICASSP 2018, by Yi Luo and Nima Mesgarani.
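Training uses utterance-level Permutation Invariant Training (PIT): since the order of the separated outputs is arbitrary, the loss is computed under the speaker permutation that best matches the targets. A minimal sketch of the idea, using an MSE criterion for brevity (the repo's actual criterion may differ, e.g., negative SI-SNR):

import itertools
import torch

def pit_mse_loss(estimates, targets):
    """Utterance-level PIT with an MSE criterion (illustrative sketch).

    estimates, targets: tensors of shape [batch, num_speakers, time].
    """
    B, C, T = estimates.shape
    per_perm = []
    for perm in itertools.permutations(range(C)):
        # Loss of this speaker assignment, per utterance.
        diff = estimates[:, list(perm), :] - targets
        per_perm.append((diff ** 2).mean(dim=(1, 2)))
    # [num_perms, batch] -> pick the best permutation per utterance.
    min_loss, _ = torch.stack(per_perm).min(dim=0)
    return min_loss.mean()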

Results

Method                 Causal   SDRi (dB)   SI-SNRi (dB)   Config
TasNet-BLSTM (Paper)   No       11.1        10.8           -
TasNet-BLSTM (Here)    No       11.84       11.54          L=40, N=500, hidden=500, layer=4, lr=1e-3, epoch=100, batch size=10
TasNet-BLSTM (Here)    No       11.77       11.46          + L2 regularization 1e-4
TasNet-BLSTM (Here)    No       13.07       12.78          + L2 regularization 1e-5
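SDRi and SI-SNRi report the improvement over the unprocessed mixture in dB. As an illustrative sketch (not this repo's evaluation code), scale-invariant SNR for one estimate/target pair can be computed as below, with SI-SNRi then being si_snr(estimate, target) - si_snr(mixture, target):

import torch

def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB for 1-D tensors (illustrative sketch)."""
    # Zero-mean both signals so the measure ignores DC offset.
    estimate = estimate - estimate.mean()
    target = target - target.mean()
    # Project the estimate onto the target: the scaled "clean" component.
    scale = torch.dot(estimate, target) / (torch.dot(target, target) + eps)
    s_target = scale * target
    e_noise = estimate - s_target
    return 10 * torch.log10(s_target.pow(2).sum() / (e_noise.pow(2).sum() + eps))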

Install

  • PyTorch 0.4.1+
  • Python 3 (Anaconda recommended)
  • pip install -r requirements.txt
  • If you need to convert wsj0 to wav format and generate mixture files: cd tools; make

Usage

If you already have mixture wsj0 data:

  1. $ cd egs/wsj0 and modify the wsj0 data path at the beginning of run.sh to point to your data.
  2. $ bash run.sh, that's all!

If you only have the original wsj0 data (sphere format):

  1. $ cd egs/wsj0 and modify the three wsj0 data paths at the beginning of run.sh to point to your data.
  2. Convert the sphere-format wsj0 to wav format and generate the mixtures. The Stage 0 part of run.sh provides an example.
  3. $ bash run.sh, that's all!

You can change hyper-parameters with $ bash run.sh --parameter_name parameter_value, e.g., $ bash run.sh --stage 3. See the parameter names defined in egs/wsj0/run.sh before the line . utils/parse_options.sh.

Workflow

Workflow of egs/wsj0/run.sh:

  • Stage 0: Convert sphere format to wav format and generate mixtures (optional)
  • Stage 1: Generate json files containing each wav's path and duration (a manifest sketch follows this list)
  • Stage 2: Train the model
  • Stage 3: Evaluate separation performance
  • Stage 4: Separate speech using the trained TasNet
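As referenced in Stage 1, the recipe writes json manifests of wav paths and durations. The exact schema is the recipe's own; a stdlib-only sketch of the idea (the output format and paths here are assumptions) could look like:

import json
import wave
from pathlib import Path

def build_manifest(wav_dir, out_json):
    """Collect (path, duration) pairs for every wav under wav_dir (sketch)."""
    entries = []
    for wav_path in sorted(Path(wav_dir).rglob("*.wav")):
        with wave.open(str(wav_path), "rb") as f:
            duration = f.getnframes() / f.getframerate()  # seconds
        entries.append({"wav": str(wav_path), "duration": duration})
    with open(out_json, "w") as f:
        json.dump(entries, f, indent=2)

build_manifest("data/wav/mix", "data/json/mix.json")  # hypothetical paths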

More detail

# Set PATH and PYTHONPATH
$ cd egs/wsj0/; . ./path.sh
# Train:
$ train.py -h
# Evaluate performance:
$ evaluate.py -h
# Separate mixture audio:
$ separate.py -h

How to visualize loss?

If you want to visualize your loss, you can use visdom:

  1. Open a new terminal on your remote server (tmux is recommended) and run $ visdom
  2. Open a new terminal and run $ bash run.sh --visdom 1 --visdom_id "<any-string>" or $ train.py ... --visdom 1 --visdom_id "<any-string>"
  3. Open your browser and go to <your-remote-server-ip>:8097, e.g., 127.0.0.1:8097
  4. On the visdom page, choose <any-string> under Environment to see your loss (a minimal logging sketch follows this list)
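Under the hood, this amounts to appending points to a visdom line plot in the chosen environment. A minimal standalone sketch (independent of this repo's train.py):

import numpy as np
import visdom

vis = visdom.Visdom(env="tasnet-demo")  # environment name is arbitrary
win = None
for epoch, loss in enumerate([1.2, 0.9, 0.7]):  # dummy loss values
    X, Y = np.array([epoch]), np.array([loss])
    if win is None:
        # First point creates the plot window.
        win = vis.line(X=X, Y=Y, opts=dict(title="training loss"))
    else:
        # Later points are appended to the same window.
        vis.line(X=X, Y=Y, win=win, update="append")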

How to resume training?

$ bash run.sh --continue_from <model-path>

TODO

  • Layer normalization described in the paper (a sketch follows this list)
  • LSTM skip connections
  • Curriculum learning
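For the first TODO item: the paper normalizes each frame over the feature dimension. One common PyTorch realization of channel-wise layer normalization (an assumption about the intended design, not this repo's code):

import torch
import torch.nn as nn

class ChannelwiseLayerNorm(nn.Module):
    """Layer norm over the channel dimension of a [batch, channels, time] input."""
    def __init__(self, num_channels, eps=1e-8):
        super().__init__()
        self.gamma = nn.Parameter(torch.ones(1, num_channels, 1))
        self.beta = nn.Parameter(torch.zeros(1, num_channels, 1))
        self.eps = eps

    def forward(self, x):
        # Normalize each time step across channels, then rescale and shift.
        mean = x.mean(dim=1, keepdim=True)
        var = x.var(dim=1, keepdim=True, unbiased=False)
        return self.gamma * (x - mean) / torch.sqrt(var + self.eps) + self.beta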