
hitachi-speech / Eend

License: MIT
End-to-End Neural Diarization

Programming Languages

python

Projects that are alternatives of or similar to Eend

Espnet
End-to-End Speech Processing Toolkit
Stars: ✭ 4,533 (+2862.75%)
Mutual labels:  chainer, kaldi, end-to-end
DSTC6-End-to-End-Conversation-Modeling
DSTC6: End-to-End Conversation Modeling Track
Stars: ✭ 56 (-63.4%)
Mutual labels:  chainer, end-to-end
Espresso
Espresso: A Fast End-to-End Neural Speech Recognition Toolkit
Stars: ✭ 808 (+428.1%)
Mutual labels:  kaldi, end-to-end
Comicolorization
This is the implementation of the paper "Comicolorization: Semi-automatic Manga Colorization"
Stars: ✭ 99 (-35.29%)
Mutual labels:  chainer
Vosk Api
Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Stars: ✭ 1,357 (+786.93%)
Mutual labels:  kaldi
Tf Kaldi Speaker
Neural speaker recognition/verification system based on Kaldi and Tensorflow
Stars: ✭ 117 (-23.53%)
Mutual labels:  kaldi
Listen Attend Spell
A PyTorch implementation of Listen, Attend and Spell (LAS), an End-to-End ASR framework.
Stars: ✭ 147 (-3.92%)
Mutual labels:  end-to-end
Factorized Tdnn
PyTorch implementation of the Factorized TDNN (TDNN-F) from "Semi-Orthogonal Low-Rank Matrix Factorization for Deep Neural Networks" and Kaldi
Stars: ✭ 98 (-35.95%)
Mutual labels:  kaldi
Kaldi
kaldi-asr/kaldi is the official location of the Kaldi project.
Stars: ✭ 11,151 (+7188.24%)
Mutual labels:  kaldi
Chainercv
ChainerCV: a Library for Deep Learning in Computer Vision
Stars: ✭ 1,463 (+856.21%)
Mutual labels:  chainer
Adversarial text
Code for Adversarial Training Methods for Semi-Supervised Text Classification
Stars: ✭ 109 (-28.76%)
Mutual labels:  chainer
Tacotron Pytorch
A Pytorch Implementation of Tacotron: End-to-end Text-to-speech Deep-Learning Model
Stars: ✭ 104 (-32.03%)
Mutual labels:  end-to-end
Chainer Pose Proposal Net
Chainer implementation of Pose Proposal Networks
Stars: ✭ 119 (-22.22%)
Mutual labels:  chainer
Elpis
🙊 WIP software for creating speech recognition models.
Stars: ✭ 101 (-33.99%)
Mutual labels:  kaldi
Chainer Pix2pix
chainer implementation of pix2pix
Stars: ✭ 130 (-15.03%)
Mutual labels:  chainer
Pytorch Kaldi Neural Speaker Embeddings
A lightweight neural speaker embedding extractor based on Kaldi and PyTorch.
Stars: ✭ 99 (-35.29%)
Mutual labels:  kaldi
Pytorch Asr
ASR with PyTorch
Stars: ✭ 124 (-18.95%)
Mutual labels:  kaldi
Kiss
Code for the paper "KISS: Keeping it Simple for Scene Text Recognition"
Stars: ✭ 108 (-29.41%)
Mutual labels:  chainer
E2e Asr
PyTorch Implementations for End-to-End Automatic Speech Recognition
Stars: ✭ 106 (-30.72%)
Mutual labels:  end-to-end
Rnn Transducer
MXNet implementation of RNN Transducer (Graves 2012): Sequence Transduction with Recurrent Neural Networks
Stars: ✭ 114 (-25.49%)
Mutual labels:  end-to-end

EEND (End-to-End Neural Diarization)

EEND (End-to-End Neural Diarization) is a neural-network-based speaker diarization method.

An EEND extension for a variable number of speakers is also provided in this repository.
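At its core, EEND treats diarization as frame-wise multi-label classification of speaker activity and trains with a permutation-free objective [1]: the binary cross-entropy between posteriors and labels is minimized over all speaker permutations. The following is a minimal NumPy sketch of that loss; the names are illustrative and the code is not taken from this repository.

from itertools import permutations
import numpy as np

def pit_bce_loss(posteriors, labels):
    # posteriors, labels: (T frames, S speakers); labels are 0/1
    eps = 1e-7
    p = np.clip(posteriors, eps, 1 - eps)
    losses = []
    for perm in permutations(range(labels.shape[1])):
        ref = labels[:, list(perm)]  # relabel the reference speakers
        bce = -(ref * np.log(p) + (1 - ref) * np.log(1 - p))
        losses.append(bce.mean())
    return min(losses)  # the best-matching permutation defines the loss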

Install tools

Requirements

  • NVIDIA CUDA GPU
  • CUDA Toolkit (8.0 <= version <= 10.1)
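You can check what is installed on your machine with the standard CUDA utilities:

# prints the installed CUDA toolkit version
nvcc --version
# lists the visible NVIDIA GPUs and the driver version
nvidia-smi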

Install Kaldi and the Python environment

cd tools
make
  • This command builds Kaldi at tools/kaldi.
    • If you want to use a pre-built Kaldi:
      cd tools
      make KALDI=<existing_kaldi_root>
      
      This option makes a symlink at tools/kaldi.
  • The command also extracts miniconda3 at tools/miniconda3 and creates a conda environment named 'eend'.
  • Chainer and CuPy are then installed into the 'eend' environment.
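To inspect the environment interactively (for example, to check the installed Chainer and CuPy versions), you can activate it with the standard conda commands; the paths below follow the layout described above:

# the Makefile extracts miniconda3 at tools/miniconda3 (see above)
source tools/miniconda3/etc/profile.d/conda.sh
conda activate eend
python -c 'import chainer; print(chainer.__version__)'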

Test recipe (mini_librispeech)

Configuration

  • Modify egs/mini_librispeech/v1/cmd.sh according to your job scheduler: use "run.pl" on a local machine, "queue.pl" on Grid Engine, and "slurm.pl" on SLURM. For more information about cmd.sh, see http://kaldi-asr.org/doc/queue.html. A sketch of a local-machine cmd.sh follows this list.
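For reference, a local-machine cmd.sh typically looks like the sketch below. The variable names follow the common Kaldi convention and may differ from the ones this recipe defines, so check the file itself:

# run all jobs on the local machine; swap in "queue.pl" or "slurm.pl"
# (with cluster-specific options) for Grid Engine or SLURM
export train_cmd="run.pl"
export decode_cmd="run.pl"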

Data preparation

cd egs/mini_librispeech/v1
./run_prepare_shared.sh

Run training, inference, and scoring

./run.sh
  • If you use encoder-decoder based attractors (EDA) [3], modify run.sh to use config/eda/{train,infer}.yaml; a hypothetical sketch of that edit follows this list.
  • See RESULT.md and compare with your result.
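A hypothetical illustration of the EDA edit; the actual variable names inside run.sh may differ, so treat this purely as a sketch:

# hypothetical variable names -- check run.sh for the real ones
train_config=config/eda/train.yaml
infer_config=config/eda/infer.yaml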

CALLHOME two-speaker experiment

Configuration

  • Modify egs/callhome/v1/cmd.sh according to your job scheduler: use "run.pl" on a local machine, "queue.pl" on Grid Engine, and "slurm.pl" on SLURM. For more information about cmd.sh, see http://kaldi-asr.org/doc/queue.html.
  • Modify egs/callhome/v1/run_prepare_shared.sh so that it points at the storage paths of your corpora; see the sketch after this list.
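For example, the corpus-path edit could look like the line below; the variable name is hypothetical and the actual script may organize its paths differently:

# hypothetical variable name; point it at your local copy of the corpus
callhome_dir=/path/to/callhome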

Data preparation

cd egs/callhome/v1
./run_prepare_shared.sh
# If you want to conduct 1-4 speaker experiments, run the script below.
# You also have to set the paths to your corpora properly.
./run_prepare_shared_eda.sh

Self-attention-based model using 2-speaker mixtures

./run.sh

BLSTM-based model using 2-speaker mixtures

local/run_blstm.sh

Self-attention-based model with EDA using 1-4-speaker mixtures

./run_eda.sh

References

[1] Yusuke Fujita, Naoyuki Kanda, Shota Horiguchi, Kenji Nagamatsu, Shinji Watanabe, "End-to-End Neural Speaker Diarization with Permutation-free Objectives," Proc. Interspeech, pp. 4300-4304, 2019.

[2] Yusuke Fujita, Naoyuki Kanda, Shota Horiguchi, Yawen Xue, Kenji Nagamatsu, Shinji Watanabe, "End-to-End Neural Speaker Diarization with Self-attention," Proc. ASRU, pp. 296-303, 2019.

[3] Shota Horiguchi, Yusuke Fujita, Shinji Watanabe, Yawen Xue, Kenji Nagamatsu, "End-to-End Speaker Diarization for an Unknown Number of Speakers with Encoder-Decoder Based Attractors," Proc. Interspeech, 2020.

Citation

@inproceedings{Fujita2019Interspeech,
  author={Yusuke Fujita and Naoyuki Kanda and Shota Horiguchi and Kenji Nagamatsu and Shinji Watanabe},
  title={{End-to-End Neural Speaker Diarization with Permutation-free Objectives}},
  booktitle={Interspeech},
  pages={4300--4304},
  year=2019
}