
hitachi-speech / Eend

License: MIT
End-to-End Neural Diarization

Programming Languages

python

Projects that are alternatives of or similar to Eend

Espnet
End-to-End Speech Processing Toolkit
Stars: ✭ 4,533 (+2862.75%)
Mutual labels:  chainer, kaldi, end-to-end
DSTC6-End-to-End-Conversation-Modeling
DSTC6: End-to-End Conversation Modeling Track
Stars: ✭ 56 (-63.4%)
Mutual labels:  chainer, end-to-end
Espresso
Espresso: A Fast End-to-End Neural Speech Recognition Toolkit
Stars: ✭ 808 (+428.1%)
Mutual labels:  kaldi, end-to-end
Comicolorization
This is the implementation of the paper "Comicolorization: Semi-automatic Manga Colorization"
Stars: ✭ 99 (-35.29%)
Mutual labels:  chainer
Vosk Api
Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Stars: ✭ 1,357 (+786.93%)
Mutual labels:  kaldi
Tf Kaldi Speaker
Neural speaker recognition/verification system based on Kaldi and Tensorflow
Stars: ✭ 117 (-23.53%)
Mutual labels:  kaldi
Listen Attend Spell
A PyTorch implementation of Listen, Attend and Spell (LAS), an End-to-End ASR framework.
Stars: ✭ 147 (-3.92%)
Mutual labels:  end-to-end
Factorized Tdnn
PyTorch implementation of the Factorized TDNN (TDNN-F) from "Semi-Orthogonal Low-Rank Matrix Factorization for Deep Neural Networks" and Kaldi
Stars: ✭ 98 (-35.95%)
Mutual labels:  kaldi
Kaldi
kaldi-asr/kaldi is the official location of the Kaldi project.
Stars: ✭ 11,151 (+7188.24%)
Mutual labels:  kaldi
Chainercv
ChainerCV: a Library for Deep Learning in Computer Vision
Stars: ✭ 1,463 (+856.21%)
Mutual labels:  chainer
Adversarial text
Code for Adversarial Training Methods for Semi-Supervised Text Classification
Stars: ✭ 109 (-28.76%)
Mutual labels:  chainer
Tacotron Pytorch
A Pytorch Implementation of Tacotron: End-to-end Text-to-speech Deep-Learning Model
Stars: ✭ 104 (-32.03%)
Mutual labels:  end-to-end
Chainer Pose Proposal Net
Chainer implementation of Pose Proposal Networks
Stars: ✭ 119 (-22.22%)
Mutual labels:  chainer
Elpis
🙊 WIP software for creating speech recognition models.
Stars: ✭ 101 (-33.99%)
Mutual labels:  kaldi
Chainer Pix2pix
chainer implementation of pix2pix
Stars: ✭ 130 (-15.03%)
Mutual labels:  chainer
Pytorch Kaldi Neural Speaker Embeddings
A lightweight neural speaker embedding extractor based on Kaldi and PyTorch.
Stars: ✭ 99 (-35.29%)
Mutual labels:  kaldi
Pytorch Asr
ASR with PyTorch
Stars: ✭ 124 (-18.95%)
Mutual labels:  kaldi
Kiss
Code for the paper "KISS: Keeping it Simple for Scene Text Recognition"
Stars: ✭ 108 (-29.41%)
Mutual labels:  chainer
E2e Asr
PyTorch Implementations for End-to-End Automatic Speech Recognition
Stars: ✭ 106 (-30.72%)
Mutual labels:  end-to-end
Rnn Transducer
MXNet implementation of RNN Transducer (Graves 2012): Sequence Transduction with Recurrent Neural Networks
Stars: ✭ 114 (-25.49%)
Mutual labels:  end-to-end

EEND (End-to-End Neural Diarization)

EEND (End-to-End Neural Diarization) is a neural-network-based speaker diarization method.

An EEND extension for a variable number of speakers is also provided in this repository.
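At its core, EEND treats diarization as frame-wise multi-label classification of speaker activity and trains with a permutation-free objective [1]: the binary cross-entropy between posteriors and labels is minimized over all speaker permutations. The following is a minimal NumPy sketch of that loss; the names are illustrative and the code is not taken from this repository.

from itertools import permutations
import numpy as np

def pit_bce_loss(posteriors, labels):
    # posteriors, labels: (T frames, S speakers); labels are 0/1
    eps = 1e-7
    p = np.clip(posteriors, eps, 1 - eps)
    losses = []
    for perm in permutations(range(labels.shape[1])):
        ref = labels[:, list(perm)]  # relabel the reference speakers
        bce = -(ref * np.log(p) + (1 - ref) * np.log(1 - p))
        losses.append(bce.mean())
    return min(losses)  # the best-matching permutation defines the loss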

Install tools

Requirements

  • NVIDIA CUDA GPU
  • CUDA Toolkit (8.0 <= version <= 10.1)
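You can check what is installed on your machine with the standard CUDA utilities:

# prints the installed CUDA toolkit version
nvcc --version
# lists the visible NVIDIA GPUs and the driver version
nvidia-smi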

Install Kaldi and the Python environment

cd tools
make
  • This command builds Kaldi at tools/kaldi.
    • If you want to use a pre-built Kaldi:
      cd tools
      make KALDI=<existing_kaldi_root>
      
      This option makes a symlink at tools/kaldi.
  • The command also extracts miniconda3 at tools/miniconda3 and creates a conda environment named 'eend'.
  • Chainer and CuPy are then installed into the 'eend' environment.
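To inspect the environment interactively (for example, to check the installed Chainer and CuPy versions), you can activate it with the standard conda commands; the paths below follow the layout described above:

# the Makefile extracts miniconda3 at tools/miniconda3 (see above)
source tools/miniconda3/etc/profile.d/conda.sh
conda activate eend
python -c 'import chainer; print(chainer.__version__)'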

Test recipe (mini_librispeech)

Configuration

  • Modify egs/mini_librispeech/v1/cmd.sh according to your job scheduler: use "run.pl" on a local machine, "queue.pl" on Grid Engine, and "slurm.pl" on SLURM. For more information about cmd.sh, see http://kaldi-asr.org/doc/queue.html. A sketch of a local-machine cmd.sh follows this list.
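For reference, a local-machine cmd.sh typically looks like the sketch below. The variable names follow the common Kaldi convention and may differ from the ones this recipe defines, so check the file itself:

# run all jobs on the local machine; swap in "queue.pl" or "slurm.pl"
# (with cluster-specific options) for Grid Engine or SLURM
export train_cmd="run.pl"
export decode_cmd="run.pl"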

Data preparation

cd egs/mini_librispeech/v1
./run_prepare_shared.sh

Run training, inference, and scoring

./run.sh
  • If you use encoder-decoder based attractors (EDA) [3], modify run.sh to use config/eda/{train,infer}.yaml; a hypothetical sketch of that edit follows this list.
  • See RESULT.md and compare with your result.
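A hypothetical illustration of the EDA edit; the actual variable names inside run.sh may differ, so treat this purely as a sketch:

# hypothetical variable names -- check run.sh for the real ones
train_config=config/eda/train.yaml
infer_config=config/eda/infer.yaml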

CALLHOME two-speaker experiment

Configuration

  • Modify egs/callhome/v1/cmd.sh according to your job scheduler: use "run.pl" on a local machine, "queue.pl" on Grid Engine, and "slurm.pl" on SLURM. For more information about cmd.sh, see http://kaldi-asr.org/doc/queue.html.
  • Modify egs/callhome/v1/run_prepare_shared.sh so that it points at the storage paths of your corpora; see the sketch after this list.
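For example, the corpus-path edit could look like the line below; the variable name is hypothetical and the actual script may organize its paths differently:

# hypothetical variable name; point it at your local copy of the corpus
callhome_dir=/path/to/callhome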

Data preparation

cd egs/callhome/v1
./run_prepare_shared.sh
# If you want to conduct 1-4 speaker experiments, run the script below.
# You also have to set the paths to your corpora properly.
./run_prepare_shared_eda.sh

Self-attention-based model using 2-speaker mixtures

./run.sh

BLSTM-based model using 2-speaker mixtures

local/run_blstm.sh

Self-attention-based model with EDA using 1-4-speaker mixtures

./run_eda.sh

References

[1] Yusuke Fujita, Naoyuki Kanda, Shota Horiguchi, Kenji Nagamatsu, Shinji Watanabe, "End-to-End Neural Speaker Diarization with Permutation-free Objectives," Proc. Interspeech, pp. 4300-4304, 2019.

[2] Yusuke Fujita, Naoyuki Kanda, Shota Horiguchi, Yawen Xue, Kenji Nagamatsu, Shinji Watanabe, "End-to-End Neural Speaker Diarization with Self-attention," Proc. ASRU, pp. 296-303, 2019.

[3] Shota Horiguchi, Yusuke Fujita, Shinji Watanabe, Yawen Xue, Kenji Nagamatsu, "End-to-End Speaker Diarization for an Unknown Number of Speakers with Encoder-Decoder Based Attractors," Proc. Interspeech, 2020.

Citation

@inproceedings{Fujita2019Interspeech,
  author={Yusuke Fujita and Naoyuki Kanda and Shota Horiguchi and Kenji Nagamatsu and Shinji Watanabe},
  title={{End-to-End Neural Speaker Diarization with Permutation-free Objectives}},
  booktitle={Interspeech},
  pages={4300--4304},
  year=2019
}