All Projects → microsoft → UniSpeech

microsoft / UniSpeech

Licence: other
UniSpeech - Large Scale Self-Supervised Learning for Speech

Programming Languages

python
139335 projects - #7 most used programming language
Cuda
1817 projects

Projects that are alternatives of or similar to UniSpeech

Kaldi
kaldi-asr/kaldi is the official location of the Kaldi project.
Stars: ✭ 11,151 (+4878.13%)
Mutual labels:  speech, speech-recognition, speaker-verification
Delta
DELTA is a deep learning based natural language and speech processing platform.
Stars: ✭ 1,479 (+560.27%)
Mutual labels:  speech, speech-recognition, speaker-verification
open-speech-corpora
💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies
Stars: ✭ 841 (+275.45%)
Mutual labels:  speech-recognition, speech-processing, speech-separation
Speechbrain.github.io
The SpeechBrain project aims to build a novel speech toolkit fully based on PyTorch. With SpeechBrain users can easily create speech processing systems, ranging from speech recognition (both HMM/DNN and end-to-end), speaker recognition, speech enhancement, speech separation, multi-microphone speech processing, and many others.
Stars: ✭ 242 (+8.04%)
Mutual labels:  speech, speech-recognition, speech-processing
awesome-keyword-spotting
This repository is a curated list of awesome Speech Keyword Spotting (Wake-Up Word Detection).
Stars: ✭ 150 (-33.04%)
Mutual labels:  speech-recognition, speech-processing
KeenASR-Android-PoC
A proof-of-concept app using KeenASR SDK on Android. WE ARE HIRING: https://keenresearch.com/careers.html
Stars: ✭ 21 (-90.62%)
Mutual labels:  speech, speech-recognition
Shifter
Pitch shifter using WSOLA and resampling implemented by Python3
Stars: ✭ 22 (-90.18%)
Mutual labels:  speech, speech-processing
awesome-speech-enhancement
A curated list of awesome Speech Enhancement papers, libraries, datasets, and other resources.
Stars: ✭ 48 (-78.57%)
Mutual labels:  speech-processing, speech-separation
ASR-Audio-Data-Links
A list of publically available audio data that anyone can download for ASR or other speech activities
Stars: ✭ 179 (-20.09%)
Mutual labels:  speech, speech-recognition
speechrec
a simple speech recognition app using the Web Speech API Interfaces
Stars: ✭ 18 (-91.96%)
Mutual labels:  speech-recognition, speech-processing
QuantumSpeech-QCNN
IEEE ICASSP 21 - Quantum Convolution Neural Networks for Speech Processing and Automatic Speech Recognition
Stars: ✭ 71 (-68.3%)
Mutual labels:  speech-recognition, speech-processing
opensource-voice-tools
A repo listing known open source voice tools, ordered by where they sit in the voice stack
Stars: ✭ 21 (-90.62%)
Mutual labels:  speech, speech-recognition
react-native-spokestack
Spokestack: give your React Native app a voice interface!
Stars: ✭ 53 (-76.34%)
Mutual labels:  speech-recognition, speech-processing
simple-obs-stt
Speech-to-text and keyboard input captions for OBS.
Stars: ✭ 89 (-60.27%)
Mutual labels:  speech, speech-recognition
torchsubband
Pytorch implementation of subband decomposition
Stars: ✭ 63 (-71.87%)
Mutual labels:  speech-recognition, speech-processing
kaldi ag training
Docker image and scripts for training finetuned or completely personal Kaldi speech models. Particularly for use with kaldi-active-grammar.
Stars: ✭ 14 (-93.75%)
Mutual labels:  speech, speech-recognition
Speech-Backbones
This is the main repository of open-sourced speech technology by Huawei Noah's Ark Lab.
Stars: ✭ 205 (-8.48%)
Mutual labels:  speech-recognition, speech-processing
D-TDNN
PyTorch implementation of Densely Connected Time Delay Neural Network
Stars: ✭ 60 (-73.21%)
Mutual labels:  speech, speaker-verification
deepspeech.mxnet
A MXNet implementation of Baidu's DeepSpeech architecture
Stars: ✭ 82 (-63.39%)
Mutual labels:  speech, speech-recognition
bob
Bob is a free signal-processing and machine learning toolbox originally developed by the Biometrics group at Idiap Research Institute, in Switzerland. - Mirrored from https://gitlab.idiap.ch/bob/bob
Stars: ✭ 38 (-83.04%)
Mutual labels:  speaker-verification, speech-processing

UniSpeech

The family of UniSpeech:

WavLM (arXiv): WavLM: Large-Scale Self-Supervised Pre-training for Full Stack Speech Processing

UniSpeech (ICML 2021): Unified Pre-training for Self-Supervised Learning and Supervised Learning for ASR

UniSpeech-SAT (ICASSP 2022 Submission): Universal Speech Representation Learning with Speaker Aware Pre-Training

ILS-SSL (ICASSP 2022 Submission): Self-Supervised Learning for Speech Recognition with Intermediate Layer Supervision

Model introductions, evaluation results, and model inference instructions are located in their corresponding folders. The source code is here [https://github.com/microsoft/UniSpeech/tree/main/src].

Update

Pre-trained models

We strongly suggest using our UniSpeech-SAT model for speaker related tasks, since it shows very powerful performance on various speaker related benchmarks.

Model Pretraining Dataset Finetuning Dataset Model
UniSpeech Large EN Labeled: 1350 hrs en - download
UniSpeech Large Multilingual Labeled: 1350 hrs en + 353 hrs fr + 168 hrs es + 90 hrs it - download
Unispeech Large+ Labeled: 1350 hrs en, Unlabeled: 353 hrs fr - download
UniSpeech Large+ Labeld: 1350 hrs en, Unlabeled: 168 hrs es - download
UniSpeech Large+ Labeled: 1350 hrs en, Unlabeld: 90 hrs it - download
UniSpeech Large Multilingual Labeled: 1350 hrs en + 353 hrs fr + 168 hrs es + 90 hrs it, Unlabeled: 17 hrs ky - download
UniSpeech Large+ Labeled: 1350 hrs en, Unlabeled: 353 hrs fr 1 hr fr download
UniSpeech Large+ Labeld: 1350 hrs en, Unlabeled: 168 hrs es 1 hr es download
UniSpeech Large+ Labeled: 1350 hrs en, Unlabeld: 90 hrs it 1 hr it download
UniSpeech Large Multilingual Labeled: 1350 hrs en + 353 hrs fr + 168 hrs es + 90 hrs it, Unlabeled: 17 hrs ky 1 hr ky download
UniSpeech-SAT Base 960 hrs LibriSpeech - download
UniSpeech-SAT Base+ 60k hrs Libri-Light + 10k hrs GigaSpeech + 24k hrs VoxPopuli - download
UniSpeech-SAT Large 60k hrs Libri-Light + 10k hrs GigaSpeech + 24k hrs VoxPopuli - download
WavLM Base 960 hrs LibriSpeech - Azure Storage
Google Drive
WavLM Base+ 60k hrs Libri-Light + 10k hrs GigaSpeech + 24k hrs VoxPopuli - Azure Storage
Google Drive
WavLM Large 60k hrs Libri-Light + 10k hrs GigaSpeech + 24k hrs VoxPopuli - Azure Storage
Google Drive

Universal Representation Evaluation on SUPERB

alt text

Downstream Task Performance

We also evaluate our models on typical speaker related benchmarks.

Speaker Verification

Finetune the model with VoxCeleb2 dev data, and evaluate it on the VoxCeleb1

Model Fix pre-train Vox1-O Vox1-E Vox1-H
ECAPA-TDNN - 0.87 1.12 2.12
HuBERT large Yes 0.888 0.912 1.853
Wav2Vec2.0 (XLSR) Yes 0.915 0.945 1.895
UniSpeech-SAT large Yes 0.771 0.781 1.669
WavLM large Yes 0.59 0.65 1.328
WavLM large No 0.505 0.579 1.176
+Large Margin Finetune and Score Calibration
HuBERT large No 0.585 0.654 1.342
Wav2Vec2.0 (XLSR) No 0.564 0.605 1.23
UniSpeech-SAT large No 0.564 0.561 1.23
WavLM large (New) No 0.33 0.477 0.984

Large-scale Self-Supervised Speech Representation Learning for Automatic Speaker Verification

Speech Separation

Evaluation on LibriCSS

Model 0S 0L OV10 OV20 OV30 OV40
Conformer (SOTA) 4.5 4.4 6.2 8.5 11 12.6
UniSpeech-SAT base 4.4 4.4 5.4 7.2 9.2 10.5
UniSpeech-SAT large 4.3 4.2 5.0 6.3 8.2 8.8
WavLM base+ 4.5 4.4 5.6 7.5 9.4 10.9
WavLM large 4.2 4.1 4.8 5.8 7.4 8.5

Speaker Diarization

Evaluation on CALLHOME

Model spk_2 spk_3 spk_4 spk_5 spk_6 spk_all
EEND-vector clustering 7.96 11.93 16.38 21.21 23.1 12.49
EEND-EDA clustering (SOTA) 7.11 11.88 14.37 25.95 21.95 11.84
UniSpeech-SAT large 5.93 10.66 12.9 16.48 23.25 10.92
WavLM Base 6.99 11.12 15.20 16.48 21.61 11.75
WavLm large 6.46 10.69 11.84 12.89 20.70 10.35

License

This project is licensed under the license found in the LICENSE file in the root directory of this source tree. Portions of the source code are based on the FAIRSEQ project.

Microsoft Open Source Code of Conduct

Reference

If you find our work is useful in your research, please cite the following paper:

@inproceedings{Wang2021UniSpeech,
  author    = {Chengyi Wang and Yu Wu and Yao Qian and Kenichi Kumatani and Shujie Liu and Furu Wei and Michael Zeng and Xuedong Huang},
  editor    = {Marina Meila and Tong Zhang},
  title     = {UniSpeech: Unified Speech Representation Learning with Labeled and
               Unlabeled Data},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning,
               {ICML} 2021, 18-24 July 2021, Virtual Event},
  series    = {Proceedings of Machine Learning Research},
  volume    = {139},
  pages     = {10937--10947},
  publisher = {{PMLR}},
  year      = {2021},
  url       = {http://proceedings.mlr.press/v139/wang21y.html},
  timestamp = {Thu, 21 Oct 2021 16:06:12 +0200},
  biburl    = {https://dblp.org/rec/conf/icml/0002WQK0WZ021.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
@article{Chen2021WavLM,
  title   = {WavLM: Large-Scale Self-Supervised  Pre-training   for Full Stack Speech Processing},
  author  = {Sanyuan Chen and Chengyi Wang and Zhengyang Chen and Yu Wu and Shujie Liu and Zhuo Chen and Jinyu Li and Naoyuki Kanda and Takuya Yoshioka and Xiong Xiao and Jian Wu and Long Zhou and Shuo Ren and Yanmin Qian and Yao Qian and Jian Wu and Michael Zeng and Furu Wei},
  eprint={2110.13900},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  year={2021}
}
@article{Chen2021UniSpeechSAT,
  title   = {UniSpeech-SAT: Universal Speech Representation Learning with  Speaker Aware Pre-Training},
  author  = {Sanyuan Chen and Yu Wu and Chengyi Wang and Zhengyang Chen and Zhuo Chen and Shujie Liu and   Jian Wu and Yao Qian and Furu Wei and Jinyu Li and  Xiangzhan Yu},
  eprint={2110.05752},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  year={2021}
}

Contact Information

For help or issues using UniSpeech models, please submit a GitHub issue.

For other communications related to UniSpeech, please contact Yu Wu ([email protected]).

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].