
VITA-Group / AutoSpeech

License: MIT
[InterSpeech 2020] "AutoSpeech: Neural Architecture Search for Speaker Recognition" by Shaojin Ding*, Tianlong Chen*, Xinyu Gong, Weiwei Zha, Zhangyang Wang

Programming Languages

Python, Shell

Projects that are alternatives of or similar to AutoSpeech

Mtlnas
[CVPR 2020] MTL-NAS: Task-Agnostic Neural Architecture Search towards General-Purpose Multi-Task Learning
Stars: ✭ 58 (-70.26%)
Mutual labels:  automl, neural-architecture-search
Petridishnn
Code for the neural architecture search methods contained in the paper Efficient Forward Neural Architecture Search
Stars: ✭ 112 (-42.56%)
Mutual labels:  automl, neural-architecture-search
Autodl Projects
Automated deep learning algorithms implemented in PyTorch.
Stars: ✭ 1,187 (+508.72%)
Mutual labels:  automl, neural-architecture-search
Morph Net
Fast & Simple Resource-Constrained Learning of Deep Network Structure
Stars: ✭ 937 (+380.51%)
Mutual labels:  automl, neural-architecture-search
Awesome Autodl
A curated list of automated deep learning (including neural architecture search and hyper-parameter optimization) resources.
Stars: ✭ 1,819 (+832.82%)
Mutual labels:  automl, neural-architecture-search
Efficientnas
Towards Automated Deep Learning: Efficient Joint Neural Architecture and Hyperparameter Search https://arxiv.org/abs/1807.06906
Stars: ✭ 44 (-77.44%)
Mutual labels:  automl, neural-architecture-search
Pnasnet.tf
TensorFlow implementation of PNASNet-5 on ImageNet
Stars: ✭ 102 (-47.69%)
Mutual labels:  automl, neural-architecture-search
Adanet
Fast and flexible AutoML with learning guarantees.
Stars: ✭ 3,340 (+1612.82%)
Mutual labels:  automl, neural-architecture-search
Nas Benchmark
"NAS evaluation is frustratingly hard", ICLR2020
Stars: ✭ 126 (-35.38%)
Mutual labels:  automl, neural-architecture-search
Amla
AutoML frAmework for Neural Networks
Stars: ✭ 119 (-38.97%)
Mutual labels:  automl, neural-architecture-search
Devol
Genetic neural architecture search with Keras
Stars: ✭ 925 (+374.36%)
Mutual labels:  automl, neural-architecture-search
Naszilla
Naszilla is a Python library for neural architecture search (NAS)
Stars: ✭ 181 (-7.18%)
Mutual labels:  automl, neural-architecture-search
Awesome Automl And Lightweight Models
A list of high-quality (newest) AutoML works and lightweight models including 1.) Neural Architecture Search, 2.) Lightweight Structures, 3.) Model Compression, Quantization and Acceleration, 4.) Hyperparameter Optimization, 5.) Automated Feature Engineering.
Stars: ✭ 691 (+254.36%)
Mutual labels:  automl, neural-architecture-search
Autokeras
AutoML library for deep learning
Stars: ✭ 8,269 (+4140.51%)
Mutual labels:  automl, neural-architecture-search
Hpbandster
a distributed Hyperband implementation on Steroids
Stars: ✭ 456 (+133.85%)
Mutual labels:  automl, neural-architecture-search
Nni
An open source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning.
Stars: ✭ 10,698 (+5386.15%)
Mutual labels:  automl, neural-architecture-search
Pnasnet.pytorch
PyTorch implementation of PNASNet-5 on ImageNet
Stars: ✭ 309 (+58.46%)
Mutual labels:  automl, neural-architecture-search
Darts
Differentiable architecture search for convolutional and recurrent networks
Stars: ✭ 3,463 (+1675.9%)
Mutual labels:  automl, neural-architecture-search
Deephyper
DeepHyper: Scalable Asynchronous Neural Architecture and Hyperparameter Search for Deep Neural Networks
Stars: ✭ 117 (-40%)
Mutual labels:  automl, neural-architecture-search
Sgas
SGAS: Sequential Greedy Architecture Search (CVPR'2020) https://www.deepgcns.org/auto/sgas
Stars: ✭ 137 (-29.74%)
Mutual labels:  automl, neural-architecture-search

AutoSpeech: Neural Architecture Search for Speaker Recognition

License: MIT

Code for the paper "AutoSpeech: Neural Architecture Search for Speaker Recognition" (InterSpeech 2020).

Shaojin Ding*, Tianlong Chen*, Xinyu Gong, Weiwei Zha, Zhangyang Wang

Overview

Speaker recognition systems based on Convolutional Neural Networks (CNNs) are often built with off-the-shelf backbones such as VGG-Net or ResNet. However, these backbones were originally proposed for image classification and may therefore not be a natural fit for speaker recognition. Because manually exploring the design space is prohibitively complex, we propose the first neural architecture search approach for speaker recognition tasks, named AutoSpeech. Our evaluation results on VoxCeleb1 demonstrate that the CNN architectures derived by the proposed approach significantly outperform current speaker recognition systems based on VGG-M, ResNet-18, and ResNet-34 backbones, while enjoying lower model complexity.

Results

Our proposed approach outperforms speaker recognition systems based on VGG-M, ResNet-18, and ResNet-34 backbones. The detailed comparison can be found in our paper.

Method      Top-1 (%)   EER (%)   Parameters   Pretrained models
VGG-M       80.50       10.20     67M          iden, veri
ResNet-18   79.48       12.30     12M          iden, veri
ResNet-34   81.34       11.99     22M          iden, veri
Proposed    87.66        8.95     18M          iden, veri

Visualization

Left: the derived normal cell. Right: the derived reduction cell.

Quick start

Requirements

  • Python 3.7

  • PyTorch >= 1.0: pip install torch torchvision

  • Other dependencies: pip install -r requirements.txt

Dataset

VoxCeleb1: you will need the Dev A-D and Test parts, as well as the original metadata files vox1_meta.csv, iden_split.txt, and veri_test.txt from the official website. Alternatively, the dataset can be downloaded using dl_script.sh.

The data should be organized as:

  • VoxCeleb1
    • dev/wav/...
    • test/wav/...
    • vox1_meta.csv
    • iden_split.txt
    • veri_test.txt
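
Before running the preprocessing step below, it can be worth confirming that this layout is in place. The following is a small illustrative check, not part of the repository; the root path is a placeholder and the file names are taken from the list above.

    # layout_check.py - illustrative sanity check for the VoxCeleb1 layout above.
    # Not part of the AutoSpeech repository; adjust ROOT to your dataset path.
    from pathlib import Path

    ROOT = Path("/path/to/VoxCeleb1")  # placeholder, same convention as below

    expected = [
        "dev/wav",
        "test/wav",
        "vox1_meta.csv",
        "iden_split.txt",
        "veri_test.txt",
    ]

    for rel in expected:
        path = ROOT / rel
        print(("ok      " if path.exists() else "MISSING ") + str(path))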

Running the code

  • Data preprocessing:

    python data_preprocess.py /path/to/VoxCeleb1

    Its output folder should be organized as:

    • feature
      • dev
      • test
      • merged

    dev and test are used for verification, and merged is used for identification.

  • Training and evaluating the ResNet-18 and ResNet-34 baselines:

    python train_baseline_identification.py --cfg exps/baseline/resnet18_iden.yaml

    python train_baseline_verification.py --cfg exps/baseline/resnet18_veri.yaml

    python train_baseline_identification.py --cfg exps/baseline/resnet34_iden.yaml

    python train_baseline_verification.py --cfg exps/baseline/resnet34_veri.yaml

    You need to modify the DATA_DIR field in the .yaml file.

  • Architecture search:

    python search.py --cfg exps/search.yaml

    You need to modify the DATA_DIR field in the .yaml file.

  • Training from scratch for identification:

    python train_identification.py --cfg exps/scratch/scratch.yaml --text_arch GENOTYPE

    You need to modify the DATA_DIR field in the .yaml file.

    GENOTYPE is the searched architecture, given as a string (see the Genotype sketch at the end of this section). For example, the GENOTYPE of the architecture reported in the paper is:

    "Genotype(normal=[('dil_conv_5x5', 1), ('dil_conv_3x3', 0), ('dil_conv_5x5', 0), ('sep_conv_3x3', 1), ('sep_conv_3x3', 1), ('sep_conv_3x3', 2), ('dil_conv_3x3', 2), ('max_pool_3x3', 1)], normal_concat=range(2, 6), reduce=[('max_pool_3x3', 1), ('max_pool_3x3', 0), ('dil_conv_5x5', 2), ('max_pool_3x3', 1), ('dil_conv_5x5', 3), ('dil_conv_3x3', 2), ('dil_conv_5x5', 4), ('dil_conv_5x5', 2)], reduce_concat=range(2, 6))"

  • Training from scratch for verification:

    python train_verification.py --cfg exps/scratch/scratch.yaml --text_arch GENOTYPE

  • Evaluation:

    • Identification

      python evaluate_identification.py --cfg exps/scratch/scratch_iden.yaml --load_path /path/to/the/trained/model

    • Verification

      python evaluate_verification.py --cfg exps/scratch/scratch_veri.yaml --load_path /path/to/the/trained/model
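
The verification results in the table above are reported as equal error rate (EER), the operating point where the false acceptance rate equals the false rejection rate over the veri_test.txt trial pairs. As a reference for interpreting those numbers, the following is a generic, self-contained EER computation from similarity scores; it uses NumPy and scikit-learn and is a sketch rather than the repository's own evaluation code (evaluate_verification.py implements the project's logic).

    # eer_sketch.py - generic EER computation from verification scores (illustrative).
    import numpy as np
    from sklearn.metrics import roc_curve

    def compute_eer(labels, scores):
        """labels: 1 for same-speaker pairs, 0 otherwise.
        scores: higher means more likely the same speaker."""
        fpr, tpr, _ = roc_curve(labels, scores)
        fnr = 1.0 - tpr
        # EER is the point where false positive and false negative rates cross.
        idx = np.nanargmin(np.abs(fnr - fpr))
        return (fpr[idx] + fnr[idx]) / 2.0

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        labels = rng.integers(0, 2, size=1000)
        # Synthetic scores: same-speaker pairs score higher on average.
        scores = rng.normal(loc=labels.astype(float), scale=0.8)
        print(f"EER: {compute_eer(labels, scores):.4f}")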
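
For the GENOTYPE string used in the training-from-scratch steps above, the format follows the Genotype convention from DARTS-style search code: each entry pairs an operation with the index of the node it takes input from, and the *_concat ranges list which intermediate nodes are concatenated into the cell output. The sketch below shows one way to parse the string back into a Python object; the namedtuple definition mirrors DARTS and is an assumption about this repository's internals rather than its exact API.

    # genotype_sketch.py - illustrative parsing of the GENOTYPE string (assumes the
    # DARTS-style Genotype namedtuple; treat as a sketch, not AutoSpeech's own API).
    from collections import namedtuple

    Genotype = namedtuple("Genotype", "normal normal_concat reduce reduce_concat")

    genotype_str = (
        "Genotype(normal=[('dil_conv_5x5', 1), ('dil_conv_3x3', 0), "
        "('dil_conv_5x5', 0), ('sep_conv_3x3', 1), ('sep_conv_3x3', 1), "
        "('sep_conv_3x3', 2), ('dil_conv_3x3', 2), ('max_pool_3x3', 1)], "
        "normal_concat=range(2, 6), "
        "reduce=[('max_pool_3x3', 1), ('max_pool_3x3', 0), ('dil_conv_5x5', 2), "
        "('max_pool_3x3', 1), ('dil_conv_5x5', 3), ('dil_conv_3x3', 2), "
        "('dil_conv_5x5', 4), ('dil_conv_5x5', 2)], "
        "reduce_concat=range(2, 6))"
    )

    # eval() only needs Genotype and range() in scope to rebuild the object.
    genotype = eval(genotype_str, {"Genotype": Genotype, "range": range})

    # Each entry is (operation, input-node index); two entries per intermediate node.
    for op, node in genotype.normal:
        print(f"normal cell: {op} <- node {node}")
    print("normal cell output concatenates nodes:", list(genotype.normal_concat))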

Citation

If you use this code for your research, please cite our paper.

@misc{ding2020autospeech,
    title={AutoSpeech: Neural Architecture Search for Speaker Recognition},
    author={Shaojin Ding and Tianlong Chen and Xinyu Gong and Weiwei Zha and Zhangyang Wang},
    year={2020},
    eprint={2005.03215},
    archivePrefix={arXiv},
    primaryClass={eess.AS}
}

Acknowledgement

Part of the code is adapted from deep-speaker and Real-Time-Voice-Cloning.
