
yistLin / FragmentVC

Licence: MIT license

Any-to-any voice conversion by end-to-end extracting and fusing fine-grained voice fragments with attention

Labels

pytorch concatenative transformer attention-mechanism voice-conversion any-to-any

FragmentVC

This is the official implementation of the paper "FragmentVC: Any-to-Any Voice Conversion by End-to-End Extracting and Fusing Fine-Grained Voice Fragments With Attention". FragmentVC obtains the latent phonetic structure of the source speaker's utterance from Wav2Vec 2.0, and the spectral features of the target speaker's utterance(s) from log mel-spectrograms. A two-stage training process aligns the hidden structures of these two feature spaces, so that FragmentVC can extract fine-grained voice fragments from the target speaker's utterance(s) and fuse them into the desired utterance. The whole process is end-to-end and relies on the attention mechanism of the Transformer, as verified by analysis of the attention maps.
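As a rough illustration of the fusing idea, the sketch below runs plain scaled dot-product attention in pure Python: each source-side frame (query) scores every target-side frame (key) and outputs a weighted mix of the target values. It is not the model code from this repository; the function names, toy vectors, and dimensions are assumptions chosen only to make the mechanism concrete.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention.

    queries: source-side frames (stand-ins for Wav2Vec 2.0 features)
    keys, values: target-side frames (stand-ins for mel-spectrogram features)
    Returns (fused_frames, attention_map), one row per query frame.
    """
    dim = len(keys[0])
    fused, attn_map = [], []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in keys
        ]
        weights = softmax(scores)
        attn_map.append(weights)
        # Each output frame is a weighted sum of the target value frames.
        fused.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return fused, attn_map


# Toy data (hypothetical): 2 source frames, 3 target frames, feature size 2.
source_frames = [[1.0, 0.0], [0.0, 1.0]]
target_keys = [[4.0, 0.0], [0.0, 4.0], [0.0, 0.0]]
target_values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]

fused, attn_map = cross_attention(source_frames, target_keys, target_values)
```

Each row of `attn_map` sums to 1 and shows which target frames a source frame draws from, which is the kind of information an attention-map plot visualizes.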

[Figure: overall model architecture and conceptual illustration]

[Figure: architecture of the smoother blocks and extractor blocks]

For the audio samples and attention map analyses, please refer to our demo page.

Usage

You can download the pretrained model and the vocoder from the link under the Releases section in the sidebar.

The whole project was developed using Python 3.8 and torch 1.6. The pretrained model and the vocoder were converted to TorchScript, so compatibility with older versions is not guaranteed. You can install the dependencies with

pip install -r requirements.txt

If you encounter any problems while installing fairseq, please refer to pytorch/fairseq for the installation instructions.

Wav2Vec

Our implementation uses the Wav2Vec 2.0 Base model without fine-tuning, which was trained on LibriSpeech. You can download the checkpoint wav2vec_small.pt from pytorch/fairseq.

Vocoder

The WaveRNN-based neural vocoder is from yistLin/universal-vocoder, which is based on the paper "Towards achieving robust universal neural vocoding".

Voice conversion with pretrained models

You can convert an utterance from a source speaker using one or more utterances from a target speaker, e.g.

# Positional arguments, in order: the source utterance,
# the target utterances (three here), and the output path.
python convert.py \
    -w <WAV2VEC_PATH> \
    -v <VOCODER_PATH> \
    -c <CHECKPOINT_PATH> \
    VCTK-Corpus/wav48/p225/p225_001.wav \
    VCTK-Corpus/wav48/p227/p227_002.wav \
    VCTK-Corpus/wav48/p227/p227_003.wav \
    VCTK-Corpus/wav48/p227/p227_004.wav \
    output.wav
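If you prefer to launch the conversion from a script, the argument list can be assembled programmatically. The following is a minimal sketch: `build_convert_command` is a hypothetical helper, and it only reuses the flags and argument order shown in the command above.

```python
def build_convert_command(wav2vec_path, vocoder_path, checkpoint_path,
                          source, targets, output):
    """Return the argv list for convert.py (hypothetical helper).

    Order follows the example above: flags, source utterance,
    one or more target utterances, then the output path.
    """
    if not targets:
        raise ValueError("at least one target utterance is required")
    return [
        "python", "convert.py",
        "-w", wav2vec_path,
        "-v", vocoder_path,
        "-c", checkpoint_path,
        source,
        *targets,
        output,
    ]


cmd = build_convert_command(
    "wav2vec_small.pt", "vocoder.pt", "model.pt",  # placeholder paths
    "VCTK-Corpus/wav48/p225/p225_001.wav",
    ["VCTK-Corpus/wav48/p227/p227_002.wav",
     "VCTK-Corpus/wav48/p227/p227_003.wav"],
    "output.wav",
)
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`; passing a list rather than a shell string avoids quoting problems with paths that contain spaces.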

Alternatively, you can describe the conversion pairs in a YAML file, e.g.

# pairs_info.yaml
pair1:
    source: VCTK-Corpus/wav48/p225/p225_001.wav
    target:
        - VCTK-Corpus/wav48/p227/p227_001.wav
pair2:
    source: VCTK-Corpus/wav48/p225/p225_001.wav
    target:
        - VCTK-Corpus/wav48/p227/p227_002.wav
        - VCTK-Corpus/wav48/p227/p227_003.wav
        - VCTK-Corpus/wav48/p227/p227_004.wav
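For many pairs it may be easier to generate this file than to write it by hand. The sketch below renders the same layout using only the standard library; `render_pairs_yaml` is a hypothetical helper, and it assumes the paths contain no characters that would need YAML quoting.

```python
def render_pairs_yaml(pairs):
    """Render a pairs file in the layout shown above (hypothetical helper).

    pairs: dict mapping pair name -> (source_path, list_of_target_paths).
    Assumes paths need no YAML quoting (no colons, leading dashes, etc.).
    """
    lines = ["# pairs_info.yaml"]
    for name, (source, targets) in pairs.items():
        if not targets:
            raise ValueError(f"{name}: at least one target is required")
        lines.append(f"{name}:")
        lines.append(f"    source: {source}")
        lines.append("    target:")
        lines.extend(f"        - {target}" for target in targets)
    return "\n".join(lines) + "\n"


yaml_text = render_pairs_yaml({
    "pair1": (
        "VCTK-Corpus/wav48/p225/p225_001.wav",
        ["VCTK-Corpus/wav48/p227/p227_001.wav"],
    ),
})

# To save the file:
# with open("pairs_info.yaml", "w", encoding="utf-8") as f:
#     f.write(yaml_text)
```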

You can then convert multiple pairs at once, e.g.

python convert_batch.py \
    -w <WAV2VEC_PATH> \
    -v <VOCODER_PATH> \
    -c <CHECKPOINT_PATH> \
    pairs_info.yaml \
    outputs # the output directory of conversion results

After the conversion, the output directory, outputs, will contain

pair1.wav
pair1.mel.png
pair1.attn.png
pair2.wav
pair2.mel.png
pair2.attn.png

where *.wav are the converted utterances, *.mel.png are plots of their mel-spectrograms, and *.attn.png are the attention maps between Conv1d 1 and Extractor 3 (please refer to the model architecture above).
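To confirm that a batch run produced everything, the expected file names can be derived from the pair names. This is a small sketch under the naming pattern listed above; `expected_outputs` and `missing_outputs` are hypothetical helpers, not part of the repository.

```python
from pathlib import Path

# Suffixes taken from the output listing above.
OUTPUT_SUFFIXES = (".wav", ".mel.png", ".attn.png")


def expected_outputs(pair_names, output_dir="outputs"):
    """List the files each pair is expected to produce."""
    return [
        Path(output_dir) / f"{name}{suffix}"
        for name in pair_names
        for suffix in OUTPUT_SUFFIXES
    ]


def missing_outputs(pair_names, output_dir="outputs"):
    """Return the expected files that do not exist on disk."""
    return [p for p in expected_outputs(pair_names, output_dir)
            if not p.is_file()]


names = [p.name for p in expected_outputs(["pair1", "pair2"])]
```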

Train from scratch

Empirically, training on the CSTR VCTK Corpus takes about 1 hour to preprocess the data and around 12 hours to reach 200K steps (on an RTX 2080 Ti).

Preprocessing

You can preprocess multiple corpora by passing multiple paths. Each path must be the directory that directly contains the speaker directories, e.g.

python preprocess.py \
    VCTK-Corpus/wav48 \
    LibriTTS/train-clean-360 \
    <WAV2VEC_PATH> \
    features  # the output directory of preprocessed features
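Before starting a long preprocessing run, it can help to check that a corpus path really is the directory that directly contains the speaker directories. The sketch below treats every immediate subdirectory as a speaker and counts the .wav files beneath it; `count_wavs_per_speaker` is a hypothetical helper, and the recursive search is an assumption about the corpus layout rather than a description of what preprocess.py does.

```python
from pathlib import Path


def count_wavs_per_speaker(corpus_root):
    """Map each immediate subdirectory (assumed speaker) to its .wav count.

    The search is recursive so that layouts with nested folders
    under each speaker directory are also counted.
    """
    root = Path(corpus_root)
    return {
        speaker_dir.name: sum(1 for _ in speaker_dir.rglob("*.wav"))
        for speaker_dir in sorted(root.iterdir())
        if speaker_dir.is_dir()
    }


# Example usage (path from the command above):
# counts = count_wavs_per_speaker("VCTK-Corpus/wav48")
# empty = [name for name, n in counts.items() if n == 0]
```

An empty result, or speakers with zero files, suggests the path points one level too high or too low.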

After preprocessing, the output directory will contain:

metadata.json
utterance-000x7gsj.tar
utterance-00wq7b0f.tar
utterance-01lpqlnr.tar
...

Training

python train.py features --save_dir ./ckpts

You can also specify --preload to load all training data into RAM and speed up training. If --comment <COMMENT> is specified, e.g. --comment vctk, the training logs are placed in a newly created directory such as logs/2020-02-02_12:34:56_vctk; otherwise nothing is logged. For more details, see the usage message printed by python train.py -h.
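The log directory name in the example combines a timestamp with the comment. The sketch below reproduces that pattern with the standard library; `make_log_dir_name` is a hypothetical helper and the format string is inferred from the example name, not taken from train.py.

```python
from datetime import datetime


def make_log_dir_name(comment, now=None, root="logs"):
    """Build a name like logs/2020-02-02_12:34:56_vctk (inferred pattern)."""
    now = now or datetime.now()
    stamp = now.strftime("%Y-%m-%d_%H:%M:%S")
    return f"{root}/{stamp}_{comment}"


example = make_log_dir_name("vctk", datetime(2020, 2, 2, 12, 34, 56))
```

Note that colons are not allowed in file names on Windows, so a name in this format only works on file systems that permit them.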
