yoyololicon / pytorch_FFTNet

Licence: other
A pytorch implementation of FFTNet.

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives to or similar to pytorch_FFTNet

wavenet-like-vocoder
Basic wavenet and fftnet vocoder model.
Stars: ✭ 20 (-42.86%)
Mutual labels:  vocoder, fftnet
FFTNet
FFTNet: a Real-Time Speaker-Dependent Neural Vocoder
Stars: ✭ 63 (+80%)
Mutual labels:  vocoder, fftnet
GlottDNN
GlottDNN vocoder and tools for training DNN excitation models
Stars: ✭ 30 (-14.29%)
Mutual labels:  vocoder
Tensorflowtts
😝 TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for Tensorflow 2 (supported including English, French, Korean, Chinese, German and Easy to adapt for other languages)
Stars: ✭ 2,382 (+6705.71%)
Mutual labels:  vocoder
Tts
🤖 💬 Deep learning for Text to Speech (Discussion forum: https://discourse.mozilla.org/c/tts)
Stars: ✭ 5,427 (+15405.71%)
Mutual labels:  vocoder
Fre-GAN-pytorch
Fre-GAN: Adversarial Frequency-consistent Audio Synthesis
Stars: ✭ 73 (+108.57%)
Mutual labels:  vocoder
codec2 talkie
Turn your Android phone into Codec2 Walkie-Talkie (Bluetooth/USB/TCPIP KISS modem client for DV digital voice communication)
Stars: ✭ 65 (+85.71%)
Mutual labels:  vocoder
LVCNet
LVCNet: Efficient Condition-Dependent Modeling Network for Waveform Generation
Stars: ✭ 67 (+91.43%)
Mutual labels:  vocoder
universal-vocoder
A PyTorch implementation of the universal neural vocoder
Stars: ✭ 46 (+31.43%)
Mutual labels:  vocoder
melgan
MelGAN implementation with Multi-Band and Full Band supports...
Stars: ✭ 54 (+54.29%)
Mutual labels:  vocoder
WorldInApple
Swift wrapper for vocoder World(https://github.com/mmorise/World)
Stars: ✭ 18 (-48.57%)
Mutual labels:  vocoder
magphase
MagPhase Vocoder: Speech analysis/synthesis system for TTS and related applications.
Stars: ✭ 76 (+117.14%)
Mutual labels:  vocoder
vietTTS
Vietnamese Text to Speech library
Stars: ✭ 78 (+122.86%)
Mutual labels:  vocoder

This is a pytorch implementation of FFTNet described here. Work in progress.

Quick Start

  1. Install the requirements.
pip install -r requirements.txt
  2. Download the CMU_ARCTIC dataset.

  3. Train the model and save it. The default parameters are largely the same as in the original paper. Pass the --preprocess flag the first time you run it.

python train.py \
    --preprocess \
    --wav_dir your_downloaded_wav_dir \
    --data_dir preprocessed_feature_dir \
    --model_file saved_model_name
  4. Use the trained model to decode/reconstruct a wav file from its MCC features.
python decode.py \
    --infile wav_file \
    --outfile reconstruct_file_name \
    --data_dir preprocessed_feature_dir \
    --model_file saved_model_name

FFTNet_generator and FFTNet_vocoder are two scripts I used to check that the model works, using the torchaudio yesno dataset.
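For reference, the yesno data those scripts rely on can be pulled in directly through torchaudio; a minimal sketch (the download path here is illustrative):

import torchaudio

# download the yesno dataset (short 8 kHz recordings) to ./data
dataset = torchaudio.datasets.YESNO("./data", download=True)
waveform, sample_rate, labels = dataset[0]
print(waveform.shape, sample_rate)  # e.g. torch.Size([1, 50901]) 8000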

Current result

Some decoded files can be found in the samples folder.

Differences from paper

  • window size: 400 → depends on minimum_f0 (because I use pyworld to get the f0 and MCC coefficients); a sketch of this extraction path follows below.
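For context, a minimal sketch of that feature extraction path, assuming pyworld for the F0/spectral analysis and pysptk for the mel-cepstral conversion (the file name and parameter values are illustrative, not necessarily this repo's settings):

import numpy as np
import pyworld
import pysptk
import soundfile as sf

x, fs = sf.read("arctic_a0001.wav")            # illustrative CMU_ARCTIC file
x = np.ascontiguousarray(x, dtype=np.float64)
f0, t = pyworld.harvest(x, fs)                 # F0 trajectory
sp = pyworld.cheaptrick(x, f0, t, fs)          # spectral envelope; its internal
                                               # window scales with the lowest F0
mcc = pysptk.sp2mc(sp, order=24, alpha=0.42)   # envelope -> MCC coefficients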

TODO

  • Zero padding.
  • Injected noise.
  • Voiced/unvoiced conditional sampling.
  • Post-synthesis denoising.

Notes

  • I combine the two 1x1 convolution kernels into one 1x2 dilated kernel. This removes redundant bias parameters and speeds up the whole model; see the sketch after this list.
  • The author said the channel size in the middle layers is 128, not 256.
  • My model gets stuck at the beginning (loss around 4.x) for thousands of steps, then drops quickly to 2.6 ~ 3.0. Using a smaller learning rate helps a little.
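A minimal sketch of that first note, checking that one kernel-size-2 dilated convolution computes the same thing as two 1x1 convolutions whose outputs are summed (the channel and dilation sizes here are illustrative):

import torch
import torch.nn as nn

channels, dilation = 128, 512

# the paper's formulation: W_L * x_left + W_R * x_right, two biases
w_left = nn.Conv1d(channels, channels, kernel_size=1)
w_right = nn.Conv1d(channels, channels, kernel_size=1)

# the fused formulation: one 1x2 dilated conv, a single bias
fused = nn.Conv1d(channels, channels, kernel_size=2, dilation=dilation)

# copy the weights so both branches compute the same function
with torch.no_grad():
    fused.weight[:, :, 0].copy_(w_left.weight[:, :, 0])
    fused.weight[:, :, 1].copy_(w_right.weight[:, :, 0])
    fused.bias.copy_(w_left.bias + w_right.bias)

x = torch.randn(1, channels, 2048)
out_two = w_left(x[:, :, :-dilation]) + w_right(x[:, :, dilation:])
out_one = fused(x)
print(torch.allclose(out_two, out_one, atol=1e-5))  # True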

Variations of FFTNet

Radix-N FFTNet

Use the flag --radixs to specify each layer's radix.

# a radix-4 FFTNet with 1024 receptive field
python train.py --radixs 4 4 4 4 4

The original FFTNet uses a radix-2 structure. In my experiments, a radix-4 (and even a radix-8) network still achieves similar results, and because it needs fewer layers, it runs faster.
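Since each layer multiplies the receptive field by its radix, the receptive field is just the product of the radixs; a quick check (the depth-11 radix-2 case should match the paper's network):

from functools import reduce
from operator import mul

def receptive_field(radixs):
    # each layer widens the span of visible input samples by its radix
    return reduce(mul, radixs, 1)

print(receptive_field([4] * 5))   # 1024, the command above
print(receptive_field([2] * 11))  # 2048, the paper's radix-2 setup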

Transposed FFTNet

Fig. 2 in the paper can be redrawn as a dilated structure with kernel size 2 (which also means radix 2).

If we draw all the lines and transpose the graph so the arrows go backward, you'll find a WaveNet dilated structure.

Add the flag --transpose and you get a simplified version of WaveNet.

# a WaveNet-like model without gated/residual/skip units
python train.py --transpose

In my experiments, the transposed models are easier to train and reach a slightly lower training loss compared to FFTNet.
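A minimal sketch of the structural point: under this reading, the two graphs differ only in the order of the dilation factors (the depth here is illustrative):

radixs = [2] * 11

# FFTNet (Fig. 2): the first layer joins the most distant samples
fftnet_dilations = [2 ** i for i in reversed(range(len(radixs)))]  # 1024, ..., 2, 1

# transposed graph (WaveNet-like): dilation grows with depth
wavenet_dilations = list(reversed(fftnet_dilations))               # 1, 2, ..., 1024

print(fftnet_dilations)
print(wavenet_dilations)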
