
Catch-A-Waveform

Project Website | Paper

Official pytorch implementation of the paper: "Catch-A-Waveform: Learning to Generate Audio from a Single Short Example" (NeurIPS 2021)

Generate audio from a single audio input

Catch-A-Waveform's Applications

Install dependencies

python -m pip install -r requirements.txt

Training

Unconditional Generation

To train for unconditional generation or bandwidth extension, just place an audio signal inside the inputs folder and provide its name. The default extension is .wav; for a file with a different extension, provide the name including the extension:

python train_main.py --input_file <input_file_name>

For speech signals, train with the --speech flag:

python train_main.py --input_file <input_file_name> --speech
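
For reference, here is a minimal sketch of preparing an arbitrary recording as a training input; the file names and the 16 kHz rate are illustrative, not requirements of the repo:

import librosa
import soundfile as sf

# Load an arbitrary recording as mono at 16 kHz (file name and rate are illustrative).
audio, sr = librosa.load('my_recording.mp3', sr=16000, mono=True)

# Keep roughly the first 20 seconds as the single short training example.
audio = audio[:20 * sr]

# Write it into the inputs folder so train_main.py can find it by name.
sf.write('inputs/my_recording.wav', audio, sr)

Training would then be started with python train_main.py --input_file my_recording.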

Inpainting

To train for the inpainting task, set run_mode to inpainting and provide the indices of the hole's start and end, in samples, through the inpainting_indices parameter:

python train_main.py --input_file <input_file_name> --run_mode inpainting --inpainting_indices <hole_start_idx> <hole_end_idx>

Multiple holes can be inpainted by providing multiple index pairs, e.g.:

python train_main.py --input_file <input_file_name> --run_mode inpainting --inpainting_indices <hole1_start_idx> <hole1_end_idx> <hole2_start_idx> <hole2_end_idx> ...
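
The indices refer to sample positions in the input file. If you know the hole boundaries in seconds, a minimal sketch of the conversion (the file name and times are illustrative):

import librosa

# Sample rate of the training signal (file name is illustrative).
sr = librosa.get_samplerate('inputs/my_recording.wav')

# A hole spanning 5.5 s to 6.25 s, expressed in samples.
hole_start_idx = int(5.5 * sr)
hole_end_idx = int(6.25 * sr)
print(hole_start_idx, hole_end_idx)

The printed values are what you pass to --inpainting_indices.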

Denoising

To train for the denoising task, set run_mode to denoising:

python train_main.py --input_file <input_file_name> --run_mode denoising

Inference

Unconditional Generation

After training, a directory named after the input file will be created in the outputs folder. To run inference from a trained model, simply run:

python generate_main.py --input_folder <model_folder_name>

This will generate a 30-second signal in the model's folder, inside GeneratedSignals. To create multiple signals of various lengths, use the n_signals and length flags, for example:

python generate_main.py --input_folder <model_folder_name> --n_signals 3 --length 60

To write signals of all scales, use the flag --generate_all_scales.

Create music variations

To create variations of a given song while enforcing the general structure of the input (see Sec. 4.2 in our paper), use the --condition flag:

python generate_main.py --input_folder <model_folder_name> --condition

Bandwidth Extension

To perform bandwidth extension with a trained model, run the following:

python extend.py --input_folder <model_folder_name> --lr_signal <low_resolution_signal_file_name>

lr_signal is the path to a low-resolution audio file (i.e., its sample rate is lower than the model's). The extended output will be created in the GeneratedSignals folder of the used model. To calculate the SNR and LSD of the extended signal, place the low-resolution and ground-truth high-resolution signals in the inputs folder with corresponding file names: <file_name>_lr and <file_name>_hr.

You can optionally provide the frequency response of the anti-aliasing filter used to create the lr signal: put a text file with two lines, the real and imaginary parts of the frequency response, in the inputs folder. The inverse of the filter will be used to invert the transient area of the lr signal, which can slightly improve SNR.
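
For reference, the two metrics can be computed from the extended output and the ground-truth hr signal roughly as below. This is a minimal sketch of the standard SNR and LSD definitions (file paths, FFT size and hop length are illustrative), not necessarily the exact procedure the repo uses:

import numpy as np
import librosa

# Ground-truth high-resolution signal and the extended output (paths are illustrative).
hr, sr = librosa.load('inputs/speech_hr.wav', sr=None)
ext, _ = librosa.load('extended_output.wav', sr=sr)
n = min(len(hr), len(ext))
hr, ext = hr[:n], ext[:n]

# SNR in dB: ratio of signal energy to reconstruction-error energy.
snr = 10 * np.log10(np.sum(hr ** 2) / np.sum((hr - ext) ** 2))

# LSD in dB: RMS difference of the log power spectra, averaged over frames.
S_hr = np.abs(librosa.stft(hr, n_fft=2048, hop_length=512)) ** 2
S_ext = np.abs(librosa.stft(ext, n_fft=2048, hop_length=512)) ** 2
diff = 10 * (np.log10(S_hr + 1e-10) - np.log10(S_ext + 1e-10))
lsd = np.mean(np.sqrt(np.mean(diff ** 2, axis=0)))

print(f'SNR: {snr:.2f} dB, LSD: {lsd:.2f} dB')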

Inpainting

Inpainting is done by generating the missing part within the reconstructed signal and then stitching it together with the input:

python inpaint.py --input_folder <model_folder_name>

You can create a different inpainting realization by adding the --new flag.
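
Conceptually, the stitching replaces the hole region of the input with the corresponding samples of the generated reconstruction. A minimal illustration of the idea, not the repo's actual implementation (file names and indices are illustrative):

import librosa
import soundfile as sf

# The original input and the model's reconstruction (paths are illustrative).
inp, sr = librosa.load('inputs/my_recording.wav', sr=None)
rec, _ = librosa.load('reconstruction.wav', sr=sr)

# Fill the hole that was specified at training time from the reconstruction.
start, end = 89164, 101164
out = inp.copy()
out[start:end] = rec[start:end]
sf.write('inpainted_sketch.wav', out, sr)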

Denoising

The denoised signal is just the reconstructed signal, so after training on a noisy signal, simply run:

python generate_main.py --input_folder <model_folder_name> --reconstruct

Run examples

Unconditional Generation

Music:

python train_main.py --input_file TenorSaxophone_MedleyDB_185

Speech:

python train_main.py --input_file trump_farewell_address_8 --speech

Bandwidth Extension

First, we train a model on several concatenated sentences from the VCTK Corpus:

python train_main.py --input_file VCTK_p347_363_to_371 --speech

Then we take as input a low-resolution version of a new sentence by the same speaker and extend it with the trained model:

python extend.py --input_folder VCTK_p347_363_to_371 --lr_signal VCTK_p347_410_lr

To run while also correcting the lr signal's transient, run:

python extend.py --input_folder VCTK_p347_363_to_371 --lr_signal VCTK_p347_410_lr --filter_file libDs4H

libDs4H was obtained by computing FFT(s_hr)/FFT(s_lr), where s_lr is the lr signal created by librosa.resample.
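
A minimal sketch of how such a filter file might be produced; the file names are illustrative, and the exact file format expected by extend.py should be checked against the repo (the text above only specifies a two-line file holding the real and imaginary parts of the frequency response):

import numpy as np
import librosa

# High-resolution signal and a low-resolution version created with librosa.resample.
s_hr, sr = librosa.load('inputs/my_signal_hr.wav', sr=None)
s_lr = librosa.resample(s_hr, orig_sr=sr, target_sr=sr // 2)

# FFT(s_hr) / FFT(s_lr) over the lr band, on a common frequency grid.
n = len(s_lr)
H_hr = np.fft.rfft(s_hr, n=2 * n)
H_lr = np.fft.rfft(s_lr)
H = H_hr[:len(H_lr)] / (H_lr + 1e-12)

# Write the real and imaginary parts as the two lines of the filter file.
with open('inputs/my_filter.txt', 'w') as f:
    f.write(' '.join(f'{v:.6e}' for v in H.real) + '\n')
    f.write(' '.join(f'{v:.6e}' for v in H.imag) + '\n')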

Inpainting

python train_main.py --input_file FMA_rock_for_inpainting --run_mode inpainting --inpainting_indices 89164 101164

Denoising

python train_main.py --input_file JosephJoachim_BachAdagio_1904 --run_mode denoising --init_sample_rate 10000

Here we set init_sample_rate to 10 kHz (the default is 16 kHz), since the old recording has limited bandwidth.
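
To choose a sensible init_sample_rate for your own recording, you can check where its spectral energy rolls off. A minimal sketch (the file name and the 60 dB threshold are illustrative):

import numpy as np
import librosa

y, sr = librosa.load('inputs/my_old_recording.wav', sr=None)

# Long-term average power spectrum.
S = np.abs(librosa.stft(y, n_fft=4096)) ** 2
mean_db = 10 * np.log10(S.mean(axis=1) + 1e-12)
freqs = librosa.fft_frequencies(sr=sr, n_fft=4096)

# Highest frequency whose average level is within 60 dB of the peak.
f_max = freqs[mean_db > mean_db.max() - 60].max()
print(f'Energy extends to ~{f_max:.0f} Hz; an init_sample_rate around {2 * f_max:.0f} Hz could be reasonable.')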

Pretrained Models

Instead of running the examples yourself, you can download the pretrained generators and just perform inference. After downloading the folders, place them inside the outputs folder and run inference.

The models can be downloaded from Google Drive.

Citation

If you use this code in your research, please cite our paper:

@article{greshler2021catch,
  title={Catch-a-waveform: Learning to generate audio from a single short example},
  author={Greshler, Gal and Shaham, Tamar and Michaeli, Tomer},
  journal={Advances in Neural Information Processing Systems},
  volume={34},
  year={2021}
}

Credits

The example signals are taken from the following websites:

Some code was adapted from:
