mostafaelaraby / wavegan-pytorch

License: Apache-2.0
PyTorch implementation of the WaveGAN model to generate audio

Programming Languages

Python
Jupyter Notebook

Projects that are alternatives to or similar to wavegan-pytorch

MusicTransformer-Pytorch
MusicTransformer written for MaestroV2 using the Pytorch framework for music generation
Stars: ✭ 106 (+9.28%)
Mutual labels:  music-generation
seqgan-music
Implementation of a paper "Polyphonic Music Generation with Sequence Generative Adversarial Networks" in TensorFlow
Stars: ✭ 21 (-78.35%)
Mutual labels:  music-generation
InpaintNet
Code accompanying ISMIR'19 paper titled "Learning to Traverse Latent Spaces for Musical Score Inpainting"
Stars: ✭ 48 (-50.52%)
Mutual labels:  music-generation
MidiTok
A convenient MIDI / symbolic music tokenizer for Deep Learning networks, with multiple strategies 🎶
Stars: ✭ 180 (+85.57%)
Mutual labels:  music-generation
genmusic
Generative Music: a stochastic modal music generator
Stars: ✭ 17 (-82.47%)
Mutual labels:  music-generation
python-twelve-tone
🎶 12-tone matrix to generate dodecaphonic melodies 🎶
Stars: ✭ 68 (-29.9%)
Mutual labels:  music-generation
lakh-pianoroll-dataset
A collection of 174,154 multi-track piano-rolls
Stars: ✭ 64 (-34.02%)
Mutual labels:  music-generation
bmusegan
Code for “Convolutional Generative Adversarial Networks with Binary Neurons for Polyphonic Music Generation”
Stars: ✭ 58 (-40.21%)
Mutual labels:  music-generation
generating-music
🎷 Artificial Composition of Multi-Instrumental Polyphonic Music
Stars: ✭ 28 (-71.13%)
Mutual labels:  music-generation
DeepMusic
A python package for high level musical data manipulation and preprocessing, making data ready to be fed to a neural network.
Stars: ✭ 24 (-75.26%)
Mutual labels:  ai-music
Deep-Learning-Coursera
Projects from the Deep Learning Specialization from deeplearning.ai provided by Coursera
Stars: ✭ 123 (+26.8%)
Mutual labels:  music-generation
facet
Facet is a live coding system for algorithmic music
Stars: ✭ 72 (-25.77%)
Mutual labels:  music-generation
classifying-vae-lstm
music generation with a classifying variational autoencoder (VAE) and LSTM
Stars: ✭ 27 (-72.16%)
Mutual labels:  music-generation
Music-generation-cRNN-GAN
cRNN-GAN to generate music by training on instrumental music (midi)
Stars: ✭ 38 (-60.82%)
Mutual labels:  music-generation
Dyci2Lib
"Dicy2 for Max" is a Max package implementing interactive agents using machine-learning to generate musical sequences that can be integrated into musical situations ranging from the production of structured material within a compositional process to the design of autonomous agents for improvised interaction. Check also our plugin for Ableton live !
Stars: ✭ 35 (-63.92%)
Mutual labels:  music-generation
melodyoflife
Melody of Life is a step sequencer using cellular automata
Stars: ✭ 38 (-60.82%)
Mutual labels:  music-generation
fusion gan
Codes for the paper 'Learning to Fuse Music Genres with Generative Adversarial Dual Learning' ICDM 17
Stars: ✭ 18 (-81.44%)
Mutual labels:  music-generation
hum2song
Hum2Song: Multi-track Polyphonic Music Generation from Voice Melody Transcription with Neural Networks
Stars: ✭ 61 (-37.11%)
Mutual labels:  music-generation
Catch-A-Waveform
Official pytorch implementation of the paper: "Catch-A-Waveform: Learning to Generate Audio from a Single Short Example" (NeurIPS 2021)
Stars: ✭ 117 (+20.62%)
Mutual labels:  music-generation
Introtodeeplearning
Lab Materials for MIT 6.S191: Introduction to Deep Learning
Stars: ✭ 4,955 (+5008.25%)
Mutual labels:  music-generation

WaveGAN v2 Pytorch

PyTorch implementation of WaveGAN, a machine learning algorithm which learns to generate raw audio waveforms.

  • In v2: added the ability to train WaveGANs capable of generating longer audio examples (up to 4 seconds at 16 kHz)
  • In v2: added the ability to train WaveGANs capable of generating multi-channel audio

This is a PyTorch port of WaveGAN (Donahue et al. 2018) (paper) (demo) (sound examples). WaveGAN is a machine learning algorithm that learns to synthesize raw waveform audio by observing many examples of real audio. WaveGAN is comparable to the popular DCGAN approach (Radford et al. 2016) for learning to generate images.

In this repository, we include an implementation of WaveGAN capable of learning to generate up to 4 seconds of audio at 16kHz.

WaveGAN is capable of learning to synthesize audio in many different sound domains. The figure accompanying the original project visualizes real and WaveGAN-generated audio of speech, bird vocalizations, drum sound effects, and piano excerpts. These sound examples and more can be heard here.

Requirements

pip install -r requirements.txt

Datasets

WaveGAN can now be trained on datasets of arbitrary audio files (previously, preprocessing was required). You can use any folder containing audio files, and a few example datasets are available to help you get started.
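As a quick, hypothetical sanity check (assuming the train-subfolder layout described under target_signals_dir below; the dataset path here is a placeholder), you can verify that the trainer will actually see your wav files:

# Hypothetical check: list the .wav files under <target_signals_dir>/train
from pathlib import Path

dataset_dir = Path("piano_dataset")                        # placeholder path to your dataset folder
wav_files = sorted((dataset_dir / "train").glob("*.wav"))
print(f"Found {len(wav_files)} wav files, e.g. {wav_files[:3]}")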

WaveGAN Parameters (params.py)

  • target_signals_dir: folder containing a train subfolder with the training wav files
  • model_prefix: model name used when saving the model
  • n_iterations: number of train iterations
  • lr_g: generator learning rate
  • lr_d: discriminator learning rate
  • beta1: Adam optimizer exponential decay rate for the first moment estimates
  • beta2: Adam optimizer exponential decay rate for the second moment estimates
  • decay_lr: flag to decay the learning rate linearly over the iterations, reaching zero at iteration 100k
  • generator_batch_size_factor: factor by which the batch size is multiplied when updating the generator, to give it a stronger and more meaningful signal from the discriminator
  • n_critic: number of critic/discriminator updates per generator update (see the training-step sketch after this list)
  • p_coeff: gradient penalty regularization factor
  • batch_size: batch size during training (default 10)
  • noise_latent_dim: dimension of the latent noise vector used to generate waveforms
  • model_capacity_size: capacity of the model (default 64); can be reduced to 32 when generating longer windows of 2-4 seconds
  • output_dir: directory that holds the saved model and the samples generated during training
  • window_length: window length of the output utterance; can be 16384 (1 sec), 32768 (2 sec), or 65536 (4 sec) samples at 16 kHz
  • manual_seed: model random seed
  • num_channels: number of audio channels in the data
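To make the roles of n_critic, p_coeff, lr_g, lr_d, beta1 and beta2 concrete, here is a minimal WGAN-GP update step. It is only a sketch with placeholder networks (plain Linear layers standing in for the real generator/discriminator in models.py) and illustrative values, not the repository's actual training loop:

# Minimal, hypothetical WGAN-GP step showing how the parameters above interact.
import torch
import torch.nn as nn

noise_latent_dim, window_length, batch_size = 100, 16384, 10   # 16384 samples ~ 1 sec at 16 kHz
lr_g, lr_d, beta1, beta2 = 1e-4, 1e-4, 0.5, 0.9                # illustrative values
n_critic, p_coeff = 5, 10.0

# Placeholder 1-D models; the real WaveGAN networks live in models.py.
netG = nn.Sequential(nn.Linear(noise_latent_dim, window_length), nn.Tanh())
netD = nn.Linear(window_length, 1)

optG = torch.optim.Adam(netG.parameters(), lr=lr_g, betas=(beta1, beta2))
optD = torch.optim.Adam(netD.parameters(), lr=lr_d, betas=(beta1, beta2))

def gradient_penalty(real, fake):
    # WGAN-GP penalty on random interpolates between real and fake waveforms (scaled by p_coeff).
    alpha = torch.rand(real.size(0), 1)
    interp = (alpha * real + (1 - alpha) * fake).requires_grad_(True)
    grads = torch.autograd.grad(netD(interp).sum(), interp, create_graph=True)[0]
    return ((grads.norm(2, dim=1) - 1) ** 2).mean()

real = torch.randn(batch_size, window_length)        # stand-in for a batch of real audio
for _ in range(n_critic):                            # n_critic critic updates per generator update
    fake = netG(torch.randn(batch_size, noise_latent_dim)).detach()
    lossD = netD(fake).mean() - netD(real).mean() + p_coeff * gradient_penalty(real, fake)
    optD.zero_grad(); lossD.backward(); optD.step()

fake = netG(torch.randn(batch_size, noise_latent_dim))   # single generator update
lossG = -netD(fake).mean()
optG.zero_grad(); lossG.backward(); optG.step()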

Samples

  • A model trained on the piano dataset to generate 4-second samples, using model capacity 32 for faster training
  • Latent-space interpolation, used to sanity-check the model, gives the following image (a sketch of the idea follows this list)

  • A sample audio clip can be found at sample (from an early iteration with a 4-second window)
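The interpolation itself amounts to sampling two latent vectors and generating audio from points on the line between them. A rough sketch of the idea (the checkpoint path and latent dimension are placeholders, not the repo's exact API):

# Hypothetical latent-space interpolation between two noise vectors.
import torch

netG = torch.load("output/piano_generator.pt")        # placeholder checkpoint path
netG.eval()

z_a, z_b = torch.randn(1, 100), torch.randn(1, 100)   # two points in the latent space
steps = torch.linspace(0, 1, 8).view(-1, 1)
with torch.no_grad():
    z = (1 - steps) * z_a + steps * z_b                # linear interpolation, 8 points
    waves = netG(z)                                    # e.g. (8, num_channels, window_length)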

Quality considerations

If your results are too noisy, try adding a post-processing filter. You may also want to reduce or remove phase shuffle in models.py. Increasing either the model size or the filter length in models.py may improve results but will increase training time.
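For instance, a generic zero-phase low-pass filter (a common post-processing choice using scipy, not necessarily the filter referenced above) could look like:

# Hypothetical post-processing: Butterworth low-pass to attenuate high-frequency noise.
import numpy as np
from scipy.signal import butter, filtfilt

def lowpass(wave, sample_rate=16000, cutoff_hz=6000, order=4):
    b, a = butter(order, cutoff_hz / (sample_rate / 2), btype="low")
    return filtfilt(b, a, wave)          # zero-phase filtering, no added delay

noisy = np.random.randn(16384)           # stand-in for a generated 1-second waveform
clean = lowpass(noisy)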

Monitoring

The train script generates a fixed set of latent vectors and saves output samples produced from them to the output_dir specified in the params.
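Conceptually, the periodic sample dump looks like the following (a sketch with placeholder names and shapes, not the train script's exact code):

# Hypothetical sample dump: generate from a fixed latent batch and write wav files.
import torch
from scipy.io import wavfile

fixed_z = torch.randn(4, 100)                          # fixed latent vectors, reused at every dump

def save_samples(netG, iteration, output_dir="output", sample_rate=16000):
    with torch.no_grad():
        waves = netG(fixed_z).cpu().numpy()            # e.g. (4, num_channels, window_length)
    for i, w in enumerate(waves):
        wavfile.write(f"{output_dir}/iter{iteration}_sample{i}.wav", sample_rate, w.T)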

Contributions

This repo is based on chrisdonahue's and jtcramer's implementations, and on mazzzystar's.

Attribution

If you use this code in your research, cite via the following BibTeX:

@inproceedings{donahue2019wavegan,
  title={Adversarial Audio Synthesis},
  author={Donahue, Chris and McAuley, Julian and Puckette, Miller},
  booktitle={ICLR},
  year={2019}
}