The encoding is upsampled to the original audio rate using nearest-neighbor interpolation
The figure above shows the new version of WaveNet
The encoded audio is used to condition a WaveNet decoder. The conditioning signal is passed through a 1 × 1 layer that is different for each WaveNet layer
The WaveNet decoder has 4 blocks of 10 residual layers
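As a rough illustration of the upsampling and conditioning steps described above, the following NumPy sketch repeats each encoder frame to reach the audio rate and then applies a separate 1 × 1 projection for each decoder layer. It assumes the 1 × 1 layer acts as a per-time-step linear map over channels; the function names, channel counts, upsampling factor, and random weights are all hypothetical.

```python
import numpy as np

def upsample_nearest(encoding, factor):
    """Nearest-neighbor upsampling: repeat each frame `factor` times in time."""
    # encoding: (channels, frames) -> (channels, frames * factor)
    return np.repeat(encoding, factor, axis=1)

def make_conditioning_layers(num_layers, in_channels, out_channels, seed=0):
    """One independent 1 x 1 weight matrix per WaveNet layer (hypothetical init)."""
    rng = np.random.default_rng(seed)
    return [rng.standard_normal((out_channels, in_channels)) * 0.01
            for _ in range(num_layers)]

def condition(upsampled, weight):
    """Apply a 1 x 1 layer: mix channels independently at every time step."""
    return weight @ upsampled  # (out_channels, time)

# Toy encoding with 2 channels and 3 frames (assumed sizes).
encoding = np.arange(6, dtype=np.float64).reshape(2, 3)
upsampled = upsample_nearest(encoding, factor=4)

# 4 blocks of 10 residual layers, each with its own 1 x 1 projection.
layers = make_conditioning_layers(num_layers=4 * 10, in_channels=2, out_channels=8)
per_layer = [condition(upsampled, w) for w in layers]
```

Each entry of `per_layer` would be added inside the corresponding residual layer; how that is wired is not specified in these notes.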
The input and output are quantized using 8-bit mu-law encoding
The loss function is a softmax (cross-entropy) loss over the quantized output levels
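A minimal sketch of 8-bit mu-law quantization and a softmax loss over the resulting 256 classes, assuming audio scaled to [-1, 1] and reading "softmax" as softmax cross-entropy. The function names are illustrative and not taken from the original code.

```python
import numpy as np

MU = 255  # 8-bit mu-law gives 256 quantization levels (classes 0..255)

def mu_law_encode(audio, mu=MU):
    """Map float audio in [-1, 1] to integer classes in [0, mu]."""
    audio = np.clip(audio, -1.0, 1.0)
    compressed = np.sign(audio) * np.log1p(mu * np.abs(audio)) / np.log1p(mu)
    return np.floor((compressed + 1.0) / 2.0 * mu + 0.5).astype(np.int64)

def mu_law_decode(classes, mu=MU):
    """Invert mu_law_encode up to quantization error."""
    compressed = 2.0 * classes.astype(np.float64) / mu - 1.0
    return np.sign(compressed) * np.expm1(np.abs(compressed) * np.log1p(mu)) / mu

def softmax_cross_entropy(logits, targets):
    """Mean negative log-likelihood; logits: (time, classes), targets: (time,)."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

audio = np.sin(np.linspace(0.0, 2.0 * np.pi, 16))
targets = mu_law_encode(audio)                 # quantized decoder targets
logits = np.zeros((len(targets), MU + 1))      # placeholder decoder output
loss = softmax_cross_entropy(logits, targets)
```

With all-zero logits the predicted distribution is uniform, so the loss equals log(256); a trained decoder should score lower than that.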
Data Augmentation for Facebook Net
Uniformly select a segment of length between 0.25 and 0.5 seconds
Modulate its pitch by a random amount between -0.5 and 0.5 half-steps
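The two steps above can be sketched as follows. The segment length and shift ranges come from the notes; everything else is an assumption. In particular, `naive_pitch_shift` is a crude resampling stand-in written for this example (it also changes the timing inside the segment), and a proper pitch shifter can be passed in through the hypothetical `pitch_shift` parameter.

```python
import numpy as np

def naive_pitch_shift(segment, half_steps):
    """Crude stand-in: read the segment at a different speed, keep its length.

    Samples read past the end of the segment are filled with zeros.
    """
    rate = 2.0 ** (half_steps / 12.0)       # frequency ratio for the shift
    n = len(segment)
    positions = np.arange(n) * rate          # rate > 1 raises the pitch
    return np.interp(positions, np.arange(n), segment, right=0.0)

def augment(audio, sample_rate, rng, pitch_shift=naive_pitch_shift):
    """Pitch-shift one randomly chosen segment of 0.25 to 0.5 seconds."""
    seg_len = min(int(rng.uniform(0.25, 0.5) * sample_rate), len(audio))
    start = int(rng.integers(0, len(audio) - seg_len + 1))
    half_steps = rng.uniform(-0.5, 0.5)
    out = audio.copy()
    out[start:start + seg_len] = pitch_shift(out[start:start + seg_len], half_steps)
    return out, (start, seg_len, half_steps)

sample_rate = 16000  # assumed sample rate
t = np.arange(sample_rate) / sample_rate
audio = np.sin(2.0 * np.pi * 440.0 * t)      # one second of a 440 Hz tone
rng = np.random.default_rng(0)
augmented, (start, seg_len, half_steps) = augment(audio, sample_rate, rng)
```

Only the selected segment is modified; the rest of the signal is returned unchanged.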
Facebook Net for Audio Source Separation
Structure A: I made the decoding part the same as the encoding part, removed downsampling and upsampling, and removed the confusion loss.
I used the data augmentation strategy from the Wave-U-Net paper. For example, let A be the mixed audio, B the vocals, and C the accompaniment. Then newA = B * factor0 + C * factor1; I used newA as the input and C * factor1 as the label. factor0 and factor1 are each chosen uniformly from the interval [0.7, 1.0].
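A minimal sketch of this remixing step, assuming the vocals and accompaniment are available as equal-length NumPy arrays. The function name `remix` and the toy signals are hypothetical.

```python
import numpy as np

def remix(vocals, accompaniment, rng, low=0.7, high=1.0):
    """Build a new mixture and its accompaniment label with random gains."""
    factor0 = rng.uniform(low, high)   # gain for the vocals (B)
    factor1 = rng.uniform(low, high)   # gain for the accompaniment (C)
    label = accompaniment * factor1    # training target: C * factor1
    new_mix = vocals * factor0 + label # network input: newA
    return new_mix, label

rng = np.random.default_rng(0)
vocals = np.ones(8)                    # toy stand-in for B
accompaniment = 2.0 * np.ones(8)       # toy stand-in for C
new_mix, label = remix(vocals, accompaniment, rng)
```

Drawing fresh factors for every training example gives the network a slightly different vocal-to-accompaniment balance each time.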
I used ccMixter as the dataset. ccMixter has 3 children's songs; I used two of them as training data and the other as testing data. The result on the testing data is also very good, even though it is slightly worse than on the training data.
With three rap songs, the model also generalizes well.
With two songs that have different background music but the same lyrics (the same voice in both), generalization is also acceptable, but worse than in the two cases above.
With the first 45 songs for training and the last 5 songs for testing, the results are still not good.
If I choose 9 different types of music, the result is not good even on the training set. I am trying to solve this problem.
Some other tests
I tried adding downsampling and upsampling, adding the confusion loss, and using a short-time Fourier transform (STFT) to preprocess the raw audio. The results are worse than with structure A.
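For reference, STFT preprocessing of raw audio can be sketched in plain NumPy as below. The notes do not state the frame length, hop size, or window that were used, so the values here (1024, 256, Hann) are assumptions, as is the choice to keep only the magnitude.

```python
import numpy as np

def stft_magnitude(audio, frame_length=1024, hop_length=256):
    """Magnitude spectrogram; assumes len(audio) >= frame_length."""
    window = np.hanning(frame_length)
    num_frames = 1 + (len(audio) - frame_length) // hop_length
    frames = np.stack([
        audio[i * hop_length:i * hop_length + frame_length] * window
        for i in range(num_frames)
    ])
    # One-sided FFT of each windowed frame: (num_frames, frame_length // 2 + 1)
    return np.abs(np.fft.rfft(frames, axis=1))

sample_rate = 16000  # assumed sample rate
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2.0 * np.pi * 1000.0 * t)   # one second of a 1 kHz tone
spectrogram = stft_magnitude(tone)
peak_bin = int(np.argmax(spectrogram[0]))
peak_hz = peak_bin * sample_rate / 1024   # frequency of the strongest bin
```

Each row of `spectrogram` is one time frame and each column is one frequency bin, so the network would see a 2-D time-frequency input instead of the raw waveform.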
My result is better than the original paper's result, but when I added these changes to structure A, the result became very bad. I think this is because I need the domain information when I generate the music without voice: the original music and the accompaniment should stay the same type.
TODO for Facebook Net
Try to add the decoding part to structure A. The bottleneck during inference is the
autoregressive process done by the WaveNet; try to use the dedicated CUDA kernels written by
NVIDIA