VoiceSplit
Unofficial PyTorch implementation of VoiceFilter: Targeted Voice Separation by Speaker-Conditioned Spectrogram Masking.
Final project for SCC5830 - Image Processing @ ICMC/USP.
Dataset
For this task we initially use the LibriSpeech dataset. However, since LibriSpeech provides clean single-speaker utterances, we need to generate audio clips with overlapping voices.
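The mixing step above can be sketched as follows. This is a minimal NumPy sketch, not the repository's actual preprocessing; the `mix_speakers` helper and its SNR handling are hypothetical, and real LibriSpeech waveforms would replace the toy sine "voices".

```python
import numpy as np

def mix_speakers(target_wav, interferer_wav, snr_db=0.0):
    """Mix a target utterance with an interfering one at a given SNR.

    Both inputs are 1-D float arrays; the interferer is tiled/trimmed
    so the voices fully overlap. (Hypothetical helper for illustration.)
    """
    n = len(target_wav)
    # Repeat the interferer until it covers the target, then trim.
    reps = int(np.ceil(n / len(interferer_wav)))
    interferer = np.tile(interferer_wav, reps)[:n]
    # Scale the interferer so the mixture has the requested SNR.
    target_power = np.mean(target_wav ** 2)
    interf_power = np.mean(interferer ** 2) + 1e-8
    scale = np.sqrt(target_power / (interf_power * 10 ** (snr_db / 10)))
    return target_wav + scale * interferer

# Toy demo with two sine "voices" in place of real LibriSpeech audio.
sr = 16000
t = np.arange(sr) / sr
voice_a = np.sin(2 * np.pi * 220 * t)
voice_b = 0.5 * np.sin(2 * np.pi * 330 * t)
mixture = mix_speakers(voice_a, voice_b, snr_db=0.0)
```

In the real pipeline, the target speaker's clean utterance is kept as the training label while the mixture is the model input.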
Improvements
We use SI-SNR with permutation invariant training (PIT) instead of the power-law compressed loss, as it yields better results (comparison available at: https://github.com/Edresson/VoiceSplit).
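The SI-SNR-with-PIT objective can be sketched as below. This is a simplified NumPy version for illustration, not the repository's loss code; the brute-force permutation search is only practical for a small number of sources.

```python
import numpy as np
from itertools import permutations

def si_snr(est, ref, eps=1e-8):
    """Scale-invariant SNR in dB between an estimated and a reference signal."""
    est = est - est.mean()
    ref = ref - ref.mean()
    # Project the estimate onto the reference; the residual counts as noise.
    proj = np.dot(est, ref) / (np.dot(ref, ref) + eps) * ref
    noise = est - proj
    return 10 * np.log10((np.sum(proj ** 2) + eps) / (np.sum(noise ** 2) + eps))

def pit_si_snr_loss(estimates, references):
    """Negative mean SI-SNR under the best source permutation (PIT)."""
    n = len(references)
    best = -np.inf
    for perm in permutations(range(n)):
        score = np.mean([si_snr(estimates[i], references[p])
                         for i, p in enumerate(perm)])
        best = max(best, score)
    return -best  # minimized during training
```

Because the loss takes the best permutation, the network is not penalized for outputting the sources in a different order than the labels.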
We also used the Mish activation function instead of ReLU, which further improved the results.
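For reference, Mish is defined as x · tanh(softplus(x)); a minimal NumPy version (PyTorch ships it as `torch.nn.Mish`):

```python
import numpy as np

def mish(x):
    """Mish activation: x * tanh(softplus(x)), smooth and non-monotonic."""
    # log1p(exp(x)) is softplus(x); fine for moderate x, overflows for very large x.
    return x * np.tanh(np.log1p(np.exp(x)))
```

Unlike ReLU, Mish is smooth everywhere and lets small negative values pass through, which is often credited for its slightly better optimization behavior.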
Report
You can see a report of what was done in this repository here
Demos
Colab notebook demos:
Exp 1: link
Exp 2: link
Exp 3: link
Exp 4: link
Exp 5 (best): link
Site demo for the experiment with best results (Exp 5): https://edresson.github.io/VoiceSplit/
ToDos:
Create documentation for the repository and remove unused code
Future Works
- Train VoiceSplit model with GE2E3k and Mean Squared Error loss function
Acknowledgment:
This repository contains code from other contributors; due credit is given in the functions used:
Preprocessing: Eren Gölge @erogol
VoiceFilter Model: Seungwon Park @seungwonpark