
ShuhuaGao / Vchsm

Licence: lgpl-3.0
C++ 11 algorithm implementation for voice conversion using harmonic plus stochastic models

Projects that are alternatives of or similar to Vchsm

Audioswitch
An Android audio management library for real-time communication apps.
Stars: ✭ 69 (+81.58%)
Mutual labels:  audio, voice
Mystiq
Qt5/C++ FFmpeg Media Converter
Stars: ✭ 393 (+934.21%)
Mutual labels:  audio, conversion
Figaro
Real-time voice-changer for voice-chat, etc. Will support many different voice-filters and features in the future. 🎵
Stars: ✭ 80 (+110.53%)
Mutual labels:  audio, voice
Ffmpegcore
A .NET FFMpeg/FFProbe wrapper for easily integrating media analysis and conversion into your C# applications
Stars: ✭ 429 (+1028.95%)
Mutual labels:  audio, conversion
Mad Twinnet
The code for the MaD TwinNet. Demo page:
Stars: ✭ 99 (+160.53%)
Mutual labels:  audio, voice
Soundfingerprinting
Open source audio fingerprinting in .NET. An efficient algorithm for acoustic fingerprinting written purely in C#.
Stars: ✭ 554 (+1357.89%)
Mutual labels:  algorithm, audio
Competitive Programing
A personal repository for algorithm practice problems
Stars: ✭ 33 (-13.16%)
Mutual labels:  algorithm
Google Hash Code 2020
More Pizza : Solution for the Practice Round of Google Hash Code 2020
Stars: ✭ 36 (-5.26%)
Mutual labels:  algorithm
Rhashmap
Robin Hood hash map library
Stars: ✭ 33 (-13.16%)
Mutual labels:  algorithm
Algo
📚 My solutions to algorithm problems on various websites
Stars: ✭ 32 (-15.79%)
Mutual labels:  algorithm
Audiovisualizer
iOS Audio Visualizer
Stars: ✭ 37 (-2.63%)
Mutual labels:  audio
Lab Notes
😍 Fun ideas & interesting inspirations & a little algorithm lab — a grab-bag of miscellaneous code from every nook and cranny; everything novel and fun is here.
Stars: ✭ 37 (-2.63%)
Mutual labels:  algorithm
Data Structures Questions
Golang sorting algorithms and data structures.
Stars: ✭ 977 (+2471.05%)
Mutual labels:  algorithm
Guitard
Node based multi effects audio processor
Stars: ✭ 31 (-18.42%)
Mutual labels:  audio
Nexmo Node Code Snippets
NodeJS code examples for using Nexmo
Stars: ✭ 36 (-5.26%)
Mutual labels:  voice
Minimumaudioplugin
Minimum implementation of a native audio plugin for Unity
Stars: ✭ 33 (-13.16%)
Mutual labels:  audio
Sound
core sound data structures and interfaces
Stars: ✭ 37 (-2.63%)
Mutual labels:  audio
Sudoku Generator
A Sudoku puzzle generator written in C++ using modified and efficient backtracking algorithm.
Stars: ✭ 33 (-13.16%)
Mutual labels:  algorithm
Strawberry
🍓 Strawberry Music Player
Stars: ✭ 972 (+2457.89%)
Mutual labels:  audio
Kfr
Fast, modern C++ DSP framework, FFT, Sample Rate Conversion, FIR/IIR/Biquad Filters (SSE, AVX, AVX-512, ARM NEON)
Stars: ✭ 985 (+2492.11%)
Mutual labels:  audio

INTRA-LINGUAL AND CROSS-LINGUAL VOICE CONVERSION USING HARMONIC PLUS STOCHASTIC MODELS

This is a C++11 implementation and verification of the above algorithm, proposed by Dr. Daniel Erro Eslava in his PhD thesis (available at [http://www.lsi.upc.edu/~nlp/papers/phd_daniel_erro.pdf]), targeted especially at mobile platforms. The algorithm converts one speaker's voice into another speaker's after training with corpora from both speakers. For fun, you could convert your voice into President Trump's, for example by downloading his audio from YouTube. (NOTE: this is just a preliminary version corresponding to Ref. [1]; code for the additional features introduced in Ref. [2] is not provided here for proprietary reasons.)

Features

  • Written in modern C++ (C++11/14)
  • Depends on the Eigen linear algebra library for high-performance numerical computation
  • Uses OpenMP on Windows or GCD on Mac/iOS to take full advantage of the multiple cores of a modern CPU for high speed. For more fine-grained control of parallelism, the thread support introduced in C++11 is a better choice.
  • Exposes C interfaces to facilitate interoperation with other languages on multiple platforms, such as Windows, macOS, and iOS

Usage

Before training and conversion, first add the proper paths, such as /include/vchsm and /external/Eigen3.3.4, to your IDE's header search path so that train_C.h and convert_C.h can be found and the source files compile correctly.

Training

The C API for the training phase is declared in train_C.h. The client specifies the source audios and target audios, and training generates a model file representing the voice conversion from the source speaker to the target speaker. All the parameters are documented in detail in the header file. To use the API, just

#include "train_C.h"

Alternatively, if you are familiar with CMake, the above include paths can be added in CMakeLists.txt via target_include_directories. Please refer to the CMakeLists.txt in this repository for an example.
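A hypothetical fragment along those lines might look as follows (the target name and directory layout are placeholders; consult the repository's actual CMakeLists.txt for the authoritative version):

```cmake
add_executable(vchsm example/main.cpp)
target_include_directories(vchsm PRIVATE
    ${CMAKE_SOURCE_DIR}/include/vchsm
    ${CMAKE_SOURCE_DIR}/external/Eigen3.3.4)
```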

Conversion

After training, the algorithm produces a model file (at the path specified by the user during the training phase). With this model file, we can then convert any voice audio from the source speaker so that it sounds like the target speaker's voice. All the C APIs for conversion are contained in convert_C.h and are well documented in that header file.

#include "convert_C.h"

Build the example

[NOTE: please first change the corresponding paths in main.cpp according to your actual configuration.]

With Visual Studio

If you work with Visual Studio, the easiest way is to use the provided solution file directly in CppAlgo/CppAlgo.sln.

With CMake

  • Create a build directory: mkdir build and cd build
  • Run CMake: cmake ..
  • Build using make: make
  • Run the generated executable: ./vchsm

Parallel computing

For the training phase, the algorithm needs to analyze multiple training samples (audios). Since the samples are independent, a natural way to speed up training is to divide the training set into several subsets that are processed simultaneously on a multi-core processor. A simple implementation of this idea is provided with OpenMP in train_C.cpp and with GCD in train_C.mm.

Note

  • For both training and conversion APIs, the client provides several arguments, such as the list of source audios, the list of target audios, and the path of the model file. To make this process more convenient and concise, the APIs can also accept a single configuration file that specifies all the required parameters.
  • Requirements of input audios:
    • Mono WAV, i.e., a single channel;
    • 16 bits per sample;
    • 16 kHz sampling frequency. (You may use a tool such as MediaInfo to inspect the technical details of an audio file.)
    • For training, the source and target speakers should utter the same material. For example, in source-1.wav and target-1.wav, both speakers may say "make America great again".
    • After training has finished, the source speaker can say anything he/she likes, and the algorithm will convert the voice into one resembling the target speaker's using the model file obtained in training.

Example

In the Audios subdirectory, we have placed audio samples from Dr. Daniel Erro Eslava: a training set composed of 20 audio samples from each of the two speakers, and a test set containing 10 audio samples from the source speaker. Let's walk through a simple example. Using the training set in Audios, we train a model with the given 20 audio samples from the source and target speakers. The generated model can then be used to convert any voice recording of the source speaker into the voice of the target speaker, i.e., voice conversion. Thus, you could make yourself sound like President Trump if you obtained his voice samples for training. On an Intel Core i7 CPU with 4 cores, training takes about 30 seconds and conversion takes negligible time, which makes this implementation close to real time.

#include <cstdio>
#include "../include/vchsm/train_C.h"
#include "../include/vchsm/convert_C.h"

#define VERBOSE_TRUE 1
#define VERBOSE_FALSE 0

int main()
{
	// train a model with the given 20 source and target speaker's audios
	// the audios files are named from 1 to 20
	const char* sourceAudioDir = "E:/GitHub/vchsm/Audios/source_train/";
	const char* targetAudioDir = "E:/GitHub/vchsm/Audios/target_train/";
	const int numTrainSamples = 20;
	const char* sourceAudioList[numTrainSamples];
	const char* targetAudioList[numTrainSamples];
	for (int i = 0; i < numTrainSamples; ++i)
	{
		char* buff = new char[100];
		std::snprintf(buff, 100, "%s%d.wav", sourceAudioDir, i + 1); // bounded write to avoid overflow
		sourceAudioList[i] = buff;
		buff = new char[100];
		std::snprintf(buff, 100, "%s%d.wav", targetAudioDir, i + 1);
		targetAudioList[i] = buff;
	}
	// model file to be generated
	const char* modelFile = "E:/GitHub/vchsm/models/Model.dat";
	// start training	
	trainHSMModel(sourceAudioList, targetAudioList, numTrainSamples, 4, modelFile, VERBOSE_TRUE);
	// deallocate
	for (int i = 0; i < numTrainSamples; ++i)
	{
		delete[] sourceAudioList[i];
		delete[] targetAudioList[i];
	}
	// perform conversion
	const char* testAudio = "E:/GitHub/vchsm/Audios/test/jal_in_42_3.wav";
	const char* testAudioConverted = "E:/GitHub/vchsm/Audios/test/jal_in_42_3_c.wav";
	convertSingle(modelFile, testAudio, testAudioConverted, VERBOSE_TRUE);
	// now we can compare the above audio before and after conversion
	std::getchar();
}

Code for the above example is placed in ./CppAlgo/example/main.cpp.

Additional notes for application on Mac/iOS

Since the Mac/iOS platforms prefer GCD over OpenMP for multi-core parallel programming, an equivalent implementation using GCD is provided in train_C.mm. Simply replace train_C.cpp (which depends on OpenMP) with train_C.mm if you plan to deploy on Mac/iOS. To facilitate using this library from Swift, a bridging header C_Swift_bridge_header.h is also included.

Publication

[1] Wu, Xiaoling, Shuhua Gao, Dong-Yan Huang, and Cheng Xiang. "Voichap: A standalone real-time voice change application on iOS platform." In 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), pp. 728-732. IEEE, 2017.

[2] Gao, Shuhua, Xiaoling Wu, Cheng Xiang, and Dongyan Huang. "Development of a computationally efficient voice conversion system on mobile phones." APSIPA Transactions on Signal and Information Processing 8 (2019).
