pyAudioProcessing

A Python-based library for processing audio data into features (GFCC, MFCC, spectral, chroma) and building Machine Learning models.
It was initially written using Python 3.7, has since been updated several times under Python 3.8 and Python 3.9, and has been tested to work with Python >= 3.6, < 3.10.

Getting Started

  1. One way to install pyAudioProcessing and its dependencies is from PyPI using pip
pip install pyAudioProcessing

To upgrade to the latest version of pyAudioProcessing, the following pip command can be used.

pip install -U pyAudioProcessing
  2. Or, you could also clone the project and set it up
git clone [email protected]:jsingh811/pyAudioProcessing.git
cd pyAudioProcessing
pip install -e .

You can also install the requirements by running

pip install -r requirements/requirements.txt
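
As a quick sanity check that the install succeeded, you can try importing the entry points used throughout this README:

from pyAudioProcessing.run_classification import train, classify
from pyAudioProcessing.extract_features import get_features
from pyAudioProcessing.convert_audio import convert_files_to_wav
from pyAudioProcessing import clean, plot, utils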

Contents

Data structuring
Feature and Classifier model options
Pre-trained models
Extracting numerical features from audio
Building custom classification models
Audio cleaning
Audio format conversion
Audio visualization

Please refer to the Wiki for more details.

Citation

Using pyAudioProcessing in your research? Please cite as follows.

Jyotika Singh. (2021, July 22). jsingh811/pyAudioProcessing: Audio processing, feature extraction and classification (Version v1.2.0). Zenodo. http://doi.org/10.5281/zenodo.5121041


Bibtex

@software{jyotika_singh_2021_5121041,
  author       = {Jyotika Singh},
  title        = {{jsingh811/pyAudioProcessing: Audio processing,
                   feature extraction and classification}},
  month        = jul,
  year         = 2021,
  publisher    = {Zenodo},
  version      = {v1.2.0},
  doi          = {10.5281/zenodo.5121041},
  url          = {https://doi.org/10.5281/zenodo.5121041}
}

Options

Feature options

You can choose between the features gfcc, mfcc, spectral, and chroma, or any combination of them (for example, gfcc,mfcc,spectral,chroma) to extract from your audio files, either for classification or simply to save the extracted features for other uses.

Classifier options

You can choose between svm, svm_rbf, randomforest, logisticregression, knn, gradientboosting, and extratrees.
Hyperparameter tuning via grid search is included for each classifier.
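
Swapping classifiers only changes the classifier argument of the train call described later in the Examples section. A minimal sketch, assuming the same train() API shown below; the classifier_name here is an arbitrary label for the saved model:

from pyAudioProcessing.run_classification import train

# same call shape as the svm example below, swapping in a random forest;
# grid-search hyperparameter tuning runs for whichever classifier is chosen
train(
	folder_path="../data", # parent dir structured as described below
	feature_names=["gfcc", "spectral", "chroma"],
	classifier="randomforest",
	classifier_name="rf_test_clf" # arbitrary name for the saved model
)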

Training and Testing Data structuring (Optional)

The library works with data structured as described in this section, or alternatively with an input dictionary object specifying the locations of the audio files.

Let's say you have 2 classes that you have training data for (music and speech), and you want to use pyAudioProcessing to train a model using the available feature options. Save each class as a directory, with all the training audio .wav files under the respective class directories. Example:

.
β”œβ”€β”€ training_data
β”œβ”€β”€ music
β”‚   β”œβ”€β”€ music_sample1.wav
β”‚   β”œβ”€β”€ music_sample2.wav
β”‚   β”œβ”€β”€ music_sample3.wav
β”‚   β”œβ”€β”€ music_sample4.wav
β”œβ”€β”€ speech
β”‚   β”œβ”€β”€ speech_sample1.wav
β”‚   β”œβ”€β”€ speech_sample2.wav
β”‚   β”œβ”€β”€ speech_sample3.wav
β”‚   β”œβ”€β”€ speech_sample4.wav

Similarly, structure any test data (with known labels) that you want to pass through the classifier as follows:

.
β”œβ”€β”€ testing_data
β”œβ”€β”€ music
β”‚   β”œβ”€β”€ music_sample5.wav
β”‚   β”œβ”€β”€ music_sample6.wav
β”œβ”€β”€ speech
β”‚   β”œβ”€β”€ speech_sample5.wav
β”‚   β”œβ”€β”€ speech_sample6.wav

If you want to classify audio samples without any known labels, structure the data as follows:

.
β”œβ”€β”€ data
β”œβ”€β”€ unknown
β”‚   β”œβ”€β”€ sample1.wav
β”‚   β”œβ”€β”€ sample2.wav
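
If your files are already organized as above, the equivalent file_names dictionary used elsewhere in this README can be built with a few lines of standard-library Python. A minimal sketch; build_file_names is a hypothetical helper, not part of pyAudioProcessing:

from pathlib import Path

def build_file_names(parent_dir):
    # hypothetical helper: map each sub-folder (class label) to its .wav paths
    return {
        sub.name: [str(p) for p in sub.glob("*.wav")]
        for sub in Path(parent_dir).iterdir()
        if sub.is_dir()
    }

file_names = build_file_names("training_data")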

Classifying with Pre-trained Models

There are three pre-trained models provided in this project. They are as follows.

music genre: Contains a pre-trained SVM classifier that classifies audio into 10 music genres - blues, classical, country, disco, hiphop, jazz, metal, pop, reggae, and rock. This classifier was trained using MFCC, GFCC, spectral, and chroma features.

musicVSspeech: Contains a pre-trained SVM classifier that classifies audio into two possible classes - music and speech. This classifier was trained using MFCC, spectral, and chroma features.

musicVSspeechVSbirds: Contains a pre-trained SVM classifier that classifies audio into three possible classes - music, speech, and birds. This classifier was trained using GFCC, spectral, and chroma features.

There are three ways to specify the data you want to classify.

  1. Classifying a single audio file specified by the file input.
from pyAudioProcessing.run_classification import classify_ms, classify_msb, classify_genre

# musicVSspeech classification
results_music_speech = classify_ms(file="/Users/xyz/Documents/audio.wav")

# musicVSspeechVSbirds classification
results_music_speech_birds = classify_msb(file="/Users/xyz/Documents/audio.wav")

# music genre classification
results_music_genre = classify_genre(file="/Users/xyz/Documents/audio.wav")
  2. Using file_names, a dictionary specifying the locations of the audio files, as follows.
# {"audios_1" : [<path to audio>, <path to audio>, ...], "audios_2": [<path to audio>, ...],}

# Examples.  

file_names = {
	"music" : ["/Users/abc/Documents/opera.wav", "/Users/abc/Downloads/song.wav"],
	"birds": [ "/Users/abc/Documents/b1.wav", "/Users/abc/Documents/b2.wav", "/Users/abc/Desktop/birdsound.wav"]
}

file_names = {
	"audios" : ["/Users/abc/Documents/opera.wav", "/Users/abc/Downloads/song.wav", "/Users/abc/Documents/b1.wav", "/Users/abc/Documents/b2.wav", "/Users/abc/Desktop/birdsound.wav"]
}

The following commands in Python can be used to classify your data.

from pyAudioProcessing.run_classification import classify_ms, classify_msb, classify_genre

# musicVSspeech classification
results_music_speech = classify_ms(file_names=file_names)

# musicVSspeechVSbirds classification
results_music_speech_birds = classify_msb(file_names=file_names)

# music genre classification
results_music_genre = classify_genre(file_names=file_names)
  3. Using data structured as specified in the structuring guidelines and passing the parent folder path as the folder_path input.

The following commands in Python can be used to classify your data.

from pyAudioProcessing.run_classification import classify_ms, classify_msb, classify_genre

# musicVSspeech classification
results_music_speech = classify_ms(folder_path="../data")

# musicVSspeechVSbirds classification
results_music_speech_birds = classify_msb(folder_path="../data")

# music genre classification
results_music_genre = classify_genre(folder_path="../data")

Sample results look like

{'../data/music': {'beatles.wav': {'probabilities': [0.8899067858599712, 0.011922234412695229, 0.0981709797273336], 'classes': ['music', 'speech', 'birds']}, ...}}
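
Given results in this shape, the top predicted class per file can be read off the probabilities. A small sketch, assuming the dictionary structure shown above (using results_music_speech_birds from the calls above):

# results: {folder: {filename: {"probabilities": [...], "classes": [...]}}}
for folder, files in results_music_speech_birds.items():
    for fname, res in files.items():
        probs = res["probabilities"]
        top_class = res["classes"][probs.index(max(probs))]
        print(folder, fname, top_class)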

Training and Classifying Audio files

Audio data can be trained, tested and classified using pyAudioProcessing. Please see feature options and classifier model options for more information.

Sample spoken location name dataset for spoken instances of "london" and "boston" can be found here.

Examples

Code example using gfcc,spectral,chroma features and the svm classifier.

There are 2 ways to pass the training data in.

  1. Using locations of files in a dictionary format as the input file_names.

  2. Passing in a folder_path containing sub-folders and audio files. Please refer to the section on Training and Testing Data structuring to use your own data instead.

from pyAudioProcessing.run_classification import classify, train

# Training
train(
	file_names={
		"music": [<path to audio>, <path to audio>, ..],
		"speech": [<path to audio>, <path to audio>, ..]
	},
	feature_names=["gfcc", "spectral", "chroma"],
	classifier="svm",
	classifier_name="svm_test_clf"
)

Or, to use a directory containing audio files organized as in the structuring guidelines, the following can be used

train(
	folder_path="../data", # path to dir
	feature_names=["gfcc", "spectral", "chroma"],
	classifier="svm",
	classifier_name="svm_test_clf"
)

The above logs the files analyzed, the hyperparameter tuning results for recall, precision, and F1 score, along with the final confusion matrix.

To classify audio samples with the classifier you created above,

# Classify a single file 

results = classify(
	file = "<path to audio>",
	feature_names=["gfcc", "spectral", "chroma"],
	classifier="svm",
	classifier_name="svm_test_clf"
)

# Classify multiple files with known labels and locations
results = classify(
	file_names={
		"music": [<path to audio>, <path to audio>, ..],
		"speech": [<path to audio>, <path to audio>, ..]
	},
	feature_names=["mfcc", "gfcc", "spectral", "chroma"],
	classifier="svm",
	classifier_name="svm_test_clf"
)

# or you can specify a folder path as described in the training section.

If you pass logfile=True into the function call, the above logs the filename where the classification results are saved, along with details about the testing files and the classifier used.

If you cloned the project via git, the following command-line example of training and classification with gfcc,spectral,chroma features and the svm classifier can be used as well. Sample data can be found here. Please refer to the section on Training and Testing Data structuring to use your own data instead.

Training:

python pyAudioProcessing/run_classification.py -f "data_samples/training" -clf "svm" -clfname "svm_clf" -t "train" -feats "gfcc,spectral,chroma"

Classifying:

python pyAudioProcessing/run_classification.py -f "data_samples/testing" -clf "svm" -clfname "svm_clf" -t "classify" -feats "gfcc,spectral,chroma" -logfile "../classifier_results"

Classification results get saved in ../classifier_results_svm_clf.json.
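
Since the results are saved as plain JSON, they can be loaded back for inspection. A sketch, assuming the saved file mirrors the results dictionary shown earlier:

import json

with open("../classifier_results_svm_clf.json") as f:
    results = json.load(f)
print(results)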

Extracting features from audios

This functionality lets the user extract aggregated features calculated per audio file. See feature options for more information on the choices of features available.

Examples

Code example for performing gfcc and mfcc feature extraction can be found below.

from pyAudioProcessing.extract_features import get_features

# Feature extraction of a single file

features = get_features(
  file="<path to audio>",
  feature_names=["gfcc", "mfcc"]
)

# Feature extraction of multiple files

features = get_features(
  file_names={
    "music": [<path to audio>, <path to audio>, ..],
    "speech": [<path to audio>, <path to audio>, ..]
  },
  feature_names=["gfcc", "mfcc"]
)

# or if you have a dir with sub-folders and audios
# features = get_features(folder_path="data_samples/testing", feature_names=["gfcc", "mfcc"])

# features is a dictionary that will hold data of the following format
"""
{
  music: {file1_path: {"features": <list>, "feature_names": <list>}, ...},
  speech: {file1_path: {"features": <list>, "feature_names": <list>}, ...},
  ...
}
"""

To save features in a json file,

from pyAudioProcessing import utils
utils.write_to_json("audio_features.json", features)
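
The extracted features can also be handed to other tooling such as scikit-learn. A sketch converting the dictionary into a feature matrix X and label list y, assuming the structure shown above:

import numpy as np

# flatten {label: {file: {"features": [...], ...}}} into X (matrix) and y (labels)
X, y = [], []
for label, files in features.items():
    for _, data in files.items():
        X.append(data["features"]) # per-file aggregated feature vector
        y.append(label)
X = np.array(X)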

If you cloned the project via git, the following command-line example for gfcc and mfcc feature extraction can be used as well. The features argument should be a comma-separated string, for example gfcc,mfcc.
To use your own audio files for feature extraction, pass the path of a directory containing .wav files as the -f argument. Please refer to the format of the directory data_samples/testing or the section on Training and Testing Data structuring.

python pyAudioProcessing/extract_features.py -f "data_samples/testing" -feats "gfcc,mfcc"

Features extracted get saved in audio_features.json.

Audio format conversion

You can convert audio in .mp4, .mp3, .m4a, and .aac formats to .wav. This allows you to use the audio feature generation and classification functionalities.

To convert your audio files, the following code sample can be used.

from pyAudioProcessing.convert_audio import convert_files_to_wav

# dir_path is the path to the directory/folder on your machine containing audio files
dir_path = "data/mp4_files"

# simply change audio_format to "mp3", "m4a" or "aac" depending on the format
# of audio that you are trying to convert to wav
convert_files_to_wav(dir_path, audio_format="mp4")

# the converted wav files will be saved in the same dir_path location.

Audio cleaning

To remove low-activity regions from your audio clip, the following sample usage can be referred to.

from pyAudioProcessing import clean

clean.remove_silence(
    <path to wav file>,
    output_file=<path where you want to store cleaned wav file>
)

Audio visualization

To view the time-domain representation and the spectrogram of your audio, please refer to the following sample usage.

from pyAudioProcessing import plot

# spectrogram plot
plot.spectrogram(
    <path to wav file>,
    show=True, # set to False if you do not want the plot to show
    save_to_disk=True, # set to False if you do not want the plot to save
    output_file=<path where you want to store spectrogram as a png>
)

# time-series plot
plot.time(
    <path to wav file>,
    show=True, # set to False if you do not want the plot to show
    save_to_disk=True, # set to False if you do not want the plot to save
    output_file=<path where you want to store the plot as a png>
)

Author

Jyotika Singh
https://twitter.com/jyotikasingh_/
https://www.linkedin.com/in/jyotikasingh/
