micah5 / Pyaudioclassification
pyAudioClassification
Dead simple audio classification
Who is this for? 👩‍💻 👨‍💻
People who just want to classify some audio quickly, without having to dive into the world of audio analysis. If you need something a little more involved, check out pyAudioAnalysis or panotti.
Quick install
pip install pyaudioclassification
Requirements
- Python 3
- Keras
- Tensorflow
- librosa
- NumPy
- Soundfile
- tqdm
- matplotlib
Quick start
from pyaudioclassification import feature_extraction, train, predict
features, labels = feature_extraction(<data_path>)
model = train(features, labels)
pred = predict(model, <data_path>)
Or, if you're feeling reckless, you could just string them together like so:
pred = predict(train(feature_extraction(<training_data_path>)), <prediction_data_path>)
A full example with saving, loading & some dummy data can be found here.
Read below for a more detailed look at each of these calls.
Detailed Guide
Step 1: Preprocessing 🐶 🐱
First, add all your audio files to a directory in the following structure:
data/
├── <class_name>/
│   ├── <file_name>
│   └── ...
└── ...
For example, if you were trying to classify dog and cat sounds it might look like this:
data/
├── cat/
│   ├── cat1.ogg
│   ├── cat2.ogg
│   ├── cat3.wav
│   └── cat4.wav
└── dog/
    ├── dog1.ogg
    ├── dog2.ogg
    ├── dog3.wav
    └── dog4.wav
Great, now we need to preprocess this data. Just call feature_extraction(<data_path>)
and it'll return our input and target data.
Something like this:
features, labels = feature_extraction('/Users/mac2015/data/')
(If you don't want to print to stdout, just pass verbose=False as an argument.)
Depending on how much data you have, this process could take a while... so it might be a good idea to save. You can save and load with NumPy:
import numpy as np

np.save('%s.npy' % <file_name>, features)
features = np.load('%s.npy' % <file_name>)
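For example, a full round trip saving both arrays to a temporary directory (the array shapes here are dummy stand-ins, not what feature_extraction necessarily returns):

```python
import os
import tempfile

import numpy as np

# Dummy stand-ins for the arrays returned by feature_extraction
features = np.random.rand(10, 193)
labels = np.arange(10) % 2

save_dir = tempfile.mkdtemp()
np.save(os.path.join(save_dir, 'feat.npy'), features)
np.save(os.path.join(save_dir, 'label.npy'), labels)

# Later, skip the (slow) extraction step entirely:
features = np.load(os.path.join(save_dir, 'feat.npy'))
labels = np.load(os.path.join(save_dir, 'label.npy'))
print(features.shape)  # (10, 193)
```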
Step 2: Training 💪
Next step is to train your model on the data. You can just call...
model = train(features, labels)
...but depending on your dataset, you might need to play around with some of the hyper-parameters to get the best results.
Options
- epochs: The number of training iterations. Default is 50.
- lr: Learning rate. Increase to speed up training time, decrease to get more accurate results (if your loss is 'jumping'). Default is 0.01.
- optimiser: Choose any Keras optimiser. Default is 'SGD'.
- print_summary: Prints a summary of the model you'll be training. Default is False.
- loss_type: Classification type. Default is categorical for >2 classes, and binary otherwise.
You can pass any of these as optional keyword arguments, for example train(features, labels, lr=0.05).
Again, you probably want to save your model once it's done training. You can do this with Keras:
from keras.models import load_model
model.save('my_model.h5')
model = load_model('my_model.h5')
Step 3: Prediction
Now the fun part: try your trained model on new data!
pred = predict(model, <data_path>)
Your <data_path> should point to a new, untested audio file.
Binary
If you have 2 classes (or if you force selected 'binary' as a type), pred will just be a single number for each file.
The closer it is to 0, the closer the prediction is to the first class; the closer it is to 1, the closer the prediction is to the second class.
So for our cat/dog example, if it returns 0.2 it's 80% sure the sound is a cat, and if it returns 0.8 it's 80% sure it's a dog.
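That interpretation can be written down as a tiny helper (a sketch; `interpret_binary` is hypothetical, and class_names is assumed to be ordered the same way the classes were indexed during training, e.g. alphabetically by directory name):

```python
def interpret_binary(score, class_names):
    """Turn a single binary prediction into (predicted_class, confidence)."""
    if score < 0.5:
        # Scores near 0 favour the first class
        return class_names[0], 1.0 - score
    # Scores near 1 favour the second class
    return class_names[1], score

print(interpret_binary(0.2, ['cat', 'dog']))  # ('cat', 0.8)
print(interpret_binary(0.8, ['cat', 'dog']))  # ('dog', 0.8)
```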
Categorical
If you have more than 2 classes (or if you force selected 'categorical' as a type), pred will be an array for each sound file.
It'll look something like this:
[[1.6454633e-06 3.7017996e-11 9.9999821e-01 1.5900606e-07]]
The index of each item in the array will correspond to the prediction for that class.
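So to pick the winning class yourself, take the argmax of the row. A quick sketch with NumPy, using the example output above:

```python
import numpy as np

pred = np.array([[1.6454633e-06, 3.7017996e-11, 9.9999821e-01, 1.5900606e-07]])

best = int(np.argmax(pred[0]))  # index of the highest-scoring class
print(best)            # 2
print(pred[0, best])   # the winning probability, ~0.99999821
```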
You can pretty print the predictions by showing them in a leaderboard, like so:
print_leaderboard(pred, <training_data_path>)
It looks like this:
1. Cow 100.0% (index 2)
2. Rooster 0.0% (index 0)
3. Frog 0.0% (index 3)
4. Pig 0.0% (index 1)
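If you'd rather format the ranking yourself, the same leaderboard can be reproduced in a few lines. A sketch (the `leaderboard` helper is hypothetical; print_leaderboard reads class names from the training directory, whereas here they're passed in directly):

```python
def leaderboard(pred_row, class_names):
    """Return leaderboard lines: classes ranked by predicted probability."""
    ranked = sorted(enumerate(pred_row), key=lambda p: p[1], reverse=True)
    return [
        '%d. %s %.1f%% (index %d)' % (rank, class_names[idx], prob * 100, idx)
        for rank, (idx, prob) in enumerate(ranked, start=1)
    ]

lines = leaderboard(
    [1.6454633e-06, 3.7017996e-11, 9.9999821e-01, 1.5900606e-07],
    ['Rooster', 'Pig', 'Cow', 'Frog'],
)
for line in lines:
    print(line)
# 1. Cow 100.0% (index 2)
# 2. Rooster 0.0% (index 0)
# 3. Frog 0.0% (index 3)
# 4. Pig 0.0% (index 1)
```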
References
- Large parts of the code (particularly the feature extraction) are based on mtobeiyf/audio-classification
- panotti