gionanide / Speech_signal_processing_and_classification

Licence: MIT
Front-end speech processing aims at extracting proper features from short-term segments of a speech utterance, known as frames. It is a prerequisite step toward any pattern recognition problem employing speech or audio (e.g., music). Here, we are interested in voice disorder classification, that is, in developing two-class classifiers that can discriminate between utterances of a subject suffering from, say, vocal fold paralysis and utterances of a healthy subject. The mathematical modeling of the speech production system in humans suggests that an all-pole system function is justified [1-3]. As a consequence, linear prediction coefficients (LPCs) constitute a first choice for modeling the magnitude of the short-term spectrum of speech. LPC-derived cepstral coefficients are guaranteed to discriminate between the system (e.g., vocal tract) contribution and that of the excitation. Taking into account the characteristics of the human ear, the mel-frequency cepstral coefficients (MFCCs) emerged as descriptive features of the speech spectral envelope. Similarly to MFCCs, the perceptual linear prediction coefficients (PLPs) could also be derived. The aforementioned, so to speak, traditional features will be tested against agnostic features extracted by convolutional neural networks (CNNs) (e.g., auto-encoders) [4]. The pattern recognition step will be based on Gaussian Mixture Model based classifiers, K-nearest neighbor classifiers, Bayes classifiers, as well as Deep Neural Networks. The Massachusetts Eye and Ear Infirmary Dataset (MEEI-Dataset) [5] will be exploited. At the application level, a library for feature extraction and classification in Python will be developed. Credible publicly available resources, such as KALDI, will be used toward achieving our goal. Comparisons will be made against [6-8].


Speech-Signal-Processing-and-Classification

Aristotle University of Thessaloniki - University of Groningen

Abstract of my thesis, conducted during the 7th and 8th semesters: "Two-class classification problems by analyzing the speech signal."

Front-end speech processing aims at extracting proper features from short-term segments of a speech utterance, known as frames. It is a prerequisite step toward any pattern recognition problem employing speech or audio (e.g., music). Here, we are interested in voice disorder classification, that is, in developing two-class classifiers that can discriminate between utterances of a subject suffering from, say, vocal fold paralysis and utterances of a healthy subject. The mathematical modeling of the speech production system in humans suggests that an all-pole system function is justified [1-3]. As a consequence, linear prediction coefficients (LPCs) constitute a first choice for modeling the magnitude of the short-term spectrum of speech. LPC-derived cepstral coefficients are guaranteed to discriminate between the system (e.g., vocal tract) contribution and that of the excitation. Taking into account the characteristics of the human ear, the mel-frequency cepstral coefficients (MFCCs) emerged as descriptive features of the speech spectral envelope. Similarly to MFCCs, the perceptual linear prediction coefficients (PLPs) could also be derived. The aforementioned, so to speak, traditional features will be tested against agnostic features extracted by convolutional neural networks (CNNs) (e.g., auto-encoders) [4]. Additionally, for the SVM algorithm, a dimensionality reduction step was carried out using Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), and the kernel form of PCA (KernelPCA). In the multiclass implementation, applying KernelPCA followed by LDA was essential; in the binary classification, we compare PCA against KernelPCA. For experimental purposes, graph spectral analysis (IsoMap, LLE) was used for dimensionality reduction, followed by spectral clustering in order to investigate subsets.
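As an illustration of the LPC and LPC-derived cepstral features described above, here is a minimal NumPy sketch. It is not taken from this repository: the function names `lpc_autocorr` and `lpc_to_cepstrum`, the frame length, and the model order are assumptions chosen for the example. It uses the autocorrelation method with the Levinson-Durbin recursion, followed by the standard recursion that maps all-pole coefficients to cepstral coefficients.

```python
import numpy as np


def lpc_autocorr(frame, order):
    """Estimate LPC coefficients of one frame (autocorrelation method).

    Returns (a, err), where a[0] == 1 and the prediction-error filter is
    A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order.
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Autocorrelation at lags 0..order
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    if err <= 0.0:
        # Silent (all-zero) frame: nothing to predict
        return a, 0.0
    # Levinson-Durbin recursion
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err  # reflection coefficient
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def lpc_to_cepstrum(a, n_ceps):
    """Cepstral coefficients c[1..n_ceps] of the all-pole model 1/A(z)."""
    p = len(a) - 1
    c = np.zeros(n_ceps + 1)  # c[0] (the gain term) is left at zero
    for n in range(1, n_ceps + 1):
        acc = 0.0
        for m in range(max(1, n - p), n):
            acc += m * c[m] * a[n - m]
        a_n = a[n] if n <= p else 0.0
        c[n] = -a_n - acc / n
    return c[1:]


if __name__ == "__main__":
    # Synthetic stand-in for a 400-sample speech frame (not real speech)
    rng = np.random.default_rng(0)
    frame = rng.standard_normal(400) * np.hamming(400)
    a, err = lpc_autocorr(frame, order=12)
    print(lpc_to_cepstrum(a, n_ceps=12))
```

In practice each utterance would be split into overlapping, windowed frames, and the per-frame coefficient vectors would be stacked into a feature matrix before any dimensionality reduction or classification.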
The pattern recognition step will be based on Gaussian Mixture Model based classifiers, K-nearest neighbor classifiers, Bayes classifiers, as well as Deep Neural Networks. At the application level, a library for feature extraction and classification in Python will be developed. Credible publicly available resources, such as KALDI, will be used toward achieving our goal. Comparisons will be made against [5-7].
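Of the classifiers listed above, the K-nearest neighbor rule is the simplest to sketch. The example below is a hedged illustration, not the thesis implementation: `knn_predict` is a hypothetical helper, and the two Gaussian clouds are synthetic stand-ins for per-utterance feature vectors of healthy (label 0) and pathological (label 1) speakers, not MEEI data.

```python
import numpy as np


def knn_predict(train_x, train_y, test_x, k=5):
    """Two-class K-nearest neighbor prediction with labels 0 and 1.

    Use an odd k so that the majority vote cannot tie.
    """
    train_x = np.asarray(train_x, dtype=float)
    train_y = np.asarray(train_y)
    test_x = np.asarray(test_x, dtype=float)
    # Squared Euclidean distances, shape (n_test, n_train)
    d2 = ((test_x[:, None, :] - train_x[None, :, :]) ** 2).sum(axis=2)
    # Indices of the k closest training vectors for each test vector
    nearest = np.argsort(d2, axis=1)[:, :k]
    votes = train_y[nearest]
    # Majority vote over the k neighbor labels
    return (votes.mean(axis=1) > 0.5).astype(int)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic 12-dimensional feature vectors (assumed dimensionality)
    healthy = rng.normal(0.0, 1.0, size=(100, 12))
    pathological = rng.normal(1.5, 1.0, size=(100, 12))
    train_x = np.vstack([healthy[:80], pathological[:80]])
    train_y = np.array([0] * 80 + [1] * 80)
    test_x = np.vstack([healthy[80:], pathological[80:]])
    test_y = np.array([0] * 20 + [1] * 20)
    pred = knn_predict(train_x, train_y, test_x, k=5)
    print("accuracy:", (pred == test_y).mean())
```

The same train/test split can be reused to compare other classifiers from the list, which keeps the comparison between feature sets and classifiers on equal footing.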

[1] X. Huang, A. Acero, and H.-W. Hon, Spoken Language Processing. Upper Saddle River, N.J.: Pearson Education-Prentice Hall, 2001.

[2] J. R. Deller, J. H. L. Hansen, and J. G. Proakis, Discrete-Time Processing of Speech Signals. New York, N.Y.: Wiley-IEEE, 1999.

[3] L. R. Rabiner and R. W. Schafer, Theory and Applications of Digital Speech Processing. Upper Saddle River, N.J.: Pearson Education-Prentice Hall, 2011.

[4] Wei-Ning Hsu, Yu Zhang, and James R. Glass, Unsupervised Domain Adaptation for Robust Speech Recognition via Variational Autoencoder-Based Data Augmentation, 2017, http://arxiv.org/abs/1707.06265.

[5] C. Kotropoulos and G. R. Arce, "Linear discriminant classifier with reject option for the detection of vocal fold paralysis and vocal fold edema", EURASIP Advances in Signal Processing, vol. 2009, article ID 203790, 13 pages, 2009 (DOI:10.1155/2009/203790).

[6] E. Ziogas and C. Kotropoulos, "Detection of vocal fold paralysis and edema using linear discriminant classifiers" in Proc. 4th Panhellenic Artificial Intelligence Conf. (SETN-06), Heraklion, Greece, vol. LNAI 3966, pp. 454-464, May 19-20, 2006.

[7] M. Marinaki, C. Kotropoulos, I. Pitas, and N. Maglaveras, "Automatic detection of vocal fold paralysis and edema" in Proc. 8th Int. Conf. Spoken Language Processing (INTERSPEECH 2004), Jeju, Korea, pp. 537-540, October, 2004.
