
matatusko / opinion-or-fact-sentence-classifier

Licence: other
Classifies whether sentences represent a fact or a personal opinion, with 90% accuracy, using various machine learning algorithms from the sklearn library.

Programming Languages

python

Projects that are alternatives of or similar to opinion-or-fact-sentence-classifier

Albert Tf2.0
ALBERT model Pretraining and Fine Tuning using TF2.0
Stars: ✭ 180 (+718.18%)
Mutual labels:  classifier
golinear
liblinear bindings for Go
Stars: ✭ 45 (+104.55%)
Mutual labels:  classifier
Nepali-News-Classifier
Text Classification of Nepali Language Document. This Mini Project was done for the partial fulfillment of NLP Course : COMP 473.
Stars: ✭ 13 (-40.91%)
Mutual labels:  classifier
Errant
ERRor ANnotation Toolkit: Automatically extract and classify grammatical errors in parallel original and corrected sentences.
Stars: ✭ 208 (+845.45%)
Mutual labels:  classifier
polyssifier
run a multitude of classifiers on your data and get an AUC report
Stars: ✭ 64 (+190.91%)
Mutual labels:  classifier
website-fingerprinting
Deanonymizing Tor or VPN users with website fingerprinting and machine learning.
Stars: ✭ 59 (+168.18%)
Mutual labels:  classifier
Naive Bayes Classifier
yet another general purpose naive bayesian classifier.
Stars: ✭ 162 (+636.36%)
Mutual labels:  classifier
dhtbay
A DHT crawler and torrent indexer
Stars: ✭ 94 (+327.27%)
Mutual labels:  classifier
createml-playgrounds
Create ML playgrounds for building machine learning models. For developers and data scientists.
Stars: ✭ 82 (+272.73%)
Mutual labels:  classifier
Face-Recognition-FaceNet
A Python script to label faces in group photos using FaceNet. 🎉
Stars: ✭ 21 (-4.55%)
Mutual labels:  classifier
Pytorch Multi Label Classifier
A pytorch implemented classifier for Multiple-Label classification
Stars: ✭ 232 (+954.55%)
Mutual labels:  classifier
bayes
naive bayes in php
Stars: ✭ 61 (+177.27%)
Mutual labels:  classifier
ocr-machine-learning
OCR Machine Learning in python
Stars: ✭ 42 (+90.91%)
Mutual labels:  classifier
Licenseclassifier
A License Classifier
Stars: ✭ 180 (+718.18%)
Mutual labels:  classifier
train-classifier-from-scratch
Machine Learning: Collect data online and train a classifier from scratch
Stars: ✭ 59 (+168.18%)
Mutual labels:  classifier
Programming Language Classifier
An example of how to use CreateML in Xcode 10 to create a Core ML model for classifying text
Stars: ✭ 172 (+681.82%)
Mutual labels:  classifier
lapis-bayes
Naive Bayes classifier for use in Lua
Stars: ✭ 26 (+18.18%)
Mutual labels:  classifier
opfython
🌳 A Python-inspired implementation of the Optimum-Path Forest classifier.
Stars: ✭ 29 (+31.82%)
Mutual labels:  classifier
pytorch hand classifier
Simple hand classifier by Pytorch and ResNet
Stars: ✭ 91 (+313.64%)
Mutual labels:  classifier
pyAudioProcessing
Audio feature extraction and classification
Stars: ✭ 165 (+650%)
Mutual labels:  classifier

Opinion Classifier

Classifies sentences as either a fact or a personal opinion. Tested with several sklearn algorithms, including a random forest classifier, a support vector machine, logistic regression and a neural network; each achieves over 90% accuracy.
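The comparison described above can be sketched as a loop over four scikit-learn estimators scored on a held-out split. This is a minimal sketch, not the repository's training code: synthetic data from make_classification stands in for the real sentence features, and every hyperparameter shown is an assumption.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Stand-in data (assumption): 1,000 rows of 40 numeric features, two classes.
X, y = make_classification(n_samples=1000, n_features=40, n_informative=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# One estimator per algorithm family mentioned in the README.
classifiers = {
    "rf_classifier": RandomForestClassifier(n_estimators=100, random_state=0),
    "svm_classifier": SVC(),
    "lr_classifier": LogisticRegression(max_iter=1000),
    "nn_classifier": MLPClassifier(hidden_layer_sizes=(64,), max_iter=1000, random_state=0),
}

scores = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, clf.predict(X_test))

for name, acc in scores.items():
    print(f"{name}: {acc:.3f}")
```

Stratifying the split keeps the fact/opinion ratio the same in the training and test halves, so the accuracies of the four models are comparable.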

Dataset

Factual sentences are mostly sentences extracted from Wikipedia articles; the whole process happens in the gather_and_prepare_data.py module. The data may not be perfectly annotated, but one can safely assume that most sentences found on Wikipedia are pure facts (or at least structured as facts).

As for the opinions, I found a great dataset called Opinosis, which consists of opinion sentences extracted from reviews on many different topics.

In total, the dataset contains around 11,000 sentences. Not a huge amount, but it does the job.
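A minimal sketch of how such a labelled dataset could be assembled, assuming the Wikipedia article texts and the Opinosis sentences are already loaded as Python strings. The regex sentence splitter and the names split_sentences and build_dataset are hypothetical; they are not taken from gather_and_prepare_data.py.

```python
import random
import re

# Naive splitter (assumption): break after ., ! or ? followed by whitespace.
# Real Wikipedia text (abbreviations, decimals) needs a proper segmenter.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text):
    return [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]


def build_dataset(article_texts, opinion_sentences, test_fraction=0.2, seed=0):
    """Label article sentences FACT and review sentences OPINION, shuffle, split."""
    examples = [(s, "FACT") for text in article_texts for s in split_sentences(text)]
    examples += [(s, "OPINION") for s in opinion_sentences]
    random.Random(seed).shuffle(examples)  # fixed seed -> reproducible split
    n_test = int(len(examples) * test_fraction)
    return examples[n_test:], examples[:n_test]


articles = ["Donuts are deep fried. They are often glazed!"]
opinions = ["The room was way too small.", "Great location, friendly staff."]
train, test = build_dataset(articles, opinions, test_fraction=0.25)
print(len(train), len(test))  # 3 1
```

A plain shuffle does not guarantee the same fact/opinion ratio in both halves; sklearn's train_test_split with its stratify argument would.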

Features

As for the features, I'm using spaCy to extract fine-grained part-of-speech tags (which include information such as whether a verb is in past or present form, or whether a noun is singular or plural; more info on the spaCy website). I also extract named entities and their labels as additional features. I wasn't sure whether that would work, as people usually go with BOW (bag-of-words) models, but the classifiers all achieve over 90% accuracy on the test data, and based on a small sample of sentences I wrote myself, the model is rather accurate.
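The structure-based features can be sketched as per-sentence counts of tags and entity labels. In spaCy, a token's fine-grained tag is exposed as token.tag_ and an entity's label as ent.label_; to keep this sketch self-contained, the tags below are written by hand instead of coming from a spaCy model, and the TAG=/ENT= naming scheme is an assumption, not necessarily what the repository uses.

```python
from collections import Counter


def structure_features(tags, entity_labels):
    """Count fine-grained POS tags and named-entity labels for one sentence.

    In a spaCy pipeline these would come from [t.tag_ for t in doc]
    and [e.label_ for e in doc.ents].
    """
    features = Counter(f"TAG={tag}" for tag in tags)
    features.update(f"ENT={label}" for label in entity_labels)
    return dict(features)


def to_vector(features, vocabulary):
    """Map a feature dict onto a fixed, ordered list of feature names (absent -> 0)."""
    return [features.get(name, 0) for name in vocabulary]


# "I think donuts are amazing." tagged by hand with Penn Treebank tags.
opinion = structure_features(["PRP", "VBP", "NNS", "VBP", "JJ", "."], [])
# "Dutch settlers brought the olykoek to New York." with hand-written tags and entities.
fact = structure_features(
    ["JJ", "NNS", "VBD", "DT", "NN", "IN", "NNP", "NNP", "."], ["NORP", "GPE"]
)

vocabulary = sorted(set(opinion) | set(fact))
X = [to_vector(opinion, vocabulary), to_vector(fact, vocabulary)]
```

The opinion example has a first-person pronoun and present-tense VBP verbs; the fact example has a past-tense VBD verb and two entities. scikit-learn's DictVectorizer performs the same dict-to-matrix step as to_vector.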

Just for the sake of it, I built a BOW model on exactly the same dataset and ran it through multiple ML algorithms, including Naive Bayes, Random Forest, SVM, Logistic Regression and NN. Interestingly, the accuracy on the test set was much higher, reaching up to 97%.
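A bag-of-words comparison like this can be sketched with scikit-learn pipelines, where CountVectorizer turns each sentence into word counts before the classifier sees it. The six toy sentences stand in for the real dataset, and the choice of LinearSVC and the default hyperparameters are assumptions.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy stand-in corpus (assumption); the real one has around 11,000 sentences.
sentences = [
    "The Eiffel Tower is located in Paris.",
    "Water boils at 100 degrees Celsius at sea level.",
    "The company was founded in 1998.",
    "I loved the breakfast, it was amazing.",
    "The staff were rude and the room was dirty.",
    "Honestly, this is the best phone I have ever owned.",
]
labels = ["FACT", "FACT", "FACT", "OPINION", "OPINION", "OPINION"]

models = {
    "naive_bayes": MultinomialNB(),
    "random_forest": RandomForestClassifier(random_state=0),
    "svm": LinearSVC(),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "neural_network": MLPClassifier(max_iter=2000, random_state=0),
}

predictions = {}
for name, clf in models.items():
    # Each pipeline learns its own word vocabulary from the training sentences.
    pipeline = make_pipeline(CountVectorizer(), clf)
    pipeline.fit(sentences, labels)
    predictions[name] = pipeline.predict(["Donuts are deep fried desserts."])[0]

print(predictions)
```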

However, when tested on random sample sentences from outside the dataset, the BOW model (with every algorithm tested) did a horrible job, classifying the donut sentences mostly incorrectly. I assume the BOW model tends to overfit and works only with sentences on roughly similar topics, or with sentences containing words from its vocabulary. The model based on sentence structure (counts of labeled POS tags), on the other hand, generalizes better and gives better results on new examples, at least when trained on a relatively small dataset.
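One way to probe the vocabulary explanation is to measure how many of a new sentence's words a BOW model has ever seen: at prediction time CountVectorizer ignores words outside its learned vocabulary, so a low-coverage sentence reaches the classifier as a mostly empty vector. This is a quick diagnostic sketch, not part of the repository; the lowercase regex tokenizer is an assumption and differs slightly from CountVectorizer's default.

```python
import re


def tokenize(sentence):
    # Simple lowercase word tokenizer (assumption, not CountVectorizer's default).
    return re.findall(r"[a-z']+", sentence.lower())


def vocabulary_coverage(train_sentences, sentence):
    """Fraction of a sentence's tokens that appear in the training vocabulary."""
    vocab = {tok for s in train_sentences for tok in tokenize(s)}
    tokens = tokenize(sentence)
    if not tokens:
        return 0.0
    return sum(tok in vocab for tok in tokens) / len(tokens)


train = ["The hotel room was clean.", "The staff were friendly."]
print(vocabulary_coverage(train, "The donut is a fried dessert."))  # only 'the' is known
print(vocabulary_coverage(train, "The room was clean."))  # 1.0
```

Structure features such as POS-tag counts avoid this problem, because every sentence produces tags regardless of its topic.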

Samples for sentence structure model

These samples come from the quick_tests.py module. Feel free to give it a try; all the models I've trained are included in the repository. More fine-tuning of the parameters might yield better results, but for a side project I'm quite satisfied with what it does.

Using rf_classifier (random forest)
--: Sentence: "As far as I am concerned, donuts are amazing."
is an OPINION!

Using svm_classifier (support vector machine)
--: Sentence: "Donuts are a kind of ring-shaped, deep fried dessert."
is a FACT!

Using lr_classifier (logistic regression)
--: Sentence: "Doughnut can also be spelled as "Donut", which is an American variant of the word."
is a FACT!

Using nn_classifier (neural network)
--: Sentence: "This new graphics card I bought recently is pretty amazing, it has no trouble rendering my 3D donuts art in high quality."
is a FACT!

Using nn_classifier (neural network)
--: Sentence: "I think this new graphics card is amazing, it has no trouble rendering my 3D donuts art in high quality."
is an OPINION!

Samples for the BOW model (using the NN classifier, which scored 97% on the test set)

Sentence: As far as I am concerned, donuts are amazing.
The above sentence is a FACT!

Sentence: Donuts are torus-shaped, deep fried desserts, very often with a jam feeling on the inside.
The above sentence is a FACT!

Sentence: Doughnut can also be spelled as "Donut", which is an American variant of the word.
The above sentence is a FACT!

Sentence: This new graphics card I bought recently is pretty amazing, it has no trouble rendering my 3D donuts art in high quality.
The above sentence is a FACT!

Sentence: Noone knows what are the origins of donuts.
The above sentence is a FACT!

Sentence: The earliest origins to the modern doughnuts are generally traced back to the olykoek ("oil(y) cake"), which Dutch settlers brought with them to early New York
The above sentence is an OPINION!

Sentence: This donut is quite possibly the best tasting donut in the entire world.
The above sentence is a FACT!
