ptnplanet / Java Naive Bayes Classifier

A Java classifier based on the naive Bayes approach, complete with Maven support and a runnable example.


Projects that are alternatives of or similar to Java Naive Bayes Classifier

chatto
Chatto is a minimal chatbot framework in Go.
Stars: ✭ 98 (-66.21%)
Mutual labels:  classifier
dl-relu
Deep Learning using Rectified Linear Units (ReLU)
Stars: ✭ 20 (-93.1%)
Mutual labels:  classifier
Computer-Vision-Project
The goal of this project was to develop a Face Recognition application using a Local Binary Pattern approach and, using the same approach, develop a real time Face Recognition application.
Stars: ✭ 20 (-93.1%)
Mutual labels:  classifier
NN-scratch
Coding up a Neural Network Classifier from Scratch
Stars: ✭ 78 (-73.1%)
Mutual labels:  classifier
Bag-of-Visual-Words
🎒 Bag of Visual words (BoW) approach for object classification and detection in images together with SIFT feature extractor and SVM classifier.
Stars: ✭ 39 (-86.55%)
Mutual labels:  classifier
labelReader
Programmatically find and read labels using Machine Learning
Stars: ✭ 44 (-84.83%)
Mutual labels:  classifier
Emotion and Polarity SO
An emotion classifier of text containing technical content from the SE domain
Stars: ✭ 74 (-74.48%)
Mutual labels:  classifier
Audio-Classification-using-CNN-MLP
Multi class audio classification using Deep Learning (MLP, CNN): The objective of this project is to build a multi class classifier to identify sound of a bee, cricket or noise.
Stars: ✭ 36 (-87.59%)
Mutual labels:  classifier
pghumor
Is This a Joke? Humor Detection in Spanish Tweets
Stars: ✭ 48 (-83.45%)
Mutual labels:  classifier
smalltext
Classify short texts with neural network.
Stars: ✭ 15 (-94.83%)
Mutual labels:  classifier
Water-classifier-fastai
Deploy your Flask web app classifier on Heroku which is written using fastai library.
Stars: ✭ 37 (-87.24%)
Mutual labels:  classifier
ML4K-AI-Extension
Use machine learning in AppInventor, with easy training using text, images, or numbers through the Machine Learning for Kids website.
Stars: ✭ 18 (-93.79%)
Mutual labels:  classifier
simple-image-classifier
Simple image classifier microservice using tensorflow and sanic
Stars: ✭ 22 (-92.41%)
Mutual labels:  classifier
tensorflow-image-classifier
Easily train an image classifier and then use it to label/tag other images
Stars: ✭ 29 (-90%)
Mutual labels:  classifier
GeneticAlgorithmForFeatureSelection
Search the best feature subset for you classification mode
Stars: ✭ 82 (-71.72%)
Mutual labels:  classifier
classy
Super simple text classifier using Naive Bayes. Plug-and-play, no dependencies
Stars: ✭ 12 (-95.86%)
Mutual labels:  classifier
naive-bayes-classifier
Implementing Naive Bayes Classification algorithm into PHP to classify given text as ham or spam. This application uses MySql as database.
Stars: ✭ 21 (-92.76%)
Mutual labels:  classifier
node-fasttext
Nodejs binding for fasttext representation and classification.
Stars: ✭ 39 (-86.55%)
Mutual labels:  classifier
support-tickets-classification
This case study shows how to create a model for text analysis and classification and deploy it as a web service in Azure cloud in order to automatically classify support tickets. This project is a proof of concept made by Microsoft (Commercial Software Engineering team) in collaboration with Endava http://endava.com/en
Stars: ✭ 142 (-51.03%)
Mutual labels:  classifier
text2class
Multi-class text categorization using state-of-the-art pre-trained contextualized language models, e.g. BERT
Stars: ✭ 15 (-94.83%)
Mutual labels:  classifier

Java Naive Bayes Classifier

Nothing special. It works and is well documented, so you should be able to get it running without wasting too much time searching the net for alternatives.

Maven Quick-Start

This Java Naive Bayes Classifier can be installed via the JitPack repository. Make sure to add it to your build file first.

<repositories>
  <repository>
    <id>jitpack.io</id>
    <url>https://jitpack.io</url>
  </repository>
</repositories>

Then, treat it as any other dependency.

<dependency>
  <groupId>com.github.ptnplanet</groupId>
  <artifactId>Java-Naive-Bayes-Classifier</artifactId>
  <version>1.0.7</version>
</dependency>

For other build tools (e.g. Gradle), visit https://jitpack.io for configuration snippets.
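For Gradle specifically, the equivalent configuration should look roughly like the following (derived from the Maven coordinates above; double-check against the snippet jitpack.io generates):

```gradle
repositories {
    maven { url 'https://jitpack.io' }
}

dependencies {
    implementation 'com.github.ptnplanet:Java-Naive-Bayes-Classifier:1.0.7'
}
```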

Please also check the releases tab for further releases.

Overview

I like talking about features and categories. Objects have features and may belong to a category. The classifier will try matching objects to their categories by looking at the objects' features. It does so by consulting its memory filled with knowledge gathered from training examples.

Classifying a feature set yields the category with the highest product of 1) the probability of that category occurring and 2) the product of all the features' probabilities of occurring in that category:

classify(feature1, ..., featureN) = argmax(P(category) * PROD(P(feature|category)))

This is a so-called maximum a posteriori estimation. Wikipedia actually does a good job explaining it: http://en.wikipedia.org/wiki/Naive_Bayes_classifier#Probabilistic_model
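To make the formula concrete, here is a small self-contained sketch of the MAP rule. This is an illustration only, with hypothetical hard-coded count tables and add-one smoothing standing in for the classifier's learned memory; it is not the library's internal code:

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;

public class MapEstimateSketch {
    // Hypothetical training counts standing in for the classifier's memory.
    static final Map<String, Integer> categoryCounts =
        Map.of("positive", 1, "negative", 1);
    static final Map<String, Map<String, Integer>> featureCounts = Map.of(
        "positive", Map.of("love", 1, "sunny", 1, "days", 1),
        "negative", Map.of("hate", 1, "rain", 1));

    // P(category) * PROD(P(feature|category)), with add-one smoothing so an
    // unseen feature does not zero out the whole product.
    static double score(String category, List<String> features) {
        int totalExamples = categoryCounts.values().stream()
            .mapToInt(Integer::intValue).sum();
        double p = (double) categoryCounts.get(category) / totalExamples;
        for (String feature : features) {
            int count = featureCounts.get(category).getOrDefault(feature, 0);
            p *= (count + 1.0) / (categoryCounts.get(category) + 2.0);
        }
        return p;
    }

    // argmax over all known categories
    static String classify(List<String> features) {
        return categoryCounts.keySet().stream()
            .max(Comparator.comparingDouble(c -> score(c, features)))
            .orElseThrow();
    }

    public static void main(String[] args) {
        System.out.println(classify(List.of("sunny", "day")));  // positive
    }
}
```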

Learning from Examples

Add knowledge by telling the classifier that these features belong to a specific category:

String[] positiveText = "I love sunny days".split("\\s");
bayes.learn("positive", Arrays.asList(positiveText));

Classify unknown objects

Use the gathered knowledge to classify unknown objects with their features. The classifier will return the category that the object most likely belongs to.

String[] unknownText1 = "today is a sunny day".split("\\s");
bayes.classify(Arrays.asList(unknownText1)).getCategory();

Example

Here is an excerpt from the example. The classifier will classify sentences (arrays of features) as having either positive or negative sentiment. Please refer to the full example for more detailed documentation.

// Create a new bayes classifier with string categories and string features.
Classifier<String, String> bayes = new BayesClassifier<String, String>();

// Two examples to learn from.
String[] positiveText = "I love sunny days".split("\\s");
String[] negativeText = "I hate rain".split("\\s");

// Learn by classifying examples.
// New categories can be added on the fly, when they are first used.
// A classification consists of a category and a list of features
// that resulted in the classification in that category.
bayes.learn("positive", Arrays.asList(positiveText));
bayes.learn("negative", Arrays.asList(negativeText));

// Here are two unknown sentences to classify.
String[] unknownText1 = "today is a sunny day".split("\\s");
String[] unknownText2 = "there will be rain".split("\\s");

System.out.println( // will output "positive"
    bayes.classify(Arrays.asList(unknownText1)).getCategory());
System.out.println( // will output "negative"
    bayes.classify(Arrays.asList(unknownText2)).getCategory());

// Get more detailed classification result.
((BayesClassifier<String, String>) bayes).classifyDetailed(
    Arrays.asList(unknownText1));

// Change the memory capacity. New learned classifications (using
// the learn method) are stored in a queue with the size given
// here and used to classify unknown sentences.
bayes.setMemoryCapacity(500);

Forgetful learning

This classifier is forgetful: learned examples are kept in a queue of fixed capacity (1,000 by default), and once that capacity is reached, the oldest examples are forgotten as new ones are learned. This ensures that the classifier can react to ongoing changes in the user's habits.
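The forgetting mechanism can be pictured as a bounded queue. The sketch below illustrates the idea only; the real classifier additionally decrements the feature and category counts of each example it drops:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class ForgetfulMemorySketch {
    // Keep only the most recent examples, up to the given capacity.
    static Deque<String> remember(String[] examples, int memoryCapacity) {
        Deque<String> memory = new ArrayDeque<>();
        for (String example : examples) {
            memory.addLast(example);
            if (memory.size() > memoryCapacity) {
                memory.removeFirst();  // forget the oldest example
            }
        }
        return memory;
    }

    public static void main(String[] args) {
        // Capacity 3 here instead of the library's default of 1,000.
        System.out.println(remember(new String[] {"a", "b", "c", "d", "e"}, 3));
        // prints [c, d, e]
    }
}
```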

Interface

The abstract Classifier<T, K> serves as a base for the concrete BayesClassifier<T, K>. Here are its methods. Please also refer to the Javadoc.

  • void reset() Resets the learned feature and category counts.
  • Set<T> getFeatures() Returns a Set of features the classifier knows about.
  • Set<K> getCategories() Returns a Set of categories the classifier knows about.
  • int getCategoriesTotal() Retrieves the total number of categories the classifier knows about.
  • int getMemoryCapacity() Retrieves the memory's capacity.
  • void setMemoryCapacity(int memoryCapacity) Sets the memory's capacity. If the new value is less than the old value, the memory will be truncated accordingly.
  • void incrementFeature(T feature, K category) Increments the count of a given feature in the given category. This is equal to telling the classifier, that this feature has occurred in this category.
  • void incrementCategory(K category) Increments the count of a given category. This is equal to telling the classifier, that this category has occurred once more.
  • void decrementFeature(T feature, K category) Decrements the count of a given feature in the given category. This is equal to telling the classifier that this feature has occurred once less in this category.
  • void decrementCategory(K category) Decrements the count of a given category. This is equal to telling the classifier, that this category has occurred once less.
  • int getFeatureCount(T feature, K category) Retrieves the number of occurrences of the given feature in the given category.
  • int getFeatureCount(T feature) Retrieves the total number of occurrences of the given feature.
  • int getCategoryCount(K category) Retrieves the number of occurrences of the given category.
  • float featureProbability(T feature, K category) (implements IFeatureProbability<T, K>.featureProbability) Returns the probability that the given feature occurs in the given category.
  • float featureWeighedAverage(T feature, K category) Retrieves the weighed average P(feature|category) with overall weight of 1.0 and an assumed probability of 0.5. The probability defaults to the overall feature probability.
  • float featureWeighedAverage(T feature, K category, IFeatureProbability<T, K> calculator) Retrieves the weighed average P(feature|category) with overall weight of 1.0, an assumed probability of 0.5 and the given object to use for probability calculation.
  • float featureWeighedAverage(T feature, K category, IFeatureProbability<T, K> calculator, float weight) Retrieves the weighed average P(feature|category) with the given weight, an assumed probability of 0.5 and the given object to use for probability calculation.
  • float featureWeighedAverage(T feature, K category, IFeatureProbability<T, K> calculator, float weight, float assumedProbability) Retrieves the weighed average P(feature|category) with the given weight, the given assumed probability and the given object to use for probability calculation.
  • void learn(K category, Collection<T> features) Train the classifier by telling it that the given features resulted in the given category.
  • void learn(Classification<T, K> classification) Train the classifier by telling it that the given features resulted in the given category.
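The featureWeighedAverage family blends the observed conditional probability with an assumed prior, so that rarely seen features do not swing the result too hard. Assuming the common weighted-average formulation (the parameter names and defaults above suggest it, but verify against the source), the calculation reduces to:

```java
public class WeighedAverageSketch {
    // Weighed average of an observed probability and an assumed one; the
    // more often the feature has been seen overall, the more the observed
    // probability dominates.
    static float featureWeighedAverage(float basicProbability, int totalFeatureCount,
                                       float weight, float assumedProbability) {
        return (weight * assumedProbability + totalFeatureCount * basicProbability)
            / (weight + totalFeatureCount);
    }

    public static void main(String[] args) {
        // A feature seen once with observed P = 1.0 lands halfway between
        // the assumed 0.5 and the observed 1.0.
        System.out.println(featureWeighedAverage(1.0f, 1, 1.0f, 0.5f));  // 0.75
    }
}
```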

The BayesClassifier<T, K> class implements the following abstract method:

  • Classification<T, K> classify(Collection<T> features) Retrieves the most likely category for the given features; the exact behavior depends on the concrete classifier implementation.

Running the example

$ git clone https://github.com/ptnplanet/Java-Naive-Bayes-Classifier.git
$ cd Java-Naive-Bayes-Classifier
$ javac -cp src/main/java example/RunnableExample.java
$ java -cp example:src/main/java RunnableExample

Possible Performance Issues

Performance improvements I am currently considering:

  • Store the natural logarithms of the feature probabilities and add them together instead of multiplying the probability numbers
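To see why this helps: multiplying hundreds of small probabilities underflows double precision to 0.0, while summing their logarithms stays well within range and preserves the ranking between categories. A quick demonstration:

```java
public class LogProbSketch {
    // Naive product of n copies of a small probability p.
    static double productOf(double p, int n) {
        double product = 1.0;
        for (int i = 0; i < n; i++) {
            product *= p;  // underflows to 0.0 long before n is reached
        }
        return product;
    }

    // Equivalent computation in log space: sum of n copies of log(p).
    static double logSumOf(double p, int n) {
        double logSum = 0.0;
        for (int i = 0; i < n; i++) {
            logSum += Math.log(p);  // stays comfortably representable
        }
        return logSum;
    }

    public static void main(String[] args) {
        System.out.println(productOf(1e-5, 200));  // 0.0 -- all information lost
        System.out.println(logSumOf(1e-5, 200));   // about -2302.6 -- still comparable
    }
}
```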

The MIT License (MIT)

Copyright (c) 2012-2017 Philipp Nolte

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
