All Projects → danielfm → judgr

danielfm / judgr

Licence: other
Multi-class Naïve Bayes Classifier library written in Clojure.

Programming Languages

clojure
4091 projects

Judgr

Judgr (pronounced as judger) is a naïve Bayes classifier library written in Clojure which features multivariate classification, support for cross validation, and more.

Features

  • Multivariate classification
  • Biased and unbiased class probabability
  • Configurable Laplace Smoothing
  • Configurable threshold validation
  • K-fold cross-validation
    • Precision, Recall, Specificity, Accuracy, and F1 score

Getting Started

Add the following dependency to your project.clj file:

[judgr "0.3.0"]

Training The Classifier

The first step is to instantiate the classifier given the current settings:

user=> (use '[judgr.core]
            '[judgr.settings])
nil
user=> (def classifier (classifier-from settings))
#'user/classifier

Now you can start training the classifier with (.train! classifier item :class):

(.train! classifier "How are you?" :positive)
(.train! classifier "Burn in hell!" :negative)
(.train! classifier ...)

If you want to train all examples of a given class at once, there's also (.train-all! classifier items :class):

(def positive-items ["How are you?" ...])
(def negative-items ["Burn in hell!" ...])

(.train-all! classifier positive-items :positive)
(.train-all! classifier negative-items :negative)

Or train all examples of different classes:

(.train-all! classifier [{:item "How are you?"  :class :positive}
                         {:item "Burn in hell!" :class :negative}])

The default classifier saves data to memory and are capable of extracting words from the given text using a porter stemmer. Also, items can be classified as either :positive or :negative. If your problem requires different settings, please take a look at the Extending The Classifier section below.

Classifying Items

After some training you should be able to use the classifier to guess on which class that item falls into:

user=> (.classify classifier "Long time, no see.")
:positive
user=> (.classify classifier "Go to hell.")
:negative

It's also possible to get the probabilities for all classes:

user=> (.probabilities classifier "Long time, no see.")
{:negative 0.38461539149284363, :positive 0.6153846383094788}

Evaluating The Classifier

It's not that trivial to measure how well the classifier is generalizing to examples it doesn't know about. Fortunately, there's a common technique to evaluate an algorithm's performance that is known as Cross-validation.

The output of a K-Fold Cross-validation process is a Confusion Matrix.

user=> (use 'judgr.cross-validation)
nil
user=> (def conf-matrix (k-fold-crossval 2 classifier))
#'user/conf-matrix
user=> conf-matrix
{:positive {:positive 102
            :negative 3}
 :negative {:positive 7
            :negative 186}}

This Confusion Matrix will tell, for each known class, how many items it predicted correctly, and how many items it predicted as being in another class. For example, for all items known as :positive, 102 items were flagged correctly and 3 were flagged incorrectly as :negative.

Although this helps, it would be nice to have ways to calculate a single number score.

Accuracy

The Accuracy is the percentage of predictions that the classifier got correct:

user=> (accuracy conf-matrix)
144/149

In case of low accuracy, there are other calculations that might help you identify what's wrong.

Precision

Precision is a measure of the accuracy provided that a specific class has been predicted:

user=> (precision :positive conf-matrix)
102/109
user=> (precision :negative conf-matrix)
62/63

Recall

Recall is a measure of the ability of a model to select instances of a certain class from a data set. It is commonly also called Sensitivity, and corresponds to the true positive rate:

user=> (recall :positive conf-matrix)
34/35
user=> (recall :negative conf-matrix)
186/193
user=> (sensitivity :negative conf-matrix)
186/193

Specificity

Specificity indicates the ability of a model to identify negative results, that is, the proportion of negative instances predicted as negative:

user=> (specificity :positive conf-matrix)
186/193
user=> (specificity :negative conf-matrix)
34/35

F1 Score

F1 Score is a weighted average of the precision and recall of a given class:

user=> (f1-score :positive conf-matrix)
102/107
user=> (f1-score :negative conf-matrix)
186/191

References

  1. Accurary and precision
  2. Precision and recall

Extending The Classifier

There are several ways to change the way the classifier works.

Supported Classes

Change the [:classes] setting to the classes you want to use. For example, if you are building a spam classifier:

(use 'judgr.settings)

(def my-settings
  (update-settings settings
                   [:classes] [:ham :spam]
                   [:classifier :default :thresholds] {:ham 1.2
                                                       :spam 2.5}))

Note that we also specified thresholds for the new classes.

Feature Extraction

English And Brazilian Portuguese

We provide simple implementations for English (default) and Brazilian Portuguese, based on the work done in Apache Lucene.

Providing Your Own Feature Extractor

The first thing you have to do is create a type that extends the FeatureExtractor protocol:

(ns your-ns
  (:use [judgr.extractor.base]))

(deftype CustomExtractor [settings]
  FeatureExtractor

  (extract-features [fe item]
    ;; Feature extraction logic here
    ))

Finally, define a new method for extractor-from multimethod that knows how to create a new instance of CustomExtractor:

(ns your-ns
  (:use [judgr.core]))

(defmethod extractor-from :custom [settings]
  (CustomExtractor. settings))

To use the new extractor, just create a new settings map with [:extractor :type] setting configured to :custom, the same key used in defmethod:

user=> (use 'judgr.settings)
nil
user=> (def my-settings
         (update-settings settings
                          [:extractor :type] :custom))
#'user/my-settings
user=> (extractor-from my-settings)
#<CustomExtractor ...>

Database Integration

Memory

In-memory integration is enabled by default.

Third-Party Database Support

There are ready-to-use integration packages for other databases:

Providing Your Own Database Layer

The procedure is similar to what was shown in Providing Your Own Feature Extractor section.

First, create a new type that extends the FeatureDB protocol:

(ns your-ns
  (:use [judgr.db.base]))

(deftype CustomDB [settings]
  FeatureDB

  (add-item! [db item class]
    ;; ...
    )

  ;; Implement the other methods
)

Then, define a new method for db-from multimethod that knows how to create a new instance of CustomDB:

(ns your-ns
  (:use [judgr.core]))

(defmethod db-from :custom [settings]
  (CustomDB. settings))

To use the new database layer, just create a new settings map with [:database :type] setting configured to :custom, the same key used in defmethod:

user=> (use 'judgr.settings)
nil
user=> (def my-settings
         (update-settings settings
                          [:database :type] :custom))
#'user/settings
user=> (db-from my-settings)
#<CustomDB ...>

Classifier Implementation

Default Classifier

There's a default classifier implementation that should be enough for most cases since it is already fairly configurable.

Threshold Validation

If threshold validation is enabled, i.e. [:classifier :default :threshold?] setting is true, an item will only be flagged as a class if its probability is at least X times greater than the second highest probability. The threshold for each class can be configured in [:classifier :default :thresholds] setting:

(use 'judgr.settings)

(def my-settings
  (update-settings settings
                   [:classifier :default :threshold?] true
                   [:classifier :default :thresholds] {:positive 1 :negative 2}))

If the probabilities for an item are {:positive 0.45 :negative 0.55}, and their thesholds are 1 and 2, respectively, the item will be flagged with the value defined in [:classifier :default :unknown-class] setting, which is :unknown by default.

Smoothing

Smoothing is enabled by default, and it's useful to deal with unknown features by not returning a flat zero probability.

You can change the [:classifier :default :smoothing-factor] setting to change the smoothing intensity, although the default value is usually good enough:

(use 'judgr.settings)

(def my-settings
  (update-settings settings
                   [:classifier :default :smoothing-factor] 0.7))

Although it's not recommended, you can turn smoothing off by changing the [:classifier :default :smoothing-factor] setting to zero.

Using Unbiased Class Probabilities

By default, class probabilities are calculated in a biased fashion, that is, considering the number of items flagged in each class. For example, considering smoothing is disabled, if there's no item flagged as :negative, the probability P(negative) = 0. Similarly, if there's 3 negative items out of 10, then P(negative) = 3/10.

If the [:classifier :default :unbiased?] setting is configured to true, the probability P(any_class) = 1/(number_of_classes):

(use 'judgr.settings)

(def my-settings
  (update-settings settings
                   [:classifier :default :unbiased?] true))
```

### Providing Your Own Classifier

First, create a new type that extends the `Classifier` protocol:

````clojure

(ns your-ns
  (:use [judgr.classifier.base]))

(deftype CustomClassifier [settings db extractor]
  Classifier

  (train! [c item class]
    ;; ...
    )

  ;; Implement the other methods
)

Then, define a new method for classifier-from multimethod that knows how to create a new instance of CustomClassifier:

(ns your-ns
  (:use [judgr.core]))

(defmethod classifier-from :custom [settings]
  (let [db (db-from settings)
        extractor (extractor-from settings)]
    (CustomClassifier. settings db extractor)))

To use the new classifier, just create a new settings map with [:classifier :type] setting configured to :custom, the same key used in defmethod:

user=> (use 'judgr.settings)
nil
user=> (def my-settings
         (update-settings settings
                          [:classifier :type] :custom))
#'user/settings
user=> (classifier-from my-settings)
#<CustomClassifier ...>

Donate

If this project is useful for you, buy me a beer!

Bitcoin: bc1qtwyfcj7pssk0krn5wyfaca47caar6nk9yyc4mu

License

Copyright (C) Daniel Fernandes Martins

Distributed under the New BSD License. See COPYING for further details.

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].