
coreygirard / classy

Licence: other
Super simple text classifier using Naive Bayes. Plug-and-play, no dependencies



classy


What

Easy lightweight text classification via Naive Bayes. Give it a small set of example data, and it will classify similar inputs with blazing speed and pretty good accuracy.

Why

Making a basic chatbot? Want to do basic auto-suggestion of misspelled commands? Don't want to bust out TensorFlow or scikit-learn? Classy's got you covered.

How

Classify some data (probably by hand) into a dict:

data = {'lights':['Could you turn my lights off?',
                  'Turn my lights off',
                  'Are my lights off?',
                  'All lights off, please',
                  'Turn some lights on',
                  'Which bulbs are on?'],
        'alarm': ['Set an alarm for tomorrow at 6:00',
                  'What time is my alarm?',
                  'When will I wake up tomorrow?',
                  'What time is wakeup tomorrow?']}

Create the Classifier object:

import classy
c = classy.Classifier(data)

To classify text, simply use .classify:

c.classify('Which of my lights are off?')
{'lights': 0.9981515711645101, 'alarm': 0.0018484288354898338}
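
For readers curious how per-label probabilities like these can arise, here is a minimal sketch of a multinomial Naive Bayes classifier with add-one smoothing. It is not classy's actual implementation; the names tokenize and naive_bayes_probabilities are hypothetical, and the exact numbers it produces will likely differ from the output above.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase, drop everything except alphanumerics and spaces, split on spaces
    text = re.sub(r'[^a-z0-9 ]', '', text.lower())
    return [tok for tok in text.split(' ') if tok != '']


def naive_bayes_probabilities(data, text):
    # Hypothetical helper, not part of classy: count token occurrences per label
    counts = {label: Counter() for label in data}
    for label, examples in data.items():
        for example in examples:
            counts[label].update(tokenize(example))
    vocab_size = len({tok for c in counts.values() for tok in c})
    total_examples = sum(len(examples) for examples in data.values())

    log_scores = {}
    for label, examples in data.items():
        # Log prior: share of training examples carrying this label
        score = math.log(len(examples) / total_examples)
        label_tokens = sum(counts[label].values())
        for tok in tokenize(text):
            # Add-one smoothing keeps unseen tokens from zeroing the score
            score += math.log((counts[label][tok] + 1) / (label_tokens + vocab_size))
        log_scores[label] = score

    # Convert log scores to probabilities that sum to 1
    top = max(log_scores.values())
    weights = {label: math.exp(s - top) for label, s in log_scores.items()}
    norm = sum(weights.values())
    return {label: w / norm for label, w in weights.items()}
```

Calling naive_bayes_probabilities(data, 'Which of my lights are off?') with the dict above returns a dict keyed by 'lights' and 'alarm', with 'lights' receiving the larger share.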


Advanced

If you want a single label rather than a full dict of probabilities, set the .threshold property. The .classify() method will then return the label of the matched class, or None if all probabilities are below the threshold (i.e., if the classifier is uncertain). Behavior for .threshold values <= 0.5 is undefined.

c = classy.Classifier(data)
c.threshold = 0.9
c.classify('Which of my lights are off?')
'lights'
c.classify("Some words we've never seen before")
None
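
The thresholding rule described above can be sketched as a standalone function over a dict of probabilities. This is only an illustration of the stated behavior, not classy's source; pick_label is a hypothetical name, and treating a probability exactly equal to the threshold as a match is an assumption.

```python
def pick_label(probabilities, threshold=0.9):
    """Return the most probable label, or None if it falls below threshold.

    Hypothetical helper illustrating the documented behavior; not part of classy.
    """
    if not probabilities:
        return None
    # Find the label with the highest probability
    label, p = max(probabilities.items(), key=lambda kv: kv[1])
    # Assumption: a probability equal to the threshold counts as a match
    return label if p >= threshold else None
```

One plausible reason values <= 0.5 are left undefined: when probabilities sum to 1, at most one label can exceed 0.5, so a higher threshold always yields an unambiguous answer, while a lower one could let two labels qualify at once.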

Classy by default performs minimal preprocessing of incoming text, equivalent to:

import re

def parse(text):
    # makes all uppercase characters lowercase
    text = text.lower()

    # removes all except alphanumerics and spaces
    text = re.sub(r'[^a-z0-9 ]',r'',text)

    # splits by spaces, and discards all empty strings
    return [i for i in text.split(' ') if i != '']

If you wish to supply a custom string parsing function, simply provide it as the f argument when creating a Classifier object:

def newParse(t):
    return [i for i in t.split(',') if i != '']

c = classy.Classifier(data,f=newParse)
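
As a further illustration, a custom parser could emit adjacent word pairs alongside single words, so that phrases such as "lights off" become features in their own right. This is a sketch under the assumption that f only needs to take a string and return a list of tokens, as in the examples above; parse_with_bigrams is a hypothetical name.

```python
import re


def parse_with_bigrams(text):
    # Same cleanup as the default parser: lowercase, keep alphanumerics and spaces
    text = re.sub(r'[^a-z0-9 ]', '', text.lower())
    words = [w for w in text.split(' ') if w != '']

    # Pair each word with its successor, e.g. 'lights off'
    bigrams = [a + ' ' + b for a, b in zip(words, words[1:])]
    return words + bigrams


# Assumed usage, mirroring the call shown above:
# c = classy.Classifier(data, f=parse_with_bigrams)
```

Whether bigrams help depends on the data; with very small training sets they may just add sparse features.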

Tricks

Want a bare-bones spellchecker over a (very) limited set of inputs?

def allSubsets(text):
    temp = []
    for a in range(len(text)):
        for b in range(a,len(text)):
            temp.append(text[a:b+1])
    return temp

data = {'push':['push'],
        'commit':['commit'],
        'pull':['pull'],
        'diff':['diff']}

c = classy.Classifier(data,f=allSubsets,threshold=0.6)

c.classify('commot')
c.classify('cpmmot')
c.classify('pulll')
c.classify('diffg')
'commit'
'commit'
'pull'
'diff'

The allSubsets function enumerates all substrings of its input, which does a reasonably decent job of spellchecking when paired with Naive Bayes. Treat this usage as a cool trick or a temporary hackjob: there are many better ways to do this, for example Levenshtein distance.
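
For comparison, here is a minimal sketch of the Levenshtein-distance approach mentioned above, independent of classy. The function names and the max_distance cutoff of 2 are illustrative assumptions.

```python
def levenshtein(a, b):
    # Dynamic-programming edit distance, keeping only the previous row
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def closest_command(word, commands, max_distance=2):
    # Pick the command with the fewest edits; give up if it is too far away
    best = min(commands, key=lambda c: levenshtein(word, c))
    return best if levenshtein(word, best) <= max_distance else None
```

With commands = ['push', 'commit', 'pull', 'diff'], closest_command('commot', commands) returns 'commit' and closest_command('pulll', commands) returns 'pull'; an input sharing no letters with any command, such as 'xyzzy', returns None.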
