Hellisotherpeople / Active-Explainable-Classification

Licence: GPL-3.0 license

A set of tools for leveraging pre-trained embeddings, active learning and model explainability for efficient document classification

Active-Explainable-Classification

A set of tools for leveraging active learning and model explainability for efficient document classification

Note: A webapp which implements much of the functionality of this repo (minus the active labeling part, at this time anyway!) can be found here: https://huggingface.co/spaces/Hellisotherpeople/Interpretable_Text_Classification_And_Clustering

What is this?

One component of my vision of FULLY AUTOMATED competitive debate case production.

I want to take in massive numbers of articles from a news API and place each one in the file my classifier assigns it to. That requires generating my own labeled data, which is a problem. Most people don't realize that models which use transfer learning are so sample efficient that AI-assisted data labeling is extremely useful and can significantly shorten what is ordinarily a painful labeling process.

We need a way to quickly create word-embedding-powered document classifiers that learn with a human in the loop. For some classes, an extremely limited number of examples may be all that is necessary to get results that a user would consider successful for their task.

I want to know what my model is learning, so I integrate the word embeddings available in Flair, combine them with classifiers from Sklearn and Tensorflow/Keras/PyTorch, and finish it off with a nice squeeze of the LIME algorithm for model interpretability (implemented within the ELI5 library).
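To make the interpretability idea concrete, here is a minimal sketch of perturbation-based word importance. It is not LIME and not this repo's code: it uses plain leave-one-word-out occlusion, a much simpler relative of LIME, and `prob_econ` with its `TOY_WEIGHTS` is a hypothetical stand-in for a real embedding-plus-classifier model.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical keyword weights standing in for a trained classifier.
TOY_WEIGHTS: Dict[str, float] = {
    "market": 2.0,
    "inflation": 1.5,
    "school": -2.0,
    "teacher": -1.5,
}


def prob_econ(text: str) -> float:
    """Toy model: probability that a text is ECON rather than EDU."""
    score = sum(TOY_WEIGHTS.get(tok, 0.0) for tok in text.lower().split())
    return 1.0 / (1.0 + math.exp(-score))


def occlusion_importance(
    text: str, predict: Callable[[str], float]
) -> List[Tuple[str, float]]:
    """Score each word by how much the prediction drops when it is removed."""
    tokens = text.split()
    base = predict(text)
    scores = []
    for i, tok in enumerate(tokens):
        reduced = " ".join(tokens[:i] + tokens[i + 1:])
        scores.append((tok, base - predict(reduced)))
    # Largest absolute change first; ties keep their original word order.
    return sorted(scores, key=lambda pair: abs(pair[1]), reverse=True)


importance = occlusion_importance("the market fears inflation", prob_econ)
for word, delta in importance:
    print(f"{word:>10s}  {delta:+.3f}")
```

Words the toy model ignores get a score of zero, while the words that drive the prediction rise to the top. LIME goes further by sampling many perturbed texts and fitting a local surrogate model to them.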

TODO:

Integrate uncertainty measurements and only prompt for labels on uncertain examples (self-label what the model is certain about)
Finish README - cite relevant technologies and papers
Documentation/Examples/Installation Instructions
More examples
Enable cross validation and grid search
Figure out a better way to store embeddings (stop moving the embeddings from GPU to CPU inefficiently)
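The first TODO item (prompt only on uncertain examples, self-label the rest) could be sketched roughly as follows. This is an assumption about one way to do it, not something the repo implements: it gates on the entropy of the predicted class probabilities, and the `max_entropy` threshold and the sample predictions are made up for illustration.

```python
import math
from typing import Dict, List, Tuple


def entropy(probs: Dict[str, float]) -> float:
    """Shannon entropy (in nats) of a class-probability distribution."""
    return -sum(p * math.log(p) for p in probs.values() if p > 0)


def split_by_certainty(
    predictions: List[Tuple[str, Dict[str, float]]],
    max_entropy: float = 0.5,  # hypothetical threshold, tune per task
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Self-label confident predictions; queue the rest for a human."""
    auto_labeled: List[Tuple[str, str]] = []
    needs_human: List[str] = []
    for doc, probs in predictions:
        if entropy(probs) <= max_entropy:
            # Confident enough: accept the most probable label.
            auto_labeled.append((doc, max(probs, key=probs.get)))
        else:
            needs_human.append(doc)
    return auto_labeled, needs_human


# Made-up predictions for two documents.
predictions = [
    ("doc a", {"ECON": 0.95, "EDU": 0.05}),
    ("doc b", {"ECON": 0.55, "EDU": 0.45}),
]
auto_labeled, needs_human = split_by_certainty(predictions)
print(auto_labeled)
print(needs_human)
```

Entropy is only one possible uncertainty measure. The margin between the top two probabilities, or simply the maximum probability, would slot into the same structure.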

Changelog:

8/12/2019 -

Got Keras RNN/CNN working!
Now prints misclassified examples from the validation set along with their predicted probabilities.
Easy to switch between MLP/CNN/RNN

8/9/2019 -

Added model, model weights, updated dataset, and misc code updates
Tried to get Keras LSTM/CNN to work (failed)
Experimented with different settings on LIME.

8/8/2019 -

Added Keras model support - now uses KNN by default when not in Keras mode and a neural network when in Keras mode.
Added HTML export of model explanations. Thank you, ELI5!
Tested the possibility of doing multilabel classification with TextExplainer... doesn't seem to work :(
Added pictures

Examples

Toy example of a possible debate classifier separating 11 classes

ANB = Antiblackness, CAP = Capitalism, ECON = Economy, EDU = Education, ENV = Environment, EX = Extinction, FED = Federalism, HEG = Hegemony, NAT = Natives, POL = Politics, TOP = Topicality

The top matrix is a confusion matrix of my validation set.

This classifier gets 75% accuracy (~150 examples in the train set, 0.2 * 150 = 30 in the validation set).

The bottom matrix shows the classification probabilities for each individual example in my validation set.
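For readers unfamiliar with the top matrix, here is a small standard-library sketch of how a confusion matrix and accuracy are computed from true and predicted labels. The four labelled examples are invented for illustration and are not the repo's validation data.

```python
from collections import Counter
from typing import List, Sequence


def confusion_matrix(
    y_true: Sequence[str], y_pred: Sequence[str], labels: Sequence[str]
) -> List[List[int]]:
    """Rows are true labels, columns are predicted labels."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of examples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Invented example using three of the class abbreviations above.
labels = ["CAP", "ECON", "ENV"]
y_true = ["CAP", "CAP", "ECON", "ENV"]
y_pred = ["CAP", "ECON", "ECON", "ENV"]

matrix = confusion_matrix(y_true, y_pred, labels)
for label, row in zip(labels, matrix):
    print(f"{label:>5s} {row}")
print("accuracy:", accuracy(y_true, y_pred))
```

Off-diagonal cells show which classes get confused with each other. Here, one CAP document was predicted as ECON.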

Takes in documents from the user via standard input. The model then classifies each document, explains why it classified it the way it did, and asks the user whether the predicted label is the ground truth. The user supplies the ground truth, the model incrementally trains on the new example, and that example (with its human-supplied label) is appended to my dataset. The cycle then continues. This is called active learning.
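The cycle described above can be sketched as a short loop. This is an illustrative simplification, not the repo's implementation: `WordCountClassifier` is a hypothetical toy model that stands in for the embedding-based classifiers, and `ask_user` is a callback used in place of reading standard input so the example runs unattended.

```python
from collections import Counter, defaultdict
from typing import Callable, Iterable, List, Optional, Tuple


class WordCountClassifier:
    """Toy incremental model: one running word-count table per label."""

    def __init__(self) -> None:
        self.word_counts = defaultdict(Counter)

    def partial_fit(self, text: str, label: str) -> None:
        """Incrementally train on a single labelled example."""
        self.word_counts[label].update(text.lower().split())

    def predict(self, text: str) -> Optional[str]:
        """Return the label whose vocabulary overlaps the text the most."""
        if not self.word_counts:
            return None
        tokens = text.lower().split()

        def score(label: str) -> float:
            counts = self.word_counts[label]
            total = sum(counts.values()) or 1
            return sum(counts[t] for t in tokens) / total

        return max(self.word_counts, key=score)


def active_learning_loop(
    docs: Iterable[str],
    model: WordCountClassifier,
    ask_user: Callable[[str, Optional[str]], str],
    dataset: List[Tuple[str, str]],
) -> List[Tuple[str, str]]:
    for doc in docs:
        guess = model.predict(doc)      # 1. model classifies
        truth = ask_user(doc, guess)    # 2. human confirms or corrects
        model.partial_fit(doc, truth)   # 3. incremental training
        dataset.append((doc, truth))    # 4. grow the labelled dataset
    return dataset


model = WordCountClassifier()
model.partial_fit("market inflation prices", "ECON")
model.partial_fit("school teacher classroom", "EDU")

# A scripted "user" that knows the right answers, for demonstration only.
answers = {"stock market crash": "ECON", "teacher strike at school": "EDU"}
dataset = active_learning_loop(
    answers.keys(), model, lambda doc, guess: answers[doc], dataset=[]
)
print(dataset)
print(model.predict("classroom strike"))
```

In an interactive session, `ask_user` would print the prediction and its explanation, then read the corrected label with `input()`.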
