
Hellisotherpeople / Active-Explainable-Classification

Licence: GPL-3.0 license
A set of tools for leveraging pre-trained embeddings, active learning and model explainability for efficient document classification

Programming Languages: HTML, Python

Projects that are alternatives of or similar to Active-Explainable-Classification

kserve
Serverless Inferencing on Kubernetes
Stars: ✭ 1,621 (+5689.29%)
Mutual labels:  sklearn, model-interpretability
policy-data-analyzer
Building a model to recognize incentives for landscape restoration in environmental policies from Latin America, the US and India. Bringing NLP to the world of policy analysis through an extensible framework that includes scraping, preprocessing, active learning and text analysis pipelines.
Stars: ✭ 22 (-21.43%)
Mutual labels:  document-classification, active-learning
lda2vec
Mixing Dirichlet Topic Models and Word Embeddings to Make lda2vec from this paper https://arxiv.org/abs/1605.02019
Stars: ✭ 27 (-3.57%)
Mutual labels:  sklearn, word-embeddings
human-in-the-loop-machine-learning-tool-tornado
Tornado is a human-in-the-loop machine learning framework that helps you exploit your unlabelled data to train models through a simple and easy to use web interface.
Stars: ✭ 37 (+32.14%)
Mutual labels:  sklearn, active-learning
overview-and-benchmark-of-traditional-and-deep-learning-models-in-text-classification
NLP tutorial
Stars: ✭ 41 (+46.43%)
Mutual labels:  sklearn, word-embeddings
word2vec-on-wikipedia
A pipeline for training word embeddings using word2vec on wikipedia corpus.
Stars: ✭ 68 (+142.86%)
Mutual labels:  word-embeddings
GroupDocs.Classification-for-.NET
GroupDocs.Classification-for-.NET samples and showcase (text and documents classification and sentiment analysis)
Stars: ✭ 38 (+35.71%)
Mutual labels:  document-classification
skutil
NOTE: skutil is now deprecated. See its sister project: https://github.com/tgsmith61591/skoot. Original description: A set of scikit-learn and h2o extension classes (as well as caret classes for python). See more here: https://tgsmith61591.github.io/skutil
Stars: ✭ 29 (+3.57%)
Mutual labels:  sklearn
S-WMD
Code for Supervised Word Mover's Distance (SWMD)
Stars: ✭ 90 (+221.43%)
Mutual labels:  word-embeddings
CS-7641-assignments
CS 7641 - All the code
Stars: ✭ 135 (+382.14%)
Mutual labels:  sklearn
compress-fasttext
Tools for shrinking fastText models (in gensim format)
Stars: ✭ 124 (+342.86%)
Mutual labels:  word-embeddings
Arabic-Word-Embeddings-Word2vec
Arabic Word Embeddings Word2vec
Stars: ✭ 26 (-7.14%)
Mutual labels:  word-embeddings
dasem
Danish Semantic analysis
Stars: ✭ 17 (-39.29%)
Mutual labels:  word-embeddings
emotiw2017
Emotiw2017 code
Stars: ✭ 16 (-42.86%)
Mutual labels:  sklearn
banglabert
This repository contains the official release of the model "BanglaBERT" and associated downstream finetuning code and datasets introduced in the paper titled "BanglaBERT: Language Model Pretraining and Benchmarks for Low-Resource Language Understanding Evaluation in Bangla" accepted in Findings of the Annual Conference of the North American Chap…
Stars: ✭ 186 (+564.29%)
Mutual labels:  document-classification
differential-privacy-bayesian-optimization
This repo contains the underlying code for all the experiments from the paper: "Automatic Discovery of Privacy-Utility Pareto Fronts"
Stars: ✭ 22 (-21.43%)
Mutual labels:  active-learning
sklearn-pmml-model
A library to parse and convert PMML models into Scikit-learn estimators.
Stars: ✭ 71 (+153.57%)
Mutual labels:  sklearn
KMeans elbow
Code for determining optimal number of clusters for K-means algorithm using the 'elbow criterion'
Stars: ✭ 35 (+25%)
Mutual labels:  sklearn
MorphologicalPriorsForWordEmbeddings
Code for EMNLP 2016 paper: Morphological Priors for Probabilistic Word Embeddings
Stars: ✭ 53 (+89.29%)
Mutual labels:  word-embeddings
src
tools for fast reading of docs
Stars: ✭ 40 (+42.86%)
Mutual labels:  active-learning

Active-Explainable-Classification

A set of tools for leveraging active learning and model explainability for efficient document classification

Note: A webapp which implements much of the functionality of this repo (minus the active labeling part, at this time anyway!) can be found here: https://huggingface.co/spaces/Hellisotherpeople/Interpretable_Text_Classification_And_Clustering

What is this?

One component of my vision of FULLY AUTOMATED competitive debate case production.

I want to take in massive numbers of articles from a news API and have each one placed in the file my classifier says it belongs in. That requires labeled data, which I have to generate myself, and that is a problem. Most people don't realize that models which use transfer learning are so sample-efficient that AI-assisted data labeling becomes extremely useful: it can significantly shorten what is ordinarily a painful labeling process.

  1. We need a way to quickly create word-embedding-powered document classifiers which learn with a human in the loop. For some classes, an extremely limited number of examples may be all that is needed to get results a user would consider successful for their task.

  2. I want to know what my model is learning, so I integrate the word embeddings available in Flair, combine them with classifiers from Sklearn and Tensorflow/Keras/PyTorch, and finish it off with a nice squeeze of the LIME algorithm for model interpretability (as implemented in the ELI5 library).
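To make "word-embedding-powered document classifier" concrete, here is a minimal plain-Python sketch. It assumes a document vector is the mean of its word vectors and that classification is a cosine-similarity K-nearest-neighbours vote (the changelog mentions KNN as the default non-Keras classifier). The toy vectors and the functions `embed`, `cosine` and `knn_predict` are hypothetical illustrations, not code from this repo; in the actual project the vectors would come from Flair and the classifier from Sklearn.

```python
import math
from collections import Counter

# Hypothetical 3-dimensional toy word vectors standing in for Flair embeddings.
TOY_VECTORS = {
    "tax": [0.9, 0.1, 0.0],
    "market": [0.8, 0.2, 0.1],
    "growth": [0.7, 0.3, 0.0],
    "carbon": [0.1, 0.9, 0.1],
    "climate": [0.0, 0.8, 0.2],
    "warming": [0.1, 0.9, 0.0],
}


def embed(doc, vectors=TOY_VECTORS):
    """Mean-pool the vectors of known words; unknown words are skipped."""
    known = [vectors[w] for w in doc.lower().split() if w in vectors]
    dim = len(next(iter(vectors.values())))
    if not known:
        return [0.0] * dim
    return [sum(col) / len(known) for col in zip(*known)]


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def knn_predict(doc, train, k=3):
    """Majority vote among the k labeled documents most similar to doc."""
    query = embed(doc)
    ranked = sorted(train, key=lambda item: cosine(query, embed(item[0])), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Two labeled examples per class: a tiny, invented training set.
train = [
    ("tax market", "ECON"),
    ("growth market", "ECON"),
    ("carbon warming", "ENV"),
    ("climate carbon", "ENV"),
]
print(knn_predict("climate warming", train, k=3))  # ENV
```

With `k=3` the two ENV documents outvote the closest ECON one, so the query is labeled `ENV`. A real setup would cache the training embeddings instead of recomputing them on every call.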

TODO:

  • Integrate uncertainty measurements and only prompt the user to label uncertain examples (self-label the ones it is certain about)
  • Finish README: cite relevant technologies and papers
  • Documentation/Examples/Installation Instructions
  • More examples
  • Enable Cross Validation and Grid Search
  • Figure out a better way to store embeddings (stop inefficiently moving the embeddings from GPU to CPU)
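The first TODO item describes uncertainty sampling. A minimal plain-Python sketch, assuming the model returns a probability per label: documents whose top probability reaches a confidence threshold are self-labeled, and the rest are queued for a human, most uncertain (highest entropy) first. The 0.9 threshold and the names `entropy` and `split_by_uncertainty` are hypothetical choices, not part of the repo.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def split_by_uncertainty(predictions, confidence_threshold=0.9):
    """Split (doc, {label: prob}) pairs into self-labeled docs and docs to query.

    A document is self-labeled when its top class probability reaches the
    (hypothetical) threshold; everything else goes to the human, sorted so
    the highest-entropy documents are asked about first.
    """
    auto_labeled, to_query = [], []
    for doc, probs in predictions:
        label, top = max(probs.items(), key=lambda kv: kv[1])
        if top >= confidence_threshold:
            auto_labeled.append((doc, label))
        else:
            to_query.append((doc, probs))
    to_query.sort(key=lambda item: entropy(item[1].values()), reverse=True)
    return auto_labeled, [doc for doc, _ in to_query]


preds = [
    ("article a", {"ECON": 0.95, "ENV": 0.05}),
    ("article b", {"ECON": 0.60, "ENV": 0.40}),
    ("article c", {"ECON": 0.50, "ENV": 0.50}),
]
auto, query = split_by_uncertainty(preds)
print(auto)   # [('article a', 'ECON')]
print(query)  # ['article c', 'article b']
```

Entropy is only one possible uncertainty measure; least-confidence (1 minus the top probability) or the margin between the top two classes would slot into the same place.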

Changelog:

8/12/2019 -

  • Got Keras RNN/CNN working!
  • Now prints out misclassified examples in the validation set, along with their predicted probabilities.
  • Easy to switch between MLP/CNN/RNN
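Listing misclassified validation examples together with their probabilities could look roughly like the sketch below. It assumes each row of `probas` lists class probabilities in the same order as `labels`, which is how scikit-learn's `predict_proba` orders its columns (by the fitted model's `classes_`). `misclassified_report` and the toy data are hypothetical, not the repo's code.

```python
def misclassified_report(docs, y_true, probas, labels):
    """Return (doc, true label, predicted label, its probability) for each error."""
    errors = []
    for doc, truth, probs in zip(docs, y_true, probas):
        best = max(range(len(labels)), key=lambda i: probs[i])  # argmax column
        if labels[best] != truth:
            errors.append((doc, truth, labels[best], probs[best]))
    return errors


labels = ["ECON", "ENV"]
docs = ["doc one", "doc two", "doc three"]
y_true = ["ECON", "ENV", "ENV"]
probas = [[0.8, 0.2], [0.7, 0.3], [0.1, 0.9]]

for doc, truth, pred, p in misclassified_report(docs, y_true, probas, labels):
    print(f"{doc!r}: true={truth} predicted={pred} (p={p:.2f})")
# 'doc two': true=ENV predicted=ECON (p=0.70)
```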

8/9/2019 -

  • Added model, model weights, updated dataset, and misc code updates
  • Tried to get Keras LSTM/CNN to work (failed)
  • Experimented with different settings on LIME.

8/8/2019 -

  • Added Keras model support: KNN is now used by default when not in Keras mode, and a neural network when in Keras mode.
  • Added HTML export of model explanations (thank you, ELI5!)
  • Tested the possibility of doing Multilabel classification with TextExplainer... doesn't seem to work :(
  • Added pictures
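ELI5's `TextExplainer` implements LIME, which perturbs a document by randomly dropping words and fits a locally weighted white-box model to the classifier's outputs. The sketch below swaps in a much simpler technique, leave-one-word-out occlusion, only to show the underlying idea of attributing a prediction to individual words. It is not LIME and not ELI5's API. The keyword-based `toy_predict_proba` is a hypothetical stand-in for a trained classifier.

```python
ENV_WORDS = {"carbon", "climate", "warming"}  # hypothetical keyword "model"


def toy_predict_proba(doc):
    """Laplace-smoothed share of ENV keywords, returned as {label: prob}."""
    words = doc.lower().split()
    hits = sum(w in ENV_WORDS for w in words)
    p_env = (hits + 1) / (len(words) + 2)
    return {"ENV": p_env, "ECON": 1 - p_env}


def occlusion_weights(doc, predict_proba, label):
    """Score each word by how much deleting it lowers P(label | doc).

    Positive weight: the word supports `label`. Negative: it works against it.
    Each position is scored separately, so repeated words appear repeatedly.
    """
    words = doc.split()
    base = predict_proba(doc)[label]
    scored = []
    for i, word in enumerate(words):
        reduced = " ".join(words[:i] + words[i + 1:])
        scored.append((word, base - predict_proba(reduced)[label]))
    return sorted(scored, key=lambda kv: kv[1], reverse=True)


for word, weight in occlusion_weights("carbon tax policy", toy_predict_proba, "ENV"):
    print(f"{word:>8} {weight:+.2f}")
#   carbon +0.15
#      tax -0.10
#   policy -0.10
```

LIME's random multi-word perturbations capture interactions between words that single deletions miss, which is one reason the repo relies on ELI5 rather than something this simple.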

Examples

Toy example of a possible debate classifier separating 11 classes

ANB = Antiblackness, CAP = Capitalism, ECON = Economy, EDU = Education, ENV = Environment, EX = Extinction, FED = Federalism, HEG = Hegemony, NAT = Natives, POL = Politics, TOP = Topicality

The top matrix is a confusion matrix for my validation set.

This classifier gets 75% accuracy (~150 examples in the train set, 0.2 × 150 = 30 in the validation set).

The bottom matrix shows classification probabilities for each individual example in my validation set.
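Accuracy can be read straight off a confusion matrix: it is the diagonal (correct predictions) divided by the total. A minimal plain-Python sketch using the 11 class abbreviations listed above; rows are true labels and columns are predicted labels, the same convention as scikit-learn's `confusion_matrix`. The five-example data is invented for illustration and is not the project's validation set.

```python
LABELS = ["ANB", "CAP", "ECON", "EDU", "ENV", "EX", "FED", "HEG", "NAT", "POL", "TOP"]


def confusion_matrix(y_true, y_pred, labels=LABELS):
    """Count matrix: matrix[i][j] = examples with true label i predicted as j."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for truth, pred in zip(y_true, y_pred):
        matrix[index[truth]][index[pred]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of examples on the diagonal (0.0 for an empty matrix)."""
    total = sum(map(sum, matrix))
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


# Invented toy predictions: one ENV card is mistaken for ECON.
y_true = ["ECON", "ENV", "ENV", "TOP", "HEG"]
y_pred = ["ECON", "ENV", "ECON", "TOP", "HEG"]
m = confusion_matrix(y_true, y_pred)
print(accuracy(m))  # 0.8
```

Off-diagonal cells show which classes get confused with each other, which is what makes the matrix more informative than the single accuracy number.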

Takes in documents from the user via standard input. The model then classifies each document, explains why it classified it the way it did, and asks the user whether the predicted label is the ground truth. The user supplies the ground truth, the model incrementally trains on the new example, that example (with its human-supplied label) is appended to my dataset, and the cycle continues. This is called active learning.
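The labeling cycle above can be sketched as a small loop. This is a minimal sketch under stated assumptions: `model` is any object with `predict(doc)` and `partial_fit(doc, label)` methods (a hypothetical single-document interface loosely named after scikit-learn's incremental `partial_fit`), the explanation step is omitted, and `MajorityModel` is a toy stand-in rather than a real classifier. `ask_on_stdin` shows how the human could be prompted interactively; the usage example passes a scripted answer function instead so it runs without a terminal.

```python
from collections import Counter


def ask_on_stdin(doc, predicted):
    """Prompt the user; pressing Enter accepts the predicted label."""
    answer = input(f"{doc!r}\nPredicted {predicted}. Enter to accept, or type the true label: ")
    return answer.strip() or predicted


def active_learning_loop(stream, model, dataset, ask_human=ask_on_stdin):
    """Predict, confirm with a human, learn incrementally, and store each example."""
    for doc in stream:
        predicted = model.predict(doc)
        truth = ask_human(doc, predicted)
        model.partial_fit(doc, truth)  # incremental update on the single new example
        dataset.append((doc, truth))   # the human-labeled example grows the dataset
    return dataset


class MajorityModel:
    """Toy model that predicts the most frequent label seen so far."""

    def __init__(self, default="TOP"):
        self.counts = Counter()
        self.default = default

    def predict(self, doc):
        return self.counts.most_common(1)[0][0] if self.counts else self.default

    def partial_fit(self, doc, label):
        self.counts[label] += 1


model = MajorityModel()
scripted = {"card a": "ENV", "card b": "ECON", "card c": "ENV"}
data = active_learning_loop(scripted, model, [], lambda doc, pred: scripted[doc])
print(data)  # [('card a', 'ENV'), ('card b', 'ECON'), ('card c', 'ENV')]
```

Swapping the lambda for `ask_on_stdin` gives the interactive version.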
