MartinThoma / clana

Licence: MIT License
CLANA is a toolkit for classifier analysis.

clana

clana is a library and command line application to visualize confusion matrices of classifiers with lots of classes. The two key contributions of clana are Confusion Matrix Ordering (CMO), as explained in chapter 5 of Analysis and Optimization of Convolutional Neural Network Architectures, and an optimization algorithm to achieve it. The CMO technique can be applied to any multi-class classifier and helps to understand which groups of classes are most similar.
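To make the idea of an ordering concrete, here is a minimal sketch of how a class order could be scored. It assumes the objective is the sum of all confusion matrix entries weighted by their distance from the diagonal; that objective, the helper name `ordering_score` and the toy matrix are assumptions for illustration, not necessarily what clana computes internally.

```python
import numpy as np


def ordering_score(cm):
    """Assumed objective: each entry weighted by its distance |i - j| from the diagonal."""
    cm = np.asarray(cm)
    idx = np.arange(cm.shape[0])
    weights = np.abs(idx[:, None] - idx[None, :])
    return int((cm * weights).sum())


# Toy 4-class confusion matrix: classes 0 and 3 are frequently confused.
cm = np.array([
    [90, 1, 1, 8],
    [1, 95, 3, 1],
    [1, 3, 95, 1],
    [8, 1, 1, 90],
])

# Hand-picked order that places classes 0 and 3 next to each other.
order = [1, 2, 0, 3]
reordered = cm[np.ix_(order, order)]  # permute rows and columns together

score_before = ordering_score(cm)        # 66
score_after = ordering_score(reordered)  # 38
```

Under this assumed score, a lower value means that confusions sit closer to the diagonal, so groups of similar classes show up as blocks in the plot.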

Installation

The recommended way to install clana is:

$ pip install clana --user --upgrade

If you want the latest version:

$ git clone https://github.com/MartinThoma/clana.git; cd clana
$ pip install -e . --user

Usage

$ clana --help
Usage: clana [OPTIONS] COMMAND [ARGS]...

  Clana is a toolkit for classifier analysis.

  See https://arxiv.org/abs/1707.09725, Chapter 4.

Options:
  --version  Show the version and exit.
  --help     Show this message and exit.

Commands:
  distribution   Get the distribution of classes in a dataset.
  get-cm         Generate a confusion matrix from predictions and ground...
  get-cm-simple  Generate a confusion matrix.
  visualize      Optimize and visualize a confusion matrix.

The visualize command gives you images like this:

Confusion Matrix after Confusion Matrix Ordering of the WiLI-2018 dataset
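The "optimize" half of visualize searches for a class order that moves confusions towards the diagonal. The sketch below shows the general shape of such a search with a greedy random-swap loop. It is not clana's simulated annealing: `ordering_score`, `random_swap_search` and the distance-weighted objective are hypothetical names and assumptions used only for illustration.

```python
import numpy as np


def ordering_score(cm):
    """Assumed objective: entries weighted by distance |i - j| from the diagonal."""
    idx = np.arange(cm.shape[0])
    return int((cm * np.abs(idx[:, None] - idx[None, :])).sum())


def random_swap_search(cm, steps=500, seed=0):
    """Greedy random-swap search for a class order with a lower score.

    Hypothetical helper for illustration; not clana's implementation.
    """
    cm = np.asarray(cm)
    rng = np.random.default_rng(seed)
    n = cm.shape[0]
    perm = np.arange(n)
    best = ordering_score(cm)
    for _ in range(steps):
        i, j = rng.choice(n, size=2, replace=False)
        candidate = perm.copy()
        candidate[[i, j]] = candidate[[j, i]]  # swap two classes
        score = ordering_score(cm[np.ix_(candidate, candidate)])
        if score < best:  # keep only improvements
            perm, best = candidate, score
    return perm, best


# Toy matrix: class 0 is confused with class 2, class 1 with class 3.
cm = np.array([
    [50, 0, 9, 0],
    [0, 50, 0, 8],
    [9, 0, 50, 0],
    [0, 8, 0, 50],
])
start_score = ordering_score(cm)
perm, best_score = random_swap_search(cm)
```

A greedy loop like this can get stuck in a local minimum. Simulated annealing differs in that it occasionally accepts a worse order, which helps it escape such minima.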

MNIST example

$ cd docs/
$ python mnist_example.py  # creates `train-pred.csv` and `test-pred.csv`
$ clana get-cm --gt gt-train.csv  --predictions train-pred.csv --n 10
2019-09-14 09:47:30,655 - root - INFO - cm was written to 'cm.json'
$ clana visualize --cm cm.json --zero_diagonal
Score: 13475
2019-09-14 09:49:41,593 - root - INFO - n=10
2019-09-14 09:49:41,593 - root - INFO - ## Starting Score: 13475.00
2019-09-14 09:49:41,594 - root - INFO - Current: 13060.00 (best: 13060.00, hot_prob_thresh=100.0000%, step=0, swap=False)
[...]
2019-09-14 09:49:41,606 - root - INFO - Current: 9339.00 (best: 9339.00, hot_prob_thresh=100.0000%, step=238, swap=False)
Score: 9339
Perm: [0, 6, 5, 8, 3, 2, 1, 7, 9, 4]
2019-09-14 09:49:41,639 - root - INFO - Classes: [0, 6, 5, 8, 3, 2, 1, 7, 9, 4]
Accuracy: 93.99%
2019-09-14 09:49:41,725 - root - INFO - Save figure at '/home/moose/confusion_matrix.tmp.pdf'
2019-09-14 09:49:41,876 - root - INFO - Found threshold for local connection: 398
2019-09-14 09:49:41,876 - root - INFO - Found 9 clusters
2019-09-14 09:49:41,877 - root - INFO - silhouette_score=-0.012313948323292875
    1: [0]
    1: [6]
    1: [5]
    1: [8]
    1: [3]
    1: [2]
    1: [1]
    2: [7, 9]
    1: [4]

This gives a plot of the reordered confusion matrix, saved as a PDF at the path shown in the log above.

Label Manipulation

Prepare a labels.csv file, which must have a header row, and pass it with --labels:

$ clana visualize --cm cm.json --zero_diagonal --labels mnist/labels.csv

Data distribution

$ clana distribution --gt gt.csv --labels labels.csv [--out out/] [--long]

prints one line per label, e.g.

60% cat (56789 elements)
20% dog (12345 elements)
 5% mouse (1337 elements)
 1% tux (314 elements)

If --out is specified, it creates a horizontal bar chart. The first bar is the most common class, the second bar is the second most common class, and so on.

It uses the short labels unless --long is added to the command.
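For intuition, the printed lines above can be reproduced from a plain list of ground-truth labels. This is a sketch of the counting and formatting only, not clana's code: `distribution_lines` is a hypothetical helper, and the input is a Python list rather than the gt.csv and labels.csv files that the command reads.

```python
from collections import Counter


def distribution_lines(ground_truth):
    """Return one formatted line per label, most common label first."""
    counts = Counter(ground_truth)
    total = sum(counts.values())
    lines = []
    for label, count in counts.most_common():
        percent = round(100 * count / total)
        # Right-align the percentage so that single-digit values line up.
        lines.append(f"{percent:>2}% {label} ({count} elements)")
    return lines


sample = ["cat"] * 12 + ["dog"] * 7 + ["mouse"] * 1
for line in distribution_lines(sample):
    print(line)
# 60% cat (12 elements)
# 35% dog (7 elements)
#  5% mouse (1 elements)
```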

Visualizations

See visualizations

Usage as a library

>>> import numpy as np
>>> arr = np.array([[9, 4, 7, 3, 8, 5, 2, 8, 7, 6],
...                 [4, 9, 2, 8, 5, 8, 7, 3, 6, 7],
...                 [7, 2, 9, 1, 6, 3, 0, 8, 5, 4],
...                 [3, 8, 1, 9, 4, 7, 8, 2, 5, 6],
...                 [8, 5, 6, 4, 9, 6, 3, 7, 8, 7],
...                 [5, 8, 3, 7, 6, 9, 6, 4, 7, 8],
...                 [2, 7, 0, 8, 3, 6, 9, 1, 4, 5],
...                 [8, 3, 8, 2, 7, 4, 1, 9, 6, 5],
...                 [7, 6, 5, 5, 8, 7, 4, 6, 9, 8],
...                 [6, 7, 4, 6, 7, 8, 5, 5, 8, 9]])
>>> from clana.optimize import simulated_annealing
>>> result = simulated_annealing(arr)
>>> result.cm
array([[9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
       [8, 9, 8, 7, 6, 5, 4, 3, 2, 1],
       [7, 8, 9, 8, 7, 6, 5, 4, 3, 2],
       [6, 7, 8, 9, 8, 7, 6, 5, 4, 3],
       [5, 6, 7, 8, 9, 8, 7, 6, 5, 4],
       [4, 5, 6, 7, 8, 9, 8, 7, 6, 5],
       [3, 4, 5, 6, 7, 8, 9, 8, 7, 6],
       [2, 3, 4, 5, 6, 7, 8, 9, 8, 7],
       [1, 2, 3, 4, 5, 6, 7, 8, 9, 8],
       [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
>>> result.perm
array([2, 7, 0, 4, 8, 9, 5, 1, 3, 6])

You can visualize the result.cm and use the result.perm to get your labels in the same order:

# Just some example labels
# ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10']
>>> labels = [str(el) for el in range(11)]
>>> np.array(labels)[result.perm]
array(['2', '7', '0', '4', '8', '9', '5', '1', '3', '6'], dtype='<U2')
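
The relationship between arr, result.perm and result.cm in the example above can be checked with plain NumPy, without calling clana: applying the permutation to both rows and columns of arr reproduces the banded matrix shown as result.cm. The perm values below are copied from the output above; the variable names are just for illustration.

```python
import numpy as np

arr = np.array([[9, 4, 7, 3, 8, 5, 2, 8, 7, 6],
                [4, 9, 2, 8, 5, 8, 7, 3, 6, 7],
                [7, 2, 9, 1, 6, 3, 0, 8, 5, 4],
                [3, 8, 1, 9, 4, 7, 8, 2, 5, 6],
                [8, 5, 6, 4, 9, 6, 3, 7, 8, 7],
                [5, 8, 3, 7, 6, 9, 6, 4, 7, 8],
                [2, 7, 0, 8, 3, 6, 9, 1, 4, 5],
                [8, 3, 8, 2, 7, 4, 1, 9, 6, 5],
                [7, 6, 5, 5, 8, 7, 4, 6, 9, 8],
                [6, 7, 4, 6, 7, 8, 5, 5, 8, 9]])

# result.perm as printed in the example above
perm = np.array([2, 7, 0, 4, 8, 9, 5, 1, 3, 6])

# Reorder rows and columns with the same permutation.
reordered = arr[np.ix_(perm, perm)]

# The reordered matrix is banded: entry (i, j) equals 9 - |i - j|.
idx = np.arange(arr.shape[0])
banded = 9 - np.abs(idx[:, None] - idx[None, :])
matches = np.array_equal(reordered, banded)  # True

# The same permutation puts the labels in plotting order.
labels = [str(el) for el in range(arr.shape[0])]
ordered_labels = [labels[i] for i in perm]
```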