justinbois / Altair Catplot

License: MIT
Utility to generate plots with categorical variables using Altair.

Projects that are alternatives of or similar to Altair Catplot

Mnist 1stclass on aznotebook
For those who want to try using a GPU and doing image recognition on Microsoft Azure Notebooks (using the MNIST dataset as an example)
Stars: ✭ 15 (-25%)
Mutual labels:  jupyter-notebook
Facealignmentcompare
Empirical Study of Recent Face Alignment Methods
Stars: ✭ 15 (-25%)
Mutual labels:  jupyter-notebook
Deep Learning Experiments
Notes and experiments to understand deep learning concepts
Stars: ✭ 883 (+4315%)
Mutual labels:  jupyter-notebook
Pytorch Bicubic Interpolation
Bicubic interpolation for PyTorch
Stars: ✭ 15 (-25%)
Mutual labels:  jupyter-notebook
Mj583
J583 Advanced Interactive Media
Stars: ✭ 15 (-25%)
Mutual labels:  jupyter-notebook
Syde 522
Stars: ✭ 15 (-25%)
Mutual labels:  jupyter-notebook
Tutorials
A project for developing tutorials for Streams
Stars: ✭ 14 (-30%)
Mutual labels:  jupyter-notebook
Grab Aiforsea
Entry for Grab's AI for S.E.A. challenge
Stars: ✭ 20 (+0%)
Mutual labels:  jupyter-notebook
Deepbayes
Bayesian methods in deep learning Summer School
Stars: ✭ 15 (-25%)
Mutual labels:  jupyter-notebook
Sage
Home of the Semi-Analytic Galaxy Evolution (SAGE) galaxy formation model
Stars: ✭ 15 (-25%)
Mutual labels:  jupyter-notebook
Nearby exoplanet map
Creating a stylised pseudo-3D exoplanet map using matplotlib
Stars: ✭ 15 (-25%)
Mutual labels:  jupyter-notebook
Udacity Deep Learning Nanodegree
This is just a collection of projects that made during my DEEPLEARNING NANODEGREE by UDACITY
Stars: ✭ 15 (-25%)
Mutual labels:  jupyter-notebook
Core Stories
All the notebooks for the analysis of Emotional Arcs within the Project Gutenberg corpus, see "The emotional arcs of stories are dominated by six basic shapes"
Stars: ✭ 15 (-25%)
Mutual labels:  jupyter-notebook
Seq 2 Seq Ocr
Handwritten text recognition with Keras
Stars: ✭ 15 (-25%)
Mutual labels:  jupyter-notebook
Lstm Sentiment Analysis
Sentiment Analysis with LSTMs in Tensorflow
Stars: ✭ 886 (+4330%)
Mutual labels:  jupyter-notebook
Tensorflow2 Generative Models
Implementations of a number of generative models in Tensorflow 2. GAN, VAE, Seq2Seq, VAEGAN, GAIA, Spectrogram Inversion. Everything is self contained in a jupyter notebook for easy export to colab.
Stars: ✭ 883 (+4315%)
Mutual labels:  jupyter-notebook
Azure Webapp W Cntk
Deployment template for Azure WebApp, CNTK, Python 3 (x64) and sample model
Stars: ✭ 15 (-25%)
Mutual labels:  jupyter-notebook
Intrusion Detection System
I have tried some of the machine learning and deep learning algorithm for IDS 2017 dataset. The link for the dataset is here: http://www.unb.ca/cic/datasets/ids-2017.html. By keeping Monday as the training set and rest of the csv files as testing set, I tried one class SVM and deep CNN model to check how it works. Here the Monday dataset contains only normal data and rest of the days contains both normal and attacked data. Also, from the same university (UNB) for the Tor and Non Tor dataset, I tried K-means clustering and Stacked LSTM models in order to check the classification of multiple labels.
Stars: ✭ 20 (+0%)
Mutual labels:  jupyter-notebook
Anda
Code for our ICAR 2019 paper "ANDA: A Novel Data Augmentation Technique Applied to Salient Object Detection"
Stars: ✭ 20 (+0%)
Mutual labels:  jupyter-notebook
Ud810 Intro Computer Vision
My solutions for Udacity's "Introduction to Computer Vision" MOOC
Stars: ✭ 15 (-25%)
Mutual labels:  jupyter-notebook

Altair-catplot

A utility to use Altair to generate box plots, jitter plots, and ECDFs, i.e. plots with a categorical variable where a data transformation not covered in Altair is required.

Motivation

Altair is a Python interface for Vega-Lite. The resulting plots are easily displayed in JupyterLab and/or exported. The grammar of Vega-Lite, which is largely present in Altair, is well-defined, well-documented, and clear. This is one of many strong features of Altair and Vega-Lite.

There is always a trade-off when using high level plotting libraries. You can rapidly make plots, but they are less configurable. The developers of Altair have (wisely, in my opinion) adhered to the grammar of Vega-Lite. If Vega-Lite does not have a feature, Altair does not try to add it.

The developers of Vega-Lite have plans to add more functionality. Indeed, in the soon-to-be-released (as of August 23, 2018) Vega-Lite 3.0, box plots are included. Adding a jitter transform is also planned. It would be useful to be able to conveniently make jitter and box plots with the current features of Vega-Lite and Altair. I wrote Altair-catplot to fill in this gap until the functionality is implemented in Vega-Lite and Altair.

The box plots and jitter plots I have in mind apply to the case where one axis is quantitative and the other axis is nominal or ordinal (that is, categorical). So, we are making plots with one categorical variable and one quantitative. Hence the name, Altair-catplot.

Installation

You can install altair-catplot using pip. You will need to have a recent version of Altair and all of its dependencies installed.

pip install altair_catplot

Usage

I will import Altair-catplot as altcat, and while I'm at it will import the other modules we need.

import numpy as np
import pandas as pd

import altair as alt
import altair_catplot as altcat

Every plot is made using the altcat.catplot() function. It has the following call signature.

catplot(data=None,
        height=Undefined,
        width=Undefined, 
        mark=Undefined,
        encoding=Undefined,
        transform=None,
        sort=Undefined,
        jitter_width=0.2,
        box_mark=Undefined,
        whisker_mark=Undefined,
        box_overlay=False,
        **kwargs)

The data and encoding arguments must be provided, and in general so must mark and transform; the box plot examples below are the exception, where either mark='boxplot' or transform='box' alone is enough. The data, mark, and encoding arguments are as for alt.Chart(). Note that these are specified as constructor attributes, not as you would using Altair's more idiomatic methods like mark_point(), encode(), etc.

In this package, I consider a box plot, jitter plot, or ECDF to be transforms of the data, as they are constructed by performing some aggregation or transformation of the data. The exception is the box plot, since in the Vega-Lite 3.0+ specification, boxplot is a mark.

The utility is best shown by example, so below I present several.

Sample data

To demonstrate usage, I will first create a data frame with sample data for plotting.

np.random.seed(4288233)

data = {'data ' + str(i): np.random.normal(*musig, size=50) 
            for i, musig in enumerate(zip([0, 1, 2, 3], [1, 1, 2, 3]))}

df = pd.DataFrame(data=data).melt()
df['dummy metadata'] = np.random.choice(['poodle', 'beagle', 'collie', 'dalmation', 'terrier'],
                                        size=len(df))

df.head()
  variable     value dummy metadata
0   data 0  1.980946         collie
1   data 0 -0.442286      dalmation
2   data 0  1.093249        terrier
3   data 0 -0.233622         collie
4   data 0 -0.799315      dalmation

The categorical variable is 'variable' and the quantitative variable is 'value'.
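
For readers unfamiliar with the long ("tidy") layout that melt() produces, the reshaping can be sketched in plain Python. This is only an illustration of the idea: melt_wide is a hypothetical helper, not part of pandas or altair-catplot, and the small input is made up for the example.

```python
# Hypothetical helper for illustration only; pandas' melt() does this for data frames.
def melt_wide(columns):
    """Turn {column name: list of values} into long-format rows."""
    rows = []
    for name, values in columns.items():
        for v in values:
            # One row per measurement: the column name becomes the category.
            rows.append({"variable": name, "value": v})
    return rows


wide = {"data 0": [1.2, -0.4], "data 1": [0.9, 2.1]}
long_rows = melt_wide(wide)
for row in long_rows:
    print(row)
```

Each row now carries one categorical entry ('variable') and one quantitative entry ('value'), which is the shape the plots below expect.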

Box plot

We can create a box plot as follows. Note that the mark is a string specifying a box plot (as it will be in future versions of Altair), and the encoding is specified as a dictionary of key-value pairs.

altcat.catplot(df,
               mark='boxplot',
               encoding=dict(x='value:Q',
                             y=alt.Y('variable:N', title=None),
                             color=alt.Color('variable:N', legend=None)))

Once Vega-Lite 3.0 is formally released, future versions of Altair will be able to generate this box plot as follows.

alt.Chart(df
    ).mark_boxplot(
    ).encode(
        x='value:Q',
        y=alt.Y('variable:N', title=None),
        color=alt.Color('variable:N', legend=None)
    )

The resulting plot looks different from what I have shown here, using instead the Vega-Lite defaults. Specifically, the whiskers are black and do not have caps, and the boxes are thinner. You can check it out here.

Because box plots are unique in that they are specified with a mark and not a transform, we could use the mark argument above to specify a box plot. We could equivalently do it with the transform argument. (Note that this will not be possible when box plots are implemented in Altair.)

box = altcat.catplot(df,
                     encoding=dict(y=alt.Y('variable:N', title=None),
                                   x='value:Q',
                                   color=alt.Color('variable:N', legend=None)),
                     transform='box')
box

type(box)
altair.vegalite.v2.api.LayerChart
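
To make concrete what a box plot summarizes for each category, here is a minimal sketch of the usual summary statistics in plain Python. It assumes the common Tukey convention (whiskers reach the most extreme points within 1.5 times the interquartile range of the box); whether altair-catplot uses exactly this convention or this quantile method is not confirmed here. box_stats is a hypothetical helper, not part of the package.

```python
import statistics


# Hypothetical helper; assumes Tukey-style whiskers (1.5 * IQR).
def box_stats(values, whisker_factor=1.5):
    """Return quartiles, whisker ends, and outliers for one category."""
    data = sorted(values)
    # Quartiles by linear interpolation over the observed range.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - whisker_factor * iqr
    high_fence = q3 + whisker_factor * iqr
    # Whiskers end at the most extreme data points inside the fences.
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
print(stats)
```

In this made-up sample the value 30 lies beyond the upper fence, so it is reported as an outlier and the upper whisker stops at 9.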

We can independently specify properties of the box and whisker marks using the box_mark and whisker_mark kwargs. For example, say we wanted our colors to be Betancourt red.

altcat.catplot(df,
               mark=dict(type='point', color='#7C0000'),
               box_mark=dict(color='#7C0000'),
               whisker_mark=dict(strokeWidth=2, color='#7C0000'),
               encoding=dict(x='value:Q',
                             y=alt.Y('variable:N', title=None)),
               transform='box')

Jitter plot

I try my best to subscribe to the "plot all of your data" philosophy. To that end, a strip plot is a useful way to show all of the measurements. Here is one way to make a strip plot in Altair.

alt.Chart(df
    ).mark_tick(
    ).encode(
        x='value:Q',
        y=alt.Y('variable:N', title=None),
        color=alt.Color('variable:N', legend=None)
    )

The problem with strip plots is that they can have trouble with overlapping data points. A common approach to deal with this is to "jitter," or place the glyphs with small random displacements along the categorical axis. This involves using a jitter transform. While the current release candidate for Vega-Lite 3.0 has box plot capabilities, it does not have a jitter transform, though that will likely be coming in the future (see here and here). A proper transform, in which data points are offset but the categorical axis truly has nominal or ordinal values, is desired but not currently possible. The jitter plot here is a hack wherein the axes are quantitative and the tick labels are actually carefully placed text. This means that the "axis labels" will be wrecked if you try interactivity with the jitter plot. Nonetheless, tooltips still work.

jitter = altcat.catplot(df,
                        height=250,
                        width=450,
                        mark='point',
                        encoding=dict(y=alt.Y('variable:N', title=None),
                                      x='value:Q',
                                      color=alt.Color('variable:N', legend=None),
                                      tooltip=alt.Tooltip(['dummy metadata:N'], title='breed')),
                        transform='jitter')
jitter
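
The idea behind the hack can be sketched in plain Python: give each category an integer slot on a quantitative axis, then displace each point by a small random amount around its slot. This is a sketch under assumptions, not the package's implementation. In particular, jitter_positions is a hypothetical helper, and treating jitter_width as the half-width of a uniform displacement is a guess about what that parameter means.

```python
import random


# Hypothetical helper; jitter_width is ASSUMED to be a uniform half-width.
def jitter_positions(categories, jitter_width=0.2, seed=0):
    """Map each category label to a jittered numeric position."""
    rng = random.Random(seed)
    # Assign each distinct category an integer slot, in order of first appearance.
    slots = {}
    for c in categories:
        slots.setdefault(c, len(slots))
    # Each point sits at its category's slot plus a small random offset.
    positions = [
        slots[c] + rng.uniform(-jitter_width, jitter_width) for c in categories
    ]
    return positions, slots


cats = ["data 0", "data 1", "data 0", "data 2", "data 1"]
positions, slots = jitter_positions(cats)
print(slots)
print(positions)
```

Because the axis is really quantitative, the category names have to be drawn as text at the integer slots, which is why the "axis labels" do not follow along under interactivity.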

Alternatively, we could color the jitter points with the dummy metadata.

altcat.catplot(df,
               height=250,
               width=450,
               mark='point',
               encoding=dict(y=alt.Y('variable:N', title=None),
                             x='value:Q',
                             color=alt.Color('dummy metadata:N', title='breed')),
               transform='jitter')

Jitter-box plots

Even while plotting all of the data, we sometimes want to graphically display summary statistics. We could (in Vega-Lite 3.0) make a strip-box plot, in which we have a strip plot overlaid on a box plot. In the future, you can generate this using Altair as follows.

strip = alt.Chart(df
    ).mark_point(
        opacity=0.3
    ).encode(
        x='value:Q',
        y=alt.Y('variable:N', title=None),
        color=alt.Color('variable:N', legend=None)
    )

box = alt.Chart(df
    ).mark_boxplot(
        color='lightgray'
    ).encode(
        x='value:Q',
        y=alt.Y('variable:N', title=None)
    )

box + strip

The result may be viewed here.

The strip-box plots have the same issue as strip plots and could stand to have a little jitter. Jitter-box plots consist of a jitter plot overlaid with a box plot. Why not just make a box plot and a jitter plot and then compose them using Altair's nifty composition capabilities as I did in the plot I just described? We cannot do that because box plots have a truly categorical axis, but jitter plots have a hacked "categorical" axis that is really quantitative, so we can't overlay. We can try. The result is not pretty.

box + jitter

Instead, we use 'jitterbox' for our transform. The default color for the boxes and whiskers is light gray.

altcat.catplot(df,
               height=250,
               width=450,
               mark='point',
               encoding=dict(y=alt.Y('variable:N', title=None),
                             x='value:Q',
                             color=alt.Color('variable:N', legend=None)),
               transform='jitterbox')

Note that the mark kwarg applies to the jitter plot. If we want to make specifications about the boxes and whiskers we need to separately specify them using the box_mark and whisker_mark kwargs as we did with box plots. Note that if the box_mark and whisker_mark are specified and their color is not explicitly included in the specification, their color matches the specification for the jitter plot.

altcat.catplot(df,
               height=250,
               width=450,
               mark='point',
               box_mark=dict(strokeWidth=2, opacity=0.5),
               whisker_mark=dict(strokeWidth=2, opacity=0.5),
               encoding=dict(y=alt.Y('variable:N', title=None),
                             x='value:Q',
                             color=alt.Color('variable:N', legend=None)),
               transform='jitterbox')

ECDFs

An empirical cumulative distribution function, or ECDF, is a convenient way to visualize a univariate probability distribution. Consider a measurement x in a set of measurements X. The ECDF evaluated at x is defined as

ECDF(x) = fraction of data points in X that are ≤ x.
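
The definition translates directly into a few lines of plain Python. The sketch below is only meant to make the definition concrete; the function name ecdf and the sample measurements are made up for the example and are not part of altair-catplot.

```python
from bisect import bisect_right


def ecdf(x, data):
    """Fraction of points in `data` that are <= x."""
    ordered = sorted(data)
    # bisect_right counts how many sorted entries are <= x.
    return bisect_right(ordered, x) / len(ordered)


measurements = [3.0, 1.0, 2.0, 2.0]
print(ecdf(2.0, measurements))  # 0.75: three of the four points are <= 2.0
```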

To generate ECDFs colored by category, we use the 'ecdf' transform.

altcat.catplot(df,
               mark='line',
               encoding=dict(x='value:Q',
                             color='variable:N'),
               transform='ecdf')

Note that here we have chosen to represent the ECDF as a line, which is a more formal way of plotting the ECDF. We could, without loss of information, plot the "corners of the steps", which represent the actual measurements that were made. We do this by specifying the mark as 'point'.

altcat.catplot(df,
               mark='point',
               encoding=dict(x='value:Q',
                             color='variable:N'),
               transform='ecdf')

This kind of plot can be easily made directly using Pandas and Altair by adding a column to the data frame containing the y-values of the ECDF.

df['ECDF'] = df.groupby('variable')['value'].transform(lambda x: x.rank(method='first') / len(x))

alt.Chart(df
    ).mark_point(
    ).encode(
        x='value:Q',
        y='ECDF:Q',
        color='variable:N'
    )

This, however, is not possible when making a formal line plot of the ECDF.
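
One way to see why: a staircase line needs two vertices per measurement (the bottom and the top of each riser), so a single extra column with one y-value per row is not enough. The sketch below builds those vertices in plain Python. It illustrates the geometry only; ecdf_staircase is a hypothetical helper, and this is not necessarily how altair-catplot constructs its line.

```python
# Hypothetical helper showing the vertices a staircase ECDF line passes through.
def ecdf_staircase(data):
    """Return x and y coordinates of the staircase's corners."""
    ordered = sorted(data)
    n = len(ordered)
    xs, ys = [], []
    height = 0.0
    for i, x in enumerate(ordered, start=1):
        # Vertical riser at x: from the previous height up to i / n.
        xs.extend([x, x])
        ys.extend([height, i / n])
        height = i / n
    return xs, ys


xs, ys = ecdf_staircase([2.0, 1.0, 4.0])
print(list(zip(xs, ys)))
```

The upper corner of each riser is exactly the dot shown in the point version of the plot, which is why the dots lose no information relative to the line.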

An added advantage of plotting the ECDF as dots, which represent individual measurements, is that we can color the points. We may instead wish to show the ECDF over all measurements and color the dots by the categorical variable. We do that using the colored_ecdf transform.

altcat.catplot(df,
               mark='point',
               encoding=dict(x='value:Q',
                             color='variable:N'),
               transform='colored_ecdf')
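
The difference from the per-category ECDF is that the heights are computed over the pooled data, while each dot keeps its category for coloring. A minimal sketch of that computation in plain Python follows; colored_ecdf here is a hypothetical stand-alone function written for illustration, not the package's code, and the inputs are made up.

```python
# Hypothetical stand-alone function; the package's internals may differ.
def colored_ecdf(values, categories):
    """Return (value, ECDF height over all data, category) for each point."""
    n = len(values)
    # Pool all measurements and sort them together, keeping each label attached.
    pooled = sorted(zip(values, categories))
    return [(v, rank / n, c) for rank, (v, c) in enumerate(pooled, start=1)]


points = colored_ecdf([0.5, 2.0, 1.0, 3.0], ["a", "b", "a", "b"])
for point in points:
    print(point)
```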

ECCDFs

We may also make a complementary empirical cumulative distribution, an ECCDF. This is defined as

ECCDF(x) = 1 - ECDF(x).

These are often useful when looking for power-law-like behavior, in which case you want the ECCDF axis to have a logarithmic scale.

altcat.catplot(df,
               mark='point',
               encoding=dict(x='value:Q',
                             y=alt.Y('ECCDF:Q', scale=alt.Scale(type='log')),
                             color='variable:N'),
               transform='eccdf')
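
A point worth keeping in mind with a logarithmic ECCDF axis follows directly from the definition: the largest measurement always has ECCDF equal to zero, which has no logarithm. The plain-Python sketch below shows this; the function eccdf and the sample data are made up for the example, and how altair-catplot itself treats that last point is not confirmed here.

```python
from bisect import bisect_right


def eccdf(x, data):
    """1 minus the fraction of points in `data` that are <= x."""
    ordered = sorted(data)
    return 1 - bisect_right(ordered, x) / len(ordered)


data = [1.0, 2.0, 4.0, 8.0]
heights = [eccdf(x, data) for x in data]
print(heights)

# Only points with a positive ECCDF can be placed on a logarithmic axis.
plottable = [(x, h) for x, h in zip(data, heights) if h > 0]
print(plottable)
```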
