Altair-catplot

A utility to use Altair to generate box plots, jitter plots, and ECDFs, i.e. plots with a categorical variable where a data transformation not covered in Altair is required.

Motivation

Altair is a Python interface for Vega-Lite. The resulting plots are easily displayed in JupyterLab and/or exported. The grammar of Vega-Lite, which is largely present in Altair, is well-defined, well-documented, and clear. This is one of many strong features of Altair and Vega-Lite.

There is always a trade-off when using high level plotting libraries. You can rapidly make plots, but they are less configurable. The developers of Altair have (wisely, in my opinion) adhered to the grammar of Vega-Lite. If Vega-Lite does not have a feature, Altair does not try to add it.

The developers of Vega-Lite have plans to add more functionality. Indeed, in the soon to be released (as of August 23, 2018) Vega-Lite 3.0, box plots are included. Adding a jitter transform is also planned. It would be useful to be able to conveniently make jitter and box plots with the current features of Vega-Lite and Altair. I wrote Altair-catplot to fill in this gap until the functionality is implemented in Vega-Lite and Altair.

The box plots and jitter plots I have in mind apply to the case where one axis is quantitative and the other axis is nominal or ordinal (that is, categorical). So, we are making plots with one categorical variable and one quantitative. Hence the name, Altair-catplot.

Installation

You can install altair-catplot using pip. You will need to have a recent version of Altair and all of its dependencies installed.

pip install altair_catplot

Usage

I will import Altair-catplot as altcat, and while I'm at it will import the other modules we need.

import numpy as np
import pandas as pd

import altair as alt
import altair_catplot as altcat

Every plot is made using the altcat.catplot() function. It has the following call signature.

catplot(data=None,
        height=Undefined,
        width=Undefined, 
        mark=Undefined,
        encoding=Undefined,
        transform=None,
        sort=Undefined,
        jitter_width=0.2,
        box_mark=Undefined,
        whisker_mark=Undefined,
        box_overlay=False,
        **kwargs)

The data, mark, encoding, and transform arguments must all be provided. The data, mark, and encoding fields are as for alt.Chart(). Note that these are specified as constructor attributes, not as you would using Altair's more idiomatic methods like mark_point(), encode(), etc.

In this package, I consider a box plot, jitter plot, or ECDF to be transforms of the data, as they are constructed by performing some aggregation or transformation of the data. The exception is the box plot, since in the Vega-Lite 3.0+ specification, boxplot is a mark.

The utility is best shown by example, so below I present several.

Sample data

To demonstrate usage, I will first create a data frame with sample data for plotting.

np.random.seed(4288233)

data = {'data ' + str(i): np.random.normal(*musig, size=50) 
            for i, musig in enumerate(zip([0, 1, 2, 3], [1, 1, 2, 3]))}

df = pd.DataFrame(data=data).melt()
df['dummy metadata'] = np.random.choice(['poodle', 'beagle', 'collie', 'dalmation', 'terrier'],
                                        size=len(df))

df.head()

	variable	value	dummy metadata
0	data 0	1.980946	collie
1	data 0	-0.442286	dalmation
2	data 0	1.093249	terrier
3	data 0	-0.233622	collie
4	data 0	-0.799315	dalmation

The categorical variable is 'variable' and the quantitative variable is 'value'.

Box plot

We can create a box plot as follows. Note that the mark is a string specifying a box plot (as it will be in future versions of Altair), and the encoding is specified as a dictionary of key-value pairs.

altcat.catplot(df,
               mark='boxplot',
               encoding=dict(x='value:Q',
                             y=alt.Y('variable:N', title=None),
                             color=alt.Color('variable:N', legend=None)))

Once Vega-Lite 3.0 is formally released, future versions of Altair will be able to generate this box plot as follows.

alt.Chart(df
    ).mark_boxplot(
    ).encode(
        x='value:Q',
        y=alt.Y('variable:N', title=None),
        color=alt.Color('variable:N', legend=None)
    )

The resulting plot looks different from what I have shown here, using instead the Vega-Lite defaults. Specifically, the whiskers are black and do not have caps, and the boxes are thinner. You can check it out here.

Because box plots are unique in that they are specified with a mark and not a transform, we could use the mark argument above to specify a box plot. We could equivalently do it with the transform argument. (Note that this will not be possible when box plots are implemented in Altair.)

box = altcat.catplot(df,
                     encoding=dict(y=alt.Y('variable:N', title=None),
                                   x='value:Q',
                                   color=alt.Color('variable:N', legend=None)),
                     transform='box')
box

type(box)

altair.vegalite.v2.api.LayerChart

We can independently specify properties of the box and whisker marks using the box_mark and whisker_mark kwargs. For example, say we wanted our colors to be Betancourt red.

altcat.catplot(df,
               mark=dict(type='point', color='#7C0000'),
               box_mark=dict(color='#7C0000'),
               whisker_mark=dict(strokeWidth=2, color='#7C0000'),
               encoding=dict(x='value:Q',
                             y=alt.Y('variable:N', title=None)),
               transform='box')
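For reference, the summary statistics that a box plot displays can be computed by hand. The sketch below is not taken from Altair-catplot; it assumes the common Tukey convention, in which the whiskers extend to the most extreme data points within 1.5 times the interquartile range of the box edges, and it assumes linearly interpolated quartiles. The convention used by catplot() is not stated here and may differ. The function name box_summary and its whisker_factor parameter are hypothetical.

```python
import statistics


def box_summary(values, whisker_factor=1.5):
    """Quartiles, whisker ends, and outliers under the (assumed) Tukey convention."""
    ordered = sorted(values)
    # 'inclusive' interpolates linearly between order statistics.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method='inclusive')
    iqr = q3 - q1
    low_fence = q1 - whisker_factor * iqr
    high_fence = q3 + whisker_factor * iqr
    # Whiskers end at the most extreme data points that lie inside the fences.
    inside = [v for v in ordered if low_fence <= v <= high_fence]
    outliers = [v for v in ordered if v < low_fence or v > high_fence]
    return {'q1': q1, 'median': median, 'q3': q3,
            'whisker_low': inside[0], 'whisker_high': inside[-1],
            'outliers': outliers}


summary = box_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
print(summary)
```

In this small example the value 30 lies outside the upper fence, so it is reported as an outlier and the upper whisker stops at 9.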

Jitter plot

I try my best to subscribe to the "plot all of your data" philosophy. To that end, a strip plot is a useful way to show all of the measurements. Here is one way to make a strip plot in Altair.

alt.Chart(df
    ).mark_tick(
    ).encode(
        x='value:Q',
        y=alt.Y('variable:N', title=None),
        color=alt.Color('variable:N', legend=None)
    )

The problem with strip plots is that they can have trouble with overlapping data points. A common approach to deal with this is to "jitter," or place the glyphs with small random displacements along the categorical axis. This involves using a jitter transform. While the current release candidate for Vega-Lite 3.0 has box plot capabilities, it does not have a jitter transform, though that will likely be coming in the future (see here and here). A proper transform, in which the data points are offset but the categorical axis truly has nominal or ordinal values, is desired but not currently possible. The jitter plot here is a hack wherein the axes are quantitative and the tick labels are actually carefully placed text. This means that the "axis labels" will be wrecked if you try interactivity with the jitter plot. Nonetheless, tooltips still work.

jitter = altcat.catplot(df,
                        height=250,
                        width=450,
                        mark='point',
                        encoding=dict(y=alt.Y('variable:N', title=None),
                                      x='value:Q',
                                      color=alt.Color('variable:N', legend=None),
                                      tooltip=alt.Tooltip(['dummy metadata:N'], title='breed')),
                        transform='jitter')
jitter
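To make the idea of jittering on a quantitative axis concrete, here is a minimal sketch of one way to do it: each category is mapped to an integer position, and each point gets a small uniform random displacement around that position. This is an illustration only, not the code used inside catplot(). The function name jitter_positions is hypothetical, and treating jitter_width as the maximum displacement on either side of the category position is an assumption.

```python
import random


def jitter_positions(categories, jitter_width=0.2, seed=None):
    """Return a jittered numeric position per point, plus the category-to-slot map."""
    rng = random.Random(seed)
    # Give each distinct category an integer slot, in order of first appearance.
    slots = {}
    for label in categories:
        slots.setdefault(label, len(slots))
    # Displace each point uniformly within +/- jitter_width of its slot.
    positions = [slots[label] + rng.uniform(-jitter_width, jitter_width)
                 for label in categories]
    return positions, slots


labels = ['data 0', 'data 1', 'data 0', 'data 2']
positions, slots = jitter_positions(labels, jitter_width=0.2, seed=0)
print(slots)
print(positions)
```

The slot map is what would be needed to place text labels at the integer positions, which is why the "axis labels" in such a plot are not true axis labels.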

Alternatively, we could color the jitter points with the dummy metadata.

altcat.catplot(df,
               height=250,
               width=450,
               mark='point',
               encoding=dict(y=alt.Y('variable:N', title=None),
                             x='value:Q',
                             color=alt.Color('dummy metadata:N', title='breed')),
               transform='jitter')

Jitter-box plots

Even while plotting all of the data, we sometimes want to graphically display summary statistics. We could (in Vega-Lite 3.0) make a strip-box plot, in which we have a strip plot overlaid on a box plot. In the future, you can generate this using Altair as follows.

strip = alt.Chart(df
    ).mark_point(
        opacity=0.3
    ).encode(
        x='value:Q',
        y=alt.Y('variable:N', title=None),
        color=alt.Color('variable:N', legend=None)
    )

box = alt.Chart(df
    ).mark_boxplot(
        color='lightgray'
    ).encode(
        x='value:Q',
        y=alt.Y('variable:N', title=None)
    )

box + strip

The result may be viewed here.

The strip-box plots have the same issue as strip plots and could stand to have a little jitter. Jitter-box plots consist of a jitter plot overlaid with a box plot. Why not just make a box plot and a jitter plot and then compose them using Altair's nifty composition capabilities as I did in the plot I just described? We cannot do that because box plots have a truly categorical axis, but jitter plots have a hacked "categorical" axis that is really quantitative, so we can't overlay. We can try. The result is not pretty.

box + jitter

Instead, we use 'jitterbox' for our transform. The default color for the boxes and whiskers is light gray.

altcat.catplot(df,
               height=250,
               width=450,
               mark='point',
               encoding=dict(y=alt.Y('variable:N', title=None),
                             x='value:Q',
                             color=alt.Color('variable:N', legend=None)),
               transform='jitterbox')

Note that the mark kwarg applies to the jitter plot. If we want to make specifications about the boxes and whiskers we need to separately specify them using the box_mark and whisker_mark kwargs as we did with box plots. Note that if the box_mark and whisker_mark are specified and their color is not explicitly included in the specification, their color matches the specification for the jitter plot.

altcat.catplot(df,
               height=250,
               width=450,
               mark='point',
               box_mark=dict(strokeWidth=2, opacity=0.5),
               whisker_mark=dict(strokeWidth=2, opacity=0.5),
               encoding=dict(y=alt.Y('variable:N', title=None),
                             x='value:Q',
                             color=alt.Color('variable:N', legend=None)),
               transform='jitterbox')

ECDFs

An empirical cumulative distribution function, or ECDF, is a convenient way to visualize a univariate probability distribution. Consider a measurement x in a set of measurements X. The ECDF evaluated at x is defined as

ECDF(x) = fraction of data points in X that are ≤ x.
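The definition translates directly into a few lines of code. The following is a minimal sketch using only the Python standard library; the function names ecdf and ecdf_points are hypothetical and are not part of Altair-catplot. Note that ecdf_points assigns heights by rank, so tied values receive successive heights rather than sharing one.

```python
from bisect import bisect_right


def ecdf(measurements, x):
    """Fraction of the values in `measurements` that are <= x."""
    ordered = sorted(measurements)
    # bisect_right returns how many of the sorted values are <= x.
    return bisect_right(ordered, x) / len(ordered)


def ecdf_points(measurements):
    """One (value, height) pair per measurement, sorted by value."""
    ordered = sorted(measurements)
    n = len(ordered)
    return [(value, (rank + 1) / n) for rank, value in enumerate(ordered)]


sample = [2.0, 0.5, 1.0, 3.5]
print(ecdf(sample, 1.0))     # 0.5, since two of the four values are <= 1.0
print(ecdf_points(sample))
```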

To generate ECDFs colored by category, we use the 'ecdf' transform.

altcat.catplot(df,
               mark='line',
               encoding=dict(x='value:Q',
                             color='variable:N'),
               transform='ecdf')

Note that here we have chosen to represent the ECDF as a line, which is a more formal way of plotting the ECDF. We could, without loss of information, plot the "corners of the steps", which represent the actual measurements that were made. We do this by specifying the mark as 'point'.

altcat.catplot(df,
               mark='point',
               encoding=dict(x='value:Q',
                             color='variable:N'),
               transform='ecdf')

This kind of plot can be easily made directly using Pandas and Altair by adding a column to the data frame containing the y-values of the ECDF.

df['ECDF'] = df.groupby('variable')['value'].transform(lambda x: x.rank(method='first') / len(x))

alt.Chart(df
    ).mark_point(
    ).encode(
        x='value:Q',
        y='ECDF:Q',
        color='variable:N'
    )

This, however, is not possible when making a formal line plot of the ECDF.

An added advantage of plotting the ECDF as dots, which represent individual measurements, is that we can color the points. We may instead wish to show the ECDF over all measurements and color the dots by the categorical variable. We do that using the colored_ecdf transform.

altcat.catplot(df,
               mark='point',
               encoding=dict(x='value:Q',
                             color='variable:N'),
               transform='colored_ecdf')
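The difference from the per-category ECDF is that the heights are computed over the pooled measurements, while each point keeps its category label so that it can be colored. A minimal sketch of that computation, with a hypothetical function name and made-up sample records, might look like this. It illustrates the idea and is not the package's implementation.

```python
def colored_ecdf_points(records):
    """records: iterable of (category, value) pairs.

    Returns (value, height, category) tuples sorted by value, where the
    height is the rank-based ECDF over all values regardless of category.
    """
    ordered = sorted(records, key=lambda rec: rec[1])
    n = len(ordered)
    return [(value, (rank + 1) / n, category)
            for rank, (category, value) in enumerate(ordered)]


# Made-up records for illustration only.
records = [('data 0', 1.2), ('data 1', 0.4), ('data 0', -0.3), ('data 1', 2.0)]
pooled = colored_ecdf_points(records)
for row in pooled:
    print(row)
```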

ECCDFs

We may also make a complementary empirical cumulative distribution, an ECCDF. This is defined as

ECCDF(x) = 1 - ECDF(x).

These are often useful when looking for power-law-like behavior, in which case you want the ECCDF axis to have a logarithmic scale.

altcat.catplot(df,
               mark='point',
               encoding=dict(x='value:Q',
                             y=alt.Y('ECCDF:Q', scale=alt.Scale(type='log')),
                             color='variable:N'),
               transform='eccdf')
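As a sketch of the definition above, the ECCDF heights can be computed from ranks in the same way as the ECDF. The function name eccdf_points is hypothetical and the sample values are made up. One practical point follows directly from the definition: the largest measurement has ECCDF equal to zero, which has no position on a logarithmic axis, so this sketch filters it out before any log-scale plotting. How catplot() treats that point is not covered here.

```python
def eccdf_points(measurements):
    """One (value, 1 - ECDF(value)) pair per measurement, sorted by value."""
    ordered = sorted(measurements)
    n = len(ordered)
    # (n - rank - 1) / n equals 1 - (rank + 1) / n without the extra subtraction.
    return [(value, (n - rank - 1) / n) for rank, value in enumerate(ordered)]


points = eccdf_points([8, 1, 4, 2])
print(points)

# Keep only points with a positive height, since log(0) is undefined.
log_safe = [(x, y) for x, y in points if y > 0]
print(log_safe)
```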
