ggplotnim - ggplot2 in Nim [[https://github.com/Vindaar/ggplotnim/workflows/ggplotnim%20CI/badge.svg]]

#+ATTR_HTML: title="Join the chat at https://gitter.im/SciNim/Community" [[https://gitter.im/SciNim/Community][file:https://badges.gitter.im/SciNim/Community.svg]]

This package, as the name suggests, will become a "sort of" port of [[https://ggplot2.tidyverse.org/][ggplot2]] for Nim.

It is based on the [[https://github.com/vindaar/ginger/][ginger]] package.

If you're unfamiliar with the Grammar of Graphics approach to creating plots, one of the best resources is probably Hadley Wickham's book on =ggplot2=, which is also available online at: https://ggplot2-book.org/

In general this library tries (and will continue to do so) to stay mostly compliant with the =ggplot2= syntax. So a solution found for =ggplot2= should hopefully be applicable here as well (unless the feature isn't implemented yet, of course).

** IMPORTANT NOTE on version =v0.3.0=!

=v0.3.0= contains breaking changes regarding the usage of formulas via the =f{}= macro and is mostly considered a stop-gap release until =v0.4.0= is released.

Originally =v0.3.0= was supposed to contain:

- =geom_density=
- =geom_contour=
- working =facet_wrap=

and some miscellaneous things like updated documentation.

Instead I started a rewrite of the data frame on top of arraymancer, which was more successful than I imagined. This sidelined my other work. But since I don't want to keep this out of ggplotnim any longer, I made this the main part of =v0.3.0=.

=v0.4.0= will probably not take too long and will include proper documentation on the formula syntax as well as the features listed above. By then, however, the data frame will also have been turned into its own module (probably named =datamancer=).

Short notes on formula syntax. The following rules apply:

Use:

- no infix symbol and only code, which does not involve a column in the sense defined below in [[Column access]]:
  #+BEGIN_SRC nim
  f{1 + 2}
  f{"aColumn"}
  f{true}
  #+END_SRC
  a =FormulaNode= of kind =fkVariable=. Stores the values as a =Value= variant object.
- =<-= for assignment
  #+BEGIN_SRC nim
  f{"newName" <- "oldName"}
  #+END_SRC
  a =FormulaNode= of kind =fkAssign=. This does not involve a closure and is just a simple object storing a LHS as a string and the RHS as a =Value= (to also support constant columns via =f{"constantCol" <- 5}=). Typically used for =rename= or as an argument for =transmute= and =mutate= to just rename a column or to assign a constant column.
- =<<= for reduce operations
  #+BEGIN_SRC nim
  f{"meanHwy" << mean(`hwy`)}
  #+END_SRC
  a =FormulaNode= of kind =fkScalar=. Used only for =summarize= and means we reduce a full column to a single =Value=. This generates a closure, which computes the RHS and assigns it to a result variable of type =Value=. Type hints are required (for now) if only a single proc call is involved on the RHS, to tell the macro what type to read the column "hwy" as and what type the result variable has.
- =~= for vector like procs
  #+BEGIN_SRC nim
  f{"xSquared" ~ `x` * `x`}
  #+END_SRC
  a =FormulaNode= of kind =fkVector=. Used in =mutate=, =transmute= to calculate a full column. This also generates a closure as the reduce operation =<<= does, except here we loop over the length of the DF and access each read tensor via =[idx]=.
- a formula without any infix symbols will be considered:
  - =fkVariable= if no column is involved
  - =fkVector= else
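
To make the difference between =<<= and =~= more concrete, here is a minimal stand-alone sketch of the two kinds of closures described above. It does not use =ggplotnim= at all: plain =seq[float]= stands in for the column tensors, and the proc names =meanOf= and =squared= are hypothetical, chosen only for this illustration.

#+BEGIN_SRC nim
# Stand-alone illustration, not the code generated by the `f{}` macro.
# `meanOf` and `squared` are hypothetical names; seqs stand in for tensors.

# Reduce-like closure (compare `<<`): a full column becomes a single value.
proc meanOf(col: seq[float]): float =
  var total = 0.0
  for x in col:
    total += x
  result = total / float(col.len)

# Vector-like closure (compare `~`): loop over the length of the column
# and access each element via `[idx]` to fill a new column.
proc squared(col: seq[float]): seq[float] =
  result = newSeq[float](col.len)
  for idx in 0 ..< col.len:
    result[idx] = col[idx] * col[idx]

let hwy = @[20.0, 30.0, 40.0]
echo meanOf(hwy)   # one value for the whole column
echo squared(hwy)  # one value per row
#+END_SRC

The first proc has the shape needed for =summarize=, the second the shape needed for =mutate= and =transmute=.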

*** Column access

The biggest change concerns how columns are accessed in the context of a formula. In the old formula system, a literal string was attempted to be resolved as a DF column dynamically. Since the new formulas are compiled to closures, this would involve overhead and is thus avoided in favor of a clearer separation between columns and real strings. This also helps readers of a formula. This means:

- =`columnName`=: accented quotes refer to a DF column
- =c"columnName"=: call string literals (by convention use a =c= before the string) are interpreted as a column
- or directly via =df[<someIdent/Sym>/string literal]=: to access columns using identifiers / symbols defined in the scope / or string literals (either including accented quotes, call string literals or just string literals).
- =idx=: can be used to access the loop iteration index

The closures take a data frame as an argument, which is named =df=. The =df["columnName"]= refers to that argument, although not literally (it is gen'symmed and =df["columnName"]= refers to a =Column=). From that column we get the underlying =Tensor=.

In the context of calling procedures, e.g.:
#+BEGIN_SRC nim
f{"newCol" ~ someProc(c"columnName")}
#+END_SRC
it may not be clear whether the procedure is supposed to take the whole tensor as an argument or to be handed each element of the tensor in a loop. By default it is assumed that a given column in a call refers to a full column (/ tensor). To indicate that the proc takes a single value, you have to make that explicit via:
#+BEGIN_SRC nim
f{string -> float: "asFloat" ~ parseFloat(df["colName"][idx])}
# ^--- type of the tensors involved on the RHS
#           ^--- type of the resulting tensor (the new column `asFloat`)
#+END_SRC
where =parseFloat= acts on each element individually. For such a proc type hints are required, since it's not clear what type =colName= is supposed to be read as.

*** Type hints

Type hints are required if the formula does not involve any more complex operations (e.g. single proc call to reduce, ...). They are of the form:

- =<type>: <formula>=: simple type hint for the type of the underlying tensor of the columns involved in the formula.
- =<type> -> <resType>: <formula>=: full type for closure. =<type>= is the dtype used for input tensors, =<resType>= the resulting type.

NOTE: it is not possible to include tensors of different data types in a single formula. All input tensors of a computation will be read either by the automatically deduced data type or the =<type>= argument mentioned here. If an underlying tensor is not actually of the given data type, it will be converted via =T(val)=, where =T= is the type.
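
As a rough illustration of what a conversion via =T(val)= means, the following stand-alone sketch reads a sequence of one element type as another. The helper =readAs= is hypothetical and not part of =ggplotnim=; a plain =seq= stands in for the underlying tensor.

#+BEGIN_SRC nim
# Hypothetical helper, not part of ggplotnim: reads `data` as element
# type `T` by converting every value via `T(val)`.
proc readAs[T, U](data: seq[U]): seq[T] =
  result = newSeq[T](data.len)
  for i, val in data:
    result[i] = T(val)

let ints = @[1, 2, 3]
# Read an integer "column" as floats, as a `float` type hint would request.
let floats = readAs[float, int](ints)
echo floats
#+END_SRC

Such a conversion only compiles for types Nim can convert between (e.g. =int= to =float=), which is one reason why mixing unrelated data types in a single formula is not possible.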

There is a step from an untyped to a typed macro involved, which tries to determine data types, but that is very experimental. The macro also tries to guess data types based on the symbols involved in the computation of the formula, e.g. if =*= or =/= is involved, it's assumed that the input tensors are floats and the output as well. If =&= or =$= is involved, they are assumed to be strings. Finally, if =and= and other logic keywords are used, the result is assumed to be =bool= (not the input though!). The full list of symbols used is found here:

https://github.com/Vindaar/ggplotnim/blob/arraymancerBackend/src/ggplotnim/dataframe/arraymancer_backend.nim#L981-L984

#+BEGIN_SRC nim
const floatSet = toSet(@["+", "-", "*", "/", "mod"])
const stringSet = toSet(@["&", "$"])
const boolSet = toSet(@["and", "or", "xor", ">", "<", ">=", "<=", "==", "!=",
                        "true", "false", "in", "notin"])
#+END_SRC
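
How such symbol sets could drive a type guess can be sketched in a few lines. This is not the macro's actual logic: the enum =GuessedType=, the proc =guessResultType= and in particular the order of the checks (bool first, then string, then float) are assumptions made for this illustration. =toHashSet= from =std/sets= is used in place of =toSet=.

#+BEGIN_SRC nim
import std/sets

type
  GuessedType = enum
    gtUnknown, gtFloat, gtString, gtBool

let
  floatSet = toHashSet(["+", "-", "*", "/", "mod"])
  stringSet = toHashSet(["&", "$"])
  boolSet = toHashSet(["and", "or", "xor", ">", "<", ">=", "<=", "==", "!=",
                       "true", "false", "in", "notin"])

proc guessResultType(symbols: openArray[string]): GuessedType =
  ## Hypothetical helper: guesses the result type of a formula from the
  ## symbols it contains. The precedence bool > string > float is an
  ## assumption of this sketch, not taken from ggplotnim.
  for s in symbols:
    if s in boolSet: return gtBool
  for s in symbols:
    if s in stringSet: return gtString
  for s in symbols:
    if s in floatSet: return gtFloat
  result = gtUnknown

echo guessResultType(["*", "/"])  # arithmetic only
echo guessResultType(["*", ">"])  # a comparison is involved
#+END_SRC

Checking the boolean symbols first reflects the remark above that logic keywords determine the result type, not the input type.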

For now please mainly refer to the recipes on how to use this, because they are checked in the CI and will work for sure!

** Recipes

For a more nimish approach, check out the [[file:recipes.org][recipes]], which should give you examples for typical use cases and things I encountered and the solutions I found. Please feel free to add examples to this file to help other people!

Note that all recipes shown there are part of the test suite. So it's guaranteed that, for a given version, the code shown there actually produces the plots shown!

** Documentation

The documentation is found at:

https://vindaar.github.io/ggplotnim

** Installation & dependencies

Installation should be just a
#+BEGIN_SRC sh
nimble install ggplotnim
#+END_SRC
away. Maybe consider installing the =#head=, since new versions probably won't be released after every change, due to rapid development still ongoing.

Since this library is written from scratch there is only a single external dependency, which is =cairo=.

*** Windows

Using =ggplotnim= on Windows is made slightly more problematic, because of the default =cairo= backend. Installing =cairo= on Windows is not as straightforward as on Linux or OSX.

There are multiple options, from most complicated to easiest:

- installing a program which also uses =cairo= on Windows, for example =emacs=, and adding said program to Windows' PATH. Some instructions here: https://gist.github.com/Vindaar/6cb4e93baff3e1ab88a7ab7ed1ae5686
- using @pietroppeter's approach to only install the shared libraries that are actually required, see here: https://gist.github.com/pietroppeter/80266c634b22b3861273089dab3e1af2
- or, thanks to @preshing's work, using his standalone single DLL for =cairo= on Windows: https://github.com/preshing/cairo-windows/ See how it's used in the GitHub Actions workflow for Windows here: https://github.com/Vindaar/ggplotnim/blob/master/.github/workflows/ci.yml#L61-L64

Personally I would recommend the last option. Note however that the standalone DLL is called =cairo.dll=, but =ggplotnim= expects the name =libcairo-2.dll=. I would recommend putting the DLL in some sane place and adding that location to your Windows PATH variable.

Simple text-only instructions on how to do that:
#+begin_quote
- press the =Win= key
- search for "path"
- click on “edit system environment variables”
- click on “Environment Variables” in the bottom right corner
- under “System variables” select “PATH” and click edit
- click “New” and add the full path to your installation location of choice, which contains the DLL now named =libcairo-2.dll=
#+end_quote

After saving those changes and restarting PowerShell / the command prompt everything should work.
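
If it does not work, a quick way to check whether the DLL can be found is to ask the dynamic loader directly. The following is a small sketch using =loadLib= and =unloadLib= from Nim's =std/dynlib=; the helper =canLoad= is a hypothetical name and not part of =ggplotnim=.

#+BEGIN_SRC nim
import std/dynlib

proc canLoad(libName: string): bool =
  ## Hypothetical helper: true if the dynamic loader can find and open
  ## `libName`. On Windows the directories listed in PATH are among the
  ## places that are searched.
  let handle = loadLib(libName)
  if handle != nil:
    unloadLib(handle)
    result = true

when isMainModule:
  # `libcairo-2.dll` is the file name ggplotnim expects (see above).
  if canLoad("libcairo-2.dll"):
    echo "libcairo-2.dll was found"
  else:
    echo "libcairo-2.dll was not found, check the PATH entry and the file name"
#+END_SRC

Run it from a freshly opened PowerShell or command prompt, so that the updated PATH is actually in effect.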

** Currently working features

Geoms:

- =geom_point=
- =geom_line=
- =geom_histogram=
- =geom_freqpoly=
- =geom_bar=
- =geom_errorbar=
- =geom_linerange=
- =geom_tile=
- =geom_raster=
- =geom_text=
- =geom_ridgeline=
- soon:
  - =geom_density=

Facets:

- =facet_wrap=

Scales:

- size (both for discrete and continuous data)
- color (both for discrete and continuous data)

Shape as a scale is not properly implemented, simply because ginger only provides 2 (circle, cross) different marker shapes so far. Feel free to [[https://github.com/Vindaar/ginger/blob/master/src/ginger.nim#L2267-L2292][add more]]!

** Data frame

The library implements a naive dynamic and column based data frame. Each column is represented as a [[https://github.com/PMunch/nim-persistent-vector][persistent vector]] of =Values=. A =Value= is a variant object, similar to a =JsonNode= of the standard library.

NOTE: Due to the dynamic nature and naive implementations performance is not a priority. Heavy calculations should be done before creation of the data frame. Simple arithmetic, filtering, reducing etc. is the main aim.

UPDATE: the note above does not hold for the arraymancer backend data frame. That implementation is plenty fast (for simple operations it's faster than pandas!), see [[benchmarks/pandas_compare]] for a few numbers.

The data frame provides the "5 verbs" of [[https://dplyr.tidyverse.org/][dplyr]] and more. Main implemented functions:

- =filter=
- =mutate=, =transmute=
- =select=, =rename=
- =arrange=
- =summarize=
- =group_by=
- =inner_join=
- =set_diff=
- =count=
- =bind_rows=
- =gather=
- =unique=,

which are all based on the =FormulaNode= object. Basically they all receive =varargs[FormulaNode]=, which is evaluated in the context of the given data frame.

Creation of a =FormulaNode= can be done either directly via untyped templates acting on =+=, =-=, =*=, =/=, =~=. Using the =mpg= data set as an example:
#+BEGIN_SRC nim
let f = displ ~ hwy / cty
#+END_SRC
would describe the dependence of the displacement (=displ=) on the ratio of the highway to the city mpg. Echoing this formula prints it as a lisp like tree:
#+BEGIN_SRC
(~ displ (/ hwy cty))
#+END_SRC
Note that the =~= in the untyped templates always acts as the root node of the resulting tree. The LHS of it is always considered the dependent quantity. In these templates however, the identifiers are converted to strings and must match the names in the data frame!
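
The lisp like printing of a formula can be pictured with a tiny stand-alone expression tree. This sketch is independent of =ggplotnim=: the type =FormulaTree= and the procs =leaf=, =binary= and =toLisp= are hypothetical names, and the real =FormulaNode= is a richer type.

#+BEGIN_SRC nim
# Stand-alone illustration; all names here are hypothetical and only
# mimic how a formula tree could be printed in prefix (lisp like) form.
type
  TreeKind = enum
    tkLeaf, tkOp
  FormulaTree = ref object
    case kind: TreeKind
    of tkLeaf:
      name: string
    of tkOp:
      op: string
      lhs, rhs: FormulaTree

proc leaf(name: string): FormulaTree =
  FormulaTree(kind: tkLeaf, name: name)

proc binary(op: string, lhs, rhs: FormulaTree): FormulaTree =
  FormulaTree(kind: tkOp, op: op, lhs: lhs, rhs: rhs)

proc toLisp(t: FormulaTree): string =
  ## Prints the operator first, followed by its two operands.
  case t.kind
  of tkLeaf:
    result = t.name
  of tkOp:
    result = "(" & t.op & " " & toLisp(t.lhs) & " " & toLisp(t.rhs) & ")"

# `displ ~ hwy / cty`: `~` is the root, its LHS the dependent quantity.
let formula = binary("~", leaf("displ"), binary("/", leaf("hwy"), leaf("cty")))
echo toLisp(formula)  # prints: (~ displ (/ hwy cty))
#+END_SRC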

*** =f{}= macro to create formulas

The second way to create a =FormulaNode= is via the =f{}= macro. This provides a little more flexibility:
#+BEGIN_SRC nim
let f = f{ "displ" ~ "hwy" / mean("cty") }
#+END_SRC
Note that here all keys must be explicit strings. Everything that is not a string will be interpreted in the calling scope.

If the identifier is the first element of a =nnkCall=, e.g. as in =mean("cty")=, it will be stored in a =FormulaNode= of kind =fkFunction=. An =fkFunction= itself may contain two different kinds of functions, as is evident from the implementation:
#+BEGIN_SRC nim
# storing a function to be applied to the data
fnName: string
arg: FormulaNode
case fnKind*: FuncKind
of funcVector:
  fnV: proc(s: PersistentVector[Value]): Value
  res: Option[Value] # the result of fn(arg), so that we can cache it
                     # instead of recalculating it for every index potentially
of funcScalar:
  fnS: proc(s: Value): Value
#+END_SRC
We store the name of the function as a string for debugging and echoing. The function must only take a single argument (this may be changed in the future / we may wrap a function with multiple arguments in a template in the future). It can either be a procedure taking a vector of =Values=, corresponding to a proc working on a whole column as the input (e.g. =mean=), or a scalar function taking a single =Value= (e.g. =abs=). In the latter case the function is applied to each index of the key of the data frame given by =arg=.

Lifting templates are provided to lift any:

- =liftVector[T]Proc=: =proc (s: seq[T]): T= proc to =proc(s: PersistentVector[Value]): Value=
- =liftScalar[T]Proc=: =proc (s: T): T= proc to =proc(s: Value): Value=

where =T= may be =float, int, string=.
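
The idea behind lifting can be sketched without the library. In the following stand-alone example a simplified =Value= variant with only integer and float kinds, a plain =seq= instead of =PersistentVector=, and the proc =liftVectorFloat= are all assumptions made for illustration; the real lifting templates may differ in name and details.

#+BEGIN_SRC nim
# Stand-alone sketch of "lifting"; not ggplotnim's actual templates.
type
  ValueKind = enum
    vkInt, vkFloat
  Value = object            # simplified stand-in for the library's `Value`
    case kind: ValueKind
    of vkInt: intVal: int
    of vkFloat: floatVal: float
  FloatVecProc = proc(s: seq[float]): float
  ValueVecProc = proc(s: seq[Value]): Value

proc asFloat(v: Value): float =
  case v.kind
  of vkInt: result = float(v.intVal)
  of vkFloat: result = v.floatVal

proc liftVectorFloat(fn: FloatVecProc): ValueVecProc =
  ## Hypothetical helper: wraps a proc on `seq[float]` so that it accepts
  ## and returns `Value`s instead.
  result = proc(s: seq[Value]): Value =
    var raw = newSeq[float](s.len)
    for i, v in s:
      raw[i] = asFloat(v)
    return Value(kind: vkFloat, floatVal: fn(raw))

proc mean(s: seq[float]): float =
  var total = 0.0
  for x in s:
    total += x
  result = total / float(s.len)

let liftedMean = liftVectorFloat(mean)
let column = @[Value(kind: vkInt, intVal: 1),
               Value(kind: vkFloat, floatVal: 2.0),
               Value(kind: vkInt, intVal: 3)]
echo liftedMean(column).floatVal
#+END_SRC

The wrapper unpacks every =Value=, calls the original proc on the raw data and packs the result again, which is what allows an ordinary proc like =mean= to be used inside a formula.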

The =PersistentVector= is an implementation detail of the data frame at the moment and may be changed back to =seq= soon.

On the other hand if an identifier is not part of a =nnkCall= it is interpreted as a variable declared in the calling scope and will be converted to a =Value= using =%= and stored as a =fkVariable=.

Literal integer and float values are also allowed.

Each formula can be evaluated using =evaluate= and =reduce=. The available procs have the following signatures:
#+BEGIN_SRC nim
# for formulas independent of DFs, e.g. `evaluate f{1 + 2} == %~ 3`
proc evaluate*(node: FormulaNode): Value
# evaluate formula at row index `idx`. Possible calculation of a whole row
proc evaluate*(node: FormulaNode, data: DataFrame, idx: int): Value
# reduce a DF to a single `Value` based on a formula `reduce(f{mean("someCol")}, df)`
proc reduce*(node: FormulaNode, data: DataFrame): Value
# create new DF column based on formula and DF
proc evaluate*(node: FormulaNode, data: DataFrame): PersistentVector[Value]
#+END_SRC

**** DF examples

Using a lifted vector valued function and local variables as keys and integer values:
#+BEGIN_SRC nim
let val = 1000
let key = "cty"
let f = f{"cty_norm" ~ "cty" / mean(key) * val}
#+END_SRC

Using a lifted scalar valued function and local variables as keys and float literal values for a random calculation:
#+BEGIN_SRC nim
let g = f{"cty_by_2ln_hwy" ~ "cty" / (ln("hwy") * 2)}
#+END_SRC

** Examples

Consider looking at the [[file:recipes.org][recipes]] in addition to the below to get a fuller picture!

The following is a short example from the recipe section that shows multiple features:

- parsing CSV files to a DF
- performing DF operations using formulas (=f{}= syntax)
- general =ggplot= functionality
- composing multiple geoms to annotate specific datapoints

#+BEGIN_SRC nim
import ggplotnim
let df = toDf(readCsv("data/mpg.csv"))
let dfMax = df.mutate(f{"mpgMean" ~ (`cty` + `hwy`) / 2.0})
  .arrange("mpgMean")
  .tail(1)
ggplot(df, aes("hwy", "displ")) +
  geom_point(aes(color = "cty")) + # set point specific color mapping
  # Add the annotation for the car model below the point
  geom_text(data = dfMax,
            aes = aes(y = f{c"displ" - 0.2}, text = "model")) +
  # and add another annotation of the mean mpg above the point
  geom_text(data = dfMax,
            aes = aes(y = f{c"displ" + 0.2}, text = "mpgMean")) +
  theme_opaque() +
  ggsave("media/recipes/rAnnotateMaxValues.png")
#+END_SRC

[[./media/recipes/rAnnotateMaxValues.png]]

** Experimental Vega-Lite backend

From the beginning one of my goals for this library was to provide not only a Cairo backend, but also to support [[https://vega.github.io/vega-lite/][Vega-Lite]] (or possibly Vega) as a backend. Sharing plots and data online (and possibly adding support for interactive features) is much easier that way.

An experimental version is implemented in [[https://github.com/Vindaar/ggplotnim/blob/master/src/ggplotnim/ggplot_vega.nim][ggplot_vega.nim]], which provides most of the functionality of the native backend, with the exception of support for faceted plots.

See the [[https://github.com/Vindaar/ggplotnim/blob/master/recipes.org#simple-vega-lite-example][full example in the recipe here]].

Creating a vega plot is done by also importing the =ggplot_vega= submodule and then just replacing a =ggsave= call by a =ggvega= call:
#+begin_src nim
import ggplotnim
import ggplotnim/ggplot_vega
let mpg = toDf(readCsv("data/mpg.csv"))
ggplot(mpg, aes(x = "displ", y = "cty", color = "class")) +
  geom_point() +
  ggtitle("ggplotnim in Vega-Lite!") +
  ggvega("media/recipes/rSimpleVegaLite.html") # w/o arg creates a /tmp/vega_lite_plot.html
#+end_src

This recipe gives us the following plot:

[[media/recipes/rSimpleVegaLite.png]]

To view it as an interactive plot in the Vega viewer, [[https://vega.github.io/editor/?#/gist/0bef3ed0cf7c6d26da927732f1c81582/rSimpleVegaLite.json][click here]].
