
jkrumbiegel / Chain.jl

License: MIT
A Julia package for piping a value through a series of transformation expressions using a more convenient syntax than Julia's native piping functionality.

Chain.jl

The same DataFrames.jl pipeline written with Chain.jl, Base Julia, Pipe.jl and Lazy.jl:

Chain.jl:

@chain df begin
  dropmissing
  filter(:id => >(6), _)
  groupby(:group)
  combine(:age => sum)
end

Base Julia:

df |>
  dropmissing |>
  x -> filter(:id => >(6), x) |>
  x -> groupby(x, :group) |>
  x -> combine(x, :age => sum)

Pipe.jl:

@pipe df |>
  dropmissing |>
  filter(:id => >(6), _) |>
  groupby(_, :group) |>
  combine(_, :age => sum)

Lazy.jl:

@> df begin
  dropmissing
  x -> filter(:id => >(6), x)
  groupby(:group)
  combine(:age => sum)
end

Summary

Chain.jl defines the @chain macro. It takes a start value and a begin ... end block of expressions.

The result of each expression is fed into the next one using one of two rules:

  1. There is at least one underscore in the expression:
     • every _ is replaced with the result of the previous expression.
  2. There is no underscore:
     • the result of the previous expression is used as the first argument in the current expression, as long as it is a function call, a macro call or a symbol representing a function.
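A minimal sketch of both rules on a plain vector, assuming Chain.jl is installed; the input values and the name result are illustrative and not taken from the original examples:

```julia
using Chain

result = @chain [3, 1, 2] begin
    sort              # no underscore: becomes sort([3, 1, 2]), giving [1, 2, 3]
    map(x -> x^2, _)  # underscore: becomes map(x -> x^2, [1, 2, 3]), giving [1, 4, 9]
    sum               # symbol: becomes sum([1, 4, 9]), giving 14
end
```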

Lines prefixed with @aside are executed, but their result is not passed on to the next pipeline step. This is useful, for example, for inspecting the pipeline state while debugging.

Motivation

  • The implicit first argument insertion is useful for many data pipeline scenarios, like groupby, transform and combine in DataFrames.jl
  • The _ syntax is there either to increase legibility or to use functions like filter or map, which need the previous result as the second argument
  • There is no need to type |> over and over
  • Any line can be commented out or in without breaking syntax; there is no problem with dangling |> symbols
  • The state of the pipeline can easily be checked with the @aside macro
  • The begin ... end block marks very clearly where the macro is applied and works well with auto-indentation
  • Because everything is just lines with separate expressions and not one huge function call, IDEs can show exactly which line an error happened in
  • Pipe is also a name defined by Base Julia, which can lead to conflicts

Example

An example with a DataFrame:

using DataFrames, Chain

df = DataFrame(group = [1, 2, 1, 2, missing], weight = [1, 3, 5, 7, missing])

result = @chain df begin
    dropmissing
    filter(r -> r.weight < 6, _)
    groupby(:group)
    combine(:weight => sum => :total_weight)
end

The pipeless block is equivalent to this:

result = let
    var1 = dropmissing(df)
    var2 = filter(r -> r.weight < 6, var1)
    var3 = groupby(var2, :group)
    var4 = combine(var3, :weight => sum => :total_weight)
end
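To check that the chain and its hand-written expansion agree, here is a small sketch, assuming DataFrames.jl and Chain.jl are installed; the names chained and expanded are illustrative. The expected totals are worked out by hand from the df above: after dropping the missing row and keeping weights below 6, group 1 keeps weights 1 and 5 and group 2 keeps weight 3.

```julia
using DataFrames, Chain

df = DataFrame(group = [1, 2, 1, 2, missing], weight = [1, 3, 5, 7, missing])

chained = @chain df begin
    dropmissing
    filter(r -> r.weight < 6, _)
    groupby(:group)
    combine(:weight => sum => :total_weight)
end

# The same steps written as nested calls, innermost step first
expanded = combine(groupby(filter(r -> r.weight < 6, dropmissing(df)), :group),
                   :weight => sum => :total_weight)

println(chained == expanded)   # both versions produce the same DataFrame
println(chained.total_weight)  # group 1: 1 + 5 = 6, group 2: 3
```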

Alternative one-argument syntax

If the name of the initial value is long and/or the chain's result is assigned to a variable with a long name, it can look cleaner to move the initial value into the chain. Here is such a long expression:

a_long_result_variable_name = @chain a_long_input_variable_name begin
    do_something
    do_something_else(parameter)
    do_other_thing(parameter, _)
end

This is equivalent to the following expression:

a_long_result_variable_name = @chain begin
    a_long_input_variable_name
    do_something
    do_something_else(parameter)
    do_other_thing(parameter, _)
end
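A runnable sketch comparing the two forms, assuming Chain.jl is installed; input_values, form_a and form_b are illustrative names standing in for the long variable names above:

```julia
using Chain

input_values = [4, 9, 16]

# Start value given before the begin ... end block
form_a = @chain input_values begin
    map(sqrt, _)   # [2.0, 3.0, 4.0]
    sum            # 9.0
end

# Start value moved into the block as its first line
form_b = @chain begin
    input_values
    map(sqrt, _)
    sum
end
```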

One-liner syntax

You can also use @chain as a one-liner, where no begin ... end block is needed. This works well for short sequences that are still easy to parse visually without being on separate lines.

@chain 1:10 filter(isodd, _) sum sqrt
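As a quick check of what the one-liner computes, this sketch (assuming Chain.jl is installed; chained and manual are illustrative names) compares it with the equivalent nested call:

```julia
using Chain

chained = @chain 1:10 filter(isodd, _) sum sqrt

# Manual expansion, innermost step first:
# filter(isodd, 1:10) gives [1, 3, 5, 7, 9], sum gives 25, sqrt gives 5.0
manual = sqrt(sum(filter(isodd, 1:10)))
```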

The @aside macro

For debugging, it's often useful to look at values in the middle of a pipeline. You can use the @aside macro to mark expressions that should not pass on their result. If such an expression contains no _, no implicit first argument is spliced in, because that would be impractical for most purposes.

If, for example, we wanted to know how many groups were created after step 3, we could do this:

result = @chain df begin
    dropmissing
    filter(r -> r.weight < 6, _)
    groupby(:group)
    @aside println("There are $(length(_)) groups after step 3.")
    combine(:weight => sum => :total_weight)
end

This is again equivalent to:

result = let
    var1 = dropmissing(df)
    var2 = filter(r -> r.weight < 6, var1)
    var3 = groupby(var2, :group)
    println("There are $(length(var3)) groups after step 3.")
    var4 = combine(var3, :weight => sum => :total_weight)
end
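The same idea without DataFrames, as a minimal sketch assuming Chain.jl is installed (total is an illustrative name). The @aside line prints the intermediate vector, and the vector itself flows on to sum unchanged:

```julia
using Chain

total = @chain 1:5 begin
    map(x -> 2x, _)                      # [2, 4, 6, 8, 10]
    @aside println("intermediate: ", _)  # printed only; the result is not passed on
    sum                                  # sum([2, 4, 6, 8, 10])
end
```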

Nested Chains

The @chain macro replaces all underscores in the following block, unless it encounters another @chain macro call. In that case, the only underscore that the outer macro still replaces is the one passed as the first argument of the inner @chain. You can use this, for example, together with the @aside macro if you need to process a side result further.

@chain df begin
    dropmissing
    filter(r -> r.weight < 6, _)
    @aside @chain _ begin
            select(:group)
            CSV.write("filtered_groups.csv", _)
        end
    groupby(:group)
    combine(:weight => sum => :total_weight)
end
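A dependency-free sketch of the same nesting pattern, assuming Chain.jl is installed. Instead of writing a CSV file, the inner chain stores a transformed copy of the intermediate values in side_results, an illustrative name, so you can see which underscore belongs to which chain:

```julia
using Chain

side_results = Int[]   # collects the side result computed inside @aside

total = @chain 1:6 begin
    filter(iseven, _)             # [2, 4, 6]
    @aside @chain _ begin         # the outer chain replaces this _ with [2, 4, 6]
        map(x -> x * 10, _)       # this _ belongs to the inner chain: [20, 40, 60]
        append!(side_results, _)  # store the side result
    end
    sum                           # still operates on [2, 4, 6]
end
```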

Rewriting Rules

Here is a list of equivalent expressions, where _ is replaced by prev and the new variable is next. In reality, each new variable simply gets a new name via gensym, which is guaranteed not to conflict with anything else.

Before | After | Comment
sum | next = sum(prev) | Symbol gets expanded into function call
sum() | next = sum(prev) | First argument is inserted
sum(_) | next = sum(prev) | Call expression gets _ replaced
_ + 3 | next = prev + 3 | Infix call expressions work the same way as other calls
+(3) | next = prev + 3 | Infix notation with _ would look better, but this is also possible
1 + 2 | next = prev + 1 + 2 | This might feel weird, but 1 + 2 is a normal call expression
filter(isodd, _) | next = filter(isodd, prev) | Underscore can go anywhere
@aside println(_) | println(prev) | println without affecting the pipeline; using _
@aside println("hello") | println("hello") | println without affecting the pipeline; no implicit first arg
@. sin | next = sin.(prev) | Special-cased alternative to sin.()
sin.() | next = sin.(prev) | First argument is prepended for broadcast calls as well
somefunc.(x) | next = somefunc.(prev, x) | First argument is prepended for broadcast calls as well
@somemacro | next = @somemacro(prev) | Macro calls without arguments get an argument spliced in
@somemacro(x) | next = @somemacro(prev, x) | First argument splicing is the same as with functions
@somemacro(x, _) | next = @somemacro(x, prev) | Underscore replacement is the same as with functions
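Several of these rules can be seen in a single chain. This is a sketch assuming Chain.jl is installed; the input vector and the name r are illustrative, and each comment names the table row that applies:

```julia
using Chain

r = @chain [1.0, 4.0, 9.0] begin
    sqrt.()            # broadcast call, prev prepended: sqrt.(prev) gives [1.0, 2.0, 3.0]
    _ .+ 1             # underscore replaced: prev .+ 1 gives [2.0, 3.0, 4.0]
    map(x -> x^2, _)   # underscore as second argument: [4.0, 9.0, 16.0]
    sum                # symbol expanded: sum(prev) gives 29.0
    +(1)               # first argument inserted: prev + 1 gives 30.0
end
```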