dcherian / flox

License: Apache-2.0

Fast & furious GroupBy operations for dask.array

Programming Languages

python

Projects that are alternatives of or similar to flox

Xarray

N-D labeled arrays and datasets in Python

Stars: ✭ 2,353 (+5502.38%)

Mutual labels: xarray, dask

xarray-beam

Distributed Xarray with Apache Beam

Stars: ✭ 83 (+97.62%)

Mutual labels: xarray, dask

esmlab

Earth System Model Lab (esmlab). ⚠️⚠️ ESMLab functionality has been moved into <https://github.com/NCAR/geocat-comp>. ⚠️⚠️

Stars: ✭ 23 (-45.24%)

Mutual labels: xarray, dask

aospy

Python package for automated analysis and management of gridded climate data

Stars: ✭ 80 (+90.48%)

Mutual labels: xarray

dvc dask use case

A use case of a reproducible machine learning pipeline using Dask, DVC, and MLflow.

Stars: ✭ 22 (-47.62%)

Mutual labels: dask

spyndex

Awesome Spectral Indices in Python.

Stars: ✭ 56 (+33.33%)

Mutual labels: xarray

pypar

Efficient and scalable parallelism using the message passing interface (MPI) to handle big data and highly computational problems.

Stars: ✭ 66 (+57.14%)

Mutual labels: map-reduce

FoldsCUDA.jl

Data-parallelism on CUDA using Transducers.jl and for loops (FLoops.jl)

Stars: ✭ 48 (+14.29%)

Mutual labels: map-reduce

gcpy

Python toolkit for GEOS-Chem.

Stars: ✭ 34 (-19.05%)

Mutual labels: xarray

climate system

Notes and practicals for my "Physics of the Climate System" lecture

Stars: ✭ 13 (-69.05%)

Mutual labels: xarray

future.mapreduce

[EXPERIMENTAL] R package: future.mapreduce - Utility Functions for Future Map-Reduce API Packages

Stars: ✭ 12 (-71.43%)

Mutual labels: map-reduce

xpublish

Publish Xarray Datasets via a REST API.

Stars: ✭ 86 (+104.76%)

Mutual labels: xarray

php-uavt-adreskodu-botu

UAVT address code bot written in PHP

Stars: ✭ 2 (-95.24%)

Mutual labels: dask

clisops

Climate Simulation Operations

Stars: ✭ 17 (-59.52%)

Mutual labels: xarray

hypothesis-gufunc

Extension to hypothesis for testing numpy general universal functions

Stars: ✭ 32 (-23.81%)

Mutual labels: xarray

madpy-dask

MadPy Dask talk materials

Stars: ✭ 33 (-21.43%)

Mutual labels: dask

restee

Python package to call processed EE objects via the REST API to local data

Stars: ✭ 26 (-38.10%)

Mutual labels: xarray

arboreto

A scalable python-based framework for gene regulatory network inference using tree-based ensemble regressors.

Stars: ✭ 33 (-21.43%)

Mutual labels: dask

open-soql

Open source implementation of the SOQL.

Stars: ✭ 15 (-64.29%)

Mutual labels: map-reduce

dask-rasterio

Read and write rasters in parallel using Rasterio and Dask

Stars: ✭ 82 (+95.24%)

Mutual labels: dask

flox

This project explores strategies for fast GroupBy reductions with dask.array. It used to be called dask_groupby. It was motivated by:

Dask Dataframe GroupBy blogpost
numpy_groupies in Xarray issue

(See a presentation about this package, from the Pangeo Showcase).

Acknowledgements

This work was funded in part by NASA-ACCESS 80NSSC18M0156 "Community tools for analysis of NASA Earth Observing System Data in the Cloud" (PI J. Hamman), and NCAR's Earth System Data Science Initiative. It was motivated by a great many discussions in the Pangeo community.

API

There are two main functions:

flox.groupby_reduce(dask_array, by_dask_array, "mean"): the "pure" dask array interface
flox.xarray.xarray_reduce(xarray_object, by_dataarray, "mean"): the "pure" xarray interface, though work is ongoing to integrate this package into xarray.

Implementation

See the documentation for details on the implementation.

Custom reductions

flox implements all common reductions provided by numpy_groupies in aggregations.py. It also allows you to specify a custom Aggregation (again inspired by dask.dataframe), though this might not be fully functional at the moment. See aggregations.py for examples.

    import numpy as np
    from flox.aggregations import Aggregation

    mean = Aggregation(
        # name used for dask tasks
        name="mean",
        # operation to use for pure-numpy inputs
        numpy="mean",
        # blockwise reduction: per-group sum and count within each block
        chunk=("sum", "count"),
        # combine intermediate results: sum the sums, sum the counts
        combine=("sum", "sum"),
        # generate final result as sum / count
        finalize=lambda sum_, count: sum_ / count,
        # Used when "reindexing" at combine-time
        fill_value=0,
        # Used when any member of `expected_groups` is not found
        final_fill_value=np.nan,
    )
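To make the chunk / combine / finalize stages above concrete, the following is a minimal pure-NumPy sketch of the same three-step mean. It is not flox's implementation: the helper names (chunk_sum_count, combine, finalize), the sample data, and the use of np.bincount are all assumptions made for illustration, and two array slices stand in for dask blocks.

```python
import numpy as np

def chunk_sum_count(values, labels, ngroups):
    """Blockwise step: per-group sum and count for one block.

    Groups absent from the block get 0, mirroring fill_value=0.
    """
    sums = np.bincount(labels, weights=values, minlength=ngroups)
    counts = np.bincount(labels, minlength=ngroups)
    return sums, counts

def combine(intermediates):
    """Combine step: sum the sums, sum the counts across blocks."""
    sums = np.sum([s for s, _ in intermediates], axis=0)
    counts = np.sum([c for _, c in intermediates], axis=0)
    return sums, counts

def finalize(sums, counts, final_fill_value=np.nan):
    """Finalize step: sum / count, with a fill for groups never seen."""
    out = np.full(sums.shape, final_fill_value, dtype=float)
    np.divide(sums, counts, out=out, where=counts > 0)
    return out

values = np.array([1.0, 2.0, 3.0, 4.0, 6.0, 8.0])
labels = np.array([0, 0, 1, 1, 0, 1])
ngroups = 3  # group 2 is expected but never appears in the data

# Two slices standing in for dask blocks.
blocks = [(values[:3], labels[:3]), (values[3:], labels[3:])]
intermediates = [chunk_sum_count(v, l, ngroups) for v, l in blocks]
result = finalize(*combine(intermediates))
print(result)
```

Here groups 0 and 1 have means 3.0 and 5.0, and the missing group 2 receives the final fill value (NaN), which is the role final_fill_value plays in the Aggregation above.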
