All Projects → ajcr → Rolling

ajcr / Rolling

Licence: mit
Computationally efficient rolling window iterators for Python (including sum, min/max, median and more...)

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Rolling

Tstl
TypeScript-STL (Standard Template Library, migrated from C++)
Stars: ✭ 397 (+151.27%)
Mutual labels:  algorithm, iterator
Tbox
🎁 A glib-like multi-platform c library
Stars: ✭ 3,800 (+2305.06%)
Mutual labels:  algorithm, iterator
Competitive Programming
VastoLorde95's solutions to 2000+ competitive programming problems from various online judges
Stars: ✭ 147 (-6.96%)
Mutual labels:  algorithm
Jaro winkler
Ruby & C implementation of Jaro-Winkler distance algorithm which supports UTF-8 string.
Stars: ✭ 156 (-1.27%)
Mutual labels:  algorithm
Wechart
Create all the [ch]arts by cax or three.js - Cax 和 three.js 创造一切图[表]
Stars: ✭ 152 (-3.8%)
Mutual labels:  algorithm
Algorithm
算法和数据结构练习(Leetcode)
Stars: ✭ 148 (-6.33%)
Mutual labels:  algorithm
Ebooks
A repository for ebooks, including C, C plus plus, Linux Kernel, Compiler, OS, Algorithm, Security, Database, Network, ML and DL
Stars: ✭ 151 (-4.43%)
Mutual labels:  algorithm
Scene Text Recognition
Scene text detection and recognition based on Extremal Region(ER)
Stars: ✭ 146 (-7.59%)
Mutual labels:  algorithm
C Plus Plus
Collection of various algorithms in mathematics, machine learning, computer science and physics implemented in C++ for educational purposes.
Stars: ✭ 17,151 (+10755.06%)
Mutual labels:  algorithm
Bytecount
Counting occurrences of a given byte or UTF-8 characters in a slice of memory – fast
Stars: ✭ 150 (-5.06%)
Mutual labels:  algorithm
Algorithms Js
Consumable Data Structures and Algorithms library in JavaScript
Stars: ✭ 155 (-1.9%)
Mutual labels:  algorithm
Tradzqai
Trading environnement for RL agents, backtesting and training.
Stars: ✭ 150 (-5.06%)
Mutual labels:  algorithm
Learn Python
Python Top 45 Articles of 2017
Stars: ✭ 148 (-6.33%)
Mutual labels:  algorithm
Algorithms
The codes and my solutions to exercises from the book "Algorithms" (4th edition) by Robert Sedgewick and Kevin Wayne.
Stars: ✭ 2,001 (+1166.46%)
Mutual labels:  algorithm
Data structures and algorithms in python
📖 Worked Solutions of "Data Structures & Algorithms in Python", written by Michael T. Goodrich, Roberto Tamassia and Michael H. Goldwasser. ✏️
Stars: ✭ 147 (-6.96%)
Mutual labels:  algorithm
Data Structures Algorithms
Your personal library of every algorithm and data structure code that you will ever encounter
Stars: ✭ 157 (-0.63%)
Mutual labels:  algorithm
Data Structure And Algorithms
Every thing related to data structure and algorithms.
Stars: ✭ 146 (-7.59%)
Mutual labels:  algorithm
Leetcode
LeetCode Problems' Solutions
Stars: ✭ 150 (-5.06%)
Mutual labels:  algorithm
Graphview
Flutter GraphView is used to display data in graph structures. It can display Tree layout, Directed and Layered graph. Useful for Family Tree, Hierarchy View.
Stars: ✭ 152 (-3.8%)
Mutual labels:  algorithm
Pythonrobotics
Python sample codes for robotics algorithms.
Stars: ✭ 13,934 (+8718.99%)
Mutual labels:  algorithm

rolling

PyPI version travis-ci codecov

A collection of computationally efficient rolling window iterators and operations for Python.

This module implements useful arithmetical, logical and statistical functions on rolling/moving/sliding windows, including Sum, Min, Max, Median and Standard Deviation. There's also a flexible 'Apply' iterator whereby any function can be applied to the window. Both fixed-length and variable-length window iteration is supported.

To get started, see the Quickstart section below, or have a look at the some Recipes.

Overview

Suppose we have a list of integer values x and we want to find the maximum of each window of 3 values. We could use Python's max() function and write [max(x[i:i+3]) for i in range(len(x) - 2)] to do this, for example.

But applying builtin Python's functions (like max() and sum()) to a window becomes increasingly slow as the window gets larger. The complexity is typically linear (i.e. O(k) where k is the size of the window).

For for many operations there are algorithms that return the value for the next window in sublinear time (e.g. O(log k)) or constant time (i.e. O(1)). The algorithms implemented so far in this module are summarised below:

Operation Update Memory Comments
Sum O(1) O(k) Sum of window values
Product O(1) O(k) Product of window values
Nunique O(1) O(k) Number of unique window values
Mean O(1) O(k) Arithmetic mean of window values
Median O(log k) O(k) Median, uses an indexable skiplist to maintain sorted order
Mode O(1) O(k) Set of most common values, tracked using a bi-directional counter
Var O(1) O(k) Variance, uses Welford's algorithm for better numerical stability
Std O(1) O(k) Standard deviation, uses Welford's algorithm
Skew O(1) O(k) Skewness of the window
Kurtosis O(1) O(k) Kurtosis of the window
Any O(1) O(1) True if any value in the window is True, else False
All O(1) O(1) True if all values in the window are True, else False
Min O(1) O(k) Minimum value, tracks ascending minima using a deque
MinHeap O(1) O(k) Minimum value, tracks ascending minima using a heap
Max O(1) O(k) Maximum value, tracks descending maxima using a deque
Entropy O(1) O(k) Shannon entropy of the window (for fixed-size windows only)

See the References section below for more details about the algorithms and links to other resources.

Installation

You can install the latest release of the module using pip:

pip install rolling

Alternatively, you can install from source on GitHub to include the very latest changes. For example:

git clone https://github.com/ajcr/rolling.git
cd rolling/
pip install .

There are no external library dependencies for running this module.

The module is tested with Python 3.5 and above, and Python 3.4 is also known to work. Python 2 is not currently supported.

If you want to run the tests you'll need to install pytest. Once done, just run pytest from the base directory.

Quickstart

Import the rolling module:

>>> import rolling

Now suppose we have this list:

>>> counts = [1, 5, 2, 0, 3]

The rolling module allows us to can create an iterator object over this list that performs a reduction operation for a given window size (we'll use a window size of 3 for this example).

Now let's create three iterator objects that roll over this list, each performing a different operation:

>>> r_sum = rolling.Sum(counts, 3)
>>> r_all = rolling.All(counts, 3)
>>> r_max = rolling.Max(counts, 3)

Here's the representation of the rolling.Sum iterator object. Note that the window type is 'fixed' by default, meaning that only full windows of the specified size are computed (the window does not roll on/off the list):

>>> r_sum
Rolling(operation='Sum', window_size=3, window_type='fixed')

The result of iterating over each of these rolling iterator objects using list() is shown below.

>>> list(r_sum)
[8, 7, 5] # i.e. [1+5+2, 5+2+0, 2+0+3]

>>> list(r_all)
[True, False, False]

>>> list(r_max)
[5, 5, 3]

As well as the built-in efficient algorithms provided by this module, any callable Python object can be applied to a rolling window using the Apply() class. For instance, Python's tuple() function:

>>> r_list = rolling.Apply(counts, 3, operation=tuple)
>>> list(r_list)
[(1, 5, 2), (5, 2, 0), (2, 0, 3)]

Variable-length windows can be specified using the window_type argument. This allows windows smaller than the specified size to be evaluated at the beginning and end of the iterable. For instance:

>>> r_list = rolling.Apply([1, 5, 2, 0, 3], 3, operation=list, window_type='variable')
>>> list(r_list)
[[1],
 [1, 5],
 [1, 5, 2],
 [5, 2, 0],
 [2, 0, 3],
 [0, 3],
 [3]]

References and resources

Some rolling algorithms are widely known (e.g. 'Sum') and I am not sure which source to cite. Some algorithms I made up as I was putting the module together (e.g. 'Any', 'All'), but these are relatively simple and probably exist elsewhere.

Other rolling algorithms are very cleverly designed and I learned a lot by reading about them and seeing other peoples' implementations. Here are the main resources that I used:

  • Max and Min are implemented using the Ascending Minima and Descending Maxima algorithms described by Richard Harter here. This algorithm is also used in pandas and bottleneck. My attention was first drawn to this algorithm by Jaime Fernandez del Rio's excellent talk The Secret Life Of Rolling Pandas. The algorithm is also described by Keegan Carruthers-Smith here, along with code examples.

  • Median uses the indexable skiplist approach presented by Raymond Hettinger here.

  • Var and Std use Welford's algorithm. I referred to the rolling variance implementation in pandas as well as an older edit of the Wikipedia page Algorithms for calculating variance.

Discussion and future work

The algorithms implemented by this module are chosen to be efficient in the sense that the cost of computing each new return value scales efficiently with the size of window.

In practice you might find that it is quicker not to use the the 'efficient' algorithm, and instead apply a function to the window. This is especially true for very small window sizes where the cost of updating a window is relatively complex. For instance, while the window size k is less than approximately 50, it may quicker to use rolling.Apply(array, k, min) (apply Python's builtin minimum function min()) rather than using rolling.Min(array, k).

With this in mind, in future it might be worth implementing some of the algorithms here in compiled code (e.g. as a C extension module, or using Cython) to improve speed.

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].