ajcr / Rolling
Programming Languages
Projects that are alternatives of or similar to Rolling
rolling
A collection of computationally efficient rolling window iterators and operations for Python.
This module implements useful arithmetical, logical and statistical functions on rolling/moving/sliding windows, including Sum, Min, Max, Median and Standard Deviation. There's also a flexible 'Apply' iterator whereby any function can be applied to the window. Both fixed-length and variable-length window iteration is supported.
To get started, see the Quickstart section below, or have a look at the some Recipes.
Overview
Suppose we have a list of integer values x
and we want to find the maximum of each window of 3 values. We could use Python's max()
function and write [max(x[i:i+3]) for i in range(len(x) - 2)]
to do this, for example.
But applying builtin Python's functions (like max()
and sum()
) to a window becomes increasingly slow as the window gets larger. The complexity is typically linear (i.e. O(k) where k is the size of the window).
For for many operations there are algorithms that return the value for the next window in sublinear time (e.g. O(log k)) or constant time (i.e. O(1)). The algorithms implemented so far in this module are summarised below:
Operation | Update | Memory | Comments |
---|---|---|---|
Sum | O(1) | O(k) | Sum of window values |
Product | O(1) | O(k) | Product of window values |
Nunique | O(1) | O(k) | Number of unique window values |
Mean | O(1) | O(k) | Arithmetic mean of window values |
Median | O(log k) | O(k) | Median, uses an indexable skiplist to maintain sorted order |
Mode | O(1) | O(k) | Set of most common values, tracked using a bi-directional counter |
Var | O(1) | O(k) | Variance, uses Welford's algorithm for better numerical stability |
Std | O(1) | O(k) | Standard deviation, uses Welford's algorithm |
Skew | O(1) | O(k) | Skewness of the window |
Kurtosis | O(1) | O(k) | Kurtosis of the window |
Any | O(1) | O(1) | True if any value in the window is True, else False |
All | O(1) | O(1) | True if all values in the window are True, else False |
Min | O(1) | O(k) | Minimum value, tracks ascending minima using a deque |
MinHeap | O(1) | O(k) | Minimum value, tracks ascending minima using a heap |
Max | O(1) | O(k) | Maximum value, tracks descending maxima using a deque |
Entropy | O(1) | O(k) | Shannon entropy of the window (for fixed-size windows only) |
See the References section below for more details about the algorithms and links to other resources.
Installation
You can install the latest release of the module using pip:
pip install rolling
Alternatively, you can install from source on GitHub to include the very latest changes. For example:
git clone https://github.com/ajcr/rolling.git
cd rolling/
pip install .
There are no external library dependencies for running this module.
The module is tested with Python 3.5 and above, and Python 3.4 is also known to work. Python 2 is not currently supported.
If you want to run the tests you'll need to install pytest. Once done, just run pytest
from the base directory.
Quickstart
Import the rolling
module:
>>> import rolling
Now suppose we have this list:
>>> counts = [1, 5, 2, 0, 3]
The rolling
module allows us to can create an iterator object over this list that performs a reduction operation for a given window size (we'll use a window size of 3 for this example).
Now let's create three iterator objects that roll over this list, each performing a different operation:
>>> r_sum = rolling.Sum(counts, 3)
>>> r_all = rolling.All(counts, 3)
>>> r_max = rolling.Max(counts, 3)
Here's the representation of the rolling.Sum
iterator object. Note that the window type is 'fixed' by default, meaning that only full windows of the specified size are computed (the window does not roll on/off the list):
>>> r_sum
Rolling(operation='Sum', window_size=3, window_type='fixed')
The result of iterating over each of these rolling iterator objects using list()
is shown below.
>>> list(r_sum)
[8, 7, 5] # i.e. [1+5+2, 5+2+0, 2+0+3]
>>> list(r_all)
[True, False, False]
>>> list(r_max)
[5, 5, 3]
As well as the built-in efficient algorithms provided by this module, any callable Python object can be applied to a rolling window using the Apply()
class. For instance, Python's tuple()
function:
>>> r_list = rolling.Apply(counts, 3, operation=tuple)
>>> list(r_list)
[(1, 5, 2), (5, 2, 0), (2, 0, 3)]
Variable-length windows can be specified using the window_type
argument. This allows windows smaller than the specified size to be evaluated at the beginning and end of the iterable. For instance:
>>> r_list = rolling.Apply([1, 5, 2, 0, 3], 3, operation=list, window_type='variable')
>>> list(r_list)
[[1],
[1, 5],
[1, 5, 2],
[5, 2, 0],
[2, 0, 3],
[0, 3],
[3]]
References and resources
Some rolling algorithms are widely known (e.g. 'Sum') and I am not sure which source to cite. Some algorithms I made up as I was putting the module together (e.g. 'Any', 'All'), but these are relatively simple and probably exist elsewhere.
Other rolling algorithms are very cleverly designed and I learned a lot by reading about them and seeing other peoples' implementations. Here are the main resources that I used:
-
Max and Min are implemented using the Ascending Minima and Descending Maxima algorithms described by Richard Harter here. This algorithm is also used in pandas and bottleneck. My attention was first drawn to this algorithm by Jaime Fernandez del Rio's excellent talk The Secret Life Of Rolling Pandas. The algorithm is also described by Keegan Carruthers-Smith here, along with code examples.
-
Median uses the indexable skiplist approach presented by Raymond Hettinger here.
-
Var and Std use Welford's algorithm. I referred to the rolling variance implementation in pandas as well as an older edit of the Wikipedia page Algorithms for calculating variance.
Discussion and future work
The algorithms implemented by this module are chosen to be efficient in the sense that the cost of computing each new return value scales efficiently with the size of window.
In practice you might find that it is quicker not to use the the 'efficient' algorithm, and instead apply a function to the window. This is especially true for very small window sizes where the cost of updating a window is relatively complex. For instance, while the window size k
is less than approximately 50, it may quicker to use rolling.Apply(array, k, min)
(apply Python's builtin minimum function min()
) rather than using rolling.Min(array, k)
.
With this in mind, in future it might be worth implementing some of the algorithms here in compiled code (e.g. as a C extension module, or using Cython) to improve speed.