ritchie46 / Polars
Polars
Blazingly fast DataFrames in Rust & Python
Polars is a blazingly fast DataFrames library implemented in Rust. Its memory model uses Apache Arrow as its backend.
It currently consists of an eager API similar to pandas and a lazy API that is somewhat similar to Spark. Among other things, Polars supports the functionality listed in the table below.
To learn more about the inner workings of Polars, read the WIP book.
Python users read this!
Polars is currently transitioning from `py-polars` to `polars`. Some docs may still refer to the old name.
We're working towards a new `0.7.0` release. In the meantime, install a pre-release version, which will likely be more stable than `0.6.7`.
Install the latest pre-release version with:
$ pip install polars==0.7.0-beta.2
Functionality | Eager | Lazy (DataFrame) | Lazy (Series) |
---|---|---|---|
Filters | ✔ | ✔ | ✔ |
Shifts | ✔ | ✔ | ✔ |
Joins | ✔ | ✔ | |
GroupBys + aggregations | ✔ | ✔ | |
Comparisons | ✔ | ✔ | ✔ |
Arithmetic | ✔ | ✔ | |
Sorting | ✔ | ✔ | ✔ |
Reversing | ✔ | ✔ | ✔ |
Closure application (User Defined Functions) | ✔ | ✔ | |
SIMD | ✔ | ✔ | |
Pivots | ✔ | ✗ | |
Melts | ✔ | ✗ | |
Filling nulls + fill strategies | ✔ | ✗ | ✔ |
Aggregations | ✔ | ✔ | ✔ |
Moving Window aggregates | ✔ | ✗ | ✗ |
Find unique values | ✔ | ✗ | |
Rust iterators | ✔ | ✔ | |
IO (csv, json, parquet, Arrow IPC) | ✔ | ✗ | |
Query optimization: (predicate pushdown) | ✗ | ✔ | |
Query optimization: (projection pushdown) | ✗ | ✔ | |
Query optimization: (type coercion) | ✗ | ✔ | |
Query optimization: (simplify expressions) | ✗ | ✔ | |
Query optimization: (aggregate pushdown) | ✗ | ✔ | |
Note that almost all eager operations supported on `Series`/`ChunkedArray`s can be used in the lazy API via UDFs.
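To give a feel for what a lazy optimization such as predicate pushdown does, here is a toy sketch in plain Python. It is not the Polars API and not how Polars implements it; `Scan`, `Filter`, and `push_down_predicates` are hypothetical names made up for this illustration. The idea is that a lazy query is first recorded as a plan, and the plan is rewritten before anything is executed.

```python
# Toy illustration of predicate pushdown; this is NOT the Polars API.
# All class and function names here are hypothetical.
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Row = Dict[str, int]


@dataclass
class Scan:
    """Leaf node: reads rows from a source, optionally filtering while reading."""
    rows: List[Row]
    predicate: Optional[Callable[[Row], bool]] = None

    def execute(self) -> List[Row]:
        if self.predicate is None:
            return list(self.rows)
        return [r for r in self.rows if self.predicate(r)]


@dataclass
class Filter:
    """Plan node: keeps only the rows of its source that satisfy the predicate."""
    source: object
    predicate: Callable[[Row], bool]

    def execute(self) -> List[Row]:
        return [r for r in self.source.execute() if self.predicate(r)]


def push_down_predicates(plan):
    """Rewrite Filter(Scan) into a single Scan that filters while reading."""
    if (
        isinstance(plan, Filter)
        and isinstance(plan.source, Scan)
        and plan.source.predicate is None
    ):
        return Scan(plan.source.rows, plan.predicate)
    return plan


rows = [{"a": 1}, {"a": 5}, {"a": 9}]
lazy_plan = Filter(Scan(rows), lambda r: r["a"] > 3)  # only a plan; nothing runs yet
optimized = push_down_predicates(lazy_plan)           # plan is rewritten before execution
result = optimized.execute()                          # rows are filtered at the scan
```

The optimized plan returns the same rows as the original one; the difference is that the filter is applied while reading, so fewer rows need to be materialized.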
Documentation
Want to know about all the features Polars supports? Read the docs!
Rust
Python
- installation guide: `pip install polars`
- the book
- Reference guide
Performance
Polars is written to be performant, and it is! But don't take my word for it; take a look at the results in h2oai's db-benchmark.
Cargo Features
Additional cargo features:
- `temporal` (default)
  - Conversions between Chrono and Polars for temporal data
- `simd` (nightly)
  - SIMD operations
- `parquet`
  - Read Apache Parquet format
- `json`
  - JSON serialization
- `ipc`
  - Arrow's IPC format serialization
- `random`
  - Generate arrays with randomly sampled values
- `ndarray`
  - Convert from `DataFrame` to `ndarray`
- `lazy`
  - Lazy API
- `strings`
  - String utilities for `Utf8Chunked`
- `object`
  - Support for generic ChunkedArrays called `ObjectChunked<T>` (generic over `T`). These will be downcastable from Series through the Any trait.
- `parallel`
  - ChunkedArrays can be used by `rayon::par_iter()`
- `[plain_fmt | pretty_fmt]` (mutually exclusive)
  - One of them should be chosen to format DataFrames. `pretty_fmt` can deal with overflowing cells and looks nicer but has more dependencies. `plain_fmt` (default) is plain formatting.
Contribution
Want to contribute? Read our contribution guideline.
ENV vars
- `POLARS_PAR_SORT_BOUND` -> sets the lower bound of rows at which Polars will use a parallel sorting algorithm. Default is 1M rows.
- `POLARS_FMT_MAX_COLS` -> maximum number of columns shown when formatting DataFrames.
- `POLARS_FMT_MAX_ROWS` -> maximum number of rows shown when formatting DataFrames.
- `POLARS_TABLE_WIDTH` -> width of the tables used during DataFrame formatting.
- `POLARS_MAX_THREADS` -> maximum number of threads used in the join algorithm. Default is unbounded.
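As a minimal sketch, these variables could be set from Python before Polars is used. The values below are hypothetical, and the sketch assumes plain integer strings are accepted; the list above names the variables but does not spell out the value format or the moment at which Polars reads them.

```python
import os

# Hypothetical values; assumed to be plain integer strings.
settings = {
    "POLARS_FMT_MAX_COLS": "8",
    "POLARS_FMT_MAX_ROWS": "20",
    "POLARS_MAX_THREADS": "4",
}

for name, value in settings.items():
    os.environ[name] = value  # environment values are always strings

# Assumption: setting the variables before the import makes them visible
# whenever Polars reads them.
# import polars as pl
```

Setting the variables in the shell before starting Python has the same effect and avoids any question of ordering inside the script.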
[Python] compile py-polars from source
If you want a bleeding edge release or maximal performance, you should compile py-polars from source.
This can be done by going through the following steps in sequence:
- install the latest Rust compiler
- `$ pip3 install maturin`
- `$ cd py-polars && maturin develop --release`
Note that the Rust crate implementing the Python bindings is called `py-polars` to distinguish it from the wrapped Rust crate `polars` itself. However, both the Python package and the Python module are named `polars`, so you can `pip install polars` and `import polars` (previously, these were called `py-polars` and `pypolars`).
Acknowledgements
Development of Polars is proudly powered by