ritchie46 / Polars

Licence: mit
Rust DataFrame library

Programming language: Rust

Projects that are alternatives to, or similar to, Polars

Dataframe Go
DataFrames for Go: For statistics, machine-learning, and data manipulation/exploration
Stars: ✭ 487 (-59.88%)
Mutual labels:  dataframe
Modin
Modin: Speed up your Pandas workflows by changing a single line of code
Stars: ✭ 6,639 (+446.87%)
Mutual labels:  dataframe
Mobius
C# and F# language binding and extensions to Apache Spark
Stars: ✭ 929 (-23.48%)
Mutual labels:  dataframe
Sequoia
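An automatic stock-picking program for China A-shares that implements the Turtle Trading rules, the Chan Zhong Shuo Chan bull-market buy points, and several other technical patterns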
Stars: ✭ 564 (-53.54%)
Mutual labels:  dataframe
Datafusion
DataFusion has now been donated to the Apache Arrow project
Stars: ✭ 611 (-49.67%)
Mutual labels:  dataframe
Spark Redis
A connector for Spark that allows reading and writing to/from Redis cluster
Stars: ✭ 773 (-36.33%)
Mutual labels:  dataframe
Awesome Cybersecurity Datasets
A curated list of amazingly awesome Cybersecurity datasets
Stars: ✭ 380 (-68.7%)
Mutual labels:  dataframe
Dframcy
Dataframe Integration with spaCy.
Stars: ✭ 74 (-93.9%)
Mutual labels:  dataframe
Pyjanitor
Clean APIs for data cleaning. Python implementation of R package Janitor
Stars: ✭ 647 (-46.71%)
Mutual labels:  dataframe
Boltzmannclean
Fill missing values in Pandas DataFrames using Restricted Boltzmann Machines
Stars: ✭ 23 (-98.11%)
Mutual labels:  dataframe
Pdpipe
Easy pipelines for pandas DataFrames.
Stars: ✭ 590 (-51.4%)
Mutual labels:  dataframe
Smile
Statistical Machine Intelligence & Learning Engine
Stars: ✭ 5,412 (+345.8%)
Mutual labels:  dataframe
Dataframe
C++ DataFrame for statistical, Financial, and ML analysis -- in modern C++ using native types, continuous memory storage, and no pointers are involved
Stars: ✭ 828 (-31.8%)
Mutual labels:  dataframe
Spark Daria
Essential Spark extensions and helper methods ✨😲
Stars: ✭ 553 (-54.45%)
Mutual labels:  dataframe
Pandas Ta
Technical Analysis Indicators - Pandas TA is an easy to use Python 3 Pandas Extension with 130+ Indicators
Stars: ✭ 962 (-20.76%)
Mutual labels:  dataframe
Arquero
Query processing and transformation of array-backed data tables.
Stars: ✭ 384 (-68.37%)
Mutual labels:  dataframe
Vaex
Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualize and explore big tabular data at a billion rows per second 🚀
Stars: ✭ 6,793 (+459.56%)
Mutual labels:  dataframe
Jardin
A pandas.DataFrame-based ORM.
Stars: ✭ 81 (-93.33%)
Mutual labels:  dataframe
Net.jgp.labs.spark
Apache Spark examples exclusively in Java
Stars: ✭ 55 (-95.47%)
Mutual labels:  dataframe
Foxcross
AsyncIO serving for data science models
Stars: ✭ 18 (-98.52%)
Mutual labels:  dataframe

Polars


Blazingly fast DataFrames in Rust & Python

Polars is a blazingly fast DataFrame library implemented in Rust, using Apache Arrow as the backend for its memory model.
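Apache Arrow stores each column in contiguous buffers. The values go in one buffer, and an optional validity bitmap records which slots are non-null (bit i set means slot i is valid, least-significant bit first). The sketch below illustrates that layout for an `i64` column using only the standard library. `PrimitiveColumn` and its methods are hypothetical names. They are not Polars' or arrow-rs' actual types.

```rust
/// A toy Arrow-style primitive column (hypothetical, not the Polars or
/// arrow-rs API): values live in one contiguous buffer, and a separate
/// validity bitmap marks non-null slots (1 = valid, LSB first).
struct PrimitiveColumn {
    values: Vec<i64>,
    validity: Vec<u8>,
    len: usize,
}

impl PrimitiveColumn {
    /// Builds the column from optional values; null slots hold a placeholder 0.
    fn from_options(data: &[Option<i64>]) -> Self {
        let len = data.len();
        let mut values = Vec::with_capacity(len);
        // One bit per slot, rounded up to whole bytes.
        let mut validity = vec![0u8; (len + 7) / 8];
        for (i, item) in data.iter().enumerate() {
            match item {
                Some(v) => {
                    values.push(*v);
                    validity[i / 8] |= 1 << (i % 8);
                }
                None => values.push(0),
            }
        }
        PrimitiveColumn { values, validity, len }
    }

    /// Checks the validity bit for slot `i`.
    fn is_valid(&self, i: usize) -> bool {
        self.validity[i / 8] & (1 << (i % 8)) != 0
    }

    /// Returns None for nulls and for out-of-range indices.
    fn get(&self, i: usize) -> Option<i64> {
        if i < self.len && self.is_valid(i) {
            Some(self.values[i])
        } else {
            None
        }
    }

    fn null_count(&self) -> usize {
        (0..self.len).filter(|&i| !self.is_valid(i)).count()
    }

    /// Sums only the valid slots; the placeholder values of nulls are skipped.
    fn sum(&self) -> i64 {
        (0..self.len)
            .filter(|&i| self.is_valid(i))
            .map(|i| self.values[i])
            .sum()
    }
}

fn main() {
    let col = PrimitiveColumn::from_options(&[Some(1), None, Some(3), Some(4)]);
    println!("get(1) = {:?}", col.get(1));
    println!("nulls = {}, sum = {}", col.null_count(), col.sum());
}
```

Keeping nulls in a bitmap rather than as `Option<i64>` per slot keeps the values buffer dense. That layout suits the vectorized kernels the README's SIMD feature refers to.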

It currently consists of an eager API similar to pandas and a lazy API somewhat similar to Spark. Among other things, Polars supports the following functionality.
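The two APIs differ in when work happens. An eager call computes its result immediately. A lazy query only records the operations and runs them when the result is requested, which leaves room to optimize the whole plan first. The sketch below illustrates that idea over a plain `Vec<i64>`. `eager_filter`, `LazyQuery`, `filter`, `map` and `collect` are hypothetical names, not Polars' API.

```rust
/// Eager style: runs immediately and materializes a new Vec.
fn eager_filter(data: &[i64], pred: impl Fn(i64) -> bool) -> Vec<i64> {
    data.iter().copied().filter(|&x| pred(x)).collect()
}

/// One recorded step of a lazy query.
enum Step {
    Filter(Box<dyn Fn(i64) -> bool>),
    Map(Box<dyn Fn(i64) -> i64>),
}

/// Lazy style (hypothetical): operations are only recorded until `collect`.
struct LazyQuery {
    source: Vec<i64>,
    steps: Vec<Step>,
}

impl LazyQuery {
    fn new(source: Vec<i64>) -> Self {
        LazyQuery { source, steps: Vec::new() }
    }

    fn filter(mut self, pred: impl Fn(i64) -> bool + 'static) -> Self {
        self.steps.push(Step::Filter(Box::new(pred)));
        self
    }

    fn map(mut self, f: impl Fn(i64) -> i64 + 'static) -> Self {
        self.steps.push(Step::Map(Box::new(f)));
        self
    }

    fn num_steps(&self) -> usize {
        self.steps.len()
    }

    /// Executes every recorded step in a single pass over the source.
    fn collect(self) -> Vec<i64> {
        let mut out = Vec::new();
        'rows: for mut x in self.source {
            for step in &self.steps {
                match step {
                    Step::Filter(p) => {
                        if !p(x) {
                            continue 'rows; // row rejected, skip remaining steps
                        }
                    }
                    Step::Map(f) => x = f(x),
                }
            }
            out.push(x);
        }
        out
    }
}

fn main() {
    let data = vec![1, 2, 3, 4, 5, 6];
    let eager = eager_filter(&data, |x| x % 2 == 0);
    // Building the query computes nothing yet; it only records two steps.
    let query = LazyQuery::new(data).filter(|x| x % 2 == 0).map(|x| x * 10);
    println!("eager = {:?}, lazy steps = {}", eager, query.num_steps());
    println!("lazy = {:?}", query.collect());
}
```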

To learn more about the inner workings of Polars, read the work-in-progress (WIP) book.

Python users read this!

Polars is currently transitioning from py-polars to polars, so some docs may still refer to the old name. We're working towards a new 0.7.0 release. In the meantime, install a pre-release version, which will likely be more stable than 0.6.7.

Install the latest pre-release version with: $ pip install polars==0.7.0-beta.2

Functionality Eager Lazy (DataFrame) Lazy (Series)
Filters
Shifts
Joins
GroupBys + aggregations
Comparisons
Arithmetic
Sorting
Reversing
Closure application (User Defined Functions)
SIMD
Pivots
Melts
Filling nulls + fill strategies
Aggregations
Moving Window aggregates
Find unique values
Rust iterators
IO (csv, json, parquet, Arrow IPC)
Query optimization: (predicate pushdown)
Query optimization: (projection pushdown)
Query optimization: (type coercion)
Query optimization: (simplify expressions)
Query optimization: (aggregate pushdown)
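Predicate pushdown, one of the query optimizations listed above, moves a filter as close to the data source as possible so that fewer rows flow through the rest of the plan. The toy logical plan below shows the rewrite for the simple case of a filter sitting on top of a projection that keeps the filtered column. The `Plan` enum and `push_down_predicates` are hypothetical. They are not Polars' internal representation or optimizer.

```rust
/// Hypothetical logical plan nodes (not Polars' internal types).
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    /// Read every row of a table.
    Scan { table: String },
    /// Keep only the listed columns.
    Projection { columns: Vec<String>, input: Box<Plan> },
    /// Keep rows where `column >= min`.
    Filter { column: String, min: i64, input: Box<Plan> },
}

/// Rewrites the plan bottom-up, moving each filter below a projection
/// whenever that projection still contains the filtered column.
fn push_down_predicates(plan: Plan) -> Plan {
    match plan {
        Plan::Filter { column, min, input } => match push_down_predicates(*input) {
            Plan::Projection { columns, input: inner } if columns.contains(&column) => {
                // Filter first, then project: the filter now sits closer to the scan.
                let pushed = push_down_predicates(Plan::Filter { column, min, input: inner });
                Plan::Projection { columns, input: Box::new(pushed) }
            }
            // Nothing to push through: keep the filter where it is.
            other => Plan::Filter { column, min, input: Box::new(other) },
        },
        Plan::Projection { columns, input } => Plan::Projection {
            columns,
            input: Box::new(push_down_predicates(*input)),
        },
        scan @ Plan::Scan { .. } => scan,
    }
}

fn main() {
    let s = |x: &str| x.to_string();
    let plan = Plan::Filter {
        column: s("a"),
        min: 10,
        input: Box::new(Plan::Projection {
            columns: vec![s("a"), s("b")],
            input: Box::new(Plan::Scan { table: s("t") }),
        }),
    };
    println!("before: {:?}", plan);
    println!("after:  {:?}", push_down_predicates(plan.clone()));
}
```

A real optimizer also pushes predicates through joins and into the scan itself, for example when reading Parquet. This sketch handles only the projection case.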

Note that almost all operations supported by the eager API on Series/ChunkedArrays can also be used in the lazy API via UDFs.
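One way to picture how an eager column operation becomes usable in a lazy query is to store it as a boxed closure in the expression tree. The closure is then called only when the expression is evaluated. The sketch below illustrates that pattern over plain `Vec<f64>` columns. `Expr`, `Expr::map`, `evaluate` and `cumulative_sum` are hypothetical names, not Polars' API.

```rust
use std::collections::HashMap;

/// Hypothetical expression: a column reference, or a user-defined
/// function (UDF) applied to another expression's output.
enum Expr {
    Column(String),
    Udf {
        input: Box<Expr>,
        func: Box<dyn Fn(&[f64]) -> Vec<f64>>,
    },
}

impl Expr {
    fn col(name: &str) -> Expr {
        Expr::Column(name.to_string())
    }

    /// Wraps any eager column operation so it runs later, at evaluation time.
    fn map(self, func: impl Fn(&[f64]) -> Vec<f64> + 'static) -> Expr {
        Expr::Udf { input: Box::new(self), func: Box::new(func) }
    }

    /// Evaluates against a table of named columns; None if a column is missing.
    fn evaluate(&self, table: &HashMap<String, Vec<f64>>) -> Option<Vec<f64>> {
        match self {
            Expr::Column(name) => table.get(name).cloned(),
            Expr::Udf { input, func } => {
                let values = input.evaluate(table)?;
                Some(func(&values))
            }
        }
    }
}

/// An "eager" operation written once for plain slices.
fn cumulative_sum(values: &[f64]) -> Vec<f64> {
    let mut acc = 0.0;
    values.iter().map(|v| { acc += v; acc }).collect()
}

fn main() {
    let mut table = HashMap::new();
    table.insert("x".to_string(), vec![1.0, 2.0, 3.0]);
    // The expression is built first; nothing runs until `evaluate`.
    let expr = Expr::col("x")
        .map(cumulative_sum)
        .map(|v| v.iter().map(|x| x * 2.0).collect());
    println!("{:?}", expr.evaluate(&table));
}
```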

Documentation

Want to know about all the features Polars supports? Read the docs!

Rust

Python

Performance

Polars is written to be performant, and it is! But don't take my word for it: take a look at the results in h2oai's db-benchmark.

Cargo Features

Additional cargo features:

  • temporal (default)
    • Conversions between Chrono and Polars for temporal data
  • simd (nightly)
    • SIMD operations
  • parquet
    • Read Apache Parquet format
  • json
    • JSON serialization
  • ipc
    • Arrow's IPC format serialization
  • random
    • Generate arrays with randomly sampled values
  • ndarray
    • Convert from DataFrame to ndarray
  • lazy
    • Lazy API
  • strings
    • String utilities for Utf8Chunked
  • object
    • Support for generic ChunkedArrays called ObjectChunked<T> (generic over T). These are downcastable from Series through the Any trait.
  • parallel
    • ChunkedArrays can be used by rayon::par_iter()
  • [plain_fmt | pretty_fmt] (mutually exclusive)
    • One of them should be chosen to format DataFrames. pretty_fmt can deal with overflowing cells and looks nicer, but has more dependencies. plain_fmt (default) uses plain formatting.
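The `object` feature describes `ObjectChunked<T>` values as downcastable from a Series through the `Any` trait. That matches a standard Rust pattern. A type-erased trait object exposes an `as_any()` method, and callers recover the concrete generic type with `downcast_ref`. The sketch below shows the pattern with hypothetical `SeriesLike` and `ObjectColumn<T>` types. They are not Polars' actual trait or method signatures.

```rust
use std::any::Any;

/// Hypothetical type-erased column trait (stands in for a Series).
trait SeriesLike {
    fn len(&self) -> usize;
    /// Exposes the concrete type so callers can downcast it.
    fn as_any(&self) -> &dyn Any;
}

/// Hypothetical column of arbitrary Rust values, generic over T.
struct ObjectColumn<T> {
    values: Vec<T>,
}

impl<T: 'static> SeriesLike for ObjectColumn<T> {
    fn len(&self) -> usize {
        self.values.len()
    }
    fn as_any(&self) -> &dyn Any {
        self
    }
}

/// Recovers the typed column, or None if `series` holds a different T.
fn as_object<T: 'static>(series: &dyn SeriesLike) -> Option<&ObjectColumn<T>> {
    series.as_any().downcast_ref::<ObjectColumn<T>>()
}

#[derive(Debug, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

fn main() {
    let series: Box<dyn SeriesLike> = Box::new(ObjectColumn { values: vec![Point { x: 1, y: 2 }] });
    println!("len = {}", series.len());
    if let Some(col) = as_object::<Point>(&*series) {
        println!("first = {:?}", col.values[0]);
    }
    // Asking for the wrong T fails safely instead of reinterpreting memory.
    println!("String column? {}", as_object::<String>(&*series).is_some());
}
```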

Contribution

Want to contribute? Read our contribution guidelines.

ENV vars

  • POLARS_PAR_SORT_BOUND -> Sets the lower bound of rows at which Polars will use a parallel sorting algorithm. Default is 1M rows.
  • POLARS_FMT_MAX_COLS -> maximum number of columns shown when formatting DataFrames.
  • POLARS_FMT_MAX_ROWS -> maximum number of rows shown when formatting DataFrames.
  • POLARS_TABLE_WIDTH -> width of the tables used during DataFrame formatting.
  • POLARS_MAX_THREADS -> maximum number of threads used in the join algorithm. Default is unbounded.
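POLARS_PAR_SORT_BOUND follows a common pattern. A size threshold is read from the environment, with a default (1M rows) used when the variable is absent. A parallel algorithm is used only above that threshold, because thread overhead outweighs the gain on small inputs. The sketch below illustrates that pattern with the standard library only: scoped threads sort the two halves, and the results are then merged. It is not Polars' implementation. `par_sort_bound`, `merge` and `sort_column` are hypothetical names.

```rust
use std::env;
use std::thread;

const DEFAULT_PAR_SORT_BOUND: usize = 1_000_000;

/// Reads the threshold from the environment, falling back to the default
/// when the variable is unset or not a valid number.
fn par_sort_bound() -> usize {
    env::var("POLARS_PAR_SORT_BOUND")
        .ok()
        .and_then(|v| v.parse().ok())
        .unwrap_or(DEFAULT_PAR_SORT_BOUND)
}

/// Merges two sorted slices into one sorted Vec.
fn merge(a: &[i64], b: &[i64]) -> Vec<i64> {
    let mut out = Vec::with_capacity(a.len() + b.len());
    let (mut i, mut j) = (0, 0);
    while i < a.len() && j < b.len() {
        if a[i] <= b[j] {
            out.push(a[i]);
            i += 1;
        } else {
            out.push(b[j]);
            j += 1;
        }
    }
    out.extend_from_slice(&a[i..]);
    out.extend_from_slice(&b[j..]);
    out
}

/// Sorts sequentially below `bound`; at or above it, sorts the two halves
/// on separate threads and merges them. Returns (sorted, used_parallel).
fn sort_column(mut data: Vec<i64>, bound: usize) -> (Vec<i64>, bool) {
    if data.len() < bound {
        data.sort_unstable();
        return (data, false);
    }
    let mid = data.len() / 2;
    let (left, right) = data.split_at_mut(mid);
    // The scope joins the spawned thread before returning.
    thread::scope(|s| {
        s.spawn(|| left.sort_unstable());
        right.sort_unstable();
    });
    (merge(left, right), true)
}

fn main() {
    let bound = par_sort_bound();
    let data: Vec<i64> = (0..10).rev().collect();
    let (sorted, parallel) = sort_column(data, bound);
    println!("bound = {}, parallel = {}, sorted = {:?}", bound, parallel, sorted);
}
```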

[Python] compile py-polars from source

If you want a bleeding-edge release or maximal performance, you should compile py-polars from source.

This can be done by going through the following steps in sequence:

  1. install the latest Rust compiler
  2. $ pip3 install maturin
  3. $ cd py-polars && maturin develop --release

Note that the Rust crate implementing the Python bindings is called py-polars to distinguish it from the wrapped Rust crate polars itself. However, both the Python package and the Python module are named polars, so you can pip install polars and import polars (previously, these were called py-polars and pypolars).

Acknowledgements

Development of Polars is proudly powered by

Xomnia
