pola-rs / polars

Licence: MIT license
Fast multi-threaded DataFrame library in Rust | Python | Node.js

Programming Languages

rust
11053 projects
python
139335 projects - #7 most used programming language
typescript
32286 projects
javascript
184084 projects - #8 most used programming language
Makefile
30231 projects
r
7636 projects

Projects that are alternatives of or similar to polars

heidi
heidi : tidy data in Haskell
Stars: ✭ 24 (-99.62%)
Mutual labels:  dataframe, dataframes, dataframe-library
bow
Go data analysis / manipulation library built on top of Apache Arrow
Stars: ✭ 20 (-99.69%)
Mutual labels:  arrow, dataframe
arrow-datafusion
Apache Arrow DataFusion SQL Query Engine
Stars: ✭ 2,360 (-62.94%)
Mutual labels:  arrow, dataframe
dflib
In-memory Java DataFrame library
Stars: ✭ 50 (-99.21%)
Mutual labels:  dataframe, dataframe-library
DataFrame
DataFrame Library for Java
Stars: ✭ 51 (-99.2%)
Mutual labels:  dataframe, dataframe-library
Datafusion
DataFusion has now been donated to the Apache Arrow project
Stars: ✭ 611 (-90.41%)
Mutual labels:  arrow, dataframe
Ballista
Distributed compute platform implemented in Rust, and powered by Apache Arrow.
Stars: ✭ 2,274 (-64.29%)
Mutual labels:  arrow, dataframe
isarn-sketches-spark
Routines and data structures for using isarn-sketches idiomatically in Apache Spark
Stars: ✭ 28 (-99.56%)
Mutual labels:  dataframe, dataframes
woodwork
Woodwork is a Python library that provides robust methods for managing and communicating data typing information.
Stars: ✭ 97 (-98.48%)
Mutual labels:  dataframe, dataframes
avit-da2k
💲 oh-my-zsh theme based on avit theme
Stars: ✭ 15 (-99.76%)
Mutual labels:  arrow
pyjanitor
Clean APIs for data cleaning. Python implementation of R package Janitor
Stars: ✭ 970 (-84.77%)
Mutual labels:  dataframe
tooltip
[DEPRECATED] The tooltip that has all the right moves
Stars: ✭ 133 (-97.91%)
Mutual labels:  arrow
cognipy
In-memory Graph Database and Knowledge Graph with Natural Language Interface, compatible with Pandas
Stars: ✭ 31 (-99.51%)
Mutual labels:  dataframe
Pointy
A jQuery plugin that dynamically points one element at another ~
Stars: ✭ 25 (-99.61%)
Mutual labels:  arrow
tablecloth
Dataset manipulation library built on the top of tech.ml.dataset
Stars: ✭ 167 (-97.38%)
Mutual labels:  dataframe
arrow-site
Mirror of Apache Arrow site
Stars: ✭ 16 (-99.75%)
Mutual labels:  arrow
hood
The plugin to manage benchmarks on your CI
Stars: ✭ 17 (-99.73%)
Mutual labels:  arrow
spark-vcf
Spark VCF data source implementation for Dataframes
Stars: ✭ 15 (-99.76%)
Mutual labels:  dataframe
torch-dataframe
Utility class to manipulate dataset from CSV file
Stars: ✭ 67 (-98.95%)
Mutual labels:  dataframe
Movies-Analytics-in-Spark-and-Scala
Data cleaning, pre-processing, and Analytics on a million movies using Spark and Scala.
Stars: ✭ 47 (-99.26%)
Mutual labels:  dataframes

Polars

Python Documentation | Rust Documentation | User Guide | Discord | StackOverflow

Blazingly fast DataFrames in Rust, Python & Node.js

Polars is a blazingly fast DataFrames library implemented in Rust, using the Apache Arrow Columnar Format as its memory model.

  • Lazy | eager execution
  • Multi-threaded
  • SIMD
  • Query optimization
  • Powerful expression API
  • Rust | Python | ...

To learn more, read the User Guide.

>>> import polars as pl
>>> df = pl.DataFrame(
...     {
...         "A": [1, 2, 3, 4, 5],
...         "fruits": ["banana", "banana", "apple", "apple", "banana"],
...         "B": [5, 4, 3, 2, 1],
...         "cars": ["beetle", "audi", "beetle", "beetle", "beetle"],
...     }
... )

# embarrassingly parallel execution
# very expressive query language
>>> (
...     df
...     .sort("fruits")
...     .select(
...         [
...             "fruits",
...             "cars",
...             pl.lit("fruits").alias("literal_string_fruits"),
...             pl.col("B").filter(pl.col("cars") == "beetle").sum(),
...             pl.col("A").filter(pl.col("B") > 2).sum().over("cars").alias("sum_A_by_cars"),     # groups by "cars"
...             pl.col("A").sum().over("fruits").alias("sum_A_by_fruits"),                         # groups by "fruits"
...             pl.col("A").reverse().over("fruits").alias("rev_A_by_fruits"),                     # groups by "fruits"
...             pl.col("A").sort_by("B").over("fruits").alias("sort_A_by_B_by_fruits"),            # groups by "fruits"
...         ]
...     )
... )
shape: (5, 8)
┌──────────┬──────────┬──────────────┬─────┬─────────────┬─────────────┬─────────────┬─────────────┐
│ fruitscarsliteral_striBsum_A_by_casum_A_by_frrev_A_by_frsort_A_by_B │
│ ------ng_fruits---rsuitsuits_by_fruits  │
│ strstr---i64------------         │
│          ┆          ┆ str          ┆     ┆ i64i64i64i64         │
╞══════════╪══════════╪══════════════╪═════╪═════════════╪═════════════╪═════════════╪═════════════╡
│ "apple""beetle""fruits"114744           │
├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ "apple""beetle""fruits"114733           │
├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ "banana""beetle""fruits"114855           │
├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ "banana""audi""fruits"112822           │
├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ "banana""beetle""fruits"114811           │
└──────────┴──────────┴──────────────┴─────┴─────────────┴─────────────┴─────────────┴─────────────┘
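To make the window expressions in the query above more concrete, here is a minimal plain-Python sketch of what `sum().over(...)` computes: a per-group total that is broadcast back to every row of its group. It is not how Polars implements it; the helper `sum_over` and its `keep` parameter are hypothetical names used only for illustration, and the rows are hard-coded in the order they have after sorting by "fruits".

```python
from collections import defaultdict

# The example data, already in the order it has after .sort("fruits").
rows = [
    {"A": 3, "fruits": "apple", "B": 3, "cars": "beetle"},
    {"A": 4, "fruits": "apple", "B": 2, "cars": "beetle"},
    {"A": 1, "fruits": "banana", "B": 5, "cars": "beetle"},
    {"A": 2, "fruits": "banana", "B": 4, "cars": "audi"},
    {"A": 5, "fruits": "banana", "B": 1, "cars": "beetle"},
]


def sum_over(rows, value, key, keep=lambda row: True):
    """Hypothetical helper: sum `value` per group of `key`, one result per row.

    `keep` plays the role of the .filter(...) applied before the sum.
    """
    totals = defaultdict(int)
    for row in rows:
        if keep(row):
            totals[row[key]] += row[value]
    # Broadcast each group's total back to the rows of that group.
    return [totals[row[key]] for row in rows]


# pl.col("A").sum().over("fruits")
sum_A_by_fruits = sum_over(rows, "A", "fruits")

# pl.col("A").filter(pl.col("B") > 2).sum().over("cars")
sum_A_by_cars = sum_over(rows, "A", "cars", keep=lambda r: r["B"] > 2)

# pl.col("B").filter(pl.col("cars") == "beetle").sum() has no window,
# so it is a single total repeated on every row.
b_beetle_total = sum(r["B"] for r in rows if r["cars"] == "beetle")

print(sum_A_by_fruits)
print(sum_A_by_cars)
print(b_beetle_total)
```

The three results correspond to the `sum_A_by_fruits`, `sum_A_by_cars`, and `B` columns of the output table.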

Performance 🚀🚀

Polars is very fast, and in fact is one of the best performing solutions available. See the results in h2oai's db-benchmark.

Python setup

Install the latest polars version with:

$ pip3 install -U 'polars[pyarrow]'

Releases currently happen often (weekly or every few days), so it is worth updating polars regularly to get the latest bug fixes and features.

Rust setup

You can take the latest release from crates.io or, if you want the latest features and performance improvements, point to the master branch of this repo.

polars = { git = "https://github.com/pola-rs/polars", rev = "<optional git tag>" }

Rust version

Required Rust version >=1.58

Documentation

Want to know about all the features Polars supports? Read the docs!

Python

Rust

Node

Contribution

Want to contribute? Read our contribution guideline.

[Python]: compile polars from source

If you want a bleeding-edge release or maximal performance, you should compile polars from source.

This can be done by going through the following steps in sequence:

  1. Install the latest Rust compiler
  2. Install maturin: $ pip3 install maturin
  3. Choose any of:
    • Fastest binary, very long compile times:
      $ cd py-polars && maturin develop --rustc-extra-args="-C target-cpu=native" --release
    • Fast binary, shorter compile times:
      $ cd py-polars && maturin develop --rustc-extra-args="-C codegen-units=16 -C lto=thin -C target-cpu=native" --release

Note that the Rust crate implementing the Python bindings is called py-polars to distinguish from the wrapped Rust crate polars itself. However, both the Python package and the Python module are named polars, so you can pip install polars and import polars.

Arrow2

Polars has transitioned to arrow2. Arrow2 is a faster and safer implementation of the Apache Arrow Columnar Format. Arrow2 also has a more granular code base, which helps reduce compiler bloat.

Use custom Rust functions in Python?

See this example.

Going big...

Do you expect more than 2^32 (~4.2 billion) rows? Compile polars with the bigidx feature flag.

Or, for Python users, install the 64-bit index build: $ pip install -U polars-u64-idx.

Don't use this unless you hit the row boundary, as the default polars is faster and consumes less memory.
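As a rough illustration of the row boundary, the arithmetic can be sketched in plain Python. This assumes the default build indexes rows with 32-bit unsigned integers and the bigidx build with 64-bit ones (inferred from the polars-u64-idx package name); the helpers `needs_big_index` and `index_bytes` are hypothetical and not part of polars.

```python
U32_LIMIT = 2**32  # 4,294,967,296, the "~4.2 billion" rows mentioned above


def needs_big_index(n_rows: int) -> bool:
    """Hypothetical check: does the row count exceed the default boundary?"""
    return n_rows > U32_LIMIT


def index_bytes(n_rows: int, big: bool) -> int:
    """Bytes needed for one row index per row.

    Assumption: 4 bytes per index in the default build, 8 with bigidx.
    """
    return n_rows * (8 if big else 4)


print(f"{U32_LIMIT:,}")
print(needs_big_index(5_000_000_000))
print(index_bytes(1_000_000, big=False), index_bytes(1_000_000, big=True))
```

Under that assumption every index array doubles in size with the wider index, which is consistent with the advice to stay on the default build until the boundary is actually reached.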

Acknowledgements

Development of Polars is proudly powered by

Xomnia

Sponsors