jmcarpenter2 / Swifter

License: MIT

swifter

A package which efficiently applies any function to a pandas dataframe or series in the fastest available manner.

Blog posts

Documentation

For the latest improvements, see the changelog.

Further documentation on swifter is available here.

Check out the examples notebook, along with the speed benchmark notebook. The benchmarks are created using the library perfplot.

Installation:

$ pip install -U pandas # upgrade pandas
$ pip install swifter # first time installation
$ pip install swifter[modin-ray] # first time installation including modin[ray]
$ pip install swifter[modin-dask] # first time installation including modin[dask]

$ pip install -U swifter # upgrade to latest version if already installed

Alternatively, to install with conda:

conda install -c conda-forge swifter

...after installing, import swifter into your code along with pandas using:

import pandas as pd
import swifter

...alternatively, swifter can be used with modin dataframes in the same manner:

import modin.pandas as pd
import swifter

NOTE: if you import swifter before modin, you will have to additionally register modin: swifter.register_modin()

Easy to use

df = pd.DataFrame({'x': [1, 2, 3, 4], 'y': [5, 6, 7, 8]})

# runs on single core
df['x2'] = df['x'].apply(lambda x: x**2)
# runs on multiple cores
df['x2'] = df['x'].swifter.apply(lambda x: x**2)

# use swifter apply on whole dataframe, row by row (axis=1),
# so the result has one value per row and aligns with df's index
df['agg'] = df.swifter.apply(lambda x: x.sum() - x.min(), axis=1)

# use swifter apply on specific columns
df['outCol'] = df[['inCol1', 'inCol2']].swifter.apply(my_func)
df['outCol'] = df[['inCol1', 'inCol2', 'inCol3']].swifter.apply(my_func,
             positional_arg, keyword_arg=keyword_argval)

Vectorizes your function, when possible
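
To make "vectorized" concrete, the sketch below compares an element-wise apply with the same function called once on the whole Series. It uses only pandas and NumPy. The function names are hypothetical, and the comparison illustrates the idea rather than swifter's internal checks.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(1, 1001), dtype="float64")

def square_plus_one(x):
    # Only arithmetic operators: works on a scalar or on a whole Series.
    return x ** 2 + 1

def clip_negative(x):
    # Branches on a single value: cannot be called on a whole Series.
    return x if x > 0 else 0.0

elementwise = square_result = s.apply(square_plus_one)  # one Python call per element
vectorized = square_plus_one(s)                         # one call on the whole Series

# A vectorizable function gives the same answer either way.
same_result = elementwise.equals(vectorized)

try:
    clip_negative(s)
    clip_vectorizes = True
except ValueError:
    # pandas refuses to treat a whole Series as a single True/False value.
    clip_vectorizes = False
```

Functions written like square_plus_one can run on the whole column at once, while functions like clip_negative have to be applied element by element.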

When vectorization is not possible, automatically decides which is faster: to use dask parallel processing or a simple pandas apply
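
One way to picture such a decision is to time the function on a small sample and extrapolate to the full length. The sketch below does this with plain pandas. The helper estimate_apply_seconds, the sample size, and the one-second cut-off are all assumptions made for illustration and are not taken from swifter's source.

```python
import time

import pandas as pd

def estimate_apply_seconds(series, func, sample_size=100):
    """Time func on a small sample and scale the result to the full length."""
    sample = series.iloc[:sample_size]
    start = time.perf_counter()
    sample.apply(func)  # note: func really runs on the sample rows
    elapsed = time.perf_counter() - start
    return elapsed * len(series) / max(len(sample), 1)

def word_count(text):
    # Not vectorizable as written: str.split is a method of a single string.
    return len(text.split())

s = pd.Series(["the quick brown fox"] * 10_000)
estimate = estimate_apply_seconds(s, word_count)

THRESHOLD_SECONDS = 1.0  # hypothetical cut-off, chosen only for this sketch
strategy = "parallel" if estimate > THRESHOLD_SECONDS else "plain pandas apply"
```

Parallel execution carries a fixed start-up cost, so for quick functions on small data a plain apply can be the faster choice.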

Notes

  1. The function is documented in the .py file. In Jupyter Notebooks, you can see the docs by pressing Shift+Tab(x3). Also, check out the complete documentation here along with the changelog.

  2. Please upgrade your version of pandas, as the pandas extension api used in this module is a recent addition to pandas.

  3. Import modin before importing swifter, if you wish to use modin with swifter. Otherwise, use swifter.register_modin() to access it.

  4. Do not use swifter to apply a function that modifies external variables. Under the hood, swifter does sample applies to optimize performance. These sample applies will modify the external variable in addition to the final apply. Thus, you will end up with an erroneously modified external variable.

  5. It is advised to disable the progress bar if calling swifter from a forked process as the progress bar may get confused between various multiprocessing modules.

  6. If the value returned by swifter differs from the one returned by pandas, try explicitly casting the type, e.g.: df.swifter.apply(lambda x: float(np.angle(x)))
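
The problem described in note 4 can be reproduced with plain pandas. The sketch below stands in for a sample apply by calling the function on the first two rows before the full apply. The names seen and square_and_record are hypothetical, and the size of the sample is an assumption.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4])
seen = []  # external variable modified by the applied function

def square_and_record(x):
    seen.append(x)  # side effect: runs on every call, including sample calls
    return x ** 2

# Stand-in for a sample apply on a few rows.
_ = s.iloc[:2].apply(square_and_record)
# The final apply over all rows.
result = s.apply(square_and_record)

# The external list now holds more entries than the Series has rows.
extra_calls = len(seen) - len(s)

# Safer pattern: keep the function free of side effects and
# derive any aggregate from the returned values afterwards.
squares = s.apply(lambda x: x ** 2)
total = int(squares.sum())
```

Returning values and aggregating them after the apply gives the same answer no matter how many times the function was called along the way.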
