pdpipe ˨
########

Easy pipelines for pandas DataFrames (`learn how! <https://tirthajyoti.github.io/Notebooks/Pandas-pipeline-with-pdpipe>`_).

Website: `https://pdpipe.github.io/pdpipe/ <https://pdpipe.github.io/pdpipe/>`_

Documentation: `https://pdpipe.github.io/pdpipe/doc/pdpipe/ <https://pdpipe.github.io/pdpipe/doc/pdpipe/>`_

.. code-block:: python

  import pandas as pd
  import pdpipe as pdp

  df = pd.DataFrame(
      data=[[4, 165, 'USA'], [2, 180, 'UK'], [2, 170, 'Greece']],
      index=['Dana', 'Jane', 'Nick'],
      columns=['Medals', 'Height', 'Born'],
  )

  # Chain two stages: drop the 'Medals' column, then one-hot encode 'Born'.
  pipeline = pdp.ColDrop('Medals').OneHotEncode('Born')
  pipeline(df)
  #       Height  Born_UK  Born_USA
  # Dana     165        0         1
  # Jane     180        1         0
  # Nick     170        0         0

.. .. alternative symbols: ˨ ᛪ ᛢ ᚶ ᚺ ↬ ⑀ ⤃ ⤳ ⥤ 』

.. contents::

.. section-numbering::

Documentation
=============

This is the repository of the pdpipe package, and this README file is intended to help potential contributors to the project.

To learn more about how to use pdpipe, either visit `pdpipe's homepage <https://pdpipe.github.io/pdpipe/>`_ or read the `online documentation of pdpipe <https://pdpipe.github.io/pdpipe/doc/pdpipe/>`_.

Installation
============

Install pdpipe with:

.. code-block:: bash

  pip install pdpipe

Some pipeline stages require scikit-learn; they will simply not be loaded if scikit-learn is not found on the system, and pdpipe will issue a warning. To use them you must also `install scikit-learn <http://scikit-learn.org/stable/install.html>`_.

Similarly, some pipeline stages require nltk; they will simply not be loaded if nltk is not found on your system, and pdpipe will issue a warning. To use them you must additionally `install nltk <http://www.nltk.org/install.html>`_.
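To see in advance which of these optional packages are available in your environment, a check along the following lines may help. This is only a sketch using the standard library: the helper name ``missing_optional_dependencies`` is hypothetical and not part of pdpipe, and the snippet does not reflect how pdpipe itself detects these packages.

```python
from importlib.util import find_spec

# Map each optional package to the module name used to import it.
# Note that scikit-learn is imported as "sklearn".
OPTIONAL_DEPENDENCIES = {"scikit-learn": "sklearn", "nltk": "nltk"}


def missing_optional_dependencies():
    """Return the optional packages that cannot be imported here.

    Hypothetical helper for illustration; not part of pdpipe.
    """
    return [
        package
        for package, module in OPTIONAL_DEPENDENCIES.items()
        if find_spec(module) is None
    ]


missing = missing_optional_dependencies()
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All optional dependencies are installed.")
```

If a package is reported as not installed, the stages that depend on it will not be loaded, as described above.
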

Contributing
============

The package author and current maintainer is `Shay Palachy <http://www.shaypalachy.com/>`_ ([email protected]); you are more than welcome to approach him for help. Contributions are very welcome, especially since this package is very much in its infancy and many other pipeline stages can be added.

Installing for development
--------------------------

Clone:

.. code-block:: bash

  git clone git@github.com:pdpipe/pdpipe.git

Install in development mode with test dependencies:

.. code-block:: bash

  cd pdpipe
  pip install -e ".[test]"

Running the tests
-----------------

To run the tests, use:

.. code-block:: bash

  python -m pytest

Note that pytest runs are configured by the ``pytest.ini`` file. Read it to understand the exact pytest arguments used.

Adding tests
------------

At the time of writing, pdpipe is maintained with a test coverage of 100%. Although challenging, I hope to maintain this status. If you add code to the package, please make sure you thoroughly test it. Codecov automatically reports changes in coverage on each PR, so PRs that reduce test coverage will not be reviewed until coverage is restored.

Tests reside under the ``tests`` directory in the root of the repository. Each module has a separate test folder, with each class (usually a pipeline stage) having a dedicated file (always starting with the string "test") containing several tests (each a global function starting with the string "test"). Please adhere to this structure, and try to separate test cases into different test functions; this allows us to quickly focus on problem areas and use cases. Thank you! :)

Code style
----------

pdpipe code is written to adhere to the coding style dictated by `flake8 <http://flake8.pycqa.org/en/latest/>`_. Practically, this means that one of the jobs that runs on the project's `Travis <https://travis-ci.org/pdpipe/pdpipe>`_ for each commit and pull request checks for a successful run of the ``flake8`` CLI command in the repository's root. As a result, pull requests will be flagged red by the Travis bot if non-flake8-compliant code was added.

To avoid this, please run ``flake8`` on your code (whether through your text editor/IDE or using the command line) and fix all resulting errors. Thank you! :)

Adding documentation
--------------------

This project is documented using the `numpy docstring conventions`_, which were chosen as they are perhaps the most widespread conventions that are both supported by common tools such as Sphinx and result in human-readable docstrings (in my personal opinion, of course). When documenting code you add to this project, please follow `these conventions`_.

.. _numpy docstring conventions: https://numpydoc.readthedocs.io/en/latest/format.html#docstring-standard
.. _these conventions: https://numpydoc.readthedocs.io/en/latest/format.html#docstring-standard

Additionally, if you update this ``README.rst`` file, use ``python setup.py checkdocs`` to validate that it compiles.

Adding doctests
---------------

Please note that for pdoc3, the Python package used to generate the HTML documentation files for pdpipe, to successfully include doctests in the generated documentation files, the whole doctest must be indented relative to the indentation of the opening line of the multi-line docstring, like so:

.. code-block:: python

  class ApplyByCols(PdPipelineStage):
      """A pipeline stage applying an element-wise function to columns.

      Parameters
      ----------
      columns : str or list-like
          Names of columns on which to apply the given function.
      func : function
          The function to be applied to each element of the given columns.
      result_columns : str or list-like, default None
          The names of the new columns resulting from the mapping operation. Must
          be of the same length as columns. If None, behavior depends on the
          drop parameter: If drop is True, the name of the source column is used;
          otherwise, the name of the source column is used with the suffix
          '_app'.
      drop : bool, default True
          If set to True, source columns are dropped after being mapped.
      func_desc : str, default None
          A function description of the given function; e.g. 'normalizing revenue
          by company size'. A default description is used if None is given.


      Example
      -------
          >>> import pandas as pd; import pdpipe as pdp; import math;
          >>> data = [[3.2, "acd"], [7.2, "alk"], [12.1, "alk"]]
          >>> df = pd.DataFrame(data, [1,2,3], ["ph","lbl"])
          >>> round_ph = pdp.ApplyByCols("ph", math.ceil)
          >>> round_ph(df)
             ph  lbl
          1   4  acd
          2   8  alk
          3  13  alk
      """

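Doctests indented this way can also be checked locally with the standard library's ``doctest`` module before the documentation is generated. The sketch below is an illustration only: ``ceil_all`` is a hypothetical function that is not part of pdpipe, and the snippet makes no claim about how pdoc3 itself processes docstrings.

```python
import doctest
import math


def ceil_all(values):
    """Round every number in a list up to the nearest integer.

    Hypothetical function used only to illustrate the docstring layout.

    Parameters
    ----------
    values : list of float
        The numbers to round up.

    Returns
    -------
    list of int
        The rounded values, in their original order.

    Example
    -------
        >>> ceil_all([3.2, 7.2, 12.1])
        [4, 8, 13]
    """
    return [math.ceil(value) for value in values]


# Extract the examples from the docstring and run them.
parser = doctest.DocTestParser()
test = parser.get_doctest(
    ceil_all.__doc__, {"ceil_all": ceil_all}, "ceil_all", None, 0
)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)
print(results)
```

The runner reports how many examples were attempted and how many failed, so a failed count of zero means the indented example still runs as written.
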
Credits
=======

Created by Shay Palachy ([email protected]).

.. alternative:
.. https://badge.fury.io/py/yellowbrick.svg

.. |PyPI-Status| image:: https://img.shields.io/pypi/v/pdpipe.svg
  :target: https://pypi.org/project/pdpipe

.. |PyPI-Versions| image:: https://img.shields.io/pypi/pyversions/pdpipe.svg
  :target: https://pypi.org/project/pdpipe

.. |Build-Status| image:: https://travis-ci.org/pdpipe/pdpipe.svg?branch=master
  :target: https://travis-ci.org/pdpipe/pdpipe

.. |LICENCE| image:: https://img.shields.io/badge/License-MIT-ff69b4.svg
  :target: https://pypi.python.org/pypi/pdpipe

.. .. |LICENCE| image:: https://github.com/shaypal5/pdpipe/blob/master/mit_license_badge.svg
  :target: https://pypi.python.org/pypi/pdpipe

.. https://img.shields.io/pypi/l/pdpipe.svg

.. |Codecov| image:: https://codecov.io/github/pdpipe/pdpipe/coverage.svg?branch=master
  :target: https://codecov.io/github/pdpipe/pdpipe?branch=master

.. |Codacy| image:: https://api.codacy.com/project/badge/Grade/7d605e063f114ecdb5569266bd0226cd
  :alt: Codacy Badge
  :target: https://app.codacy.com/app/shaypal5/pdpipe?utm_source=github.com&utm_medium=referral&utm_content=shaypal5/pdpipe&utm_campaign=Badge_Grade_Dashboard

.. |Requirements| image:: https://requires.io/github/shaypal5/pdpipe/requirements.svg?branch=master
  :target: https://requires.io/github/shaypal5/pdpipe/requirements/?branch=master
  :alt: Requirements Status

.. |Downloads| image:: https://pepy.tech/badge/pdpipe
  :target: https://pepy.tech/project/pdpipe
  :alt: PePy stats

.. |Codefactor| image:: https://www.codefactor.io/repository/github/pdpipe/pdpipe/badge?style=plastic
  :target: https://www.codefactor.io/repository/github/pdpipe/pdpipe
  :alt: Codefactor code quality
