preprocessy / preprocessy

License: MIT

Python package for Customizable Data Preprocessing Pipelines

Preprocessy is a framework that provides data preprocessing pipelines for machine learning. It bundles all the common preprocessing steps that are performed on the data to prepare it for machine learning models. It aims to do so in a manner that is independent of the source and type of dataset. Hence, it provides a set of functions that have been generalised to different types of data.

The pipelines are composed of these functions and are flexible: users can customise them by adding their own processing functions or removing built-in ones according to their needs. The pipelines thus provide an abstract, high-level interface.
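To make the idea of a customisable pipeline concrete, here is a minimal sketch in plain Python. It is not Preprocessy's actual API: the class `SketchPipeline`, its `add` and `remove` methods, and the step functions are all hypothetical names chosen for illustration. It assumes each step is a function that reads and updates a shared parameter dictionary.

```python
from typing import Callable, Dict, List

# Hypothetical step signature: each step reads and updates a shared params dict.
Step = Callable[[Dict], None]


class SketchPipeline:
    """Illustrative pipeline; not Preprocessy's actual class."""

    def __init__(self, steps: List[Step], params: Dict):
        self.steps = list(steps)
        self.params = params

    def add(self, step: Step, after: str) -> None:
        # Insert a custom step right after the step with the given function name.
        names = [s.__name__ for s in self.steps]
        self.steps.insert(names.index(after) + 1, step)

    def remove(self, name: str) -> None:
        # Drop every step whose function name matches.
        self.steps = [s for s in self.steps if s.__name__ != name]

    def process(self) -> Dict:
        # Run the steps in order; each one mutates the shared params.
        for step in self.steps:
            step(self.params)
        return self.params


def drop_missing(params: Dict) -> None:
    # Keep only rows that have no missing (None) values.
    params["rows"] = [r for r in params["rows"] if None not in r.values()]


def square_x(params: Dict) -> None:
    # Example of a user-supplied custom step.
    for r in params["rows"]:
        r["x"] = r["x"] ** 2


def count_rows(params: Dict) -> None:
    params["n_rows"] = len(params["rows"])


pipeline = SketchPipeline(
    steps=[drop_missing, count_rows],
    params={"rows": [{"x": 2}, {"x": None}, {"x": 3}]},
)
pipeline.add(square_x, after="drop_missing")
result = pipeline.process()
```

Passing a single dictionary between steps keeps every step's signature identical, which is what makes inserting or removing a step a one-line change in this sketch.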

Pipeline Structure

The pipelines are divided into three logical stages:

Stage 1 - Pipeline Input

Input datasets with the following extensions are supported - .csv, .tsv, .xls, .xlsx, .xlsm, .xlsb, .odf, .ods, .odt
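As a rough illustration of how an input stage might dispatch on file extension, the sketch below uses only the Python standard library. The function `read_dataset` and the two lookup tables are hypothetical and are not taken from Preprocessy. Only the delimited formats are actually read here; the spreadsheet formats would need a third-party reader, which this sketch deliberately leaves out.

```python
import csv
import tempfile
from pathlib import Path

# Extensions listed above, grouped by the kind of reader they would need.
DELIMITED = {".csv": ",", ".tsv": "\t"}
SPREADSHEET = {".xls", ".xlsx", ".xlsm", ".xlsb", ".odf", ".ods", ".odt"}


def read_dataset(path):
    """Hypothetical loader: returns a list of row dicts for delimited files."""
    suffix = Path(path).suffix.lower()
    if suffix in DELIMITED:
        with open(path, newline="", encoding="utf-8") as handle:
            return list(csv.DictReader(handle, delimiter=DELIMITED[suffix]))
    if suffix in SPREADSHEET:
        # Spreadsheet formats need a third-party reader, which this sketch omits.
        raise NotImplementedError(f"spreadsheet reader not included for {suffix}")
    raise ValueError(f"unsupported file extension: {suffix!r}")


# Usage: write a small tab-separated file and load it back.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.tsv"
    sample.write_text("name\tage\nAda\t36\n", encoding="utf-8")
    rows = read_dataset(sample)
```

Note that `csv.DictReader` returns every value as a string, so type conversion would be a separate step.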

Stage 2 - Processing

This is the major part of the pipeline and consists of processing functions. The following functions are provided out of the box, both as standalone functions and as part of the pipelines:

Handling Null Values
Handling Outliers
Normalisation and Scaling
Label Encoding
Correlation and Feature Extraction
Training and Test set splitting
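The following standard-library sketch shows what several of the steps listed above typically do on a single column of values. These are generic textbook versions, not Preprocessy's implementations; all function names and parameters are assumptions made for illustration. The outlier rule used here is the common 1.5 x IQR rule, and correlation-based feature selection is omitted for brevity.

```python
import random
from statistics import mean, quantiles


def fill_nulls_with_mean(values):
    # Replace None entries with the mean of the observed values.
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def remove_outliers_iqr(values, factor=1.5):
    # Keep values within [Q1 - factor*IQR, Q3 + factor*IQR].
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - factor * iqr, q3 + factor * iqr
    return [v for v in values if low <= v <= high]


def min_max_scale(values):
    # Rescale values linearly to the range [0, 1].
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def label_encode(labels):
    # Map each distinct label to an integer, in sorted label order.
    mapping = {label: i for i, label in enumerate(sorted(set(labels)))}
    return [mapping[label] for label in labels], mapping


def train_test_split(rows, test_size=0.2, seed=None):
    # Shuffle a copy of the rows, then cut off the last test_size fraction.
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_size)
    return shuffled[: len(shuffled) - n_test], shuffled[len(shuffled) - n_test:]


ages = fill_nulls_with_mean([1.0, None, 3.0])
cleaned = remove_outliers_iqr([1, 2, 3, 4, 5, 100])
scaled = min_max_scale([2, 4, 6])
codes, mapping = label_encode(["b", "a", "b"])
train, test = train_test_split(list(range(10)), test_size=0.2, seed=0)
```

Each function returns a new list rather than modifying its input, so the steps can be applied in any order or skipped without side effects.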

Stage 3 - Pipeline Output

The output consists of the processed dataset and the pipeline parameters, depending on the verbosity required.

Contributing

Please read our Contributing Guide before submitting a Pull Request to the project.

Support

Feel free to contact any of the maintainers. We're happy to help!

Roadmap

Check out our roadmap to stay informed of the latest features released and the upcoming ones. Feel free to give us your insights!

Documentation

The documentation can be found at https://preprocessy.readthedocs.io/en/latest/. Some parts of it are still under development. All contributions are welcome! Please see our Contributing Guide.

Research Paper and Citations

Preprocessy: A Customisable Data Preprocessing Framework with High-Level APIs was presented at the 2022 7th International Conference on Data Science and Machine Learning Applications (CDMA) and is published in IEEE Xplore.

Link to full paper: https://ieeexplore.ieee.org/document/9736366

If you use Preprocessy as part of scientific research, please cite it using one of the citations below.

Plain Text Citation

S. Kazi et al., "Preprocessy: A Customisable Data Preprocessing Framework with High-Level APIs," 2022 7th International Conference on Data Science and Machine Learning Applications (CDMA), 2022, pp. 206-211, doi: 10.1109/CDMA54072.2022.00039.

BibTeX Citation

@INPROCEEDINGS{9736366,
  author={Kazi, Saif and Vakharia, Priyesh and Shah, Parth and Gupta, Riya and Tailor, Yash and Mantry, Palak and Rathod, Jash},
  booktitle={2022 7th International Conference on Data Science and Machine Learning Applications (CDMA)},
  title={Preprocessy: A Customisable Data Preprocessing Framework with High-Level APIs},
  year={2022},
  volume={},
  number={},
  pages={206-211},
  doi={10.1109/CDMA54072.2022.00039}}

License

See the LICENSE file for licensing information.

Links

Documentation: https://preprocessy.readthedocs.io/en/latest/
Changes: https://preprocessy.readthedocs.io/en/latest/changes/
PyPI Releases: https://pypi.org/project/preprocessy/
Source Code: https://github.com/preprocessy/preprocessy
Datasets: https://drive.google.com/drive/folders/1MoMHNgd6KR5A_l5PkFIcxeax7lXm72l9?usp=sharing
Issue Tracker: https://github.com/preprocessy/preprocessy/issues
Chat: https://discord.gg/5q2yCqqU6N
