
TMiguelT / Pandasschema

Licence: gpl-3.0
A validation library for Pandas data frames using user-friendly schemas

Programming Languages

python

Projects that are alternatives to or similar to Pandasschema

Pandera
A light-weight, flexible, and expressive pandas data validation library
Stars: ✭ 506 (+274.81%)
Mutual labels:  schema, validation, pandas
Specs
Technical specifications and guidelines for implementing Frictionless Data.
Stars: ✭ 403 (+198.52%)
Mutual labels:  schema, data-science, validation
Sweetviz
Visualize and compare datasets, target values and associations, with one line of code.
Stars: ✭ 1,851 (+1271.11%)
Mutual labels:  data-science, pandas
Seaborn Tutorial
This repository is my attempt to help Data Science aspirants gain necessary Data Visualization skills required to progress in their career. It includes all the types of plot offered by Seaborn, applied on random datasets.
Stars: ✭ 114 (-15.56%)
Mutual labels:  data-science, pandas
D6t Python
Accelerate data science
Stars: ✭ 118 (-12.59%)
Mutual labels:  data-science, pandas
Sspipe
Simple Smart Pipe: python productivity-tool for rapid data manipulation
Stars: ✭ 96 (-28.89%)
Mutual labels:  data-science, pandas
Sigmoidal ai
Tutoriais de Python, Data Science, Machine Learning e Deep Learning - Sigmoidal
Stars: ✭ 103 (-23.7%)
Mutual labels:  data-science, pandas
Rdfunit
An RDF Unit Testing Suite
Stars: ✭ 117 (-13.33%)
Mutual labels:  schema, validation
Seaborn
Statistical data visualization in Python
Stars: ✭ 9,007 (+6571.85%)
Mutual labels:  data-science, pandas
Rightmove webscraper.py
Python class to scrape data from rightmove.co.uk and return listings in a pandas DataFrame object
Stars: ✭ 125 (-7.41%)
Mutual labels:  data-science, pandas
Awesome Python Models
A curated list of awesome Python libraries, which implement models, schemas, serializers/deserializers, ODM's/ORM's, Active Records or similar patterns.
Stars: ✭ 124 (-8.15%)
Mutual labels:  schema, validation
Aws Data Wrangler
Pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
Stars: ✭ 2,385 (+1666.67%)
Mutual labels:  data-science, pandas
Danfojs
danfo.js is an open source, JavaScript library providing high performance, intuitive, and easy to use data structures for manipulating and processing structured data.
Stars: ✭ 1,304 (+865.93%)
Mutual labels:  data-science, pandas
Pymc Example Project
Example PyMC3 project for performing Bayesian data analysis using a probabilistic programming approach to machine learning.
Stars: ✭ 90 (-33.33%)
Mutual labels:  data-science, pandas
Postguard
🐛 Statically validate Postgres SQL queries in JS / TS code and derive schemas.
Stars: ✭ 104 (-22.96%)
Mutual labels:  schema, validation
Vue Rawmodel
RawModel.js plugin for Vue.js v2. Form validation has never been easier!
Stars: ✭ 79 (-41.48%)
Mutual labels:  schema, validation
Dat8
General Assembly's 2015 Data Science course in Washington, DC
Stars: ✭ 1,516 (+1022.96%)
Mutual labels:  data-science, pandas
Data Science For Marketing Analytics
Achieve your marketing goals with the data analytics power of Python
Stars: ✭ 127 (-5.93%)
Mutual labels:  data-science, pandas
Ds and ml projects
Data Science & Machine Learning projects and tutorials in python from beginner to advanced level.
Stars: ✭ 56 (-58.52%)
Mutual labels:  data-science, pandas
Govalid
Data validation library for golang. [MIGRATING TO NEW ADDRESS]
Stars: ✭ 59 (-56.3%)
Mutual labels:  schema, validation

PandasSchema
************

For the full documentation, refer to the `Github Pages Website <https://tmiguelt.github.io/PandasSchema/>`_.

======================================================================

PandasSchema is a module for validating tabular data, such as CSVs (comma-separated value files) and TSVs (tab-separated value files). It uses the Pandas data analysis library to do so quickly and efficiently.
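
PandasSchema validates a pandas ``DataFrame`` rather than the file itself, so a CSV and a TSV differ only in how they are parsed; in pandas, ``pandas.read_csv`` takes a ``sep`` argument (for example ``sep='\t'`` for tab-separated input). The standard-library sketch below uses neither pandas nor PandasSchema; it only illustrates that, once parsed, both formats yield the same records for a validator to check. The ``load`` helper is hypothetical.

.. code:: python

    import csv
    import io

    # The same two records, once comma-separated and once tab-separated.
    CSV_TEXT = "Given Name,Age\nGerald,82\nYuuwa,27\n"
    TSV_TEXT = "Given Name\tAge\nGerald\t82\nYuuwa\t27\n"


    def load(text, delimiter):
        """Parse delimited text into a list of {column: value} dicts."""
        return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


    csv_rows = load(CSV_TEXT, ",")
    tsv_rows = load(TSV_TEXT, "\t")

    # After parsing, the delimiter no longer matters to a validator.
    print(csv_rows == tsv_rows)  # True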

For example, say your code expects a CSV that looks a bit like this:

.. code::

    Given Name,Family Name,Age,Sex,Customer ID
    Gerald,Hampton,82,Male,2582GABK
    Yuuwa,Miyake,27,Male,7951WVLW
    Edyta,Majewska,50,Female,7758NSID

Now you want to ensure that the data in your CSV is in the correct format:

.. code:: python

    import pandas as pd
    from io import StringIO
    from pandas_schema import Column, Schema
    from pandas_schema.validation import LeadingWhitespaceValidation, TrailingWhitespaceValidation, CanConvertValidation, MatchesPatternValidation, InRangeValidation, InListValidation

    schema = Schema([
        Column('Given Name', [LeadingWhitespaceValidation(), TrailingWhitespaceValidation()]),
        Column('Family Name', [LeadingWhitespaceValidation(), TrailingWhitespaceValidation()]),
        Column('Age', [InRangeValidation(0, 120)]),
        Column('Sex', [InListValidation(['Male', 'Female', 'Other'])]),
        Column('Customer ID', [MatchesPatternValidation(r'\d{4}[A-Z]{4}')])
    ])

    test_data = pd.read_csv(StringIO('''Given Name,Family Name,Age,Sex,Customer ID
    Gerald ,Hampton,82,Male,2582GABK
    Yuuwa,Miyake,270,male,7951WVLW
    Edyta,Majewska ,50,Female,775ANSID
    '''))

    errors = schema.validate(test_data)

    for error in errors:
        print(error)
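
To make each rule in the schema concrete, here is a plain-Python model of what the validations check. It is a sketch, not PandasSchema's implementation: the helpers (``no_leading_ws``, ``no_trailing_ws``, ``in_range``, ``in_list``, ``matches``, ``validate``) are hypothetical and use only the standard library. The half-open range and the case-sensitive list check mirror the messages PandasSchema prints; using ``re.fullmatch`` is an assumption that the whole value must match, so anchor your own pattern (``^...$``) if that distinction matters with the real library.

.. code:: python

    import csv
    import io
    import re
    from typing import Callable, Dict, List, Optional, Tuple

    # Hypothetical model: a check takes one cell value and returns an error
    # message, or None if the value passes. Not PandasSchema's own classes.
    Check = Callable[[str], Optional[str]]


    def no_leading_ws(value: str) -> Optional[str]:
        return "contains leading whitespace" if value != value.lstrip() else None


    def no_trailing_ws(value: str) -> Optional[str]:
        return "contains trailing whitespace" if value != value.rstrip() else None


    def in_range(low: int, high: int) -> Check:
        # Half-open interval [low, high), as in the "[0, 120)" message.
        def check(value: str) -> Optional[str]:
            if low <= int(value) < high:
                return None
            return f"was not in the range [{low}, {high})"
        return check


    def in_list(options: List[str]) -> Check:
        # Case-sensitive membership, so "male" fails when only "Male" is legal.
        def check(value: str) -> Optional[str]:
            if value in options:
                return None
            return "is not in the list of legal options (" + ", ".join(options) + ")"
        return check


    def matches(pattern: str) -> Check:
        # Assumption: the whole value must match (re.fullmatch).
        def check(value: str) -> Optional[str]:
            if re.fullmatch(pattern, value):
                return None
            return f'does not match the pattern "{pattern}"'
        return check


    SCHEMA: Dict[str, List[Check]] = {
        "Given Name": [no_leading_ws, no_trailing_ws],
        "Family Name": [no_leading_ws, no_trailing_ws],
        "Age": [in_range(0, 120)],
        "Sex": [in_list(["Male", "Female", "Other"])],
        "Customer ID": [matches(r"\d{4}[A-Z]{4}")],
    }


    def validate(rows, schema=SCHEMA) -> List[Tuple[int, str, str]]:
        """Return (row, column, message) for every failing cell, row by row."""
        errors = []
        for i, row in enumerate(rows):
            for column, checks in schema.items():
                for check in checks:
                    message = check(row[column])
                    if message is not None:
                        errors.append((i, column, f'"{row[column]}" {message}'))
        return errors


    data = """Given Name,Family Name,Age,Sex,Customer ID
    Gerald ,Hampton,82,Male,2582GABK
    Yuuwa,Miyake,270,male,7951WVLW
    Edyta,Majewska ,50,Female,775ANSID
    """

    # csv keeps the stray spaces, so "Gerald " and "Majewska " are preserved.
    for row, column, message in validate(list(csv.DictReader(io.StringIO(data)))):
        print(f'{{row: {row}, column: "{column}"}}: {message}')

Run on the same sample data, this model reports the same five failures, in the same order, as the PandasSchema output shown next. Note that ``in_range`` calls ``int()`` and would raise on a non-numeric age, whereas a real schema would typically pair a range rule with a type check.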

PandasSchema would then output:

.. code:: text

    {row: 0, column: "Given Name"}: "Gerald " contains trailing whitespace
    {row: 1, column: "Age"}: "270" was not in the range [0, 120)
    {row: 1, column: "Sex"}: "male" is not in the list of legal options (Male, Female, Other)
    {row: 2, column: "Family Name"}: "Majewska " contains trailing whitespace
    {row: 2, column: "Customer ID"}: "775ANSID" does not match the pattern "\d{4}[A-Z]{4}"
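
Each reported error names a row and a column, so a common next step is to group the messages per record and set the failing records aside. The sketch below assumes the row number is the record's position in the input; the ``Error`` tuple and both helpers are hypothetical stand-ins built from the printed fields, not PandasSchema's own warning class.

.. code:: python

    from typing import Dict, List, NamedTuple, Set


    class Error(NamedTuple):
        """Hypothetical stand-in for one reported error (row, column, message)."""
        row: int
        column: str
        message: str


    def errors_by_row(errors: List[Error]) -> Dict[int, List[str]]:
        """Group error messages by row so each bad record is reported once."""
        grouped: Dict[int, List[str]] = {}
        for e in errors:
            grouped.setdefault(e.row, []).append(f"{e.column}: {e.message}")
        return grouped


    def split_rows(records: list, errors: List[Error]):
        """Split records into (valid, invalid), treating Error.row as a position."""
        bad: Set[int] = {e.row for e in errors}
        valid = [r for i, r in enumerate(records) if i not in bad]
        invalid = [r for i, r in enumerate(records) if i in bad]
        return valid, invalid


    # The five errors from the output above, re-typed as Error tuples.
    errors = [
        Error(0, "Given Name", '"Gerald " contains trailing whitespace'),
        Error(1, "Age", '"270" was not in the range [0, 120)'),
        Error(1, "Sex", '"male" is not in the list of legal options (Male, Female, Other)'),
        Error(2, "Family Name", '"Majewska " contains trailing whitespace'),
        Error(2, "Customer ID", r'"775ANSID" does not match the pattern "\d{4}[A-Z]{4}"'),
    ]

    for row, messages in errors_by_row(errors).items():
        print(row, messages)

    # A fourth, hypothetical record with no errors is the only valid one.
    valid, invalid = split_rows(["row 0", "row 1", "row 2", "row 3"], errors)
    print(valid)  # ['row 3']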
