
jnmclarty / validada

Licence: other
Another library for defensive data analysis.

Validada

(Pronounced "Valid-Data")

This project started as a fork of engarde v0.0.2.

Under the hood, validada differs substantially from engarde in order to implement a richer feature set, including custom exceptions, a universal slicing API, and checks that return objects, all with a focus on code brevity.

All of the basics are the same as in engarde, likely with a minor hit to speed. One difference in behaviour: in many cases engarde raises on the first problem it finds, whereas validada's policy is to raise only after checking everything.
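
The difference between the two policies can be sketched in plain pandas. This is not validada's implementation; `check_all`, `no_missing` and `no_negatives` are hypothetical names used only for illustration. Each check returns a list of problem descriptions, and a single exception is raised once every check has run.

```python
import pandas as pd


def no_missing(df):
    """Return one message per column that contains missing values."""
    return ["column %r has missing values" % col
            for col in df.columns if df[col].isnull().any()]


def no_negatives(df):
    """Return one message per column that contains negative values."""
    return ["column %r has negative values" % col
            for col in df.columns if (df[col] < 0).any()]


def check_all(df, checks):
    """Run every check, then raise once with all collected problems."""
    problems = []
    for check in checks:
        problems.extend(check(df))
    if problems:
        raise AssertionError("; ".join(problems))
    return df


adf = pd.DataFrame({"one": [1.0, None, 3.0], "two": [1.0, -2.0, 3.0]})

try:
    check_all(adf, [no_missing, no_negatives])
except AssertionError as err:
    message = str(err)  # reports both the missing and the negative value
```

A raise-on-first-problem policy would stop after `no_missing` and never report the negative value in column `two`.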

As of 7/7/2015, validada passes all of the unit tests of engarde.

Slicing?

All checks slice the dataframe internally, so users of validada never have to pass in a sliced dataframe. Instead, users can pass in a slice-like object as an argument.

How do I pass a slice?

from validada.slicers import iloc, loc

some_check(adf, iloc[-7:], iloc[:-7])

# or...

@some_check(iloc[-1], iloc[:-1])
def somefunc(adf):
    return adf + 1.0

All checks can take up to two slice-like arguments. The first is the slice that will be checked. The second is a slice used to calculate constants for the check. Both are optional. For example, say you have a dataframe coming from a source with known "good" data (for instance, everything before last week), and you want to check that this week's data is within two standard deviations of the data that excludes the latest week. You would pass in iloc[-7:] and iloc[:-7] as arguments to the check.
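
To make the two-slice idea concrete, the following sketch reproduces the scenario above with plain pandas indexing rather than validada's own slicers. The function name `within_n_std_of_history` and its parameters are hypothetical; only the roles of the two slices (rows to check, rows to derive constants from) come from the description above.

```python
import pandas as pd


def within_n_std_of_history(df, check_slice, derive_slice, n=2.0):
    """Check rows in check_slice against mean +/- n*std of derive_slice."""
    history = df.iloc[derive_slice]   # known "good" data
    recent = df.iloc[check_slice]     # data under test
    mean, std = history.mean(), history.std()
    # True per column when every recent value lies inside the band
    inside = ((recent - mean).abs() <= n * std).all()
    return bool(inside.all())


# 21 days of well-behaved history followed by 7 recent days
values = [10.0, 11.0, 9.0] * 7 + [10.0, 10.5, 9.5, 10.0, 11.0, 9.0, 10.0]
adf = pd.DataFrame({"one": values})

ok = within_n_std_of_history(adf, slice(-7, None), slice(None, -7))

# Introduce an outlier into the latest week and check again
bad = adf.copy()
bad.iloc[-1, 0] = 50.0
not_ok = within_n_std_of_history(bad, slice(-7, None), slice(None, -7))
```

Here `slice(-7, None)` and `slice(None, -7)` play the roles of iloc[-7:] and iloc[:-7].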

# To get the same functionality as engarde, use...
from validada.functions.raising import none_missing, is_shape, unique_index
# or
from validada.decorators.raising import none_missing, is_shape, unique_index
# But with validada you get more out of the box...
from validada.functions.returning import none_missing, is_shape, unique_index
# or
from validada.decorators.returning import none_missing, is_shape, unique_index

Custom Return-Objects?

Depending on the check, there may be useful information to pass back out. Or you may want to run a batch of checks and simply collect the boolean result of each.

from validada.core import ReturnSet
from validada.slicers import ix  # assumption: ix lives alongside iloc and loc

rs = ReturnSet(('bool', 'obj'))
none_missing = rs.none_missing

print("Since we specified 'bool' and 'obj', in that order:")
a_bool, an_obj = none_missing(adf, ix['2013':], columns='one')
# a_bool is the result of the check
print(a_bool)
# an_obj is a none_missing-specific object; it's a way to
# get other information out of the check.
print(an_obj)
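
How a check might hand back several results in a caller-chosen order can be sketched as follows. This is an illustration only, not validada's code: `none_missing_returning` and the meaning given to 'obj' (here, the rows that contain missing values) are assumptions made for the example.

```python
import pandas as pd


def none_missing_returning(df, wanted=("bool", "obj")):
    """Return the requested results, in the order listed in `wanted`."""
    missing_rows = df[df.isnull().any(axis=1)]
    results = {
        "bool": missing_rows.empty,   # True when nothing is missing
        "obj": missing_rows,          # assumed payload: the offending rows
    }
    return tuple(results[name] for name in wanted)


adf = pd.DataFrame({"one": [1.0, None, 3.0], "two": [4.0, 5.0, 6.0]})

a_bool, an_obj = none_missing_returning(adf, ("bool", "obj"))
# Reversing the request reverses the order of the returned tuple
an_obj_2, a_bool_2 = none_missing_returning(adf, ("obj", "bool"))
```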

Custom Exceptions?

To use the advanced features, instantiate your own CheckSet (or one of its children, e.g. RaiseSet or ReturnSet) via...

from validada.core import RaiseSet
from validada.slicers import ix  # assumption: ix lives alongside iloc and loc

rs = RaiseSet(IOError, "IO error makes no sense, but why not?")
none_missing = rs.none_missing

# ready...
none_missing(adf, ix['2013':])

# or make a decorator
none_missing = rs.decorator_maker('none_missing')

Dependencies

  • Pandas

Supports Python 2.7+ and Python 3.6.

Overall Design

Every check has a return-function and a raise-function, all sharing a common signature. For every check, these two functions are used to create one static function on the CheckSet. A CheckSet object stores the custom exception, the custom return-object settings, and the default slicing settings. A CheckSet object also has a generic way to turn any check into a decorator in one line.
Instances of RaiseSet and ReturnSet are used to declare the checks exposed under validada.functions and validada.decorators.
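
The design can be illustrated with a much smaller, hypothetical class. `MiniRaiseSet`, `function_maker` and `all_positive` are invented for this sketch and are not part of validada. Unlike the `decorator_maker('none_missing')` call shown earlier, this `decorator_maker` takes a callable rather than a check name, and applying the check to the decorated function's result is also an assumption.

```python
import functools


class MiniRaiseSet(object):
    """Hypothetical stand-in for a CheckSet that raises on failure."""

    def __init__(self, exception=AssertionError, message="check failed"):
        self.exception = exception
        self.message = message

    def _run(self, check, data, *args, **kwargs):
        # Shared raise-path: `check` returns True when the data is fine
        if not check(data, *args, **kwargs):
            raise self.exception(self.message)
        return data

    def function_maker(self, check):
        """Turn a boolean check into a function that raises or returns data."""
        return functools.partial(self._run, check)

    def decorator_maker(self, check):
        """Turn the same check into a decorator applied to a function's result."""
        def decorator(*args, **kwargs):
            def wrap(func):
                @functools.wraps(func)
                def wrapper(*fargs, **fkwargs):
                    result = func(*fargs, **fkwargs)
                    return self._run(check, result, *args, **kwargs)
                return wrapper
            return wrap
        return decorator


def all_positive(values):
    return all(v > 0 for v in values)


rs = MiniRaiseSet(ValueError, "values must be positive")
check_positive = rs.function_maker(all_positive)
positive = rs.decorator_maker(all_positive)


@positive()
def shifted(values):
    return [v - 1 for v in values]


passed = check_positive([1, 2, 3])   # returns the data unchanged
try:
    shifted([1, 2, 3])               # result contains 0, so the check fails
except ValueError as err:
    caught = str(err)
```

Keeping the exception type and message on the set object means the function form and the decorator form of a check fail in the same way.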

See Also

  • assertr
  • engarde
