jnmclarty / validada

Licence: other

Another library for defensive data analysis.

Labels

data validation data-validation decorators pandas data-analysis checkset

Projects that are alternatives of or similar to validada

A light-weight, flexible, and expressive pandas data validation library

Stars: ✭ 506 (+1644.83%)

Mutual labels: validation, data-validation, pandas

The JavaScript data transformation and analysis toolkit inspired by Pandas and LINQ.

Stars: ✭ 967 (+3234.48%)

Mutual labels: data, pandas, data-analysis

Identify bias and measure fairness of your data

Stars: ✭ 51 (+75.86%)

Mutual labels: data, pandas, data-analysis

JavaScript data transformation and analysis toolkit inspired by Pandas and LINQ.

Stars: ✭ 139 (+379.31%)

Mutual labels: data, pandas, data-analysis

Data Science Hacks

Data Science Hacks consists of tips, tricks to help you become a better data scientist. Data science hacks are for all - beginner to advanced. Data science hacks consist of python, jupyter notebook, pandas hacks and so on.

Stars: ✭ 273 (+841.38%)

Mutual labels: data, pandas, data-analysis

Pandas Datareader

Extract data from a wide range of Internet sources into a pandas DataFrame.

Stars: ✭ 2,183 (+7427.59%)

Mutual labels: data, pandas, data-analysis

What's in your data? Extract schema, statistics and entities from datasets

Stars: ✭ 843 (+2806.9%)

Mutual labels: pandas, data-analysis

Short programming tutorials pertaining to data analysis.

Stars: ✭ 14 (-51.72%)

Mutual labels: pandas, data-analysis

dataquest-guided-projects-solutions

My dataquest project solutions

Stars: ✭ 35 (+20.69%)

Mutual labels: pandas, data-analysis

ipython-notebooks

A collection of Jupyter notebooks exploring different datasets.

Stars: ✭ 43 (+48.28%)

Mutual labels: pandas, data-analysis

DatScan is an initiative to build an open-source CMS that will have the capability to solve any problem using data Analysis just with the help of various modules and a vast standardized module library

Stars: ✭ 13 (-55.17%)

Mutual labels: pandas, data-analysis

online-course-recommendation-system

Built on data from Pluralsight's course API fetched results. Works with model trained with K-means unsupervised clustering algorithm.

Stars: ✭ 31 (+6.9%)

Mutual labels: pandas, data-analysis

Product-Categorization-NLP

Multi-Class Text Classification for products based on their description with Machine Learning algorithms and Neural Networks (MLP, CNN, Distilbert).

Stars: ✭ 30 (+3.45%)

Mutual labels: pandas, data-analysis

pandas-workshop

An introductory workshop on pandas with notebooks and exercises for following along.

Stars: ✭ 161 (+455.17%)

Mutual labels: pandas, data-analysis

Data-Science-101

Notes and tutorials on how to use python, pandas, seaborn, numpy, matplotlib, scipy for data science.

Stars: ✭ 19 (-34.48%)

Mutual labels: pandas, data-analysis

PandasVersusExcel

Introduction to Python data analysis; getting started as a data analyst

Stars: ✭ 120 (+313.79%)

Mutual labels: pandas, data-analysis

Data-Science-Resources

A guide to getting started with Data Science and ML.

Stars: ✭ 17 (-41.38%)

Mutual labels: pandas, data-analysis

A library for managing, validating, summarizing, and visualizing data.

Stars: ✭ 419 (+1344.83%)

Mutual labels: pandas, data-analysis

Data-Analyst-Nanodegree

Kai Sheng Teh - Udacity Data Analyst Nanodegree

Stars: ✭ 42 (+44.83%)

Mutual labels: pandas, data-analysis

Dominando-Pandas

This repository is intended for learning the Pandas library.

Stars: ✭ 22 (-24.14%)

Mutual labels: pandas, data-analysis

Validada

(Pronounced "Valid-Data")

This project started as a fork of engarde v0.0.2.

Under the hood, validada differs substantially from engarde in order to support a richer feature set: custom exceptions, a universal slicing API, and checks that return objects, all with a focus on code brevity.

The basics are the same as in engarde, likely with a minor speed penalty. In many cases, however, engarde raises on the first problem it finds, whereas validada's policy is to raise only after checking everything.
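The difference between the two raising policies can be illustrated with a small, library-independent sketch. The function names and the dict-of-lists stand-in for a dataframe are hypothetical; neither function is taken from engarde or validada.

```python
def none_missing_fail_fast(columns):
    # Raise at the first column that contains a missing value.
    for name, values in columns.items():
        if any(v is None for v in values):
            raise AssertionError("missing values in column %r" % name)
    return columns


def none_missing_check_everything(columns):
    # Inspect every column first, then raise once with the full list.
    bad = [name for name, values in columns.items()
           if any(v is None for v in values)]
    if bad:
        raise AssertionError("missing values in columns %r" % (bad,))
    return columns


# Toy data: a dict of column name -> values (illustrative only).
data = {"one": [1, None, 3], "two": [4, 5, 6], "three": [None, 8, 9]}
```

With this data, the fail-fast variant reports only column 'one', while the check-everything variant reports both 'one' and 'three' in a single exception.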

As of 7/7/2015, validada passes all of the unit tests of engarde.

Slicing?

All checks slice the dataframe internally, so users of validada never have to pass in a sliced dataframe. Instead, users can pass in a slice-like object as an argument.

How do I pass a slice?

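from validada.slicers import iloc, loc

# some_check stands for any validada check; adf is a pandas DataFrame.
# Check the last 7 rows, using all earlier rows to calculate constants.
some_check(adf, iloc[-7:], iloc[:-7])

# or, as a decorator...
@some_check(iloc[-1], iloc[:-1])
def somefunc(adf):
    return adf + 1.0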

All checks can take up to two slice-like arguments, both optional. The first is the slice to be checked. The second is a slice used to calculate constants for the check. For example, suppose a dataframe comes from a source whose older data is known to be good (say, everything before last week), and you want to check that this week's data is within two standard deviations of the rest. You would pass iloc[-7:] and iloc[:-7] as arguments to the check.
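The two-slice idea can be sketched without pandas, using a plain list in place of a dataframe. Everything here is an assumption made for illustration: `_SliceMaker`, this `iloc`, and this `within_n_std` are hypothetical stand-ins and not validada's implementation, and the sketch only handles slice arguments such as `iloc[-7:]`, not a single position such as `iloc[-1]`.

```python
import statistics


class _SliceMaker(object):
    """Hypothetical stand-in for a slice-like helper such as `iloc`."""

    def __getitem__(self, key):
        # `iloc[-7:]` arrives here as slice(-7, None); hand it back
        # unchanged so the check can apply it later.
        return key


iloc = _SliceMaker()


def within_n_std(values, check_slc=slice(None), const_slc=slice(None), n=2):
    """Return True if values[check_slc] all lie within n standard
    deviations of the mean of values[const_slc]."""
    baseline = values[const_slc]
    mean = statistics.mean(baseline)
    std = statistics.stdev(baseline)
    return all(abs(v - mean) <= n * std for v in values[check_slc])


# Two weeks of known-good daily values, then one week with an outlier.
history = [10, 11, 9, 10, 12, 10, 8, 10, 11, 9, 10, 12, 10, 8]
this_week = [10, 11, 9, 10, 30, 10, 9]
series = history + this_week

# Check the last 7 values against constants computed from the rest.
latest_ok = within_n_std(series, iloc[-7:], iloc[:-7])
history_ok = within_n_std(history, iloc[-7:], iloc[:-7])
```

The caller never slices the data; the check receives the whole series plus two slice objects and applies them internally.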

# To get the same functionality as engarde, use...
from validada.functions.raising import none_missing, is_shape, unique_index
# or
from validada.decorators.raising import none_missing, is_shape, unique_index

# But with validada you get more out of the box...
from validada.functions.returning import none_missing, is_shape, unique_index
# or
from validada.decorators.returning import none_missing, is_shape, unique_index

Custom Return-Objects?

Depending on the check, there may be useful information to pass back out. Or you may want to run a batch of checks and simply collect the boolean result of each.

from validada.core import ReturnSet
from validada.slicers import ix  # assumption: ix is exported alongside iloc and loc

rs = ReturnSet(('bool', 'obj'))
none_missing = rs.none_missing

print("Since we specified 'bool' and 'obj', in that order:")
a_bool, an_obj = none_missing(adf, ix['2013':], columns='one')
# a_bool is the result of the check
print(a_bool)
# an_obj is a none_missing-specific object; it is a way to
# get other information out of the check.
print(an_obj)
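The idea of choosing what a check hands back can be sketched in plain Python. This is only an illustration under stated assumptions: `none_missing_core` and `returning` are hypothetical names, the data is a dict of lists rather than a DataFrame, and the sketch does not reproduce how ReturnSet works internally.

```python
def none_missing_core(columns):
    """Toy check: return (passed, details) for a dict of column -> values."""
    missing = {name: [i for i, v in enumerate(values) if v is None]
               for name, values in columns.items()}
    # Keep only the columns that actually have missing positions.
    missing = {name: idx for name, idx in missing.items() if idx}
    return (not missing), missing


def returning(core, ret_specs):
    """Wrap `core` so it returns only the requested items, in order."""
    def check(data):
        passed, details = core(data)
        available = {"bool": passed, "obj": details}
        picked = tuple(available[spec] for spec in ret_specs)
        # A single requested item is returned bare rather than as a 1-tuple.
        return picked[0] if len(picked) == 1 else picked
    return check


data = {"one": [1, None, 3], "two": [4, 5, 6]}

# Ask for both the boolean and the details object, in that order.
a_bool, an_obj = returning(none_missing_core, ("bool", "obj"))(data)

# Or collect only the boolean result.
just_bool = returning(none_missing_core, ("bool",))(data)
```

Here `an_obj` maps each offending column to the positions of its missing values, which is one possible shape for a check-specific return object.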

Custom Exceptions?

To use the advanced features, instantiate your own CheckSet (or a child class, e.g. RaiseSet or ReturnSet):

from validada.core import RaiseSet
from validada.slicers import ix  # assumption: ix is exported alongside iloc and loc

rs = RaiseSet(IOError, "IO error makes no sense, but why not?")
none_missing = rs.none_missing

# ready...
none_missing(adf, ix['2013':])

# or make a decorator
none_missing = rs.decorator_maker('none_missing')
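As a rough picture of what a check decorator with a custom exception might do, here is a minimal sketch. `make_raising_decorator` and `no_none` are hypothetical helpers, not validada functions, and the sketch assumes the decorator validates the wrapped function's return value, which the text above does not state.

```python
import functools


def make_raising_decorator(check, exception=AssertionError,
                           message="check failed"):
    """Build a decorator that validates a function's result with `check`
    and raises `exception(message)` when the check returns False."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            if not check(result):
                raise exception(message)
            return result
        return wrapper
    return decorator


def no_none(values):
    # Toy check: True when no value is missing.
    return all(v is not None for v in values)


# Pair the check with a custom exception type and message.
none_missing = make_raising_decorator(no_none, IOError, "found missing values")


@none_missing
def load_clean():
    return [1.0, 2.0, 3.0]


@none_missing
def load_dirty():
    return [1.0, None, 3.0]
```

Calling `load_clean()` returns its data unchanged, while `load_dirty()` raises the configured IOError.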

Dependencies

Pandas

Supports Python 2.7+ and Python 3.6

Overall Design

Every check has a return-function and a raise-function, which share a common signature. For every check, these two functions are used to create one static function of the CheckSet. A CheckSet object stores the custom exception, the custom return objects, and the default slicing settings, and provides a generic way to turn any check into a decorator in one line.
Instances of RaiseSet and ReturnSet are used to declare the checks exposed by the functions and decorators modules.
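One way to picture a return-function and a raise-function that share a signature is to derive both from a single core predicate. The sketch below is an assumption-laden illustration: `make_pair` and `is_shape_core` are hypothetical, the data is a list of tuples, and validada's actual CheckSet machinery may be organized differently.

```python
def make_pair(core, exception=AssertionError, message="check failed"):
    """From one core predicate, build a return-function and a
    raise-function that accept the same arguments."""
    def returning(data, *args, **kwargs):
        # Report the outcome without raising.
        return core(data, *args, **kwargs)

    def raising(data, *args, **kwargs):
        # Same arguments, but raise the configured exception on failure.
        if not core(data, *args, **kwargs):
            raise exception(message)
        return data  # hand the data back so calls can be chained

    return returning, raising


def is_shape_core(rows, shape):
    # Toy predicate: rows is a list of equal-length tuples.
    n_cols = len(rows[0]) if rows else 0
    return (len(rows), n_cols) == shape


is_shape_return, is_shape_raise = make_pair(
    is_shape_core, ValueError, "unexpected shape")

rows = [(1, 2), (3, 4), (5, 6)]
```

Both functions take `(rows, shape)`; only the way the outcome is reported differs, and the exception type and message are settings stored once when the pair is built.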

See Also

assertr

engarde

Stars: ✭ 29
