data-cleaning / errorlocate

Licence: other
Find and replace erroneous fields in data using validation rules


Error localization

Find errors in data given a set of validation rules. The errorlocate package helps identify obvious errors in raw datasets.

It works in tandem with the package validate. With validate you formulate the validation rules with which the data must comply.

For example:

  • “age cannot be negative”: age >= 0.
  • “if a person is married, they must be older than 16 years”: if (married == TRUE) age > 16.
  • “Profit is turnover minus cost”: profit == turnover - cost.

While validate can check whether a record is valid, it does not identify which variables are responsible for a violation. This may seem a simple task, but it is actually quite tricky. A set of validation rules forms a web of dependent variables, so changing a value to satisfy rule 1 may cause the record to violate rule 2.
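
As a rough illustration of that gap, the sketch below writes the three example rules with validate and confronts a small data frame with them. The data frame `people` and all of its values are invented for this example; `validator()`, `confront()` and `values()` are functions from validate.

```r
library(validate)

# The three example rules from the list above
rules <- validator(
  age >= 0,
  if (married == TRUE) age > 16,
  profit == turnover - cost
)

# Hypothetical records, made up for illustration only
people <- data.frame(
  age      = c(35, -2, 12),
  married  = c(TRUE, FALSE, TRUE),
  profit   = c(10, 10, 10),
  turnover = c(30, 30, 30),
  cost     = c(20, 20, 25)
)

# Check every record against every rule
cf <- confront(people, rules)

# One row per record, one column per rule; FALSE marks a violated rule
values(cf)
```

The result shows which rules each record violates, not which fields to change. The third record fails the profit rule, but profit, turnover or cost could each be the faulty value, and it is this question that errorlocate addresses.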

errorlocate provides a small framework for record-based error detection and implements the Fellegi-Holt algorithm. This algorithm assumes that no information is available other than the values of a record and a set of validation rules. It minimizes the (weighted) number of values that need to be adjusted to make the record satisfy all rules.
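
The effect of the weights can be sketched with `locate_errors()`, the function that appears in the printed call in the Usage section, and its `weight` argument. This is a minimal sketch, not a recommended setup: the weights are made up for illustration and are listed in the column order of `data`, and the rules and record are the ones from the Usage section.

```r
library(errorlocate)

rules <- validator(
  profit == turnover - cost,
  cost >= 0.6 * turnover,
  turnover >= 0,
  cost >= 0
)

data <- data.frame(profit = 750, cost = 125, turnover = 200)

# Default: every variable has the same weight, so the smallest
# set of fields that can repair the record is selected
le_default <- locate_errors(data, rules)

# Hypothetical weights (illustration only): profit is treated as more
# trustworthy, so changing it costs more than changing cost and turnover
w <- c(profit = 3, cost = 1, turnover = 1)
le_weighted <- locate_errors(data, rules, weight = w)

# Logical matrices: TRUE marks a field selected as erroneous
le_default$errors
le_weighted$errors
```

With equal weights, changing profit alone is enough to satisfy all rules. With the weights above, neither cost nor turnover can repair the record on its own: adjusting only cost would make it negative, and adjusting only turnover would break the rule cost >= 0.6 * turnover. The cheapest repair should then be to flag cost and turnover together (total weight 2) rather than profit (weight 3).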

Installation

errorlocate can be installed from CRAN:

install.packages("errorlocate")

Beta versions can be installed with drat:

drat::addRepo("data-cleaning")
install.packages("errorlocate")

The latest development version of errorlocate can be installed from GitHub with devtools:

devtools::install_github("data-cleaning/errorlocate")

Usage

library(errorlocate)
#> Loading required package: validate
rules <- validator( profit == turnover - cost
                  , cost >= 0.6 * turnover
                  , turnover >= 0
                  , cost >= 0 # is implied
)

data <- data.frame(profit=750, cost=125, turnover=200)

data_no_error <- replace_errors(data, rules)

# faulty data was replaced with NA
print(data_no_error)
#>   profit cost turnover
#> 1     NA  125      200

er <- errors_removed(data_no_error)

print(er)
#> call:  locate_errors(data, x, ref, ..., cl = cl) 
#> located  1  error(s).
#> located  0  missing value(s).
#> Use 'summary', 'values', '$errors' or '$weight', to explore and retrieve the errors.

summary(er)
#> Variable:
#>       name errors missing
#> 1   profit      1       0
#> 2     cost      0       0
#> 3 turnover      0       0
#> Errors per record:
#>   errors records
#> 1      1       1

er$errors
#>      profit  cost turnover
#> [1,]   TRUE FALSE    FALSE