
boxuancui / Dataexplorer

Licence: other

Automate Data Exploration and Treatment


Labels

data-science visualization rstats data-analysis r-package cran eda


DataExplorer

Background

Exploratory Data Analysis (EDA) is an initial and important phase of data analysis and predictive modeling. During this phase, analysts and modelers take a first look at the data, generate relevant hypotheses, and decide on next steps. However, the EDA process can be a hassle at times. This R package aims to automate most of the data handling and visualization, so that users can focus on studying the data and extracting insights.

Installation

The package can be installed directly from CRAN.

install.packages("DataExplorer")

However, the latest stable version (if any) can be found on GitHub and installed using the devtools package.

if (!require(devtools)) install.packages("devtools")
devtools::install_github("boxuancui/DataExplorer")

If you would like to install the latest development version, you may install the develop branch.

if (!require(devtools)) install.packages("devtools")
devtools::install_github("boxuancui/DataExplorer", ref = "develop")

Examples

The package is designed to be easy to use: almost everything can be done in one line of code. Please refer to the package manuals and vignettes for more information.

Report

To get a report for the airquality dataset:

library(DataExplorer)
create_report(airquality)

To get a report for the diamonds dataset with response variable price:

library(ggplot2)
create_report(diamonds, y = "price")

Visualization

Instead of running create_report, you may also run each function individually for your analysis, e.g.,

## View basic description for airquality data
introduce(airquality)


rows                    153
columns                 6
discrete_columns        0
continuous_columns      6
all_missing_columns     0
total_missing_values    44
complete_rows           111
total_observations      918
memory_usage            6,376
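
For readers curious where these figures come from, most of them can be reproduced with base R. The snippet below is only an illustrative sketch, not the package's implementation: the name `manual_intro` is made up for this example, and treating every numeric column as continuous is an assumption.

```r
# Reproduce the main counts reported by introduce(airquality) using base R only.
# Assumption: "continuous" means numeric, everything else counts as "discrete".
is_continuous <- vapply(airquality, is.numeric, logical(1))

manual_intro <- list(
  rows = nrow(airquality),
  columns = ncol(airquality),
  discrete_columns = sum(!is_continuous),
  continuous_columns = sum(is_continuous),
  # Columns in which every value is NA
  all_missing_columns = sum(vapply(airquality, function(x) all(is.na(x)), logical(1))),
  # Total number of NA cells across the whole data frame
  total_missing_values = sum(is.na(airquality)),
  # Rows without a single NA
  complete_rows = sum(complete.cases(airquality)),
  total_observations = nrow(airquality) * ncol(airquality)
)

str(manual_intro)
```

Memory usage is left out on purpose, since the reported size can vary with the R version and platform.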

## Plot basic description for airquality data
plot_intro(airquality)

## View missing value distribution for airquality data
plot_missing(airquality)

## Left: frequency distribution of all discrete variables
plot_bar(diamonds)
## Right: `price` distribution of all discrete variables
plot_bar(diamonds, with = "price")

## View frequency distribution by a discrete variable
plot_bar(diamonds, by = "cut")

## View histogram of all continuous variables
plot_histogram(diamonds)

## View estimated density distribution of all continuous variables
plot_density(diamonds)

## View quantile-quantile plot of all continuous variables
plot_qq(diamonds)

## View quantile-quantile plot of all continuous variables by feature `cut`
plot_qq(diamonds, by = "cut")

## View overall correlation heatmap
plot_correlation(diamonds)

## View bivariate continuous distribution based on `cut`
plot_boxplot(diamonds, by = "cut")

## Scatterplot `price` with all other continuous features
plot_scatterplot(split_columns(diamonds)$continuous, by = "price", sampled_rows = 1000L)

## Visualize principal component analysis
plot_prcomp(diamonds, maxcat = 5L)

#> 2 features with more than 5 categories ignored!
#> color: 7 categories
#> clarity: 8 categories
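
The scatterplot example above passes only the continuous part of the data via `split_columns(diamonds)$continuous`. As a rough idea of what such a split looks like, here is a base R sketch using the built-in `iris` dataset. The helper `split_by_type` is hypothetical and written for this example only; it is not the package's code, and it assumes that numeric columns are the continuous ones.

```r
# Hypothetical helper: separate a data frame into discrete and continuous parts.
# Assumption: numeric columns are continuous, all other columns are discrete.
split_by_type <- function(data) {
  is_continuous <- vapply(data, is.numeric, logical(1))
  list(
    discrete = data[, !is_continuous, drop = FALSE],
    continuous = data[, is_continuous, drop = FALSE]
  )
}

parts <- split_by_type(iris)
names(parts$continuous)  # the four numeric measurement columns
names(parts$discrete)    # the Species factor
```

Using `drop = FALSE` keeps each part a data frame even when only one column matches, which matters for `iris` because it has a single discrete column.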

Feature Engineering

To make quick updates to your data:

## Group bottom 20% `clarity` by frequency
group_category(diamonds, feature = "clarity", threshold = 0.2, update = TRUE)

## Group bottom 20% `clarity` by `price`
group_category(diamonds, feature = "clarity", threshold = 0.2, measure = "price", update = TRUE)

## Dummify diamonds dataset
dummify(diamonds)
dummify(diamonds, select = "cut")

## Set values for missing observations
df <- data.frame("a" = rnorm(260), "b" = rep(letters, 10))
df[sample.int(260, 50), ] <- NA
set_missing(df, list(0L, "unknown"))

## Update columns
update_columns(airquality, c("Month", "Day"), as.factor)
update_columns(airquality, 1L, function(x) x^2)

## Drop columns
drop_columns(diamonds, 8:10)
drop_columns(diamonds, "clarity")
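
To make the `set_missing` example more concrete, the following base R sketch shows the kind of replacement it describes: one fill value for continuous columns and another for discrete columns. The helper `fill_missing` and the data frame `toy` are hypothetical names introduced for this illustration; this is not how the package implements it, and the sketch assumes discrete columns are stored as character vectors rather than factors.

```r
# Hypothetical helper: replace NAs with one value in numeric columns
# and with another value in all remaining columns.
# Assumption: non-numeric columns are character vectors, not factors.
fill_missing <- function(data, continuous_value, discrete_value) {
  for (col in names(data)) {
    missing_idx <- is.na(data[[col]])
    if (is.numeric(data[[col]])) {
      data[[col]][missing_idx] <- continuous_value
    } else {
      data[[col]][missing_idx] <- discrete_value
    }
  }
  data
}

toy <- data.frame(
  a = c(1.5, NA, 3),
  b = c("x", NA, "z"),
  stringsAsFactors = FALSE
)

filled <- fill_missing(toy, 0, "unknown")
filled
```

The function returns a modified copy and leaves `toy` unchanged, so the original data remains available for comparison.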

Articles

See the article wiki page.
