mstaniak / Autoeda Resources

Licence: cc-by-4.0
A list of software and papers related to automatic and fast Exploratory Data Analysis

autoEDA-resources

A list of software and papers related to automated Exploratory Data Analysis, including

  • fast data exploration and visualization,
  • augmented analytics,
  • visualization recommendation and other tools that speed up data exploration (visual exploration in particular).

Pull requests adding software, papers and conference presentations are welcome.

Software

R packages

My summary of R packages is in The R Journal.

General Packages

  • dataMaid (CRAN package) - automated checks of data validity.

  • DataExplorer (CRAN package) - automated data exploration (including univariate and bivariate plots, PCA) and treatment.

  • funModeling (CRAN package) - automated EDA, simple feature engineering and outlier detection.

  • SmartEDA (CRAN package) - automated generation of descriptive statistics, uni- and bivariate plots, and parallel coordinate plots. Details can be found in a dedicated paper.

  • autoEDA (GitHub package) - automated EDA with uni- and bivariate plots. An article with an introduction can be found on LinkedIn.

  • visdat (CRAN package) - 6 exploratory/diagnostic plots for initial data analysis.

  • dlookr (CRAN package) - tools for data quality diagnosis, basic exploration and feature transformations.

  • xray (CRAN package) - first look at the data - distributions and anomalies. More in the blog post.

  • arsenal (CRAN package) - statistical summaries (models and exploration) and quick reporting.

  • RtutoR (CRAN package) - learning material with an automatic reports module. More at R-Bloggers.

  • exploreR (CRAN package) - exploration based on univariate linear regression.

  • summarytools (CRAN package) - tables that summarise datasets and support simple uni- and bivariate analyses.

  • inspectdf (CRAN package) - tools for column-wise exploration and comparison of data frames. Examples are provided in a README of the GitHub repo.

  • explore (CRAN package) - interactive Shiny app for comprehensive dataset exploration (including uni- and bivariate relationships, correlation analysis and simple modeling with decision trees) and a stand-alone function for quick exploration. Examples are given in a vignette.

  • skimr (CRAN package) - well-formatted summaries of data frames, vectors and matrices. Examples are provided in a vignette.

  • janitor (CRAN package) - tools for fast data cleaning. All functionalities are introduced in the vignette.

  • autoplotly (CRAN package) - a library for fast visualization of statistical results supported by ggfortify. Details can be found in the vignette or the JOSS paper.

  • brinton (CRAN package) - package for quick exploration and visualization. Details can be found in the documentation.

  • AEDA (GitHub package) - summary statistics, correlation analysis, cluster analysis, PCA & other projections.

  • automatic-data-explorer (GitHub package) - basic EDA and creating Markdown reports from multiple R scripts.

  • xda (GitHub package) - basic data summaries.

  • modeler (GitHub package) - tools for exploration and pre-processing.

  • IEDA (GitHub package) - EDA simplified through interactive visualization.

  • dfvis (GitHub package) - ggplot2 based implementation of tabplot.

Domain-specific packages

Related packages

  • featuretoolsR (CRAN package) - R port of a Python library for automated feature engineering.

  • vtreat (CRAN package) - data treatment (pre-processing) that includes dealing with missing data and large categorical variables. Details can be found in the paper about vtreat.

  • report - automated modeling report generation.

  • FactoInvestigate (CRAN package) - has an automatic reporting module that selects the best plots to summarise different projection techniques.

  • gtsummary (GitHub package) - presentation-ready tables summarizing data sets, regression models, and more.

  • clean (CRAN package) - fast data cleaning.

  • finalfit (CRAN package) - tables and plots to quickly visualize regression results.

  • modelsummary (GitHub package) - summary tables for regression models.

Python libraries

General Packages

  • DataPrep (pip library) - data preparation library with an EDA package.

  • Dora (pip library) - data cleaning, feature engineering and simple modeling tools.

  • statsmodels (pip library) - collection of statistical tools, including EDA.

  • TPOT (pip library) - autoML tool with feature engineering module.

  • HoloViews (pip library) - automated visualization based on short data annotations.

  • lens (pip library) - fast calculation of summary statistics and correlations. Presentation about the library.

  • pandas-profiling - popular library for quick data summaries and correlation analysis.

  • speedML (pip library) - large library for ML with module dedicated to fast EDA.

  • edaviz - Python library for fast data exploration that provides functions for dataset overviews, bivariate plots and finding good predictors. (The free version only works for small datasets.)

  • AutoViz - Python library for automated visualization.

  • ExploriPy - Python library for various EDA tasks.

  • pandas-summary - simple extension to pandas.describe.

  • sweetviz - visualizations for automated EDA.
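
To give a rough idea of what "quick data summaries and correlation analysis" means in practice, here is a minimal sketch using only the Python standard library. It is not the API of any library listed above; the function names (`summarise_column`, `summarise_table`, `pearson`) and the toy table are hypothetical and exist only for illustration. The listed packages do far more (plots, reports, type inference).

```python
import math
import statistics
from collections import Counter


def summarise_column(values):
    """Summarise one column; missing values are represented by None.

    A column counts as numeric when every non-missing value is an int
    or float (bools are excluded). This is a simplifying assumption.
    """
    present = [v for v in values if v is not None]
    summary = {"n": len(values), "missing": len(values) - len(present)}
    numeric = bool(present) and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    )
    if numeric:
        summary.update(
            kind="numeric",
            mean=statistics.fmean(present),
            median=statistics.median(present),
            min=min(present),
            max=max(present),
        )
    else:
        counts = Counter(present)
        summary.update(
            kind="categorical",
            unique=len(counts),
            top=counts.most_common(1)[0][0] if counts else None,
        )
    return summary


def summarise_table(table):
    """Summarise every column of a table given as {column name: list of values}."""
    return {name: summarise_column(column) for name, column in table.items()}


def pearson(xs, ys):
    """Pearson correlation over the rows where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    if len(pairs) < 2:
        return float("nan")
    mean_x = statistics.fmean([x for x, _ in pairs])
    mean_y = statistics.fmean([y for _, y in pairs])
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x, _ in pairs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for _, y in pairs))
    if spread_x == 0 or spread_y == 0:
        return float("nan")  # correlation is undefined for a constant column
    return cov / (spread_x * spread_y)


# Toy data, made up for this example.
table = {
    "age": [23, 35, None, 41],
    "income": [30.0, 50.0, 45.0, 60.0],
    "city": ["Warsaw", "Krakow", "Warsaw", None],
}
report = summarise_table(table)
for name, stats in report.items():
    print(name, stats)
print("age vs income:", pearson(table["age"], table["income"]))
```

The sketch keeps the whole table as a dictionary of columns so that each summary is a single pass over one list; real libraries typically work on pandas DataFrames instead.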

Related packages

  • featuretools - library for automated feature engineering.

  • pyvtreat - Python version of R's vtreat package.

  • autoimpute - easier handling of missing values.

  • Auto_TS - automated time series modeling.

Stata packages

  • eda - a package that produces a PDF report with all permutations of univariate and bivariate visualizations and tables. Notably, three-dimensional displays are also possible.

Web services

  • DIVE - MIT's tool for data exploration that tries to choose the best (most informative) visualizations.

  • Automatic Statistician - tool for automated EDA and modeling.

  • Several Shiny apps by R Squared Computing, including visulizer and descriptr.

Standalone software

  • auto-eda - automatic EDA with SQL.

  • elycite - tools for exploration and modelling available (locally) as a web application. Designed for NLP problems.

Papers and short articles

Methods and tools for autoEDA

Visualization recommendation frameworks

Augmented analytics

Conference presentations
