
ayush1997 / Visualize_ml

Licence: MIT
Python package for consolidated and extensive univariate and bivariate data analysis and visualization, catering to both categorical and continuous datasets.

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Visualize ml

Python Novice Inflammation
Programming with Python
Stars: ✭ 199 (+24.38%)
Mutual labels:  data-analysis, matplotlib
PandasVersusExcel
Introduction to Python data analysis; getting started as a data analyst
Stars: ✭ 120 (-25%)
Mutual labels:  data-analysis, matplotlib
Edaviz
edaviz - Python library for Exploratory Data Analysis and Visualization in Jupyter Notebook or Jupyter Lab
Stars: ✭ 220 (+37.5%)
Mutual labels:  data-analysis, matplotlib
Matplotlib Doc Zh
📖 [Translation] Matplotlib User Guide
Stars: ✭ 178 (+11.25%)
Mutual labels:  data-analysis, matplotlib
data-analysis-using-python
Data Analysis Using Python: A Beginner’s Guide Featuring NYC Open Data
Stars: ✭ 81 (-49.37%)
Mutual labels:  data-analysis, matplotlib
python-data-visualization
Curated Python Notebooks for Data Visualization
Stars: ✭ 22 (-86.25%)
Mutual labels:  data-analysis, matplotlib
Udacity-Data-Analyst-Nanodegree
Repository for the projects needed to complete the Data Analyst Nanodegree.
Stars: ✭ 31 (-80.62%)
Mutual labels:  data-analysis, matplotlib
dataquest-guided-projects-solutions
My dataquest project solutions
Stars: ✭ 35 (-78.12%)
Mutual labels:  data-analysis, matplotlib
ipython-notebooks
A collection of Jupyter notebooks exploring different datasets.
Stars: ✭ 43 (-73.12%)
Mutual labels:  data-analysis, matplotlib
datatile
A library for managing, validating, summarizing, and visualizing data.
Stars: ✭ 419 (+161.88%)
Mutual labels:  data-analysis, matplotlib
Mlcourse.ai
Open Machine Learning Course
Stars: ✭ 7,963 (+4876.88%)
Mutual labels:  data-analysis, matplotlib
Ai Learn
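An artificial intelligence learning roadmap that compiles nearly 200 hands-on cases and projects, with free accompanying course materials, aimed at complete beginners and job-oriented practice. Covers popular areas including: Python, mathematics, machine learning, data analysis, deep learning, computer vision, natural language processing, PyTorch tensorflow machine-learning,deep-learning data-analysis data-mining mathematics data-science artificial-intelligence python tensorflow tensorflow2 caffe keras pytorch algorithm numpy pandas matplotlib seaborn nlp cv, and more.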
Stars: ✭ 4,387 (+2641.88%)
Mutual labels:  data-analysis, matplotlib
Data Analysis
Mainly a summary of web-scraping and data-analysis projects, plus modeling, machine learning, and model evaluation.
Stars: ✭ 142 (-11.25%)
Mutual labels:  data-analysis, matplotlib
Phy
phy: interactive visualization and manual spike sorting of large-scale ephys data
Stars: ✭ 148 (-7.5%)
Mutual labels:  data-analysis
Suite2p
cell detection in calcium imaging recordings
Stars: ✭ 153 (-4.37%)
Mutual labels:  data-analysis
Opendatawrangling
Public data analysis
Stars: ✭ 148 (-7.5%)
Mutual labels:  matplotlib
Aachartkit Swift
📈📊📱💻🖥️An elegant modern declarative data visualization chart framework for iOS, iPadOS and macOS. Extremely powerful, supports line, spline, area, areaspline, column, bar, pie, scatter, angular gauges, arearange, areasplinerange, columnrange, bubble, box plot, error bars, funnel, waterfall and polar chart types. An extremely elegant and powerful cross-platform data visualization chart framework, supporting column charts, bar charts, …
Stars: ✭ 1,962 (+1126.25%)
Mutual labels:  data-analysis
Pandas Doc Zh
📖 [Translation] Pandas Chinese documentation (pending proofreading)
Stars: ✭ 155 (-3.12%)
Mutual labels:  data-analysis
Sourced Ce
source{d} Community Edition (CE)
Stars: ✭ 153 (-4.37%)
Mutual labels:  data-analysis
Rl Book Challenge
self-studying the Sutton & Barto the hard way
Stars: ✭ 146 (-8.75%)
Mutual labels:  matplotlib

visualize_ML

visualize_ML is a Python package made to visualize some of the steps involved in working on a Machine Learning problem. It is built on libraries such as matplotlib for visualization and sklearn and scipy for statistical computations.

Requirement

  • python 2.x or python 3.x

Install

Install dependencies needed for matplotlib

sudo apt-get build-dep python-matplotlib

Install it using pip

pip install visualize_ML

Let's Code

When dealing with a Machine Learning problem, some of the initial steps are data exploration and analysis, followed by feature selection. Below are the modules for these tasks.

1) Data Exploration

At this stage, we explore variables one by one using univariate analysis, which depends on whether the variable is categorical or continuous. The explore module handles this.

>>> explore module

visualize_ML.explore.plot(data_input,categorical_name=[],drop=[],PLOT_COLUMNS_SIZE=4,bin_size=20,
bar_width=0.2,wspace=0.5,hspace=0.8)

Continuous variables: For continuous variables, it plots a histogram for every variable and gives descriptive statistics for them.

Categorical variables: For categorical variables with 2 or more classes, it plots a bar chart for every variable and gives descriptive statistics for them.
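
The two kinds of summary described above can be illustrated with a small standard-library sketch. It is not the explore module's implementation: the helper names `summarize_continuous` and `summarize_categorical`, the `bin_count` argument, and the sample values are all hypothetical, chosen only to show what a histogram plus descriptive statistics and a bar chart boil down to numerically.

```python
from collections import Counter
import statistics


def summarize_continuous(values, bin_count=4):
    """Return descriptive statistics and equal-width histogram counts."""
    lo, hi = min(values), max(values)
    # Fall back to a width of 1.0 when all values are identical.
    width = (hi - lo) / bin_count or 1.0
    counts = [0] * bin_count
    for v in values:
        # The maximum value is placed in the last bin, not a new one.
        index = min(int((v - lo) / width), bin_count - 1)
        counts[index] += 1
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
        "histogram": counts,
    }


def summarize_categorical(values):
    """Return the frequency of each class, i.e. the bar heights."""
    return dict(Counter(values))


ages = [22, 38, 26, 35, 35, 54, 2, 27]  # made-up continuous values
classes = [3, 1, 3, 1, 3, 1, 3, 3]      # made-up categorical labels

age_summary = summarize_continuous(ages)
class_counts = summarize_categorical(classes)
print(age_summary)
print(class_counts)
```

The histogram counts correspond to the bar heights a histogram would draw, and the class frequencies correspond to the bars of a bar chart.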

Parameter | Type | Description
data_input | DataFrame | The input DataFrame containing all the data. (For now, the input can only be a DataFrame.)
categorical_name | list (default=[ ]) | Names of all categorical variable columns with more than 2 classes, to distinguish them from the continuous variables. An empty list implies that there are no categorical features with more than 2 classes.
drop | list (default=[ ]) | Names of columns to be dropped.
PLOT_COLUMNS_SIZE | int (default=4) | Number of plots to display vertically in the display window. The row size is adjusted accordingly.
bin_size | int (default="auto") | Number of bins for the histogram displayed in the categorical vs categorical category.
wspace | float32 (default=0.5) | Horizontal padding between subplots in the display window.
hspace | float32 (default=0.8) | Vertical padding between subplots in the display window.

Code Snippet

# The data set is the well-known Titanic data set from Kaggle.

import pandas as pd
from visualize_ML import explore

df = pd.read_csv("dataset/train.csv")
explore.plot(df, ["Survived", "Pclass", "Sex", "SibSp", "Ticket", "Embarked"], drop=["PassengerId", "Name"])

see the dataset

Note: While plotting, all rows with NaN values and all columns with character values are removed (except when the values are True and False); only numeric data is plotted.

2) Feature Selection

This is one of the more challenging parts of an ML task. Here we do bivariate analysis to find out the relationship between two variables, looking for association and disassociation between them at a pre-defined significance level.

The relation module helps visualize the analysis done on various combinations of variables and shows the relation between them.

>>> relation module

visualize_ML.relation.plot(data_input,target_name="",categorical_name=[],drop=[],bin_size=10)

Continuous vs Continuous variables: For this bivariate analysis, scatter plots are made, as their pattern indicates the relationship between the variables. To indicate the strength of the relationship, we use the correlation between them.

The graph displays the correlation coefficient along with other information.

Correlation = Covariance(X,Y) / SQRT(Var(X) * Var(Y))
  • -1: perfect negative linear correlation
  • +1: perfect positive linear correlation
  • 0: no correlation
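
The correlation formula above can be worked through directly. The following is a minimal standard-library sketch, not the relation module's code; the function name `pearson_correlation` and the sample values are hypothetical.

```python
import math


def pearson_correlation(xs, ys):
    """Correlation = Covariance(X, Y) / sqrt(Var(X) * Var(Y))."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    return covariance / math.sqrt(var_x * var_y)


xs = [1, 2, 3, 4, 5]
ys_up = [2, 4, 6, 8, 10]    # rises with xs: perfect positive correlation
ys_down = [10, 8, 6, 4, 2]  # falls as xs rises: perfect negative correlation

r_up = pearson_correlation(xs, ys_up)
r_down = pearson_correlation(xs, ys_down)
print(r_up, r_down)
```

The two made-up series sit at the extremes of the scale listed above; real data usually falls somewhere in between.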

Categorical vs Categorical variables: Stacked column charts are made to visualize the relation. The chi-square test is used to derive the statistical significance of the relationship between the variables. It returns the probability (p-value) of the computed chi-square statistic for the given degrees of freedom. For more information on the chi-square test, see this

Probability of 0: indicates that the two categorical variables are dependent.

Probability of 1: indicates that the two variables are independent.

The graph displays the p_value along with other information. If it is less than 0.05, the variables are considered dependent.
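
To make the chi-square test above concrete, here is a minimal standard-library sketch on a made-up 2x2 contingency table. It is not how the relation module computes its result; the function name `chi_square_statistic` and the counts are hypothetical, no continuity correction is applied, and the closed-form p-value used here holds only for one degree of freedom.

```python
import math


def chi_square_statistic(table):
    """Return the chi-square statistic and degrees of freedom of a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Made-up counts: rows are one categorical variable, columns the other.
table = [[30, 10],
         [10, 30]]

statistic, dof = chi_square_statistic(table)

# Valid for dof == 1 only: the chi-square tail probability reduces to erfc.
p_value = math.erfc(math.sqrt(statistic / 2))
print(statistic, dof, p_value)
```

For larger tables the p-value needs the general chi-square distribution, which a routine such as `scipy.stats.chi2_contingency` provides.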

Categorical vs Continuous variables: To explore the relation between categorical and continuous variables, box plots are drawn at each level of the categorical variable. If the levels are small in number, the plot will not show the statistical significance. The ANOVA test is used to derive the statistical significance of the relationship between the variables.

The graph displays the p_value along with other information. If it is less than 0.05, the variables are considered dependent.

For more information on the ANOVA test, see this
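
The idea behind the ANOVA test can be sketched with the standard library: compare the variation between group means with the variation inside each group. This is only an illustration under assumed data, not the relation module's code; the function name `one_way_anova_f` and the sample groups are hypothetical, and the sketch stops at the F statistic rather than computing a p-value.

```python
def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic and its two degrees of freedom."""
    all_values = [v for group in groups for v in group]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(group) / len(group) for group in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(group) * (mean - grand_mean) ** 2
        for group, mean in zip(groups, group_means)
    )
    # Variation of the values around their own group mean.
    ss_within = sum(
        (v - mean) ** 2
        for group, mean in zip(groups, group_means)
        for v in group
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_statistic = (ss_between / df_between) / (ss_within / df_within)
    return f_statistic, df_between, df_within


# Made-up continuous values, one list per level of the categorical variable.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]

f_statistic, df_between, df_within = one_way_anova_f(groups)
print(f_statistic, df_between, df_within)
```

A large F statistic means the group means differ by more than the spread within groups would explain; turning it into a p-value requires the F distribution, which a routine such as `scipy.stats.f_oneway` provides.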

Parameter | Type | Description
data_input | DataFrame | The input DataFrame containing all the data. (For now, the input can only be a DataFrame.)
target_name | String | The name of the target column.
categorical_name | list (default=[ ]) | Names of all categorical variable columns with more than 2 classes, to distinguish them from the continuous variables. An empty list implies that there are no categorical features with more than 2 classes.
drop | list (default=[ ]) | Names of columns to be dropped.
PLOT_COLUMNS_SIZE | int (default=4) | Number of plots to display vertically in the display window. The row size is adjusted accordingly.
bin_size | int (default="auto") | Number of bins for the histogram displayed in the categorical vs categorical category.
wspace | float32 (default=0.5) | Horizontal padding between subplots in the display window.
hspace | float32 (default=0.8) | Vertical padding between subplots in the display window.

Code Snippet

# The data set is the well-known Titanic data set from Kaggle.

import pandas as pd
from visualize_ML import relation

df = pd.read_csv("dataset/train.csv")
relation.plot(df, "Survived", ["Survived", "Pclass", "Sex", "SibSp", "Ticket", "Embarked"], drop=["PassengerId", "Name"], bin_size=10)

see the dataset

Note: While plotting, all rows with NaN values and all columns with non-numeric values are removed; only numeric data is plotted. Only a categorical target variable with string values is allowed.

Contribute

If you want to contribute and add a new feature, feel free to send a pull request here

This project is still under development, so to report bugs or request new features, head over to the Issues page

Tasks To Do

  • [ ] Make input compatible with other formats like Numpy.

  • [ ] Visualize best fit lines and decision boundaries for various models to make Parameter Tuning task easy.

    and many others!

Licence

Licensed under The MIT License (MIT).

Copyright

ayush1997(c) 2016
