
mouradmourafiq / Pandas Summary

Licence: mit
An extension to pandas dataframes describe function.

Programming Languages

python

Projects that are alternatives of or similar to Pandas Summary

Data Science Hacks
Data Science Hacks consists of tips, tricks to help you become a better data scientist. Data science hacks are for all - beginner to advanced. Data science hacks consist of python, jupyter notebook, pandas hacks and so on.
Stars: ✭ 273 (-24.38%)
Mutual labels:  data-science, data-analysis, pandas
Pandas Videos
Jupyter notebook and datasets from the pandas Q&A video series
Stars: ✭ 1,716 (+375.35%)
Mutual labels:  data-science, data-analysis, pandas
Sweetviz
Visualize and compare datasets, target values and associations, with one line of code.
Stars: ✭ 1,851 (+412.74%)
Mutual labels:  data-science, data-analysis, pandas
Dataframe
C++ DataFrame for statistical, Financial, and ML analysis -- in modern C++ using native types, continuous memory storage, and no pointers are involved
Stars: ✭ 828 (+129.36%)
Mutual labels:  data-science, data-analysis, pandas
Dtale
Visualizer for pandas data structures
Stars: ✭ 2,864 (+693.35%)
Mutual labels:  data-science, data-analysis, pandas
Mlcourse.ai
Open Machine Learning Course
Stars: ✭ 7,963 (+2105.82%)
Mutual labels:  data-science, data-analysis, pandas
Dat8
General Assembly's 2015 Data Science course in Washington, DC
Stars: ✭ 1,516 (+319.94%)
Mutual labels:  data-science, data-analysis, pandas
Pandas Profiling
Create HTML profiling reports from pandas DataFrame objects
Stars: ✭ 8,329 (+2207.2%)
Mutual labels:  data-science, data-analysis, pandas
Ai Learn
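An AI learning roadmap: nearly 200 hands-on case studies and projects, with free accompanying course materials, taking complete beginners through to job-ready practice. Covers Python, mathematics, machine learning, data analysis, deep learning, computer vision, natural language processing, PyTorch tensorflow machine-learning, deep-learning data-analysis data-mining mathematics data-science artificial-intelligence python tensorflow tensorflow2 caffe keras pytorch algorithm numpy pandas matplotlib seaborn nlp cv and other popular areas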
Stars: ✭ 4,387 (+1115.24%)
Mutual labels:  data-science, data-analysis, pandas
Dtale Desktop
Build a data visualization dashboard with simple snippets of python code
Stars: ✭ 128 (-64.54%)
Mutual labels:  data-science, data-analysis, pandas
Prettypandas
A Pandas Styler class for making beautiful tables
Stars: ✭ 376 (+4.16%)
Mutual labels:  data-science, data-analysis, pandas
Data Science Notebook
📖 Every great idea and action has a humble beginning
Stars: ✭ 196 (-45.71%)
Mutual labels:  data-science, data-analysis, pandas
Seaborn Tutorial
This repository is my attempt to help Data Science aspirants gain necessary Data Visualization skills required to progress in their career. It includes all the types of plot offered by Seaborn, applied on random datasets.
Stars: ✭ 114 (-68.42%)
Mutual labels:  data-science, data-analysis, pandas
Rightmove webscraper.py
Python class to scrape data from rightmove.co.uk and return listings in a pandas DataFrame object
Stars: ✭ 125 (-65.37%)
Mutual labels:  data-science, data-analysis, pandas
Zebras
Data analysis library for JavaScript built with Ramda
Stars: ✭ 192 (-46.81%)
Mutual labels:  data-science, data-analysis, pandas
Deepgraph
Analyze Data with Pandas-based Networks. Documentation:
Stars: ✭ 232 (-35.73%)
Mutual labels:  data-science, data-analysis, pandas
Scikit Mobility
scikit-mobility: mobility analysis in Python
Stars: ✭ 339 (-6.09%)
Mutual labels:  data-science, data-analysis
fairlens
Identify bias and measure fairness of your data
Stars: ✭ 51 (-85.87%)
Mutual labels:  pandas, data-analysis
Xlearn
High performance, easy-to-use, and scalable machine learning (ML) package, including linear model (LR), factorization machines (FM), and field-aware factorization machines (FFM) for Python and CLI interface.
Stars: ✭ 2,968 (+722.16%)
Mutual labels:  data-science, data-analysis
validada
Another library for defensive data analysis.
Stars: ✭ 29 (-91.97%)
Mutual labels:  pandas, data-analysis

pandas_summary

An extension to pandas dataframes describe function.

The module contains a DataFrameSummary object that extends describe() with:

  • properties
    • dfs.columns_stats: counts, uniques, missing, missing_perc, and type per column
    • dfs.columns_types: a count of the types of columns
    • dfs[column]: a more in-depth summary of the column
  • function
    • summary(): extends the output of describe() with the values from columns_stats

Installation

The module can be easily installed with pip:

> pip install pandas-summary

This module depends on numpy and pandas. If matplotlib is installed, you also get some optional visualisations.

Tests

To run the tests, execute the command: python setup.py test

Usage

The module contains one class:

DataFrameSummary

DataFrameSummary expects a pandas DataFrame to summarise.

from pandas_summary import DataFrameSummary

# df is an existing pandas DataFrame
dfs = DataFrameSummary(df)

Getting the column types

dfs.columns_types


numeric     9
bool        3
categorical 2
unique      1
date        1
constant    1
dtype: int64
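
To make the type labels above more concrete, here is a minimal sketch of how a column could be sorted into one of these buckets using plain pandas. The classify_column helper and its rules are assumptions made for illustration only; they are not taken from the pandas_summary source, and the library's real rules may differ.

```python
import pandas as pd


def classify_column(series: pd.Series) -> str:
    """Assign one coarse type label to a column (illustrative rules only)."""
    n_unique = series.dropna().nunique()
    if n_unique <= 1:
        return "constant"
    # Check bool before numeric: pandas treats bool as a numeric dtype.
    if pd.api.types.is_bool_dtype(series):
        return "bool"
    if pd.api.types.is_numeric_dtype(series):
        return "numeric"
    if pd.api.types.is_datetime64_any_dtype(series):
        return "date"
    if n_unique == len(series):
        return "unique"
    return "categorical"


# Hypothetical toy data, one column per label.
df = pd.DataFrame(
    {
        "id": ["a", "b", "c", "d", "e"],
        "price": [1.5, 2.0, None, 4.25, 3.0],
        "active": [True, False, True, True, False],
        "colour": ["red", "blue", "red", "green", "blue"],
        "when": pd.to_datetime(
            ["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04", "2020-01-05"]
        ),
        "source": ["web", "web", "web", "web", "web"],
    }
)

# Label every column, then count how many columns fall under each label.
types = pd.Series({name: classify_column(df[name]) for name in df.columns})
column_types = types.value_counts()
print(column_types)
```

The order of the checks matters in this sketch: a column is tested for being constant first and for being unique last, so a numeric column with all-distinct values is still reported as numeric.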

Getting the column stats

dfs.columns_stats


                      A            B        C              D              E 
counts             5802         5794     5781           5781           4617   
uniques            5802            3     5771            128            121   
missing               0            8       21             21           1185   
missing_perc         0%        0.14%    0.36%          0.36%         20.42%   
types            unique  categorical  numeric        numeric        numeric 
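
The counts, uniques, missing and missing_perc rows can be approximated with plain pandas calls. The following is a rough sketch on a hypothetical toy DataFrame, not the library's implementation; the percentage formatting in particular is an assumption.

```python
import pandas as pd

# Hypothetical toy data with a few missing values.
df = pd.DataFrame(
    {
        "A": [1, 2, 3, 4],
        "B": ["x", None, "y", "x"],
        "C": [0.5, None, None, 2.0],
    }
)

counts = df.count()          # non-null values per column
uniques = df.nunique()       # distinct non-null values per column
missing = df.isnull().sum()  # null values per column
# Share of missing values, formatted as a percentage string.
missing_perc = (missing / len(df) * 100).map("{:.2f}%".format)

# One row per statistic, one column per DataFrame column.
stats = pd.DataFrame(
    {
        "counts": counts,
        "uniques": uniques,
        "missing": missing,
        "missing_perc": missing_perc,
    }
).T
print(stats)
```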

Getting a single column summary, e.g. for a numerical column

# a column can also be accessed by its integer position instead of its name
dfs['A']

std                                                                 0.2827146
max                                                                  1.072792
min                                                                         0
variance                                                           0.07992753
mean                                                                0.5548516
5%                                                                  0.1603367
25%                                                                 0.3199776
50%                                                                 0.4968588
75%                                                                 0.8274732
95%                                                                  1.011255
iqr                                                                 0.5074956
kurtosis                                                            -1.208469
skewness                                                            0.2679559
sum                                                                  3207.597
mad                                                                 0.2459508
cv                                                                  0.5095319
zeros_num                                                                  11
zeros_perc                                                               0.1%
deviating_of_mean                                                          21
deviating_of_mean_perc                                                  0.36%
deviating_of_median                                                        21
deviating_of_median_perc                                                0.36%
top_correlations                         {u'D': 0.702240243124, u'E': -0.663}
counts                                                                   5781
uniques                                                                  5771
missing                                                                    21
missing_perc                                                            0.36%
types                                                                 numeric
Name: A, dtype: object
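
Several of the numeric statistics above can be reproduced directly with pandas. The sketch below assumes the usual definitions (iqr as the 75% quantile minus the 25% quantile, mad as the mean absolute deviation from the mean, cv as std divided by mean); whether pandas_summary uses exactly these definitions is not confirmed here, and the input series is made up.

```python
import pandas as pd

# Hypothetical numeric column with one missing value.
s = pd.Series([0.0, 1.0, 2.0, 3.0, 4.0, None])
values = s.dropna()

q25, q75 = values.quantile(0.25), values.quantile(0.75)

summary = pd.Series(
    {
        "mean": values.mean(),
        "std": values.std(),        # sample standard deviation (ddof=1)
        "variance": values.var(),   # sample variance (ddof=1)
        "iqr": q75 - q25,           # interquartile range
        "kurtosis": values.kurt(),
        "skewness": values.skew(),
        "mad": (values - values.mean()).abs().mean(),  # mean absolute deviation
        "cv": values.std() / values.mean(),            # coefficient of variation
        "zeros_num": int((values == 0).sum()),         # count of exact zeros
    }
)
print(summary)
```

The mean absolute deviation is computed by hand rather than with Series.mad(), which was removed in pandas 2.0.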

Future development

Summary analysis between columns, i.e. dfs[[1, 2]]
