mouradmourafiq / Pandas Summary
Licence: mit
An extension to pandas dataframes describe function.
Stars: ✭ 361
Programming Languages
python
Projects that are alternatives to or similar to Pandas Summary
Data Science Hacks
Data Science Hacks consists of tips, tricks to help you become a better data scientist. Data science hacks are for all - beginner to advanced. Data science hacks consist of python, jupyter notebook, pandas hacks and so on.
Stars: ✭ 273 (-24.38%)
Mutual labels: data-science, data-analysis, pandas
Pandas Videos
Jupyter notebook and datasets from the pandas Q&A video series
Stars: ✭ 1,716 (+375.35%)
Mutual labels: data-science, data-analysis, pandas
Sweetviz
Visualize and compare datasets, target values and associations, with one line of code.
Stars: ✭ 1,851 (+412.74%)
Mutual labels: data-science, data-analysis, pandas
Dataframe
C++ DataFrame for statistical, Financial, and ML analysis -- in modern C++ using native types, continuous memory storage, and no pointers are involved
Stars: ✭ 828 (+129.36%)
Mutual labels: data-science, data-analysis, pandas
Dtale
Visualizer for pandas data structures
Stars: ✭ 2,864 (+693.35%)
Mutual labels: data-science, data-analysis, pandas
Mlcourse.ai
Open Machine Learning Course
Stars: ✭ 7,963 (+2105.82%)
Mutual labels: data-science, data-analysis, pandas
Dat8
General Assembly's 2015 Data Science course in Washington, DC
Stars: ✭ 1,516 (+319.94%)
Mutual labels: data-science, data-analysis, pandas
Pandas Profiling
Create HTML profiling reports from pandas DataFrame objects
Stars: ✭ 8,329 (+2207.2%)
Mutual labels: data-science, data-analysis, pandas
Ai Learn
AI learning roadmap collecting nearly 200 hands-on cases and projects, with free accompanying teaching materials, from zero-background introduction to job-ready practice. Covers popular areas including Python, mathematics, machine learning, data analysis, deep learning, computer vision, natural language processing, PyTorch, TensorFlow, Caffe, Keras, algorithms, NumPy, pandas, matplotlib, seaborn, NLP, and CV.
Stars: ✭ 4,387 (+1115.24%)
Mutual labels: data-science, data-analysis, pandas
Dtale Desktop
Build a data visualization dashboard with simple snippets of python code
Stars: ✭ 128 (-64.54%)
Mutual labels: data-science, data-analysis, pandas
Prettypandas
A Pandas Styler class for making beautiful tables
Stars: ✭ 376 (+4.16%)
Mutual labels: data-science, data-analysis, pandas
Data Science Notebook
📖 Every great idea and action has a humble beginning
Stars: ✭ 196 (-45.71%)
Mutual labels: data-science, data-analysis, pandas
Seaborn Tutorial
This repository is my attempt to help Data Science aspirants gain necessary Data Visualization skills required to progress in their career. It includes all the types of plot offered by Seaborn, applied on random datasets.
Stars: ✭ 114 (-68.42%)
Mutual labels: data-science, data-analysis, pandas
Rightmove webscraper.py
Python class to scrape data from rightmove.co.uk and return listings in a pandas DataFrame object
Stars: ✭ 125 (-65.37%)
Mutual labels: data-science, data-analysis, pandas
Zebras
Data analysis library for JavaScript built with Ramda
Stars: ✭ 192 (-46.81%)
Mutual labels: data-science, data-analysis, pandas
Deepgraph
Analyze Data with Pandas-based Networks. Documentation:
Stars: ✭ 232 (-35.73%)
Mutual labels: data-science, data-analysis, pandas
Scikit Mobility
scikit-mobility: mobility analysis in Python
Stars: ✭ 339 (-6.09%)
Mutual labels: data-science, data-analysis
fairlens
Identify bias and measure fairness of your data
Stars: ✭ 51 (-85.87%)
Mutual labels: pandas, data-analysis
Xlearn
High performance, easy-to-use, and scalable machine learning (ML) package, including linear model (LR), factorization machines (FM), and field-aware factorization machines (FFM) for Python and CLI interface.
Stars: ✭ 2,968 (+722.16%)
Mutual labels: data-science, data-analysis
validada
Another library for defensive data analysis.
Stars: ✭ 29 (-91.97%)
Mutual labels: pandas, data-analysis
pandas_summary
An extension to the pandas DataFrame describe() function.
The module contains the DataFrameSummary object, which extends describe() with:
- properties
  - dfs.columns_stats: counts, uniques, missing, missing_perc, and type per column
  - dfs.columns_types: a count of the types of columns
  - dfs[column]: a more in-depth summary of the column
- function
  - summary(): extends the describe() function with the values from columns_stats
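To make the per-column statistics more concrete, here is a minimal sketch that builds a table shaped like columns_stats using plain pandas. It is not the library's implementation: the helper name `basic_column_stats` and the sample data are hypothetical, and the `types` row is left out because its classification rules are specific to the library.

```python
import pandas as pd


def basic_column_stats(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: counts, uniques and missing values per column."""
    counts = df.count()            # non-null values per column
    uniques = df.nunique()         # distinct non-null values per column
    missing = df.isna().sum()      # null values per column
    missing_perc = (missing / len(df)).map("{:.2%}".format)
    # Build one column per statistic, then transpose so statistics are rows.
    return pd.DataFrame({
        "counts": counts,
        "uniques": uniques,
        "missing": missing,
        "missing_perc": missing_perc,
    }).T


# Made-up sample data, for illustration only.
df = pd.DataFrame({
    "A": [1, 2, 3, 4],
    "B": ["x", "y", None, "x"],
})
stats = basic_column_stats(df)
print(stats)
```

Each statistic is computed for all columns at once, so the result has one row per statistic and one column per DataFrame column, like the columns_stats output shown in the Usage section.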
Installation
The module can be easily installed with pip:
> pip install pandas-summary
This module depends on numpy and pandas. Optionally, you can also get some nice visualisations if you have matplotlib installed.
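If you are unsure whether the optional matplotlib dependency is present in your environment, a quick check with the standard library looks like this. The variable name `has_matplotlib` is only illustrative, and this is not something pandas_summary requires you to run.

```python
import importlib.util

# find_spec returns None when a top-level package cannot be found.
has_matplotlib = importlib.util.find_spec("matplotlib") is not None
print("matplotlib available:", has_matplotlib)
```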
Tests
To run the tests, execute the command python setup.py test
Usage
The module contains one class: DataFrameSummary.
DataFrameSummary expects a pandas DataFrame to summarise.
from pandas_summary import DataFrameSummary
# df is an existing pandas DataFrame
dfs = DataFrameSummary(df)
Getting the column types
dfs.columns_types
numeric 9
bool 3
categorical 2
unique 1
date 1
constant 1
dtype: int64
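As a rough illustration of how columns could end up in the categories listed above, the sketch below classifies each column with a few simple rules and then counts the results. The function `guess_column_type`, its rules, and the sample data are assumptions made for this example; the library's actual rules may differ.

```python
import pandas as pd
from pandas.api import types as ptypes


def guess_column_type(series: pd.Series) -> str:
    """Hypothetical classifier; these rules are assumptions for illustration."""
    n_unique = series.nunique()
    if n_unique == 1:
        return "constant"
    if n_unique == 2:
        return "bool"
    if ptypes.is_numeric_dtype(series):
        return "numeric"
    if ptypes.is_datetime64_any_dtype(series):
        return "date"
    if n_unique == series.count():
        return "unique"
    return "categorical"


# Made-up sample data, for illustration only.
df = pd.DataFrame({
    "price": [1.5, 2.0, 3.5, 2.0],
    "flag": [True, False, True, True],
    "city": ["a", "b", "a", "c"],
    "id": ["k1", "k2", "k3", "k4"],
    "source": ["web", "web", "web", "web"],
    "day": pd.to_datetime(
        ["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04"]
    ),
})

# One label per column, then a count of columns per label.
column_types = pd.Series({name: guess_column_type(df[name]) for name in df.columns})
type_counts = column_types.value_counts()
print(type_counts)
```

The order of the checks matters in this sketch: constant and two-valued columns are caught first, so a numeric column with a single value is reported as constant rather than numeric.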
Getting the column stats
dfs.columns_stats
                   A            B        C        D        E
counts          5802         5794     5781     5781     4617
uniques         5802            3     5771      128      121
missing            0            8       21       21     1185
missing_perc      0%        0.14%    0.36%    0.36%   20.42%
types         unique  categorical  numeric  numeric  numeric
Getting a single column summary, e.g. for a numerical column
# a column can also be selected by its integer position instead of its name
dfs['A']
std 0.2827146
max 1.072792
min 0
variance 0.07992753
mean 0.5548516
5% 0.1603367
25% 0.3199776
50% 0.4968588
75% 0.8274732
95% 1.011255
iqr 0.5074956
kurtosis -1.208469
skewness 0.2679559
sum 3207.597
mad 0.2459508
cv 0.5095319
zeros_num 11
zeros_perc 0.1%
deviating_of_mean 21
deviating_of_mean_perc 0.36%
deviating_of_median 21
deviating_of_median_perc 0.36%
top_correlations {u'D': 0.702240243124, u'E': -0.663}
counts 5781
uniques 5771
missing 21
missing_perc 0.36%
types numeric
Name: A, dtype: object
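Several of the extra rows in this summary can be reproduced with plain pandas. In the sample output above, iqr equals the 75% value minus the 25% value, and cv equals std divided by mean. The sketch below computes those along with a few others. It is only an approximation under stated assumptions: `mad` is taken to be the mean absolute deviation, and `deviating_of_mean` is assumed to count values more than 3 standard deviations above the mean. The sample values are made up.

```python
import pandas as pd

# Made-up values, for illustration only.
values = pd.Series([0.0, 1.0, 2.0, 3.0, 4.0])

q1, q3 = values.quantile(0.25), values.quantile(0.75)
mean, std = values.mean(), values.std()

extra = pd.Series({
    "iqr": q3 - q1,                          # interquartile range
    "cv": std / mean,                        # coefficient of variation
    "mad": (values - mean).abs().mean(),     # assumed: mean absolute deviation
    "zeros_num": int((values == 0).sum()),   # number of zero values
    "zeros_perc": "{:.2%}".format((values == 0).mean()),
    # Assumed rule: values more than 3 standard deviations above the mean.
    "deviating_of_mean": int((values > mean + 3 * std).sum()),
})
print(extra)
```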
Future development
Summary analysis between columns, i.e. dfs[[1, 2]]