
buabaj / xplore

Licence: MIT license
A Python package for data scientists, analysts, and AI/ML engineers that explores the features of a dataset in a minimal number of lines of code, for quick analysis before data wrangling and feature extraction.

Programming Languages

python

Projects that are alternatives of or similar to xplore

prosto
Prosto is a data processing toolkit radically changing how data is processed by heavily relying on functions and operations with functions - an alternative to map-reduce and join-groupby
Stars: ✭ 54 (+157.14%)
Mutual labels:  data-wrangling, data-preprocessing
Data-Wrangling-with-Python
Simplify your ETL processes with these hands-on data sanitation tips, tricks, and best practices
Stars: ✭ 90 (+328.57%)
Mutual labels:  data-wrangling
optimus
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Stars: ✭ 1,351 (+6333.33%)
Mutual labels:  data-wrangling
Data-Analyst-Nanodegree
This repo consists of the projects that I completed as a part of the Udacity's Data Analyst Nanodegree's curriculum.
Stars: ✭ 13 (-38.1%)
Mutual labels:  data-wrangling
r-novice-inflammation
Programming with R
Stars: ✭ 142 (+576.19%)
Mutual labels:  data-wrangling
modelscript
REPO MOVED TO https://github.com/repetere/jsonstack-data - Data Science and Machine learning in JavaScript
Stars: ✭ 40 (+90.48%)
Mutual labels:  data-preprocessing
Datatest
Tools for test driven data-wrangling and data validation.
Stars: ✭ 238 (+1033.33%)
Mutual labels:  data-wrangling
Stock-Trading-Using-Machine-Learning
A comprehensive approach for stock trading implemented using Neural Network and Reinforcement Learning separately.
Stars: ✭ 20 (-4.76%)
Mutual labels:  data-preprocessing
whyqd
data wrangling simplicity, complete audit transparency, and at speed
Stars: ✭ 16 (-23.81%)
Mutual labels:  data-wrangling
machine-learning-data-pipeline
Pipeline module for parallel real-time data processing for machine learning models development and production purposes.
Stars: ✭ 22 (+4.76%)
Mutual labels:  data-preprocessing
sciblox
sciblox - Easier Data Science and Machine Learning
Stars: ✭ 48 (+128.57%)
Mutual labels:  data-preprocessing
sql-ecology-lesson
Data Management with SQL for Ecologists
Stars: ✭ 37 (+76.19%)
Mutual labels:  data-wrangling
Udacity-Data-Analyst-Nanodegree
Repository for the projects needed to complete the Data Analyst Nanodegree.
Stars: ✭ 31 (+47.62%)
Mutual labels:  data-wrangling
qsv
CSVs sliced, diced & analyzed.
Stars: ✭ 438 (+1985.71%)
Mutual labels:  data-wrangling
SMMT
Social Media Mining Toolkit (SMMT) main repository
Stars: ✭ 116 (+452.38%)
Mutual labels:  data-preprocessing
Data Cleaning 101
Data Cleaning Libraries with Python
Stars: ✭ 243 (+1057.14%)
Mutual labels:  data-wrangling
sql-novice-survey
Databases and SQL
Stars: ✭ 59 (+180.95%)
Mutual labels:  data-wrangling
pyrefine
Execute OpenRefine JSON scripts without OpenRefine (or Java)
Stars: ✭ 25 (+19.05%)
Mutual labels:  data-wrangling
pandas-workshop
An introductory workshop on pandas with notebooks and exercises for following along.
Stars: ✭ 161 (+666.67%)
Mutual labels:  data-wrangling
Data-Science-101
Notes and tutorials on how to use python, pandas, seaborn, numpy, matplotlib, scipy for data science.
Stars: ✭ 19 (-9.52%)
Mutual labels:  data-wrangling

xplore


xplore is a Python package built with Pandas for data scientists, analysts, AI/ML engineers, and researchers. It explores the features of a dataset in one line of code, for quick analysis before data wrangling and feature extraction. You can also choose to generate a more detailed report on the exploration of your dataset.

Getting started

Install the package

pip install xplore

Import the package into your code

from xplore.data import xplore

Read your structured dataset from its file path or URL and assign the result to a variable

data = < Read in your dataset file here >

Explore your dataset using the xplore() function

xplore(data)
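
As a minimal sketch of the steps above, the snippet below reads a dataset into a Pandas DataFrame before handing it to xplore. It assumes the dataset is a CSV; the inline `CSV_TEXT` is a hypothetical stand-in for your own file path or URL, with column names borrowed from the sample output further down. The xplore call is left commented out because it needs the package installed and asks an interactive question at the end.

```python
import io

import pandas as pd

# Hypothetical inline CSV standing in for a real file path or URL.
# Column names are borrowed from the sample output; rows are illustrative.
CSV_TEXT = """rank,country_full,country_abrv,confederation
1,Germany,GER,UEFA
2,Italy,ITA,UEFA
3,Switzerland,SUI,UEFA
"""

# pd.read_csv accepts a file path, a URL, or any file-like object.
data = pd.read_csv(io.StringIO(CSV_TEXT))

print(data.shape)  # (number of rows, number of columns)

# With xplore installed, the exploration is the single call shown above.
# It prompts interactively for the optional detailed report.
# from xplore.data import xplore
# xplore(data)
```

For a file on disk, replace `io.StringIO(CSV_TEXT)` with the path to your CSV.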

Testing xplore

Navigate to the test.py file after installing the package and run the code in that file to see and understand how xplore works.

Sample Output

------------------------------------
The fist 5 entries of your dataset are:

   rank country_full country_abrv  total_points  ...  three_year_ago_avg  three_year_ago_weighted  confederation   rank_date
0     1      Germany          GER           0.0  ...                 0.0                      0.0           UEFA  1993-08-08
1     2        Italy          ITA           0.0  ...                 0.0                      0.0           UEFA  1993-08-08
2     3  Switzerland          SUI           0.0  ...                 0.0                      0.0           UEFA  1993-08-08
3     4       Sweden          SWE           0.0  ...                 0.0                      0.0           UEFA  1993-08-08
4     5    Argentina          ARG           0.0  ...                 0.0                      0.0       CONMEBOL  1993-08-08

[5 rows x 16 columns]


------------------------------------
The last 5 entries of your dataset are:

       rank country_full country_abrv  total_points  ...  three_year_ago_avg  three_year_ago_weighted  confederation   rank_date
57788   206     Anguilla          AIA           0.0  ...                 0.0                      0.0       CONCACAF  2018-06-07
57789   206      Bahamas          BAH           0.0  ...                 0.0                      0.0       CONCACAF  2018-06-07
57790   206      Eritrea          ERI           0.0  ...                 0.0                      0.0            CAF  2018-06-07
57791   206      Somalia          SOM           0.0  ...                 0.0                      0.0            CAF  2018-06-07
57792   206        Tonga          TGA           0.0  ...                 0.0                      0.0            OFC  2018-06-07

[5 rows x 16 columns]


------------------------------------
Stats on your dataset:

<bound method NDFrame.describe of        rank country_full country_abrv  total_points  ...  three_year_ago_avg  three_year_ago_weighted  confederation   rank_date
0         1      Germany          GER           0.0  ...                 0.0                      0.0           UEFA  1993-08-08
1         2        Italy          ITA           0.0  ...                 0.0                      0.0           UEFA  1993-08-08
2         3  Switzerland          SUI           0.0  ...                 0.0                      0.0           UEFA  1993-08-08
3         4       Sweden          SWE           0.0  ...                 0.0                      0.0           UEFA  1993-08-08
4         5    Argentina          ARG           0.0  ...                 0.0                      0.0       CONMEBOL  1993-08-08
...     ...          ...          ...           ...  ...                 ...                      ...            ...         ...
57788   206     Anguilla          AIA           0.0  ...                 0.0                      0.0       CONCACAF  2018-06-07
57789   206      Bahamas          BAH           0.0  ...                 0.0                      0.0       CONCACAF  2018-06-07
57790   206      Eritrea          ERI           0.0  ...                 0.0                      0.0            CAF  2018-06-07
57791   206      Somalia          SOM           0.0  ...                 0.0                      0.0            CAF  2018-06-07
57792   206        Tonga          TGA           0.0  ...                 0.0                      0.0            OFC  2018-06-07

[57793 rows x 16 columns]>


------------------------------------
The Value types of each column are:

rank                         int64
country_full                object
country_abrv                object
total_points               float64
previous_points              int64
rank_change                  int64
cur_year_avg               float64
cur_year_avg_weighted      float64
last_year_avg              float64
last_year_avg_weighted     float64
two_year_ago_avg           float64
two_year_ago_weighted      float64
three_year_ago_avg         float64
three_year_ago_weighted    float64
confederation               object
rank_date                   object
dtype: object


------------------------------------
Info on your Dataset:

<bound method DataFrame.info of        rank country_full country_abrv  total_points  ...  three_year_ago_avg  three_year_ago_weighted  confederation   rank_date
0         1      Germany          GER           0.0  ...                 0.0                      0.0           UEFA  1993-08-08
1         2        Italy          ITA           0.0  ...                 0.0                      0.0           UEFA  1993-08-08
2         3  Switzerland          SUI           0.0  ...                 0.0                      0.0           UEFA  1993-08-08
3         4       Sweden          SWE           0.0  ...                 0.0                      0.0           UEFA  1993-08-08
4         5    Argentina          ARG           0.0  ...                 0.0                      0.0       CONMEBOL  1993-08-08
...     ...          ...          ...           ...  ...                 ...                      ...            ...         ...
57788   206     Anguilla          AIA           0.0  ...                 0.0                      0.0       CONCACAF  2018-06-07
57789   206      Bahamas          BAH           0.0  ...                 0.0                      0.0       CONCACAF  2018-06-07
57790   206      Eritrea          ERI           0.0  ...                 0.0                      0.0            CAF  2018-06-07
57791   206      Somalia          SOM           0.0  ...                 0.0                      0.0            CAF  2018-06-07
57792   206        Tonga          TGA           0.0  ...                 0.0                      0.0            OFC  2018-06-07

[57793 rows x 16 columns]>


------------------------------------
The shape of your dataset in the order of rows and columns is:

(57793, 16)


------------------------------------
The features of your dataset are:

Index(['rank', 'country_full', 'country_abrv', 'total_points',
       'previous_points', 'rank_change', 'cur_year_avg',
       'cur_year_avg_weighted', 'last_year_avg', 'last_year_avg_weighted',
       'two_year_ago_avg', 'two_year_ago_weighted', 'three_year_ago_avg',
       'three_year_ago_weighted', 'confederation', 'rank_date'],
      dtype='object')


------------------------------------
The total number of null values from individual columns of your data set are:

rank                       0
country_full               0
country_abrv               0
total_points               0
previous_points            0
rank_change                0
cur_year_avg               0
cur_year_avg_weighted      0
last_year_avg              0
last_year_avg_weighted     0
two_year_ago_avg           0
two_year_ago_weighted      0
three_year_ago_avg         0
three_year_ago_weighted    0
confederation              0
rank_date                  0
dtype: int64


------------------------------------
The number of rows in your dataset are:

57793


------------------------------------
The values in your dataset are:

[[1 'Germany' 'GER' ... 0.0 'UEFA' '1993-08-08']
 [2 'Italy' 'ITA' ... 0.0 'UEFA' '1993-08-08']
 [3 'Switzerland' 'SUI' ... 0.0 'UEFA' '1993-08-08']
 ...
 [206 'Eritrea' 'ERI' ... 0.0 'CAF' '2018-06-07']
 [206 'Somalia' 'SOM' ... 0.0 'CAF' '2018-06-07']
 [206 'Tonga' 'TGA' ... 0.0 'OFC' '2018-06-07']]


------------------------------------


Do you want to generate a detailed report on the exploration of your dataset?
[y/n]: y
Generating report...

Summarize dataset: 100%|████████████████████████████████████████████████████████████████████████████| 30/30 [03:34<00:00,  7.14s/it, Completed] 
Generate report structure: 100%|█████████████████████████████████████████████████████████████████████████████████| 1/1 [00:31<00:00, 31.42s/it] 
Render HTML: 100%|███████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:12<00:00, 12.07s/it] 
Export report to file: 100%|█████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  8.00it/s] 
Your Report has been generated and saved as 'output.html'
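
In the sample output above, the "Stats" and "Info" sections show `<bound method ...>` followed by the whole table. That is what Python prints for a method object that has not been called, so those two sections do not contain the actual summary. The sketch below gathers the same sections with plain Pandas calls, calling `describe()` and `info()` with parentheses. The `summarize` helper and the tiny `sample` frame are hypothetical and only illustrate the idea; this is not xplore's own source code.

```python
import io

import pandas as pd


def summarize(df: pd.DataFrame) -> dict:
    """Collect the sections shown in the sample output using plain pandas."""
    # DataFrame.info() prints rather than returns, so capture its text.
    info_buffer = io.StringIO()
    df.info(buf=info_buffer)
    return {
        "head": df.head(),                 # first 5 entries
        "tail": df.tail(),                 # last 5 entries
        "stats": df.describe(),            # called, so it returns statistics
        "dtypes": df.dtypes,               # value type of each column
        "info": info_buffer.getvalue(),    # text produced by info()
        "shape": df.shape,                 # (rows, columns)
        "features": df.columns,            # column labels
        "null_counts": df.isnull().sum(),  # null values per column
        "n_rows": len(df),                 # number of rows
        "values": df.values,               # underlying array of values
    }


# Tiny illustrative frame; the values are not real ranking data.
sample = pd.DataFrame(
    {
        "rank": [1, 2, 3],
        "country_abrv": ["GER", "ITA", "SUI"],
        "total_points": [0.0, 0.0, 0.0],
    }
)

summary = summarize(sample)
print(summary["stats"])
print(summary["info"])
```

By default `describe()` summarizes only the numeric columns of a mixed DataFrame, so text columns such as `country_abrv` do not appear in the statistics.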

Contributing to xplore

Fork and clone this repo if you have any contributions you want to make. Push your commits to a new branch and send a PR when done. I'll review your code and merge your PR as soon as possible.

Maintainers:

Jerry Buaba | Labaran Mohammed | Benjamin Acquaah
