
Data Science Hacks consists of tips and tricks to help you become a better data scientist. The hacks are for everyone, from beginner to advanced, and cover Python, Jupyter Notebook, pandas, and more.

Stars: ✭ 273 (-54.88%)

Mutual labels: data-analysis, pandas

Data-Analyst-Nanodegree

Kai Sheng Teh - Udacity Data Analyst Nanodegree

Stars: ✭ 42 (-93.06%)

Mutual labels: pandas, data-analysis

data-analysis-using-python

Data Analysis Using Python: A Beginner’s Guide Featuring NYC Open Data

Stars: ✭ 81 (-86.61%)

Mutual labels: pandas, data-analysis

Datagear

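A data visualization and analysis platform, developed in Java with a browser/server architecture, that supports multiple data sources such as SQL, CSV, Excel, HTTP APIs, and JSON.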

Stars: ✭ 266 (-56.03%)

Mutual labels: sql, data-analysis

datatile

A library for managing, validating, summarizing, and visualizing data.

Stars: ✭ 419 (-30.74%)

Mutual labels: pandas, data-analysis

Ai Learn

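An artificial intelligence learning roadmap that collects nearly 200 hands-on case studies and projects, with free accompanying course materials, for beginners starting from zero and for job-oriented practice. Covers popular areas including Python, mathematics, machine learning, data analysis, deep learning, computer vision, natural language processing, PyTorch tensorflow machine-learning,deep-learning data-analysis data-mining mathematics data-science artificial-intelligence python tensorflow tensorflow2 caffe keras pytorch algorithm numpy pandas matplotlib seaborn nlp cv.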

Stars: ✭ 4,387 (+625.12%)

Mutual labels: data-analysis, pandas

Pandas Summary

An extension to the pandas DataFrame describe function.

Stars: ✭ 361 (-40.33%)

Mutual labels: data-analysis, pandas

Prettypandas

A Pandas Styler class for making beautiful tables

Stars: ✭ 376 (-37.85%)

Mutual labels: data-analysis, pandas

Dominando-Pandas

This repository is dedicated to the process of learning the Pandas library.

Stars: ✭ 22 (-96.36%)

Mutual labels: pandas, data-analysis

Product-Categorization-NLP

Multi-Class Text Classification for products based on their description with Machine Learning algorithms and Neural Networks (MLP, CNN, Distilbert).

Stars: ✭ 30 (-95.04%)

Mutual labels: pandas, data-analysis

visions

Type System for Data Analysis in Python

Stars: ✭ 136 (-77.52%)

Mutual labels: pandas, data-analysis

ipython-notebooks

A collection of Jupyter notebooks exploring different datasets.

Stars: ✭ 43 (-92.89%)

Mutual labels: pandas, data-analysis

fairlens

Identify bias and measure fairness of your data

Stars: ✭ 51 (-91.57%)

Mutual labels: pandas, data-analysis

Pydata Notebook

Chinese translation notes for Python for Data Analysis, 2nd Edition (2017)

Stars: ✭ 4,300 (+610.74%)

Mutual labels: data-analysis, pandas

dataquest-guided-projects-solutions

My dataquest project solutions

Stars: ✭ 35 (-94.21%)

Mutual labels: pandas, data-analysis

online-course-recommendation-system

Built on data fetched from Pluralsight's course API. Uses a model trained with the K-means unsupervised clustering algorithm.

Stars: ✭ 31 (-94.88%)

Mutual labels: pandas, data-analysis

Zat

Zeek Analysis Tools (ZAT): Processing and analysis of Zeek network data with Pandas, scikit-learn, Kafka and Spark

Stars: ✭ 303 (-49.92%)

Mutual labels: data-analysis, pandas


siuba

scrappy data analysis, with seamless support for pandas and SQL

siuba (小巴) is a port of dplyr and other R libraries. It supports a tabular data analysis workflow centered on 5 common actions:

select() - keep certain columns of data.
filter() - keep certain rows of data.
mutate() - create or modify an existing column of data.
summarize() - reduce one or more columns down to a single number.
arrange() - reorder the rows of data.

These actions can be preceded by a group_by(), which causes them to be applied individually to each group of rows. Moreover, many SQL concepts, such as distinct(), count(), and joins, are implemented. Inputs to these functions can be a pandas DataFrame or a SQL connection (currently postgres, redshift, or sqlite).

For more on the rationale behind tools like dplyr, see this tidyverse paper. For examples of siuba in action, see the siuba documentation.

Installation

pip install siuba

Examples

See the siuba docs or this live analysis for a full introduction.

Basic use

The code below uses the example DataFrame mtcars to compute the average horsepower (hp) for each cylinder count (cyl).

from siuba import group_by, summarize, _
from siuba.data import mtcars

(mtcars
  >> group_by(_.cyl)
  >> summarize(avg_hp = _.hp.mean())
  )

Out[1]: 
   cyl      avg_hp
0    4   82.636364
1    6  122.285714
2    8  209.214286

There are three key concepts in this example:

| concept | example | meaning |
| --- | --- | --- |
| verb | `group_by(...)` | a function that operates on a table, like a DataFrame or SQL table |
| siu expression | `_.hp.mean()` | an expression created with `siuba._`, that represents actions you want to perform |
| pipe | `mtcars >> group_by(...)` | a syntax that allows you to chain verbs with the `>>` operator |

See introduction to siuba.

What is a siu expression (e.g. `_.cyl == 4`)?

A siu expression is a way of specifying what action you want to perform. This allows siuba verbs to decide how to execute the action, depending on whether your data is a local DataFrame or remote table.

from siuba import _

_.cyl == 4

Out[2]:
█─==
├─█─.
│ ├─_
│ └─'cyl'
└─4

You can also think of siu expressions as a shorthand for a lambda function.

from siuba import _
from siuba.data import mtcars

# lambda approach
mtcars[lambda _: _.cyl == 4]

# siu expression approach
mtcars[_.cyl == 4]

Out[3]: 
     mpg  cyl   disp   hp  drat     wt   qsec  vs  am  gear  carb
2   22.8    4  108.0   93  3.85  2.320  18.61   1   1     4     1
7   24.4    4  146.7   62  3.69  3.190  20.00   1   0     4     2
..   ...  ...    ...  ...   ...    ...    ...  ..  ..   ...   ...
27  30.4    4   95.1  113  3.77  1.513  16.90   1   1     5     2
31  21.4    4  121.0  109  4.11  2.780  18.60   1   1     4     2

[11 rows x 11 columns]

See siu expression section here.

Using with a SQL database

A killer feature of siuba is that the same analysis code can be run on a local DataFrame, or a SQL source.

In the code below, we set up an example database.

# Setup example data ----
from sqlalchemy import create_engine
from siuba.data import mtcars

# copy pandas DataFrame to sqlite
engine = create_engine("sqlite:///:memory:")
mtcars.to_sql("mtcars", engine, if_exists = "replace")

Next, we use the code from the first example, except now it is executed against a SQL table.

# Demo SQL analysis with siuba ----
from siuba import _, group_by, summarize
from siuba.sql import LazyTbl

# connect with siuba
tbl_mtcars = LazyTbl(engine, "mtcars")

(tbl_mtcars
  >> group_by(_.cyl)
  >> summarize(avg_hp = _.hp.mean())
  )

Out[4]: 
# Source: lazy query
# DB Conn: Engine(sqlite:///:memory:)
# Preview:
   cyl      avg_hp
0    4   82.636364
1    6  122.285714
2    8  209.214286
# .. may have more rows

See querying SQL introduction here.

Example notebooks

Below are some examples I've kept as I've worked on siuba. For the most up-to-date explanations, see the siuba docs.

siu expressions
dplyr style pandas
- select verb case study
sql using dplyr style
- simple sql statements
- the kitchen sink with postgres
tidytuesday examples
- tidytuesday is a weekly R data analysis project. In order to kick the tires on siuba, I've been using it to complete the assignments. More specifically, I've been porting Dave Robinson's tidytuesday analyses to use siuba.

Testing

Tests are done using pytest. They can be run using the following.

# start postgres db
docker-compose up
pytest siuba


Stars: ✭ 605


machow / Siuba
