
JacksonWuxs / Dapy

License: GPL-3.0
Easy-to-use data analysis / manipulation framework for humans

Programming Languages

python

Projects that are alternatives of or similar to Dapy

Data Describe
data⎰describe: Pythonic EDA Accelerator for Data Science
Stars: ✭ 269 (-48.57%)
Mutual labels:  analysis, data-science, pypi
Dream3d
Data Analysis program and framework for materials science data analytics, based on the SIMPL managing framework.
Stars: ✭ 73 (-86.04%)
Mutual labels:  analysis, data-science, data-analysis
Awesome R
A curated list of awesome R packages, frameworks and software.
Stars: ✭ 4,858 (+828.87%)
Mutual labels:  data-science, data-analysis
Articles
A repository for the source code, notebooks, data, files, and other assets used in the data science and machine learning articles on LearnDataSci
Stars: ✭ 350 (-33.08%)
Mutual labels:  data-science, data-analysis
Prettypandas
A Pandas Styler class for making beautiful tables
Stars: ✭ 376 (-28.11%)
Mutual labels:  data-science, data-analysis
Finviz
Unofficial API for finviz.com
Stars: ✭ 493 (-5.74%)
Mutual labels:  analysis, pypi
Quantitative Notebooks
Educational notebooks on quantitative finance, algorithmic trading, financial modelling and investment strategy
Stars: ✭ 356 (-31.93%)
Mutual labels:  data-science, data-analysis
Data Science
Collection of useful data science topics along with code and articles
Stars: ✭ 315 (-39.77%)
Mutual labels:  data-science, data-analysis
Ai Learn
An AI learning roadmap with nearly 200 hands-on cases and projects, free companion materials, and a zero-to-employment path. Topics include Python, mathematics, machine learning, data analysis, deep learning, computer vision, natural language processing, PyTorch, TensorFlow, Caffe, Keras, algorithms, NumPy, pandas, Matplotlib, seaborn, NLP, CV, and other popular areas.
Stars: ✭ 4,387 (+738.81%)
Mutual labels:  data-science, data-analysis
Jupyter pivottablejs
Drag’n’drop Pivot Tables and Charts for Jupyter/IPython Notebook, care of PivotTable.js
Stars: ✭ 428 (-18.16%)
Mutual labels:  data-science, data-analysis
The Elements Of Statistical Learning Python Notebooks
A series of Python Jupyter notebooks that help you better understand "The Elements of Statistical Learning" book
Stars: ✭ 405 (-22.56%)
Mutual labels:  data-science, data-analysis
Knowledge Repo
A next-generation curated knowledge sharing platform for data scientists and other technical professions.
Stars: ✭ 4,956 (+847.61%)
Mutual labels:  data-science, data-analysis
Scikit Mobility
scikit-mobility: mobility analysis in Python
Stars: ✭ 339 (-35.18%)
Mutual labels:  data-science, data-analysis
Kneed
Knee point detection in Python 📈
Stars: ✭ 328 (-37.28%)
Mutual labels:  data-science, data-analysis
Pandas Summary
An extension to pandas dataframes describe function.
Stars: ✭ 361 (-30.98%)
Mutual labels:  data-science, data-analysis
Akshare
AKShare is an elegant and simple financial data interface library for Python, built for human beings! (An open-source financial data interface library.)
Stars: ✭ 4,334 (+728.68%)
Mutual labels:  data-science, data-analysis
Dataexplorer
Automate Data Exploration and Treatment
Stars: ✭ 362 (-30.78%)
Mutual labels:  data-science, data-analysis
Pyod
A Python Toolbox for Scalable Outlier Detection (Anomaly Detection)
Stars: ✭ 5,083 (+871.89%)
Mutual labels:  data-science, data-analysis
Datascience course
A Data Science course in Portuguese
Stars: ✭ 294 (-43.79%)
Mutual labels:  data-science, data-analysis
Pydataroad
Open-source material for the WeChat official account (ID: PyDataLab)
Stars: ✭ 302 (-42.26%)
Mutual labels:  data-science, data-analysis
An open-source framework that fluently implements your ideas for data mining.

DaPy - Enjoy the Tour in Data Mining

Chinese version

Overview

DaPy is a data analysis library designed with ease of use in mind; it lets you implement your ideas smoothly by providing well-designed data structures and a rich set of professional ML models. There are already many well-known data manipulation modules, such as Pandas, but none of them:

  • supports writing code in a chain-programming style;
  • provides thread-safe data containers;
  • offers feature-engineering methods through simple APIs;
  • handles data as easily as Excel, without forcing you to think about the underlying data structures;
  • logs every step to the console, the way MySQL does.

Thus, DaPy is better suited to data analysts, statisticians, and people who work with big data but have limited programming experience than to engineers. Its data structures offer about 70 APIs for data mining, including 40+ data-operation functions, 10+ feature-engineering functions, and 15+ data-exploration functions.

Example

This example briefly demonstrates DaPy's chain programming, operation log, and simple feature-engineering methods. The goal is to train a classifier for the Iris classification task. More details can be found here.
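
As a taste of what such a workflow looks like, here is a minimal sketch built only from calls listed in the Usage section below. The file and column names are placeholders, and it assumes the sheet methods return the sheet itself so that calls can be chained, as the chain-programming design above suggests.

import DaPy

# Load a local copy of the Iris data and prepare it in one chain of calls
# ('iris.csv' and the column name are placeholders; chaining assumes each
#  method returns the sheet)
sheet = (DaPy.read('iris.csv')
              .fillna(method='linear')              # interpolate missing values
              .normalized(col='petal_length'))      # standardize a continuous column

# Every operation above is also logged to the console, MySQL-style
sheet.show(lines=5)   # preview the first and last five records
sheet.info            # per-column summary statistics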

Features of DaPy

We already have plenty of great libraries for data science, so why do we need DaPy?

The answer is that DaPy is designed for data analysts, not for coders. Users only need to focus on how they want to handle their data, and can pay less attention to coding tricks. For example, in contrast to Pandas, DaPy lets you manipulate data row by row, much as you would in SQL (a short sketch follows the list below). Here are just a few of the things that make DaPy simple:

  • A variety of ways to visualize data in the terminal
  • 2D data sheet structures that follow Python syntax habits
  • SQL-like APIs for processing data
  • Thread-safe data containers
  • A variety of functions for preprocessing and feature engineering
  • Flexible I/O tools for loading and saving data (e.g. websites, Excel, SQLite3, SPSS, text)
  • Built-in basic models (e.g. Decision Tree, Multilayer Perceptron, Linear Regression, ...)
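
A short sketch of these points, using only calls documented in the Usage section below; the file and column names are hypothetical.

import DaPy

sheet = DaPy.read('customers.csv')   # hypothetical local file

# Visualize data directly in the terminal
sheet.show(lines=5)                  # first and last five records
sheet.info                           # per-column summary statistics

# SQL-like, row-oriented access
head = sheet[:10]                    # records by index, similar to LIMIT
cols = sheet['age', 'income']        # columns by name, similar to SELECT
groups = sheet.groupby('city')       # compare groups, similar to GROUP BY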

DaPy is also efficient enough for real-world workloads. The following figure shows a benchmark suggesting that DaPy's performance is comparable to that of some existing C-based libraries. Details of the test can be found here.

Performance Test

Install

The latest version, 1.11.1, has been released on PyPI.

pip install DaPy

Some DaPy functions depend on additional packages:

  • xlrd: load data from an .xls file【Necessary】
  • xlwt: export data to an .xls file【Necessary】
  • repoze.lru: speed up loading data from a .csv file【Necessary】
  • savReaderWriter: load data from a .sav file【Optional】
  • bs4.BeautifulSoup: automatically download data from a website【Optional】
  • numpy: dramatically increase the efficiency of ML models【Recommended】
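
If you need these features, the corresponding packages can be installed from PyPI as well, for example:

pip install xlrd xlwt repoze.lru savReaderWriter beautifulsoup4 numpy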

Usage

  • Load & Explore Data
    • Load data from a local CSV, SAV, SQLite3, or XLS file, a MySQL server, or a MySQL dump file: sheet = DaPy.read(file_addr)
    • Display the first five and the last five records: sheet.show(lines=5)
    • Summarize the statistical information of each column: sheet.info
    • Count the distribution of a categorical variable: sheet.count_values('gender')
    • Examine the differences between the labels of a categorical variable: sheet.groupby('city')
    • Calculate the correlation between continuous variables: sheet.corr(['age', 'income'])
  • Preprocessing & Clean Up Data
    • Remove duplicate records: sheet.drop_duplicates(col, keep='first')
    • Use linear interpolation to fill in NaN values: sheet.fillna(method='linear')
    • Remove records in which more than 50% of the variables are NaN: sheet.dropna(axis=0, how=0.5)
    • Remove some meaningless columns (e.g. ID): sheet.drop('ID', axis=1)
    • Sort records by some columns: sheet = sheet.sort('Age', 'DESC')
    • Merge external features from another table: sheet.merge(sheet2, left_key='ID', other_key='ID', keep_key='self', keep_same=False)
    • Merge external records from another table: sheet.join(sheet2)
    • Append records one by one: sheet.append_row(new_row)
    • Append new variables one by one: sheet.append_col(new_col)
    • Get parts of records by index: sheet[:10, 20: 30, 50: 100]
    • Get parts of columns by column name: sheet['age', 'income', 'name']
  • Feature Engineering
    • Transform a datetime into categorical variables: sheet.get_date_label('birth')
    • Transform numerical variables into categorical variables: sheet.get_categories(cols='age', cutpoints=[18, 30, 50], group_name=['Juveniles', 'Adults', 'Wrinkly', 'Old'])
    • Transform categorical variables into dummy variables: sheet.get_dummies(['city', 'education'])
    • Create higher-order interaction terms between selected variables: sheet.get_interactions(n_power=3, col=['income', 'age', 'gender', 'education'])
    • Introduce the rank of each record: sheet.get_ranks(cols='income', duplicate='mean')
    • Standardize continuous variables: sheet.normalized(col='age')
    • Apply a special transformation (e.g. log) to certain variables: sheet.normalized('log', col='salary')
    • Create new variables from business-logic formulas: sheet.apply(func=tax_rate, col=['salary', 'income'])
    • Apply differencing to make a time series stationary: DaPy.diff(sheet.income)
  • Developing Models
    • Choose a model and initialize it: m = MLP(), m = LinearRegression(), m = DecisionTree() or m = DiscriminantAnalysis()
    • Train the model parameters: m.fit(X_train, Y_train)
  • Model Evaluation
    • Evaluate the model with parameter tests: m.report.show()
    • Evaluate the model with visualizations: m.plot_error() or DecisionTree.export_graphviz()
    • Evaluate the model on a test set: DaPy.methods.Performance(m, X_test, Y_test, mode)
  • Saving Result
    • Save the model: m.save(addr)
    • Save the final dataset: sheet.save(addr)
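
To show how these pieces fit together, the sketch below strings the calls above into one hypothetical end-to-end session. The file and column names are placeholders, the model import path is an assumption (only the class names are documented above), and the train/test arrays are assumed to have been prepared separately.

import DaPy
from DaPy.methods import LinearRegression   # import path is an assumption

# ---- Load & explore ----
sheet = DaPy.read('records.csv')             # hypothetical local file
sheet.show(lines=5)                          # first and last five records
sheet.info                                   # per-column summary statistics

# ---- Preprocess & clean up ----
sheet.drop_duplicates('ID', keep='first')
sheet.fillna(method='linear')                # linear interpolation for NaN values
sheet.dropna(axis=0, how=0.5)                # drop records that are more than 50% NaN
sheet.drop('ID', axis=1)                     # drop a meaningless column
sheet = sheet.sort('Age', 'DESC')

# ---- Feature engineering ----
sheet.get_dummies(['city', 'education'])
sheet.get_categories(cols='age', cutpoints=[18, 30, 50],
                     group_name=['Juveniles', 'Adults', 'Wrinkly', 'Old'])
sheet.normalized(col='age')

# ---- Modeling & evaluation ----
# X_train, Y_train, X_test, Y_test are assumed to have been split from the sheet.
m = LinearRegression()
m.fit(X_train, Y_train)
m.report.show()                              # parameter tests
# A held-out test set can be scored with DaPy.methods.Performance(m, X_test, Y_test, mode).

# ---- Save results ----
m.save('model.pkl')                          # hypothetical output paths
sheet.save('cleaned.csv')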

Contributors

Related

The following programs are also great data analysis / manipulation frameworks in Python:

  • Agate: a data analysis library optimized for humans
  • NumPy: the fundamental package for scientific computing with Python
  • Pandas: the Python Data Analysis Library
  • Scikit-Learn: Machine Learning in Python

Further-Info

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].