Yelp Dataset Challenge for Python

This repository provides tools for downloading and reading the Yelp Dataset Challenge (round 6) data in pandas pickle format, making it easy for anyone who wants to explore Yelp data with Python. The yelp_util Python package provides the read and download functions.

Datasets repository

The S3 bucket is structured as follows:

science-of-science-bucket
└─yelp_academic_dataset
  ├───yelp_academic_dataset_business.pickle (61k rows)
  ├───yelp_academic_dataset_review.pickle (1.5M rows)
  ├───yelp_academic_dataset_user.pickle (366k rows)
  ├───yelp_academic_dataset_checkin.pickle (45k rows)
  └───yelp_academic_dataset_tip.pickle (495k rows)

You can download the data directly from the AWS S3 bucket as follows:

import yelp_util
yelp_util.download(file_list=["yelp_academic_dataset_business.pickle",
                              "yelp_academic_dataset_review.pickle",
                              "yelp_academic_dataset_user.pickle",
                              "yelp_academic_dataset_checkin.pickle",
                              "yelp_academic_dataset_tip.pickle"])

The files are downloaded to the data folder. Once the download finishes, you can read a pickle file as follows:

import pandas as pd
review = pd.read_pickle('data/yelp_academic_dataset_review.pickle')
review.head()

Structure of Datasets

User: user information (366k rows)

average_stars	compliments	elite	fans	friends	name	review_count	type	user_id	votes	yelping_since

Business: businesses with their location and city (61k rows)

attributes	business_id	categories	city	full_address	hours	latitude	longitude	name	neighborhoods	open	review_count	stars	state	type

Review: reviews written by users (1.5M rows)

business_id	date	review_id	stars	text	type	user_id	type	votes_cool	votes_funny	votes_useful

Checkin: check-in table (45k rows)

business_id	checkin_info	type

Tip: tip table (495k rows)

business_id	date	likes	text	type	user_id

Cluster businesses according to how they are tagged

Read the business data

import pandas as pd

business = pd.read_pickle('data/yelp_academic_dataset_business.pickle')
tags = business.categories.tolist()

Then transform the tags into a count matrix:

tag_countmatrix = yelp_util.taglist_to_matrix(tags)
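The internals of yelp_util.taglist_to_matrix are not shown here. As a rough illustration of what a tag count matrix looks like, the sketch below builds one in plain Python. The function name tags_to_count_matrix and the example tags are hypothetical, and the real helper may return a different type (for example a sparse matrix).

```python
def tags_to_count_matrix(tag_lists):
    """Return (vocabulary, matrix), where matrix[i][j] counts tag j for row i.

    Illustrative stand-in only; not the yelp_util implementation.
    """
    # A sorted vocabulary gives a stable column order.
    vocabulary = sorted({tag for tags in tag_lists for tag in tags})
    column = {tag: j for j, tag in enumerate(vocabulary)}

    matrix = []
    for tags in tag_lists:
        row = [0] * len(vocabulary)
        for tag in tags:
            row[column[tag]] += 1
        matrix.append(row)
    return vocabulary, matrix


# Hypothetical category lists for three businesses.
example_tags = [["Restaurants", "Pizza"], ["Bars", "Restaurants"], ["Pizza"]]
vocabulary, matrix = tags_to_count_matrix(example_tags)
```

Each row corresponds to one business and each column to one tag, which is the row-per-sample layout that clustering algorithms expect.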

The count matrix can then be used to cluster the businesses:

from sklearn.cluster import KMeans
km = KMeans(n_clusters=3)
km.fit(tag_countmatrix)
business['cluster'] = km.predict(tag_countmatrix)
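To get a feel for what each cluster contains, one option is to list the most frequent tags per cluster. The helper below is a minimal sketch and not part of yelp_util. The name top_tags_per_cluster and the example data are hypothetical.

```python
from collections import Counter, defaultdict


def top_tags_per_cluster(tag_lists, labels, n=3):
    """Map each cluster label to its n most frequent tags."""
    counts = defaultdict(Counter)
    for tags, label in zip(tag_lists, labels):
        counts[label].update(tags)
    return {
        label: [tag for tag, _ in counter.most_common(n)]
        for label, counter in counts.items()
    }


# Hypothetical tags and cluster assignments for four businesses.
example_tags = [
    ["Restaurants", "Pizza"],
    ["Restaurants", "Italian"],
    ["Bars", "Nightlife"],
    ["Bars"],
]
example_labels = [0, 0, 1, 1]
summary = top_tags_per_cluster(example_tags, example_labels, n=1)
```

With the real data, the same call would presumably take the tags list and the business['cluster'] column as its two arguments.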

Train word2vec model

review = pd.read_pickle('data/yelp_academic_dataset_review.pickle')
yelp_review_sample = list(review.text.iloc[10000:20000])
model = yelp_util.create_word2vec_model(yelp_review_sample) # word2vec model
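How yelp_util.create_word2vec_model preprocesses the text is not shown here. The dependency list suggests it relies on nltk's punkt tokenizer and gensim, but that is an assumption. gensim's Word2Vec is trained on an iterable of token lists, so the sketch below shows one simple way to turn raw reviews into that shape using only the standard library. The regex-based splitting is a crude stand-in for punkt, and reviews_to_token_lists is a hypothetical name.

```python
import re

# Lowercase word tokens: letters, digits, and apostrophes.
TOKEN_PATTERN = re.compile(r"[a-z0-9']+")


def reviews_to_token_lists(reviews):
    """Split reviews into sentences, then each sentence into word tokens."""
    token_lists = []
    for text in reviews:
        # Naive sentence split on ., !, and ? (punkt handles far more cases).
        for sentence in re.split(r"[.!?]+", text.lower()):
            tokens = TOKEN_PATTERN.findall(sentence)
            if tokens:
                token_lists.append(tokens)
    return token_lists


# Hypothetical review texts.
sample_reviews = ["Great pizza! The staff was friendly.", "Too loud."]
sentences = reviews_to_token_lists(sample_reviews)
```

The resulting list of token lists has the form that a word2vec trainer consumes as its sentences.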

Django runserver

The Django project lives in the random_reviews folder. Get started by running python manage.py migrate. Then, to run the project on your local machine (mainly useful for customizing the CSS files), use python manage.py runserver.

Dependencies

pandas
scikit-learn
nltk with punkt (nltk.download('punkt'))
gensim
unidecode

Members
