iamaziz / Pydataset

Licence: MIT

Instant access to many datasets in Python.

Labels

data-science datasets

Projects that are alternatives to or similar to Pydataset

Complete Life Cycle Of A Data Science Project

Complete-Life-Cycle-of-a-Data-Science-Project

Stars: ✭ 140 (-84.09%)

Mutual labels: data-science, datasets

Datasets, tools, and benchmarks for representation learning of code.

Stars: ✭ 1,378 (+56.59%)

Mutual labels: data-science, datasets

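Data interfaces: Baidu, Google, Toutiao, and Weibo indexes; macroeconomic data, interest-rate data, currency exchange rates; high-growth ("qianlima") and unicorn companies; transcripts of CCTV's Xinwen Lianbo news broadcast; film and TV box-office data; university lists; epidemic data…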

Stars: ✭ 1,229 (+39.66%)

Mutual labels: data-science, datasets

Awesome Twitter Data

A list of Twitter datasets and related resources.

Stars: ✭ 533 (-39.43%)

Mutual labels: data-science, datasets

Dataset format for AI. Build, manage, & visualize datasets for deep learning. Stream data real-time to PyTorch/TensorFlow & version-control it. https://activeloop.ai

Stars: ✭ 4,003 (+354.89%)

Mutual labels: data-science, datasets

Machine Learning Resources

A curated list of awesome machine learning frameworks, libraries, courses, books and many more.

Stars: ✭ 226 (-74.32%)

Mutual labels: data-science, datasets

R package to interface with OpenML

Stars: ✭ 81 (-90.8%)

Mutual labels: data-science, datasets

Quickly download, clean up, and install public datasets into a database management system

Stars: ✭ 241 (-72.61%)

Mutual labels: data-science, datasets

AKShare is an elegant and simple financial data interface library for Python, built for human beings! (An open-source financial data interface library.)

Stars: ✭ 4,334 (+392.5%)

Mutual labels: data-science, datasets

Datasets For Recommender Systems

This is a repository of high-quality, topic-centric public data sources for Recommender Systems (RS).

Stars: ✭ 564 (-35.91%)

Mutual labels: data-science, datasets

PyMC3 educational resources

Stars: ✭ 930 (+5.68%)

Mutual labels: data-science

Efficient variant-call data storage and retrieval library using the TileDB storage library.

Stars: ✭ 26 (-97.05%)

Mutual labels: data-science

Automated Deep Learning without ANY human intervention. 1st-place solution for AutoDL.

Stars: ✭ 854 (-2.95%)

Mutual labels: data-science

Data Science On Gcp

Source code accompanying the book: Data Science on the Google Cloud Platform, Valliappa Lakshmanan, O'Reilly 2017

Stars: ✭ 864 (-1.82%)

Mutual labels: data-science

A Python tool that automatically cleans data sets and readies them for analysis.

Stars: ✭ 933 (+6.02%)

Mutual labels: data-science

Verteego Data Suite

Stars: ✭ 9 (-98.98%)

Mutual labels: data-science

Notes for using R language to do data mining and machine learning (Chinese)

Stars: ✭ 25 (-97.16%)

Mutual labels: data-science

Out-of-the-box data science / AI platform | Swiss Army knife for AI/data science

Stars: ✭ 25 (-97.16%)

Mutual labels: data-science

Kubeflow Data Science On Steroids

The blog post about Kubeflow, including all materials

Stars: ✭ 25 (-97.16%)

Mutual labels: data-science

BayesDB on SQLite. A Bayesian database table for querying the probable implications of data as easily as SQL databases query the data itself.

Stars: ✭ 877 (-0.34%)

Mutual labels: data-science

PyDataset

Provides instant access to many datasets right from Python, as pandas DataFrames.

What?

The idea is simple. There are many datasets available, but they are scattered across different places on the web. Is there a quick way (in Python) to access them instantly, without the hassle of searching for, downloading, and reading each one? PyDataset tries to answer that question :)

Usage:

Start by importing data():

from pydataset import data

To load a dataset:

titanic = data('titanic')

To display the documentation of a dataset:

data('titanic', show_doc=True)

To see the available datasets:

data()

That's it. See more examples.
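To make the three call modes above concrete, here is a minimal, dependency-free sketch of a data()-style entry point. It is not PyDataset's implementation: the catalog, the dataset names colors and points, and their contents are made up for illustration, and rows come back as lists of dicts (via the standard csv module) rather than pandas DataFrames. Also, PyDataset's show_doc=True displays the documentation; this sketch returns it so it can be inspected.

```python
import csv
import io

# Hypothetical catalog standing in for PyDataset's bundled dataset files:
# name -> (CSV text, documentation string). All contents are made up.
_CATALOG = {
    "colors": ("name,count\nred,3\nblue,5\n", "Made-up counts of colored items."),
    "points": ("x,y\n1,2\n3,4\n", "Two made-up numeric columns."),
}


def data(name=None, show_doc=False):
    """Sketch of a data()-style entry point with three call modes."""
    if name is None:
        # data(): list the available dataset names.
        return sorted(_CATALOG)
    if name not in _CATALOG:
        raise KeyError(f"unknown dataset: {name!r}")
    csv_text, doc = _CATALOG[name]
    if show_doc:
        # data(name, show_doc=True): the documentation instead of the data.
        return doc
    # data(name): parse the CSV text into a list of row dicts.
    # (PyDataset itself returns a pandas DataFrame.)
    return list(csv.DictReader(io.StringIO(csv_text)))


if __name__ == "__main__":
    print(data())
    print(data("points"))
    print(data("points", show_doc=True))
```

Note that csv.DictReader leaves every value as a string ("1", not 1), which is one reason a real loader hands the parsing to pandas instead.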

Why?

In R, there is a very easy and immediate way to access many statistical datasets with almost no effort. All it takes is one line at the R prompt: data(dataset_name). This makes life easier for quick prototyping and testing. Well, I am jealous that Python does not have similar functionality. Thus, the aim of pydataset is to fill that gap.

Currently, pydataset provides about 757 datasets (mostly numerical), based on RDatasets. In the future, I plan to extend it, for example to:

include textual data for NLP-related tasks, and
allow adding new datasets to the in-module repository.

Installation:

$ pip install pydataset

Uninstall:

$ pip uninstall pydataset
$ rm -rf $HOME/.pydataset
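The second uninstall step is a POSIX shell command. Where rm is not available, roughly the same cleanup can be done from Python. This is a minimal sketch, assuming PyDataset's downloaded data lives only in the .pydataset directory named in that command; the function name remove_pydataset_cache and its home parameter are hypothetical, the latter existing only so the function can be tried against a scratch directory.

```python
import shutil
from pathlib import Path


def remove_pydataset_cache(home=None):
    """Delete <home>/.pydataset, like `rm -rf $HOME/.pydataset`.

    `home` (hypothetical parameter) defaults to the current user's home
    directory. Returns True if a directory was removed, False if none existed.
    """
    base = Path(home) if home is not None else Path.home()
    cache_dir = base / ".pydataset"  # pathlib picks the right separator per OS
    if not cache_dir.is_dir():
        return False
    shutil.rmtree(cache_dir)  # recursive delete, like rm -rf
    return True


if __name__ == "__main__":
    print("removed" if remove_pydataset_cache() else "nothing to remove")
```

Run the pip uninstall step first, as above; this only replaces the rm -rf line.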

Changelog

0.2.0

Add dataset search by name similarity.
Example:

>>> data('heat')
Did you mean:
Wheat, heart, Heating, Yeast, eidat, badhealth, deaths, agefat, hla, heptathlon, azt
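One common way to build this kind of "Did you mean" lookup is the standard library's difflib.get_close_matches, which ranks candidates by a similarity ratio. The changelog does not say how PyDataset scores similarity, so treat this as an assumed technique rather than the package's code; the helper name suggest_names, the case-insensitive matching, and the cutoff of 0.6 are choices made for this sketch.

```python
import difflib


def suggest_names(query, names, n=5, cutoff=0.6):
    """Return up to `n` names from `names` that look similar to `query`.

    Matching is case-insensitive, but the original capitalization is
    returned. If two names differ only in case, the last one wins.
    """
    lowered = {name.lower(): name for name in names}
    matches = difflib.get_close_matches(
        query.lower(), list(lowered), n=n, cutoff=cutoff
    )
    return [lowered[m] for m in matches]


if __name__ == "__main__":
    print(suggest_names("heat", ["Wheat", "heart", "Heating", "mtcars", "iris"]))
```

Raising cutoff makes suggestions stricter; lowering it admits looser matches, closer in spirit to the long suggestion list in the example above.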

0.1.1

Fix: add Windows support and fix file paths (issue #1).

Dependency:

pandas

Miscellaneous:

Tested on OSX and Linux (debian).
Supports both Python 2 (2.7.11) and Python 3 (3.5.1).

TODO:

Add textual datasets (e.g., from NLTK).
Add sample generators.

Thanks to:

RDatasets: R's datasets collection.

Stars: ✭ 880
