iamaziz / Pydataset

License: MIT
Instant access to many datasets in Python.

PyDataset

Provides instant access to many datasets right from Python (in pandas DataFrame structure).

What?

The idea is simple. Many datasets are available, but they are scattered across different places on the web. Is there a quick way (in Python) to access them instantly, without the hassle of searching for, downloading, and reading each one? PyDataset tries to address that question :)

Usage:

Start by importing data():

from pydataset import data
  • To load a dataset:
titanic = data('titanic')
  • To display the documentation of a dataset:
data('titanic', show_doc=True)
  • To see the available datasets:
data()

That's it. See more examples.
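
Because data() returns a pandas DataFrame, the usual pandas methods apply to the result. The following is only a minimal sketch, assuming pydataset is installed. The helper name preview is hypothetical and not part of pydataset, and the import is guarded so that the snippet still runs where the package is missing:

```python
def preview(name, rows=5):
    """Return the first `rows` rows of a dataset, or None if unavailable.

    `preview` is a hypothetical helper for illustration, not a pydataset API.
    """
    try:
        from pydataset import data  # the import shown in the Usage section
    except ImportError:
        return None  # pydataset is not installed in this environment

    df = data(name)  # a pandas DataFrame when the dataset is found
    if df is None:   # assumption: nothing is returned for an unknown name
        return None
    return df.head(rows)


result = preview("titanic")
if result is not None:
    print(result)
```

From here, the DataFrame can be filtered, grouped, or plotted like any other pandas object.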

Why?

In R, there is a very easy and immediate way to access many statistical datasets with almost no effort. All it takes is one line: > data(dataset_name). This makes life easier for quick prototyping and testing. Well, I am jealous that Python does not have similar functionality. Thus, the aim of pydataset is to fill that gap.

Currently, pydataset has about 757 (mostly numerical) datasets, all based on RDatasets. In the future, I plan to scale it to include a larger set of datasets. For example:

  1. include textual data for NLP-related tasks, and
  2. allow adding a new dataset to the in-module repository.

Installation:

$ pip install pydataset

Uninstall:

  • $ pip uninstall pydataset
  • $ rm -rf $HOME/.pydataset

Changelog

0.2.0

  • Add search dataset by name similarity.
  • Example:
>>> data('heat')
Did you mean:
Wheat, heart, Heating, Yeast, eidat, badhealth, deaths, agefat, hla, heptathlon, azt
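
The "Did you mean" behaviour above can be approximated with the standard library. This is a rough sketch of name-similarity search using difflib, not necessarily how pydataset implements it. The function suggest, the short DATASET_NAMES list, and the cutoff value are all assumptions chosen for illustration:

```python
from difflib import get_close_matches

# Hypothetical sample of dataset names; the real catalogue is much larger.
DATASET_NAMES = ["Wheat", "heart", "Heating", "Yeast", "titanic", "iris", "mtcars"]


def suggest(query, names=DATASET_NAMES, limit=5, cutoff=0.5):
    """Return up to `limit` names that resemble `query`, ignoring case."""
    # Map lowercase names back to their original spelling.
    by_lower = {}
    for name in names:
        by_lower.setdefault(name.lower(), name)

    # get_close_matches ranks candidates by SequenceMatcher similarity
    # and drops any candidate scoring below `cutoff`.
    matches = get_close_matches(query.lower(), list(by_lower), n=limit, cutoff=cutoff)
    return [by_lower[m] for m in matches]


print("Did you mean:")
print(", ".join(suggest("heat")))
```

Lowering the cutoff returns looser matches, while raising it restricts suggestions to near-identical names.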

0.1.1

  • Fix: add support for Windows and fix file paths (issue #1).

Dependency:

  • pandas

Miscellaneous:

  • Tested on OS X and Linux (Debian).
  • Supports both Python 2 (2.7.11) and Python 3 (3.5.1).

TODO:

  • add textual datasets (e.g. NLTK stuff).
  • add sample generators.

Thanks to:
