
JovianML / opendatasets

Licence: MIT License
A Python library for downloading datasets from Kaggle, Google Drive, and other online sources.


Projects that are alternatives of or similar to opendatasets

napkinXC
Extremely simple and fast extreme multi-class and multi-label classifiers.
Stars: ✭ 38 (-76.4%)
Mutual labels:  datasets
Data-Science-and-Machine-Learning-Resources
List of Data Science and Machine Learning Resources that I frequently use
Stars: ✭ 19 (-88.2%)
Mutual labels:  datasets
bnk48 photo datasets
BNK48 Photo Datasets
Stars: ✭ 12 (-92.55%)
Mutual labels:  datasets
PharmacoDB
Search across publicly available datasets to find instances where a drug or cell line of interest has been profiled.
Stars: ✭ 38 (-76.4%)
Mutual labels:  datasets
systematic-review-datasets
A collection of fully labeled systematic review datasets (title-abstract screening)
Stars: ✭ 25 (-84.47%)
Mutual labels:  datasets
awesome-sweden-datasets
A curated list of awesome datasets to use when coding for the Swedish market.
Stars: ✭ 17 (-89.44%)
Mutual labels:  datasets
Text-Summarization-Repo
A repository that organizes the main research topics, must-read papers, and available models and data in the field of text summarization, along with recommended resources.
Stars: ✭ 213 (+32.3%)
Mutual labels:  datasets
ck-env
CK repository with components and automation actions to enable portable workflows across diverse platforms including Linux, Windows, MacOS and Android. It includes software detection plugins and meta packages (code, data sets, models, scripts, etc.) with the possibility of multiple versions co-existing in a user or system environment.
Stars: ✭ 67 (-58.39%)
Mutual labels:  datasets
PharmacoGx
R package to analyze large-scale pharmacogenomic datasets.
Stars: ✭ 42 (-73.91%)
Mutual labels:  datasets
the-weather-scraper
A Lightweight Weather Scraper
Stars: ✭ 56 (-65.22%)
Mutual labels:  datasets
AIODrive
Official Python/PyTorch Implementation for "All-In-One Drive: A Large-Scale Comprehensive Perception Dataset with High-Density Long-Range Point Clouds"
Stars: ✭ 32 (-80.12%)
Mutual labels:  datasets
extra keras datasets
📃🎉 Additional datasets for tensorflow.keras
Stars: ✭ 20 (-87.58%)
Mutual labels:  datasets
kaggle-code
A repository for some of the code I used in kaggle data science & machine learning tasks.
Stars: ✭ 100 (-37.89%)
Mutual labels:  datasets
traj-pred-irl
Official implementation codes of "Regularizing neural networks for future trajectory prediction via IRL framework"
Stars: ✭ 23 (-85.71%)
Mutual labels:  datasets
SER-datasets
A collection of datasets for the purpose of emotion recognition/detection in speech.
Stars: ✭ 74 (-54.04%)
Mutual labels:  datasets
HINT3
This repository contains datasets and code for the paper "HINT3: Raising the bar for Intent Detection in the Wild" accepted at EMNLP-2020's Insights Workshop https://insights-workshop.github.io/ Preprint for the paper is available here https://arxiv.org/abs/2009.13833
Stars: ✭ 27 (-83.23%)
Mutual labels:  datasets
text-classification-small-datasets
Building a text classifier with extremely small datasets
Stars: ✭ 34 (-78.88%)
Mutual labels:  datasets
databrewer
The missing datasets manager. Like Homebrew but for datasets. CLI tool to search and discover datasets!
Stars: ✭ 39 (-75.78%)
Mutual labels:  datasets
RData.jl
Read R data files from Julia
Stars: ✭ 49 (-69.57%)
Mutual labels:  datasets
panoptic parts
This repository contains code and tools for reading, processing, evaluating on, and visualizing Panoptic Parts datasets. Moreover, it contains code for reproducing our CVPR 2021 paper results.
Stars: ✭ 82 (-49.07%)
Mutual labels:  datasets

opendatasets

opendatasets is a Python library for downloading datasets from online sources like Kaggle and Google Drive using a simple Python command.

Installation

Install the library using pip:

pip install opendatasets --upgrade

Usage - Downloading a dataset

Datasets can be downloaded within a Jupyter notebook or Python script using the opendatasets.download helper function. Here's some sample code for downloading the US Elections Dataset:

import opendatasets as od

dataset_url = 'https://www.kaggle.com/tunguz/us-elections-dataset'
od.download(dataset_url)

dataset_url can also point to a public Google Drive link or a raw file URL.
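Once a download finishes, it is useful to know where the files ended up. The sketch below assumes (this is not stated above) that a Kaggle dataset is saved into a folder named after the last segment of its URL, inside the current directory. `expected_dir`, `list_downloaded_files`, and `download_and_list` are hypothetical helpers written for illustration, not part of opendatasets.

```python
import os
from urllib.parse import urlparse


def expected_dir(dataset_url, data_dir="."):
    """Guess the download folder: the last path segment of the URL (assumption)."""
    slug = urlparse(dataset_url).path.rstrip("/").split("/")[-1]
    return os.path.join(data_dir, slug)


def list_downloaded_files(dataset_url, data_dir="."):
    """Return the sorted entries of the guessed folder, or [] if it does not exist."""
    folder = expected_dir(dataset_url, data_dir)
    if not os.path.isdir(folder):
        return []
    return sorted(os.listdir(folder))


def download_and_list(dataset_url):
    """Download with opendatasets, then list what landed in the guessed folder."""
    # Imported lazily so the two helpers above work without the library installed.
    import opendatasets as od

    od.download(dataset_url)
    return list_downloaded_files(dataset_url)
```

Calling `download_and_list('https://www.kaggle.com/tunguz/us-elections-dataset')` would run the same download as above and return the names of the downloaded files, provided the folder-naming assumption holds.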

Kaggle Credentials

opendatasets uses the official Kaggle API for downloading datasets from Kaggle. Follow these steps to find your API credentials:

  1. Sign in to https://kaggle.com/, then click on your profile picture on the top right and select "My Account" from the menu.

  2. Scroll down to the "API" section and click "Create New API Token". This will download a file kaggle.json with the following contents:

{"username":"YOUR_KAGGLE_USERNAME","key":"YOUR_KAGGLE_KEY"}
  3. When you run opendatasets.download, you will be asked to enter your username and Kaggle API key, which you can get from the file downloaded in step 2.

Note that you need to download the kaggle.json file only once. You can also place the kaggle.json file in the same directory as the Jupyter notebook, and the credentials will be read automatically.
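Before starting a download, you may want to confirm that a kaggle.json file in the working directory is well formed. The following is a minimal pre-flight check, assuming the file has the two fields shown in step 2. `read_kaggle_credentials` is a hypothetical helper and does not describe how opendatasets itself reads the file.

```python
import json
import os


def read_kaggle_credentials(path="kaggle.json"):
    """Return (username, key) from a kaggle.json file, or None if it is absent."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        creds = json.load(f)
    # Both fields from the downloaded token file must be present and non-empty.
    missing = [field for field in ("username", "key") if not creds.get(field)]
    if missing:
        raise ValueError("kaggle.json is missing: " + ", ".join(missing))
    return creds["username"], creds["key"]
```

A return value of `None` means no file was found at that path, in which case you should expect to be prompted for your username and key.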

Some interesting datasets

You can find interesting datasets on Kaggle: https://www.kaggle.com/datasets

You can also create a new dataset on Kaggle by uploading a CSV file here: https://www.kaggle.com/datasets?new=true (make sure to keep your dataset public, otherwise it will not be downloadable)

Other sources to look for datasets:

If you use an external source other than Kaggle, you can create a new dataset on Kaggle by uploading a CSV file here: https://www.kaggle.com/datasets?new=true (make sure to keep your dataset public, otherwise it will not be downloadable using opendatasets)

Curated Datasets

opendatasets also provides some curated datasets that you can download by passing the Dataset ID to opendatasets.download. Here's an example:

import opendatasets
opendatasets.download('stackoverflow-developer-survey-2020')

The following datasets are available for download.

| Dataset ID | Description | Source |
| --- | --- | --- |
| stackoverflow-developer-survey-2020 | Stack Overflow Developer Survey 2020 | Stack Overflow |
| owid-covid-19-latest | Covid-19 Stats by Our World in Data | Our World in Data |
| state-of-javascript-2016 | State of Javascript Annual Survey 2016 | StateOfJS |
| state-of-javascript-2017 | State of Javascript Annual Survey 2017 | StateOfJS |
| state-of-javascript-2018 | State of Javascript Annual Survey 2018 | StateOfJS |
| state-of-javascript-2019 | State of Javascript Annual Survey 2019 | StateOfJS |
| countries-languages-spoken | Languages Spoken in Different Countries | Infoplease |
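To fetch several curated datasets in one go, one option is to check the IDs against the list above before downloading. This is only a sketch: `CURATED_DATASET_IDS` is copied by hand from the list above and may go out of date, and `is_curated` and `download_curated` are hypothetical helpers, not functions provided by opendatasets.

```python
# IDs copied by hand from the list of curated datasets above (may go stale).
CURATED_DATASET_IDS = [
    "stackoverflow-developer-survey-2020",
    "owid-covid-19-latest",
    "state-of-javascript-2016",
    "state-of-javascript-2017",
    "state-of-javascript-2018",
    "state-of-javascript-2019",
    "countries-languages-spoken",
]


def is_curated(dataset_id):
    """Return True if the ID appears in the hand-copied list above."""
    return dataset_id in CURATED_DATASET_IDS


def download_curated(dataset_ids):
    """Download each listed ID, rejecting any ID that is not in the list."""
    unknown = [d for d in dataset_ids if not is_curated(d)]
    if unknown:
        raise ValueError("Not in the curated list: " + ", ".join(unknown))
    # Imported lazily so the validation above works without the library installed.
    import opendatasets

    for dataset_id in dataset_ids:
        opendatasets.download(dataset_id)
```

For example, `download_curated(['state-of-javascript-2018', 'state-of-javascript-2019'])` would download the two survey datasets one after the other, while a misspelled ID fails early with a `ValueError` before anything is downloaded.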

More datasets will be added soon.

Contributing

This is an open source project and we welcome contributions.

Local Development Setup

  1. Clone the repository:
git clone https://github.com/JovianML/opendatasets.git
  2. Set up the Python environment for development:
conda create -n opendatasets python=3.5
conda activate opendatasets
pip install -r requirements.txt
  3. Open the project in VS Code and make your changes. Make sure to install the Python extension for VS Code and select the opendatasets conda environment.

This package is developed and maintained by the Jovian team.
