rmax / databrewer

Licence: MIT License
The missing datasets manager. Like Homebrew, but for datasets. A CLI tool to search and discover datasets!

Programming Languages

python
Makefile

Projects that are alternatives to or similar to databrewer

is-04
AMWA IS-04 NMOS Discovery and Registration Specification (Stable)
Stars: ✭ 35 (-10.26%)
Mutual labels:  discovery
awesome-sweden-datasets
A curated list of awesome datasets to use when coding for the Swedish market.
Stars: ✭ 17 (-56.41%)
Mutual labels:  datasets
SER-datasets
A collection of datasets for the purpose of emotion recognition/detection in speech.
Stars: ✭ 74 (+89.74%)
Mutual labels:  datasets
systematic-review-datasets
A collection of fully labeled systematic review datasets (title-abstract screening)
Stars: ✭ 25 (-35.9%)
Mutual labels:  datasets
Data-Science-and-Machine-Learning-Resources
List of Data Science and Machine Learning Resource that I frequently use
Stars: ✭ 19 (-51.28%)
Mutual labels:  datasets
bing-ip2hosts
bingip2hosts is a Bing.com web scraper that discovers websites by IP address
Stars: ✭ 99 (+153.85%)
Mutual labels:  discovery
allie
🤖 A machine learning framework for audio, text, image, video, or .CSV files (50+ featurizers and 15+ model trainers).
Stars: ✭ 93 (+138.46%)
Mutual labels:  datasets
ck-env
CK repository with components and automation actions to enable portable workflows across diverse platforms including Linux, Windows, MacOS and Android. It includes software detection plugins and meta packages (code, data sets, models, scripts, etc) with the possibility of multiple versions to co-exist in a user or system environment:
Stars: ✭ 67 (+71.79%)
Mutual labels:  datasets
text-classification-small-datasets
Building a text classifier with extremely small datasets
Stars: ✭ 34 (-12.82%)
Mutual labels:  datasets
bnk48 photo datasets
BNK48 Photo Datasets
Stars: ✭ 12 (-69.23%)
Mutual labels:  datasets
simpleRPC
Simple RPC implementation for Arduino.
Stars: ✭ 28 (-28.21%)
Mutual labels:  discovery
assistant-with-discovery-openwhisk
DEPRECATED: this repo is no longer actively maintained
Stars: ✭ 21 (-46.15%)
Mutual labels:  discovery
panoptic parts
This repository contains code and tools for reading, processing, evaluating on, and visualizing Panoptic Parts datasets. Moreover, it contains code for reproducing our CVPR 2021 paper results.
Stars: ✭ 82 (+110.26%)
Mutual labels:  datasets
extra keras datasets
📃🎉 Additional datasets for tensorflow.keras
Stars: ✭ 20 (-48.72%)
Mutual labels:  datasets
RData.jl
Read R data files from Julia
Stars: ✭ 49 (+25.64%)
Mutual labels:  datasets
MetaMorpheus
Proteomics search software with integrated calibration, PTM discovery, bottom-up, top-down and LFQ capabilities
Stars: ✭ 59 (+51.28%)
Mutual labels:  discovery
kaggle-code
A repository for some of the code I used in kaggle data science & machine learning tasks.
Stars: ✭ 100 (+156.41%)
Mutual labels:  datasets
asreview-visualization
Visualization extension for ASReview
Stars: ✭ 16 (-58.97%)
Mutual labels:  discovery
Happy
Happy 🥳 | Rocketseat 💜 - NLW 03 👩‍🚀
Stars: ✭ 61 (+56.41%)
Mutual labels:  discovery
the-weather-scraper
A Lightweight Weather Scraper
Stars: ✭ 56 (+43.59%)
Mutual labels:  datasets

DataBrewer

The missing datasets manager.

DataBrewer lets you search and discover datasets. Inspired by Homebrew, it maintains an index of known datasets that you can download with a single command. It will also provide an API so you can do the same from, for example, an IPython notebook, and no longer have to download datasets manually.

Quickstart

Install databrewer:

pip install databrewer

Update the recipes index:

databrewer update

Search for some keywords:

databrewer search nyc taxi

Example output:

andresmh-nyc-taxi-trips - NYC Taxi Trips. Data obtained through a FOIA request
nyc-tlc-taxi            - This dataset includes trip records from all trips
                          completed in yellow and green taxis in NYC in 2014 and
                          select months of 2015.
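To give an idea of how a keyword search over the index can work, here is a minimal Python sketch. It is not DataBrewer's implementation: the `Recipe` class and `search` function are hypothetical, and the index holds only the two entries from the example output above. A recipe matches when every keyword appears, case-insensitively, in its name or description.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Recipe:
    # Hypothetical minimal recipe: only the fields shown in the search output.
    name: str
    description: str


def search(index: List[Recipe], query: str) -> List[Recipe]:
    """Return recipes whose name or description contains every keyword (case-insensitive)."""
    keywords = query.lower().split()
    results = []
    for recipe in index:
        haystack = f"{recipe.name} {recipe.description}".lower()
        if all(word in haystack for word in keywords):
            results.append(recipe)
    # Sort by name so the output order is stable.
    return sorted(results, key=lambda r: r.name)


INDEX = [
    Recipe("andresmh-nyc-taxi-trips",
           "NYC Taxi Trips. Data obtained through a FOIA request"),
    Recipe("nyc-tlc-taxi",
           "This dataset includes trip records from all trips completed in yellow "
           "and green taxis in NYC in 2014 and select months of 2015."),
]

for recipe in search(INDEX, "nyc taxi"):
    print(f"{recipe.name} - {recipe.description}")
```

Requiring all keywords (rather than any) keeps multi-word queries such as `nyc taxi` narrow; a real index would likely add ranking on top.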

Let's check the nyc-tlc-taxi dataset:

databrewer info nyc-tlc-taxi

We can either download the entire dataset (which is huge!):

databrewer download nyc-tlc-taxi

Or download only a subset of its files:

databrewer download "nyc-tlc-taxi[green][2014-*]"

Note

* is the standard glob operator, and [green] acts as a selector. The available selectors depend on how the recipe is defined. When using selectors, you must enclose the name in quotes in most shells, because otherwise the shell may treat the brackets and * as its own glob pattern.
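The selector syntax can be illustrated with a small Python sketch. It assumes, hypothetically, that a recipe lays out its files as a nested mapping (here a made-up `FILES` table with green and yellow 2014 files) and that each bracketed selector is a glob matched with `fnmatch` against the keys at one level of that mapping. DataBrewer's real recipe format and matching rules may differ; `parse_spec` and `select` are illustrative names only.

```python
import fnmatch
import re
from typing import List, Tuple, Union

# Hypothetical file layout: a nested dict whose leaves are file names.
Tree = Union[str, dict]

# Assumed syntax: a name followed by zero or more "[selector]" groups.
_SPEC_RE = re.compile(r"^(?P<name>[^\[\]]+)(?P<selectors>(?:\[[^\[\]]*\])*)$")


def parse_spec(spec: str) -> Tuple[str, List[str]]:
    """Split 'name[sel1][sel2]' into ('name', ['sel1', 'sel2'])."""
    match = _SPEC_RE.match(spec)
    if match is None:
        raise ValueError(f"invalid dataset spec: {spec!r}")
    selectors = re.findall(r"\[([^\[\]]*)\]", match.group("selectors"))
    return match.group("name"), selectors


def select(tree: Tree, selectors: List[str]) -> List[str]:
    """Return file names below `tree` whose keys match the selectors level by level."""
    if isinstance(tree, str):
        # A leaf matches only if every selector has been consumed.
        return [] if selectors else [tree]
    if not selectors:
        # No selectors left: collect every file below this node.
        return [leaf for key in sorted(tree) for leaf in select(tree[key], [])]
    pattern, rest = selectors[0], selectors[1:]
    return [
        leaf
        for key in sorted(tree)
        if fnmatch.fnmatchcase(key, pattern)
        for leaf in select(tree[key], rest)
    ]


# Made-up example layout, loosely modelled on the file names shown below.
FILES = {
    "green": {f"2014-{m:02d}": f"green_tripdata_2014-{m:02d}.csv" for m in range(1, 13)},
    "yellow": {f"2014-{m:02d}": f"yellow_tripdata_2014-{m:02d}.csv" for m in range(1, 13)},
}

name, selectors = parse_spec("nyc-tlc-taxi[green][2014-*]")
for file_name in select(FILES, selectors):
    print(name, file_name)
```

In this model a selector without wildcards, such as `green`, simply matches one key exactly, which is why plain names and glob patterns can share the same bracket syntax.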

Finally, you need to know where the files are located for further processing:

databrewer download "nyc-tlc-taxi[green][2014-*]"

Example output:

/Users/rolando/.databrewer/datasets/nyc-tlc-taxi/green_tripdata_2014-01.csv
/Users/rolando/.databrewer/datasets/nyc-tlc-taxi/green_tripdata_2014-02.csv
/Users/rolando/.databrewer/datasets/nyc-tlc-taxi/green_tripdata_2014-03.csv
/Users/rolando/.databrewer/datasets/nyc-tlc-taxi/green_tripdata_2014-04.csv
/Users/rolando/.databrewer/datasets/nyc-tlc-taxi/green_tripdata_2014-05.csv
/Users/rolando/.databrewer/datasets/nyc-tlc-taxi/green_tripdata_2014-06.csv
/Users/rolando/.databrewer/datasets/nyc-tlc-taxi/green_tripdata_2014-07.csv
/Users/rolando/.databrewer/datasets/nyc-tlc-taxi/green_tripdata_2014-08.csv
/Users/rolando/.databrewer/datasets/nyc-tlc-taxi/green_tripdata_2014-09.csv
/Users/rolando/.databrewer/datasets/nyc-tlc-taxi/green_tripdata_2014-10.csv
/Users/rolando/.databrewer/datasets/nyc-tlc-taxi/green_tripdata_2014-11.csv
/Users/rolando/.databrewer/datasets/nyc-tlc-taxi/green_tripdata_2014-12.csv
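Until the Python API on the roadmap exists, a notebook can call the CLI and collect the printed paths. The sketch below assumes `databrewer` is on your `PATH` and that `databrewer download` prints one local path per line, as in the example output above; `dataset_paths` and `run_cli` are hypothetical helpers, not part of DataBrewer. Passing the arguments as a list bypasses the shell, so the selector string needs no extra quoting.

```python
import subprocess
from pathlib import Path
from typing import Callable, List, Sequence


def run_cli(args: Sequence[str]) -> str:
    """Run a command without a shell and return its standard output.

    Raises subprocess.CalledProcessError if the command exits non-zero.
    """
    result = subprocess.run(list(args), capture_output=True, text=True, check=True)
    return result.stdout


def dataset_paths(spec: str, runner: Callable[[Sequence[str]], str] = run_cli) -> List[Path]:
    """Hypothetical helper: run `databrewer download SPEC` and parse the printed paths.

    Assumes the CLI prints one file path per line; blank lines are skipped.
    The `runner` parameter exists so the parsing can be tested without the CLI.
    """
    output = runner(["databrewer", "download", spec])
    return [Path(line.strip()) for line in output.splitlines() if line.strip()]
```

For example, `dataset_paths("nyc-tlc-taxi[green][2014-*]")` would return a list of `Path` objects that you can hand to any CSV reader. If the CLI also prints progress messages to standard output, the parsing step would need to filter them out.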

Datasets

The aim is to index known and not-so-known datasets. There are no plans to standardize the dataset format, as we want to keep each dataset as published by its authors.

Recipes

Datasets are defined in recipes, which contain information about the dataset and where to find it.

These recipes are community maintained and hosted in the databrewer-recipes repository.

Roadmap

  • Include an API. For now DataBrewer only provides a CLI, but in the near future it will include an API so you can search, download and load datasets directly in your Python code.

Contributing

See CONTRIBUTING.rst for information on how you can help.