
mjhucla / P Multimodal Dataset Toolbox

Projects that are alternatives of or similar to P Multimodal Dataset Toolbox

E2e Ml App Pytorch
🚀 End-to-end ML applications using PyTorch, W&B, FastAPI, Docker, Streamlit and Heroku → https://e2e-ml-app-pytorch.herokuapp.com/ (may occasionally take a few minutes to spin up).
Stars: ✭ 68 (+0%)
Mutual labels:  jupyter-notebook
Hass Amazon Rekognition
Home Assistant Object detection with Amazon Rekognition
Stars: ✭ 68 (+0%)
Mutual labels:  jupyter-notebook
Equalareacartogram
Converts a Shapefile, GeoJSON, or CSV to an equal area cartogram
Stars: ✭ 68 (+0%)
Mutual labels:  jupyter-notebook
Puzzlemix
Official PyTorch implementation of "Puzzle Mix: Exploiting Saliency and Local Statistics for Optimal Mixup" (ICML'20)
Stars: ✭ 67 (-1.47%)
Mutual labels:  jupyter-notebook
Concrete Autoencoders
Stars: ✭ 68 (+0%)
Mutual labels:  jupyter-notebook
Pynq
Python Productivity for ZYNQ
Stars: ✭ 1,152 (+1594.12%)
Mutual labels:  jupyter-notebook
Lovaszsoftmax
Code for the Lovász-Softmax loss (CVPR 2018)
Stars: ✭ 1,148 (+1588.24%)
Mutual labels:  jupyter-notebook
Etl with python
ETL with Python - Taught at DWH course 2017 (TAU)
Stars: ✭ 68 (+0%)
Mutual labels:  jupyter-notebook
Equivariant Transformers
Equivariant Transformer (ET) layers are image-to-image mappings that incorporate prior knowledge on invariances with respect to continuous transformation groups (ICML 2019). Paper: https://arxiv.org/abs/1901.11399
Stars: ✭ 68 (+0%)
Mutual labels:  jupyter-notebook
Encode Attend Navigate
Learning Heuristics for the TSP by Policy Gradient
Stars: ✭ 68 (+0%)
Mutual labels:  jupyter-notebook
Deep Learning
Hands-on deep learning projects
Stars: ✭ 68 (+0%)
Mutual labels:  jupyter-notebook
Qpga
Simulations of photonic quantum programmable gate arrays
Stars: ✭ 68 (+0%)
Mutual labels:  jupyter-notebook
Backdrop
Implementation and demonstration of backdrop in pytorch. Code and demonstration of GP dataset generator.
Stars: ✭ 68 (+0%)
Mutual labels:  jupyter-notebook
Covid 19 Dataviz
Simple data visualization on Covid-19 data using Pandas and Google Colaboratory
Stars: ✭ 68 (+0%)
Mutual labels:  jupyter-notebook
Cifar10 mxnet
Code for the Kaggle CIFAR-10 competition, written with MXNet
Stars: ✭ 68 (+0%)
Mutual labels:  jupyter-notebook
Data bootcamp
Materials for a course at NYU Stern using Python to study economic and financial data.
Stars: ✭ 67 (-1.47%)
Mutual labels:  jupyter-notebook
Twitter sentiment analysis
A guide for binary class sentiment analysis of tweets.
Stars: ✭ 68 (+0%)
Mutual labels:  jupyter-notebook
Red bag
Alipay red envelopes / Taobao Meow Coins / Xueqiu red envelopes / Suning.com / JD.com / Taobao: automatic check-in to collect coins
Stars: ✭ 68 (+0%)
Mutual labels:  jupyter-notebook
Predictive Analytics With Tensorflow
Predictive Analytics with TensorFlow, published by Packt
Stars: ✭ 68 (+0%)
Mutual labels:  jupyter-notebook
Predictive Maintenance
Demonstration of MapR for Industrial IoT
Stars: ✭ 68 (+0%)
Mutual labels:  jupyter-notebook

Pinterest Multimodal Dataset ToolBox

Created by Junhua Mao

Introduction

This is a toolbox to download and manage the released part of the Pinterest40M multimodal dataset introduced in the paper Training and Evaluating Multimodal Word Embeddings with Large-scale Web Annotated Images. More information can be found on the [Project Page](http://www.stat.ucla.edu/~junhua.mao/multimodal_embedding.html).

Cite

If you find this dataset or toolbox useful in your research, please cite:

@inproceedings{mao2016training,
  title={Training and Evaluating Multimodal Word Embeddings with Large-scale Web Annotated Images},
  author={Mao, Junhua and Xu, Jiajing and Jing, Yushi and Yuille, Alan},
  booktitle={NIPS},
  year={2016}
}

Toolbox Installation and Data Downloading

Download and set up the meta files.

Suppose that the toolbox is installed in $PATH_PTool:

cd $PATH_PTool
bash download_meta.sh

Download images.

Download the images in parallel (12 workers by default); the script also resizes the downloaded images to 224x224:

cd $PATH_PTool
python download_images.py
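The README does not describe how download_images.py works internally, so the following is only a minimal standard-library sketch of the same idea, assuming the input is a list of `(image_id, url)` pairs and a hypothetical on-disk naming scheme `<image_id>.jpg`. A pool of 12 threads fetches the images, files already on disk are skipped (which is what lets a re-run resume), and the ids that failed are returned. `download_one` and `download_all` are hypothetical names, not the toolbox's API, and the 224x224 resize is omitted because it needs an imaging library such as Pillow.

```python
import os
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def download_one(image_id, url, out_dir, timeout=10):
    """Fetch one image. Returns image_id on failure, None on success or skip."""
    # Hypothetical naming scheme: one file per image id.
    path = os.path.join(out_dir, f"{image_id}.jpg")
    if os.path.exists(path):
        return None  # finished in an earlier run, so skip it (resume)
    tmp_path = path + ".part"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            data = resp.read()
        # Write to a temporary file and rename it, so an interrupted run never
        # leaves a truncated file that a later run would treat as finished.
        with open(tmp_path, "wb") as f:
            f.write(data)
        os.replace(tmp_path, path)
        return None
    except (OSError, ValueError):  # URLError/HTTPError/timeouts are OSErrors
        return image_id


def download_all(items, out_dir, num_workers=12):
    """Download (image_id, url) pairs in parallel; return the ids that failed."""
    os.makedirs(out_dir, exist_ok=True)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        results = pool.map(lambda item: download_one(item[0], item[1], out_dir), items)
        return [r for r in results if r is not None]
```

Threads rather than processes are used here because the work is dominated by network I/O, not CPU.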

There are ~5 million images in the dataset. The download process can take days.
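To see why "days" is realistic, the total time is simply the image count divided by the sustained throughput. The sketch below does that arithmetic; the 20 images/s figure is a made-up assumption for illustration, since actual throughput depends on bandwidth, worker count, and how quickly the image hosts respond.

```python
def estimate_download_days(num_images, images_per_second):
    """Wall-clock days needed to fetch num_images at a sustained throughput."""
    seconds = num_images / images_per_second
    return seconds / 86_400  # 86,400 seconds per day


# Hypothetical throughput of 20 images/s summed over all workers:
print(f"{estimate_download_days(5_000_000, 20):.1f} days")  # 250,000 s
```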

The script can resume an interrupted download at any time: if the download stops unexpectedly, just re-run download_images.py. Some URLs may also fail on the first attempt; re-running download_images.py tries them again.
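Re-running the script by hand until nothing is left to fetch can be expressed as a small loop. This is a hypothetical helper, not part of the toolbox; it assumes a `download_pass` callable that attempts a batch of items and returns the ones that failed, and it stops after a fixed number of passes so that permanently dead URLs do not cause an endless loop.

```python
def run_until_done(items, download_pass, max_passes=3):
    """Repeat download passes on whatever failed, at most max_passes times.

    download_pass(batch) must return the subset of batch that failed.
    Returns the items still failing after the last pass.
    """
    remaining = list(items)
    for _ in range(max_passes):
        if not remaining:
            break  # everything succeeded, no need for another pass
        remaining = list(download_pass(remaining))
    return remaining
```

For example, an item that fails once and then succeeds drops out on the second pass, while a URL that never responds is still reported after the third.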

For customized or advanced download settings, read download_images.py and py_utils.py (e.g. see the docstring of py_utils.PinDataset.download_images).

Demo

See demo.ipynb for an example of how to use this toolbox.

Recommended Dataset Split

Use pin_2016_v1_0000.npy through pin_2016_v1_0097.npy (98 files) as the training set.

Use pin_2016_v1_0098.npy as the validation set.

Use pin_2016_v1_0099.npy as the test set.
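The recommended split can be turned into file lists with a few lines of Python. This sketch assumes the meta files are exactly pin_2016_v1_0000.npy through pin_2016_v1_0099.npy, as the split above implies; `recommended_split` is a hypothetical helper, and it only builds file names because the internal format of the .npy files is not described here.

```python
def recommended_split(prefix="pin_2016_v1_"):
    """Map split name -> list of meta file names, per the recommended split."""
    # Assumes 100 meta files numbered 0000..0099 with zero-padded indices.
    names = [f"{prefix}{i:04d}.npy" for i in range(100)]
    return {
        "train": names[:98],  # pin_2016_v1_0000.npy ... pin_2016_v1_0097.npy
        "val": names[98:99],  # pin_2016_v1_0098.npy
        "test": names[99:],   # pin_2016_v1_0099.npy
    }
```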

License

The copyright of the annotations and the images belongs to the original source. The metadata may be used for research purposes only.

This toolbox is licensed under a Creative Commons Attribution 4.0 International License.
