facebookresearch / omnivore

Omnivore: A Single Model for Many Visual Modalities

Omnivorous modeling for visual modalities

This repository contains PyTorch pretrained models and inference examples for the following papers:

Omnivore: A single vision model for many different visual modalities, CVPR 2022. BibTeX:
@inproceedings{girdhar2022omnivore,
  title={{Omnivore: A Single Model for Many Visual Modalities}},
  author={Girdhar, Rohit and Singh, Mannat and Ravi, Nikhila and van der Maaten, Laurens and Joulin, Armand and Misra, Ishan},
  booktitle={CVPR},
  year={2022}
}
OmniMAE: Single Model Masked Pretraining on Images and Videos. BibTeX:
@article{girdhar2022omnimae,
  title={OmniMAE: Single Model Masked Pretraining on Images and Videos},
  author={Girdhar, Rohit and El-Nouby, Alaaeldin and Singh, Mannat and Alwala, Kalyan Vasudev and Joulin, Armand and Misra, Ishan},
  journal={arXiv preprint arXiv:2206.08356},
  year={2022}
}
OmniVision: Our training pipeline supporting multi-modal vision research.

Contributing

We welcome your pull requests! Please see CONTRIBUTING and CODE_OF_CONDUCT for more information.

License

Omnivore is released under the CC-BY-NC 4.0 license. See LICENSE for additional details. However, the Swin Transformer implementation is additionally licensed under the Apache 2.0 license (see NOTICE for additional details).
