Introduction
Hexia is a PyTorch-based framework for building visual question answering (VQA) models. It provides a mid-level API for integrating your VQA models with predefined data, image preprocessing, and natural language preprocessing pipelines.
Features
- Image preprocessing
- Text preprocessing
- Data Handling (MS-COCO Only)
- Real-time Loss and Accuracy Tracker
- VQA Evaluation
- Extendable Built-in Model Warehouse
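To make the "VQA Evaluation" item concrete, here is a minimal sketch of the accuracy metric commonly used for VQA benchmarks: a predicted answer scores min(n / 3, 1), where n is the number of human annotators who gave that answer. This is an illustration, not Hexia's own evaluation code. The function name `vqa_accuracy` and the simple lowercase/strip normalization are assumptions made for this example; official evaluation scripts apply more thorough answer normalization.

```python
from collections import Counter


def vqa_accuracy(predicted, human_answers):
    """Score one prediction as min(matching human answers / 3, 1).

    Hypothetical helper for illustration only; not part of Hexia's API.
    """
    # Normalize lightly so "Red " and "red" count as the same answer.
    counts = Counter(answer.strip().lower() for answer in human_answers)
    matches = counts[predicted.strip().lower()]
    return min(matches / 3.0, 1.0)


# Ten annotators: two answered "red", eight answered "dark red".
answers = ["red"] * 2 + ["dark red"] * 8
print(vqa_accuracy("red", answers))       # partial credit, 2 of 3 needed matches
print(vqa_accuracy("dark red", answers))  # full credit, capped at 1.0
```

Averaging this score over all questions in a split gives the overall accuracy figure.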
Installation
- Clone the repository and enter it:
git clone https://github.com/aligholami/hexia && cd hexia
- Run setup.py to install dependencies:
python3 setup.py install --user
Todo
- Official Evaluation Support (VQA-V2)
- Automatic Train/Val Plotting
- Automatic Checkpointing
- Automatic Resuming
- Prediction Module
- Prediction Module Test
- TensorboardX Auto-Resume Plots
- TensorboardX Auto-Resume Step Handler Fix
- TextVQA Support
- GQA Support
- Image Captioning Support
- Custom Loss and Optimizers
Documentation
Check out the full documentation here.
References
1- Yang, Z., He, X., Gao, J., Deng, L., & Smola, A. (2016). Stacked attention networks for image question answering. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 21-29).
2- Singh, A., Natarajan, V., Jiang, Y., Chen, X., Shah, M., Rohrbach, M., ... & Parikh, D. (2019). Pythia-a platform for vision & language research. In SysML Workshop, NeurIPS (Vol. 2018).
More references to be added soon.
Contribution
Contributions are welcome. You may send a pull request or drop me an email to talk more. ([email protected])