# Hateful Memes Challenge - Team HateDetectron Submissions
This repository contains all the code used by team HateDetectron at the Hateful Memes Challenge by Facebook AI. There are two main Jupyter notebooks where all the work is done and documented:
The first notebook is only for reproducing the results of the Phase 2 submissions by team HateDetectron; in other words, it just loads the final models and generates predictions for the test set. See the end-to-end notebook for the whole approach in detail: how the models are trained, how the image features are extracted, which datasets are used, etc.
## About the Competition
The Hateful Memes Challenge and Data Set is a competition and open source data set designed to measure progress in multimodal vision-and-language classification.
Check out the following sources to learn more about the challenge:
## Competition Results
We placed 3rd out of 3,173 participants in total! See the official leaderboard here!
## Repository structure
The repository consists of the following folders:
- `hyperparameter_sweep/`: scripts for the hyperparameter search.
  - `get_27_models.py`: iterates through the folders created during the hyperparameter search, collects the metrics (ROC-AUC, accuracy) on the 'dev_unseen' set, and stores them in a `pd.DataFrame`. It then sorts the models by AUROC and moves the best 27 models into a generated folder `majority_voting_models/`.
  - `remove_unused_file.py`: removes unused files, e.g. old checkpoints, to free disk space.
  - `sweep.py`: defines the hyperparameters and starts the process by calling `sweep.sh`.
  - `sweep.sh`: the mmf CLI command that runs training with the defined dataset, parameters, etc.
- `notebooks/`: where the Jupyter notebooks are stored.
  - `end2end_process.ipynb`: presents the whole approach end-to-end: expanding the data, image feature extraction, hyperparameter search, fine-tuning, and majority voting.
  - `reproduce_submissions.ipynb`: loads our fine-tuned (final) models and generates predictions.
  - `label_memotion.ipynb`: uses `utils/label_memotion.py` to label memes from Memotion and save them in an appropriate form.
  - `simple_model.ipynb`: a simple multimodal model implementation, also known as 'mid-level concat fusion'. We train the model and generate a submission for the challenge test set.
  - `benchmarks.ipynb`: reproduces the benchmark results.
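As a rough illustration of the majority-voting step mentioned above (a minimal sketch, not the repository's actual implementation), binary predictions from several models can be combined like this:

```python
def majority_vote(predictions):
    """Combine binary labels from several models by majority vote.

    `predictions` is a list with one entry per model; each entry is an
    equal-length list of 0 (not hateful) / 1 (hateful) labels.
    With an odd number of models (e.g. 27) no ties can occur.
    """
    n_models = len(predictions)
    combined = []
    for votes in zip(*predictions):  # votes for one sample, across models
        combined.append(1 if sum(votes) > n_models / 2 else 0)
    return combined
```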
- `utils/`: helper scripts, e.g. for labeling the Memotion dataset and merging the two datasets.
  - `concat_memotion-hm.py`: concatenates the labeled Memotion samples and the Hateful Memes samples and saves them in a new `train.jsonl` file.
  - `generate_submission.sh`: generates predictions for the 'test_unseen' set (Phase 2 test set).
  - `label_memotion.jsonl`: the memes from the Memotion dataset labeled by us.
  - `label_memotion.py`: the script for labeling the Memotion dataset. It iterates over the samples in Memotion, and the labeler assigns a label by pressing 1 or 0 on the keyboard. The labels and sample metadata are saved at the end as `label_memotion.jsonl`.
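The merging step performed by `concat_memotion-hm.py` can be sketched as a plain .jsonl concatenation (one JSON object per line). The function name and file paths below are illustrative assumptions:

```python
import json


def concat_jsonl(paths, out_path):
    """Merge several .jsonl files into one new file, e.g. a combined
    train.jsonl from the labeled Memotion and Hateful Memes samples.

    Each input line is parsed and re-serialized, so malformed lines
    fail loudly instead of silently corrupting the output.
    """
    with open(out_path, "w") as out:
        for path in paths:
            with open(path) as f:
                for line in f:
                    line = line.strip()
                    if line:  # skip blank lines
                        out.write(json.dumps(json.loads(line)) + "\n")
```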