
davide-coccomini / Combining-EfficientNet-and-Vision-Transformers-for-Video-Deepfake-Detection

Licence: other
Code for the video deepfake detection model from "Combining EfficientNet and Vision Transformers for Video Deepfake Detection", available on arXiv and submitted to ICIAP 2021.

Programming Languages

Jupyter Notebook
11667 projects
python
139335 projects - #7 most used programming language
shell
77523 projects

Projects that are alternatives of or similar to Combining-EfficientNet-and-Vision-Transformers-for-Video-Deepfake-Detection

Deep-Fakes
No description or website provided.
Stars: ✭ 88 (+125.64%)
Mutual labels:  deepfakes, deepfake-detection, deepfake-videos, deepfakes-classification
FakeFynder-Hackathon-for-Good-2019
This repository contains our POC for a website which can easily check videos for manipulated areas. It was part of the Hackathon for Good in the Hague, 2019.
Stars: ✭ 23 (-41.03%)
Mutual labels:  deepfakes, faceforensics, deepfake-detection
Awesome-Deepfakes-Detection
A list of tools, papers and code related to Deepfake Detection.
Stars: ✭ 30 (-23.08%)
Mutual labels:  deepfakes, deepfake-detection
awesome-Deepfakes
All about Deepfakes & Detection
Stars: ✭ 107 (+174.36%)
Mutual labels:  deepfakes, deepfake-detection
InterpretDL
InterpretDL: Interpretation of Deep Learning Models, a model interpretability algorithm library based on PaddlePaddle.
Stars: ✭ 121 (+210.26%)
Mutual labels:  vision-transformer
TransMorph Transformer for Medical Image Registration
TransMorph: Transformer for Unsupervised Medical Image Registration (PyTorch)
Stars: ✭ 130 (+233.33%)
Mutual labels:  vision-transformer
ViT-V-Net for 3D Image Registration Pytorch
Vision Transformer for 3D medical image registration (Pytorch).
Stars: ✭ 169 (+333.33%)
Mutual labels:  vision-transformer
Transformers-Tutorials
This repository contains demos I made with the Transformers library by HuggingFace.
Stars: ✭ 2,828 (+7151.28%)
Mutual labels:  vision-transformer
MXNet-EfficientNet
A Gluon Implement of EfficientNet
Stars: ✭ 12 (-69.23%)
Mutual labels:  efficientnet
koclip
KoCLIP: Korean port of OpenAI CLIP, in Flax
Stars: ✭ 80 (+105.13%)
Mutual labels:  vision-transformer
detectron2 backbone
detectron2 backbone: resnet18, efficientnet, hrnet, mobilenet v2, resnest, bifpn
Stars: ✭ 171 (+338.46%)
Mutual labels:  efficientnet
YOLOS
You Only Look at One Sequence (NeurIPS 2021)
Stars: ✭ 612 (+1469.23%)
Mutual labels:  vision-transformer
visualization
a collection of visualization function
Stars: ✭ 189 (+384.62%)
Mutual labels:  vision-transformer
Ensemble-of-Multi-Scale-CNN-for-Dermatoscopy-Classification
Fully supervised binary classification of skin lesions from dermatoscopic images using an ensemble of diverse CNN architectures (EfficientNet-B6, Inception-V3, SEResNeXt-101, SENet-154, DenseNet-169) with multi-scale input.
Stars: ✭ 25 (-35.9%)
Mutual labels:  efficientnet
video-download-cut-split
A script for gathering facesets from online videos
Stars: ✭ 25 (-35.9%)
Mutual labels:  deepfakes
efficientnet-jax
EfficientNet, MobileNetV3, MobileNetV2, MixNet, etc in JAX w/ Flax Linen and Objax
Stars: ✭ 114 (+192.31%)
Mutual labels:  efficientnet
image-classification
A collection of SOTA Image Classification Models in PyTorch
Stars: ✭ 70 (+79.49%)
Mutual labels:  vision-transformer
LaTeX-OCR
pix2tex: Using a ViT to convert images of equations into LaTeX code.
Stars: ✭ 1,566 (+3915.38%)
Mutual labels:  vision-transformer
towhee
Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.
Stars: ✭ 821 (+2005.13%)
Mutual labels:  vision-transformer
TensorMONK
A collection of deep learning models (PyTorch implementation)
Stars: ✭ 21 (-46.15%)
Mutual labels:  efficientnet

Combining EfficientNet and Vision Transformers for Video Deepfake Detection

Code for the video deepfake detection model from "Combining EfficientNet and Vision Transformers for Video Deepfake Detection", available on arXiv and submitted to ICIAP 2021 [Pre-print PDF]. Using this repository it is possible to train and test the two main architectures presented in the paper, the Efficient Vision Transformer and the Cross Efficient Vision Transformer, for video deepfake detection. Internally, the architectures rely on the EfficientNet-Pytorch and ViT-Pytorch repositories.

Setup

Clone the repository and move into it:

git clone https://github.com/davide-coccomini/Combining-EfficientNet-and-Vision-Transformers-for-Video-Deepfake-Detection.git

cd Combining-EfficientNet-and-Vision-Transformers-for-Video-Deepfake-Detection

Setup Python environment using conda:

conda env create --file environment.yml
conda activate deepfakes
export PYTHONPATH=.
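
After activating the environment, you can optionally verify that PyTorch sees your GPU. This is a quick sanity check, assuming the environment installs PyTorch with CUDA support as required by the training and test scripts:

python3 -c "import torch; print(torch.__version__, torch.cuda.is_available())"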

Get the data

Download and extract the dataset you want to use from:

  • DFDC
  • FaceForensics++

Preprocess the data

The preprocessing phase is based on Selim Seferbekov's implementation.

In order to perform deepfake detection it is necessary to first identify and extract faces from all the videos in the dataset. Detect the faces inside the videos:

cd preprocessing
python3 detect_faces.py --data_path "path/to/videos"

By default the dataset structure is assumed to be that of DFDC, but you can customize it with the following parameter (see the example below):

  • --dataset: Dataset (DFDC / FACEFORENSICS)
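
For example, to run the detection on a FaceForensics++ style dataset (the path is a placeholder):

python3 detect_faces.py --data_path "path/to/videos" --dataset FACEFORENSICS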

The extracted boxes will be saved inside the "path/to/videos/boxes" folder. To get the best possible results, make sure that at least one face is identified in each video. If not, you can reduce the MTCNN threshold values on line 38 of face_detector.py and run the command again until at least one detection occurs. At the end of the execution, an error message will appear if the detector was unable to find faces inside some videos.
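
As a reference, lowering the thresholds could look like the sketch below. This is a hypothetical excerpt, not the exact contents of face_detector.py; it assumes the detector is facenet-pytorch's MTCNN, whose three thresholds correspond to the P-Net, R-Net and O-Net stages of the cascade (lower values make the detector more permissive):

# Hypothetical replacement for the MTCNN instantiation in face_detector.py
from facenet_pytorch import MTCNN

detector = MTCNN(margin=0, thresholds=[0.6, 0.7, 0.7], device="cuda:0")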

If you want to manually check that at least one face has been identified in each video, make sure that the number of files in the "boxes" folder is equal to the number of videos. To count the files in the folder use:

cd path/to/videos/boxes
ls | wc -l
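
For example, assuming the videos are .mp4 files stored directly in the dataset folder (paths are placeholders), you can compare the two counts:

ls path/to/videos/*.mp4 | wc -l
ls path/to/videos/boxes | wc -l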

Extract the detected faces obtaining the images:

python3 extract_crops.py --data_path "path/to/videos" --output_path "path/to/output"

By default the dataset structure is assumed to be that of DFDC, but you can customize it with the following parameter (see the example below):

  • --dataset: Dataset (DFDC / FACEFORENSICS)
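
For example, for a FaceForensics++ style dataset (paths are placeholders):

python3 extract_crops.py --data_path "path/to/videos" --output_path "path/to/output" --dataset FACEFORENSICS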

Repeat detection and extraction for all the different parts of your dataset.

After extracting all the faces from the videos in your dataset, organise the "dataset" folder as follows:

- dataset
    - training_set
        - Deepfakes
            - video_name_0
                0_0.png
                1_0.png
                2_0.png
                ...
                N_0.png
            ...
            - video_name_K
                0_0.png
                1_0.png
                2_0.png
                ...
                M_0.png
        - DFDC
        - Face2Face
        - FaceShifter
        - FaceSwap
        - NeuralTextures
        - Original
    - validation_set
        ...
            ...
                ...
                ...
    - test_set
        ...
            ...
                ...
                ...

We suggest using the --output_path parameter when running extract_crops.py to build the folder structure directly, as in the example below.
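
For example, one possible invocation writes the crops of the DFDC training videos directly into the expected subfolder (paths are placeholders; adapt them to the split and manipulation method you are processing):

python3 extract_crops.py --data_path "path/to/DFDC/training/videos" --dataset DFDC --output_path "path/to/dataset/training_set/DFDC"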

Evaluate

Move into the folder of the architecture you want to evaluate and download the pre-trained model:

(Efficient ViT)

cd efficient-vit
wget http://datino.isti.cnr.it/efficientvit_deepfake/efficient_vit.pth

(Cross Efficient ViT)

cd cross-efficient-vit
wget http://datino.isti.cnr.it/efficientvit_deepfake/cross_efficient_vit.pth

If you are unable to use the previous URLs, you can download the weights from Google Drive.

Then, run the following command to evaluate a given model, passing the path to the pre-trained weights and the configuration file available in the configs directory:

python3 test.py --model_path "pretrained_models/[model]" --config "configs/architecture.yaml"

By default the command will test on the DFDC dataset, but you can customize the following parameters for both architectures (see the example after the list):

  • --dataset: Which dataset to use (Deepfakes|Face2Face|FaceShifter|FaceSwap|NeuralTextures|DFDC)
  • --max_videos: Maximum number of videos to use for evaluation (default: all)
  • --workers: Number of data loader workers (default: 10)
  • --frames_per_video: Number of equidistant frames for each video (default: 30)
  • --batch_size: Prediction Batch Size (default: 32)
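
For example, to evaluate the Cross Efficient ViT weights on the Face2Face subset using 20 frames per video and a smaller batch size (adjust --model_path to wherever you saved the weights):

cd cross-efficient-vit
python3 test.py --model_path "cross_efficient_vit.pth" --config "configs/architecture.yaml" --dataset Face2Face --frames_per_video 20 --batch_size 16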

To evaluate a customized model trained from scratch with a different architecture, you need to edit the configs/architecture.yaml file.

Train

For the DFDC dataset only, prepare the metadata by moving all the metadata files (by default inside the dfdc_train_part_X folders) into a dedicated subfolder:

mkdir data/metadata
cd path/to/videos/training_set
mv **/metadata.json ../../../data/metadata

To train the models using our architecture configurations, run:

(Efficient ViT)

cd efficient-vit
python3 train.py --config configs/architecture.yaml

(Cross Efficient ViT)

cd cross-efficient-vit
python3 train.py --config configs/architecture.yaml

By default the commands will train on the DFDC dataset, but you can customize the following parameters for both architectures:

  • --num_epochs: Number of training epochs (default: 300)
  • --workers: Number of data loader workers (default: 10)
  • --resume: Path to latest checkpoint (default: none)
  • --dataset: Which dataset to use (Deepfakes|Face2Face|FaceShifter|FaceSwap|NeuralTextures|All) (default: All)
  • --max_videos: Maximum number of videos to use for training (default: all)
  • --patience: How many epochs to wait before early stopping when the validation loss does not improve (default: 5)

For the Efficient ViT model only, it is also possible to customize the patch extractor and use a different EfficientNet version (only B0 and B7) by adding the following parameter (see the example after the list):

  • --efficient_net: Which EfficientNet version to use (0 or 7, default: 0)
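
Putting the parameters together, a possible training run on the FaceSwap subset with the EfficientNet-B0 patch extractor and a longer patience is:

cd efficient-vit
python3 train.py --config configs/architecture.yaml --dataset FaceSwap --num_epochs 100 --patience 10 --efficient_net 0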

Reference

@misc{coccomini2021combining,
      title={Combining EfficientNet and Vision Transformers for Video Deepfake Detection}, 
      author={Davide Coccomini and Nicola Messina and Claudio Gennaro and Fabrizio Falchi},
      year={2021},
      eprint={2107.02612},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}