
Chinmay26 / Near-Duplicate-Video-Detection

Licence: other
Detecting near-duplicate videos by aggregating features from intermediate CNN layers

Programming Languages

Jupyter Notebook
11667 projects

Projects that are alternatives of or similar to Near-Duplicate-Video-Detection

videohash
Near Duplicate Video Detection (Perceptual Video Hashing) - Get a 64-bit comparable hash-value for any video.
Stars: ✭ 155 (+72.22%)
Mutual labels:  duplicate-detection, duplicate-videos
python cv AI ML
Computer vision, artificial intelligence, machine learning, deep learning, and more with Python
Stars: ✭ 73 (-18.89%)
Mutual labels:  alexnet
vgg-caffe
The VGG16 and VGG19 networks in caffe with jupyter notebook
Stars: ✭ 45 (-50%)
Mutual labels:  vgg16
bugrepo
A collection of publicly available bug reports
Stars: ✭ 93 (+3.33%)
Mutual labels:  duplicate-detection
DeepNetModel
Notes on the characteristics of commonly used deep network architectures (diagrams and code)
Stars: ✭ 25 (-72.22%)
Mutual labels:  alexnet
SuperResolution-with-Perceptual-loss
In this project I have used a convolutional neural network with perceptual loss to convert low-resolution images into high-resolution images.
Stars: ✭ 17 (-81.11%)
Mutual labels:  vgg16
image space
Interactive Image similarity and Visual Search and Retrieval application
Stars: ✭ 91 (+1.11%)
Mutual labels:  alexnet
AU Recognition
AU_Recognition based on CKPlus/CK database
Stars: ✭ 21 (-76.67%)
Mutual labels:  alexnet
twly
Wanna get DRY? Static analysis tool for detecting repeat code.
Stars: ✭ 42 (-53.33%)
Mutual labels:  duplicate-detection
tensorflow alexnet classify
tensorflow_alexnet_classify (details http://www.cnblogs.com/vipyoumay/p/7686230.html)
Stars: ✭ 102 (+13.33%)
Mutual labels:  alexnet
visual-search
A toy project for visual search, based on deep learning.
Stars: ✭ 45 (-50%)
Mutual labels:  alexnet
deep-dream-pytorch
Pytorch implementation of DeepDream on VGG16 Network
Stars: ✭ 46 (-48.89%)
Mutual labels:  vgg16
miopen-benchmark
benchmarking miopen
Stars: ✭ 17 (-81.11%)
Mutual labels:  alexnet
Skin Lesions Classification DCNNs
Transfer Learning with DCNNs (DenseNet, Inception V3, Inception-ResNet V2, VGG16) for skin lesions classification
Stars: ✭ 47 (-47.78%)
Mutual labels:  vgg16
Identifying-Clothing-Attributes
Pretrained VGG-16 network as feature extractor for Object Recognition (Python, Keras, Scikit-Learn)
Stars: ✭ 22 (-75.56%)
Mutual labels:  vgg16
vgg16 batchnorm
VGG16 architecture with BatchNorm
Stars: ✭ 14 (-84.44%)
Mutual labels:  vgg16
Artistic-Style-Transfer-using-Keras-Tensorflow
Art to Image Style Transfer using Keras and Tensorflow.
Stars: ✭ 22 (-75.56%)
Mutual labels:  vgg16
yadf
Yet Another Dupes Finder
Stars: ✭ 32 (-64.44%)
Mutual labels:  duplicate-detection
alexnet-architecture.tensorflow
Unofficial TensorFlow implementation of "AlexNet" architecture.
Stars: ✭ 15 (-83.33%)
Mutual labels:  alexnet
snowman
Welcome to Snowman App – a Data Matching Benchmark Platform.
Stars: ✭ 25 (-72.22%)
Mutual labels:  duplicate-detection

Near-Duplicate-Video-Detection (NDVD)

With the explosion of social networks, video content has grown exponentially over the last few years. YouTube reports 500 hours of video content being uploaded every minute, and a Cisco forecast estimated that video would constitute 80% of internet traffic by 2019. According to a recent study, 31.7% of the videos on YouTube are duplicates, and these duplicates occupy 24% of storage space. It has become essential to build robust systems to detect and purge them.

Near-duplicate videos are a broader class of problem that subsumes exact duplicates: videos that share the same visual content but differ in format (scale, transformation, encoding, etc.) or carry small content modifications (color, lighting, small text superimposition, etc.).

Problem Statement

Given a corpus of videos, identify whether a query video is a near duplicate of an existing video in the corpus.

Dataset

The standard benchmark for NDVD tasks is the CC_WEB_VIDEO dataset. It contains 13,129 videos plus 24 query videos, and it already provides 398,008 keyframes extracted from the videos with a shot-boundary detection method.

Approach

The problem is tackled with a bag-of-visual-words model for each video, built in the following phases:

  • Feature Extraction:

    (figure: feature extraction pipeline)

Convolutional Neural Networks are used to extract features from the video keyframes. Pretrained CNNs have been shown to work well on many vision tasks such as classification and segmentation. Here, pretrained AlexNet weights are used: each video keyframe is forward-passed through AlexNet, and max pooling is applied to each intermediate feature map to extract a single value per channel. Each frame is thereby represented by a 1376-dimensional feature vector, and the video-level representation is obtained by concatenating the individual keyframe feature vectors.
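As a rough illustration of this step, the sketch below max-pools the output of every convolutional layer of a pretrained AlexNet into one value per channel and concatenates the results. It uses torchvision's AlexNet for convenience, whose layer widths (64 + 192 + 384 + 256 + 256) give a 1152-D descriptor; the 1376 dimensions quoted above correspond to the original AlexNet layer widths (96 + 256 + 384 + 384 + 256). The helper names and frame-loading details are illustrative, not the repository's actual code.

```python
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# Pretrained AlexNet; only the convolutional part is needed.
alexnet = models.alexnet(weights=models.AlexNet_Weights.DEFAULT).eval()

# Capture the output of every conv layer with forward hooks.
conv_outputs = []
for layer in alexnet.features:
    if isinstance(layer, torch.nn.Conv2d):
        layer.register_forward_hook(lambda m, i, o: conv_outputs.append(o))

preprocess = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def frame_descriptor(image_path):
    """Max-pool each intermediate feature map to a single value per channel
    and concatenate across layers into one frame-level vector."""
    conv_outputs.clear()
    img = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        alexnet.features(img)          # hooks fill conv_outputs
    pooled = [fm.amax(dim=(2, 3)).squeeze(0) for fm in conv_outputs]
    return torch.cat(pooled)           # 1152-D with torchvision's AlexNet

def video_descriptor(keyframe_paths):
    """One row per keyframe; rows are later quantized against the codebook."""
    return torch.stack([frame_descriptor(p) for p in keyframe_paths])
```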

  • Visual Codebook Generation:

    (figure: k-means clustering)

A visual codebook is generated from the feature vectors above. Online mini-batch k-means is used to build the codebook clusters, trained on a random sample of 100K frames. K = 1000 gave the best results in experiments.
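A minimal sketch of the codebook step, assuming the sampled keyframe descriptors are already stacked in a NumPy array; scikit-learn's MiniBatchKMeans stands in for whatever online k-means implementation the notebooks use, with the sample size and K quoted above.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

def build_codebook(frame_features, k=1000, batch_size=1024, seed=0):
    """Fit an online mini-batch k-means codebook of k visual words.

    frame_features: (num_sampled_frames, D) array, e.g. ~100K rows of
    keyframe descriptors from the feature-extraction step.
    """
    kmeans = MiniBatchKMeans(n_clusters=k, batch_size=batch_size,
                             random_state=seed)
    kmeans.fit(frame_features)
    return kmeans  # kmeans.cluster_centers_ holds the K visual words
```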

  • Video level histogram:

    (figure: video-level histogram)

For each keyframe in a video, the nearest cluster is identified to generate a keyframe-level histogram. A video-level histogram is then generated by summing over the individual keyframe histograms.
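For instance, with the fitted codebook from the previous step, each keyframe descriptor is assigned to its nearest visual word and the per-keyframe assignments are accumulated into a single K-bin video histogram (function names are illustrative).

```python
import numpy as np

def video_histogram(keyframe_features, kmeans, k=1000):
    """Sum the nearest-cluster (one-hot) histograms of all keyframes."""
    labels = kmeans.predict(keyframe_features)   # nearest visual word per frame
    return np.bincount(labels, minlength=k).astype(np.float64)
```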

  • Similarity:

    (figure: cosine similarity)

Video similarity is inferred by computing the cosine similarity between the two videos' tf-idf (term frequency / inverse document frequency) weighted histograms. To reduce the number of comparisons, an inverted index is constructed so that only videos sharing at least one visual word with the query are compared.
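A sketch of this matching step under the same assumptions: corpus histograms are stacked row-wise, tf-idf weighted, and an inverted index restricts cosine-similarity scoring to videos that share at least one visual word with the query.

```python
import numpy as np
from collections import defaultdict

def tfidf(histograms):
    """tf-idf weight a (num_videos, K) matrix of video histograms."""
    tf = histograms / np.maximum(histograms.sum(axis=1, keepdims=True), 1)
    df = np.count_nonzero(histograms, axis=0)        # videos containing each word
    idf = np.log(histograms.shape[0] / np.maximum(df, 1))
    return tf * idf, idf

def inverted_index(histograms):
    """Map each visual word to the set of video ids that contain it."""
    index = defaultdict(set)
    for vid, hist in enumerate(histograms):
        for word in np.flatnonzero(hist):
            index[word].add(vid)
    return index

def rank(query_hist, weighted, index, idf):
    """Cosine similarity against candidates sharing a visual word with the query."""
    q = (query_hist / max(query_hist.sum(), 1)) * idf
    candidates = set().union(*(index[w] for w in np.flatnonzero(query_hist)))
    scores = {v: float(q @ weighted[v] /
                       (np.linalg.norm(q) * np.linalg.norm(weighted[v]) + 1e-12))
              for v in candidates}
    return sorted(scores.items(), key=lambda kv: -kv[1])
```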

Results

Using the above approach on the 24 query videos, the mAP (mean average precision) score was 0.951; state-of-the-art techniques achieve a mAP of 0.98. Note, however, that this metric is not as well defined as in classification or segmentation tasks: deciding which videos are visually similar, and how to rank them, is subject to human judgment.
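For reference, mean average precision over the query set can be computed as below (a minimal sketch; `rankings` and `ground_truth` are hypothetical mappings from query id to the ranked video ids and to the set of relevant video ids, respectively).

```python
def average_precision(ranked_ids, relevant):
    """AP of one ranked result list against a set of relevant video ids."""
    hits, precision_sum = 0, 0.0
    for i, vid in enumerate(ranked_ids, start=1):
        if vid in relevant:
            hits += 1
            precision_sum += hits / i
    return precision_sum / max(len(relevant), 1)

def mean_average_precision(rankings, ground_truth):
    """mAP over all queries, e.g. the 24 CC_WEB_VIDEO query videos."""
    aps = [average_precision(rankings[q], ground_truth[q]) for q in rankings]
    return sum(aps) / len(aps)
```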

Future Work

  1. Scalability: How can this be extended to handle the keyframes of 1M+ videos? K-means clustering suffers from the curse of dimensionality.
  2. Robustness: Run a detailed analysis across types of near-duplicate videos (scale changes, small frame changes, etc.) to report robustness against different transformation types.
  3. Metric learning: Use metric learning to learn the similarity metric between near-duplicate videos.
  4. Near-duplicate video retrieval: Retrieve a ranked list of near-duplicate videos in order of similarity. This can be extended to retrieving visually similar videos.

References:

  1. Near-duplicate Video Detection using CNN intermediate features
  2. NDVD using metric learning
  3. TF-IDF tutorial