
HuaizhengZhang / Awsome Deep Learning For Video Analysis

License: MIT
Papers, code and datasets about deep learning and multi-modal learning for video analysis



Awesome Deep Learning for Video Analysis

This repo collects research on deep learning for video analysis, especially multimodal learning. I summarize and categorize the papers myself. Pull requests are welcome!

The focus is on multimodal-learning-related work; areas such as action recognition are not the main scope of this repo.

Contents

Video

Tutorials

  • Audio-visual paper list [GitHub]
  • CVPR 2019: Multi-Modal Learning from Videos [Project Page]
  • awesome-multimodal-ml: Reading list for research topics in multimodal machine learning [GitHub]
  • A Comprehensive Study of Deep Video Action Recognition [Paper]

Datasets

A very interesting website: a sortable and searchable compilation of video datasets [Video Dataset Overview]

  • AVA dataset: AVA is a project that provides audiovisual annotations of video for improving our understanding of human activity. [Project]
  • PyVideoResearch: A repository of common methods, datasets, and tasks for video research [GitHub]
  • How2 Dataset: How2: A Large-scale Dataset for Multimodal Language Understanding [Paper] [GitHub]
  • Moments in Time Dataset: A large-scale dataset for recognizing and understanding action in videos [Dataset] [Pretrained Model]
  • Pretrained image and video models for Pytorch [GitHub]
  • Youtube-8M, new segment task! [Blog]

Tools

  • X-Temporal is an open source video understanding codebase from Sensetime X-Lab group that provides state-of-the-art video classification models [GitHub]
  • facebookresearch/ClassyVision: An end-to-end PyTorch framework for image and video classification [GitHub]
  • MediaPipe is a cross-platform framework for building multimodal applied machine learning pipelines [GitHub]
  • A collection of utilities for Detection and Classification of Acoustic Scenes and Events (DCASE) [GitHub]
  • Easy-to-use video deep-feature extractor [GitHub]
  • Video Platform for Action Recognition and Object Detection in Pytorch [GitHub]
  • FAIR Self-Supervised Learning Integrated Multi-modal Environment (SSLIME) [GitHub]

Papers

Video Classification (Spatiotemporal Features)

  • Learnable pooling with Context Gating for video classification [Paper] [GitHub]
  • TSM: Temporal Shift Module for Efficient Video Understanding [Paper] [GitHub]
  • Long-Term Feature Banks for Detailed Video Understanding (CVPR2019) [Paper][GitHub]
  • Deep Learning for Video Classification and Captioning [Paper]
  • Large-scale Video Classification with Convolutional Neural Networks [Paper]
  • Learning Spatiotemporal Features with 3D Convolutional Networks [Paper]
  • Two-Stream Convolutional Networks for Action Recognition in Videos [Paper]
  • Action Recognition with Trajectory-Pooled Deep-Convolutional Descriptors [Paper]
  • Non-local neural networks [Paper] [GitHub]
    • Wang, Xiaolong, Ross Girshick, Abhinav Gupta, and Kaiming He. (CVPR 2018)
    • Summary:
  • Learning Correspondence from the Cycle-consistency of Time [Paper] [GitHub]
    • Xiaolong Wang and Allan Jabri and Alexei A. Efros (CVPR2019)
    • Summary:
  • 3D ConvNets in Pytorch [GitHub]
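Several of the papers above rest on simple temporal operations. For instance, the zero-padded channel shift at the heart of TSM can be sketched in a few lines of NumPy (a minimal single-clip illustration of the idea described in the paper, not the authors' implementation):

```python
import numpy as np

def temporal_shift(x, fold_div=8):
    """Shift a fraction of channels along the temporal axis (zero-padded).

    x: array of shape (T, C, H, W) -- one video clip.
    The first C//fold_div channels are shifted one step backward in time,
    the next C//fold_div one step forward; the rest stay in place.
    """
    t, c, h, w = x.shape
    fold = c // fold_div
    out = np.zeros_like(x)
    out[:-1, :fold] = x[1:, :fold]                   # pull features from the next frame
    out[1:, fold:2 * fold] = x[:-1, fold:2 * fold]   # pull features from the previous frame
    out[:, 2 * fold:] = x[:, 2 * fold:]              # untouched channels
    return out
```

Inserted between 2D convolutions, this shift lets a 2D backbone exchange information across frames at zero FLOP cost, which is the efficiency argument made in the TSM paper.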

Multimodal Learning for Video Analysis

  • Awesome list for multimodal learning [GitHub]
  • VideoBERT: A Joint Model for Video and Language Representation Learning [Paper]
  • AENet: Learning Deep Audio Features for Video Analysis [Paper] [GitHub]
  • Look, Listen and Learn [Paper]
  • Objects that Sound [Paper]
  • Learning to Separate Object Sounds by Watching Unlabeled Video [Paper]
    • Gao, Ruohan, Rogerio Feris, and Kristen Grauman. arXiv preprint arXiv:1804.01665, 2018
  • Ambient Sound Provides Supervision for Visual Learning [Paper]
    • Owens, Andrew, Jiajun Wu, Josh H. McDermott, William T. Freeman, and Antonio Torralba. ECCV 2016
    • Summary: unsupervised learning
  • Learning Cross-Modal Temporal Representations from Unlabeled Videos [Google Blog]
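Many of the audio-visual papers above share one pattern: encode each modality separately, then fuse the embeddings before the downstream task. A minimal gated late-fusion sketch, in the spirit of the Context Gating idea from the classification section (the specific layer shapes and the plain sigmoid gate are illustrative assumptions, not any paper's exact architecture):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fuse_audio_visual(v_feat, a_feat, W, b):
    """Gated late fusion of visual and audio embeddings.

    v_feat, a_feat: 1-D feature vectors from modality-specific encoders
    (e.g. a frame CNN and a spectrogram network).
    W, b: gating-layer parameters (learned in practice; passed in here).
    Returns the concatenated features re-weighted by a sigmoid gate, so
    the model can suppress uninformative dimensions of either modality.
    """
    x = np.concatenate([v_feat, a_feat])
    gate = sigmoid(W @ x + b)   # one gate value per fused dimension
    return gate * x
```

A downstream classifier (or a joint embedding for retrieval) would then consume the gated vector.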

Video Moment Localization

Video Retrieval

  • Use What You Have: Video retrieval using representations from collaborative experts [GitHub]
  • HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips [Project Website]
    • Miech, Antoine, et al. (arXiv:1906.03327 (2019))
  • Learning a Text-Video Embedding from Incomplete and Heterogeneous Data [Paper] [GitHub]
    • Miech, Antoine, Ivan Laptev, and Josef Sivic. ECCV 2018
    • Summary: combines multi-modal information, computing a similarity per modality and learning weights to fuse the similarities
  • Cross-Modal and Hierarchical Modeling of Video and Text [Paper]
    • B. Zhang*, H. Hu*, F. Sha. ECCV 2018
    • Summary: learns the intrinsic hierarchical structures of both videos and texts (pulling matching video-text pairs, similar videos, and similar texts closer in the embedding space)
  • A dataset for movie description. [Paper]
    • Rohrbach, Anna, Marcus Rohrbach, Niket Tandon, and Bernt Schiele. CVPR 2015
    • Summary: dataset paper
  • Web-scale Multimedia Search for Internet Video Content. [Thesis]
    • Lu Jiang
    • Summary: amazing thesis
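The retrieval recipe summarized above for Miech et al. (compute a similarity per modality, then weight the similarities) can be sketched as follows. The softmax weighting over experts is an illustrative assumption; in the actual model the weights are predicted from the text:

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def weighted_video_text_similarity(text_emb, video_expert_embs, expert_logits):
    """Combine per-modality similarities into one retrieval score.

    text_emb: text embedding vector.
    video_expert_embs: list of video embeddings, one per modality
    (e.g. appearance, motion, audio).
    expert_logits: unnormalized expert weights; here passed in directly
    for illustration, rather than predicted from the text.
    """
    weights = np.exp(expert_logits) / np.sum(np.exp(expert_logits))
    sims = np.array([cosine(text_emb, e) for e in video_expert_embs])
    return float(weights @ sims)
```

Ranking all videos by this score against a text query gives the basic text-to-video retrieval setup used by the papers in this section.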

Video Advertisement (also includes some image advertisement papers)

  • Automatic understanding of image and video advertisements [Paper] [Project]
    • Hussain, Zaeem, Mingda Zhang, Xiaozhong Zhang, Keren Ye, Christopher Thomas, Zuha Agha, Nathan Ong, and Adriana Kovashka. CVPR 2017
    • Summary: Image and video advertisement datasets and baselines.
  • Multimodal Representation of Advertisements Using Segment-level Autoencoders [Paper] [GitHub]
    • Somandepalli, Krishna, Victor Martinez, Naveen Kumar, and Shrikanth Narayanan. ICMI 2018
    • Summary: uses video and audio features to predict whether an ad is funny.
  • Story Understanding in Video Advertisements. [Paper] [GitHub]
    • Keren Ye, Kyle Buettner, Adriana Kovashka BMVC 2018
    • Summary: combines multiple features, including climax and audio, to analyze video ads.
  • ADVISE: Symbolism and External Knowledge for Decoding Advertisements. [Paper] [GitHub]
    • Keren Ye and Adriana Kovashka. (ECCV2018)
    • Summary: predicts action-reason statements for advertisements, using several pre-trained models (SSD, DenseCap, GloVe) as prior knowledge.

Visual Commonsense Reasoning

  • From Recognition to Cognition: Visual Commonsense Reasoning [Paper] [Project Website]
    • Rowan Zellers, Yonatan Bisk, Ali Farhadi, Yejin Choi (CVPR2019)
    • Summary: the first dataset paper for this task; uses BERT and Fast R-CNN as baselines

Video Highlight Prediction

  • Video highlight prediction using audience chat reactions
    • Fu, Cheng-Yang, Joon Lee, Mohit Bansal, and Alexander C. Berg. (EMNLP 2017)

Object Tracking

  • SenseTime's research platform for single object tracking research, implementing algorithms like SiamRPN and SiamMask. [GitHub]

Audio-Visual Dialog

  • Audio-Visual Scene-Aware Dialog [GitHub]
    • Alamri, Huda, Vincent Cartillier, Abhishek Das, Jue Wang, Stefan Lee, Peter Anderson, Irfan Essa et al.
    • arXiv preprint arXiv:1901.09107 (2019)