
HuaizhengZhang / Awsome Deep Learning For Video Analysis

License: MIT
Papers, code and datasets about deep learning and multi-modal learning for video analysis



Awesome Deep Learning for Video Analysis

This repo collects research on deep learning for video analysis, especially multimodal learning. I summarize and categorize the papers myself. Pull requests are welcome!

The focus is on multimodal-learning-related work; areas such as action recognition are not the main scope of this repo.

Contents

Video

Tutorials

  • Audio-visual paper list [GitHub]
  • CVPR 2019: Multi-Modal Learning from Videos [Project Page]
  • awesome-multimodal-ml: Reading list for research topics in multimodal machine learning [GitHub]
  • A Comprehensive Study of Deep Video Action Recognition [Paper]

Datasets

A very interesting website: a sortable and searchable compilation of video datasets [Video Dataset Overview]

  • AVA dataset: AVA is a project that provides audiovisual annotations of video for improving our understanding of human activity. [Project]
  • PyVideoResearch: A repository of common methods, datasets, and tasks for video research [GitHub]
  • How2 Dataset: How2: A Large-scale Dataset for Multimodal Language Understanding [Paper] [GitHub]
  • Moments in Time Dataset: A large-scale dataset for recognizing and understanding action in videos [Dataset] [Pretrained Model]
  • Pretrained image and video models for Pytorch [GitHub]
  • Youtube-8M, new segment task! [Blog]

Tools

  • X-Temporal is an open source video understanding codebase from Sensetime X-Lab group that provides state-of-the-art video classification models [GitHub]
  • facebookresearch/ClassyVision: An end-to-end PyTorch framework for image and video classification [GitHub]
  • MediaPipe is a cross-platform framework for building multimodal applied machine learning pipelines [GitHub]
  • A collection of utilities for Detection and Classification of Acoustic Scenes and Events (DCASE) [GitHub]
  • Easy-to-use video deep-feature extractor [GitHub]
  • Video Platform for Action Recognition and Object Detection in Pytorch [GitHub]
  • FAIR Self-Supervised Learning Integrated Multi-modal Environment (SSLIME) [GitHub]

Papers

Video Classification (Spatiotemporal Features)

  • Learnable pooling with Context Gating for video classification [Paper] [GitHub]
  • TSM: Temporal Shift Module for Efficient Video Understanding [Paper] [GitHub]
  • Long-Term Feature Banks for Detailed Video Understanding (CVPR2019) [Paper][GitHub]
  • Deep Learning for Video Classification and Captioning [Paper]
  • Large-scale Video Classification with Convolutional Neural Networks [Paper]
  • Learning Spatiotemporal Features with 3D Convolutional Networks [Paper]
  • Two-Stream Convolutional Networks for Action Recognition in Videos [Paper]
  • Action Recognition with Trajectory-Pooled Deep-Convolutional Descriptors [Paper]
  • Non-local neural networks [Paper] [GitHub]
    • Wang, Xiaolong, Ross Girshick, Abhinav Gupta, and Kaiming He. (CVPR 2018)
    • Summary:
  • Learning Correspondence from the Cycle-consistency of Time [Paper] [GitHub]
    • Xiaolong Wang and Allan Jabri and Alexei A. Efros (CVPR2019)
    • Summary:
  • 3D ConvNets in Pytorch [GitHub]
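Several of the papers above rest on simple temporal operations. For instance, the zero-padded channel shift at the heart of TSM can be sketched in a few lines of NumPy (a minimal single-clip illustration of the idea described in the paper, not the authors' implementation):

```python
import numpy as np

def temporal_shift(x, fold_div=8):
    """Shift a fraction of channels along the temporal axis (zero-padded).

    x: array of shape (T, C, H, W) -- one video clip.
    The first C//fold_div channels are shifted one step backward in time,
    the next C//fold_div one step forward; the rest stay in place.
    """
    t, c, h, w = x.shape
    fold = c // fold_div
    out = np.zeros_like(x)
    out[:-1, :fold] = x[1:, :fold]                   # pull features from the next frame
    out[1:, fold:2 * fold] = x[:-1, fold:2 * fold]   # pull features from the previous frame
    out[:, 2 * fold:] = x[:, 2 * fold:]              # untouched channels
    return out
```

Inserted between 2D convolutions, this shift lets a 2D backbone exchange information across frames at zero FLOP cost, which is the efficiency argument made in the TSM paper.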

Multimodal Learning for Video Analysis

  • Awesome list for multimodal learning [GitHub]
  • VideoBERT: A Joint Model for Video and Language Representation Learning [Paper]
  • AENet: Learning Deep Audio Features for Video Analysis [Paper] [GitHub]
  • Look, Listen and Learn [Paper]
  • Objects that Sound [Paper]
  • Learning to Separate Object Sounds by Watching Unlabeled Video [Paper]
    • Gao, Ruohan, Rogerio Feris, and Kristen Grauman. arXiv preprint arXiv:1804.01665, 2018
  • Ambient Sound Provides Supervision for Visual Learning [Paper]
    • Owens, Andrew, Jiajun Wu, Josh H. McDermott, William T. Freeman, and Antonio Torralba. ECCV 2016
    • Summary: unsupervised learning
  • Learning Cross-Modal Temporal Representations from Unlabeled Videos [Google Blog]
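Many of the audio-visual papers above share one pattern: encode each modality separately, then fuse the embeddings before the downstream task. A minimal gated late-fusion sketch, in the spirit of the Context Gating idea from the classification section (the specific layer shapes and the plain sigmoid gate are illustrative assumptions, not any paper's exact architecture):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fuse_audio_visual(v_feat, a_feat, W, b):
    """Gated late fusion of visual and audio embeddings.

    v_feat, a_feat: 1-D feature vectors from modality-specific encoders
    (e.g. a frame CNN and a spectrogram network).
    W, b: gating-layer parameters (learned in practice; passed in here).
    Returns the concatenated features re-weighted by a sigmoid gate, so
    the model can suppress uninformative dimensions of either modality.
    """
    x = np.concatenate([v_feat, a_feat])
    gate = sigmoid(W @ x + b)   # one gate value per fused dimension
    return gate * x
```

A downstream classifier (or a joint embedding for retrieval) would then consume the gated vector.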

Video Moment Localization

Video Retrieval

  • Use What You Have: Video retrieval using representations from collaborative experts [GitHub]
  • HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips [Project Website]
    • Miech, Antoine, et al. (arXiv:1906.03327 (2019))
  • Learning a Text-Video Embedding from Incomplete and Heterogeneous Data [Paper] [GitHub]
    • Miech, Antoine, Ivan Laptev, and Josef Sivic. ECCV 2018
    • Summary: combines multi-modal information, computing a similarity per modality and learning weights to fuse the similarities
  • Cross-Modal and Hierarchical Modeling of Video and Text [Paper]
    • B. Zhang*, H. Hu*, F. Sha. ECCV 2018
    • Summary: learns the intrinsic hierarchical structures of both videos and texts (pulling matching video-text pairs, similar videos, and similar texts closer in the embedding space)
  • A dataset for movie description. [Paper]
    • Rohrbach, Anna, Marcus Rohrbach, Niket Tandon, and Bernt Schiele. CVPR 2015
    • Summary: dataset paper
  • Web-scale Multimedia Search for Internet Video Content. [Thesis]
    • Lu Jiang
    • Summary: amazing thesis
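The retrieval recipe summarized above for Miech et al. (compute a similarity per modality, then weight the similarities) can be sketched as follows. The softmax weighting over experts is an illustrative assumption; in the actual model the weights are predicted from the text:

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def weighted_video_text_similarity(text_emb, video_expert_embs, expert_logits):
    """Combine per-modality similarities into one retrieval score.

    text_emb: text embedding vector.
    video_expert_embs: list of video embeddings, one per modality
    (e.g. appearance, motion, audio).
    expert_logits: unnormalized expert weights; here passed in directly
    for illustration, rather than predicted from the text.
    """
    weights = np.exp(expert_logits) / np.sum(np.exp(expert_logits))
    sims = np.array([cosine(text_emb, e) for e in video_expert_embs])
    return float(weights @ sims)
```

Ranking all videos by this score against a text query gives the basic text-to-video retrieval setup used by the papers in this section.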

Video Advertisement (also includes some image advertisement papers)

  • Automatic understanding of image and video advertisements [Paper] [Project]
    • Hussain, Zaeem, Mingda Zhang, Xiaozhong Zhang, Keren Ye, Christopher Thomas, Zuha Agha, Nathan Ong, and Adriana Kovashka. CVPR 2017
    • Summary: Image and video advertisement datasets and baselines.
  • Multimodal Representation of Advertisements Using Segment-level Autoencoders [Paper] [GitHub]
    • Somandepalli, Krishna, Victor Martinez, Naveen Kumar, and Shrikanth Narayanan. ICMI 2018
    • Summary: uses video and audio features to predict whether an ad is funny.
  • Story Understanding in Video Advertisements. [Paper] [GitHub]
    • Keren Ye, Kyle Buettner, Adriana Kovashka BMVC 2018
    • Summary: combines multiple features, including climax and audio, to analyze video ads.
  • ADVISE: Symbolism and External Knowledge for Decoding Advertisements. [Paper] [GitHub]
    • Keren Ye and Adriana Kovashka. (ECCV2018)
    • Summary: predicts action-reason statements for advertisements, using several pre-trained models (SSD, DenseCap, GloVe) as prior knowledge.

Visual Commonsense Reasoning

  • From Recognition to Cognition: Visual Commonsense Reasoning [Paper] [Project Website]
    • Rowan Zellers, Yonatan Bisk, Ali Farhadi, Yejin Choi (CVPR2019)
    • Summary: the first dataset paper for this task; uses BERT and Fast R-CNN as baselines

Video Highlight Prediction

  • Video highlight prediction using audience chat reactions
    • Fu, Cheng-Yang, Joon Lee, Mohit Bansal, and Alexander C. Berg. (EMNLP 2017)

Object Tracking

  • SenseTime's research platform for single object tracking research, implementing algorithms like SiamRPN and SiamMask. [GitHub]

Audio-Visual Dialog

  • Audio-Visual Scene-Aware Dialog [GitHub]
    • Alamri, Huda, Vincent Cartillier, Abhishek Das, Jue Wang, Stefan Lee, Peter Anderson, Irfan Essa et al.
    • arXiv preprint arXiv:1901.09107 (2019)