MichiganNLP / vlog_action_recognition

Licence: MIT License
Identifying Visible Actions in Lifestyle Vlogs

Programming Languages

python, shell

Projects that are alternatives to or similar to vlog_action_recognition

tennis action recognition
Using deep learning to perform action recognition in the sport of tennis.
Stars: ✭ 17 (+30.77%)
Mutual labels:  video-processing, action-recognition
Lintel
A Python module to decode video frames directly, using the FFmpeg C API.
Stars: ✭ 240 (+1746.15%)
Mutual labels:  video-processing, action-recognition
Awesome Action Recognition
A curated list of action recognition and related area resources
Stars: ✭ 3,202 (+24530.77%)
Mutual labels:  video-processing, action-recognition
conv3d-video-action-recognition
My experimentation around action recognition in videos. Contains Keras implementation for C3D network based on original paper "Learning Spatiotemporal Features with 3D Convolutional Networks", Tran et al. and it includes video processing pipelines coded using mPyPl package. Model is being benchmarked on popular UCF101 dataset and achieves result…
Stars: ✭ 50 (+284.62%)
Mutual labels:  video-processing, action-recognition
Actionvlad
ActionVLAD for video action classification (CVPR 2017)
Stars: ✭ 217 (+1569.23%)
Mutual labels:  video-processing, action-recognition
MTL-AQA
What and How Well You Performed? A Multitask Learning Approach to Action Quality Assessment [CVPR 2019]
Stars: ✭ 38 (+192.31%)
Mutual labels:  video-processing, action-recognition
Image Processing
Image Processing techniques using OpenCV and Python.
Stars: ✭ 112 (+761.54%)
Mutual labels:  video-processing
FunCrop
Video Split Effect: lots of transition effects for videos.
Stars: ✭ 42 (+223.08%)
Mutual labels:  video-processing
Dataset-REPAIR
REPresentAtion bIas Removal (REPAIR) of datasets
Stars: ✭ 49 (+276.92%)
Mutual labels:  action-recognition
FFCreatorLite
A lightweight and fast short-video processing library based on Node.js
Stars: ✭ 155 (+1092.31%)
Mutual labels:  video-processing
kinect-gesture
Kinect-based human action recognition
Stars: ✭ 129 (+892.31%)
Mutual labels:  action-recognition
DLCV2018SPRING
Deep Learning for Computer Vision (CommE 5052) in NTU
Stars: ✭ 38 (+192.31%)
Mutual labels:  action-recognition
laav
Asynchronous Audio / Video Library for H264 / MJPEG / OPUS / AAC / MP2 encoding, transcoding, recording and streaming from live sources
Stars: ✭ 50 (+284.62%)
Mutual labels:  video-processing
two-stream-fusion-for-action-recognition-in-videos
No description or website provided.
Stars: ✭ 80 (+515.38%)
Mutual labels:  action-recognition
video-summarizer
Summarizes videos into much shorter videos. Ideal for long lecture videos.
Stars: ✭ 92 (+607.69%)
Mutual labels:  video-processing
Acid.Cam.v2.OSX
Acid Cam v2 for macOS distorts video to create art.
Stars: ✭ 91 (+600%)
Mutual labels:  video-processing
pi-asciicam
A live stream ASCII webcam server for Raspberry Pis using websockets, written in go.
Stars: ✭ 18 (+38.46%)
Mutual labels:  video-processing
ShaderView
ShaderView is an Android View that makes it easy to use GLSL shaders for your app. It's the modern way to use shaders for Android instead of RenderScript.
Stars: ✭ 53 (+307.69%)
Mutual labels:  video-processing
cpnet
Learning Video Representations from Correspondence Proposals (CVPR 2019 Oral)
Stars: ✭ 93 (+615.38%)
Mutual labels:  action-recognition
ntu-x
NTU-X, an extended version of the popular NTU dataset
Stars: ✭ 55 (+323.08%)
Mutual labels:  action-recognition

Identifying Visible Actions in Lifestyle Vlogs

This repository contains the dataset and code for our ACL 2019 paper:

Identifying Visible Actions in Lifestyle Vlogs

Task Description

Example instance

Given a video and its transcript, which human actions are visible in the video?

Miniclips

We provide a Google Drive folder with the raw miniclips.

A miniclip is a short video clip (maximum 1 min) extracted from a YouTube video. We segment the videos into miniclips in order to ease the annotation process. For more details on how the segmentation is performed, see section 3.1 in our paper.

Videos

The YouTube URLs of the videos from which the miniclips were created can be found here, together with the video transcripts and titles.

Important: You can automatically download the videos and transcripts yourself and perform the text and movement filtering by running the code described in the Youtube processing section below. If you still need the raw videos, e-mail me.

Data Format

The annotations of the miniclips are available at data/miniclip_action.json. The JSON file contains a dictionary: the keys are the miniclip names (e.g. "4p1_3mini_5.mp4") and the values are lists of (action, label) pairs.

The miniclip name is formed by concatenating its YouTube channel, playlist, video and miniclip index. For miniclip "4p1_3mini_5.mp4":

  • 4 = channel index
  • p1 = playlist index (0 or 1) in the channel
  • 3 = video index in the playlist
  • mini_5 = miniclip index in the video
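
As an illustration, the sketch below parses such a name into its four indices; the helper and the regular expression are ours, not code from the repository.

import re

def parse_miniclip_name(name):
    """Split a miniclip name like "4p1_3mini_5.mp4" into
    (channel, playlist, video, miniclip) index strings."""
    match = re.match(r"(\d+)p(\d+)_(\d+)mini_(\d+)\.mp4$", name)
    if match is None:
        raise ValueError(f"Unexpected miniclip name: {name}")
    return match.groups()

print(parse_miniclip_name("4p1_3mini_5.mp4"))  # ('4', '1', '3', '5')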

For each miniclip, we store the extracted actions and their corresponding labels:

  • 0 for visible
  • 1 for not visible.

The visible actions were manually cleaned by removing extra words such as usually, now, always, I, you, then, etc. Example format in JSON:

{
  "4p1_3mini_5.mp4": [
    ["smelled it", 1],
    ["used this in my last pudding video", 1],
    ["make it smell nice", 0],
    ["loving them", 1],
    ["using them about a week", 1],
    ["using my favorite cleaner which", 0],
    ["work really really well", 1],
    ["wipe down my counters", 0],
    ["wiping down my barstools", 0],
    ["using the car shammies", 0]
  ]
}
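
A minimal sketch for loading this file and counting visible vs. not-visible actions; only the file path and label convention come from the repository, the loop itself is ours.

import json
from collections import Counter

with open("data/miniclip_action.json") as f:
    annotations = json.load(f)  # {miniclip name: [(action, label), ...]}

label_counts = Counter()
for miniclip, action_pairs in annotations.items():
    for action, label in action_pairs:
        label_counts["visible" if label == 0 else "not visible"] += 1

print(label_counts)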

Citation

Please cite the following paper if you find this dataset useful in your research:

@inproceedings{ignat-etal-2019-identifying,
    title = "Identifying Visible Actions in Lifestyle Vlogs",
    author = "Ignat, Oana  and
      Burdick, Laura  and
      Deng, Jia  and
      Mihalcea, Rada",
    booktitle = "Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics",
    month = jul,
    year = "2019",
    address = "Florence, Italy",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/P19-1643",
    doi = "10.18653/v1/P19-1643",
    pages = "6406--6417",
}

Run the code

Some small parts of the code are still under revision; if you run into problems, please open an issue here or e-mail me and I will help.

Installation

Run the setup script below to download Stanford-postagger-full-2018-10-16 and install all the required libraries. You need Python 3 (tested with Python 3.6.7); the code does not work with Python 2. Comment out the CPU version of TensorFlow in requirement.txt if you use tensorflow-gpu instead.

sh setup.sh

Data Requirements

Download glove_vectors.txt (POS embeddings pre-trained on the Google N-gram corpus using POS information from 5-grams).
Download the glove.6B.50d.txt embeddings. Put both files in the data folder.
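
If you want to inspect the embeddings, the standard GloVe text format (one word followed by its vector per line) can be loaded as below; this loader is a sketch of ours, not code from the repository.

import numpy as np

def load_glove(path="data/glove.6B.50d.txt"):
    """Load GloVe vectors into a {word: np.ndarray} dictionary."""
    embeddings = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            embeddings[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return embeddings

glove = load_glove()
print(glove["kitchen"].shape)  # (50,)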

Usage

There are 3 main modules: Youtube processing, AMT processing, and Classification. The first two modules are still under revision. The third module can be used without the first two, as all the required data is accessible from the data folder or Google Drive.

Youtube processing

Requires a YouTube downloader API key. Given the channel IDs (currently 10) and the playlist IDs for each channel (2 playlists per channel), it downloads all the videos from each playlist. The code can be found in youtube_preprocessing.

python main_youtube.py
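
For orientation only, here is a sketch of how playlist videos can be enumerated with the official google-api-python-client; the API key and playlist ID are placeholders, and the repository's youtube_preprocessing code may use a different client or workflow.

from googleapiclient.discovery import build

API_KEY = "YOUR_YOUTUBE_DATA_API_KEY"  # placeholder
PLAYLIST_ID = "PLxxxxxxxxxxxxxxxx"     # placeholder playlist ID

youtube = build("youtube", "v3", developerKey=API_KEY)

video_ids, page_token = [], None
while True:
    response = youtube.playlistItems().list(
        part="contentDetails",
        playlistId=PLAYLIST_ID,
        maxResults=50,
        pageToken=page_token,
    ).execute()
    video_ids += [item["contentDetails"]["videoId"] for item in response["items"]]
    page_token = response.get("nextPageToken")
    if not page_token:
        break

print(f"{len(video_ids)} videos in playlist")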

Amazon Mechanical Turk (AMT) processing

Does all the processing related to AMT (reading the annotation data, removing spam, computing agreement). The code is in amt.

python main_amt.py
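
As an illustration of the agreement step, the sketch below computes Cohen's kappa for one pair of annotators with scikit-learn; the labels are made up, and the amt module may use a different agreement measure.

from sklearn.metrics import cohen_kappa_score

# Hypothetical labels from two AMT workers for the same list of actions
# (0 = visible, 1 = not visible).
worker_a = [0, 1, 1, 0, 0, 1, 0]
worker_b = [0, 1, 0, 0, 0, 1, 1]

kappa = cohen_kappa_score(worker_a, worker_b)
print(f"Cohen's kappa: {kappa:.2f}")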

Classification

Everything related to classification models, embeddings and features can be found in classify.

Models

The available models are: svm, lstm, elmo, and multimodal (video features + ELMo embeddings).

To run one of the models, for example lstm:

python main_classify.py --do-classify lstm

Extra data

The extra data consists of context and POS embeddings, as well as concreteness scores for each word in the actions.

You can find the context information for each action in data/dict_context.json: each action is mapped to the sentence it was extracted from. The sentences are extracted from the YouTube transcripts using the Stanford Parser.

You can find both the POS and context embeddings in data/Embeddings. They are computed by averaging the GloVe 50d embeddings of the surrounding words (5 to the left and 5 to the right of the action). For future work, we want to use ELMo embeddings.
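
A hypothetical reconstruction of that averaging, assuming the glove dictionary from the loading sketch above; the exact windowing used to build data/Embeddings may differ.

import numpy as np

def context_embedding(sentence_tokens, action_span, glove, window=5, dim=50):
    """Average the GloVe vectors of up to `window` words on each side
    of the action span (start, end) inside its sentence."""
    start, end = action_span
    context = sentence_tokens[max(0, start - window):start] + sentence_tokens[end:end + window]
    vectors = [glove[w] for w in context if w in glove]
    return np.mean(vectors, axis=0) if vectors else np.zeros(dim)

# Hypothetical usage:
# tokens = "i am wiping down my barstools with the new cleaner".split()
# print(context_embedding(tokens, (2, 6), glove).shape)  # (50,)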

The concreteness dataset from Brysbaert et al. can be found in the data folder. The data extracted from that file (just the unigrams and their concreteness scores) is in data/dict_all_concreteness.json.

The concreteness and POS of all the words in the actions are stored in data/dict_action_pos_concreteness.json.
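
For example, assuming data/dict_all_concreteness.json maps each unigram to its concreteness score, a per-action score could be computed as in the sketch below; the helper is ours.

import json

with open("data/dict_all_concreteness.json") as f:
    concreteness = json.load(f)  # assumed {word: score} mapping

def action_concreteness(action):
    """Average concreteness of the action's words (hypothetical helper)."""
    scores = [concreteness[w] for w in action.lower().split() if w in concreteness]
    return sum(scores) / len(scores) if scores else 0.0

print(action_concreteness("wipe down my counters"))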

To add these extra features to your model, for example to run svm with context and POS embeddings:

python main_classify.py --do-classify svm --add-extra context pos

To run multimodal with concreteness:

python main_classify.py --do-classify multimodal --add-extra concreteness

Video Features

The video features are Inception, C3D, and their concatenation; they can be found in data/Video/Features. By default, the multimodal model is run with the concatenated features (inception + c3d).

To run multimodal with inception:

python main_classify.py --do-classify multimodal --type-feat inception
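
The concatenation amounts to joining the two feature vectors per miniclip; the sketch below illustrates this with hypothetical .npy files, since the exact layout of data/Video/Features is not documented here.

import numpy as np

# Hypothetical per-miniclip feature files; the real layout may differ.
inception = np.load("data/Video/Features/inception/4p1_3mini_5.npy")
c3d = np.load("data/Video/Features/c3d/4p1_3mini_5.npy")

# "inception + c3d" refers to concatenating the two vectors.
combined = np.concatenate([inception, c3d])
print(combined.shape)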

YOLO output

After running the YOLOv3 object detector on all the miniclips, the results are stored here. Copy them into data/Video/YOLO/miniclips_results.

Useful

For all of this data, there is also code available to generate your own versions.

Look at the parse_args method in main_classify.py for the rest of the model and data combinations.
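
For orientation, here is a minimal sketch of an argument parser covering the flags used in the commands above; the actual parse_args in main_classify.py may define more options and different defaults.

import argparse

def parse_args():
    parser = argparse.ArgumentParser(description="Visible action classification")
    parser.add_argument("--do-classify", required=True,
                        choices=["svm", "lstm", "elmo", "multimodal"],
                        help="which model to train and evaluate")
    parser.add_argument("--add-extra", nargs="*", default=[],
                        choices=["context", "pos", "concreteness"],
                        help="extra features to add to the model")
    parser.add_argument("--type-feat", default=None,
                        help="video features for the multimodal model (e.g. inception, c3d)")
    return parser.parse_args()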
