sign-language-gesture-recognition-from-video-sequences

SIGN LANGUAGE GESTURE RECOGNITION FROM VIDEO SEQUENCES USING RNN AND CNN

The paper on this work is published here.

Please do cite it if you find this project useful. :)

UPDATE:

  • Cleaner and more understandable code.
  • Replaced all manual editing with command line arguments.
  • Fixed bugs caused by changes in the names of operations in the Inception model.
  • Code tested on a dummy dataset of three classes on Google Colab.

Dataset Used

  • Argentinian Sign Language Gestures. The dataset is made available strictly for academic purposes by the owners. Please read the license terms carefully and cite their paper if you plan to use the dataset.

Requirements

  • Install OpenCV. Note: pip install opencv-python does not include video capabilities, so I recommend building OpenCV from source.
  • Install tensorflow:
    pip install tensorflow
    
  • Install tflearn
    pip install tflearn
    
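As a quick sanity check after installation, the snippet below (a minimal sketch; the video path is a placeholder you should point at any real video file) confirms that all three packages import and that OpenCV was built with video support:

    import cv2
    import tensorflow as tf
    import tflearn

    print("OpenCV:", cv2.__version__)
    print("TensorFlow:", tf.__version__)
    print("TFLearn:", tflearn.__version__)

    # An OpenCV build without video support cannot open video files at all.
    cap = cv2.VideoCapture("path/to/any/video.mp4")  # placeholder path
    print("Video backend working:", cap.isOpened())
    cap.release()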

Training and Testing

1. Data Folder

Create two folders, with any names, say train_videos and test_videos, in the project root directory. Each should contain one folder per gesture category, and each category folder should contain the corresponding videos.

For example:

train_videos
├── Accept
│   ├── 050_003_001.mp4
│   ├── 050_003_002.mp4
│   ├── 050_003_003.mp4
│   └── 050_003_004.mp4
├── Appear
│   ├── 053_003_001.mp4
│   ├── 053_003_002.mp4
│   ├── 053_003_003.mp4
│   └── 053_003_004.mp4
├── Argentina
│   ├── 024_003_001.mp4
│   ├── 024_003_002.mp4
│   ├── 024_003_003.mp4
│   └── 024_003_004.mp4
└── Away
    ├── 013_003_001.mp4
    ├── 013_003_002.mp4
    ├── 013_003_003.mp4
    └── 013_003_004.mp4

2. Extracting frames

Command

  • usage:

    video-to-frame.py [-h] gesture_folder target_folder
    

    Extract frames from gesture videos.

  • positional arguments:

    gesture_folder:  Path to folder containing folders of videos of different
                      gestures.
    target_folder:   Path to folder where extracted frames should be kept.
    
  • optional arguments:

    -h, --help      show this help message and exit
    

The code performs some hand segmentation on each frame (based on the data we used). You can remove that code if you are working with a different dataset.
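For orientation, frame extraction with OpenCV typically looks like the sketch below. This is not the exact code of video-to-frame.py, and the dataset-specific hand segmentation is only indicated by a comment:

    import os
    import cv2

    def extract_frames(video_path, out_dir):
        """Dump every frame of one video as a JPEG into out_dir."""
        os.makedirs(out_dir, exist_ok=True)
        cap = cv2.VideoCapture(video_path)
        count = 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            # The project's hand segmentation (specific to the dataset used here) would be applied to `frame` at this point.
            cv2.imwrite(os.path.join(out_dir, "frame_%05d.jpeg" % count), frame)
            count += 1
        cap.release()
        return count

    # Example, using the folder layout from step 1:
    extract_frames("train_videos/Accept/050_003_001.mp4", "train_frames/Accept/050_003_001")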

Extracting frames from training videos

python3 "video-to-frame.py" train_videos train_frames

Extract frames from gestures in train_videos to train_frames.

Extracting frames from test videos

python3 "video-to-frame.py" test_videos test_frames

Extract frames from gestures in test_videos to test_frames.

3. Retrain the Inception v3 model.

  • Download retrain.py.

    curl -LO https://github.com/tensorflow/hub/raw/master/examples/image_retraining/retrain.py
    

    Note: This link may change in the future. Please refer to the TensorFlow retrain tutorial.

  • Run the following command to retrain the inception model.

    python3 retrain.py --bottleneck_dir=bottlenecks --summaries_dir=training_summaries/long --output_graph=retrained_graph.pb --output_labels=retrained_labels.txt --image_dir=train_frames
    

This will create two files: retrained_labels.txt and retrained_graph.pb.

For more information about the above command, refer here.
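To confirm the retraining step produced usable artifacts, you can load the graph and labels with a few lines of TensorFlow. This is a minimal sketch and assumes the TensorFlow 1.x API used by the rest of the project:

    import tensorflow as tf

    # Load the frozen retrained graph.
    with tf.gfile.GFile("retrained_graph.pb", "rb") as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())

    graph = tf.Graph()
    with graph.as_default():
        tf.import_graph_def(graph_def, name="")

    # Load the class labels written by retrain.py.
    labels = [line.strip() for line in open("retrained_labels.txt")]
    print("Classes:", labels)
    print("Operations in graph:", len(graph.get_operations()))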

4. Intermediate Representation of Videos

Command

  • usage:

    predict_spatial.py [-h] [--input_layer INPUT_LAYER]    
                       [--output_layer OUTPUT_LAYER] [--test]    
                       [--batch_size BATCH_SIZE]    
                       graph frames_folder
    
  • positional arguments:

    - graph                 graph/model to be executed
    - frames_folder         Path to folder containing folders of frames of
                            different gestures.
    
  • optional arguments:

      -h, --help            show this help message and exit
      --input_layer INPUT_LAYER
                            name of input layer
      --output_layer OUTPUT_LAYER
                            name of output layer
      --test                passed if frames_folder belongs to test_data
      --batch_size BATCH_SIZE
                            batch Size
    

Approach 1

  • Each video is represented by a sequence of n-dimensional vectors (the probability distribution output by the softmax layer), one for each frame. Here n is the number of classes (a shape illustration follows at the end of this step).

    On Training Data

    python3 predict_spatial.py retrained_graph.pb train_frames --batch=100
    

    This will create a file predicted-frames-final_result-train.pkl that will be used by the RNN.

    On Test Data

    python3 predict_spatial.py retrained_graph.pb test_frames --batch=100 --test
    

    This will create a file predicted-frames-final_result-test.pkl that will be used by the RNN.

Approach 2

  • Each video is represented by a sequence of 2048-dimensional vectors (the output of the last pooling layer), one for each frame.

    On Training Data

    python3 predict_spatial.py retrained_graph.pb train_frames \
    --output_layer="module_apply_default/InceptionV3/Logits/GlobalPool" \
    --batch=100
    

    This will create a file predicted-frames-GlobalPool-train.pkl that will be used by the RNN.

    On Test Data

    python3 predict_spatial.py retrained_graph.pb test_frames \
            --output_layer="module_apply_default/InceptionV3/Logits/GlobalPool" \
            --batch=100 \
            --test
    

    This will create a file predicted-frames-GlobalPool-test.pkl that will be used by the RNN.
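To make the two representations concrete, here is a shape-only illustration with random data (the real vectors are produced by the retrained Inception graph; the frame count and class count below are arbitrary example values):

    import numpy as np

    num_frames, num_classes = 40, 4           # example values only
    # Approach 1: one softmax vector per frame (rows sum to 1).
    approach1 = np.random.rand(num_frames, num_classes)
    approach1 /= approach1.sum(axis=1, keepdims=True)
    # Approach 2: one 2048-dimensional GlobalPool feature vector per frame.
    approach2 = np.random.rand(num_frames, 2048)
    print(approach1.shape, approach2.shape)   # (40, 4) (40, 2048)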

5. Train the RNN.

Command

  • usage

    rnn_train.py [-h] [--label_file LABEL_FILE] [--batch_size BATCH_SIZE]
                    input_file_dump model_file
    
  • positional arguments

    input_file_dump       file containing intermediate representation of gestures from inception model
    model_file            Name of the model file to be dumped. Model file is
                          created inside a checkpoints folder
    
  • optional arguments

    -h, --help            show this help message and exit
    --label_file LABEL_FILE
                          path to label file generated by inception, default='retrained_labels.txt'
    --batch_size BATCH_SIZE
                          batch Size, default=32
    

Approach 1

python3 rnn_train.py predicted-frames-final_result-train.pkl non_pool.model

This will train the RNN model on the softmax-based representation of the gestures for 10 epochs and save the model as non_pool.model in a folder named checkpoints.

Approach 2

python3 rnn_train.py predicted-frames-GlobalPool-train.pkl pool.model

This will train the RNN model on the pool-layer-based representation of the gestures for 10 epochs and save the model as pool.model in a folder named checkpoints.
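For orientation, the kind of network rnn_train.py builds can be sketched in a few lines of tflearn. This is not the project's exact architecture; the layer sizes, sequence length, and random placeholder data below are assumptions used only to show the shape of the problem:

    import os
    import numpy as np
    import tflearn

    # Assumed example sizes: 16 videos, 40 frames each, 2048-d features (Approach 2), 4 classes.
    num_videos, max_frames, feat_dim, num_classes = 16, 40, 2048, 4
    X = np.random.rand(num_videos, max_frames, feat_dim)
    Y = np.eye(num_classes)[np.random.randint(num_classes, size=num_videos)]  # one-hot labels

    # Sequence of per-frame features in, class distribution out.
    net = tflearn.input_data(shape=[None, max_frames, feat_dim])
    net = tflearn.lstm(net, 256, dropout=0.8)
    net = tflearn.fully_connected(net, num_classes, activation='softmax')
    net = tflearn.regression(net, optimizer='adam', loss='categorical_crossentropy')

    model = tflearn.DNN(net)
    model.fit(X, Y, n_epoch=10, batch_size=8, show_metric=True)

    os.makedirs("checkpoints", exist_ok=True)
    model.save("checkpoints/dummy.model")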

6. Test the RNN Model

Command

  • usage

    rnn_eval.py [-h] [--label_file LABEL_FILE] [--batch_size BATCH_SIZE]
                    input_file_dump model_file
    
  • positional arguments

    input_file_dump       file containing intermediate representation of gestures from inception model
    model_file            Name of the model file to be used for prediction.
    
  • optional arguments

    -h, --help            show this help message and exit
    --label_file LABEL_FILE
                          path to label file generated by inception, default='retrained_labels.txt'
    --batch_size BATCH_SIZE
                          batch Size, default=32
    

Approach 1

python3 rnn_eval.py predicted-frames-final_result-test.pkl non_pool.model

This will use non_pool.model to predict the labels of the softmax-based representation of the test videos. Predictions and the corresponding gold labels for each test video will be dumped into results.txt.

Approach 2

python3 rnn_eval.py predicted-frames-GlobalPool-test.pkl pool.model

This will use pool.model to predict the labels of the pool-layer-based representation of the test videos. Predictions and the corresponding gold labels for each test video will be dumped into results.txt.
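Evaluation mirrors training: rebuild the same network, load the saved weights, and compare predictions with the gold labels. The sketch below reuses the assumed sizes and dummy data from the training sketch above and is not rnn_eval.py itself:

    import numpy as np
    import tflearn

    num_videos, max_frames, feat_dim, num_classes = 8, 40, 2048, 4
    class_names = ["Accept", "Appear", "Argentina", "Away"]    # example classes from step 1

    # The network definition must match the one used for training before load().
    net = tflearn.input_data(shape=[None, max_frames, feat_dim])
    net = tflearn.lstm(net, 256, dropout=0.8)
    net = tflearn.fully_connected(net, num_classes, activation='softmax')
    net = tflearn.regression(net)

    model = tflearn.DNN(net)
    model.load("checkpoints/dummy.model")

    X_test = np.random.rand(num_videos, max_frames, feat_dim)  # placeholder test sequences
    gold = ["Accept"] * num_videos                             # placeholder gold labels
    preds = model.predict(X_test)

    # Write gold label and predicted label, one test video per line.
    with open("results.txt", "w") as f:
        for g, p in zip(gold, preds):
            f.write("%s\t%s\n" % (g, class_names[int(np.argmax(p))]))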

Happy Coding :)
