LipNet: End-to-End Sentence-level Lipreading

The PyTorch implementation of 'LipNet: End-to-End Sentence-level Lipreading' by Yannis M. Assael, Brendan Shillingford, Shimon Whiteson, and Nando de Freitas (https://arxiv.org/abs/1611.01599). We build the LipNet model in PyTorch with minor changes. This reproduction achieves 13.3%/4.6% WER on unseen/overlapped speaker testing, outperforming the results reported in the original paper and reaching state-of-the-art performance.

Demo

LipNet Demo

Results

Scenario                     | Image Size (W x H) | CER  | WER
-----------------------------|--------------------|------|------
Unseen speakers (Origin)     | 100 x 50           | 6.7% | 13.6%
Overlapped speakers (Origin) | 100 x 50           | 2.0% | 5.6%
Unseen speakers (Ours)       | 128 x 64           | 6.7% | 13.3%
Overlapped speakers (Ours)   | 128 x 64           | 1.9% | 4.6%

Notes:

  • These results are hard to reproduce exactly. Contributions that share reproduction artifacts for this model (e.g. training logs, pretrained weights, learning-rate schedules) are highly appreciated. A reference implementation of the WER/CER metrics is sketched below.
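
For clarity, WER and CER are word-level and character-level Levenshtein edit distances, normalized by the reference length. Here is a minimal sketch using the widely available editdistance package; this is an illustration, not necessarily the exact code used in this repo:

import editdistance  # pip install editdistance

def wer(ref: str, hyp: str) -> float:
    # Word error rate: word-level edit distance / number of reference words.
    r, h = ref.split(), hyp.split()
    return editdistance.eval(r, h) / max(len(r), 1)

def cer(ref: str, hyp: str) -> float:
    # Character error rate: character-level edit distance / reference length.
    r, h = ref.replace(' ', ''), hyp.replace(' ', '')
    return editdistance.eval(r, h) / max(len(r), 1)

print(wer('bin blue at f two now', 'bin blue by f two now'))  # 0.1667 (1 of 6 words wrong)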

Data Statistics

Following the original split, we hold out speakers s1, s2, s20, and s22 for unseen-speaker testing, and choose 255 random sentences from each speaker for overlapped-speaker testing. One way to rebuild the unseen split is sketched after the table.

Scenario                     | Train | Validation
-----------------------------|-------|-----------
Unseen speakers (Origin)     | 28775 | 3971
Overlapped speakers (Origin) | 24331 | 8415
Unseen speakers (Ours)       | 28837 | 3986
Overlapped speakers (Ours)   | 24408 | 8415
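
For reference, a minimal sketch (not part of the repo) of how the unseen-speaker split could be rebuilt. It assumes the lip/ layout s*/video/mpg_6000/<utterance> implied by the train_list format described under Training And Testing:

import glob
import os

UNSEEN_TEST_SPEAKERS = {'s1', 's2', 's20', 's22'}

train, val = [], []
for path in sorted(glob.glob('lip/s*/video/mpg_6000/*')):
    entry = os.path.relpath(path, 'lip')  # e.g. s5/video/mpg_6000/lgbs5a
    speaker = entry.split(os.sep)[0]
    (val if speaker in UNSEEN_TEST_SPEAKERS else train).append(entry)

os.makedirs('data', exist_ok=True)
with open('data/unseen_train.txt', 'w') as f:
    f.write('\n'.join(train))
with open('data/unseen_val.txt', 'w') as f:
    f.write('\n'.join(val))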

Data Preparation

We provide cropped lip images and annotation files in the following links:

BaiduYun (Code: jf0l), Google Drive

The original GRID Corpus can be found here.

Download all parts and concatenate the files using the following command:

cat GRID_LIP_160x80_TXT.zip.* > GRID_LIP_160x80_TXT.zip
unzip GRID_LIP_160x80_TXT.zip
rm GRID_LIP_160x80_TXT.zip

The extracted folder contains the lip and GRID_align_txt folders, which store the cropped lip images and the annotation files, respectively. You can create symbolic links into the LipNet-PyTorch project:

ln -s PATH_OF_DOWNLOADED_DATA/lip LipNet-PyTorch/lip
ln -s PATH_OF_DOWNLOADED_DATA/GRID_align_txt LipNet-PyTorch/GRID_align_txt
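
After linking, a quick sanity check (illustrative only, assuming the layout above) confirms the data is visible from the project root:

import glob
import os

assert os.path.isdir('lip'), 'lip/ symlink missing or broken'
assert os.path.isdir('GRID_align_txt'), 'GRID_align_txt/ symlink missing or broken'

videos = glob.glob('lip/s*/video/mpg_6000/*')
aligns = glob.glob('GRID_align_txt/**/*.align', recursive=True)
print(f'{len(videos)} video folders, {len(aligns)} alignment files')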

Beyond the provided data, if you want to build a complete lip-reading pipeline yourself, we provide face detection and alignment code in the scripts/ folder for reference. You can contact [email protected] or [email protected] for cooperation.

Training And Testing

Run main.py to train and test the LipNet model:

python main.py

To monitor training progress:

tensorboard --logdir logs

Data configurations and hyperparameters are set in options.py. Note that you may need to modify it (e.g. the data paths, learning rate, and batch size) for the program to work as expected. A typical options.py looks like this:

gpu = '0'
random_seed = 0
data_type = 'unseen'
video_path = 'lip/'
train_list = f'data/{data_type}_train.txt'
val_list = f'data/{data_type}_val.txt'
anno_path = 'GRID_align_txt'
vid_padding = 75
txt_padding = 200
batch_size = 96
base_lr = 2e-5
num_workers = 16
max_epoch = 10000
display = 10
test_step = 1000
save_prefix = f'weights/LipNet_{data_type}'
is_optimize = True

weights = 'pretrain/LipNet_unseen_loss_0.44562849402427673_wer_0.1332580699113564_cer_0.06796452465503355.pt'

Optional arguments:

  • gpu: the GPU id used for training and testing.
  • random_seed: the random seed for training and testing.
  • data_type: the data split of the GRID Corpus; unseen and overlap are supported.
  • train_list: the training index file. Each line contains a video folder such as s5/video/mpg_6000/lgbs5a; dataset.py reads all *.jpg files in that folder in frame order (see the sketch after this list).
  • val_list: the validation/testing index file, in the same format as train_list.
  • anno_path: the annotation root, which contains the *.align annotation file for each video.
  • vid_padding: the video padding length; each video is zero-padded to vid_padding frames.
  • txt_padding: the text padding length; each transcript is zero-padded to txt_padding tokens.
  • batch_size: the batch size for training and testing.
  • base_lr: the base learning rate.
  • num_workers: the number of worker processes used for data loading.
  • max_epoch: the maximum number of training epochs.
  • display: the logging interval. For example, if display=10, the program prints a log line every 10 iterations.
  • test_step: the interval for testing and checkpointing. For example, if test_step=1000, the program runs a test every 1000 training iterations.
  • save_prefix: the filename prefix for saved model checkpoints.
  • is_optimize: the training mode. If set to False, the model runs a single test pass and exits.
  • weights: the path to pre-trained weights, loaded before training or testing. If omitted, the model is trained from scratch.
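
To make the loading and padding options concrete, here is a minimal sketch of the behaviour described above. It is illustrative only: load_video and pad_text are hypothetical helpers, dataset.py's actual implementation may differ, and the numeric frame naming is an assumption.

import glob
import os

import cv2  # opencv-python
import numpy as np

def load_video(folder, vid_padding=75):
    # Read all *.jpg frames in frame order (numeric sort on file names,
    # assuming frames are named 1.jpg, 2.jpg, ...).
    files = sorted(glob.glob(os.path.join(folder, '*.jpg')),
                   key=lambda p: int(os.path.splitext(os.path.basename(p))[0]))
    video = np.stack([cv2.imread(f) for f in files])  # (T, H, W, 3)
    # Zero-pad along the time axis to vid_padding frames.
    pad = vid_padding - video.shape[0]
    return np.pad(video, ((0, pad), (0, 0), (0, 0), (0, 0)))

def pad_text(token_ids, txt_padding=200):
    # Zero-pad the encoded transcript to txt_padding tokens.
    ids = np.asarray(token_ids)
    return np.pad(ids, (0, txt_padding - len(ids)))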

Simple demo

We provide a simple demo of LipNet. Run python demo.py PATH_TO_YOUR_MP4 to see the model's prediction on your own video. :)

Dependencies

Bibtex

@article{assael2016lipnet,
  title={LipNet: End-to-End Sentence-level Lipreading},
  author={Assael, Yannis M and Shillingford, Brendan and Whiteson, Shimon and de Freitas, Nando},
  journal={GPU Technology Conference},
  year={2017},
  url={https://github.com/Fengdalu/LipNet-PyTorch}
}

License

The MIT License
