
lukoshkin / text2video

License: MIT
Text to Video Generation Problem

Programming Languages

Python
139335 projects - #7 most used programming language
Jupyter Notebook
11667 projects

Projects that are alternatives of or similar to text2video

WeTextProcessing
Text Normalization & Inverse Text Normalization
Stars: ✭ 213 (+660.71%)
Mutual labels:  text-processing
Text-Classification-LSTMs-PyTorch
The aim of this repository is to show a baseline model for text classification by implementing an LSTM-based model in PyTorch. To provide a better understanding of the model, a Tweets dataset provided by Kaggle is used.
Stars: ✭ 45 (+60.71%)
Mutual labels:  text-processing
corpusexplorer2.0
Corpus linguistics has never been so easy...
Stars: ✭ 16 (-42.86%)
Mutual labels:  text-processing
gb-dl
A Python-based utility to download courses from infosec4tc.teachable.com, academy.ehacking.net, and stackskills.com for personal offline use.
Stars: ✭ 33 (+17.86%)
Mutual labels:  dl
perke
A keyphrase extractor for Persian
Stars: ✭ 60 (+114.29%)
Mutual labels:  text-processing
SENet-for-Weakly-Supervised-Relation-Extraction
No description or website provided.
Stars: ✭ 39 (+39.29%)
Mutual labels:  dl
rbbcc
BCC port for MRI - this is an unofficial bonsai project.
Stars: ✭ 45 (+60.71%)
Mutual labels:  dl
estratto
Parsing fixed-width file content made easy
Stars: ✭ 12 (-57.14%)
Mutual labels:  text-processing
ConTexto
A Python library for text mining and NLP
Stars: ✭ 43 (+53.57%)
Mutual labels:  text-processing
SuperCombinators
[Deprecated] A Swift parser combinator framework
Stars: ✭ 19 (-32.14%)
Mutual labels:  text-processing
r4strings
Handling Strings in R
Stars: ✭ 39 (+39.29%)
Mutual labels:  text-processing
sliceslice-rs
A fast implementation of single-pattern substring search using SIMD acceleration.
Stars: ✭ 66 (+135.71%)
Mutual labels:  text-processing
Dogs-Cats
Binary image classification of cats and dogs
Stars: ✭ 22 (-21.43%)
Mutual labels:  dl
ConvGRU-pytorch
Convolutional GRU
Stars: ✭ 109 (+289.29%)
Mutual labels:  convgru
articulated-animation
Code for the paper Motion Representations for Articulated Animation
Stars: ✭ 849 (+2932.14%)
Mutual labels:  video-generation
MoCoGAN-HD
[ICLR 2021 Spotlight] A Good Image Generator Is What You Need for High-Resolution Video Synthesis
Stars: ✭ 224 (+700%)
Mutual labels:  video-generation
frangipanni
Program to convert lines of text into a tree structure.
Stars: ✭ 1,176 (+4100%)
Mutual labels:  text-processing
s3-concat
Concatenate Amazon S3 files remotely using flexible patterns
Stars: ✭ 32 (+14.29%)
Mutual labels:  text-processing
dif
'dif' is a Linux preprocessing front end to gvimdiff/meld/kompare
Stars: ✭ 18 (-35.71%)
Mutual labels:  text-processing
syn
syn - the thesaurus
Stars: ✭ 45 (+60.71%)
Mutual labels:  text-processing

Video Generation Based on Short Text Description (2019)

Over a year later (in 2020), I decided to add a README to the repository, since some people find it useful even without a description. I hope this step will make the results of my work more accessible to those who are interested in the problem and stumble upon the repository while browsing the topic on GitHub.

Example of Generated Video

Unfortunately, I have not saved the videos generated by the network, since all the results remained on the work laptop that I handed over at the end of the internship. The only thing left is the recording I made on my cellphone (sorry if it makes your eyes bleed).

What is on the gif? There are 5 blocks of images stacked horizontally. Each block contains 4 objects selected from the '20bn-something-something-v2' dataset, all belonging to the same category, "Pushing [something] from left to right" 1 (~1000 samples). They are a book (top-left window), a box (top-right window), a mug (bottom-left window), and a marker (bottom-right window), each pushed along the surface by hand. The numbers of occurrences of the corresponding objects in the data subset are 57, 43, 9, and 55.
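For the record, such statistics can be gathered directly from the dataset annotations. Below is a minimal sketch, assuming the standard 20bn-something-something-v2 JSON annotation format; the file name and the keys ('template', 'placeholders') are assumptions to verify against your copy of the dataset:

import json
from collections import Counter

# Assumed annotation file of the something-something-v2 release.
with open('something-something-v2-train.json') as fp:
    annotations = json.load(fp)

CATEGORY = 'Pushing [something] from left to right'
# Count how often each object occurs in the chosen category.
objects = Counter(
    ann['placeholders'][0].lower()
    for ann in annotations
    if ann['template'] == CATEGORY and ann['placeholders']
)
print(objects.most_common())  # e.g., [('book', 57), ('marker', 55), ...]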

The generated videos are diverse (thanks to the zero-gradient penalty) and of about the same quality as the videos from the training data. However, no tests were conducted on the validation data. As for the gif below, since all the objects belong to the same category, only single-word conditioning (i.e., on the object) is used. Still, the repository provides tools for encoding the whole sentence.
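The penalty mentioned above presumably refers to the zero-centered gradient penalty of Mescheder et al. (2018). Here is a generic PyTorch sketch of the idea, not the exact code of this repository; the discriminator interface and the gamma weight are placeholders:

import torch

def r1_gradient_penalty(discriminator, real_videos, word_emb, gamma=10.):
    """Zero-centered gradient penalty on real samples (R1)."""
    real_videos = real_videos.detach().requires_grad_(True)
    scores = discriminator(real_videos, word_emb)
    # Differentiate the scores w.r.t. the inputs, keeping the graph
    # so that the penalty itself can be backpropagated.
    grads, = torch.autograd.grad(
        scores.sum(), real_videos, create_graph=True)
    # Penalize any non-zero gradient on the data manifold.
    return .5 * gamma * grads.pow(2).flatten(1).sum(1).mean()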


1 Yep, exactly "from left to right" and not the other way around, as the gif reads (it is a typo there). However, for validation purposes, it is useful to make new labels with the reversed direction of movement or with new (but "similar", e.g., in the space of embeddings) objects from the unchanged category.
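To illustrate the first suggestion, reversing the direction in a label can be as simple as the following (the helper is hypothetical, not part of the repository):

def reverse_direction(label):
    """Swap the movement direction in a category label."""
    if 'left to right' in label:
        return label.replace('left to right', 'right to left')
    return label.replace('right to left', 'left to right')

print(reverse_direction('Pushing [something] from left to right'))
# Pushing [something] from right to left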

Navigating Through SRC Files

data_prep2.py - Video and text processing (based on text_processing.py)
blocks.py - Building blocks used in models.py
visual_encoders.py - Advanced building blocks for the image and video discriminators
process3-5.ipynb - Pipeline for the training process on multiple GPUs
(3-5 is the hardcoded range of GPUs involved; see the sketch after this list)
pipeline.ipynb - Previously served the same purpose as process3-5.ipynb,
but with the hardcoded range 0-2. Now it is an unfinished implementation of mixed batches
legacy (=obsolete) - Early attempts and ideas
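As for the hardcoded GPU ranges, the first cell of such a notebook presumably pins the visible devices along the following lines (a sketch under that assumption, not the notebooks' actual code):

import os

# Expose only GPUs 3-5 to this process; PyTorch then re-indexes them as cuda:0-2.
# This must be set before CUDA is first initialized (e.g., before importing torch).
os.environ['CUDA_VISIBLE_DEVICES'] = '3,4,5'

import torch
print(torch.cuda.device_count())  # prints 3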

There is also a collection of references to articles relevant (as of 2019) to the text2video generation problem.

Update of 2021

This year, I have decided to make the results reproducible. It has turned out that if one has not dealt with this repo before, it is tough for them to get it up and running. Quite surprising, huh? Especially considering that I uploaded everything hastily and as it was. Now, at least, you can follow the instructions below; this is how you reach the state of what I left in 2019. To move further, a great deal of effort is required. Good luck!

Setting Everything Up

  1. Clone the repository
git clone https://github.com/lukoshkin/text2video.git
  2. Retrieve the Docker image
cd docker
docker build -t lukoshkin/text2video:base .

or

docker pull lukoshkin/text2video:base

If using Singularity, one can obtain the image by typing

singularity build t2v-base.simg docker://lukoshkin/text2video:base
  3. Get access to a GPU. For cluster folks, it may look like:
salloc -p gpu_a100 -N 1 -n 4 --gpus=1 --mem-per-gpu=20G --time=12:00:00
## prints the name of allocated node, e.g., gn26
ssh gn26
  4. Cd to the directory where everything is located and create a container. (9999 is the port exposed for Jupyter outside the container; that is, if running directly on your computer, localhost:9999 is your access point in a browser. You may need one more port for TensorBoard as well. Note that 8888 is the default port Jupyter tries first; if it is busy, specify a port manually in the jupyter command with the --port option.)
nvidia-docker run --name t2v \
  -p 9999:8888 -v "$PWD":/home/depp/project \
  -d lukoshkin/text2video:base \
  'jupyter-notebook --ip=0.0.0.0 --no-browser'

Singularity users would do it like this:

singularity exec \
  --no-home -B "$PWD:$HOME" --nv t2v-base.simg \
  jupyter notebook --ip 0.0.0.0 --no-browser

If you are accustomed to working in JupyterLab, feel free to use it; just rewrite the commands above to the proper form first.

For running everything on an HPC cluster, one should forward the ports. Type one of the two commands below. Which one depends on whether you ssh to compute nodes (gn26) on your server and whether you have set up an alias for the latter in your .ssh/config.

ssh -NL 9999:gn26:8888 nickname
ssh -NL 9999:localhost:8888 user@server