thetobysiu / deepstory

Licence: other
Deepstory turns a text/generated text into a video where the character is animated to speak your story using his/her voice.

Deepstory

Deepstory is an artwork that combines Natural Language Generation (NLG) with GPT-2, Text-to-Speech (TTS) with Deep Convolutional TTS, speech-to-animation with Speech-Driven Animation, and image animation with the First Order Motion Model into a single media application.

To put it simply, it turns a text (written or generated) into a video in which a character is animated to speak your story in his or her own voice.

You can convert an image into a video like this:

[Image: result]

It provides a comfortable web interface and a backend written with Flask for creating your own story.

It supports transformers models and pytorch-dctts models.

Live Demo

Colab (flask-ngrok): https://colab.research.google.com/drive/1HYCPUmFw5rN8kvZdwzFpfBlaUMWPNHas?usp=sharing

Video (In case you need instructions): https://blog.thetobysiu.com/video/

Updates

  1. Redesigned the interface, especially the whole GPT-2 interface.
  2. GPT-2 now supports loading text from the original data, so it can continue generating a story based on the book.
  3. Worked out the token limit in GPT-2: inference now uses only the nearest (1024 - predict length) tokens.
  4. GPT-2 supports an interactive mode that generates several batches of sentences and provides an interface for adding those sentences.
  5. Added a sentence-to-speaker mapping system; it no longer replaces all speakers by default.
  6. Text normalization now happens in the synthesizing stage, so punctuation is preserved and can be used to vary durations in the synthesized audio.
  7. Synthesized audio is now stored in a temp folder and trimmed so that the animated video is more accurate (the data the sda model was trained on is also short).
  8. Combined audio now has variable silences according to punctuation.
  9. Basically, the web interface and a lot of the code were rewritten.
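
As a rough illustration of item 3, the sketch below shows one way to keep a prompt within GPT-2's 1024-token context. It assumes "nearest" means the most recent tokens; the function name `trim_prompt` and its parameters are hypothetical and are not taken from the Deepstory code.

```python
GPT2_CONTEXT_TOKENS = 1024  # maximum context length of GPT-2


def trim_prompt(token_ids, predict_length, context_size=GPT2_CONTEXT_TOKENS):
    """Keep only the most recent tokens so prompt + prediction fits the context.

    Hypothetical helper: the budget is context_size - predict_length tokens.
    """
    if not 0 < predict_length < context_size:
        raise ValueError("predict_length must be between 1 and context_size - 1")
    budget = context_size - predict_length
    # Slicing from the end keeps the text closest to where generation continues.
    return token_ids[-budget:]


if __name__ == "__main__":
    tokens = list(range(1500))  # stand-in for real token ids
    trimmed = trim_prompt(tokens, predict_length=100)
    print(len(trimmed), trimmed[0])  # 924 576
```

Prompts shorter than the budget pass through unchanged, so the same call works for short and long inputs.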

Colab version will be available soon!
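
Item 8 of the update list (variable silences according to punctuation) could look roughly like the sketch below. The pause lengths, the 22050 Hz sample rate, and the names `PAUSE_SECONDS` and `silence_after` are all assumptions made for illustration; the values Deepstory actually uses are not stated in this README.

```python
# Hypothetical pause lengths in seconds, keyed by the sentence's final character.
PAUSE_SECONDS = {",": 0.2, ";": 0.3, ":": 0.3, ".": 0.5, "?": 0.5, "!": 0.5}
DEFAULT_PAUSE = 0.1  # used when the sentence ends without listed punctuation


def silence_after(sentence, sample_rate=22050):
    """Return how many silent samples to append after a synthesized sentence."""
    stripped = sentence.rstrip()
    last_char = stripped[-1] if stripped else ""
    seconds = PAUSE_SECONDS.get(last_char, DEFAULT_PAUSE)
    return int(round(seconds * sample_rate))


if __name__ == "__main__":
    for text in ["Well,", "Hello.", "No ending"]:
        print(repr(text), silence_after(text))
```

Looking only at the final character is the simplest possible rule; it relies on punctuation surviving until the synthesizing stage, as described in item 6.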

Interface

Folder structure

Deepstory
├── animator.py
├── app.py
├── data
│   ├── dctts
│   │   ├── Geralt
│   │   │   ├── ssrn.pth
│   │   │   └── t2m.pth
│   │   ├── LJ
│   │   │   ├── ssrn.pth
│   │   │   └── t2m.pth
│   │   └── Yennefer
│   │       ├── ssrn.pth
│   │       └── t2m.pth
│   ├── fom
│   │   ├── vox-256.yaml
│   │   ├── vox-adv-256.yaml
│   │   ├── vox-adv-cpk.pth.tar
│   │   └── vox-cpk.pth.tar
│   ├── gpt2
│   │   ├── Waiting for Godot
│   │   │   ├── config.json
│   │   │   ├── default.txt
│   │   │   ├── merges.txt
│   │   │   ├── pytorch_model.bin
│   │   │   ├── special_tokens_map.json
│   │   │   ├── text.txt
│   │   │   ├── tokenizer_config.json
│   │   │   └── vocab.json
│   │   └── Witcher Books
│   │       ├── config.json
│   │       ├── default.txt
│   │       ├── merges.txt
│   │       ├── pytorch_model.bin
│   │       ├── special_tokens_map.json
│   │       ├── text.txt
│   │       ├── tokenizer_config.json
│   │       └── vocab.json
│   ├── images
│   │   ├── Geralt
│   │   │   ├── 0.jpg
│   │   │   └── fx.jpg
│   │   └── Yennefer
│   │       ├── 0.jpg
│   │       ├── 1.jpg
│   │       ├── 2.jpg
│   │       ├── 3.jpg
│   │       ├── 4.jpg
│   │       └── 5.jpg
│   └── sda
│       ├── grid.dat
│       └── image.bmp
├── deepstory.py
├── generate.py
├── modules
│   ├── dctts
│   │   ├── audio.py
│   │   ├── hparams.py
│   │   ├── __init__.py
│   │   ├── layers.py
│   │   ├── ssrn.py
│   │   └── text2mel.py
│   ├── fom
│   │   ├── animate.py
│   │   ├── dense_motion.py
│   │   ├── generator.py
│   │   ├── __init__.py
│   │   ├── keypoint_detector.py
│   │   ├── sync_batchnorm
│   │   │   ├── batchnorm.py
│   │   │   ├── comm.py
│   │   │   ├── __init__.py
│   │   │   └── replicate.py
│   │   └── util.py
│   └── sda
│       ├── encoder_audio.py
│       ├── encoder_image.py
│       ├── img_generator.py
│       ├── __init__.py
│       ├── rnn_audio.py
│       ├── sda.py
│       └── utils.py
├── README.md
├── requirements.txt
├── static
│   ├── bootstrap
│   │   ├── css
│   │   │   └── bootstrap.min.css
│   │   └── js
│   │       └── bootstrap.min.js
│   ├── css
│   │   └── styles.css
│   └── js
│       └── jquery.min.js
├── templates
│   ├── animate.html
│   ├── deepstory.js
│   ├── gen_sentences.html
│   ├── gpt2.html
│   ├── index.html
│   ├── map.html
│   ├── models.html
│   ├── sentences.html
│   ├── status.html
│   └── video.html
├── test.py
├── text.txt
├── util.py
└── voice.py
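
Given the layout above, the available voices, GPT-2 models, and image sets can be found by scanning the `data` folder. The following is a minimal sketch under that assumption; `list_models` is a hypothetical helper, not a function from this repository.

```python
from pathlib import Path


def list_models(data_dir="data"):
    """Scan the data folder and report which models and image sets are present."""
    root = Path(data_dir)

    def subdirs(name):
        folder = root / name
        if not folder.is_dir():
            return []
        return sorted(p.name for p in folder.iterdir() if p.is_dir())

    # A DC-TTS voice is only usable when both checkpoint files exist.
    voices = [
        voice
        for voice in subdirs("dctts")
        if all((root / "dctts" / voice / f).is_file() for f in ("t2m.pth", "ssrn.pth"))
    ]
    return {"voices": voices, "gpt2": subdirs("gpt2"), "images": subdirs("images")}


if __name__ == "__main__":
    print(list_models())
```

With the complete project download in place, this should report the Geralt, LJ, and Yennefer voices and the two GPT-2 models shown in the tree.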

Complete project download

The complete project is available in the Google Drive version of this project. All the models (including Geralt and Yennefer) are included.

You have to download the spaCy English model first.

Make sure you have ffmpeg installed on your computer and the ffmpeg-python package installed.

https://drive.google.com/drive/folders/1AxORLF-QFd2wSORzMOKlvCQSFhdZSODJ?usp=sharing

To simplify things, a google colab version will be released soon...
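
The prerequisites listed in this section can be checked before starting the app. This is a minimal sketch: the spaCy model name `en_core_web_sm` is an assumption (the README only says "the spacy english model"), and `missing_prerequisites` is a hypothetical helper.

```python
import importlib.util
import shutil


def missing_prerequisites(spacy_model="en_core_web_sm"):
    """Return a list of human-readable names of prerequisites that were not found."""
    missing = []
    if shutil.which("ffmpeg") is None:
        missing.append("ffmpeg binary (not on PATH)")
    # ffmpeg-python is imported under the module name "ffmpeg".
    if importlib.util.find_spec("ffmpeg") is None:
        missing.append("ffmpeg-python package")
    if importlib.util.find_spec("spacy") is None:
        missing.append("spacy package")
    # spaCy models installed with "python -m spacy download" are regular packages.
    elif importlib.util.find_spec(spacy_model) is None:
        missing.append(f"spaCy model {spacy_model}")
    return missing


if __name__ == "__main__":
    problems = missing_prerequisites()
    print("All prerequisites found" if not problems else "Missing: " + ", ".join(problems))
```

If the model is reported missing, `python -m spacy download en_core_web_sm` installs it, assuming that is the model the project expects.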

Requirements

An NVIDIA GPU with at least 4GB of VRAM is required to run this project.
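
A quick way to test a machine against this requirement is sketched below. It uses PyTorch's `torch.cuda` API; the helper names and the choice of 4 * 10^9 bytes as the threshold are assumptions (cards sold as 4GB often report slightly less than 4 GiB).

```python
MIN_VRAM_BYTES = 4 * 10**9  # assumed threshold for a "4GB" card


def has_enough_vram(total_bytes, minimum=MIN_VRAM_BYTES):
    """Return True when the reported GPU memory meets the minimum."""
    return total_bytes >= minimum


def check_gpu():
    """Describe whether the first CUDA device looks sufficient."""
    try:
        import torch  # imported lazily so the check degrades gracefully
    except ImportError:
        return "PyTorch is not installed"
    if not torch.cuda.is_available():
        return "No CUDA-capable GPU detected"
    props = torch.cuda.get_device_properties(0)
    verdict = "OK" if has_enough_vram(props.total_memory) else "below the 4GB requirement"
    return f"{props.name}: {props.total_memory / 10**9:.1f} GB ({verdict})"


if __name__ == "__main__":
    print(check_gpu())
```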

Credits

https://github.com/tugstugi/pytorch-dc-tts

https://github.com/DinoMan/speech-driven-animation

https://github.com/AliaksandrSiarohin/first-order-model

https://github.com/huggingface/transformers

Notes

The whole project uses PyTorch. TensorFlow is listed in requirements.txt only because transformers used it to convert a model trained with gpt-2-simple into a PyTorch model.

Only the files inside the modules folder are slightly modified from the originals. The remaining files were all written by me, except for some parts that are referenced.

Bugs

There are still some memory issues if you synthesize sentences over and over within a session, but it takes at least 10 runs to cause a memory overflow.

Training models

There are other repos containing tools that I created to preprocess the files. They can be found on my profile.
