autosubsync

Automatic subtitle synchronization tool


Did you know that hundreds of movies, especially from the 1950s and '60s, are now in the public domain and available online? Great! Let's download Plan 9 from Outer Space. As a non-native English speaker, I prefer watching movies with subtitles, which can also be found online for free. However, sometimes there is a problem: the subtitles are not in sync with the movie.

But fear not. This tool can resynchronize the subtitles without any human input. A correction for both shift and playing speed can be found automatically... using "AI & machine learning".

Installation

macOS / OSX

Prerequisites: Install Homebrew and pip. Then install FFmpeg and this package:

brew install ffmpeg
pip install autosubsync

Linux (Debian & Ubuntu)

Make sure you have pip installed, e.g., sudo apt-get install python-pip. Then install FFmpeg and this package:

sudo apt-get install ffmpeg
sudo pip install autosubsync

Note: If you are running Ubuntu 14 (but not 12 or 16, which are fine), you'll need to jump through a few more hoops to install FFmpeg.

Usage

autosubsync [input movie] [input subtitles] [output subs]

# for example
autosubsync plan-9-from-outer-space.avi \
  plan-9-out-of-sync-subs.srt \
  plan-9-subtitles-synced.srt

See autosubsync --help for more details.

Features

  • Automatic speed and shift correction

  • Typical synchronization accuracy ~0.15 seconds (see performance)

  • Wide video format support through ffmpeg

  • Supports all reasonably encoded SRT files in any language

  • Should work with any language in the audio (only tested with a few though)

  • Quality-of-fit metric for checking sync success

  • Python API. Example (save as batch_sync.py):

    "Batch synchronize video files in a folder: python batch_sync.py /path/to/folder"
    
    import autosubsync
    import glob, os, sys
    
    if __name__ == '__main__':
        for video_file in glob.glob(os.path.join(sys.argv[1], '*.mp4')):
            base = video_file.rpartition('.')[0]
            srt_file = base + '.srt'
            synced_srt_file = base + '_synced.srt'
    
            # see help(autosubsync.synchronize) for more details
            autosubsync.synchronize(video_file, srt_file, synced_srt_file)
    

Development

Training the model

  1. Collect a bunch of well-synchronized video and subtitle files and put them in a file called training/sources.csv (see training/sources.csv.example)
  2. Run (and see) train_and_test.sh. This
    • populates the training/data folder
    • creates trained-model.bin
    • runs cross-validation

Synchronization (predict)

This assumes the trained model is available as trained-model.bin:

python3 autosubsync/main.py input-video-file input-subs.srt synced-subs.srt

Build and distribution

  • Create virtualenv: python3 -m venv venvs/test-python3
  • Activate venv: source venvs/test-python3/bin/activate
  • pip install -e .
  • pip install wheel
  • python setup.py bdist_wheel

Methods

The basic idea is to first detect speech on the audio track, that is, for each point in time t in the film, to estimate whether speech is heard. The method described below produces this estimate as a probability of speech, p(t). The other input to the program is the unsynchronized subtitle file, which contains the timestamps of the actual subtitle intervals.

Synchronization is done by finding a time transformation f(t) that makes s(f(t)), the resynchronized subtitles, best match p(t), the detected speech. Here s(t) is the (unsynchronized) subtitle indicator function, whose value is 1 if any subtitles are visible at time t and 0 otherwise.
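
To make the notation concrete, here is a minimal sketch (illustrative only, not the project's actual code) of evaluating the subtitle indicator s on a regular grid of frames, assuming the subtitle file has already been parsed into a list of (start, end) intervals in seconds:

import numpy as np

def subtitle_indicator(intervals, n_frames, frame_duration):
    """s[i] = 1 if any subtitle is visible at time i*frame_duration, else 0."""
    s = np.zeros(n_frames)
    t = np.arange(n_frames) * frame_duration
    for start, end in intervals:
        s[(t >= start) & (t < end)] = 1.0
    return s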

Speech detection (VAD)

Speech detection is done by first computing a spectrogram of the audio, that is, a matrix of features where each column corresponds to a frame of duration Δt and each row to a certain frequency band. Additional features are engineered by computing a rolling maximum of the spectrogram over a few different window lengths.
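
For illustration, the feature matrix could be assembled roughly as follows. This is a hedged sketch using scipy; the frame duration and rolling-maximum window lengths here are made-up values, and the project's actual feature extraction may differ:

import numpy as np
from scipy.signal import spectrogram

def speech_features(audio, sample_rate, frame_duration=0.05, max_windows=(4, 8, 16)):
    # audio: 1-D array of samples. Spectrogram: one column per frame of
    # duration frame_duration, one row per frequency band.
    nperseg = int(sample_rate * frame_duration)
    _, _, spec = spectrogram(audio, fs=sample_rate, nperseg=nperseg, noverlap=0)
    spec = np.log(spec + 1e-10)  # log-power, a common normalization

    # Extra features: rolling maximum of each band over a few window lengths
    features = [spec]
    for w in max_windows:
        rolling_max = np.stack(
            [spec[:, max(0, i - w + 1):i + 1].max(axis=1)
             for i in range(spec.shape[1])], axis=1)
        features.append(rolling_max)
    return np.vstack(features)  # shape: (n_features, n_frames)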

Using a collection of correctly synchronized media files, one can create a training data set where each feature column is associated with a correct label. This makes it possible to train a machine learning model to predict the labels, that is, to detect speech, on any previously unseen audio track, as the probability of speech p(iΔt) in frame number i.

The weapon of choice in this project is logistic regression, a common baseline method in machine learning that is simple to implement. The speech detection accuracy achieved with this model is not very good, only around 72% (AUROC). However, the speech detection results are not the final output of this program but just an input to the synchronization parameter search. As mentioned in the performance section, the overall synchronization accuracy is quite good even though the speech detection accuracy is not.
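
As an illustrative sketch of the training step (using scikit-learn here; the project may implement logistic regression differently), with synthetic stand-ins so the snippet runs on its own:

import numpy as np
from sklearn.linear_model import LogisticRegression

# In practice, X holds one row of features per frame (e.g. speech_features(...)
# transposed) and y the matching 0/1 labels from the subtitle indicator of
# correctly synchronized files. Synthetic data keeps this sketch runnable:
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 40))
y = (X[:, 0] + 0.5 * rng.normal(size=1000) > 0).astype(int)

model = LogisticRegression(max_iter=1000)
model.fit(X, y)

# p[i] = predicted probability of speech in frame i of an (unseen) track
p = model.predict_proba(X)[:, 1]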

Synchronization parameter search

This program only searches for linear transformations of the form f(t) = a t + b, where b is the shift and a is the speed correction. The optimization method is a brute-force grid search in which b is limited to a certain range and a is one of a few common skew factors. The parameters minimizing the loss function, defined below, are selected.

Loss function

The data produced by the speech detection phase is a vector representing the speech probabilities in frames of duration Δt. The metric used for evaluating match quality is expected linear loss:

    loss(f) = Σi [ s(fi) (1 - pi) + (1 - s(fi)) pi ],

where pi = p(iΔt) is the probability of speech and s(fi) = s(f(iΔt)) = s(a iΔt + b) is the subtitle indicator resynchronized using the transformation f at frame number i.
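
Putting the two previous sections together, a hedged sketch of this loss and the brute-force search could look like the following; the shift range and step size are assumptions, and speed_candidates is the short list described in the next section:

import numpy as np

def expected_linear_loss(p, intervals, a, b, frame_duration):
    # loss(f) = sum_i [ s(f_i)(1 - p_i) + (1 - s(f_i)) p_i ],
    # where f_i = a * i * frame_duration + b
    t = a * np.arange(len(p)) * frame_duration + b
    s = np.zeros(len(p))
    for start, end in intervals:
        s[(t >= start) & (t < end)] = 1.0
    return np.sum(s * (1.0 - p) + (1.0 - s) * p)

def grid_search(p, intervals, frame_duration, speed_candidates,
                max_shift=60.0, shift_step=0.05):
    # Brute force: evaluate every (a, b) pair and keep the minimizer
    best_loss, best_a, best_b = np.inf, None, None
    for a in speed_candidates:
        for b in np.arange(-max_shift, max_shift, shift_step):
            loss = expected_linear_loss(p, intervals, a, b, frame_duration)
            if loss < best_loss:
                best_loss, best_a, best_b = loss, a, b
    return best_a, best_b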

Speed correction

Speed/skew detection is based on the assumption that an error in playing speed is not an arbitrary number but is caused by a frame rate mismatch, which constrains the possible playing speed multiplier to be a ratio of two common frame rates sufficiently close to one. In particular, it must be one of the following values

  • 24/23.976 = 30/29.97 = 60/59.94 = 1001/1000
  • 25/24
  • 25/23.976

or the reciprocal (1/x).

The reasoning behind this is that if the frame rate of (digital) video footage needs to be changed and the target and source frame rates are close enough, the conversion is often done by skipping any re-sampling and just changing the nominal frame rate. This effectively changes the playing speed of the video and the pitch of the audio by a small factor which is the ratio of these frame rates.
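
In code, the candidate set is then just these ratios, their reciprocals, and 1.0 for the no-error case (a sketch; the exact enumeration in the project may differ):

# Ratios of common frame rates sufficiently close to one
base_ratios = [24 / 23.976, 25 / 24, 25 / 23.976]
speed_candidates = sorted([1.0] + base_ratios + [1 / r for r in base_ratios])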

Performance

Based on somewhat limited testing, the typical shift error in auto-synchronization seems to be around 0.15 seconds (cross-validation RMSE) and generally below 0.5 seconds. In other words, it works well enough in most cases but could be better. No speed correction errors occurred in these tests.

Auto-syncing a full-length movie currently takes about 3 minutes and utilizes around 1.5 GB of RAM.

References

I first searched Google to see whether someone had already tried to solve the same problem and found this great blog post, whose author had implemented a solution using more or less the same approach that I had in mind. The post also included good points that I had not realized, such as using correctly synchronized subtitles as training data for speech detection.

Instead of starting from the code linked in that blog post, I decided to implement my own version from scratch, since this seemed like a good application for trying out RNNs. The RNNs turned out to be unnecessary, but this was a nice project nevertheless.

Other similar projects

  • Ffsubsync: automagically synchronize subtitles with video