fedecalendino / pysub-parser

Licence: MIT license
Library for extracting text and timestamps from multiple subtitle files (.ass, .ssa, .srt, .sub, .txt).

Programming Languages

python

Projects that are alternatives of or similar to pysub-parser

Subtitle.js
Stream-based library for parsing and manipulating subtitle files
Stars: ✭ 234 (+485%)
Mutual labels:  subtitles, subtitle, srt
Parsing With Haskell Parser Combinators
🔍 A step-by-step guide to parsing using Haskell parser combinators.
Stars: ✭ 72 (+80%)
Mutual labels:  parsing, subtitles, srt
Srt
A simple library for parsing, modifying, and composing SRT files.
Stars: ✭ 210 (+425%)
Mutual labels:  subtitles, subtitle, srt
Netflix To Srt
Rip, extract and convert subtitles to .srt closed captions from .xml/dfxp/ttml and .vtt/WebVTT (e.g. Netflix, YouTube)
Stars: ✭ 387 (+867.5%)
Mutual labels:  subtitles, subtitle, srt
video-subtitle-extractor
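Extracts hard-coded subtitles from videos and generates srt files. No third-party API is required; text recognition runs locally. A deep-learning-based video subtitle extraction framework covering subtitle region detection and subtitle content extraction. A GUI tool for extracting hard-coded subtitles (hardsub) from videos and generating srt files.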
Stars: ✭ 1,763 (+4307.5%)
Mutual labels:  extract, subtitles, srt
Ffsubsync
Automagically synchronize subtitles with video.
Stars: ✭ 5,167 (+12817.5%)
Mutual labels:  subtitles, subtitle, srt
Submerger
SRT Subtitles Merger
Stars: ✭ 92 (+130%)
Mutual labels:  subtitles, srt
Subsync
Synchronize your subtitles using machine learning
Stars: ✭ 84 (+110%)
Mutual labels:  subtitles, subtitle
CVparser
CVparser is software for parsing or extracting data out of CV/resumes.
Stars: ✭ 28 (-30%)
Mutual labels:  parsing, extract
Link Preview Js
Parse and/or extract web links meta information: title, description, images, videos, etc. [via OpenGraph], runs on mobiles and node.
Stars: ✭ 240 (+500%)
Mutual labels:  parsing, extract
SABRE.js
Substation Alpha suBtitles REnderer -- A Gpu Accelerated Javascript Advanced SubStation (ASS) Alpha Subtitles Renderer. Renders .ass and .ssa files.
Stars: ✭ 58 (+45%)
Mutual labels:  ssa, subtitles
ass-compiler
Parses and compiles ASS subtitle format to easy-to-use data structure
Stars: ✭ 73 (+82.5%)
Mutual labels:  ssa, subtitle
Caption
Get Caption, start watching.
Stars: ✭ 1,258 (+3045%)
Mutual labels:  subtitles, subtitle
subtitleeditor
Subtitle Editor is a GTK+3 tool to create or edit subtitles for GNU/Linux/*BSD.
Stars: ✭ 79 (+97.5%)
Mutual labels:  subtitles, subtitle
ST-ASS
ASS/SSA subtitles syntax highlight for Sublime Text.
Stars: ✭ 23 (-42.5%)
Mutual labels:  ssa, subtitle
ttml2srt
Convert TTML subtitles used by Netflix, HBO, CMore and others to SRT format
Stars: ✭ 51 (+27.5%)
Mutual labels:  subtitles, srt
Subed
Subtitle editor for Emacs
Stars: ✭ 77 (+92.5%)
Mutual labels:  subtitles, srt
ChineseSubFinder
Automated Chinese subtitle downloading. Supported subtitle sites: shooter, xunlei, arrst, a4k. Supports Emby, Jellyfin, Plex, Sonarr, Radarr, TMM.
Stars: ✭ 2,212 (+5430%)
Mutual labels:  subtitle, sub
Subfinder
Subtitle finder
Stars: ✭ 545 (+1262.5%)
Mutual labels:  subtitles, subtitle
Pgstosrt
PGS to Srt converter
Stars: ✭ 21 (-47.5%)
Mutual labels:  subtitles, srt

pysub-parser

Utility to extract the contents of a subtitle file.

Supported types: .ass, .ssa, .srt, .sub, .txt

For more information: http://write.flossmanuals.net/video-subtitling/file-formats

Usage

The parse function accepts the following parameters (only path is required):

  • path: location of the subtitle file.
  • subtype: one of the supported file types; by default, the file extension is used.
  • encoding: encoding of the file, utf-8 by default.
  • **kwargs: optional parameters.
    • fps: framerate (only used by sub files), 23.976 by default.

from pysubparser import parser

subtitles = parser.parse('./files/space-jam.srt')

for subtitle in subtitles:
    print(subtitle)

Output:

0 > [BALL BOUNCING]
1 > Michael?
2 > What are you doing out here, son? It's after midnight.
3 > MICHAEL: Couldn't sleep, Pops.
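
The subtype and fps defaults described above can be illustrated with a minimal standalone sketch. It does not call pysub-parser: guess_subtype and frame_to_offset are hypothetical helpers, and the sketch assumes that .sub timings are stored as frame numbers, which is why a framerate is needed to turn them into time offsets.

```python
from datetime import timedelta
from pathlib import Path

# File types listed in this README; the default framerate is the one stated above.
SUPPORTED_TYPES = {"ass", "ssa", "srt", "sub", "txt"}
DEFAULT_FPS = 23.976


def guess_subtype(path, subtype=None):
    """Hypothetical helper: fall back to the file extension when subtype is omitted."""
    subtype = (subtype or Path(path).suffix.lstrip(".")).lower()
    if subtype not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported subtitle type: {subtype!r}")
    return subtype


def frame_to_offset(frame, fps=DEFAULT_FPS):
    """Hypothetical helper: convert a frame number to a time offset (assumed .sub layout)."""
    return timedelta(seconds=frame / fps)


print(guess_subtype("./files/space-jam.srt"))  # srt
print(frame_to_offset(50, fps=25))             # 0:00:02
```

With the wrong framerate every offset is scaled by the same factor, so passing fps explicitly is worthwhile whenever the video is not 23.976 fps.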

Subtitle Class

Each line of dialogue is represented by a Subtitle object with the following properties:

  • index: position in the file.
  • start: timestamp of the start of the line.
  • end: timestamp of the end of the line.
  • text: contents of the line.

for subtitle in subtitles:
    print(f'{subtitle.start} > {subtitle.end}')
    print(subtitle.text)
    print()

Output:

00:00:36.328000 > 00:00:38.329000
[BALL BOUNCING]

00:01:03.814000 > 00:01:05.189000
Michael?

00:01:08.402000 > 00:01:11.404000
What are you doing out here, son? It's after midnight.

00:01:11.572000 > 00:01:13.072000
MICHAEL: Couldn't sleep, Pops.
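
To make the shape of these objects concrete, here is a rough stand-in built from the four properties listed above. SubtitleSketch is not the library's class. The sketch assumes start and end are datetime.time values, which is consistent with the printed format but not confirmed here, and the duration property is a hypothetical addition.

```python
from dataclasses import dataclass
from datetime import date, datetime, time, timedelta


@dataclass
class SubtitleSketch:
    """Illustrative stand-in for a Subtitle object (not the library's class)."""
    index: int
    start: time
    end: time
    text: str

    def __str__(self):
        # Mirrors the "index > text" form shown in the first output listing.
        return f"{self.index} > {self.text}"

    @property
    def duration(self) -> timedelta:
        # datetime.time values cannot be subtracted directly, so anchor both
        # on the same arbitrary date first.
        anchor = date.min
        return datetime.combine(anchor, self.end) - datetime.combine(anchor, self.start)


first = SubtitleSketch(0, time(0, 0, 36, 328000), time(0, 0, 38, 329000), "[BALL BOUNCING]")
print(first)           # 0 > [BALL BOUNCING]
print(first.start)     # 00:00:36.328000
print(first.duration)  # 0:00:02.001000
```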

Cleaners

Currently, 4 cleaners are provided:

  • ascii will translate every Unicode character to its ASCII equivalent.
  • brackets will remove bracketed text, including the brackets themselves (e.g., [BALL BOUNCING]).
  • formatting will remove formatting tags like <i> and </i>.
  • lower_case will convert all text to lowercase.

from pysubparser.cleaners import ascii, brackets, formatting, lower_case

subtitles = brackets.clean(
    lower_case.clean(
        subtitles
    )
)

for subtitle in subtitles:
    print(subtitle)

Output:

0 > 
1 > michael?
2 > what are you doing out here, son? it's after midnight.
3 > michael: couldn't sleep, pops.
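
The four cleaning steps can be approximated on plain strings as follows. This is a sketch under assumptions, not the library's implementation: every function name is hypothetical, the regular expressions are one plausible reading of "brackets" and "formatting", and to_ascii drops characters that have no ASCII decomposition rather than transliterating them.

```python
import re
import unicodedata


def strip_brackets(text):
    # Remove [ ... ] spans, brackets included, then trim leftover whitespace.
    return re.sub(r"\[[^\]]*\]", "", text).strip()


def strip_formatting(text):
    # Remove simple markup tags such as <i> and </i>.
    return re.sub(r"</?[a-zA-Z][^>]*>", "", text)


def to_ascii(text):
    # Decompose accented characters, then drop anything that is not ASCII.
    return unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")


def to_lower(text):
    return text.lower()


def clean_all(lines, *cleaners):
    """Lazily apply each cleaner, in order, to every line."""
    for line in lines:
        for cleaner in cleaners:
            line = cleaner(line)
        yield line


lines = ["[BALL BOUNCING]", "<i>Michael?</i>", "Café, Pops."]
print(list(clean_all(lines, strip_brackets, strip_formatting, to_ascii, to_lower)))
# ['', 'michael?', 'cafe, pops.']
```

As in the example above, a line that consists only of a bracketed cue ends up empty, so callers may want to filter out empty lines afterwards.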

Writers

Given a list of Subtitle objects and a path, the writer outputs those subtitles in SRT format.
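
For reference, the SRT layout such a writer has to produce looks like the output of the sketch below. It does not use pysub-parser's writer, whose exact call is not shown here. srt_timestamp and to_srt are hypothetical helpers, and entries are plain (start, end, text) tuples with datetime.time values assumed for the timestamps.

```python
from datetime import time


def srt_timestamp(value):
    """Format a datetime.time as HH:MM:SS,mmm (SRT puts a comma before milliseconds)."""
    return f"{value.strftime('%H:%M:%S')},{value.microsecond // 1000:03d}"


def to_srt(entries):
    """Render (start, end, text) tuples as SRT text; cue numbers start at 1."""
    blocks = []
    for number, (start, end, text) in enumerate(entries, start=1):
        blocks.append(f"{number}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    # A blank line separates consecutive cues.
    return "\n".join(blocks)


entries = [
    (time(0, 0, 36, 328000), time(0, 0, 38, 329000), "[BALL BOUNCING]"),
    (time(0, 1, 3, 814000), time(0, 1, 5, 189000), "Michael?"),
]
print(to_srt(entries))
```

Saving the result is then a matter of writing the returned string to the target path with the desired encoding, for example with pathlib.Path.write_text.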
