
hathix / youtube-transcriber

Licence: MIT License
Automatically transcribes YouTube videos

Programming Languages: Jupyter Notebook, Python


YouTube Transcriber

Need to transcribe a YouTube video? Turns out that YouTube already makes pretty good automated captions, even though this feature is little known. With this script, you can grab those captions in seconds -- so you can, effectively, automatically transcribe your YouTube video!

This script has been tested on several videos and should work with no setup besides having Python and Jupyter installed. Here's how you can do it!

Example

Consider this example video. YouTube provides a timedtext file for it (we'll show you how to get it in a bit) that contains the auto-generated captions. The timedtext looks a bit like this:

<timedtext format="3">
    <body>
    <p t="2810" d="7450">
When you grow up you, tend to get told that the world is the way it is and your life is
</p>
    <p t="10260" d="5299">
just to live your life inside the world, try not to bash into the walls too much, try to
</p>
    <p t="15559" d="6351">have a nice family, have fun, save a little money.</p>

    ...

    <p t="83960" d="1819">
Once you learn that, you’ll never be the same again.”
</p>
    </body>
</timedtext>

The script strips the markup and joins the captions into a single block of plain text:

When you grow up you, tend to get told that the world is the way it is and your life is just to live your life inside the world, try not to bash into the walls too much, try to have a nice family, have fun, save a little money. That’s a very limited life. Life can be much broader, once you discover one simple fact, and that is that everything around you that you call life was made up by people that were no smarter than you. And you can change it, you can influence it, you can build your own things that other people can use. And the minute that you understand that you can poke life and actually something will, you know if you push in, something will pop out the other side, that you can change it, you can mold it. That’s maybe the most important thing. It’s to shake off this erroneous notion that life is there and you’re just gonna live in it, versus embrace it, change it, improve it, make your mark upon it. I think that’s very important and however you learn that, once you learn it, you’ll want to change life and make it better, cause it’s kind of messed up, in a lot of ways. Once you learn that, you’ll never be the same again.”
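The conversion above can be sketched in a few lines of Python. This is a minimal illustration using the standard library, not necessarily how the notebook does it; the function name `timedtext_to_text` is made up for this example, and the sketch assumes every caption lives in a `<p>` element, as in the sample shown earlier.

```python
import xml.etree.ElementTree as ET


def timedtext_to_text(xml_string):
    """Join the text of every <p> element into one paragraph.

    Hypothetical helper for illustration; assumes captions are in <p> tags.
    """
    root = ET.fromstring(xml_string)
    pieces = []
    for p in root.iter("p"):
        # itertext() also picks up text nested inside child elements, if any
        text = "".join(p.itertext())
        # collapse newlines and repeated spaces inside a caption
        text = " ".join(text.split())
        if text:
            pieces.append(text)
    return " ".join(pieces)


sample = """<timedtext format="3">
    <body>
    <p t="2810" d="7450">
When you grow up you, tend to get told that the world is the way it is and your life is
</p>
    <p t="10260" d="5299">
just to live your life inside the world, try not to bash into the walls too much, try to
</p>
    <p t="15559" d="6351">have a nice family, have fun, save a little money.</p>
    </body>
</timedtext>"""

print(timedtext_to_text(sample))
```

Splitting and re-joining on whitespace is what removes the line breaks that surround each caption in the raw file.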

Getting started

First, fire up the Jupyter notebook. Here's how to install Jupyter, if you don't already have it.

Next you need to get the URL of a timedtext file from your YouTube video. This file contains YouTube's formatted, auto-generated transcription.

Start by opening up the YouTube video you want to automatically transcribe. Here's an example.

YouTube video home

Immediately pause the video. Now open the developer console (Cmd+Opt+K in Firefox on a Mac; in Chrome, Cmd+Opt+I opens the developer tools):

Open dev console

Then go to the "Network" tab:

Network tab

Now enter timedtext in the filter bar:

Search for timedtext

You should see nothing yet. Now turn on closed captions on the video (for example, by clicking the "CC" button):

Turn on closed captions

A request will then appear in the Network tab! Click its entry in the "Name" column and you'll see its Request URL:

timedtext appears

Copy that Request URL and open it in a new tab. You should see something like this:

Open timedtext

That's the URL you want! It'll be something like https://www.youtube.com/api/timedtext?caps&hl=en_US&expire=1502423643&v=0Ydp6bR5HXw....
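Before pasting the URL into the notebook, it can help to check that you copied the right request. The sketch below is an optional sanity check, not part of the script; `describe_timedtext_url` is a hypothetical helper, and the example string is only the visible part of the truncated URL above, so it can be parsed but not downloaded.

```python
from urllib.parse import urlparse, parse_qs


def describe_timedtext_url(url):
    """Check that a URL points at the timedtext endpoint and list its parameters.

    Hypothetical helper for illustration only.
    """
    parts = urlparse(url)
    if not parts.path.endswith("/api/timedtext"):
        raise ValueError("not a timedtext URL: " + url)
    # keep_blank_values keeps flags such as "caps" that have no "=value" part
    query = parse_qs(parts.query, keep_blank_values=True)
    return {
        "video_id": query.get("v", [None])[0],
        "params": sorted(query),
    }


# Only the visible part of the example URL; the real one carries more parameters.
example = "https://www.youtube.com/api/timedtext?caps&hl=en_US&expire=1502423643&v=0Ydp6bR5HXw"
info = describe_timedtext_url(example)
print(info["video_id"])
print(info["params"])
```

If the video ID printed here does not match the one in your video's address bar, you probably copied a request belonging to a different video.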

Finally, open the IPython (Jupyter) notebook, paste that URL into the parameters field, and run the script. The text will be printed and also written to tmp/out.txt!
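If you would rather see the whole flow in one place, here is a rough sketch of what this step amounts to: download the timedtext XML, flatten it, print it, and write it to tmp/out.txt. The notebook's own code is not reproduced here, so the function names `fetch_timedtext` and `save_transcript` are assumptions made for this example, as is the use of the standard library's `urllib`.

```python
import os
import urllib.request
import xml.etree.ElementTree as ET


def fetch_timedtext(url):
    """Download the raw timedtext XML (needs network access and a valid URL)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def save_transcript(xml_string, out_path="tmp/out.txt"):
    """Flatten the captions, print them, and write them to out_path.

    Hypothetical helper; assumes captions are stored in <p> elements.
    """
    root = ET.fromstring(xml_string)
    pieces = []
    for p in root.iter("p"):
        # normalise whitespace inside each caption
        text = " ".join("".join(p.itertext()).split())
        if text:
            pieces.append(text)
    transcript = " ".join(pieces)

    # create the output folder if it does not exist yet
    os.makedirs(os.path.dirname(out_path) or ".", exist_ok=True)
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(transcript)

    print(transcript)
    return transcript


# Usage, with the URL you copied from the Network tab:
# save_transcript(fetch_timedtext("PASTE_YOUR_TIMEDTEXT_URL_HERE"))
```

The URLs contain an `expire` parameter, so a link copied earlier may stop working; if the download fails, copy a fresh one from the Network tab.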

iPython output

Pitfalls

This script may not capture all the punctuation and capitalization that you need, and it might not transcribe some words properly. (YouTube's transcription is good, but not perfect!)

As such, I recommend going through the output manually to make sure you catch anything YouTube missed.
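One way to make that manual pass easier is to keep each caption's start time next to its text, so you can jump to the right spot in the video when something looks wrong. The sketch below is an optional add-on, not something the script does; `captions_with_timestamps` is a hypothetical name, and it assumes the `t` attribute is a start time in milliseconds, which is consistent with the sample above (2810 + 7450 = 10260, the next caption's `t`).

```python
import xml.etree.ElementTree as ET


def captions_with_timestamps(xml_string):
    """Return (mm:ss, text) pairs, one per non-empty <p> element.

    Hypothetical helper; assumes the "t" attribute is in milliseconds.
    """
    rows = []
    for p in ET.fromstring(xml_string).iter("p"):
        text = " ".join("".join(p.itertext()).split())
        if not text:
            continue
        start_ms = int(p.get("t", 0))
        minutes, seconds = divmod(start_ms // 1000, 60)
        rows.append((f"{minutes:02d}:{seconds:02d}", text))
    return rows


sample = """<timedtext format="3">
    <body>
    <p t="2810" d="7450">
When you grow up you, tend to get told that the world is the way it is and your life is
</p>
    <p t="83960" d="1819">
Once you learn that, you’ll never be the same again.”
</p>
    </body>
</timedtext>"""

for stamp, text in captions_with_timestamps(sample):
    print(stamp, text)
```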

Results

I used this script for a project where we had to transcribe several YouTube videos with just hours to spare before the deadline.

The script let us transcribe a 10-minute video in about 15 minutes (most of that time went to extensive hand-edits), whereas transcribing manually would have taken about an hour.
