smlum / scription

Licence: AGPL-3.0 license
An editor for speech-to-text transcripts such as AWS Transcribe and Mozilla DeepSpeech

Scription ✍️

Scription is an editor for automated transcription services like Amazon Transcribe and Mozilla DeepSpeech. It links transcript text to audio playback to bring love and joy to the transcription process ❤️. It's currently being developed bit by bit, so if you have any feedback please feel free to send me a message.

Visit the Scription web app.

What Scription does

  • Highlight and scroll text as the audio plays
  • Control audio playback by clicking words in the text
  • Skip around in the audio with keyboard shortcuts
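
As a rough illustration of how text can be linked to audio: if each word carries a start time in seconds, the word to highlight at a given playback position is the last word that starts at or before that position. The sketch below finds it with a binary search. The word shape and the name `activeWordIndex` are hypothetical and are not taken from Scription's source.

```javascript
// Hypothetical word shape: { text: string, start: number }, with `start` in
// seconds and the array sorted by ascending start time.
function activeWordIndex(words, currentTime) {
  let lo = 0;
  let hi = words.length - 1;
  let found = -1; // -1 means playback has not reached the first word yet
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (words[mid].start <= currentTime) {
      found = mid; // candidate; look for a later word that also qualifies
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  return found;
}

const words = [
  { text: "hello", start: 0.0 },
  { text: "there", start: 0.6 },
  { text: "world", start: 1.2 },
];
console.log(activeWordIndex(words, 0.7)); // 1
```

In a browser, a function like this could be called from an audio element's `timeupdate` event with `audio.currentTime`. Clicking a word is the reverse operation: set `audio.currentTime` to that word's start time.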

And some other useful stuff:

  • Highlight quotes and export them to CSV
  • Separate speech by speakers (AWS)
  • Highlight low confidence words (AWS)
  • Add punctuation (AWS)
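
The AWS-only features rely on extra data in the Amazon Transcribe output, which stores a confidence score with every word. A minimal sketch of flagging low-confidence words, assuming the usual `results.items` layout of a Transcribe JSON file, is shown below. The function name, the returned object shape and the 0.8 threshold are assumptions for illustration, not Scription's own code.

```javascript
// Collect spoken words from an Amazon Transcribe result and flag those whose
// confidence is below `threshold` (0.8 is an arbitrary, assumed default).
function flagLowConfidence(transcribeJson, threshold = 0.8) {
  return transcribeJson.results.items
    .filter((item) => item.type === "pronunciation") // skip punctuation items
    .map((item) => {
      const best = item.alternatives[0];
      // Transcribe stores times and confidence as strings, so convert them.
      const confidence = parseFloat(best.confidence);
      return {
        text: best.content,
        start: parseFloat(item.start_time),
        end: parseFloat(item.end_time),
        confidence,
        lowConfidence: confidence < threshold,
      };
    });
}

// Trimmed-down sample in the shape of a Transcribe output file.
const sample = {
  results: {
    items: [
      { type: "pronunciation", start_time: "0.04", end_time: "0.45",
        alternatives: [{ confidence: "0.99", content: "Hello" }] },
      { type: "punctuation",
        alternatives: [{ confidence: "0.0", content: "," }] },
      { type: "pronunciation", start_time: "0.50", end_time: "0.98",
        alternatives: [{ confidence: "0.41", content: "wurld" }] },
    ],
  },
};

const flagged = flagLowConfidence(sample).filter((w) => w.lowConfidence);
console.log(flagged.map((w) => w.text)); // only "wurld" is flagged
```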

Get started

Basic usage

  1. Run a transcription job using Amazon Transcribe or Mozilla DeepSpeech
  2. Download the JSON output file
  3. Load the JSON file into Scription
  4. Load in your corresponding audio (see below for large audio files)
  5. You're good to go!

Saving and loading a project

'Save project' creates a text file which you can load into Scription at a later time. It preserves any text edits and annotations.

If 'Autosave' is turned on, Scription saves your edits every 5 seconds using cookies. This is less secure than saving a project file, but your edits should still be there if you refresh the page.

Exporting

'Export text' creates a plain text file which includes the speaker tags. It is essentially the same as copying and pasting the transcript.

'Export annotations' creates a CSV file listing the highlighted quotes by category.
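
To illustrate what such a CSV export involves, here is a minimal sketch that turns a list of annotations into CSV text, quoting fields that contain commas or quotation marks. The annotation shape and the `category,quote` column layout are assumptions. The file Scription actually produces may be laid out differently.

```javascript
// Quote a CSV field when it contains a comma, quote or line break,
// doubling any embedded quotes (the convention described in RFC 4180).
function csvField(value) {
  const text = String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Hypothetical annotation shape: { category: string, quote: string }.
function annotationsToCsv(annotations) {
  const rows = [["category", "quote"]];
  for (const a of annotations) rows.push([a.category, a.quote]);
  return rows.map((row) => row.map(csvField).join(",")).join("\r\n");
}

const csv = annotationsToCsv([
  { category: "pricing", quote: "It costs money, but it is accurate" },
  { category: "privacy", quote: 'She said "keep it local"' },
]);
console.log(csv);
```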

Audio control shortcuts

Audio playback can be controlled using keyboard shortcuts:

  • Go back 5s: Ctrl + ,
  • Skip 5s: Ctrl + .
  • Slow down: Ctrl + Shift + ,
  • Speed up: Ctrl + Shift + .
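
One way shortcuts like these could be wired up is sketched below, assuming a standard `keydown` event and an audio element. It uses `event.code`, which names the physical key, so holding Shift does not change the value. The function names and the 0.25 speed step are assumptions, not Scription's actual implementation.

```javascript
// Translate a keydown event into a playback action, or null if the key
// combination is not one of the shortcuts listed above.
function shortcutAction(event) {
  if (!event.ctrlKey) return null;
  if (event.code !== "Comma" && event.code !== "Period") return null;
  const direction = event.code === "Comma" ? -1 : 1;
  return event.shiftKey
    ? { type: "rate", delta: direction * 0.25 } // speed step is an assumption
    : { type: "seek", delta: direction * 5 }; // seconds
}

// Apply an action to anything with `currentTime` and `playbackRate`
// properties, such as an HTML <audio> element.
function applyAction(player, action) {
  if (action.type === "seek") {
    player.currentTime = Math.max(0, player.currentTime + action.delta);
  } else {
    player.playbackRate = Math.max(0.25, player.playbackRate + action.delta);
  }
}

// A plain object stands in for the audio element here.
const player = { currentTime: 12, playbackRate: 1 };
applyAction(player, shortcutAction({ ctrlKey: true, shiftKey: false, code: "Comma" }));
console.log(player.currentTime); // 7
```

In a page, `shortcutAction` would be called from a `keydown` listener on the document, and the result passed to `applyAction` only when it is not null.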

Uploading large audio files to Scription

Large audio files (above ~50 MB) can cause playback issues, as can files with variable bitrates. Ideally, keep each file under 50 MB.

To get around this you can compress the audio down to a smaller file size. I recommend using a lossy file format (like mp3). It also helps to convert the audio to mono, use a constant bitrate and reduce the bitrate.
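
A quick back-of-the-envelope calculation shows why the bitrate matters: at a constant bitrate, file size is roughly the bitrate multiplied by the duration. The sketch below ignores headers and metadata, and the 128 kbps figure is only an assumed example of a common mp3 bitrate.

```javascript
// Approximate size of a constant-bitrate audio file, ignoring headers
// and metadata. 1 MB is taken as 10^6 bytes.
function estimateSizeMB(bitrateKbps, durationSeconds) {
  const bits = bitrateKbps * 1000 * durationSeconds;
  return bits / 8 / 1e6; // bits -> bytes -> megabytes
}

const oneHour = 60 * 60;
console.log(estimateSizeMB(128, oneHour)); // about 57.6, over the ~50 MB mark
console.log(estimateSizeMB(8, oneHour)); // about 3.6
```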

You can adjust these settings manually using something like Audacity's "export to mp3".

This can be a pain for multiple files. I used the following ffmpeg command to iterate through a folder of mp3 files, set the bitrate to 8 kbps and the sample rate to 8,000 Hz, convert to mono and save new audio files with the '.min.mp3' suffix (files that already end in '.min.mp3' are skipped):

find . -name "*.mp3" ! -name "*.min.mp3" -exec sh -c 'ffmpeg -i "$1" -codec:a libmp3lame -b:a 8k -ac 1 -ar 8000 "${1%.mp3}.min.mp3"' _ {} \;

Comparing AWS Transcribe and Mozilla DeepSpeech

Amazon and Mozilla both offer automated speech-to-text services.

Amazon has a (fairly) easy-to-use web user interface, high accuracy and lots of useful features, like speaker identification, custom vocabulary and punctuation. However, it costs money (about 1.44 USD per hour) and requires you to store data on their servers, which could be a privacy concern.

DeepSpeech is free and runs locally on your machine, so there are no privacy concerns. However, it requires you to download and run their pre-trained model using Python from the command line. The accuracy is pretty average. You need to add your own punctuation, correct specialised vocabulary and separate speakers. It also requires specific audio formats.

For a quick comparison, I considered price, setup, privacy, performance and features:

            AWS Transcribe                    Mozilla DeepSpeech
cost        ~1.44 USD per hour                free
setup       web user interface                Python/command line
privacy     data saved on Amazon's servers    data saved locally
accuracy    good                              ok
features    lots                              text only

There are other big tech speech-to-text services from Google, IBM and Microsoft.

Set up AWS Transcribe

  • Follow their instructions
  • Requires setting up an account, creating an S3 bucket, adding payment info and creating a job on Transcribe.

Set up Mozilla DeepSpeech

  • Follow their instructions
  • It helps to have some basic familiarity with Python and the command line
  • They have quite tight requirements for audio formats. The audio needs to be a mono .wav file with a sample rate of 16,000 Hz.
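
Before running a transcription, it can be handy to check whether a file already meets these audio requirements. The sketch below reads the channel count and sample rate from a WAV header. It assumes the canonical 44-byte layout, where the "fmt " chunk comes straight after the RIFF header. That layout is common but not guaranteed, so treat this as an illustration, not a full WAV parser. The function names are hypothetical.

```javascript
// Read the format fields from a canonical 44-byte PCM WAV header.
// Assumes the "fmt " chunk directly follows the RIFF header; files with
// other chunks first would need a proper chunk-walking parser.
function readWavFormat(arrayBuffer) {
  const view = new DataView(arrayBuffer);
  const tag = (offset) =>
    String.fromCharCode(...[0, 1, 2, 3].map((i) => view.getUint8(offset + i)));
  if (tag(0) !== "RIFF" || tag(8) !== "WAVE" || tag(12) !== "fmt ") {
    throw new Error("Not a canonical WAV file");
  }
  return {
    channels: view.getUint16(22, true), // WAV fields are little-endian
    sampleRate: view.getUint32(24, true),
  };
}

// The requirements listed above: mono, 16,000 Hz.
function meetsRequirements(format) {
  return format.channels === 1 && format.sampleRate === 16000;
}

// Build a minimal header in memory to exercise the check.
const header = new ArrayBuffer(44);
const v = new DataView(header);
[..."RIFF"].forEach((c, i) => v.setUint8(i, c.charCodeAt(0)));
[..."WAVE"].forEach((c, i) => v.setUint8(8 + i, c.charCodeAt(0)));
[..."fmt "].forEach((c, i) => v.setUint8(12 + i, c.charCodeAt(0)));
v.setUint16(22, 1, true); // mono
v.setUint32(24, 16000, true); // 16,000 Hz
console.log(meetsRequirements(readWavFormat(header))); // true
```

In a browser, the buffer for a user-selected file can be obtained with `await file.arrayBuffer()`.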

Cleaning audio for transcription

To use automated transcription services you may need to format audio in a particular way or clean it up (e.g. remove noise). I recommend Audacity for manual audio editing and formatting, or ffmpeg for automated batch formatting.

Run Scription locally

  1. Clone the repository:
git clone https://github.com/smlum/scription
cd scription
  2. Install packages (requires Node.js):
npm install
  3. Run on a local server:
npm run start

or for development (with browser sync):

npm run dev

Privacy

The Scription web app uses your browser's local storage. The app does not upload anything to another server.

Contributing

Pull requests are welcome! For major changes, please open an issue first to discuss what you'd like to change.

Credits

Scription is built using Bulma and hyperaudio.

Thanks to likeleto for adding Google and Yandex support.

Support

If you need some help setting up Scription, want to ask a question or simply want to get involved in the community, feel free to give me a shout.

License

Scription was created by Sam Lumley and is licensed under the open source AGPLv3 license. If you're interested in using it in a proprietary application, feel free to get in touch!