kwea123 / Unity_live_caption

Licence: MIT license
Use Google Speech-to-Text API to do real-time live stream caption on Unity! Best when combined with your virtual character!

Programming languages: Python, C#

Unity_live_caption

For an explanation in Traditional Chinese, see here.

Use Google Speech-to-Text API to do real-time live stream caption on Unity! Best when combined with your virtual character!

This is part of the OpenVTuberProject, which provides many toolkits for becoming a VTuber.

Important notice before you continue: the Speech-to-Text API is NOT free! The pricing guide is here.

A YouTube livestream demonstrates and explains how this works (explanation in Chinese, captions in Chinese/Japanese/English/French).

Currently, the live captioning is done in Python and the result is sent to Unity in real time. There might be a way to do everything in C# (maybe this), but I did it in Python for a few reasons:

  1. I'm not fluent in C#.
  2. Running speech recognition in a separate program lets you start or stop recognition at any time and change the language at will, without restarting the Unity .exe.
  3. There is already an asset that claims to do this (I don't know whether it can do real-time recognition, though).
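
The split described above, with Python doing the recognition and Unity only displaying the result, can be sketched from the Python side as follows. This is a minimal sketch, not the project's actual code: the TCP transport, the UTF-8 encoding, the localhost address and the function name send_caption are all assumptions. Only the default port 5067 comes from this README.

```python
import socket

HOST = "127.0.0.1"  # assumption: Unity runs on the same machine
PORT = 5067         # default port named in the Customization section


def send_caption(text: str, host: str = HOST, port: int = PORT) -> None:
    """Send one caption to a listener as UTF-8 bytes over TCP.

    TCP and UTF-8 are assumptions made for this sketch; the real
    googlesr.py / subtitleListener.cs pair may use another transport.
    """
    # The listener (Unity) must already be running, otherwise this raises
    # ConnectionRefusedError. That matches the "run Unity FIRST" advice.
    with socket.create_connection((host, port), timeout=2.0) as conn:
        conn.sendall(text.encode("utf-8"))
```

With a listener running on that port, send_caption("hello") delivers one caption. Because the sender is a separate process, it can be stopped and restarted without touching the Unity side.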

Pre-requisite

As this process uses the Google Cloud API, you need a Google account.

Follow the instructions on the website to enable the Speech-to-Text API in the console, and download the API key, which should be a .json file. I will refer to this key as key.json below.
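
Google's client libraries normally locate a service-account key through the GOOGLE_APPLICATION_CREDENTIALS environment variable. The sketch below checks that key.json exists and parses as JSON before exporting that variable. It assumes the downloaded file is a service-account key; the helper name use_key_file is hypothetical, and googlesr.py may set the path differently.

```python
import json
import os
from pathlib import Path


def use_key_file(key_path: str) -> str:
    """Validate a key file and export it for Google client libraries.

    Hypothetical helper for illustration; not part of this project.
    Returns the absolute path that was exported.
    """
    path = Path(key_path).expanduser().resolve()
    if not path.is_file():
        raise FileNotFoundError(f"API key file not found: {path}")
    # Fail early with a clear error if the file is not valid JSON.
    with path.open(encoding="utf-8") as fh:
        json.load(fh)
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = str(path)
    return str(path)
```

Calling use_key_file("key.json") before starting recognition surfaces a wrong path immediately instead of at the first API request.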

Next, the program comes in a command-line (CLI) version and a GUI version. The code is the same, but there are some practical differences:

CLI: file size is small and allows more customization.

GUI: file size is large (about 250MB) and the speech-to-text program takes some time to warm up.

The command-line tutorial follows. GUI users, please jump to here.

Installation

Make sure you have Python installed. If not, installing it via Anaconda with Python 3.6 is recommended (with other versions, you need to manually download and install pyaudio from here).

Run pip install -r requirements.txt to install the Python dependencies.
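
After installing, a quick check like the one below can confirm which Python is running and whether pyaudio is importable. It is only a sketch: the function name missing_modules is hypothetical, and pyaudio is the only dependency checked because it is the only one this README names.

```python
import importlib.util
import sys


def missing_modules(required=("pyaudio",)):
    """Return the names in `required` that cannot be imported."""
    return [name for name in required
            if importlib.util.find_spec(name) is None]


print("Python version:", sys.version.split()[0])
print("Missing modules:", missing_modules() or "none")
```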

Usage

  1. Test whether speech recognition works in Python:

    1. Change the path here to where your key.json is located.
    2. Run python googlesr.py --debug --lang_code={YOUR LANGUAGE CODE}. For the language codes, check here. You should see the recognition output on the console.
  2. Output the recognition result to Unity:

    1. Create a Text component via GameObject->UI->Text.
    2. Attach subtitleListener.cs to it.
    3. Run the Unity program FIRST, either in the editor or as an executable, then run python googlesr.py --lang_code={YOUR LANGUAGE CODE} --connect. You should now see the recognition output in Unity. You can stop and restart the recognition at any time by pressing Ctrl+C in the Python console, without affecting the Unity program at all.
  3. Remember to stop the Python program when you finish, otherwise it will keep charging you! I disclaim any responsibility for charges incurred by using my program.
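
The flags used in the steps above could be parsed roughly as shown below. This is a sketch of how such a command line can be built with argparse, not the actual contents of googlesr.py: the flag names come from this README, while the help texts and the "en-US" default are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the three flags named in the Usage steps."""
    parser = argparse.ArgumentParser(
        description="Live caption sketch (illustrative only)")
    parser.add_argument(
        "--lang_code", default="en-US",  # default value is an assumption
        help="language code of the speech to recognize, e.g. ja-JP")
    parser.add_argument(
        "--debug", action="store_true",
        help="print the recognition result to the console")
    parser.add_argument(
        "--connect", action="store_true",
        help="send the recognition result to Unity")
    return parser


args = build_parser().parse_args(["--debug", "--lang_code=ja-JP"])
print(args.lang_code, args.debug, args.connect)
```

Both --debug and --connect are plain switches here, so they can be combined to print to the console and send to Unity in the same run.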

Customization

  1. You can change the connection port by changing the port number (default 5067) here and here.

  2. You can change how the text is printed in Unity here and here. The default is configured to print at most 32 Chinese characters, so you might need to change it if you're not using Chinese.
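
To illustrate why a 32-character limit tuned for Chinese may not suit other languages, the sketch below trims a caption by display width rather than by character count. Everything here is an assumption made for illustration: the names display_width and fit_caption, the budget of 64 columns (32 full-width characters), and the choice to keep the most recent text. The project's own trimming logic may differ.

```python
import unicodedata


def display_width(text: str) -> int:
    """Count wide/full-width characters as 2 columns, all others as 1."""
    return sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
               for ch in text)


def fit_caption(text: str, max_columns: int = 64) -> str:
    """Drop the oldest characters until the caption fits max_columns."""
    width = display_width(text)
    start = 0
    while width > max_columns and start < len(text):
        width -= display_width(text[start])
        start += 1
    return text[start:]
```

With this budget, a Chinese caption keeps its last 32 characters while an English one keeps its last 64, so both fill roughly the same on-screen width.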

GUI usage

  1. Download googlesr_gui_english.zip from here.

  2. Open googlesr_gui_english.exe and you will see the following window:

[screenshot]

  3. Select your language, set the API key to where you downloaded key.json, and select whether to connect to Unity and/or print to the console (if you want to connect to Unity, please see the second point here).

  4. Press Start to start. It takes some time to warm up. When it's ready, you will see the following and you can start to talk. You can adjust the size of this window.

[screenshot]

  5. Press Ctrl+C to stop the program when you finish.

  6. Remember to stop the program when you finish, otherwise it will keep charging you! I disclaim any responsibility for charges incurred by using my program.

Other issues

For anything else, please open an issue.