gillesdemey / Google Speech V2

💬 Reverse Engineering Google's Speech To Text API (v2)

Labels

audio text-to-speech

Projects that are alternatives of or similar to Google Speech V2

Go Astibob

Golang framework to build an AI that can understand and speak back to you, and everything else you want

Stars: ✭ 222 (-48.97%)

Mutual labels: audio, text-to-speech

Aeneas

aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment)

Stars: ✭ 1,942 (+346.44%)

Mutual labels: audio, text-to-speech

Dx7 Supercollider

My accurate Yamaha DX-7 clone. Programmed in Supercollider.

Stars: ✭ 395 (-9.2%)

Mutual labels: audio

Bitmidi.com

🎹 Listen to free MIDI songs, download the best MIDI files, and share the best MIDIs on the web

Stars: ✭ 422 (-2.99%)

Mutual labels: audio

Web Audio Samples

Web Audio API samples by Chrome WebAudio Team

Stars: ✭ 402 (-7.59%)

Mutual labels: audio

Pyaudioanalysis

Python Audio Analysis Library: Feature Extraction, Classification, Segmentation and Applications

Stars: ✭ 4,487 (+931.49%)

Mutual labels: audio

Rust Av

Multimedia Toolkit written in pure rust.

Stars: ✭ 411 (-5.52%)

Mutual labels: audio

Mystiq

Qt5/C++ FFmpeg Media Converter

Stars: ✭ 393 (-9.66%)

Mutual labels: audio

Ffmpegcore

A .NET FFMpeg/FFProbe wrapper for easily integrating media analysis and conversion into your C# applications

Stars: ✭ 429 (-1.38%)

Mutual labels: audio

Ytmdl Web V2

Web version of ytmdl. Allows downloading songs with metadata embedded from various sources like itunes, gaana, LastFM etc.

Stars: ✭ 398 (-8.51%)

Mutual labels: audio

Lavalink

Standalone audio sending node based on Lavaplayer.

Stars: ✭ 420 (-3.45%)

Mutual labels: audio

Audiofile

A simple C++ library for reading and writing audio files.

Stars: ✭ 399 (-8.28%)

Mutual labels: audio

Auto Editor

Auto-Editor: Effort free video editing!

Stars: ✭ 382 (-12.18%)

Mutual labels: audio

Recordmp3js

Record MP3 files directly from the browser using JS and HTML

Stars: ✭ 413 (-5.06%)

Mutual labels: audio

Free Spoken Digit Dataset

A free audio dataset of spoken digits. Think MNIST for audio.

Stars: ✭ 396 (-8.97%)

Mutual labels: audio

Audiogridder

DSP servers using general purpose networks and computers - https://audiogridder.com

Stars: ✭ 423 (-2.76%)

Mutual labels: audio

Android Openslmediaplayer

Re-implementation of Android's MediaPlayer and audio effect classes based on OpenSL ES APIs.

Stars: ✭ 393 (-9.66%)

Mutual labels: audio

Matchering

🎚️ Open Source Audio Matching and Mastering

Stars: ✭ 398 (-8.51%)

Mutual labels: audio

Flexasio

A flexible universal ASIO driver that uses the PortAudio sound I/O library. Supports WASAPI (shared and exclusive), KS, DirectSound and MME.

Stars: ✭ 403 (-7.36%)

Mutual labels: audio

Labsound

🔬 🔈 graph-based audio engine

Stars: ✭ 429 (-1.38%)

Mutual labels: audio

Google Speech API v2:

NOTICE

Google has since launched its official Google Cloud Speech API. I strongly recommend looking there instead.

Host:

https://www.google.com/speech-api/v2/recognize

Parameters

output: json (xml is not supported)

lang: any valid locale (en-us, nl-be, fr-fr, etc.)

key: required; get one from the Google Developers Console.

app: optional

You can specify an optional query string parameter called app, which appears to return some extra transcripts, for reasons that are unclear.

client: optional; seems to have no noticeable effect.
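The parameters above all end up in the query string. The following is a minimal Python sketch of assembling that URL. It assumes only `output`, `lang` and `key` are sent (the optional `app` and `client` parameters are left out). `build_recognize_url` is a hypothetical helper name, and `yourkey` is a placeholder.

```python
from urllib.parse import urlencode

# Endpoint documented above.
ENDPOINT = "https://www.google.com/speech-api/v2/recognize"


def build_recognize_url(key: str, lang: str = "en-us", output: str = "json") -> str:
    """Assemble the recognize URL from the query parameters described above."""
    if not key:
        # The key parameter is not optional.
        raise ValueError("an API key is required")
    if output != "json":
        # Only JSON output is supported; xml is not.
        raise ValueError("only output=json is supported")
    # urlencode keeps the dict's insertion order.
    query = urlencode({"output": output, "lang": lang, "key": key})
    return f"{ENDPOINT}?{query}"
```

`build_recognize_url("yourkey")` yields the same URL the curl examples use.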

Data:

FLAC

FLAC file: 44100 Hz, 32-bit float, exported with Audacity. Check the audio folder in this repository for some hilarious examples.

Channels       : 2
Sample Rate    : 44100
Precision      : 32-bit
Sample Encoding: 32-bit Float

16-bit PCM

The following audio options are confirmed working for 16-bit PCM sample encoding:

Channels       : 1
Sample Rate    : 16000
Precision      : 16-bit
Sample Encoding: 16-bit Signed Integer PCM
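You can confirm that a recording matches these settings before sending it by reading its header with Python's standard `wave` module. The following is a sketch. `check_pcm16_wav` is a hypothetical helper, and it only handles uncompressed PCM WAV files, such as the one the sox command produces. In WAV, 16-bit samples are signed integers, so checking for a 2-byte sample width covers "16-bit Signed Integer PCM".

```python
import wave
from typing import List

# Settings confirmed above for 16-bit PCM.
EXPECTED_CHANNELS = 1
EXPECTED_SAMPLE_WIDTH = 2  # bytes per sample, i.e. 16-bit
EXPECTED_RATE = 16000


def check_pcm16_wav(source) -> List[str]:
    """Return mismatches between a WAV file and the confirmed settings.

    source may be a path or a binary file-like object.
    """
    problems = []
    with wave.open(source, "rb") as wav:
        if wav.getnchannels() != EXPECTED_CHANNELS:
            problems.append(f"channels: {wav.getnchannels()} (expected {EXPECTED_CHANNELS})")
        if wav.getsampwidth() != EXPECTED_SAMPLE_WIDTH:
            problems.append(f"sample width: {wav.getsampwidth() * 8}-bit (expected 16-bit)")
        if wav.getframerate() != EXPECTED_RATE:
            problems.append(f"sample rate: {wav.getframerate()} (expected {EXPECTED_RATE})")
    return problems
```

For example, `check_pcm16_wav("test.wav")` returns an empty list when the file matches.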

One-line sox recording command:

rec --encoding signed-integer --bits 16 --channels 1 --rate 16000 test.wav

Headers:

Content-Type:

Content-Type: audio/x-flac; rate=44100;

Set the rate to match the sample rate of the FLAC file (generally 44100 Hz); other rates are also supported.

Content-Type: audio/l16; rate=16000; is also supported for files encoded as 16-bit signed-integer linear PCM, with a rate of either 44100 Hz or 16000 Hz.

NOTE: Make sure the rate in your header matches the sample rate you used for your audio capture.
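The rate in the header has to match the audio, so it is safer to read it from the file than to type it by hand. The following is a sketch for WAV input using the standard `wave` module. The helper names are hypothetical. The FLAC variant simply formats whatever rate you pass, because the standard library cannot read FLAC headers.

```python
import wave


def content_type_for_wav(source) -> str:
    """Build an audio/l16 Content-Type whose rate matches the WAV file's sample rate."""
    with wave.open(source, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("audio/l16 expects 16-bit samples")
        rate = wav.getframerate()
    # Same shape as the header used in the examples: "audio/l16; rate=16000;"
    return f"audio/l16; rate={rate};"


def content_type_for_flac(rate: int = 44100) -> str:
    """FLAC header; pass the FLAC file's own sample rate."""
    return f"audio/x-flac; rate={rate};"
```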

User-Agent:

Not required, but if you want the request to look like it came from a browser, use one of Chrome's User-Agent strings.

Response:

When Google is 100% confident in its transcription, it returns the following object:

{
   "result":[
      {
         "alternative":[
            {
               "transcript":"good morning Google how are you feeling today"
            }
         ],
         "final":true
      }
   ],
   "result_index":0
}

When it is less certain, it adds a confidence field to the top alternative. It also seems to return multiple alternative transcripts, for reasons that are unclear.

{
  "result":[
    {
      "alternative":[
        {
          "transcript":"this is a test",
          "confidence":0.97321892
        },
        {
          "transcript":"this is a test for"
        }
      ],
      "final":true
    }
  ],
  "result_index":0
}
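The following is a sketch of pulling the top transcript out of a response like the ones above. It takes the first alternative of the first non-empty result and returns its confidence, or `None` when that field is absent. It also tolerates several JSON objects in one body and skips any with an empty `result` list. That is a defensive assumption, not something the examples above show. `best_transcript` is a hypothetical name.

```python
import json
from typing import Iterator, Optional, Tuple


def _json_objects(body: str) -> Iterator[dict]:
    """Yield each JSON object in body, however the objects are separated."""
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(body) and body[pos].isspace():
            pos += 1
        if pos >= len(body):
            return
        obj, pos = decoder.raw_decode(body, pos)
        yield obj


def best_transcript(body: str) -> Optional[Tuple[str, Optional[float]]]:
    """Return (transcript, confidence) for the top alternative of the first non-empty result."""
    for data in _json_objects(body):
        for result in data.get("result", []):
            alternatives = result.get("alternative", [])
            if alternatives:
                top = alternatives[0]
                # Per the examples above, confidence is omitted when Google is fully confident.
                return top["transcript"], top.get("confidence")
    return None
```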

Example

Install sox

On OS X with Homebrew installed:

brew install sox

Record audio

rec --encoding signed-integer --bits 16 --channels 1 --rate 16000 test.wav

Send the request

curl -X POST \
--data-binary @'audio/hello (16bit PCM).wav' \
--header 'Content-Type: audio/l16; rate=16000;' \
'https://www.google.com/speech-api/v2/recognize?output=json&lang=en-us&key=yourkey'

Or for FLAC encoded audio:

curl -X POST \
--data-binary @audio/good-morning-google.flac \
--header 'Content-Type: audio/x-flac; rate=44100;' \
'https://www.google.com/speech-api/v2/recognize?output=json&lang=en-us&key=yourkey'
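The following sketches the same request using Python's standard library. `build_request` is a hypothetical helper and `yourkey` is a placeholder. The optional `user_agent` argument corresponds to the User-Agent note above.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "https://www.google.com/speech-api/v2/recognize"


def build_request(audio: bytes, content_type: str, key: str,
                  lang: str = "en-us", user_agent: Optional[str] = None) -> Request:
    """Build the same POST that the curl examples send."""
    query = urlencode({"output": "json", "lang": lang, "key": key})
    headers = {"Content-Type": content_type}
    if user_agent:
        # Optional: a Chrome User-Agent string, as suggested above.
        headers["User-Agent"] = user_agent
    # Supplying data already implies POST; the method is set explicitly for clarity.
    return Request(f"{ENDPOINT}?{query}", data=audio, headers=headers, method="POST")
```

To send it, pass the returned object to `urllib.request.urlopen` and read the response body, which should contain JSON like the examples above.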

Caveats

Here are a few caveats you should know about if you decide to use this API in a production environment (I don't recommend it):

The API only accepts up to ~10-15 seconds of audio.
With your own Speech API key, you can make only 50 requests per day.
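One possible workaround for the length limit is to split a longer WAV recording into shorter pieces and send each piece separately. The following is a sketch using the standard `wave` module, not something this README has tested. `split_wav` is a hypothetical helper. The 10-second default is chosen from the low end of the limit above. Cutting at fixed boundaries can split words, so transcripts near chunk edges may be wrong, and each chunk still counts against the daily request limit.

```python
import io
import wave
from typing import List


def split_wav(source, max_seconds: float = 10.0) -> List[bytes]:
    """Split a WAV file into WAV-encoded chunks of at most max_seconds each."""
    chunks = []
    with wave.open(source, "rb") as src:
        params = src.getparams()
        frames_per_chunk = max(1, int(params.framerate * max_seconds))
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:  # end of the audio data
                break
            buf = io.BytesIO()
            with wave.open(buf, "wb") as dst:
                # Same channels, sample width and rate as the source file.
                dst.setnchannels(params.nchannels)
                dst.setsampwidth(params.sampwidth)
                dst.setframerate(params.framerate)
                dst.writeframes(frames)
            chunks.append(buf.getvalue())
    return chunks
```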

Stars: ✭ 435