
gillesdemey / Google Speech V2

💬 Reverse Engineering Google's Speech To Text API (v2)

Projects that are alternatives to or similar to Google Speech V2

Go Astibob
Golang framework to build an AI that can understand and speak back to you, and everything else you want
Stars: ✭ 222 (-48.97%)
Mutual labels:  audio, text-to-speech
Aeneas
aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment)
Stars: ✭ 1,942 (+346.44%)
Mutual labels:  audio, text-to-speech
Dx7 Supercollider
My accurate Yamaha DX-7 clone. Programmed in Supercollider.
Stars: ✭ 395 (-9.2%)
Mutual labels:  audio
Bitmidi.com
🎹 Listen to free MIDI songs, download the best MIDI files, and share the best MIDIs on the web
Stars: ✭ 422 (-2.99%)
Mutual labels:  audio
Web Audio Samples
Web Audio API samples by Chrome WebAudio Team
Stars: ✭ 402 (-7.59%)
Mutual labels:  audio
Pyaudioanalysis
Python Audio Analysis Library: Feature Extraction, Classification, Segmentation and Applications
Stars: ✭ 4,487 (+931.49%)
Mutual labels:  audio
Rust Av
Multimedia Toolkit written in pure rust.
Stars: ✭ 411 (-5.52%)
Mutual labels:  audio
Mystiq
Qt5/C++ FFmpeg Media Converter
Stars: ✭ 393 (-9.66%)
Mutual labels:  audio
Ffmpegcore
A .NET FFMpeg/FFProbe wrapper for easily integrating media analysis and conversion into your C# applications
Stars: ✭ 429 (-1.38%)
Mutual labels:  audio
Ytmdl Web V2
Web version of ytmdl. Allows downloading songs with metadata embedded from various sources like itunes, gaana, LastFM etc.
Stars: ✭ 398 (-8.51%)
Mutual labels:  audio
Lavalink
Standalone audio sending node based on Lavaplayer.
Stars: ✭ 420 (-3.45%)
Mutual labels:  audio
Audiofile
A simple C++ library for reading and writing audio files.
Stars: ✭ 399 (-8.28%)
Mutual labels:  audio
Auto Editor
Auto-Editor: Effort free video editing!
Stars: ✭ 382 (-12.18%)
Mutual labels:  audio
Recordmp3js
Record MP3 files directly from the browser using JS and HTML
Stars: ✭ 413 (-5.06%)
Mutual labels:  audio
Free Spoken Digit Dataset
A free audio dataset of spoken digits. Think MNIST for audio.
Stars: ✭ 396 (-8.97%)
Mutual labels:  audio
Audiogridder
DSP servers using general purpose networks and computers - https://audiogridder.com
Stars: ✭ 423 (-2.76%)
Mutual labels:  audio
Android Openslmediaplayer
Re-implementation of Android's MediaPlayer and audio effect classes based on OpenSL ES APIs.
Stars: ✭ 393 (-9.66%)
Mutual labels:  audio
Matchering
🎚️ Open Source Audio Matching and Mastering
Stars: ✭ 398 (-8.51%)
Mutual labels:  audio
Flexasio
A flexible universal ASIO driver that uses the PortAudio sound I/O library. Supports WASAPI (shared and exclusive), KS, DirectSound and MME.
Stars: ✭ 403 (-7.36%)
Mutual labels:  audio
Labsound
🔬 🔈 graph-based audio engine
Stars: ✭ 429 (-1.38%)
Mutual labels:  audio

Google Speech API v2:

NOTICE

Google has since launched its official Google Cloud Speech API. I strongly recommend using that instead.

Host:

https://www.google.com/speech-api/v2/recognize

Parameters

output: json (xml is not supported)

lang: any valid locale (en-us, nl-be, fr-fr, etc.)

key: required. Get one from the Google Developers Console.

app: optional

Passing the optional app query parameter appears to return some extra transcripts; it is unclear why.

client: optional, seems to do nothing in particular
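The parameters above can be assembled into a request URL with Python's standard library. This is a minimal sketch: build_recognize_url is a hypothetical helper, and since the README does not say which values app and client expect, they are simply passed through when given.

```python
from urllib.parse import urlencode

# Endpoint documented above.
BASE_URL = "https://www.google.com/speech-api/v2/recognize"


def build_recognize_url(key, lang="en-us", app=None, client=None):
    """Build the recognize URL; key is mandatory, app and client are optional."""
    if not key:
        raise ValueError("key is required")
    # output=json is the only supported output format.
    params = {"output": "json", "lang": lang, "key": key}
    # Only include the optional parameters when a value is supplied.
    if app is not None:
        params["app"] = app
    if client is not None:
        params["client"] = client
    return BASE_URL + "?" + urlencode(params)
```

For example, build_recognize_url("yourkey") yields the same URL used in the curl examples further down.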

Data:

FLAC

A FLAC file at 44100Hz, 32-bit float, exported with Audacity. Check the audio folder in this repository for some hilarious examples.

Channels       : 2
Sample Rate    : 44100
Precision      : 32-bit
Sample Encoding: 32-bit Float

16-bit PCM

The following audio options are confirmed working for 16-bit PCM sample encoding:

Channels       : 1
Sample Rate    : 16000
Precision      : 16-bit
Sample Encoding: 16-bit Signed Integer PCM
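To confirm that a recording matches these settings before sending it, you can inspect the WAV header with Python's wave module. A minimal sketch: check_wav and EXPECTED are hypothetical names, and wave only reads integer PCM WAV files, so a float WAV raises wave.Error instead.

```python
import io
import wave

# The 16-bit PCM settings listed above; sample width is in bytes (2 = 16-bit).
EXPECTED = {"channels": 1, "rate": 16000, "sample_width": 2}


def check_wav(source):
    """Return a list of mismatches between a WAV (path or bytes) and EXPECTED."""
    if isinstance(source, (bytes, bytearray)):
        source = io.BytesIO(source)
    with wave.open(source, "rb") as w:
        actual = {
            "channels": w.getnchannels(),
            "rate": w.getframerate(),
            "sample_width": w.getsampwidth(),
        }
    return [
        f"{name}: expected {EXPECTED[name]}, got {actual[name]}"
        for name in EXPECTED
        if actual[name] != EXPECTED[name]
    ]
```

An empty list means the file matches; for instance, check_wav("test.wav") on the output of the sox command below should report no mismatches.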

One-line sox recording command:

rec --encoding signed-integer --bits 16 --channels 1 --rate 16000 test.wav

Headers:

Content-Type:

Content-Type: audio/x-flac; rate=44100;

Set rate to the sample rate of the FLAC file (usually 44100Hz); other rates are supported as well.

Content-Type: audio/l16; rate=16000; is also supported for files encoded as 16-bit signed-integer linear PCM, with a rate of either 16000Hz or 44100Hz.

NOTE: Make sure the rate in your header matches the sample rate you used for your audio capture.
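Since the rate in the header has to match the audio, one option is to read it from the file itself. The sketch below assumes the input is either a 16-bit PCM WAV or a FLAC file held in memory. For FLAC it reads the 20-bit sample-rate field of the STREAMINFO block, which the FLAC format requires to be the first metadata block, so the field always starts at byte 18. content_type_for and flac_sample_rate are hypothetical helpers, not part of any API.

```python
import io
import wave


def flac_sample_rate(data):
    """Read the sample rate from a FLAC stream's STREAMINFO block.

    Layout: 4-byte "fLaC" marker, 4-byte metadata block header, then
    STREAMINFO: min/max block size (2 + 2 bytes), min/max frame size
    (3 + 3 bytes), then the 20-bit sample rate at byte offset 18.
    """
    if len(data) < 21 or data[:4] != b"fLaC":
        raise ValueError("not a FLAC stream")
    return (data[18] << 12) | (data[19] << 4) | (data[20] >> 4)


def content_type_for(data):
    """Build a Content-Type header whose rate matches the audio itself."""
    if data[:4] == b"fLaC":
        return f"audio/x-flac; rate={flac_sample_rate(data)};"
    if data[:4] == b"RIFF":
        with wave.open(io.BytesIO(data), "rb") as w:
            if w.getsampwidth() != 2:
                raise ValueError("audio/l16 expects 16-bit samples")
            return f"audio/l16; rate={w.getframerate()};"
    raise ValueError("unsupported audio format")
```

Usage: read the file with open(path, "rb").read() and pass the bytes to content_type_for to get the header value.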

User-Agent:

Not required, but if you want to spoof a browser, use one of Chrome's user-agent strings.

Response:

When Google is 100% confident in its transcription, it returns the following object:

{
   "result":[
      {
         "alternative":[
            {
               "transcript":"good morning Google how are you feeling today"
            }
         ],
         "final":true
      }
   ],
   "result_index":0
}

When it is less certain, it adds a confidence field to the top alternative. It also appears to return additional alternative transcripts.

{
  "result":[
    {
      "alternative":[
        {
          "transcript":"this is a test",
          "confidence":0.97321892
        },
        {
          "transcript":"this is a test for"
        }
      ],
      "final":true
    }
  ],
  "result_index":0
}
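A minimal parser for the responses shown above takes the first alternative and treats confidence as optional, since it only appears sometimes. It assumes the body may contain more than one JSON object back to back (for example an empty result object before the real one), so it decodes the objects one at a time and returns the first non-empty result. best_transcript and iter_json_objects are hypothetical names.

```python
import json

_DECODER = json.JSONDecoder()


def iter_json_objects(body):
    """Yield each JSON value in body; values may be separated by whitespace."""
    pos = 0
    while True:
        # Skip whitespace (including newlines) between values.
        while pos < len(body) and body[pos].isspace():
            pos += 1
        if pos == len(body):
            return
        value, pos = _DECODER.raw_decode(body, pos)
        yield value


def best_transcript(body):
    """Return (transcript, confidence) of the first alternative, or None.

    confidence is None when the response omits it.
    """
    for obj in iter_json_objects(body):
        results = obj.get("result", [])
        if results:
            top = results[0]["alternative"][0]
            return top["transcript"], top.get("confidence")
    return None
```

Applied to the second example above, best_transcript returns ("this is a test", 0.97321892).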

Example

Install sox

On OS X with Homebrew installed:

brew install sox

Record audio

rec --encoding signed-integer --bits 16 --channels 1 --rate 16000 test.wav

Send the request

curl -X POST \
--data-binary @'audio/hello (16bit PCM).wav' \
--header 'Content-Type: audio/l16; rate=16000;' \
'https://www.google.com/speech-api/v2/recognize?output=json&lang=en-us&key=yourkey'

Or for FLAC encoded audio:

curl -X POST \
--data-binary @audio/good-morning-google.flac \
--header 'Content-Type: audio/x-flac; rate=44100;' \
'https://www.google.com/speech-api/v2/recognize?output=json&lang=en-us&key=yourkey'
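The curl commands translate fairly directly to Python's urllib. This sketch builds the same POST request: the raw audio as the body, a Content-Type with the rate, and a Chrome-like User-Agent. The user-agent string is only an illustrative placeholder, build_request and recognize are hypothetical helpers, and falling back to UTF-8 when the response declares no charset is an assumption.

```python
import urllib.request
from urllib.parse import urlencode

ENDPOINT = "https://www.google.com/speech-api/v2/recognize"
# Illustrative placeholder; substitute any current Chrome user-agent string.
CHROME_UA = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)


def build_request(audio, content_type, key, lang="en-us", user_agent=CHROME_UA):
    """Mirror the curl commands: POST raw audio bytes to the recognize URL."""
    url = ENDPOINT + "?" + urlencode({"output": "json", "lang": lang, "key": key})
    return urllib.request.Request(
        url,
        data=audio,
        headers={"Content-Type": content_type, "User-Agent": user_agent},
        method="POST",
    )


def recognize(path, content_type, key, lang="en-us"):
    """Send the audio file and return the raw response body as text."""
    with open(path, "rb") as f:
        req = build_request(f.read(), content_type, key, lang)
    with urllib.request.urlopen(req, timeout=30) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)
```

Usage, equivalent to the FLAC curl command: recognize("audio/good-morning-google.flac", "audio/x-flac; rate=44100;", "yourkey").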

Caveats

Here are a few caveats to be aware of should you decide to use this API in a production environment (I don't recommend it):

  • The API only accepts up to ~10-15 seconds of audio.
  • With a self-generated Speech API key, you can only make 50 requests per day.
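Given the ~10-15 second limit, longer recordings have to be sent in pieces. At 16 kHz, 16-bit mono, one second of audio is 16000 × 2 = 32,000 bytes, so a 10-second chunk holds 320,000 bytes of sample data. The sketch below splits a PCM WAV into fixed 10-second chunks, the conservative end of that range; split_wav is a hypothetical helper, and cutting at fixed boundaries can split a word in half. Note that with 50 requests per day, 10-second chunks cap you at roughly 500 seconds (about 8 minutes) of audio per day.

```python
import io
import wave


def split_wav(data, max_seconds=10):
    """Split PCM WAV bytes into WAV chunks of at most max_seconds each."""
    with wave.open(io.BytesIO(data), "rb") as src:
        params = src.getparams()
        frames_per_chunk = int(params.framerate * max_seconds)
        chunks = []
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break
            # Re-wrap each slice of frames in its own WAV header.
            out = io.BytesIO()
            with wave.open(out, "wb") as dst:
                dst.setnchannels(params.nchannels)
                dst.setsampwidth(params.sampwidth)
                dst.setframerate(params.framerate)
                dst.writeframes(frames)
            chunks.append(out.getvalue())
    return chunks
```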