= Ruby Pocketsphinx Server

= Introduction

Ruby-based web service for speech recognition, using the PocketSphinx gstreamer module.

= Requirements

  • Ruby 1.8
  • Sinatra
  • Rack
  • Unicorn
  • PocketSphinx (NOTE: some features of the server require patched PocketSphinx, see below)
  • Some acoustic and language models for PocketSphinx

= Installing

== CMU Sphinx

  • Install sphinxbase from SVN (make, make install)

=== Apply PocketSphinx patch

In the cmusphinx/pocketsphinx directory:

wget http://www.phon.ioc.ee/~tanela/ps_gst.patch
patch -p0 -i ps_gst.patch

Make sure you have the GStreamer development packages installed. On Debian Squeeze:

apt-get install libgstreamer0.10-dev libgstreamer-plugins-base0.10-dev

And configure, make, make install as usual.

== Install Ruby gems: Unicorn, Sinatra, UUIDTools, JSON, Locale

This assumes you have Ruby and RubyGems installed.

You might want to do this as root:

gem install unicorn
gem install sinatra
gem install uuidtools
gem install json
gem install locale

Install the ruby-gstreamer package (the package name may vary depending on your distribution):

apt-get install libgst-ruby1.8

== Additional tools

The English GF-based recognizer also needs:

== Run ruby-pocketsphinx-server

Clone the git repository:

git clone git://github.com/alumae/ruby-pocketsphinx-server.git

Before running the server, add /usr/local/lib to the path where GStreamer looks for plugins:

export GST_PLUGIN_PATH=/usr/local/lib

= Running

unicorn -c unicorn.conf.rb config.ru

If you installed Unicorn as a Ruby gem, you might need to execute:

/var/lib/gems/1.8/bin/unicorn -c unicorn.conf.rb config.ru

Test the default configuration (English WSJ language model with HUB4 acoustic models) using an audio file from the PocketSphinx test data (replace $(POCKETSPHINX_DIR) with the PocketSphinx source directory):

curl -T $(POCKETSPHINX_DIR)/test/data/wsj/n800_440c0207.wav -H "Content-Type: audio/x-wav" "http://localhost:8080/recognize"

Response should be:

{ "status": 0, "hypotheses": [ { "utterance": "the agency isn't likely to take any action until the union's rank and file votes on the contract into three weeks" }, { "utterance": "the agency isn't likely to take any action until the union's rank and file puts on the contract into three weeks" }, { "utterance": "the agency isn't likely to take any action until the union's rank and file funds from the contract into three weeks" }, { "utterance": "the agency isn't likely to take any action until the union's rank and file for from the contract into three weeks" }, { "utterance": "the agency isn't likely to take any action until the union's rank and file parts of the contract into three weeks" } ], "id": "8686a37b5674cbdc63deb13f73de81a5" }

= Configuration

== Web service

Unicorn configuration is in the file unicorn.conf.rb. See http://unicorn.bogomips.org/examples/unicorn.conf.rb for more info.
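For orientation, the settings in that file look roughly like the following sketch (illustrative values only; the file shipped in the repository may differ):

worker_processes 1   # number of recognizer worker processes
listen 8080          # the examples in this README assume port 8080
timeout 120          # allow long-running recognition requests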

== Recognizer

Recognizer settings are in the file conf.yaml.
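conf.yaml is plain YAML, so a quick way to see which settings are available is to load it from Ruby (a throwaway check, assuming the usual top-level mapping):

require 'yaml'

conf = YAML.load_file('conf.yaml')
p conf.keys   # lists the top-level configuration sections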

= Using the web service

Some of the more advanced examples below are specific to the Estonian configuration.

== Example 1

Record a sentence to a wav file, in mono (hit Ctrl-C when done speaking):

rec -c 1 sentence.wav

Send it to the web service:

curl -X POST --data-binary @sentence.wav -H "Content-Type: audio/x-wav" http://localhost:8080/recognize

Output (JSON-encoded; this example uses Estonian models):

{ "status": 0, "hypotheses": [ { "utterance": [ "t\u00e4na on v\u00e4ljas \u00fcsna ilus ilm" ] } ], "id": "e30f54561135d681599915562d77d240" }

== Example 2

Record a raw file using arecord:

arecord --format=S16_LE --file-type raw --channels 1 --rate 16000 > sentence2.raw

Send it to the web service:

curl -X POST --data-binary @sentence2.raw -H "Content-Type: audio/x-raw-int; rate=16000" http://localhost:8080/recognize

== Example 3

Record 5 seconds of audio and pipe it to curl, which streams it directly to the web service using PUT (and gets an almost instant response):

arecord --format=S16_LE --file-type raw --channels 1 --rate 16000 --duration 5 | curl -vv -T - -H "Content-Type: audio/x-raw-int; rate=16000" http://localhost:8080/recognize

= Support for JSGF grammars

Users can supply their own grammars to recognize certain sentences. The grammars should be in JSGF format.

Example JSGF grammar (let's call it robot.jsgf):

#JSGF V1.0;

grammar robot;

public <command> = ( liigu | mine ) [ ( üks | kaks | kolm | neli | viis ) meetrit ] ( edasi | tagasi );

NB! Grammars should be in the same charset that the server uses for its dictionary, which is currently Latin-1 (sorry about that).

You need to upload the JSGF file somewhere the server can fetch it, let's say http://www.example.com/robot.jsgf

Now, let the server download and compile it:

curl -vv http://localhost:8080/fetch-lm?url=http://www.example.com/robot.jsgf

This should result in HTTP/1.1 200 OK.

Now you can use the grammar to recognize a sentence that is accepted by the grammar:

arecord --format=S16_LE --file-type raw --channels 1 --rate 16000 --duration 5 |
curl -vv -T - -H "Content-Type: audio/x-raw-int; rate=16000" http://localhost:8080/recognize?lm=http://www.example.com/robot.jsgf

Result:

{ "status": 0, "hypotheses": [ { "utterance": "mine viis meetrit tagasi" } ], "id": "9e3895e9ee0b5138e73c6fca30f51a58" }

If you update the grammar at its URL, you need to make the /fetch-lm request again, as the server (for efficiency reasons) does not check for changes on every recognition request.
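The same fetch-and-recognize flow can also be scripted in Ruby; a sketch, assuming the grammar URL and server address used above (the lm value is URL-escaped here, which Rack/Sinatra unescapes on the server side):

require 'rubygems'
require 'uri'
require 'cgi'
require 'net/http'

grammar_url = 'http://www.example.com/robot.jsgf'
server      = 'http://localhost:8080'

# ask the server to (re)download and compile the grammar
fetch = URI.parse("#{server}/fetch-lm?url=#{CGI.escape(grammar_url)}")
puts Net::HTTP.get_response(fetch).code        # "200" on success

# recognition requests then pass the same URL as the lm parameter
recognize = URI.parse("#{server}/recognize?lm=#{CGI.escape(grammar_url)}")
request = Net::HTTP::Post.new(recognize.request_uri)
request['Content-Type'] = 'audio/x-raw-int; rate=16000'
request.body = File.open('sentence2.raw', 'rb') { |f| f.read }
puts Net::HTTP.start(recognize.host, recognize.port) { |http| http.request(request) }.body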

= Support for GF grammars

GF (Grammatical Framework) grammars are supported.

A GF grammar must be compiled into a .pgf file. To make it available to the server, use the same fetch API call as for JSGF grammars, e.g.:

curl "http://bark.phon.ioc.ee/speech-api/v1/fetch-lm?url=http://kaljurand.github.com/Grammars/grammars/pgf/Calc.pgf&lang=Est"

The 'lang' parameter (defaults to 'Est') specifies the input language(s) of the grammar. Several comma-separated languages can be specified, e.g. lang=Est,Est2

To recognize against a GF grammar, use a request similar to the JSGF one, e.g.:

arecord --format=S16_LE --file-type raw --channels 1 --rate 16000 --duration 5 | curl -vv -T - -H "Content-Type: audio/x-raw-int; rate=16000" "http://localhost:8080/recognize?lm=http://kaljurand.github.com/Grammars/grammars/pgf/Calc.pgf"

You can also specify output language(s) that will be used to linearize the raw recognition result, e.g.:

arecord --format=S16_LE --file-type raw --channels 1 --rate 16000 --duration 5 | curl -vv -T - -H "Content-Type: audio/x-raw-int; rate=16000" "http://localhost:8080/recognize?lm=http://kaljurand.github.com/Grammars/grammars/pgf/Calc.pgf&output-lang=App"

Output:

{ "status": 0, "hypotheses": [ { "utterance": "viis minutit sekundites", "linearizations": [ { "lang": "App", "output": "5 ' IN "" }, { "lang": "App", "output": "5 min IN s" } ] } ], "id": "83486feaca30995401ed4a66951a3f23" }

Multiple output languages can be specified as comma-separated values: "..&output-lang=App,App2"
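Continuing the Ruby client sketch from Example 1 (where result holds the parsed JSON response), the linearizations can be printed like this:

# result is the parsed JSON response, as in the Ruby sketch under Example 1
result['hypotheses'].each do |hyp|
  puts hyp['utterance']
  (hyp['linearizations'] || []).each do |lin|
    puts "  #{lin['lang']}: #{lin['output']}"
  end
end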
