Pronounce
Break words up into their syllables and phones.
Usage
require 'pronounce'
Pronounce.how_do_i_pronounce('monkeys')
# => [["M", "AH1", "NG"], ["K", "IY0", "Z"]]
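The return value is an array of syllables, and each syllable is an array of phone strings. The sketch below works on the example output with plain Ruby only, so it does not call the gem. The variable names are illustrative. The trailing stress digit (0 = none, 1 = primary, 2 = secondary) is CMUdict's convention, not an extra API of this gem.

```ruby
# Output copied from the example above; the gem itself is not called here.
syllables = [["M", "AH1", "NG"], ["K", "IY0", "Z"]]

# Each inner array is one syllable.
syllable_count = syllables.length # => 2

# CMUdict appends a stress digit to vowels; "1" marks primary stress.
stressed_index = syllables.index { |syl| syl.any? { |phone| phone.end_with?("1") } } # => 0

# Flatten back to the raw phone sequence and drop the stress digits.
phones = syllables.flatten.map { |phone| phone.delete("012") }
# => ["M", "AH", "NG", "K", "IY", "Z"]
```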
Data and Procedure
Pronunciations are based on the CMUdict database: http://svn.code.sf.net/p/cmusphinx/code/trunk/cmudict/
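For orientation, the sketch below parses one dictionary entry. It assumes the classic cmudict-0.7b layout: an upper-case word, whitespace, then space-separated phones, with `;;;` comment lines and `WORD(1)` marking an alternate pronunciation. Other CMUdict releases differ in case and spacing. `parse_cmudict_line` is a hypothetical helper, not part of this gem.

```ruby
# Hypothetical helper; assumes the cmudict-0.7b entry layout "WORD  PH1 PH2 ...".
def parse_cmudict_line(line)
  line = line.strip
  # Blank lines and ";;;" comment lines carry no pronunciation.
  return nil if line.empty? || line.start_with?(";;;")

  word, *phones = line.split(/\s+/)
  # Strip the "(1)", "(2)" suffix that marks alternate pronunciations.
  [word.sub(/\(\d+\)\z/, ""), phones]
end

parse_cmudict_line("MONKEYS  M AH1 NG K IY0 Z")
# => ["MONKEYS", ["M", "AH1", "NG", "K", "IY0", "Z"]]
```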
The phone list is the ARPAbet subset used by CMUdict.
Vowels
Monophthongs: AA, AE, AH, AO, EH, IH, IY, UH, UW
Diphthongs: AW, AY, EY, OW, OY
R-colored: ER
Consonants
Aspirates: HH
Stops: B, D, G, K, P, T
Affricates: CH, JH
Fricatives: DH, F, SH, S, TH, V, Z, ZH
Nasals: M, N, NG
Liquids: L, R
Semivowels: W, Y
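The phone list maps directly onto a lookup table. The sketch below is a hypothetical encoding of it (`PHONE_CLASSES`, `phone_class`, and `vowel?` are illustrative names, not the gem's). It strips CMUdict's stress digit before the lookup, because dictionary entries write vowels as `AH1` and never as bare `AH`.

```ruby
# Hypothetical table built from the phone list above (39 phones in total).
PHONE_CLASSES = {
  monophthong: %w[AA AE AH AO EH IH IY UH UW],
  diphthong:   %w[AW AY EY OW OY],
  r_colored:   %w[ER],
  aspirate:    %w[HH],
  stop:        %w[B D G K P T],
  affricate:   %w[CH JH],
  fricative:   %w[DH F SH S TH V Z ZH],
  nasal:       %w[M N NG],
  liquid:      %w[L R],
  semivowel:   %w[W Y]
}.freeze

VOWEL_CLASSES = %i[monophthong diphthong r_colored].freeze

# Returns the class symbol for a phone, or nil if it is not in the table.
def phone_class(phone)
  base = phone.delete("012") # drop the CMUdict stress digit, if any
  pair = PHONE_CLASSES.find { |_klass, phones| phones.include?(base) }
  pair && pair.first
end

def vowel?(phone)
  VOWEL_CLASSES.include?(phone_class(phone))
end
```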
CMUdict contains pronunciations of North American English, and ARPAbet represents the phonemes of General American English, so those are currently the only dialect and accent supported.
Syllables are split by scanning the pronunciation from start to finish and applying rules of English phonology to decide whether the current phone begins a new syllable. Because the pronunciations come from a corpus, the rules only need to split valid words; they never have to decide whether a word is valid.
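The left-to-right scan can be sketched as follows. The boundary test here is a deliberately simple toy that the gem does not necessarily use: after the first vowel, a consonant directly before a vowel starts a new syllable. It ignores clusters such as `ST` and vowel-vowel sequences. `split_syllables`, `new_syllable?`, and `VOWELS` are hypothetical names.

```ruby
VOWELS = %w[AA AE AH AO AW AY EH ER EY IH IY OW OY UH UW].freeze

def vowel?(phone)
  !phone.nil? && VOWELS.include?(phone.delete("012"))
end

# Toy boundary test (not the gem's rule set): once a vowel has been seen,
# a consonant immediately followed by a vowel begins a new syllable.
def new_syllable?(phones, i, seen_vowel)
  seen_vowel && !vowel?(phones[i]) && vowel?(phones[i + 1])
end

# Scan from start to finish, opening a new syllable whenever the test fires.
def split_syllables(phones)
  syllables = [[]]
  seen_vowel = false
  phones.each_index do |i|
    syllables << [] if new_syllable?(phones, i, seen_vowel)
    syllables.last << phones[i]
    seen_vowel ||= vowel?(phones[i])
  end
  syllables
end

split_syllables(%w[M AH1 NG K IY0 Z])
# => [["M", "AH1", "NG"], ["K", "IY0", "Z"]]
```

Because the input always comes from the dictionary, the scan never has to reject a sequence. It only places boundaries.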
Rules are defined with the rule DSL. A rule returns :new_syllable, :no_new_syllable, or :not_applicable; the last means that the rule doesn't apply in the current context and other rules should be evaluated.
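The three return values suggest a first-match evaluation: rules are tried in order, and the first answer other than `:not_applicable` decides. The sketch below models that with plain lambdas. The rule contents, the `(phones, i)` arguments, the `decide` helper, and the `:no_new_syllable` default when every rule abstains are all assumptions for illustration.

```ruby
VOWELS = %w[AA AE AH AO AW AY EH ER EY IH IY OW OY UH UW].freeze

def vowel?(phone)
  !phone.nil? && VOWELS.include?(phone.delete("012"))
end

# Hypothetical rules; each looks at phone i and answers with one of the
# three documented symbols.
RULES = [
  # NG never begins an English syllable.
  ->(phones, i) { phones[i] == "NG" ? :no_new_syllable : :not_applicable },
  # Toy onset rule: after the first vowel, a consonant right before a vowel
  # starts a new syllable.
  lambda do |phones, i|
    onset = !vowel?(phones[i]) && vowel?(phones[i + 1]) &&
            phones[0...i].any? { |p| vowel?(p) }
    onset ? :new_syllable : :not_applicable
  end
].freeze

# The first rule that does not answer :not_applicable decides.
def decide(phones, i)
  RULES.each do |rule|
    result = rule.call(phones, i)
    return result unless result == :not_applicable
  end
  :no_new_syllable # assumed default when no rule applies
end
```

Order matters: in `%w[S IH1 NG ER0]` the NG rule answers first at index 2, so the onset rule is never consulted there.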
Declaration
module Pronounce::SyllableRules
  rule :optional_language, 'name of rule' do
    # rule body: return :new_syllable, :no_new_syllable, or :not_applicable
  end
end
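One way such a declaration could be backed is a module-level `rule` method with an optional leading language argument. The sketch below is hypothetical and is not the gem's implementation. `SketchRules`, `Rule`, and `for_language` are illustrative names, and the `|phones, i|` block parameters are an assumption, since the README does not say what a rule block receives.

```ruby
# Hypothetical registry; the gem's real internals may differ.
module SketchRules
  Rule = Struct.new(:language, :name, :block)

  @rules = []

  class << self
    attr_reader :rules

    # The language is optional, mirroring "rule :optional_language, 'name of rule'".
    def rule(language = nil, name, &block)
      @rules << Rule.new(language, name, block)
    end

    # Rules declared without a language apply to every language.
    def for_language(language)
      @rules.select { |r| r.language.nil? || r.language == language }
    end
  end

  rule :en, 'ng is never an onset' do |phones, i|
    phones[i] == 'NG' ? :no_new_syllable : :not_applicable
  end

  rule 'fallback' do |_phones, _i|
    :no_new_syllable
  end
end
```

Ruby permits an optional positional parameter before a required one, so `rule 'fallback'` binds the string to `name` and leaves `language` as `nil`.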
Ruby Support
- MRI 2.1+
- JRuby 9.0.0.0.rc1
- Rubinius 2.5.5