
repp / big-phoney

License: GPL-3.0
Get phonetic spellings and syllable counts for any English word. Works with made-up and non-dictionary words.

Projects that are alternatives of or similar to big-phoney

Britfone
British English pronunciation dictionary
Stars: ✭ 66 (+1.54%)
Mutual labels:  english, phonetics
word frequency
[Refactoring scheduled] 📊 A script to analyse the frequencies of words in web pages.
Stars: ✭ 13 (-80%)
Mutual labels:  english
YouTube to m3u
Grab .m3u8 from YouTube live channels and makes .m3u IPTV Playlist from various languages and Events. Tamil / Malayalam / English / Hindi / French / Kids / Sports / Urudu etc.
Stars: ✭ 48 (-26.15%)
Mutual labels:  english
spectrogram-tutorial
A walkthrough of how to make spectrograms in python that are customized for human speech research.
Stars: ✭ 31 (-52.31%)
Mutual labels:  phonetics
ety-python
A Python module to discover the etymology of words
Stars: ✭ 110 (+69.23%)
Mutual labels:  english
cmu-pronouncing-dictionary
The 134,000+ words and their pronunciations in the CMU pronouncing dictionary
Stars: ✭ 46 (-29.23%)
Mutual labels:  english
new-word-tab
A browser extension to learn a new word per new tab
Stars: ✭ 30 (-53.85%)
Mutual labels:  english
wrangling-genomics
Data Wrangling and Processing for Genomics
Stars: ✭ 49 (-24.62%)
Mutual labels:  english
DWords2
Show words as Danmaku on the screen to help you memorize them | 把单词变成屏幕上的弹幕来帮助你记住它们
Stars: ✭ 66 (+1.54%)
Mutual labels:  english
bem-flashcards
Simple single-page flashcards application based on the bem-core/bem-history and BEM methodology
Stars: ✭ 19 (-70.77%)
Mutual labels:  english
OpenRefine-ecology-lesson
Data Cleaning with OpenRefine for Ecologists
Stars: ✭ 20 (-69.23%)
Mutual labels:  english
jiten
jiten - japanese android/cli/web dictionary based on jmdict/kanjidic — 日本語 辞典 和英辞典 漢英字典 和独辞典 和蘭辞典
Stars: ✭ 64 (-1.54%)
Mutual labels:  english
syng
A free, open source, cross-platform, Chinese-To-English dictionary for desktops.
Stars: ✭ 108 (+66.15%)
Mutual labels:  english
Daft-Exprt
PyTorch Implementation of Daft-Exprt: Robust Prosody Transfer Across Speakers for Expressive Speech Synthesis
Stars: ✭ 41 (-36.92%)
Mutual labels:  english
cummings.ee
A collection of the work of Edward Estlin Cummings, as it enters the public domain.
Stars: ✭ 32 (-50.77%)
Mutual labels:  english
SoMeWeTa
A part-of-speech tagger with support for domain adaptation and external resources.
Stars: ✭ 20 (-69.23%)
Mutual labels:  english
rginger
RGinger takes an English sentence and gives correction and rephrasing suggestions for it using Ginger proofreading API.
Stars: ✭ 13 (-80%)
Mutual labels:  english
lc-data-intro
Library Carpentry: Introduction to Working with Data (Regular Expressions)
Stars: ✭ 16 (-75.38%)
Mutual labels:  english
tudien
Vietnamese dictionary for Kindle (Từ điển tiếng Việt dành cho Kindle)
Stars: ✭ 38 (-41.54%)
Mutual labels:  english
Parallel-Tacotron2
PyTorch Implementation of Google's Parallel Tacotron 2: A Non-Autoregressive Neural TTS Model with Differentiable Duration Modeling
Stars: ✭ 149 (+129.23%)
Mutual labels:  english

Big Phoney

License: GPL v3

Big Phoney is a Python module that generates phonetic pronunciations from English words. For example, given the word "dinosaur", Big Phoney will return "D AY1 N AH0 S AO2 R". This is sometimes called "Grapheme-to-Phoneme Conversion" or G2P. Big Phoney works for any word, even those that don't appear in the dictionary, and it's designed to handle special cases like currency and abbreviations.

Phonetic pronunciations are represented using the ARPAbet phoneme set.

Big Phoney can also count the number of syllables in any English word.
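In ARPAbet as used by the CMU dictionary, every vowel phoneme carries a stress digit (0, 1, or 2) and consonants carry none, so a syllable count can be read off a pronunciation string by counting the stress-marked phonemes. The sketch below illustrates that idea on the "dinosaur" example. `split_phonemes` and `count_syllables_from_arpabet` are hypothetical helpers, not part of Big Phoney, and this is not necessarily how the library computes its counts.

```python
def split_phonemes(pronunciation):
    """Split a space-separated ARPAbet string into phoneme tokens."""
    return pronunciation.split()


def count_syllables_from_arpabet(pronunciation):
    """Count vowel phonemes, i.e. tokens that end in a stress digit 0, 1, or 2."""
    return sum(1 for p in split_phonemes(pronunciation) if p[-1] in "012")


dinosaur = "D AY1 N AH0 S AO2 R"
print(split_phonemes(dinosaur))                # ['D', 'AY1', 'N', 'AH0', 'S', 'AO2', 'R']
print(count_syllables_from_arpabet(dinosaur))  # 3
```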

How it Works

When possible, pronunciations come from a dictionary that contains 134,000 words. Slang, proper nouns, and made-up words that don't appear in the standard dictionary are predicted using a model. You can read more about the pronunciation prediction model on Kaggle.
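The dictionary-first, model-second flow described above can be sketched roughly as follows. Everything here is a toy stand-in: `TOY_DICTIONARY`, `toy_predict`, and `pronounce` are hypothetical names, the "model" simply spells the word out letter by letter, and none of it is Big Phoney's actual implementation.

```python
# Toy stand-ins: a two-word "dictionary" and a placeholder predictor.
TOY_DICTIONARY = {
    "dinosaur": "D AY1 N AH0 S AO2 R",
    "bird": "B ER1 D",
}


def toy_predict(word):
    """Placeholder for a trained G2P model; just spells the word in upper-case letters."""
    return " ".join(word.upper())


def pronounce(word, dictionary=TOY_DICTIONARY, predict=toy_predict):
    """Return (pronunciation, source), preferring the dictionary over the model."""
    key = word.lower()
    if key in dictionary:
        return dictionary[key], "dictionary"
    return predict(key), "model"


print(pronounce("Dinosaur"))   # ('D AY1 N AH0 S AO2 R', 'dictionary')
print(pronounce("fakeosaur"))  # ('F A K E O S A U R', 'model')
```

Returning the source alongside the pronunciation lets a caller tell exact dictionary hits from predictions, which matters because only the former are guaranteed correct.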

Additionally, Big Phoney has a number of configurable preprocessors to handle cases where proper pronunciation requires special knowledge. For example, "$5.00" is pronounced "F AY1 V D AA1 L ER0 Z". Currently, Big Phoney can handle numbers, currency, times, symbols, abbreviations, email addresses, and URLs.

Accuracy

For any of the 134,000 words found in the dictionary, you can expect 100% accurate pronunciations and syllable counts. For words not found in the dictionary, you can expect syllable counts to be accurate 98.1% of the time and pronunciations to be perfectly accurate 75.4% of the time. Even when a predicted pronunciation isn't completely correct, it's often very close.

Installation

Install from PyPI:

pip3 install big-phoney

Install from source:

git clone https://github.com/repp/big-phoney.git
cd big-phoney
python3 setup.py install

Note: Big Phoney only works with Python 3 right now.

Usage

First, import Big Phoney:

from big_phoney import BigPhoney

Next, create an instance of the main class:

phoney = BigPhoney()

This loads the phonetic dictionary and prediction model into memory, which may take a second, so avoid creating multiple instances of this class.
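One common way to avoid building an expensive object more than once is to hide construction behind a cached accessor. The sketch below shows the pattern with a stand-in class so that it runs without the package installed; `SlowToLoad` and `get_shared_instance` are hypothetical names, and in real code the accessor would return `BigPhoney()` instead.

```python
from functools import lru_cache


class SlowToLoad:
    """Stand-in for an expensive object such as BigPhoney(); counts how often it is built."""
    instances_created = 0

    def __init__(self):
        SlowToLoad.instances_created += 1


@lru_cache(maxsize=None)
def get_shared_instance():
    """Build the object on the first call and return the same one afterwards."""
    return SlowToLoad()


first = get_shared_instance()
second = get_shared_instance()
print(first is second)               # True
print(SlowToLoad.instances_created)  # 1
```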

Call phonize to generate phonetic spellings from words.

phoney.phonize('pterodactyl')  # --> 'T EH2 R OW0 D AE1 K T AH0 L'

# Works with multiple words. Individual pronunciations are separated by two spaces:
phoney.phonize('tyrannosaurus rex')  # --> 'T IH0 R AE0 N AH0 S AO1 R AH0 S  R EH1 K S'
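Because words are separated by two spaces and phonemes by one, a multi-word result can be broken back into per-word phoneme lists with plain string splitting. A minimal sketch, using the output string shown above as a literal so it runs without the package; `split_words` is a hypothetical helper, not part of Big Phoney:

```python
def split_words(phonized):
    """Split a multi-word pronunciation into one phoneme list per word."""
    return [word.split() for word in phonized.split("  ") if word]


result = "T IH0 R AE0 N AH0 S AO1 R AH0 S  R EH1 K S"
print(split_words(result))
# [['T', 'IH0', 'R', 'AE0', 'N', 'AH0', 'S', 'AO1', 'R', 'AH0', 'S'], ['R', 'EH1', 'K', 'S']]
```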

Call count_syllables to get the number of syllables in a word or phrase.

phoney.count_syllables('bird')  # --> 1
phoney.count_syllables('triceratops')  # --> 4

# Given multiple words, Big Phoney returns the total number of syllables:
phoney.count_syllables('welcome to jurassic park')  # --> 7

# If you want a list of syllable counts, try something like:
[phoney.count_syllables(word) for word in 'welcome to jurassic park'.split()]  # --> [2,1,3,1]

Preprocessors

Big Phoney has a number of default preprocessors designed to improve pronunciation results in special cases.

DEFAULT_PREPROCESSORS = [ExpandCurrencySymbols, FormatEmailAndURLs, ReplaceTimes, SpacePadSymbols,
                         SpacePadNumbers, ReplaceAbbreviations, ReplaceNumbers]

By default, all of the above preprocessors are applied. You can add and remove them when creating a Big Phoney instance with the preprocessors keyword argument:

phoney = BigPhoney(preprocessors=[ReplaceNumbers])  # Only preprocess numbers

To skip preprocessing entirely, just pass an empty list:

phoney = BigPhoney(preprocessors=[])  # No preprocessing

Be careful when adjusting the default preprocessors. Their order is important, as some rely on other 'upstream' preprocessors to be most effective.
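To see why order matters, consider two toy preprocessors: one that separates digits from letters and one that spells out stand-alone digits. The second only recognises a digit once the first has split it off. The classes below are simplified illustrations written for this example (`ToyPadNumbers`, `ToyReplaceDigits`, and `run_pipeline` are hypothetical names), not the library's own `SpacePadNumbers` or `ReplaceNumbers`.

```python
import re


class ToyPadNumbers:
    """Insert a space wherever a digit touches a letter, e.g. '3ft' -> '3 ft'."""

    def process(self, input_string):
        padded = re.sub(r"(?<=\d)(?=[A-Za-z])", " ", input_string)
        return re.sub(r"(?<=[A-Za-z])(?=\d)", " ", padded)


class ToyReplaceDigits:
    """Spell out tokens that are exactly one digit; leave all other tokens alone."""
    WORDS = {"0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
             "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine"}

    def process(self, input_string):
        return " ".join(self.WORDS.get(t, t) for t in input_string.split(" "))


def run_pipeline(preprocessors, text):
    """Apply each preprocessor in list order, feeding the output of one into the next."""
    for preprocessor in preprocessors:
        text = preprocessor.process(text)
    return text


pad, replace = ToyPadNumbers(), ToyReplaceDigits()
print(run_pipeline([pad, replace], "3ft"))  # three ft
print(run_pipeline([replace, pad], "3ft"))  # 3 ft
```

With the replacement step first, "3ft" is a single token that is not a digit, so it passes through unchanged and the later padding comes too late to help.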

To test a preprocessor setup, use the apply_preprocessors method:

phoney = BigPhoney()  # Use default preprocessors
phoney.apply_preprocessors('£7.89')  # --> 'seven pounds and eighty-nine pence'
phoney.apply_preprocessors('Mt St. Helens')  # --> 'mount saint helens'
phoney.apply_preprocessors('no_reply@gmail.com')  # --> 'no underscore reply at gmail dot com'
phoney.apply_preprocessors('1ft + 2ft = 3ft')  # --> 'one foot plus two feet equals three feet'
phoney.apply_preprocessors("It'll be 7:00am in 1,245.6 seconds")  # --> 'it'll be seven o'clock a m in one-thousand, two hundred and forty-five point six seconds'

Writing your own preprocessors is easy. Any class with a process method that takes a single string and returns a single string is valid. For example:

class DummyPreprocessor:

    def process(self, input_string):
        # do some preprocessing here!
        return input_string
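As a slightly more concrete illustration of the same interface, here is a sketch of a preprocessor that expands ampersands. `ExpandAmpersand` is a hypothetical class written for this example and is not shipped with Big Phoney; it is exercised directly rather than through `BigPhoney`, so it runs without the package installed.

```python
import re


class ExpandAmpersand:
    """Hypothetical preprocessor: replace '&' with the word 'and'."""

    def process(self, input_string):
        # Pad with spaces so 'R&D' becomes 'R and D', then collapse repeated whitespace.
        expanded = input_string.replace("&", " and ")
        return re.sub(r"\s+", " ", expanded).strip()


print(ExpandAmpersand().process("R&D"))            # R and D
print(ExpandAmpersand().process("salt & pepper"))  # salt and pepper
```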

Other Options

As mentioned, Big Phoney uses a dictionary and a model to generate pronunciations. If you only want to use one or the other, you can create instances of each individually:

# Dictionary only: exact lookups, returns None for unknown words
from big_phoney import PhoneticDictionary
phonetic_dict = PhoneticDictionary()
phonetic_dict.lookup('paleontologist')  # --> 'P EY2 L IY0 AH0 N T AA1 L AH0 JH IH0 S T' ✅
phonetic_dict.lookup('fakeosaur')  # --> None ❌

# Model only: always returns a prediction
from big_phoney import PredictionModel
pred_model = PredictionModel()
pred_model.predict('paleontologist')  # --> 'P EY2 L IY0 AH0 N T AA1 L AH0 JH IH0 S T' ✅
pred_model.predict('fakeosaur')  # --> 'F EY1 K OW0 S AO2 R' ✅

The dictionary is faster and always correct but won't always return a result. The model is slower and less reliably accurate but it will always return something no matter what you throw at it. In most cases, you should just stick with the BigPhoney class.

Contributing

If you want to contribute to this project, that's great! Make sure to check out dev/README.md for more info.

Acknowledgements

The dictionary and data used to train the phonetic prediction model came from the CMU Pronouncing Dictionary.
