JoseLlarena / Britfone

Licence: MIT license
British English pronunciation dictionary

Britfone

British English (RP/Standard Southern British) pronunciation dictionary:

  • 16,000+ entries, including the top 10,000 most frequent words as per the BNC and the Google Web Corpus
  • IPA transcription including primary and secondary stress
  • MIT license
  • separate expansions dictionary spelling out punctuation and abbreviations
  • both American and British spelling variants
  • all UK counties
  • all London boroughs
  • all major UK towns
  • all European capitals
  • all US states
  • all common irregular plurals
  • all common irregular verbs

Format

In the main dictionary, each word is in upper case and separated by a comma from its pronunciation, whose phonemes are separated by spaces. For words with multiple pronunciations, a parenthesised number is appended to the word:

RAINBOW, ɹ ˈeɪ n b ˌəʊ
RAINING, ɹ ˈeɪ n ɪ ŋ
RAISE, ɹ ˈeɪ z
RAISED, ɹ ˈeɪ z d
RAISES, ɹ ˈeɪ z ɪ z
RAISING, ɹ ˈeɪ z ɪ ŋ
RAISINS, ɹ ˈeɪ z ɪ n z
RALEIGH(1), ɹ ˈɑː l i
RALEIGH(2), ɹ ˈɔː l i

Stress marks are attached to the stressed vowel/diphthong.
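As a minimal sketch of how the attached stress marks can be read back out, the helpers below label each phoneme of a space-separated pronunciation and strip the marks when they are not needed. The function names are illustrative and not part of Britfone; the only assumption about the data is that the primary and secondary marks are the IPA characters U+02C8 and U+02CC, placed at the start of the stressed phoneme.

```python
PRIMARY = "\u02c8"    # IPA primary stress mark
SECONDARY = "\u02cc"  # IPA secondary stress mark


def stress_pattern(pronunciation: str) -> list[int]:
    """Return one number per phoneme: 1 = primary, 2 = secondary, 0 = unstressed."""
    pattern = []
    for phoneme in pronunciation.split():
        if phoneme.startswith(PRIMARY):
            pattern.append(1)
        elif phoneme.startswith(SECONDARY):
            pattern.append(2)
        else:
            pattern.append(0)
    return pattern


def strip_stress(pronunciation: str) -> str:
    """Remove both stress marks, leaving the bare phoneme symbols."""
    return pronunciation.replace(PRIMARY, "").replace(SECONDARY, "")
```

For the RAINBOW entry above, `stress_pattern` marks the second phoneme as primary and the last one as secondary.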

Multi-word entries are joined with an underscore (_), which stands for an actual space. This is to ease further processing:

COSTA_RICA, k ˌɒ s t ə ɹ ˈiː k ə
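The format described above can be parsed with a few lines of Python. This is a sketch under stated assumptions, not code shipped with Britfone: `Entry`, `parse_line` and `load_lexicon` are hypothetical names, and the parser assumes that headwords never contain a comma, so the first comma on a line always separates the word from its pronunciation.

```python
import re
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    word: str                # headword, with underscores turned back into spaces
    variant: Optional[int]   # 1, 2, ... when several pronunciations exist, else None
    phonemes: list[str]      # phonemes in order, stress marks still attached


# Headword, optionally followed by a parenthesised variant number, e.g. RALEIGH(2)
_HEAD = re.compile(r"^(?P<word>.+?)(?:\((?P<variant>\d+)\))?$")


def parse_line(line: str) -> Entry:
    """Parse one 'WORD, p h o n e m e s' line of the main dictionary."""
    head, _, pronunciation = line.partition(",")
    match = _HEAD.match(head.strip())
    if match is None:
        raise ValueError(f"no headword in line: {line!r}")
    variant = match.group("variant")
    return Entry(
        word=match.group("word").replace("_", " "),
        variant=int(variant) if variant else None,
        phonemes=pronunciation.split(),
    )


def load_lexicon(lines) -> dict[str, list[list[str]]]:
    """Group all pronunciations of a word under its headword, skipping blank lines."""
    lexicon: dict[str, list[list[str]]] = {}
    for line in lines:
        if line.strip():
            entry = parse_line(line)
            lexicon.setdefault(entry.word, []).append(entry.phonemes)
    return lexicon
```

Passing an open file object (or any iterable of lines) to `load_lexicon` gives a mapping in which RALEIGH has two pronunciations and COSTA_RICA is stored under the key `COSTA RICA`.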

In the expansions dictionary, entries are also in upper case, separated by a tab from their expansions:

MON	MONDAY(1)
MON.	MONDAY(1)
MPG	MILES PER(1) GALLON
MPH	MILES PER(1) HOUR
MR	MISTER
MR.	MISTER
MRS	MISSIS
MRS.	MISSIS
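One way to use the expansions dictionary is as a pre-processing step before looking words up in the main dictionary. The sketch below assumes the tab-separated layout shown above; `load_expansions` and `expand` are hypothetical helper names, and upper-casing the input tokens is a choice made here to match the upper-case keys.

```python
def load_expansions(lines) -> dict[str, list[str]]:
    """Map each abbreviation to the list of main-dictionary keys it expands to."""
    expansions: dict[str, list[str]] = {}
    for line in lines:
        if not line.strip():
            continue
        abbreviation, _, expansion = line.rstrip("\n").partition("\t")
        expansions[abbreviation] = expansion.split()
    return expansions


def expand(tokens, expansions) -> list[str]:
    """Replace known abbreviations by their expansions; pass other tokens through."""
    expanded: list[str] = []
    for token in tokens:
        key = token.upper()
        expanded.extend(expansions.get(key, [key]))
    return expanded
```

Note that an expansion such as `PER(1)` keeps its variant number, so it can be matched against the corresponding numbered entry in the main dictionary.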

Issues and remarks

  • strict IPA versus traditional phonetic symbols: the phonetic symbols are strictly as defined by the IPA, as opposed to how they have traditionally been used in many dictionaries and the language learning literature. In particular:

    • /ɐ/ instead of traditional /ʌ/
    • /ɹ/ instead of traditional /r/
    • /ɛ/ instead of traditional /e/
    • /ɜː/ instead of traditional /əː/
  • unstressed vowels as /ə/ and /ɪ/: due to the diversity of the sources for phonetic transcription, there's some inconsistency in how weak vowels are transcribed, though in most cases /ɪ/ is used, following the Collins Dictionary.

  • final i: final unstressed i's are given a short tense "i" phoneme /i/, different from both /iː/ and /ɪ/, to reflect happy-tensing. Most dictionaries show this vowel differently (see https://en.wikipedia.org/wiki/English_phonology), for example as the short lax /ɪ/. The transcription might be inconsistent here: in spoken English, happy-tensing is preserved in inflected variants (e.g., studied inherits it from study, and so contrasts with studded), yet the dictionary might not always reflect this.

  • secondary stress: secondary stress is not always marked (the primary always is).

  • stems and inflections: not all open-class words (nouns, verbs, adjectives and adverbs) have all their inflected variants, and not all variants show all of the alternative pronunciations. The possessive form -'s of nouns is not included, and neither is the superlative form of most adjectives and adverbs.

  • acronyms vs initialisms: the expansions dictionary only contains acronyms, i.e., words that are not pronounced by spelling out the individual letters (e.g. NATO). Initialisms (e.g. BBC, NHS), on the other hand, are excluded. Their pronunciation can be obtained by looking up the names of the individual letters in the main dictionary and concatenating them.
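Two of the remarks above lend themselves to small helpers, sketched here under stated assumptions. `to_traditional` converts the four strict-IPA symbols listed in the first remark back to their traditional counterparts, matching whole phonemes only (so a diphthong that merely contains one of these symbols is left untouched). `spell_initialism` concatenates letter-name pronunciations as described in the last remark; it assumes the letter names are available under single-letter keys, which may not match how the main dictionary actually stores them. Both function names are illustrative.

```python
# Strict IPA symbol -> traditional symbol, as listed in the remarks above
TRADITIONAL = {"ɐ": "ʌ", "ɹ": "r", "ɛ": "e", "ɜː": "əː"}
STRESS_MARKS = ("\u02c8", "\u02cc")  # primary and secondary stress


def to_traditional(phonemes: list[str]) -> list[str]:
    """Swap strict-IPA phonemes for traditional ones, keeping any stress mark."""
    converted = []
    for phoneme in phonemes:
        stress = phoneme[:1] if phoneme[:1] in STRESS_MARKS else ""
        base = phoneme[len(stress):]
        converted.append(stress + TRADITIONAL.get(base, base))
    return converted


def spell_initialism(initialism: str, letters: dict[str, list[str]]) -> list[str]:
    """Concatenate the letter-name pronunciations of an initialism such as BBC.

    `letters` maps a single upper-case letter to the phonemes of its name
    (an assumed layout); a missing letter raises KeyError.
    """
    phonemes: list[str] = []
    for letter in initialism.upper():
        phonemes.extend(letters[letter])
    return phonemes
```

Each letter keeps its own stress mark in the concatenated result; adjusting stress across the whole sequence is left out of this sketch.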

Sources

The initial source of the phonetic transcriptions is cmudict, supplemented by a number of other sources for British English specifics: Wiktionary, Wikipedia, the Collins Dictionary, the Oxford Dictionary, the Cambridge Dictionary and the Macmillan Dictionary.

The main sources of the word frequency-filtered vocabulary are the top 10K in the British National Corpus, the Google Web Corpus and the New General Service Lists. Not all words in these lists are included: because of sampling bias, the lists contain uncommon words such as athelstan or phentermine, as well as foreign words. Initialisms are also excluded.

Changelog

See Changelog

Contributing

If you'd like to contribute a correction or an addition, or make a request for an addition, you can make a pull request or open an issue.

MIT License (MIT)

Copyright (c) 2017 by Jose Llarena

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.