jcsilva / multilingual-g2p

Licence: other

Multilingual Grapheme to Phoneme

Programming Languages

shell

Projects that are alternatives of or similar to multilingual-g2p

voxpopuli

Python wrapper for Espeak and Mbrola, for simple local TTS

Stars: ✭ 21 (-47.5%)

Mutual labels: espeak, phonemes

DeepPhonemizer

Grapheme to phoneme conversion with deep learning.

Stars: ✭ 152 (+280%)

Mutual labels: phonemes, g2p

Deep-NLP-Resources

Curated list of all NLP Resources

Stars: ✭ 65 (+62.5%)

Mutual labels: lexicon

myprosody

A Python library for measuring the acoustic features of speech (simultaneous speech, high entropy) compared to ones of native speech.

Stars: ✭ 162 (+305%)

Mutual labels: phonemes

asr24

24-hour Automatic Speech Recognition

Stars: ✭ 27 (-32.5%)

Mutual labels: g2p

gf-wordnet

A WordNet in GF

Stars: ✭ 15 (-62.5%)

Mutual labels: lexicon

AffectiveTweets

A WEKA package for analyzing emotion and sentiment of tweets.

Stars: ✭ 74 (+85%)

Mutual labels: lexicon

py-espeak-ng

Some simple wrappers around eSpeak NG intended to make using this excellent TTS for waveform and IPA generation as convenient as possible.

Stars: ✭ 27 (-32.5%)

Mutual labels: espeak

JSpeak

A Text to Speech Reader Front-end that Reads from the Clipboard and with Exceptionable Features

Stars: ✭ 16 (-60%)

Mutual labels: espeak

sam

Software Automatic Mouth - Tiny Speech Synthesizer

Stars: ✭ 316 (+690%)

Mutual labels: phonemes

lexpy

Python package for lexicon; Trie and DAWG implementation.

Stars: ✭ 47 (+17.5%)

Mutual labels: lexicon

wordhoard

This Python module can be used to obtain antonyms, synonyms, hypernyms, hyponyms, homophones and definitions.

Stars: ✭ 78 (+95%)

Mutual labels: lexicon

OpenGNT

Open Greek New Testament Project; NA28 / NA27 Equivalent Text & Resources

Stars: ✭ 55 (+37.5%)

Mutual labels: lexicon

myG2P

Myanmar (Burmese) Language Grapheme to Phoneme (myG2P) Conversion Dictionary for speech recognition (ASR) and speech synthesis (TTS).

Stars: ✭ 43 (+7.5%)

Mutual labels: g2p

mlmorph

Malayalam Morphological Analyzer using Finite State Transducer

Stars: ✭ 40 (+0%)

Mutual labels: lexicon

NRCLex

An affect generator based on TextBlob and the NRC affect lexicon. Note that lexicon license is for research purposes only.

Stars: ✭ 42 (+5%)

Mutual labels: lexicon

Aeneas

aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment)

Stars: ✭ 1,942 (+4755%)

Mutual labels: espeak

afinn

Sentiment Analysis in Javascript using the AFINN Lexicon

Stars: ✭ 26 (-35%)

Mutual labels: lexicon

G2P

Grapheme To Phoneme

Stars: ✭ 59 (+47.5%)

Mutual labels: g2p

memex-gate

General Architecture for Text Engineering

Stars: ✭ 47 (+17.5%)

Mutual labels: lexicon

Multilingual Grapheme to Phoneme

Multilingual G2P based on espeak. Based on these ideas.

Languages

This G2P can be used with several languages. By default, it is configured for Brazilian Portuguese.

How to use

Install espeak. On Ubuntu 14.04: sudo apt-get install espeak.
Create a word list with one word per line. The file words.egs included in this repository is an example.
Run g2p.sh:

./g2p.sh -w words.egs

The lexicon is written to standard output (/dev/stdout).
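
As a rough sketch of the last two steps, a word list can be derived from running text with standard tools, and the lexicon can be redirected to a file instead of the terminal. The file names sample.txt, words.txt and lexicon.txt are placeholders chosen for illustration, not files from this repository. The g2p.sh call is skipped when the script or espeak is not available.

```shell
#!/bin/sh
# Hypothetical sample text; replace it with your own corpus.
printf 'the quick brown fox\njumps over the lazy dog\n' > sample.txt

# Split on whitespace only (safe for UTF-8 text), drop empty lines,
# and keep each distinct word once, one word per line.
tr -s '[:space:]' '\n' < sample.txt | sed '/^$/d' | sort -u > words.txt

# Run the converter only if g2p.sh and espeak are both present,
# saving the lexicon to a file rather than printing it.
if [ -x ./g2p.sh ] && command -v espeak >/dev/null 2>&1; then
    ./g2p.sh -w words.txt > lexicon.txt
fi
```

Splitting on whitespace leaves punctuation attached to words, so real text will usually need an extra cleaning pass before it is passed to g2p.sh.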

You can choose a different language by setting the -l option. For example, the following command generates a French lexicon.

./g2p.sh -w words.egs -l fr

The following languages are valid:

af (Afrikaans), bs (Bosnian), ca (Catalan), cs (Czech),
da (Danish), de (German), el (Greek), en (Default English),
en-us (American English), en-sc (Scottish English),
en-n (Northern British English), en-rp (Received Pronunciation British English),
en-wm (West Midlands British English), eo (Esperanto), es (Spanish),
es-la (Spanish - Latin America), fi (Finnish), fr (French), hr (Croatian),
hu (Hungarian), it (Italian), kn (Kannada), ku (Kurdish), lv (Latvian),
nl (Dutch), pl (Polish), pt (Portuguese (Brazil)), pt-pt (Portuguese (European)),
ro (Romanian), sk (Slovak), sr (Serbian), sv (Swedish), sw (Swahili),
ta (Tamil), tr (Turkish), zh (Mandarin Chinese)
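
When g2p.sh is called from another script, it can help to check the language code before passing it to -l. The following is a minimal sketch that assumes the list above is complete. The names VALID_LANGS and is_valid_lang are made up for this example and are not part of g2p.sh.

```shell
#!/bin/sh
# Language codes copied from the list above (assumed to be complete).
VALID_LANGS="af bs ca cs da de el en en-us en-sc en-n en-rp en-wm eo es es-la"
VALID_LANGS="$VALID_LANGS fi fr hr hu it kn ku lv nl pl pt pt-pt ro sk sr sv sw ta tr zh"

# Return 0 if $1 is exactly one of the codes above, 1 otherwise.
# Codes containing spaces are not handled; they are never valid anyway.
is_valid_lang() {
    case " $VALID_LANGS " in
        *" $1 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Example: report which of a few candidate codes are accepted.
for lang in fr pt-pt xx; do
    if is_valid_lang "$lang"; then
        echo "$lang: listed"
    else
        echo "$lang: not listed"
    fi
done
```

The voices actually installed on a given machine can differ from this list; espeak --voices prints what is available locally.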

Create a Brazilian Portuguese list of words

Get the spelling dictionary; its license is LGPL version 2.1.
Extract the pt_BR.dic and pt_BR.aff files from the .oxt file downloaded in the previous step. An .oxt file is a ZIP archive, so this can be done with vim or with any unzip tool.
Convert pt_BR.dic and pt_BR.aff to UTF-8:

iconv -f ISO8859-1 -t UTF-8 < pt_BR.dic > portuguese-brazilian-utf8.dic
iconv -f ISO8859-1 -t UTF-8 < pt_BR.aff > portuguese-brazilian-utf8.aff

Change first line of file portuguese-brazilian-utf8.aff from SET ISO8859-1 to SET UTF-8.
Install unmunch tool:

sudo apt-get install hunspell-tools

Generate a list with Brazilian Portuguese words:

unmunch portuguese-brazilian-utf8.dic portuguese-brazilian-utf8.aff > portuguese-brazilian-wordlist

portuguese-brazilian-wordlist will have more than 80 million words and its size will be greater than 1 GB.
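
The encoding steps above can be tried on a tiny stand-in file before running them on the real dictionary. This sketch assumes nothing about the actual contents of pt_BR.aff: sample.aff and its two lines are invented for illustration. It converts the file from ISO8859-1 to UTF-8 and then rewrites only the SET declaration on the first line, which is what the manual edit above does.

```shell
#!/bin/sh
# Invented stand-in for pt_BR.aff, written as ISO8859-1 bytes
# (\343 is the octal code for "a with tilde" in that encoding).
printf 'SET ISO8859-1\nTRY a\343o\n' > sample.aff

# Same conversion as above, applied to the stand-in file.
iconv -f ISO8859-1 -t UTF-8 < sample.aff > sample-utf8.aff

# Rewrite the encoding declaration on line 1 only; other lines are untouched.
sed '1s/^SET ISO8859-1$/SET UTF-8/' sample-utf8.aff > sample-utf8.fixed.aff

# Show the declaration that unmunch would now read.
head -n 1 sample-utf8.fixed.aff
```

If the declaration is left as ISO8859-1 while the file content is UTF-8, the affix rules and the word list disagree about the encoding, so it is worth checking the first line before running unmunch.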
