
abartov / pronuncify

Licence: Unlicense License
automate incrementally producing word pronunciation recordings for Wiktionary through Wikimedia Commons

Programming Languages

ruby

Projects that are alternatives of or similar to pronuncify

cmu-pronouncing-dictionary
The 134,000+ words and their pronunciations in the CMU pronouncing dictionary
Stars: ✭ 46 (+100%)
Mutual labels:  pronunciation
Bat2Exe
Windows user interface for converting your batch files into executables.
Stars: ✭ 60 (+160.87%)
Mutual labels:  batch
batch deobfuscator
Deobfuscate batch scripts obfuscated using string substitution and escape character techniques.
Stars: ✭ 82 (+256.52%)
Mutual labels:  batch
gobatch
Batch processing library for Golang.
Stars: ✭ 19 (-17.39%)
Mutual labels:  batch
Batched-Grabber
🖥️ Windows Batch and powershell Discord Token grabber. Made for Troll (lmao)
Stars: ✭ 39 (+69.57%)
Mutual labels:  batch
sidekiq-merger
Merge Sidekiq jobs
Stars: ✭ 49 (+113.04%)
Mutual labels:  batch
EverythingPortable
EverythingPortable
Stars: ✭ 59 (+156.52%)
Mutual labels:  batch
bexhill-osm
A local mapping project using data from OpenStreetMap. Includes overlays, walking directions and historical information.
Stars: ✭ 16 (-30.43%)
Mutual labels:  wikimedia-commons
enableallExtensions
Automatically add all existing Chrome extensions to ExtensionInstallWhitelist, including non-webstore ones
Stars: ✭ 23 (+0%)
Mutual labels:  batch
openmessaging.github.io
OpenMessaging homepage
Stars: ✭ 12 (-47.83%)
Mutual labels:  batch
rocketjob
Ruby's missing background and batch processing system
Stars: ✭ 281 (+1121.74%)
Mutual labels:  batch
EFT Flea Market Bot
Escape from Tarkov Flea Market bot, to generate a lot of in-game currency within shortest time, while not even having to actively play the game!
Stars: ✭ 22 (-4.35%)
Mutual labels:  batch
chessalyzer.js
A JavaScript library for batch analyzing chess games
Stars: ✭ 14 (-39.13%)
Mutual labels:  batch
cl-rashell
Resilient replicant Shell Programming Library for Common Lisp
Stars: ✭ 17 (-26.09%)
Mutual labels:  batch
video-cut-tool
Wikimedia Tool to Trim Online Videos in Wikimedia Commons. https://commons.wikimedia.org/wiki/Commons:VideoCutTool
Stars: ✭ 27 (+17.39%)
Mutual labels:  wikimedia-commons
sic
🦜 Accessible image processing and conversion from the terminal. Front-end for image-rs/image.
Stars: ✭ 96 (+317.39%)
Mutual labels:  batch
KJNetworkPlugin
🎡A lightweight but powerful Network library. Network Plugin, Support batch and chain operation. 插件版网络架构
Stars: ✭ 43 (+86.96%)
Mutual labels:  batch
wikipron
Massively multilingual pronunciation mining
Stars: ✭ 167 (+626.09%)
Mutual labels:  pronunciation
Fun
Small fun scripts
Stars: ✭ 22 (-4.35%)
Mutual labels:  batch
nmly
Easy to use bulk rename utility for the terminal
Stars: ✭ 41 (+78.26%)
Mutual labels:  batch

pronuncify

Automate incrementally producing word pronunciation recordings for Wiktionary through Wikimedia Commons

Version

Pronuncify is at version 0.32 (February 7, 2017).

Goal

Make it easy to quickly record batches of word pronunciations as Ogg files suitable for upload to Wikimedia Commons, on any modern Linux machine.

It works from the command line, showing the user one word at a time and recording a 4-second file for each. The user then has 4 seconds to reject the recording (if they made a mistake while recording, or if the word should not be recorded at all). If the user does nothing, the next word is shown and recorded. At the end of a run, you have count new Ogg files (count being the number of words requested for that run), ready for upload and named according to the convention on the Pronunciation page on Commons.
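The record-then-reject cycle can be sketched roughly as follows. This is a minimal sketch, not Pronuncify's code: the helper names (arecord_command, rejected?, record_word, ogg_name) are hypothetical, the "<lang>-<word>.ogg" file name assumes the common Commons pattern, and conversion with SoX assumes it was built with Ogg Vorbis support. The -d, -r, -D and -f flags are standard arecord options. In this sketch, pressing Enter during the 4-second window rejects the take.

```ruby
# Hypothetical helper: builds the arecord command for one fixed-length take.
def arecord_command(wav_path, seconds: 4, rate: 48_000, device: nil, sample: nil)
  cmd = ["arecord", "-d", seconds.to_s, "-r", rate.to_s]
  cmd += ["-D", device] if device # e.g. "hw:1,0" for a USB microphone
  cmd += ["-f", sample] if sample # e.g. "S16_LE"
  cmd << wav_path
end

# Hypothetical file name, assuming the common "<lang>-<word>.ogg" pattern.
def ogg_name(lang, word)
  "#{lang}-#{word}.ogg"
end

# Waits up to +seconds+ for a line on +input+. Returns true (reject) if the
# user pressed Enter in time, false if they did nothing.
def rejected?(input = $stdin, seconds: 4)
  return false unless IO.select([input], nil, nil, seconds)
  input.gets # consume the line so it does not carry over to the next word
  true
end

# Records one word, offers the rejection window, and converts kept takes to Ogg.
# Returns the Ogg path, or nil if the take was rejected.
def record_word(lang, word, outdir, **opts)
  wav = File.join(outdir, "#{lang}-#{word}.wav")
  puts word
  system(*arecord_command(wav, **opts)) or raise "arecord failed"
  if rejected?
    File.delete(wav)
    puts "  rejected"
    return nil
  end
  ogg = File.join(outdir, ogg_name(lang, word))
  system("sox", wav, ogg) or raise "sox failed" # SoX picks formats from the extensions
  File.delete(wav)
  ogg
end
```

A run would then call record_word once per word in the batch, keeping only the non-nil results.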

A single-file database (using SQLite) is used to track which words have been recorded so far.

Currently, the script handles ingesting word lists, recording batches of word pronunciations, and uploading them to Wikimedia Commons. The resulting Ogg files are deposited in a specified (or default) directory, and the user can either upload them to Commons manually or use the --upload option to have Pronuncify upload them on their behalf. Pronuncify automatically assigns the appropriate category on Commons based on the language code. Uploaded files are licensed CC0, to enable maximal re-use of the pronunciation files.
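To make the category and license step concrete, here is a sketch of description-page wikitext for one uploaded file. The {{Information}} and {{Cc-zero}} templates and categories of the form "Hebrew pronunciation" exist on Commons, but the exact text Pronuncify generates is an assumption, and LANG_NAMES is a hypothetical stand-in for the language-name lookup that the iso-639 gem presumably provides.

```ruby
# Hypothetical stand-in for an iso-639 lookup: language code -> English name.
LANG_NAMES = { "en" => "English", "he" => "Hebrew" }.freeze

# Hypothetical wikitext for the file description page: a short description,
# a CC0 license template, and a category derived from the language code.
def description_page(word, lang)
  name = LANG_NAMES.fetch(lang) { raise ArgumentError, "unknown language code: #{lang}" }
  <<~WIKI
    == {{int:filedesc}} ==
    {{Information
    |description=#{name} pronunciation of "#{word}"
    |source={{own}}
    }}

    == {{int:license-header}} ==
    {{Cc-zero}}

    [[Category:#{name} pronunciation]]
  WIKI
end
```

Failing loudly on an unknown code avoids uploading files into a non-existent category.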

To upload, Pronuncify needs your Wikimedia username and password. In the future, I may implement OAuth-based authentication.

Prerequisites

  • Ruby 2.x
  • the sqlite3 library (apt-get install sqlite3) and gem (gem install sqlite3)
  • the mediawiki_api gem (gem install mediawiki_api) version 0.7.1 or later
  • the iso-639 gem (gem install iso-639)
  • alsa-utils (apt-get install alsa-utils)
  • sox (apt-get install sox)
  • your console needs to be able to render words in the chosen language (fonts matter!)
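A pre-flight check for the prerequisites above could look like the sketch below. It is an assumption that Pronuncify performs any such check; the function names are hypothetical. Gem::Specification.find_all_by_name is part of RubyGems, and the version constraint mirrors the "0.7.1 or later" requirement for mediawiki_api.

```ruby
# External programs and gems named in the prerequisites list.
REQUIRED_TOOLS = %w[arecord sox].freeze
REQUIRED_GEMS = { "sqlite3" => nil, "mediawiki_api" => ">= 0.7.1", "iso-639" => nil }.freeze

# True if an executable file called +name+ exists in any PATH directory.
def executable_on_path?(name, path = ENV.fetch("PATH", ""))
  path.split(File::PATH_SEPARATOR).any? do |dir|
    candidate = File.join(dir, name)
    File.file?(candidate) && File.executable?(candidate)
  end
end

def missing_tools(tools = REQUIRED_TOOLS, path = ENV.fetch("PATH", ""))
  tools.reject { |tool| executable_on_path?(tool, path) }
end

# Names of gems that are not installed (or not in a matching version).
def missing_gems(gems = REQUIRED_GEMS)
  gems.reject { |name, req| Gem::Specification.find_all_by_name(name, *[req].compact).any? }.keys
end
```

Usage: abort with a message listing missing_tools + missing_gems if that list is non-empty. Fonts cannot be checked this way; that item still needs a visual check.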

Usage

To ingest a wordlist

Given a UTF-8 plain text file with one word per line (lines beginning with '#' will be ignored), run:

ruby pronuncify.rb --ingest <fname> --lang <ISO code> --db <database file>

db defaults to './pronuncify.db'

Example for a Hebrew word list, using the default database:

ruby pronuncify.rb --ingest wordlist.txt --lang he
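Reading the word-list format described above might be sketched like this. The function names are hypothetical; skipping blank lines and trimming stray spaces are assumptions, while ignoring lines that begin with '#' follows the README.

```ruby
# Parses a word list: one word per line, '#' lines are comments.
def parse_wordlist(text)
  text.each_line
      .reject { |line| line.start_with?("#") } # comment lines, per the README
      .map(&:strip)                            # drop the newline and stray spaces
      .reject(&:empty?)                        # ignore blank lines (assumption)
end

def read_wordlist(path)
  parse_wordlist(File.read(path, encoding: "UTF-8"))
end
```

Each resulting word would then be stored under the --lang code given on the command line.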

To record another batch

Run:

ruby pronuncify.rb --count NN --lang <ISO code> --outdir <directory> --frequency <Hz> --device <devicename> --sample <format>
  • count: number of words to record in a single run; defaults to 10
  • lang: not needed if only one language has been ingested so far
  • outdir: defaults to './pronounced_words_ISO', where ISO is the language code
  • frequency: sampling rate; defaults to 48000 Hz
  • device: defaults to the system default. If you have a USB microphone, though, you may want something like --device hw:1,0 (see arecord --list-devices)
  • sample: sample format; defaults to the system default. If you have a USB microphone, you may need something like --sample S16_LE
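The options and defaults listed above map naturally onto Ruby's OptionParser. This is a hypothetical sketch (parse_options is not from Pronuncify), and it assumes the ISO part of the default output directory is the language code. Falling back to the single ingested language when --lang is omitted would need a database lookup, which is not shown.

```ruby
require "optparse"

# Hypothetical parser for the recording options and their documented defaults.
def parse_options(argv)
  opts = { count: 10, frequency: 48_000, db: "./pronuncify.db" }
  OptionParser.new do |o|
    o.on("--count N", Integer, "words per run")           { |v| opts[:count] = v }
    o.on("--lang CODE", "ISO 639 language code")          { |v| opts[:lang] = v }
    o.on("--outdir DIR", "output directory")              { |v| opts[:outdir] = v }
    o.on("--frequency HZ", Integer, "sampling rate")      { |v| opts[:frequency] = v }
    o.on("--device NAME", "ALSA device, e.g. hw:1,0")     { |v| opts[:device] = v }
    o.on("--sample FORMAT", "sample format, e.g. S16_LE") { |v| opts[:sample] = v }
    o.on("--db FILE", "database file")                    { |v| opts[:db] = v }
  end.parse!(argv)
  # The outdir default depends on the language, so it is filled in last.
  opts[:outdir] ||= "./pronounced_words_#{opts[:lang]}" if opts[:lang]
  opts
end
```

Usage: parse_options(ARGV.dup) keeps ARGV untouched, since parse! removes the options it consumes.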

So if you're only recording in one language and are happy with the default count and output directory, you can simply run:

ruby pronuncify.rb

to record 10 more words.

Upload recorded files to Commons

To upload the recorded words to Commons, run the command below; uploaded files are moved from the output directory into an /uploaded subdirectory:

ruby pronuncify.rb --upload --user <username> --pass <password>
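The upload-then-move bookkeeping can be sketched as below. The function name is hypothetical, and the actual Commons upload is left to a caller-supplied block: with the mediawiki_api gem that block would wrap a logged-in client, which is not shown here. Moving each file only after its upload succeeds means a failed run can simply be repeated.

```ruby
require "fileutils"

# Uploads every .ogg in +outdir+ via the given block, then moves each
# successfully uploaded file into outdir/uploaded. Returns the new paths.
def upload_all(outdir)
  done_dir = File.join(outdir, "uploaded")
  FileUtils.mkdir_p(done_dir)
  Dir.glob(File.join(outdir, "*.ogg")).sort.map do |path|
    yield path # expected to raise if the upload fails, leaving the file in place
    FileUtils.mv(path, done_dir)
    File.join(done_dir, File.basename(path))
  end
end
```

Dir.glob does not descend into subdirectories, so files already in uploaded/ are never picked up again.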

Saved configuration

Pronuncify will read settings from a pronuncify.yml file if it exists. You can still override specific settings by specifying them on the command line. To create the file, run pronuncify with the settings you want and add the --write-settings option.

Once you've saved your settings, you can just run:

ruby pronuncify.rb

to record another batch with your saved settings.
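The precedence described above (command-line values override saved ones) can be expressed as a simple merge chain. This is a sketch under assumptions: the function names are hypothetical, keys are stored as plain strings, and leaving the password out of the saved file is a suggested precaution, not documented Pronuncify behavior.

```ruby
require "yaml"

SETTINGS_FILE = "pronuncify.yml"

# Built-in defaults, then pronuncify.yml, then the command line (last wins).
def load_settings(cli, defaults: {}, path: SETTINGS_FILE)
  saved = File.exist?(path) ? (YAML.safe_load(File.read(path)) || {}) : {}
  defaults.merge(saved).merge(cli)
end

# What a --write-settings option might do; the password is not persisted (assumption).
def write_settings(settings, path: SETTINGS_FILE)
  File.write(path, settings.reject { |key, _| key == "pass" }.to_yaml)
end
```

With this ordering, a one-off --count on the command line changes only the current run and leaves the saved file untouched.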

Contributing

To report issues or contribute to the code, see http://github.com/abartov/pronuncify

See also

License

The code is in the public domain. See the LICENSE file for details.
