
cpfair / quran-align

Licence: MIT
Word-accurate timestamps for Qur'anic audio.

Programming Languages

C++, Python, C, Makefile, Shell

Projects that are alternatives of or similar to quran-align

IslamBot
A Discord bot that supports Qur'an, hadith, prayer times, tafsir and more.
Stars: ✭ 59 (-57.55%)
Mutual labels:  quran, islam
QuranJSON
Simplified Perfect Complete Quran JSON (Indonesia Translation, Tafsir, and Audio) with API
Stars: ✭ 83 (-40.29%)
Mutual labels:  quran, islam
quran-api
Open source quran api, not only quran text, this api is also equipped with audio recitation and you can change the audio according to the recitation of the Imam that you like
Stars: ✭ 38 (-72.66%)
Mutual labels:  quran, ayah
alquran-tools
Various tools for Parsing Quran Tajweed, Buck, etc.
Stars: ✭ 85 (-38.85%)
Mutual labels:  quran, islam
quran-text
The Qurʾan’s text
Stars: ✭ 36 (-74.1%)
Mutual labels:  quran
A chronology of deep learning
Tracing back and exposing in chronological order the main ideas in the field of deep learning, to help everyone better understand the current intense research in AI.
Stars: ✭ 47 (-66.19%)
Mutual labels:  speech-recognition
Sirat-E-Mustaqeem
Islamic App with Complete Quran, Prayer time Api, Hadith, & Qibla Direction.
Stars: ✭ 119 (-14.39%)
Mutual labels:  quran
revai-node-sdk
Node.js SDK for the Rev AI API
Stars: ✭ 21 (-84.89%)
Mutual labels:  speech-recognition
app waktu solat malaysia
Prayer times app for Malaysia. Accurate data from JAKIM. Available for Android and the web.
Stars: ✭ 24 (-82.73%)
Mutual labels:  islam
Speech-Recognition
End-to-end Automatic Speech Recognition for Mandarin and English in TensorFlow
Stars: ✭ 21 (-84.89%)
Mutual labels:  speech-recognition
formulas-python
Ritchie CLI formulas in Python 🐍
Stars: ✭ 17 (-87.77%)
Mutual labels:  speech-recognition
tajmeeaton
A collection of projects, especially open-source ones, to advance the Arabic language and the ummah. 👨‍💻 👨‍🔬👨‍🏫🧕
Stars: ✭ 115 (-17.27%)
Mutual labels:  islam
NLP Toolkit
Library of state-of-the-art models (PyTorch) for NLP tasks
Stars: ✭ 92 (-33.81%)
Mutual labels:  speech-recognition
Deep-learning-And-Paper
[For learning and exchange purposes only] Machine intelligence: related books and classic papers, including experiment code for AutoML, sentiment classification, speech recognition, speaker recognition, speech synthesis, etc.
Stars: ✭ 62 (-55.4%)
Mutual labels:  speech-recognition
vosk-asterisk
Speech Recognition in Asterisk with Vosk Server
Stars: ✭ 52 (-62.59%)
Mutual labels:  speech-recognition
speech-to-text-code-pattern
React app using the Watson Speech to Text service to transform voice audio into written text.
Stars: ✭ 37 (-73.38%)
Mutual labels:  speech-recognition
lafzi-web
Web interface for Lafzi: a search engine for phrases (lafadz) in the Qur'an
Stars: ✭ 25 (-82.01%)
Mutual labels:  quran
UniSpeech
UniSpeech - Large Scale Self-Supervised Learning for Speech
Stars: ✭ 224 (+61.15%)
Mutual labels:  speech-recognition
common-components
Common components to be used across Quran.com, Quranicaudio.com and Salah.com
Stars: ✭ 35 (-74.82%)
Mutual labels:  quran
soxan
Wav2Vec for speech recognition, classification, and audio classification
Stars: ✭ 113 (-18.71%)
Mutual labels:  speech-recognition

quran-align

A tool for producing word-precise segmentation of recorded Qur'anic recitation. Designed to work with EveryAyah-style audio input.

Each word in the Qur'an is assigned a precise start and end timestamp within the recorded audio of the ayah. You can use this data to highlight the word currently being spoken during playback, to repeat a certain word or phrase, to compare against other audio, to analyze a qari's speaking cadence, and so on.

Data

If you just want the data, you may not need to actually run this tool: I've generated word-by-word timing files for many quraa' already. Visit the Releases tab to download them.

These data files are licensed under a Creative Commons Attribution 4.0 International License. Please consider emailing me if you use this data, so I can let you know when new & revised timing data is available.

Data Format

Data files are JSON, of the following format:

[
    {
        "surah": 1,
        "ayah": 1,
        "segments": {
            [word_start_index, word_end_index, start_msec, end_msec],
            ...
        },
        "stats": {
          "insertions": 123,
          "deletions": 456,
          "transpositions": 789
        }
    },
    ...
]

Where...

  • word_start_index is the 0-based index of the first word contained in the segment.
  • word_end_index is the 0-based index of the word after the last word contained in the segment.
  • start_msec/end_msec are timestamps within the input audio file.
  • stats contains statistics from the matching routine that aligns the recognized words with the reference text.

Here, a "word" is defined by splitting the text of the Qur'an by spaces (specifically, quran-uthmani.txt from Tanzil.net - without me_quran tanween differentiation). Within the code, you may notice that the language model used for recognition treats muqata'at as sequences of words (ا ل م instead of الم) - but they will always appear as a single word in the alignment output.

Data Quality

Between the subjective nature of deciding exactly where one word ends and the next begins, the ambiguity surrounding repeated phrases, and, most significantly, the lack of human-reviewed reference data, it is hard to measure the accuracy of this system. However, I was able to compare these results with those from the creators of ElMohafez, who use a different, independently developed methodology.

Using this data as a reference, I found that word timestamps fell an average of <73 msec away from the reference data on a per-span basis, with standard deviations averaging 139 msec across all 6 recordings. 98.5–99.9% of words were individually segmented. These results exclude certain cases, most significantly where the qari repeated or skipped a phrase (generally <1% of all words).

As our two independent implementations produce very similar results, it's reasonable to conclude that the data is largely correct, or that both implementations made the same mistakes.

Data Completeness

In some cases, it was not possible to automatically differentiate two words. This is a rare occurrence. In all cases, the segment's start and end word indices indicate the range of words contained by the segment. It is possible that some words may be omitted from the result sequence if their bounds could not be determined directly or inferred from adjacent words.
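Because omitted words leave a gap in the segments' index coverage, they are easy to detect. A hypothetical helper (not part of the tool) that reports uncovered word indices for an ayah:

```python
def missing_word_indices(segments, word_count):
    """Return the 0-based word indices not covered by any segment.

    segments: list of [word_start_index, word_end_index, start_msec, end_msec]
    word_count: number of words in the ayah's reference text
    """
    covered = set()
    for word_start, word_end, _start_ms, _end_ms in segments:
        covered.update(range(word_start, word_end))
    return sorted(set(range(word_count)) - covered)

# Word 2 could not be bounded, so no segment covers it:
segments = [[0, 2, 0, 900], [3, 4, 1400, 1900]]
print(missing_word_indices(segments, 4))  # -> [2]
```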

Methodology

  1. A CMU Sphinx speaker-specific acoustic model is trained using the verse-by-verse recitation recording and a Qur'an-specific language model.
  2. PocketSphinx full-utterance recognition is run on each ayah's audio, provided with a filtered LM dictionary containing only words from that ayah to improve recognition rates and runtime performance.
  3. Recognized words are matched to the reference text of each ayah, accounting for insertions, deletions, transpositions, etc.
  4. Raw audio data and a derived MFCC feature stream are used to refine alignment of words to syllable boundaries within the ayah audio.
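Step 3 above is essentially an edit-distance alignment. A minimal sketch using the standard library's difflib (not the tool's actual matcher, which also tracks transpositions) that pairs recognized words with reference words and counts mismatches:

```python
from difflib import SequenceMatcher

def match_words(recognized, reference):
    """Pair recognized words with reference words.

    Returns (pairs, insertions, deletions), where insertions are recognized
    words with no reference counterpart and deletions are reference words
    that were never recognized.
    """
    sm = SequenceMatcher(a=recognized, b=reference, autojunk=False)
    pairs, insertions, deletions = [], 0, 0
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag == "equal":
            pairs.extend(zip(range(i1, i2), range(j1, j2)))
        elif tag == "delete":
            insertions += i2 - i1   # extra recognized words
        elif tag == "insert":
            deletions += j2 - j1    # reference words not recognized
        else:  # "replace": count as unmatched on both sides
            insertions += i2 - i1
            deletions += j2 - j1
    return pairs, insertions, deletions

# Hypothetical transliterations, with one misrecognized word:
rec = ["bismi", "llahi", "rahmani", "rahim"]
ref = ["bismi", "llahi", "rahmani", "rahimi"]
pairs, ins, dels = match_words(rec, ref)
# pairs == [(0, 0), (1, 1), (2, 2)]; ins == dels == 1
```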

Usage

Unfortunately, a key component - the script that generates the speech model training inputs and supporting data files - is currently in an unpublishable state. Nonetheless, with this exercise left to the reader, the align tool's help output explains its full usage. You may need to override CMUSPHINX_ROOT in the Makefile. Note that WAV files must be generated by FFmpeg, because I hard-coded an offset to the audio data to avoid writing a RIFF parser.

Requirements

  • A UNIX machine (Windows Bash/WSL works)
  • PocketSphinx, SphinxTrain, and cmuclmtk from CMU Sphinx
  • EveryAyah-style audio recordings of the recitation
  • A C++11-compatible compiler
  • Python