linuxscout / Mishkal

Licence: gpl-3.0
Mishkal is Arabic text vocalization software.

Mishkal

Mishkal Arabic text vocalization software مشكال لتشكيل النصوص العربية

Developers: Taha Zerrouki (http://tahadz.com), taha dot zerrouki at gmail dot com

Features      Value
Authors       Authors.md
Release       1.10 Bouira
License       GPL
Tracker       linuxscout/mishkal/Issues
Mailing list  [email protected]
Website       tahadz.com/mishkal
Source        Github
Download      sourceforge
Feedback      Comments
Accounts      @Facebook @Twitter @Sourceforge

Citation

@thesis{zerrouki2020adawat,
author = {Taha Zerrouki},
title = {Towards An Open Platform For Arabic Language Processing},
type = {PhD thesis},
institution = {Ecole Nationale Supérieure d'informatique, Alger, Algérie},
date = {2020},
}

Install

You can install Mishkal as a Python library or as standalone software.

Python lib

pip install mishkal

Install from GitHub

  1. Clone the Mishkal project from GitHub:
git clone https://github.com/linuxscout/mishkal.git
  2. Install the required packages:
pip install -r mishkal/requirements.txt

Requirements

- pyarabic     : basic Arabic library
- sylajone     : ArAnaSyn syntactic analyzer
- arramooz     : Arabic morphological dictionary
- asmai        : semantic analyzer
- CodernityDB  : pure-Python, fast NoSQL database, used as a cache to reduce the load on the morphological analyzer
- collocations : collocation library (deprecated)
- libqutrub    : verb conjugation library used by the morphological analyzer
- maskouk      : collocation library
- naftawayh    : word tagging library
- qalsadi      : morphological analyzer
- tashaphyne   : light stemmer used by the morphological analyzer

Usage

Mishkal provides:

  • Console command line
  • Python library
  • GUI
  • Web interface
  • API

GUI:

  • Windows: MishkalGui.exe

  • Linux:

    python interfaces/gui/mishkal-gui.py
    

Web server (Linux, Windows)

python3 interfaces/web/mishkal-webserver

Console (Linux, Windows)

$ python3 bin/mishkal-console.py -f filename

Usage: bin/mishkal-console.py  -f filename [OPTIONS]
           bin/mishkal-console.py  'السلام عليكم' [OPTIONS]

        [-f | --file = filename]       input file 
        [-o | --outfile = filename]    output file to write vocalized text to, '$FILENAME (Tashkeel).txt' by default
        
        [-h | --help]             outputs this usage message
        [-v | --version]        program version
        [-p | --progress]      display progress status
        [-a | --verbose]       enable verbosity

        * Tashkeel Actions
        -------------------
        [-r | --reduced]        Reduced Tashkeel.
        [-s | --strip]             Strip tashkeel (remove harakat).
        [-c | --compare]      compare the vocalized text with the program output

        * Tashkeel Options
        ------------------
        [-l | --limit]             vocalize only a limited number of line
        [-x | --syntax]         disable syntaxic analysis
        [-m | --semantic]    disable semantic analysis
        [-g | --train]             enable training option
        [-i | --ignore]           ignore the last Mark on output words.
        [-t | --stat]               disable statistic tashkeel

This program is licensed under the GPL License
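
To drive the console script from another Python program, the command line can be assembled programmatically. This is a minimal sketch, assuming the script sits at bin/mishkal-console.py relative to the current directory and accepts "-o filename" as a separate argument. build_command and run_mishkal are hypothetical helpers, not part of Mishkal, and use only flags listed in the usage message.

```python
import subprocess
import sys


def build_command(infile, outfile=None, reduced=False, strip=False,
                  script="bin/mishkal-console.py"):
    """Assemble an argv list for the console script (hypothetical helper)."""
    cmd = [sys.executable, script, "-f", infile]
    if outfile:
        cmd += ["-o", outfile]   # output file for the vocalized text
    if reduced:
        cmd.append("-r")         # reduced tashkeel
    if strip:
        cmd.append("-s")         # strip existing harakat
    return cmd


def run_mishkal(infile, **options):
    """Run the console script; assumes the Mishkal checkout is the working directory."""
    return subprocess.run(build_command(infile, **options), check=True)
```

Calling run_mishkal("input.txt", outfile="output.txt") would then be equivalent to typing the command by hand.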

Library

pip install mishkal

example:

>>> import mishkal.tashkeel
>>> vocalizer = mishkal.tashkeel.TashkeelClass()
>>> text = u"تطلع الشمس صباحا"
>>> vocalizer.tashkeel(text)
' تَطْلُعُ الشَّمْسُ صَبَاحًا'
>>> 
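
A quick sanity check on library output is to strip the diacritics again and confirm that the letters are unchanged, which is similar in spirit to the -s and -c console options. The sketch below uses only the standard library. strip_tashkeel and same_letters are illustrative helpers written for this example, and the range U+064B to U+0652 covers only the common harakat, shadda and sukun.

```python
import re

# Arabic harakat: fathatan (U+064B) through sukun (U+0652).
HARAKAT = re.compile("[\u064B-\u0652]")


def strip_tashkeel(text):
    """Remove tanwin, short vowels, shadda and sukun from Arabic text."""
    return HARAKAT.sub("", text)


def same_letters(original, vocalized):
    """True if the two texts differ only by harakat and surrounding whitespace."""
    return strip_tashkeel(vocalized).split() == strip_tashkeel(original).split()


original = "تطلع الشمس صباحا"
vocalized = " تَطْلُعُ الشَّمْسُ صَبَاحًا"   # output copied from the example above
print(same_letters(original, vocalized))
```

pyarabic, listed in the requirements, ships a comparable araby.strip_tashkeel function that is likely the better choice in real code.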

JSON connection API (remote vocalization)

The website's vocalization service can be called from any site using JSON and Ajax, so you can use it on your own site.

  • Calling method 1: JSON with the jQuery library
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <meta http-equiv="content-type" content="text/html; charset=utf-8" />
  <script src="http://code.jquery.com/jquery-latest.js"></script>
</head>
<body>
  <div id="result"></div>
  <script>
  // Request the vocalization once the page has loaded, then display it.
  $(document).ready(function() {
    $.getJSON("http://tahadz.com/mishkal/ajaxGet",
      {text: "السلام عليكم\nاهلا بكم\nكيف حالكم", action: "TashkeelText"},
      function(data) {
        $("#result").text(data.result);
      });
  });
  </script>
</body>
</html>

The call is made as follows:

$.getJSON("http://tahadz.com/mishkal/ajax...", {text:"السلام عليكم\nاهلا بكم\nكيف حالكم", action:"TashkeelText"},

where

  • text: the text to vocalize.
  • action: the requested operation, here TashkeelText.

The result has the following form:

{"result": " السّلامُ عَلَيكُمْ اهلا بِكُمْ كَيْفَ حالُكُمْ", "order": "0"}

where

  • result: the vocalized output text.
  • order: the line number in the original text. If the original text is long, Mishkal splits it into several lines, and the replies may not come back in the same order, so each reply carries its order number.
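
The same request can be made from Python. The sketch below assumes the endpoint from the jQuery example is still reachable and accepts the parameters as a GET query string. build_url, fetch and join_in_order are hypothetical helper names, and join_in_order shows one way to use the order field when several replies arrive out of sequence.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the jQuery example; its availability is not guaranteed.
ENDPOINT = "http://tahadz.com/mishkal/ajaxGet"


def build_url(text, action="TashkeelText", endpoint=ENDPOINT):
    """Encode text and action as query parameters, as $.getJSON does."""
    query = urllib.parse.urlencode({"text": text, "action": action})
    return endpoint + "?" + query


def fetch(text):
    """Send one request and decode the JSON reply (needs network access)."""
    with urllib.request.urlopen(build_url(text), timeout=10) as reply:
        return json.loads(reply.read().decode("utf-8"))


def join_in_order(replies):
    """Reassemble replies that arrived out of order, using their 'order' field."""
    ordered = sorted(replies, key=lambda r: int(r["order"]))
    return "\n".join(r["result"] for r in ordered)
```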

How Mishkal works:

Mishkal uses a rule-based method to detect relations and diacritics. First, it analyzes all morphological cases: it generates every possible diacritized form of each word by detecting its affixes and checking the result against a dictionary. Second, it attaches a frequency to each word form.

These first two steps are performed by support/Qalsadi (an Arabic morphological analyzer). The dictionary it uses is a separate project named "Arramooz: Arabic dictionary for morphology".
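
The first two steps can be pictured with a toy lexicon. This is only an illustrative sketch, not Qalsadi's API or algorithm: LEXICON, PREFIXES, the frequency counts and the candidates function are all invented for this example, and real analysis also handles suffixes, roots and patterns.

```python
# Toy lexicon: unvocalized stem -> [(vocalized form, frequency)].
# Entries and counts are invented for illustration only.
LEXICON = {
    "كتب": [("كَتَبَ", 120), ("كُتُب", 80), ("كُتِبَ", 15)],
}

# Toy prefix table: unvocalized prefix -> vocalized prefix.
PREFIXES = {"": "", "و": "وَ", "ال": "الْ"}


def candidates(word):
    """Return every (vocalized form, frequency) reachable by stripping a prefix."""
    found = []
    for prefix, vocalized_prefix in PREFIXES.items():
        if word.startswith(prefix):
            stem = word[len(prefix):]
            # Keep the analysis only if the remaining stem is in the dictionary.
            for form, freq in LEXICON.get(stem, []):
                found.append((vocalized_prefix + form, freq))
    return found


for form, freq in candidates("وكتب"):
    print(form, freq)
```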

Third, a syntax analyzer detects all possible relations between words. The syntax library is named support/ArAnaSyn. This analyzer is basic for the moment: it uses only linear relations between adjacent words.

Fourth, the generated data and relations are analyzed semantically to detect semantic relations and so reduce ambiguity. The library used is support/asmai (Arabic semantic analysis). Semantic relation extraction is corpus-based, and the corpus used is "Tashkeela: Arabic vocalized texts corpus".

In the final stage, the module mishkal/tashkeel tries to select the most suitable word form for the context. It first looks for evident cases or for the cases with the most relations. Otherwise it selects the most probable case, using rules such as choosing a stop word by default or choosing the Mansoub (accusative) case by default.
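
The fallback order in this final stage can be sketched as a small function. This is an illustration under stated assumptions, not the code in mishkal/tashkeel: the dictionary keys (form, freq, tags, related), the tag names and the use of frequency as the tie-breaker are all hypothetical.

```python
def choose(candidates):
    """Pick one analysis for a word, following the fallback order described above."""
    if not candidates:
        return None
    if len(candidates) == 1:                       # evident case: no ambiguity
        return candidates[0]

    # Prefer analyses that take part in a syntactic or semantic relation.
    related = [c for c in candidates if c.get("related")]
    if related:
        return max(related, key=lambda c: c.get("freq", 0))

    # Otherwise apply the default rules in order: stop word, then Mansoub.
    pool = candidates
    for rule in ("stopword", "mansoub"):
        tagged = [c for c in candidates if rule in c.get("tags", ())]
        if tagged:
            pool = tagged
            break

    # Assumed tie-breaker: the most frequent remaining analysis.
    return max(pool, key=lambda c: c.get("freq", 0))
```

For example, given a frequent Marfou analysis and a rarer Mansoub one with no relations, this sketch returns the Mansoub analysis because the default rule is applied before frequency.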

The rest of the program provides functions that handle the interfaces and API for web, desktop, and command-line use.

Featured Posts

  • "Mishkal" for professional vocalization of Arabic texts - كمال فودة
  • How to vocalize letters, words, or even whole Arabic texts in seconds from your browser - رضا بوربعة
  • A new Arabic service: vocalizing Arabic texts - Sam Hamou
  • Beta release of Mishkal, the Arabic text vocalization program - Zaid AlSaadi
  • Mishkal: the road to vocalization - مدونة اليراع
  • Mishkal for Arabic text vocalization: desktop interface released - مدونة اليراع
  • Get to know the “تحدّث” projects: projects for a great language - محمد هاني صباغ