All Projects → linuxscout → alyahmor

linuxscout / alyahmor

Licence: GPL-3.0 License
Arabic flexionnal morphology generator

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to alyahmor

libwifi
An 802.11 Frame Generation and Parsing Library in C
Stars: ✭ 27 (+22.73%)
Mutual labels:  generation
UnityExternal
Unity project demonstrating the use of an external C# project with Visual Studio Tools for Unity (VSTU).
Stars: ✭ 24 (+9.09%)
Mutual labels:  generation
Arteries
A procedural modeling toolkit base on UE4 blueprint
Stars: ✭ 92 (+318.18%)
Mutual labels:  generation
masader
The largest public catalogue for Arabic NLP and speech datasets. There are +250 datasets annotated with more than 25 attributes.
Stars: ✭ 66 (+200%)
Mutual labels:  arabic-nlp
RivWidthCloudPaper
A Google Earth Engine based algorithm that extracts river centerlines and widths from satellite images
Stars: ✭ 62 (+181.82%)
Mutual labels:  morphology
hms-3d-modeling-demo
HUAWEI 3D Modeling Kit project contains a sample app. Guided by this demo, you will be able to implement full 3D Modeling Kit capabilities, including 3D object reconstruction and material generation.
Stars: ✭ 45 (+104.55%)
Mutual labels:  generation
mlmorph
Malayalam Morphological Analyzer using Finite State Transducer
Stars: ✭ 40 (+81.82%)
Mutual labels:  morphology
cnpj
🇧🇷 Format, validate and generate CNPJ numbers in Node & Deno
Stars: ✭ 26 (+18.18%)
Mutual labels:  generation
Speech driven gesture generation with autoencoder
This is the official implementation for IVA '19 paper "Analyzing Input and Output Representations for Speech-Driven Gesture Generation".
Stars: ✭ 76 (+245.45%)
Mutual labels:  generation
DeepMorphy
Морфологический анализатор для русского языка на C# для .NET
Stars: ✭ 23 (+4.55%)
Mutual labels:  morphology
Arabic-Tashkeela-Model
This is a diacritization model for Arabic language. This model was built/trained using the Tashkeela: the Arabic diacritization corpus on Kaggle
Stars: ✭ 15 (-31.82%)
Mutual labels:  arabic-nlp
ilvr adm
ILVR: Conditioning Method for Denoising Diffusion Probabilistic Models (ICCV 2021 Oral)
Stars: ✭ 133 (+504.55%)
Mutual labels:  generation
python-makefun
Dynamically create python functions with a proper signature.
Stars: ✭ 62 (+181.82%)
Mutual labels:  generation
erdiagram
Entity-Relationship diagram code generator library
Stars: ✭ 28 (+27.27%)
Mutual labels:  generation
UniqueBible
A cross-platform bible application, integrated with high-quality resources and amazing features, running offline in Windows, macOS and Linux
Stars: ✭ 61 (+177.27%)
Mutual labels:  morphology
lemma
A Morphological Parser (Analyser) / Lemmatizer written in Elixir.
Stars: ✭ 45 (+104.55%)
Mutual labels:  morphology
modular-assemblies
[NeurIPS 2019] Code for the paper "Learning to Control Self-Assembling Morphologies: A Study of Generalization via Modularity"
Stars: ✭ 98 (+345.45%)
Mutual labels:  morphology
Morphos-Blade
Morphos adapter for Blade
Stars: ✭ 32 (+45.45%)
Mutual labels:  morphology
retinal-exudates-detection
exudates detection using hybrid approach (Image Morphology & Machine Learning)
Stars: ✭ 53 (+140.91%)
Mutual labels:  morphology
arabic-stop-words
Largest list of Arabic stop words on Github. أكبر قائمة لمستبعدات الفهرسة العربية على جيت هاب
Stars: ✭ 193 (+777.27%)
Mutual labels:  arabic-nlp

Alyahmor اليحمور

Arabic flexionnal morphology generator

Description

The Alyahmor produce a word form from (prefix, lemma, suffix). It has many functionalities:

  • Generate word forms from given word and affixes
  • Generate all word forms by adding verbal or nominal affixes according to word type
  • Generate all affixes combination for verbs or nouns which can be used in morphology analysis.

Developpers:

Taha Zerrouki: http://tahadz.com taha dot zerrouki at gmail dot com

Features value
Authors Authors.md
Release 0.1
License GPL
Tracker linuxscout/alyahmor/Issues
Accounts @Twitter

Citation

If you would cite it in academic work, can you use this citation

T. Zerrouki‏, Alyahmor, Arabic mophological  generator Library for python.,  https://pypi.python.org/pypi/alyahmor/, 2019

or in bibtex format

@misc{zerrouki2019alyahmor,
  title={alyahmor, Arabic mophological generator Library for python.},
  author={Zerrouki, Taha},
  url={https://pypi.python.org/pypi/alyahmor},
  year={2019}
}

Applications

  • Text Stemming
  • Morphology analysis
  • Text Classification and categorization
  • Spellchecking

Features مزايا

  • Arabic word Light Stemming.
  • Features:
    • Generate word forms from given word and affixes
    • Generate all word forms by adding verbal or nominal affixes according to word type
    • Generate all affixes combination for verbs or nouns which can be used in morphology analysis.

Installation

pip install alyahmor

Requirements

pip install -r requirements.txt 

أصل التسمية

اليَحْمُور، وهو الحسن بن المعالي الباقلاني أبو علي النحوي الحلي شيخ العربية في زمانه في بغداد من تلامذة أبي البقاء العكبري ت ٦٣٧هـ

وكتب بخطه كثيراً من الأدب واللغة وسائر الفنون، وكان له همةٌ عالية، وحرصٌ شديد؛ وتحصيل الفوائد مع علو سنه، وضعف بصره، وكثرة محفوظه، وصدقه، وثقته، وتواضعه، وكرم أخلاقه.

وانتقل آخر عمره إلى مذهب الشافعي، وانتهت إليه رياسة النحو. مولده سنة ثمان وستين وخمسمائة، وتوفي سنة سبع وثلاثين وستمائة. المزيد عن اليحمور

Usage

Example

Generate words forms

It joins word with affixes with suitable correction for example

بال+كتاب +ين => بالكتابين ب+أبناء+ه => بأبنائه

Nouns

To generate all forms of the word كتاب as noun use

>>> import alyahmor.genelex
>>> generator = alyahmor.genelex.genelex()
>>> word = u"كِتِاب"
>>> noun_forms = generator.generate_forms( word, word_type="noun")
>>>noun_forms
[u'آلْكِتَاب', u'آلْكِتَابا', u'آلْكِتَابات', u'آلْكِتَابان', u'آلْكِتَابة', u'آلْكِتَابتان', u'آلْكِتَابتين', u'آلْكِتَابون', u'آلْكِتَابي', u'آلْكِتَابيات'
....]

Verbs

To generate all forms of the word كتاب as verb, use

>>> import alyahmor.genelex
>>> generator = alyahmor.genelex.genelex()
>>> word = u"استعمل"
>>> verb_forms = generator.generate_forms( word, word_type="verb")
>>>verb_forms
[u'أَأَسْتَعْمِلَ', u'أَأَسْتَعْمِلَكَ', u'أَأَسْتَعْمِلَكُمَا', u'أَأَسْتَعْمِلَكُمْ', u'أَأَسْتَعْمِلَكُنَّ', u'أَأَسْتَعْمِلَنَا', u'أَأَسْتَعْمِلَنِي', u'أَأَسْتَعْمِلَنَّ', u'أَأَسْتَعْمِلَنَّكَ', u'أَأَسْتَعْمِلَنَّكُمَا', 

....]

Generate non vocalized forms

To generate all forms of the word كتاب as noun without vocalization use

>>> import alyahmor.genelex
>>> generator = alyahmor.genelex.genelex()
>>> word = u"كِتِاب"
>>> noun_forms = generator.generate_forms( word, word_type="noun", vocalized=False)
>>>noun_forms
[u'آلكتاب', u'آلكتابا', u'آلكتابات', u'آلكتابان', u'آلكتابة', u'آلكتابتان', u'آلكتابتين', u'آلكتابون', u'آلكتابي', u'آلكتابيات',
....]

Generate a dictionary of vocalized forms indexed by unvocalized form

To generate all forms of the word كتاب as noun as a dict of grouped all vocalized forms by unvocalized form use

>>> import alyahmor.genelex
>>> generator = alyahmor.genelex.genelex()
>>> word = u"كِتِاب"
>>> noun_forms = generator.generate_forms( word, word_type="noun", indexed=True)
>>>noun_forms
{u'أككتابة': [u'أكَكِتَِابَةِ', u'أكَكِتَِابَةٍ'],
 u'أوككتابة': [u'أَوَكَكِتَِابَةِ', u'أَوَكَكِتَِابَةٍ'],
 u'وكتابياتهم': [u'وَكِتَِابياتهِمْ', u'وَكِتَِابِيَاتُهُمْ', u'وَكِتَِابِيَاتِهِمْ', u'وَكِتَِابِيَاتُهِمْ', u'وَكِتَِابياتهُمْ'],
 u'وكتابياتهن': [u'وَكِتَِابياتهِنَّ', u'وَكِتَِابياتهُنَّ', u'وَكِتَِابِيَاتِهِنَّ', u'وَكِتَِابِيَاتُهِنَّ', u'وَكِتَِابِيَاتُهُنَّ'],
 u'وللكتابات': [u'وَلِلْكِتَِابَاتِ', u'وَلِلْكِتَِابات'],
 u'أبكتابتكن': [u'أَبِكِتَِابَتِكُنَّ'],
 u'أبكتابتكم': [u'أَبِكِتَِابَتِكُمْ'],
 u'أكتابياتهن': [u'أَكِتَِابياتهِنَّ', u'أَكِتَِابِيَاتِهِنَّ', u'أَكِتَِابياتهُنَّ', u'أَكِتَِابِيَاتُهُنَّ', u'أَكِتَِابِيَاتُهِنَّ'],
 u'فكتاباتهم': [u'فَكِتَِاباتهِمْ', u'فَكِتَِابَاتُهُمْ', u'فَكِتَِابَاتُهِمْ', u'فَكِتَِاباتهُمْ', u'فَكِتَِابَاتِهِمْ'],
 u'بكتابياتكن': [u'بِكِتَِابِيَاتِكُنَّ', u'بِكِتَِابياتكُنَّ'],
....
}

Generate detailled forms

The detailled form contains

  • vocalized word form, example: "ِكِتَابَاتُنَا"
  • semi-vocalized: the word without case mark (دون علامة الإعراب), example: "ِكِتَابَاتنَا"
  • segmented form: the affix parts and the word like : procletic-prefix-word-suffix-proclitic, for example : و--كتاب-ات-نا
  • Tags : عطف:جمع مؤنث سالم:ضمير متصل
>>> import alyahmor.genelex
>>> generator = alyahmor.genelex.genelex()
>>> word = u"كِتِاب"
noun_forms = generator.generate_forms( word, word_type="noun", indexed=True, details=True)
>>> noun_forms
  [{'vocolized': 'استعمل', 'semi-vocalized': 'استعمل', 'segmented': '-استعمل--', 'tags': '::'}, 
  {'vocolized': 'استعملي', 'semi-vocalized': 'استعملي', 'segmented': '-استعمل--ي', 'tags': ':مضاف:'},
  {'vocolized': 'استعملِي', 'semi-vocalized': 'استعملِي', 'segmented': '-استعمل--ي', 'tags': ':مضاف:'},
  {'vocolized': 'استعملكِ', 'semi-vocalized': 'استعملكِ', 'segmented': '-استعمل--ك', 'tags': ':مضاف:'}, 
  {'vocolized': 'استعملكَ', 'semi-vocalized': 'استعملكَ', 'segmented': '-استعمل--ك', 'tags': ':مضاف:'},
   {'vocolized': 'استعملكِ', 'semi-vocalized': 'استعملكِ', 'segmented': '-استعمل--ك', 'tags': ':مضاف:'}, 
   {'vocolized': 'استعملكُمُ', 'semi-vocalized': 'استعملكُمُ', 'segmented': '-استعمل--كم', 'tags': ':مضاف:'}, 
   ....]

Generate affixes lists

Alyahmor generate affixes listes for verbs and nouns

>>> verb_affix =generator.generate_affix_list(word_type="verb", vocalized=True)
>>>verb_affix
[u'أَفَسَت-يننِي', u'أَ-ونَا', u'ي-ونكَ', u'فَلَ-تاكَ', u'وَلََن-هُنَّ', u'أَت-وننَا', u'وَ-اكُنَّ', u'ن-ننَا', u'وَت-وهَا', u'أَي-نهُمَا', ....]

>>> noun_affix =generator.generate_affix_list(word_type="noun", vocalized=True)
>>> noun_affix
[u'أكَ-ياتكَ', u'فَ-ِيَاتِكُمَا', u'أكَ-ياتكِ', u'أَوَكَ-ِينَا', u'أَلِ-ِيِّهِنَّ', u'أَفَ-َكُمَا', u'أَفَ-ِيَّتِهِمْ', u'أَفَكَ-ياتهُمْ', u'فَبِ-ِيِّكُمْ', u'وَلِ-ِيَّتِهَا', ....]

Generate Unvocalized affixes

>>> noun_affix =generator.generate_affix_list(word_type="noun", vocalized=False)
>>> noun_affix
[u'-', u'-ا', u'-ات', u'-اتك', u'-اتكم', u'-اتكما', u'-اتكن', u'-اتنا', u'-اته', u'-اتها', ...]

Generate word forms by affixes

Alyahmor generate word forms for given affixes

  • the affix parameter is a list which contains four elements as
  • procletic
  • prefix
  • suffix
  • enclitic
>>> import alyahmor.genelex
>>> generator = alyahmor.genelex.genelex()
>>> word = u"كِتِاب"
>>> generator.generate_by_affixes( word, word_type="noun", affixes = [u"بال", u"", u"ين", u""])
['بِالْكِتَِابين']
>>> generator.generate_by_affixes( word, word_type="noun", affixes = [u"وك", u"", u"ِ", u""])
['وَكَكِتَِابِ']
>>> generator.generate_by_affixes( word, word_type="noun", affixes = [u"و", u"", u"", u""])
['وَكِتَِاب']
 

Files

  • file/directory category description

tests/samples/dataset.csv A list of verified affixes

Featured Posts

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].