Arabic Text Diacritization

This repository contains the dataset, helpers, and systems comparison for our paper on Arabic Text Diacritization:

"Arabic Text Diacritization Using Deep Neural Networks", Ali Fadel, Ibraheem Tuffaha, Bara' Al-Jawarneh, and Mahmoud Al-Ayyoub, ICCAIS 2019.

Files

dataset

train.txt - Contains 50,000 lines of diacritized Arabic text for use as the training set
val.txt - Contains 2,500 lines of diacritized Arabic text for use as the validation set
test.txt - Contains 2,500 lines of diacritized Arabic text for use as the test set

helpers

constants
- ARABIC_LETTERS_LIST.pickle - Contains list of Arabic letters
- CLASSES_LIST.pickle - Contains list of all possible classes
- DIACRITICS_LIST.pickle - Contains list of all diacritics
count_characters.py - Counts the number of Arabic letters and diacritics in a file
count_fathatan.py - Counts the number of fathatan occurrences before and after Alif in all files from a folder
diacritization_stat.py - Calculates DER and WER using the gold data and the predicted output
diacritics_rate_extractor.py - Keeps only the lines whose diacritics-to-Arabic-characters rate is p% or more, in all files from a folder
file_lookup.py - Searches for a line in all files from a folder
fix_fathatan.py - Changes after-Alif fathatan to before-Alif fathatan in a file
remove_diacritics.py - Removes diacritics from a file
transliteration.py - Converts a file from Arabic text to Buckwalter transliteration and vice-versa
pre_process_tashkeela_corpus.ipynb - Pre-process Tashkeela Corpus data
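
To illustrate the kind of processing the helpers above perform, here is a minimal Python sketch of three of the operations: removing diacritics, computing the diacritics-to-Arabic-characters rate, and moving fathatan from after Alif to before it. It is not the repository's code. The function names are hypothetical, and the character sets are assumptions built from Unicode ranges, whereas the repository loads its own lists from the pickle files under constants.

```python
# Assumption: the eight basic diacritics, U+064B..U+0652
# (fathatan, dammatan, kasratan, fatha, damma, kasra, shadda, sukun).
# The repository reads its own list from DIACRITICS_LIST.pickle.
DIACRITICS = frozenset(chr(cp) for cp in range(0x064B, 0x0653))

# Assumption: Arabic letters taken as U+0621..U+063A and U+0641..U+064A.
ARABIC_LETTERS = frozenset(
    [chr(cp) for cp in range(0x0621, 0x063B)]
    + [chr(cp) for cp in range(0x0641, 0x064B)]
)

ALIF = "\u0627"
FATHATAN = "\u064B"


def remove_diacritics(text: str) -> str:
    """Drop every diacritic mark and keep all other characters."""
    return "".join(ch for ch in text if ch not in DIACRITICS)


def diacritics_rate(text: str) -> float:
    """Ratio of diacritic marks to Arabic letters (0.0 if there are no letters)."""
    letters = sum(ch in ARABIC_LETTERS for ch in text)
    marks = sum(ch in DIACRITICS for ch in text)
    return marks / letters if letters else 0.0


def fix_fathatan(text: str) -> str:
    """Rewrite 'Alif + fathatan' as 'fathatan + Alif' (plain Alif only)."""
    return text.replace(ALIF + FATHATAN, FATHATAN + ALIF)
```

A line filter like the one diacritics_rate_extractor.py describes could then keep a line when diacritics_rate(line) * 100 >= p, assuming that the rate is defined as marks per Arabic letter.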

existing_systems

ali-soft - Contains some of the bugs found in the Ali-Soft system
farasa - Contains Farasa system output, fixed output, and DER/WER statistics
harakat - Contains Harakat system testing script, output, fixed output, and DER/WER statistics
madamira - Contains MADAMIRA system output, fixed output, and DER/WER statistics
mishkal - Contains Mishkal system output, fixed output, and DER/WER statistics
shakkala - Contains Shakkala system data splitting script, output, fixed output, and DER/WER statistics
tashkeela_model - Contains Tashkeela-Model system output, fixed output, and DER/WER statistics for each n-gram model provided by them
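
The DER (diacritic error rate) and WER (word error rate) statistics listed above can be sketched as follows. This is a simplified illustration, not the logic of diacritization_stat.py. It assumes that the gold and predicted texts contain the same words and base characters, that every non-diacritic character counts toward DER, and that a word is wrong if any of its characters carries a wrong set of diacritics. The names split_word and der_wer are hypothetical, and published DER/WER variants (for example, with or without case endings) are not modelled here.

```python
# Assumption: the eight basic diacritics, U+064B..U+0652.
DIACRITICS = frozenset(chr(cp) for cp in range(0x064B, 0x0653))


def split_word(word: str):
    """Return a list of (base_char, diacritics) pairs for one word."""
    pairs = []
    for ch in word:
        if ch not in DIACRITICS:
            pairs.append((ch, ""))
        elif pairs:
            # Attach the mark to the preceding base character.
            base, marks = pairs[-1]
            pairs[-1] = (base, marks + ch)
        # A diacritic with no preceding base character is ignored.
    return pairs


def der_wer(gold: str, pred: str):
    """Return (DER, WER) as fractions in [0, 1] for two aligned texts."""
    gold_words, pred_words = gold.split(), pred.split()
    if len(gold_words) != len(pred_words):
        raise ValueError("gold and prediction must have the same number of words")

    chars = char_errors = word_errors = 0
    for g_word, p_word in zip(gold_words, pred_words):
        g_pairs, p_pairs = split_word(g_word), split_word(p_word)
        if [b for b, _ in g_pairs] != [b for b, _ in p_pairs]:
            raise ValueError("gold and prediction differ in their base characters")
        # Marks are compared as sets, so shadda+fatha equals fatha+shadda.
        errors = sum(set(g) != set(p) for (_, g), (_, p) in zip(g_pairs, p_pairs))
        chars += len(g_pairs)
        char_errors += errors
        word_errors += errors > 0

    der = char_errors / chars if chars else 0.0
    wer = word_errors / len(gold_words) if gold_words else 0.0
    return der, wer
```

Multiply the returned fractions by 100 to report them as percentages.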

Note: All code in this repository was tested on Ubuntu 18.04.

Contributors

License

The project is available as open source under the terms of the MIT License.


AliOsm / arabic-text-diacritization
