BasicArabicOCR: A very basic Arabic OCR based on the Tesseract OCR engine, written in Java.
ATKSpy: This repository is a Python package that provides a SOAP interface for communicating with the Microsoft ATKS.
alyahmor: Arabic inflectional morphology generator.
arabic-stop-words: Largest list of Arabic stop words on GitHub.
ArSarcasm: This repository contains the Arabic sarcasm dataset (ArSarcasm).
Arabic-Tashkeela-Model: A diacritization model for the Arabic language, built and trained using Tashkeela, the Arabic diacritization corpus on Kaggle.
masader: The largest public catalogue for Arabic NLP and speech datasets. It lists more than 250 datasets, annotated with more than 25 attributes.
tajmeeaton: A collection of projects, especially open-source ones, for advancing the Arabic language and the Ummah.
farasapy: A Python implementation of the Farasa toolkit.
comparable-text-miner: Comparable documents miner. Supports Arabic-English morphological analysis, text processing, n-gram feature extraction, POS tagging, dictionary translation, document alignment, corpus information, text classification, tf-idf computation, text similarity computation, and HTML document cleaning.
arabic-tagger: AQMAR Arabic Tagger, a sequence tagger with cost-augmented structured perceptron training.
ar-embeddings: Sentiment analysis for Arabic text (tweets, reviews, and standard Arabic) using word2vec.
Sumrized: Automatic text summarization (English/Arabic).
nmatheg: A simple strategy for training and fine-tuning NLP models for Arabic. Specify the parameters and just wait for the results. Its simple design makes use of the different tools in our NLP pipeline.
tkseem: Arabic tokenization library. It provides many tokenization algorithms.