
adhaamehab / Textblob Ar

Licence: mit

Arabic support for textblob

Programming Languages

python

139335 projects - #7 most used programming language

Labels

machine-learning nlp natural-language-processing text-classification sentiment-analysis word-embeddings part-of-speech-tagger

Projects that are alternatives of or similar to Textblob Ar

Spark Nlp

State of the Art Natural Language Processing

Stars: ✭ 2,518 (+4096.67%)

Mutual labels: natural-language-processing, sentiment-analysis, part-of-speech-tagger, text-classification

Text Analytics With Python

Learn how to process, classify, cluster, summarize, understand syntax, semantics and sentiment of text data with the power of Python! This repository contains code and datasets used in my book, "Text Analytics with Python" published by Apress/Springer.

Stars: ✭ 1,132 (+1786.67%)

Mutual labels: natural-language-processing, text-classification, sentiment-analysis

Pynlp

A pythonic wrapper for Stanford CoreNLP.

Stars: ✭ 103 (+71.67%)

Mutual labels: natural-language-processing, sentiment-analysis, part-of-speech-tagger

Ml Classify Text Js

Machine learning based text classification in JavaScript using n-grams and cosine similarity

Stars: ✭ 38 (-36.67%)

Mutual labels: natural-language-processing, text-classification, sentiment-analysis

overview-and-benchmark-of-traditional-and-deep-learning-models-in-text-classification

NLP tutorial

Stars: ✭ 41 (-31.67%)

Mutual labels: sentiment-analysis, text-classification, word-embeddings

Kadot

Kadot, the unsupervised natural language processing library.

Stars: ✭ 108 (+80%)

Mutual labels: natural-language-processing, text-classification, word-embeddings

Pytorch Sentiment Analysis

Tutorials on getting started with PyTorch and TorchText for sentiment analysis.

Stars: ✭ 3,209 (+5248.33%)

Mutual labels: natural-language-processing, sentiment-analysis, word-embeddings

Text mining resources

Resources for learning about Text Mining and Natural Language Processing

Stars: ✭ 358 (+496.67%)

Mutual labels: natural-language-processing, text-classification, sentiment-analysis

Chatbot cn

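A chatbot for the finance and legal domains (with some chit-chat capability). Its main modules include information extraction, NLU, NLG, and a knowledge graph; the front end is integrated with Django, and RESTful interfaces for the NLP and KG components are already packaged.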

Stars: ✭ 791 (+1218.33%)

Mutual labels: text-classification, sentiment-analysis

Nlp With Ruby

Curated List: Practical Natural Language Processing done in Ruby

Stars: ✭ 907 (+1411.67%)

Mutual labels: natural-language-processing, sentiment-analysis

Syntree2vec

An algorithm to augment syntactic hierarchy into word embeddings

Stars: ✭ 9 (-85%)

Mutual labels: natural-language-processing, word-embeddings

Nlp In Practice

Starter code to solve real world text data problems. Includes: Gensim Word2Vec, phrase embeddings, Text Classification with Logistic Regression, word count with pyspark, simple text preprocessing, pre-trained embeddings and more.

Stars: ✭ 790 (+1216.67%)

Mutual labels: natural-language-processing, text-classification

Tf Rnn Attention

Tensorflow implementation of attention mechanism for text classification tasks.

Stars: ✭ 735 (+1125%)

Mutual labels: text-classification, sentiment-analysis

Concise Ipython Notebooks For Deep Learning

Ipython Notebooks for solving problems like classification, segmentation, generation using latest Deep learning algorithms on different publicly available text and image data-sets.

Stars: ✭ 23 (-61.67%)

Mutual labels: text-classification, word-embeddings

Text2vec

Fast vectorization, topic modeling, distances and GloVe word embeddings in R.

Stars: ✭ 715 (+1091.67%)

Mutual labels: natural-language-processing, word-embeddings

Wikipedia2vec

A tool for learning vector representations of words and entities from Wikipedia

Stars: ✭ 655 (+991.67%)

Mutual labels: natural-language-processing, text-classification

Omnicat Bayes

Naive Bayes text classification implementation as an OmniCat classifier strategy. (#ruby #naivebayes)

Stars: ✭ 30 (-50%)

Mutual labels: text-classification, sentiment-analysis

Easy Deep Learning With Allennlp

🔮Deep Learning for text made easy with AllenNLP

Stars: ✭ 32 (-46.67%)

Mutual labels: natural-language-processing, text-classification

Tensorflow Sentiment Analysis On Amazon Reviews Data

Implementing different RNN models (LSTM,GRU) & Convolution models (Conv1D, Conv2D) on a subset of Amazon Reviews data with TensorFlow on Python 3. A sentiment analysis project.

Stars: ✭ 34 (-43.33%)

Mutual labels: text-classification, sentiment-analysis

Pattern

Web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization.

Stars: ✭ 8,112 (+13420%)

Mutual labels: natural-language-processing, sentiment-analysis


=================
textblob-ar [WIP]
=================

.. image:: https://travis-ci.org/adhaamehab/textblob-ar.svg?branch=master
    :target: https://travis-ci.org/adhaamehab/textblob-ar

Arabic language support for TextBlob_.

Features

Tokenizer
Sentiment analysis
Stanford Arabic POS
Spelling Correction
Text similarity
fastText Arabic word2vec interface

Usage

Tokenizer

.. code-block:: python

    >>> from textblob_ar import TextBlob
    >>> blob = TextBlob(u"""هندسة البرمجيات هي دراسة تصميم وتنفيذ وتعديل البرمجيات بما يضمن توفر هذه البرمجيات بجودة عالية وتكلفة معقولة متاحة للجميع وقابلة للتطوير فيما بعد وسريعة للبناء. وهندسة البرمجيات تقوم على أسس ونظريات من الهندسة وعلوم الحاسب كمبدأ ال Functional Structure من الهندسة والذي يعتمد على مبدأ تصميم أجزاء صغيرة تتجانس في العمل مع بعضها لتشكل عمل الكل.""")
    >>> blob.tokens
    WordList(['هندسة', 'البرمجيات', 'هي', 'دراسة', 'تصميم', 'وتنفيذ', 'وتعديل', 'البرمجيات', 'بما', 'يضمن', 'توفر', 'هذه', 'البرمجيات', 'بجودة', 'عالية', 'وتكلفة', 'معقولة', 'متاحة', 'للجميع', 'وقابلة', 'للتطوير', 'فيما', 'بعد', 'وسريعة', 'للبناء', '.', 'وهندسة', 'البرمجيات', 'تقوم', 'على', 'أسس', 'ونظريات', 'من', 'الهندسة', 'وعلوم', 'الحاسب', 'كمبدأ', 'ال', 'Functional', 'Structure', 'من', 'الهندسة', 'والذي', 'يعتمد', 'على', 'مبدأ', 'تصميم', 'أجزاء', 'صغيرة', 'تتجانس', 'في', 'العمل', 'مع', 'بعضها', 'لتشكل', 'عمل', 'الكل', '.'])

Sentiment

.. code-block:: python

    >>> from textblob_ar import TextBlob
    >>> blob = TextBlob('اعجبني هذا الكتاب. اعترض قليلا مع بعض افكاره لكن مضمونه رائع')
    >>> blob.sentiment
    Sentiment(polarity=0.8, subjectivity=0.9)
    >>> blob = TextBlob('لم يعجبني هذا الكتاب. مضمونه سئ')
    >>> blob.sentiment
    Sentiment(polarity=-0.6999999999999998, subjectivity=0.6666666666666666)
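
The polarity score can be mapped to a coarse label. The following is a minimal sketch, assuming polarity follows TextBlob's convention of ranging from -1 (negative) to 1 (positive). The ``polarity_label`` helper and its ``threshold`` value are hypothetical and not part of textblob-ar; the ``Sentiment`` tuple is a stand-in so the sketch runs without the library.

.. code-block:: python

    from collections import namedtuple

    # Stand-in for the Sentiment tuple shown above (assumption: same field names).
    Sentiment = namedtuple("Sentiment", ["polarity", "subjectivity"])

    def polarity_label(sentiment, threshold=0.1):
        """Map a polarity score to a label; the threshold is an arbitrary choice."""
        if sentiment.polarity > threshold:
            return "positive"
        if sentiment.polarity < -threshold:
            return "negative"
        return "neutral"

    # Scores taken from the two examples above.
    liked = Sentiment(polarity=0.8, subjectivity=0.9)
    disliked = Sentiment(polarity=-0.6999999999999998, subjectivity=0.6666666666666666)
    print(polarity_label(liked), polarity_label(disliked))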

Stanford POS

Note that the Stanford POS tagger is the default until the main tagger is released.

.. code-block:: python

    >>> from textblob_ar import TextBlob
    >>> from textblob_ar.pos_tagger import StanfordPOSTagger
    >>> tagger = StanfordPOSTagger()
    >>> text = """ في أنظمة التشغيل متعددة المهام مثل اليونكس عفريت النظام هو برنامج يعمل في خلفية النظام بعيدا عن التحكم المباشر من المستحدم وغالبا ما يبدأ عمله كعملية خلفية مع بداية تشغيل النظام."""
    >>> blob = TextBlob(text, pos_tagger=tagger)
    >>> print(blob.tags)
    [('', 'في/IN'), ('', 'أنظمة/NN'), ('', 'التشغيل/DTNN'), ('', 'متعددة/JJ'), ('', 'المهام/DTNN'), ('', 'مثل/NN'), ('', 'اليونكس/DTNNP'), ('', 'عفريت/NNP'), ('', 'النظام/DTNN'), ('', 'هو/PRP'), ('', 'برنامج/NN'), ('', 'يعمل/VBP'), ('', 'في/IN'), ('', 'خلفية/NN'), ('', 'النظام/DTNN'), ('', 'بعيدا/JJ'), ('', 'عن/IN'), ('', 'التحكم/DTNN'), ('', 'المباشر/DTJJ'), ('', 'من/IN'), ('', 'المستحدم/DTNN'), ('', 'وغالبا/NN'), ('', 'ما/WP'), ('', 'يبدأ/VBP'), ('', 'عمله/NN'), ('', 'كعملية/JJ'), ('', 'خلفية/NN'), ('', 'مع/NN'), ('', 'بداية/NN'), ('', 'تشغيل/NN'), ('', 'النظام/DTNN')]
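
In the output above, each entry carries the word and its tag fused as ``word/TAG`` in the second element. A small post-processing step can split them into ``(word, tag)`` pairs. This is a sketch only; ``split_tagged_pairs`` is a hypothetical helper, not part of textblob-ar, and it assumes the output format shown above.

.. code-block:: python

    def split_tagged_pairs(tags):
        """Turn ('', 'word/TAG') entries into (word, tag) tuples."""
        pairs = []
        for _, token in tags:
            # Split on the last '/' so a slash inside the word is preserved.
            word, separator, tag = token.rpartition("/")
            if separator:  # skip entries that carry no tag
                pairs.append((word, tag))
        return pairs

    # First three entries of the output above.
    sample = [('', 'في/IN'), ('', 'أنظمة/NN'), ('', 'التشغيل/DTNN')]
    print(split_tagged_pairs(sample))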

Text Correction

Thanks to Peter Norvig's spelling corrector: http://norvig.com/spell-correct.html

.. code-block:: python

    >>> from textblob_ar import TextBlob
    >>> from textblob_ar.correction import TextCorrection
    >>> text = 'الاذدهاز'
    >>> TextCorrection().correct(text)
    {'الاذهان', 'الازدهار', 'الادهان', 'الاندهاش'}
    >>> TextCorrection().correct(text, top=True)
    'الازدهاز'
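
The idea behind Norvig's corrector is to generate every string within a small edit distance of the input and keep those found in a known vocabulary. The sketch below illustrates that idea for Arabic letters. It is not textblob-ar's actual implementation: ``edits1``, ``candidates``, and the toy ``vocabulary`` are assumptions made for illustration, and no word-frequency ranking is included.

.. code-block:: python

    # Basic Arabic letters: U+0621..U+063A and U+0641..U+064A (skips tatweel).
    ARABIC_LETTERS = [chr(c) for c in range(0x0621, 0x063B)] + \
                     [chr(c) for c in range(0x0641, 0x064B)]

    def edits1(word):
        """All strings one delete, transpose, replace, or insert away from word."""
        splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
        deletes = [l + r[1:] for l, r in splits if r]
        transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
        replaces = [l + c + r[1:] for l, r in splits if r for c in ARABIC_LETTERS]
        inserts = [l + c + r for l, r in splits for c in ARABIC_LETTERS]
        return set(deletes + transposes + replaces + inserts)

    def candidates(word, vocabulary):
        """Known words at edit distance 0, then 1, then 2; else the word itself."""
        if word in vocabulary:
            return {word}
        one_edit = edits1(word)
        known = one_edit & vocabulary
        if known:
            return known
        two_edits = {e2 for e1 in one_edit for e2 in edits1(e1) if e2 in vocabulary}
        return two_edits or {word}

    # Toy vocabulary (hypothetical); a real corrector would use a large word list.
    vocabulary = {'الازدهار', 'الاندهاش', 'كتاب'}
    print(candidates('الاذدهاز', vocabulary))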

Text Similarity

Based on `gensim <https://radimrehurek.com/gensim>`_ and the `fastText <https://fasttext.cc/docs/en/pretrained-vectors.html>`_ pretrained word2vec model.

Similarity is computed by averaging the word vectors of each sentence into a mean feature vector, then taking the cosine similarity between the two resulting vectors, so closely related sentences score near 1.

.. code-block:: python

    >>> from textblob_ar import TextSimilarity
    >>> sim = TextSimilarity()
    # Loading the pretrained word2vec model takes around 12 seconds (MacBook Pro 2017)
    >>> sent1 = u'الإرهابي الصالح هي رواية خيال سياسي للكاتبة دوريس ليسينج. ظهرت أول طبعة للرواية في سبتمبر من عام 1985 للناشرين جوناثان كيب في المملكة المتحدة وألفريد أ'
    >>> sent2 = u'روايه الكاتبه دوريس ليسينج هي روايه خيال سياسي ظهرت في سبتمبر 1985 بعنوان الارهابي الصالح وتم نشرها عن طريق جوناثان كيب والفريد أ في انجلترا'
    >>> sim.similarity(sent1, sent2)
    0.9611366391181946
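
The mean-vector procedure described above can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: ``mean_vector``, ``cosine_similarity``, and ``sentence_similarity`` are hypothetical names, and the three-dimensional ``toy_embeddings`` stand in for the pretrained fastText vectors.

.. code-block:: python

    import math

    def mean_vector(tokens, embeddings):
        """Average the vectors of the tokens found in embeddings (None if none)."""
        vectors = [embeddings[t] for t in tokens if t in embeddings]
        if not vectors:
            return None
        dim = len(vectors[0])
        return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

    def cosine_similarity(a, b):
        """Cosine of the angle between two equal-length vectors."""
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    def sentence_similarity(tokens1, tokens2, embeddings):
        """Cosine similarity of the two sentences' mean vectors."""
        v1, v2 = mean_vector(tokens1, embeddings), mean_vector(tokens2, embeddings)
        if v1 is None or v2 is None:
            return 0.0  # no known words in at least one sentence
        return cosine_similarity(v1, v2)

    # Made-up vectors for illustration only.
    toy_embeddings = {
        'رواية': [1.0, 0.0, 1.0],
        'خيال': [0.0, 1.0, 1.0],
        'سياسي': [1.0, 1.0, 0.0],
    }
    same = sentence_similarity(['رواية', 'خيال'], ['خيال', 'رواية'], toy_embeddings)
    different = sentence_similarity(['رواية', 'خيال'], ['سياسي'], toy_embeddings)
    print(same, different)

Because the mean ignores word order, two sentences with the same words in a different order receive the same vector, which is why paraphrases with heavy word overlap score highly.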

Requirements

Python >= 3.3

Installation

Development

.. code-block:: shell

    $ git clone https://github.com/adhaamehab/textblob-ar.git
    $ cd textblob-ar
    $ virtualenv -p python3 env
    $ source env/bin/activate
    $ pip install -Ur dev-requirements.txt

For text similarity, download the fastText Arabic word2vec pretrained model from `here <https://github.com/facebookresearch/fastText/blob/master/pretrained-vectors.md>`_.

TODO

Part Of Speech tagger
Noun-phrases extraction
Parser
Classification support
Grammar

License

MIT licensed. See the bundled `LICENSE <https://github.com/sloria/textblob-fr/blob/master/LICENSE>`_ file for more details.

.. _TextBlob: https://textblob.readthedocs.org/

.. image:: https://badges.gitter.im/textblob-ar/community.svg
    :alt: Join the chat at https://gitter.im/textblob-ar/community
    :target: https://gitter.im/textblob-ar/community?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge


Stars: ✭ 60
