
iabufarha / ArSarcasm

Licence: MIT license
This repository contains the Arabic sarcasm dataset (ArSarcasm)

Projects that are alternatives or similar to ArSarcasm

ar-embeddings
Sentiment Analysis for Arabic Text (tweets, reviews, and standard Arabic) using word2vec
Stars: ✭ 83 (+361.11%)
Mutual labels:  sentiment-analysis, arabic-nlp
sarcasm-detection-for-sentiment-analysis
Sarcasm Detection for Sentiment Analysis
Stars: ✭ 21 (+16.67%)
Mutual labels:  sentiment-analysis, sarcasm-detection
arabic-sentiment-analysis
Sentiment Analysis in Arabic tweets
Stars: ✭ 64 (+255.56%)
Mutual labels:  sentiment-analysis, arabic-nlp
Text tone analyzer
A system that analyzes the sentiment of texts and utterances.
Stars: ✭ 15 (-16.67%)
Mutual labels:  sentiment-analysis
AirBnbPricePrediction
Training and Testing a Set of Machine Learning/Deep Learning Models to Predict Airbnb Prices for NYC
Stars: ✭ 47 (+161.11%)
Mutual labels:  sentiment-analysis
twitter mining
Twitter Mining in Java
Stars: ✭ 25 (+38.89%)
Mutual labels:  sentiment-analysis
masader
The largest public catalogue for Arabic NLP and speech datasets. There are more than 250 datasets annotated with more than 25 attributes.
Stars: ✭ 66 (+266.67%)
Mutual labels:  arabic-nlp
sentiment-analysis-imdb
This is a classifier focused on sentiment analysis of movie reviews
Stars: ✭ 11 (-38.89%)
Mutual labels:  sentiment-analysis
Scon-ABSA
[CIKM 2021] Enhancing Aspect-Based Sentiment Analysis with Supervised Contrastive Learning
Stars: ✭ 17 (-5.56%)
Mutual labels:  sentiment-analysis
pytorch-sentiment-analysis
char-rnn implementation for sentiment analysis on twitter data
Stars: ✭ 32 (+77.78%)
Mutual labels:  sentiment-analysis
athena
Opinion mining
Stars: ✭ 25 (+38.89%)
Mutual labels:  sentiment-analysis
Dataset-Sentimen-Analisis-Bahasa-Indonesia
This repository is a collection of datasets related to Indonesian-language sentiment analysis. If you use the datasets in this repository for research, please cite the journal article associated with each dataset. The available datasets have been used in several studies, and their results have been published…
Stars: ✭ 38 (+111.11%)
Mutual labels:  sentiment-analysis
converse
Conversational text Analysis using various NLP techniques
Stars: ✭ 147 (+716.67%)
Mutual labels:  sentiment-analysis
amazon-reviews
Sentiment Analysis & Topic Modeling with Amazon Reviews
Stars: ✭ 26 (+44.44%)
Mutual labels:  sentiment-analysis
senticnetapi
Simple API to use SenticNet
Stars: ✭ 69 (+283.33%)
Mutual labels:  sentiment-analysis
SentimentAnalysis
Sentiment polarity analysis based on Sina Weibo data
Stars: ✭ 43 (+138.89%)
Mutual labels:  sentiment-analysis
hfusion
Multimodal sentiment analysis using hierarchical fusion with context modeling
Stars: ✭ 42 (+133.33%)
Mutual labels:  sentiment-analysis
rosette-elasticsearch-plugin
Document Enrichment plugin for Elasticsearch
Stars: ✭ 25 (+38.89%)
Mutual labels:  sentiment-analysis
char-cnn-text-classification-tensorflow
Simple Convolutional Neural Network (CNN) for sentiment classification of Chinese movie reviews.
Stars: ✭ 55 (+205.56%)
Mutual labels:  sentiment-analysis
tajmeeaton
A collection of projects, especially open-source ones, for advancing the Arabic language and the nation. 👨‍💻 👨‍🔬👨‍🏫🧕
Stars: ✭ 115 (+538.89%)
Mutual labels:  arabic-nlp

ArSarcasm Dataset

ArSarcasm is a new Arabic sarcasm detection dataset. It was built from previously available Arabic sentiment analysis datasets (SemEval 2017 and ASTD), to which it adds sarcasm and dialect labels. The dataset contains 10,547 tweets, 1,682 (16%) of which are sarcastic. For more details, please check our paper "From Arabic Sentiment Analysis to Sarcasm Detection: The ArSarcasm Dataset".

Dataset details:

ArSarcasm is provided in CSV format. We provide an 80/20 train/test split to keep future comparisons consistent. The training set contains 8,437 tweets, while the test set contains 2,110 tweets.
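The figures quoted above can be cross-checked with a few lines of arithmetic. This is only a sanity check on the published numbers; the constant names are illustrative and do not come from the repository.

```python
# Figures quoted in the dataset description.
TOTAL_TWEETS = 10_547
SARCASTIC_TWEETS = 1_682
TRAIN_TWEETS = 8_437
TEST_TWEETS = 2_110

# The two splits should account for every tweet in the dataset.
assert TRAIN_TWEETS + TEST_TWEETS == TOTAL_TWEETS

# Shares of the full dataset, as fractions between 0 and 1.
train_share = TRAIN_TWEETS / TOTAL_TWEETS
test_share = TEST_TWEETS / TOTAL_TWEETS
sarcastic_share = SARCASTIC_TWEETS / TOTAL_TWEETS

print(f"train: {train_share:.1%}, test: {test_share:.1%}")
print(f"sarcastic: {sarcastic_share:.1%}")
```

Rounded to whole percentages, the shares match the 80/20 split and the 16% sarcasm rate stated in the text.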

The dataset contains the following fields:

  • tweet: the original tweet text surrounded by quotes (").
  • sarcasm: boolean that indicates whether a tweet is sarcastic or not.
  • sentiment: the sentiment from the new annotation (positive, negative, neutral).
  • original_sentiment: the sentiment in the original annotations (positive, negative, neutral).
  • source: the original source of the tweet (SemEval or ASTD).
  • dialect: the dialect used in the tweet. We used the five main regions of the Arab world; the labels and their meanings are as follows:
    • msa: Modern Standard Arabic.
    • egypt: the dialect of Egypt and Sudan.
    • levant: the Levantine dialect including Palestine, Jordan, Syria and Lebanon.
    • gulf: the Gulf countries including Saudi Arabia, UAE, Qatar, Bahrain, Yemen, Oman, Iraq and Kuwait.
    • magreb: the North African Arab countries including Algeria, Libya, Tunisia and Morocco.
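
The fields above could be read and validated roughly as follows. This is a minimal sketch using only the Python standard library, under several assumptions that the description does not confirm: that the CSV has a header row using the field names listed above, that the sarcasm boolean is stored as the text "True"/"False", and that labels are compared case-insensitively. The `read_rows` helper and the sample rows are hypothetical; the sample tweets are made-up placeholders, not data from ArSarcasm.

```python
import csv
import io
from collections import Counter

# Label sets taken from the field descriptions above.
SENTIMENTS = {"positive", "negative", "neutral"}
DIALECTS = {"msa", "egypt", "levant", "gulf", "magreb"}


def read_rows(handle):
    """Yield one dict per tweet, with `sarcasm` converted to a bool."""
    for row in csv.DictReader(handle):
        # Assumption: the boolean is stored as the text "True"/"False".
        row["sarcasm"] = row["sarcasm"].strip().lower() == "true"
        for column, allowed in (
            ("sentiment", SENTIMENTS),
            ("original_sentiment", SENTIMENTS),
            ("dialect", DIALECTS),
        ):
            value = row[column].strip().lower()
            if value not in allowed:
                raise ValueError(f"unexpected {column} label: {row[column]!r}")
            row[column] = value
        yield row


# Made-up placeholder rows (not taken from ArSarcasm), only to exercise the parser.
SAMPLE = '''tweet,sarcasm,sentiment,original_sentiment,source,dialect
"placeholder tweet, with a comma",True,negative,neutral,SemEval,egypt
"another placeholder tweet",False,positive,positive,ASTD,msa
'''

rows = list(read_rows(io.StringIO(SAMPLE)))
sarcastic = sum(row["sarcasm"] for row in rows)
by_dialect = Counter(row["dialect"] for row in rows)
print(f"{sarcastic} of {len(rows)} sample rows are sarcastic")
print(dict(by_dialect))
```

To read a real file instead of the in-memory sample, pass a handle opened with `open(path, newline="", encoding="utf-8")`, where `path` points at whichever split you downloaded. Because the tweet text is quoted, the `csv` module keeps commas inside a tweet as part of the field rather than splitting on them.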

Citation

Please use the following citation if you use ArSarcasm:

@inproceedings{abu-farha-magdy-2020-arabic,
    title = "From {A}rabic Sentiment Analysis to Sarcasm Detection: The {A}r{S}arcasm Dataset",
    author = "Abu Farha, Ibrahim  and Magdy, Walid",
    booktitle = "Proceedings of the 4th Workshop on Open-Source Arabic Corpora and Processing Tools, with a Shared Task on Offensive Language Detection",
    month = may,
    year = "2020",
    address = "Marseille, France",
    publisher = "European Language Resource Association",
    url = "https://www.aclweb.org/anthology/2020.osact-1.5",
    pages = "32--39",
    language = "English",
    ISBN = "979-10-95546-51-1",
}

Other resources

If you are interested in other Arabic NLP resources, check:
