meisaputri21 / Indonesian-Twitter-Emotion-Dataset

Indonesian Twitter dataset for the emotion classification task

Indonesian-Twitter-Emotion-Dataset

This dataset contains 4,403 Indonesian tweets, each labeled with one of five emotion classes: love, anger, sadness, joy, and fear.

Data Format

Each line consists of a tweet and its emotion label, separated by a comma (,). The first line is a header. A tweet that contains a comma (,) in its text is enclosed in double quotes (" ") so that it is not split into extra columns.
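
As a minimal sketch of reading this format, Python's standard `csv` module handles the header line and the quoted tweets without extra work. The sample rows, the header names `label` and `tweet`, and the column order below are assumptions made for illustration; check the header of the actual file before relying on them.

```python
import csv
import io
from collections import Counter

# Hypothetical sample in the described format. The real header names and
# column order may differ, so the header is read from the first line.
SAMPLE = '''label,tweet
joy,"[USERNAME] terima kasih, paketnya sudah sampai"
anger,kenapa lama sekali balasnya [URL]
'''


def read_rows(handle):
    """Return (header, rows); each row is a dict keyed by the header names."""
    reader = csv.DictReader(handle)
    rows = list(reader)
    return reader.fieldnames, rows


header, rows = read_rows(io.StringIO(SAMPLE))

# Count how many tweets carry each emotion label.
label_counts = Counter(row["label"] for row in rows)
```

To read the dataset file itself, pass a file handle opened with `newline=""` and `encoding="utf-8"` instead of the in-memory sample. The quoted tweet in the first sample row stays in a single column even though it contains a comma.
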
The tweets in this dataset have been pre-processed according to the following criteria:

  1. Username mentions (@) have been replaced with the term [USERNAME]
  2. URLs/hyperlinks (http://... or https://...) have been replaced with the term [URL]
  3. Sensitive numbers, such as phone numbers, invoice numbers, and courier tracking numbers, have been replaced with the term [SENSITIVE-NO]
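
If you want to classify new tweets with a model trained on this data, it may help to mask them with the same placeholder terms first. The sketch below is an approximation, not the authors' pre-processing code: the regular expressions and the function name `mask_tweet` are assumptions, and the rule that treats any run of eight or more digits as a sensitive number is only a rough heuristic.

```python
import re

# Assumed patterns; the rules used to build the dataset are not published here.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
# Heuristic: a standalone run of 8 or more digits is treated as sensitive.
LONG_NUMBER_RE = re.compile(r"\b\d{8,}\b")


def mask_tweet(text):
    """Replace URLs, mentions, and long digit runs with the dataset's terms."""
    # URLs are masked first so that an "@" inside a URL is not read as a mention.
    text = URL_RE.sub("[URL]", text)
    text = MENTION_RE.sub("[USERNAME]", text)
    text = LONG_NUMBER_RE.sub("[SENSITIVE-NO]", text)
    return text


masked = mask_tweet("@budi cek resi 123456789012 di https://example.com/track ya")
```

Short numbers such as years are left unchanged by this heuristic, and numbers written with spaces or dashes are not caught, so adjust the pattern to the kind of text you expect.
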

Pre-trained Word Embedding

We have trained Word2Vec and FastText vectors on 1 million Indonesian tweets. These pre-trained word embeddings can be downloaded here.

Citation

If you want to publish a paper using this dataset and the pre-trained word embeddings, please cite this publication:
Mei Silviana Saputri, Rahmad Mahendra, and Mirna Adriani, "Emotion Classification on Indonesian Twitter Dataset", in Proceedings of the International Conference on Asian Language Processing 2018, 2018.

License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.