
dhavalpotdar / detecting-offensive-language-in-tweets

Licence: MIT License
Detecting cyberbullying in tweets using Machine Learning


Detecting Offensive Language in Tweets

This project aims to detect offensive language in tweets using machine learning classification algorithms. A training and prediction pipeline is implemented to compare the performance of several popular classification algorithms and determine the best-suited model.
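A minimal sketch of such a training and prediction pipeline is given here, assuming scikit-learn is available. The toy tweets, the TF-IDF features, and the particular candidate classifiers are assumptions made for illustration; the notebooks may use different features and models.

```python
import time

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.metrics import f1_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy data invented for illustration only (1 = offensive, 0 = not offensive).
train_texts = [
    "you are a stupid idiot",
    "what a lovely sunny day",
    "shut up you ugly loser",
    "thanks for the kind words",
    "nobody likes you idiot",
    "see you at the game tonight",
]
train_labels = [1, 0, 1, 0, 1, 0]
test_texts = ["you stupid loser", "lovely game tonight"]
test_labels = [1, 0]

# Candidate classifiers to compare (an assumed selection, not the project's list).
candidates = {
    "SGDClassifier": SGDClassifier(random_state=0),
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "MultinomialNB": MultinomialNB(),
}


def train_predict(name, classifier):
    """Fit one classifier on TF-IDF features and report training time and F1."""
    model = make_pipeline(TfidfVectorizer(), classifier)
    start = time.perf_counter()
    model.fit(train_texts, train_labels)
    train_seconds = time.perf_counter() - start
    predictions = model.predict(test_texts)
    return {
        "model": name,
        "train_seconds": train_seconds,
        "f1": f1_score(test_labels, predictions, zero_division=0),
    }


results = [train_predict(name, clf) for name, clf in candidates.items()]
for row in results:
    print(f"{row['model']}: F1={row['f1']:.2f}, train time={row['train_seconds']:.4f}s")
```

Recording both a quality metric and the training time for each candidate makes it possible to weigh performance against cost when picking a model.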

Data

Data is taken from two sources:

  1. Hate Speech Twitter Annotations

    • Publication: Z. Waseem and D. Hovy. Hateful symbols or hateful people? predictive features for hate speech detection on twitter. In NAACL SRW, pages 88–93, 2016
    • Authors: Waseem, Zeerak and Hovy, Dirk
    • GitHub Link: https://github.com/zeerakw/hatespeech
    • Description: The dataset contains about 17,000 tweet IDs labeled for racism and sexism. We downloaded this dataset and queried a Twitter API to retrieve the actual tweets from Twitter. Retrieval of about 5,900 tweets failed because either the tweet had been deleted or the account had been deactivated.
  2. Hate Speech and Offensive Language Detection

    • Publication: Automated Hate Speech Detection and the Problem of Offensive Language
    • Authors: Davidson, Thomas and Warmsley, Dana and Macy, Michael and Weber, Ingmar
    • GitHub Link: https://github.com/t-davidson/hate-speech-and-offensive-language
    • Description: The dataset has about 25,000 tweets annotated by crowdsourcing. Based on how many users chose each label, each tweet is assigned to one of three classes: hate speech, offensive language, or neither. We downloaded the dataset in Python from GitHub as a CSV file.

The data from both sources are combined. Here is the distribution of the tweets into the two classes:

[Figure: Dataset]
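One way to combine the two sources into two classes can be sketched as follows. The rows, the field names, and above all the label mapping (every label other than "none" or "neither" counted as offensive) are assumptions for illustration; the wrangling notebook may map the labels differently.

```python
from collections import Counter

# Hypothetical rows standing in for the two sources; texts are placeholders.
waseem_rows = [
    {"text": "example tweet a", "label": "racism"},
    {"text": "example tweet b", "label": "sexism"},
    {"text": "example tweet c", "label": "none"},
]
davidson_rows = [
    {"text": "example tweet d", "label": "hate speech"},
    {"text": "example tweet e", "label": "offensive language"},
    {"text": "example tweet f", "label": "neither"},
    {"text": "example tweet g", "label": "neither"},
]

# Assumed mapping: these source labels become class 0, everything else class 1.
NOT_OFFENSIVE = {"none", "neither"}


def to_binary(row):
    """Map a source-specific label onto a shared binary 'offensive' flag."""
    return {"text": row["text"], "offensive": 0 if row["label"] in NOT_OFFENSIVE else 1}


combined = [to_binary(row) for row in waseem_rows + davidson_rows]

# Class distribution of the combined data.
distribution = Counter(row["offensive"] for row in combined)
print(distribution)
```

Checking the class counts after merging shows how imbalanced the combined data is, which matters when interpreting accuracy later.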

Code

The code is split across two Jupyter Notebooks, which can be viewed in rendered format at the following links:

  • cyberbullying_wrangling.ipynb | Link
  • cyberbullying_v2.ipynb | Link

Summary

[Figure: performance]

[Figure: time]

After tuning hyper-parameters to optimize the algorithms, Stochastic Gradient Descent was found to be the best-suited algorithm, taking both performance and time complexity into account. The following performance metrics were achieved:

  • Accuracy: 92.81 %
  • Precision: 96.97 %
  • Recall: 91.94 %
  • F1-Score: 94.39 %
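As a quick consistency check, the F1-score is the harmonic mean of precision and recall, so it can be recomputed from the two values reported above. The helper name `f1_from` is hypothetical and used only for this sketch.

```python
def f1_from(precision, recall):
    """Return the F1-score, the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


precision = 0.9697  # reported precision
recall = 0.9194     # reported recall

f1 = f1_from(precision, recall)
print(f"F1 = {f1:.2%}")  # prints "F1 = 94.39%"
```

The recomputed value agrees with the reported F1-score of 94.39 %.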

License

The contents of this repository are covered under the MIT License.
