This case study shows how to create a model for text analysis and classification and deploy it as a web service in the Azure cloud to automatically classify support tickets. The project is a proof of concept built by Microsoft (Commercial Software Engineering team) in collaboration with Endava (http://endava.com/en).

Stars: ✭ 142 (+647.37%)

Mutual labels: text-classification

Kaggle-Twitter-Sentiment-Analysis

Kaggle Twitter Sentiment Analysis Competition

Stars: ✭ 18 (-5.26%)

Mutual labels: text-classification

twgitbot

A node.js bot that checks a github repo changes and tweets it to your Twitter account

Stars: ✭ 10 (-47.37%)

Mutual labels: twitter-api

twitter4j-v2

a simple wrapper for Twitter API v2 that is designed to be used with Twitter4J

Stars: ✭ 22 (+15.79%)

Mutual labels: twitter-api

WeSTClass

[CIKM 2018] Weakly-Supervised Neural Text Classification

Stars: ✭ 67 (+252.63%)

Mutual labels: text-classification

opentc

OpenTC is a text classification engine using several algorithms in machine learning

Stars: ✭ 27 (+42.11%)

Mutual labels: text-classification

kwx

BERT, LDA, and TFIDF based keyword extraction in Python

Stars: ✭ 33 (+73.68%)

Mutual labels: text-classification

tweet-to-image

Convert tweets to beautiful images

Stars: ✭ 134 (+605.26%)

Mutual labels: twitter-api

node-fasttext

Nodejs binding for fasttext representation and classification.

Stars: ✭ 39 (+105.26%)

Mutual labels: text-classification

Filipino-Text-Benchmarks

Open-source benchmark datasets and pretrained transformer models in the Filipino language.

Stars: ✭ 22 (+15.79%)

Mutual labels: text-classification

Lbl2Vec

Lbl2Vec learns jointly embedded label, document and word vectors to retrieve documents with predefined topics from an unlabeled document corpus.

Stars: ✭ 25 (+31.58%)

Mutual labels: text-classification

TwitterPiBot

A Python based bot for Raspberry Pi that grabs tweets with a specific hashtag and reads them out loud.

Stars: ✭ 85 (+347.37%)

Mutual labels: twitter-api

tweetsOLAPing

implementing an end-to-end tweets ETL/Analysis pipeline.

Stars: ✭ 24 (+26.32%)

Mutual labels: twitter-api

markdown-tweet-scheduler

Schedule daily tweets from markdown files in your repo, posted via github actions.

Stars: ✭ 49 (+157.89%)

Mutual labels: twitter-api

cnn-text-classification-keras

Convolutional Neural Network for Text Classification in Keras

Stars: ✭ 14 (-26.32%)

Mutual labels: text-classification

archive-explorer-web

Browse your Twitter archive with a friendly, responsive, full experience, and quickly delete the tweets you don't want.

Stars: ✭ 19 (+0%)

Mutual labels: twitter-api

Detecting Offensive Language in Tweets

This project aims to detect offensive language in tweets using machine learning classification algorithms. A training and prediction pipeline is implemented to compare the performance of several popular classification algorithms and determine the best-suited model.
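
As a rough illustration of such a pipeline, the sketch below times training and prediction for each candidate model and records its test accuracy. It is not the notebook's code: the `train_predict` helper, the `MajorityClass` and `KeywordFlag` toy models, and the example texts are all hypothetical stand-ins. Any object with scikit-learn style `fit`/`predict` methods could be passed in instead.

```python
import time
from collections import Counter


class MajorityClass:
    """Toy baseline (hypothetical): always predicts the most common training label."""

    def fit(self, texts, labels):
        self.label_ = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, texts):
        return [self.label_ for _ in texts]


class KeywordFlag:
    """Toy model (hypothetical): flags a text containing a word seen only in offensive training texts."""

    def fit(self, texts, labels):
        offensive, clean = set(), set()
        for text, label in zip(texts, labels):
            (offensive if label == 1 else clean).update(text.lower().split())
        self.flag_words_ = offensive - clean
        return self

    def predict(self, texts):
        return [int(any(w in self.flag_words_ for w in t.lower().split())) for t in texts]


def train_predict(models, X_train, y_train, X_test, y_test):
    """Fit each model, predict on the test set, and collect timings and accuracy."""
    results = {}
    for name, model in models.items():
        start = time.perf_counter()
        model.fit(X_train, y_train)
        train_time = time.perf_counter() - start

        start = time.perf_counter()
        predictions = model.predict(X_test)
        predict_time = time.perf_counter() - start

        correct = sum(p == t for p, t in zip(predictions, y_test))
        results[name] = {
            "train_time": train_time,
            "predict_time": predict_time,
            "accuracy": correct / len(y_test),
        }
    return results


# Made-up placeholder data: 1 = offensive, 0 = not offensive.
X_train = ["that was awful", "have a nice day", "awful awful person",
           "nice weather today", "see you at lunch"]
y_train = [1, 0, 1, 0, 0]
X_test = ["what an awful take", "nice to meet you"]
y_test = [1, 0]

results = train_predict(
    {"majority": MajorityClass(), "keyword": KeywordFlag()},
    X_train, y_train, X_test, y_test,
)
for name, scores in results.items():
    print(name, round(scores["accuracy"], 2))
```

Keeping the timing next to the accuracy makes it possible to weigh predictive performance against training and prediction cost when picking a model.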

Data

Data is taken from two sources:

Hate Speech Twitter Annotations
- Publication: Z. Waseem and D. Hovy. Hateful symbols or hateful people? predictive features for hate speech detection on twitter. In NAACL SRW, pages 88–93, 2016
- Authors: Waseem, Zeerak and Hovy, Dirk
- GitHub Link: https://github.com/zeerakw/hatespeech
- Description: The dataset contains about 17,000 tweet IDs labeled for racism and sexism. We downloaded this dataset and queried the Twitter API to retrieve the actual tweets. Retrieval of about 5,900 tweets failed, either because the tweet had been deleted or because the account had been deactivated.
Hate Speech and Offensive Language Detection
- Publication: Automated Hate Speech Detection and the Problem of Offensive Language
- Authors: Davidson, Thomas and Warmsley, Dana and Macy, Michael and Weber, Ingmar
- GitHub Link: https://github.com/t-davidson/hate-speech-and-offensive-language
- Description: The dataset has about 25,000 tweets annotated through crowdsourcing. Based on the number of users assigning each label, every tweet is put in one of three classes: hate speech, offensive language, or neither. We downloaded the dataset from GitHub as a CSV file using Python.

The data from both sources are combined into a single dataset with two classes. [Figure: distribution of the tweets across the two classes]
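
One way the two sources could be merged into a single binary-labeled set is sketched below. The label mapping is an assumption, not something stated here: `racism` and `sexism` from the first dataset and classes 0 (hate speech) and 1 (offensive language) from the second are treated as offensive, and everything else as not offensive. The row layout and the helper name `to_binary` are illustrative, and the example rows are made-up placeholders.

```python
from collections import Counter

# Assumed mapping of each source's labels onto two classes
# (1 = offensive, 0 = not offensive).
WASEEM_TO_BINARY = {"racism": 1, "sexism": 1, "none": 0}
DAVIDSON_TO_BINARY = {0: 1, 1: 1, 2: 0}  # 0 = hate speech, 1 = offensive language, 2 = neither


def to_binary(rows, text_key, label_key, mapping):
    """Normalize rows from one source into (text, binary_label) pairs."""
    return [(row[text_key], mapping[row[label_key]]) for row in rows]


# Placeholder rows standing in for the retrieved tweets and the downloaded CSV.
waseem_rows = [
    {"text": "placeholder tweet a", "label": "sexism"},
    {"text": "placeholder tweet b", "label": "none"},
]
davidson_rows = [
    {"tweet": "placeholder tweet c", "class": 1},
    {"tweet": "placeholder tweet d", "class": 2},
    {"tweet": "placeholder tweet e", "class": 0},
]

combined = (
    to_binary(waseem_rows, "text", "label", WASEEM_TO_BINARY)
    + to_binary(davidson_rows, "tweet", "class", DAVIDSON_TO_BINARY)
)

# Class distribution of the combined data.
distribution = Counter(label for _, label in combined)
print(dict(distribution))
```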

Code

The code is split across two Jupyter notebooks, which can be viewed in rendered form at the links below:

cyberbullying_wrangling.ipynb | Link
cyberbullying_v2.ipynb | Link

Summary

After tuning hyperparameters to optimize each algorithm, Stochastic Gradient Descent was found to be the best-suited algorithm, taking both performance and time complexity into account. The following performance metrics were achieved:

Accuracy: 92.81 %
Precision: 96.97 %
Recall: 91.94 %
F1-Score: 94.39 %
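
As a quick consistency check, F1 is the harmonic mean of precision and recall, so the reported F1-score can be recomputed from the two figures above. The function name below is illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both given as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


precision = 0.9697  # reported precision
recall = 0.9194     # reported recall

f1 = f1_score(precision, recall)
print(f"F1: {f1 * 100:.2f} %")  # prints: F1: 94.39 %
```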

License

The contents of this repository are covered under the MIT License.

dhavalpotdar / detecting-offensive-language-in-tweets
