
niquejoe / Classification-of-Depression-on-Social-Media-Using-Text-Mining

Licence: other
The first asian machine learning in Jeju Island, South Korea - Project

Programming Languages

python

Projects that are alternatives to or similar to Classification-of-Depression-on-Social-Media-Using-Text-Mining

nhl-twitter-bot
🚨 Hockey Game Bot is a Python application that sends important NHL events to social media platforms in (near) real time.
Stars: ✭ 18 (-68.42%)
Mutual labels:  social-media
squeaknode
Peer-to-peer status feed 📜 with posts unlocked by Lightning ⚡
Stars: ✭ 29 (-49.12%)
Mutual labels:  social-media
phd-resources
Internet Delivered Treatment using Adaptive Technology
Stars: ✭ 37 (-35.09%)
Mutual labels:  depression
Socioboard-5.0
Socioboard is world's first and open source Social Technology Enabler. Socioboard Core is our flagship product.
Stars: ✭ 663 (+1063.16%)
Mutual labels:  social-media
apollo-instagram-clone
Apollogram | A place where you could share photos, like media, and follow peoples.
Stars: ✭ 24 (-57.89%)
Mutual labels:  social-media
E-commerceCustomerFYP
Android E-commerce Platform. Allow customer to buy product, chat, feedback rating, make payment to retailer
Stars: ✭ 41 (-28.07%)
Mutual labels:  social-media
InstaCrawlR
Crawl public Instagram data using R scripts without API access token. See InstaCrawlR Instructions.pdf
Stars: ✭ 108 (+89.47%)
Mutual labels:  social-media
Ocelot-Social
Free and open-source social network for active citizenship.
Stars: ✭ 49 (-14.04%)
Mutual labels:  social-media
VKontakte
[READ ONLY] Subtree split of the SocialiteProviders/VKontakte Provider (see SocialiteProviders/Providers)
Stars: ✭ 82 (+43.86%)
Mutual labels:  social-media
watchman
Watchman: An open-source social-media event-detection system
Stars: ✭ 18 (-68.42%)
Mutual labels:  social-media
Twitch
[READ ONLY] Subtree split of the SocialiteProviders/Twitch Provider (see SocialiteProviders/Providers)
Stars: ✭ 20 (-64.91%)
Mutual labels:  social-media
affiliate-marketing-disclosures
Code and data belonging to our CSCW 2018 paper: "Endorsements on Social Media: An Empirical Study of Affiliate Marketing Disclosures on YouTube and Pinterest".
Stars: ✭ 22 (-61.4%)
Mutual labels:  social-media
v chat sdk
official sdk for v chat this is a complete chat ecosystem use flutter for clint node js and socket io for server side flutter chat v chat sdk and flutter group chat
Stars: ✭ 25 (-56.14%)
Mutual labels:  social-media
awosome-ai-in-social-media
💻 Collect those AI & Bot use in social media wechat/facebook/twitter/instagram/weibo/TikTok etc.
Stars: ✭ 21 (-63.16%)
Mutual labels:  social-media
Pepaverse
Pepaverse is an open source social network build with nodejs, mongoDB, passportjs and socket.io
Stars: ✭ 16 (-71.93%)
Mutual labels:  social-media
Hashtag-Wall-Server
Hashtag wall that displays posts from social media
Stars: ✭ 33 (-42.11%)
Mutual labels:  social-media
Spotify
[READ ONLY] Subtree split of the SocialiteProviders/Spotify Provider (see SocialiteProviders/Providers)
Stars: ✭ 13 (-77.19%)
Mutual labels:  social-media
gobo
💭 Gobo: Your social media. Your rules.
Stars: ✭ 87 (+52.63%)
Mutual labels:  social-media
Hacker
SOCIAL MEDIA PHISHING TOOL
Stars: ✭ 36 (-36.84%)
Mutual labels:  social-media
sociallink
Alignments between knowledge bases and social media
Stars: ✭ 16 (-71.93%)
Mutual labels:  social-media

Classification of Depression on Social Media Using Text Mining

Author | Introduction | The Project | Video Demo | References | Acknowledgement

Author - 저자

Name: Nikie Jo Elauria Deocampo

Country: Philippines

Educational Background:

Undergraduate: Bachelor of Science in Information System

Graduate: Masters in Information Technology

School: West Visayas State University

Mentor: Dr. Bobby Gerardo

Motto: I work hard so my dog can have a better life.

Introduction - 소개

Mental illness is prevalent worldwide, and depression is one of the most common psychological problems I know of; I would like to help as much as I can. Being a fan of Anthony Bourdain and Robin Williams propelled me to explore this topic. With the large volume of tweets and Facebook posts available online, I can use machine learning to mine this data and produce meaningful, useful results.

Social media generates enormous amounts of data every day as millions of active users share and communicate across entire communities, and it has changed how people interact. For this project, I use Python together with various modules and libraries.

The Project - ν”„λ‘œμ νŠΈ

Requirements:

  • Python 3.6.1 or higher
  • A Twitter developer account
  • Several modules (Keras, TensorFlow, NumPy, scikit-learn, pandas and itertools)
  • A lot of patience and a love for machine learning

The aim of the project is to predict early signs of depression through social media text mining. The steps below run the Python code using the data sets uploaded to the repository, or you can download your own.

Follow the steps below:

  1. Create a Twitter developer account. From that account you will need four credentials:
  2. consumer_key = '', consumer_secret = '', access_token = '', access_secret = ''
  3. Insert the credentials into "Download_twitter_Api.py", then use it to download recent tweets matching keywords such as depression, anxiety or sadness. When the data sets are ready, proceed to the preprocessing stage.
  4. Run "preprocessor.py". This stage goes through your data sets and the given dictionary. The dictionary contains words with their corresponding polarity, which is essential for calculating the sentiment of each tweet: each word is separated, tokenized and assigned its polarity. The score of each tweet is the sum of the polarities of its words divided by the number of words in that tweet.
  5. Once preprocessing is done, you will find the output in "processed_data/output.xlsx". It has two columns: the ID of each tweet and its sentiment (Positive, Neutral or Negative). You now have a Twitter data set, filtered by depression-related keywords, together with the corresponding sentiment.
  6. Next comes training and prediction. Make sure all files are in their proper folders, then run "depression_sentiment_analysis.py". The code reads output.xlsx, recovers the tweet text corresponding to each ID, and feeds the original tweets to the classifiers. When it finishes, the AUC of each classifier is listed in the console.
  7. But wait, there's more: you can also type in a sample tweet, and the classifier with the highest AUC will predict its sentiment.
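Steps 4 to 6 can be sketched in plain Python. This is a minimal sketch, not the code in "preprocessor.py": the five-word lexicon is a hypothetical stand-in for the project's dictionary, tokenization is a simple regular expression, and the rule that scores above zero are Positive, below zero Negative and exactly zero Neutral is an assumed threshold, since the README does not state one. All function names are hypothetical.

```python
import re

# Hypothetical mini-lexicon for illustration; the project's own dictionary is not reproduced here.
LEXICON = {"happy": 1.0, "love": 1.0, "sad": -1.0, "hopeless": -1.0, "tired": -0.5}


def tokenize(text):
    """Lowercase the text and split it into word tokens (a simple regex stand-in)."""
    return re.findall(r"[a-z']+", text.lower())


def tweet_polarity(text, lexicon=LEXICON):
    """Sum the polarity of every word and divide by the tweet's word count.
    Words missing from the lexicon score 0 but still count toward the total."""
    words = tokenize(text)
    if not words:
        return 0.0
    return sum(lexicon.get(word, 0.0) for word in words) / len(words)


def label_sentiment(score):
    """Map an averaged score to a label; the zero cut-off is an assumption."""
    if score > 0:
        return "Positive"
    if score < 0:
        return "Negative"
    return "Neutral"


def pair_text_with_labels(tweets_by_id, sentiment_rows):
    """Rejoin (ID, Sentiment) rows with the original tweet text by ID,
    skipping IDs whose text is no longer available."""
    return [(tweets_by_id[tweet_id], label)
            for tweet_id, label in sentiment_rows
            if tweet_id in tweets_by_id]


if __name__ == "__main__":
    tweets = {101: "I feel sad and hopeless", 102: "I love this"}
    rows = [(tid, label_sentiment(tweet_polarity(text))) for tid, text in tweets.items()]
    print(rows)  # [(101, 'Negative'), (102, 'Positive')]
    print(pair_text_with_labels(tweets, rows))
```

Averaging over every word, not only the words found in the dictionary, follows the description in step 4, so a long tweet with only one or two charged words drifts toward Neutral.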

What could the results mean? Positive means the person is unlikely to have depression or anxiety. Neutral is the middle level: the user may or may not have depression but may be more prone to it, and may display some depression-like symptoms. Lastly, Negative is the lowest level, where depression and anxiety symptoms are detected in the user's tweets. The more negative words a user uses, the more negative the emotion of the tweet.

Video Tutorials

Results - 결과들

Below are the matrices for the five classifiers, with Decision Tree having the highest score.

To test accuracy, I trained and tested on about 10,000 tweets from the same data set:

AUC stands for area under the (ROC) curve. It is used in classification analysis to determine which of the models predicts the classes best.
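To make the AUC concrete, here is a minimal pure-Python sketch of the binary ROC AUC, computed as the probability that a randomly chosen positive example is scored above a randomly chosen negative one, with ties counted as half. It also picks the highest-AUC model, as step 7 does. The function names are hypothetical and the README does not say how its AUC was computed; a three-class problem (Positive, Neutral, Negative) would need a one-vs-rest average of this binary score. In practice, scikit-learn's `roc_auc_score` computes the same quantity.

```python
def roc_auc(labels, scores):
    """Binary ROC AUC: the probability that a randomly chosen positive example
    receives a higher score than a randomly chosen negative one (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def best_classifier(auc_by_name):
    """Return the name of the classifier with the highest AUC."""
    return max(auc_by_name, key=auc_by_name.get)


if __name__ == "__main__":
    y_true = [0, 0, 1, 1]
    aucs = {
        "model_a": roc_auc(y_true, [0.1, 0.4, 0.35, 0.8]),  # 0.75
        "constant": roc_auc(y_true, [0.5, 0.5, 0.5, 0.5]),  # 0.5
    }
    print(aucs, best_classifier(aucs))  # model_a is selected
```

A model that gives every tweet the same score gets exactly 0.5, the chance level, while a perfect ranking gets 1.0.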

Accuracy:

  • Naive Bayes: 93.79406648429645 %
  • Decision Tree: 98.55668748040587 %
  • Support Vector Machine: 50.0 %
  • K-Nearest Neighbors: 81.464022923447 %
  • Random Forest: 49.1038137743686 %

Completion Time:

  • Naive Bayes: 0.59779 seconds
  • Decision Tree: 3.40457 seconds
  • Support Vector Machine: 29.83311 seconds
  • K-Nearest Neighbors: 7.99048 seconds
  • Random Forest: 0.60994 seconds

Future Plans - 향후 계획

This study is not yet perfect, and I am still working to improve it:

  • Use contextual semantic segmentation
  • Use stopword removal to increase model accuracy
  • Eliminate features with extremely low frequency
  • Use complex features such as n-grams and part-of-speech tags
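Three of the planned improvements (stopword removal, dropping rare features and n-grams) could be combined into one feature-extraction step. This is a minimal standard-library sketch, assuming tweets are already tokenized; the stopword list, the `max_n` and `min_count` defaults and the function names are illustrative assumptions, and part-of-speech tags are not covered.

```python
from collections import Counter

# Small illustrative stopword list; a real project would use a fuller one.
STOPWORDS = {"i", "a", "an", "the", "and", "is", "am", "to", "of", "so"}


def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(tokenized_tweets, max_n=2, min_count=2):
    """Drop stopwords, build 1..max_n-grams, and keep only features that
    occur at least min_count times across the whole corpus."""
    per_tweet = []
    for tokens in tokenized_tweets:
        kept = [t for t in tokens if t not in STOPWORDS]
        feats = []
        for n in range(1, max_n + 1):
            feats.extend(ngrams(kept, n))
        per_tweet.append(feats)
    counts = Counter(f for feats in per_tweet for f in feats)
    vocab = {f for f, c in counts.items() if c >= min_count}
    return [[f for f in feats if f in vocab] for feats in per_tweet]


if __name__ == "__main__":
    tweets = [["i", "feel", "so", "tired"], ["so", "tired", "today"], ["i", "feel", "fine"]]
    print(extract_features(tweets))  # [['feel', 'tired'], ['tired'], ['feel']]
```

Because stopwords are removed before the n-grams are built, "feel so tired" yields the bigram "feel tired"; building n-grams first would preserve the original word adjacency instead.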

References - 참고

Acknowledgement - 승인

This work would not have been possible without the overwhelming support of Jeju National University, the Jeju Development Center and other selfless sponsors. I would especially like to thank Prof. Yungcheol Byun for being the best host ever and my mentor Dr. Bobby Gerardo for his help and guidance.
