MaartenGr / soan

Licence: MIT license
Social Analysis based on Whatsapp data

SoAn

Code for applying natural language processing methods to WhatsApp conversations

SoAn (Social Analysis) can be used to extract word frequencies, word clouds, TF-IDF scores, sentiment, and more from WhatsApp conversations. I initially built it to analyze the messages between my wife and me, but I extended it so that it can be used for your own messages.

Table of Contents

  1. Instructions

  2. Output

    a. General Plots

    b. TF-IDF

    c. Emoji

    d. Sentiment

    e. Word Clouds

    f. Topic Modeling

1. Instructions

There are several steps for using this repository:

  • Download or fork this repository
  • Install the requirements with pip install -r requirements.txt
  • Save your whatsapp.txt file in the data folder
    • To download your WhatsApp messages, open WhatsApp, go to a conversation, tap the three vertical dots, and export the chat
  • Finally, from the command line, run the following:
    • python soan.py --file whatsapp.txt --language english
  • The results will be saved as images and text files in the results folder

In the notebooks folder, you will also find the soan.ipynb where you can run individual pieces of the code.
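
To give a feel for what the exported file contains, here is a minimal parsing sketch. It is not SoAn's own parser: the line format varies by device, locale, and app version, so the pattern below assumes lines of the form `12/31/20, 9:15 PM - Name: message`, and the names `LINE_RE` and `parse_chat` are hypothetical.

```python
import re
from datetime import datetime

# Assumed header format (differs per device and locale):
#   "12/31/20, 9:15 PM - Name: message text"
LINE_RE = re.compile(
    r"^(\d{1,2}/\d{1,2}/\d{2}), (\d{1,2}:\d{2} [AP]M) - ([^:]+): (.*)$"
)

def parse_chat(lines):
    """Return a list of (timestamp, user, text) tuples from exported chat lines."""
    messages = []
    for raw in lines:
        line = raw.rstrip("\n")
        match = LINE_RE.match(line)
        if match:
            date_part, time_part, user, text = match.groups()
            stamp = datetime.strptime(
                f"{date_part} {time_part}", "%m/%d/%y %I:%M %p"
            )
            messages.append((stamp, user, text))
        elif messages:
            # No recognizable header: treat the line as a continuation
            # of the previous (multi-line) message.
            stamp, user, text = messages[-1]
            messages[-1] = (stamp, user, text + "\n" + line)
    return messages
```

Lines without a `Name:` part, such as system notices, are not handled separately in this sketch; they are appended to the preceding message or skipped if they come first.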

2. Output

2.a General Plots

There are 4 types of plots to be generated:

  • Messages over time
  • Active days of each user
    • Spider
    • Histogram
  • Active hours of each user
  • Calendar plot

There are 2 types of stats that are generated:

  • General statistics (text frequency, etc.)
  • Timing

Below are some examples of the text generated:

##########################
Number of Messages
##########################

4444 Her
3266 Me

#########################
Messages per hour
#########################

Her: 0.1259887165820883
Me: 0.09259206758710628

2.b TF-IDF

Using a class-based TF-IDF, I extract the most important words per person and plot them in a horizontal bar chart that uses an image as a mask. The chart consists of two sets of bars stacked on top of each other, both plotted over a background image. I started with the background image and plotted the actual values on the left as fully transparent bars with a white border to separate them. Then, on top of that, I plotted white bars so that the right part of the image would get removed.

NOTE: In the notebook, you will see more instructions on how to use your own image.
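
One way to read "class-based" TF-IDF is to pool all messages of a person into a single document and score words per person instead of per message. The sketch below follows that reading with a smoothed IDF term; the exact weighting used in SoAn may differ, and `class_tfidf` is a hypothetical name.

```python
import math
from collections import Counter

def class_tfidf(messages_by_user):
    """Score words per user, treating each user's pooled messages as one document.

    messages_by_user: dict mapping user -> list of message strings.
    Returns a dict mapping user -> {word: score}.
    """
    # One "document" per user: all of that user's words pooled together.
    term_counts = {
        user: Counter(word for msg in msgs for word in msg.lower().split())
        for user, msgs in messages_by_user.items()
    }
    n_classes = len(term_counts)
    # Number of users whose pooled document contains each word.
    doc_freq = Counter(word for counts in term_counts.values() for word in counts)

    scores = {}
    for user, counts in term_counts.items():
        total = sum(counts.values())
        scores[user] = {
            # Term frequency times a smoothed IDF (an assumed weighting choice).
            word: (n / total)
            * (math.log((1 + n_classes) / (1 + doc_freq[word])) + 1)
            for word, n in counts.items()
        }
    return scores
```

Words that only one person uses receive a higher IDF factor than words both people use, so they rise to the top of that person's ranking.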

2.c Emoji

These analyses are based on the emojis used in each message. Below you can find the following:

  • Unique Emoji per user
  • Commonly used Emoji per user
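
As a rough illustration of both statistics, the sketch below counts emojis per user and lists the emojis that only one user sent. It interprets "unique" as "used by this user and nobody else", which is an assumption, and it detects emojis with two hard-coded code point ranges that are approximate. `EMOJI_RANGES`, `is_emoji`, and `emoji_stats` are hypothetical names.

```python
from collections import Counter

# Rough, assumed code point ranges. They miss some emojis, and each code
# point is counted on its own, so multi-code-point sequences (skin tones,
# flags, ZWJ combinations) are split into their parts.
EMOJI_RANGES = ((0x1F300, 0x1FAFF), (0x2600, 0x27BF))

def is_emoji(char):
    return any(low <= ord(char) <= high for low, high in EMOJI_RANGES)

def emoji_stats(messages_by_user, top_n=3):
    """Return (common, unique) emoji statistics per user.

    common: user -> list of (emoji, count), most frequent first.
    unique: user -> set of emojis that no other user sent.
    """
    counts = {
        user: Counter(ch for msg in msgs for ch in msg if is_emoji(ch))
        for user, msgs in messages_by_user.items()
    }
    common = {user: c.most_common(top_n) for user, c in counts.items()}
    unique = {
        user: set(c) - set().union(*(o for u, o in counts.items() if u != user))
        for user, c in counts.items()
    }
    return common, unique
```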

2.d Sentiment Analysis

The sentiment of each sentence in the messages is extracted per user using VADER and then visualized.
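
A minimal sketch of per-user scoring could look as follows. It assumes the `vaderSentiment` package and, as a simplification, scores whole messages instead of individual sentences. `make_vader_scorer` and `mean_sentiment_per_user` are hypothetical helpers, not part of SoAn.

```python
from collections import defaultdict

def make_vader_scorer():
    """Return a function mapping text to VADER's compound score in [-1, 1].

    Requires the third-party package: pip install vaderSentiment
    """
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    analyzer = SentimentIntensityAnalyzer()
    return lambda text: analyzer.polarity_scores(text)["compound"]

def mean_sentiment_per_user(messages, score):
    """Average a sentiment score per user.

    messages: iterable of (user, text) pairs.
    score: callable mapping text to a float, e.g. make_vader_scorer().
    """
    scores = defaultdict(list)
    for user, text in messages:
        scores[user].append(score(text))
    return {user: sum(vals) / len(vals) for user, vals in scores.items()}
```

Passing the scorer in as an argument keeps the aggregation independent of the sentiment library, so another scorer can be swapped in for languages that VADER's English lexicon does not cover.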

2.e Word Clouds

For each user, a word cloud is made based on frequent and important words. Stopwords are removed if you have supplied the language.
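
The frequency side of a word cloud can be sketched with the standard library alone. The helper below, `word_frequencies` (a hypothetical name), counts words after dropping stopwords; the resulting counts could then be handed to a drawing library, for example `WordCloud.generate_from_frequencies` in the `wordcloud` package. How SoAn weighs "important" words is not covered here.

```python
import re
from collections import Counter

def word_frequencies(messages, stopwords=frozenset(), min_length=2):
    """Count lowercase words across messages, skipping stopwords and short tokens."""
    words = re.findall(r"\w+", " ".join(messages).lower())
    return Counter(
        w for w in words if w not in stopwords and len(w) >= min_length
    )
```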

2.f Topic Modeling

For each user, the most frequent topics are modeled using LDA and NMF and saved to a .txt file:

Me

Topics in nmf model:
Topic #0: ga boodschappen nodig lieverd halen uurtje half
Topic #1: thuis wel goed haha lekker we morgen
Topic #2: lieverd dank hey fijn allerliefste plezier verwacht
Topic #3: gezellig jeey super jeeeey erg hartstikke samen
Topic #4: love you most more schattie much very
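
Output in this shape can be produced with scikit-learn, as in the sketch below. It assumes scikit-learn 1.0 or later (for `get_feature_names_out`) and default vectorizer settings. The helper names `format_topics` and `nmf_topics` are hypothetical, and SoAn's own preprocessing and parameters may differ.

```python
def format_topics(components, feature_names, n_top_words=7):
    """Build one 'Topic #i: w1 w2 ...' line per row of a topic-word weight matrix."""
    lines = []
    for index, weights in enumerate(components):
        # Indices of the highest-weighted words for this topic.
        top = sorted(
            range(len(weights)), key=lambda i: weights[i], reverse=True
        )[:n_top_words]
        lines.append(f"Topic #{index}: " + " ".join(feature_names[i] for i in top))
    return lines

def nmf_topics(documents, n_topics=5, n_top_words=7):
    """Fit NMF on TF-IDF features of the documents and return formatted topics.

    Requires scikit-learn, which is imported lazily here.
    """
    from sklearn.decomposition import NMF
    from sklearn.feature_extraction.text import TfidfVectorizer

    vectorizer = TfidfVectorizer()
    features = vectorizer.fit_transform(documents)
    model = NMF(n_components=n_topics, random_state=0).fit(features)
    return format_topics(
        model.components_, vectorizer.get_feature_names_out(), n_top_words
    )
```

An LDA variant would follow the same pattern with `LatentDirichletAllocation` fitted on raw term counts from `CountVectorizer`.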

Visualizations Wife
Below, you will find an overview of the visualizations I made for my wife, in part using this package:
