emumba-com / live_twitter_sentiment_analysis

Licence: other
Live Twitter sentiment analysis using Python, Apache Spark Streaming, Kafka, NLTK, SocketIO

Programming Languages: JavaScript, Python, Java, HTML

Projects that are alternatives to or similar to live twitter sentiment analysis

Stock-Analyser
📈 Stocks technical analysis code collection and Stocks data platform.
Stars: ✭ 30 (+50%)
Mutual labels:  nltk
ru punkt
Russian language support for NLTK's PunktSentenceTokenizer
Stars: ✭ 49 (+145%)
Mutual labels:  nltk
gnip
Connect to Gnip streaming API and manage rules
Stars: ✭ 28 (+40%)
Mutual labels:  twitter-streaming-api
nltk-maxent-pos-tagger
maximum entropy based part-of-speech tagger for NLTK
Stars: ✭ 45 (+125%)
Mutual labels:  nltk
reddit-opinion-mining
Sentiment analysis and opinion mining of Reddit data.
Stars: ✭ 15 (-25%)
Mutual labels:  nltk
NYC Taxi Pipeline
Design/Implement stream/batch architecture on NYC taxi data | #DE
Stars: ✭ 16 (-20%)
Mutual labels:  spark-streaming
recyclebin
♻️ measures usage of a particular term on twitter
Stars: ✭ 21 (+5%)
Mutual labels:  twitter-streaming-api
tweets-preprocessor
Repo containing the Twitter preprocessor module, developed by the AUTH OSWinds team
Stars: ✭ 26 (+30%)
Mutual labels:  nltk
Introduction-to-text-mining-with-Python
Lectures in Urban Data Science Lab, Seoul
Stars: ✭ 25 (+25%)
Mutual labels:  nltk
litemall-dw
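A big-data project based on the open-source Litemall e-commerce project, including front-end event tracking (openresty+lua), back-end event tracking, a five-layer data warehouse, real-time computation and user profiling. The big-data platform uses CDH6.3.2 (scripted with vagrant+ansible) and also includes Azkaban workflows.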
Stars: ✭ 36 (+80%)
Mutual labels:  spark-streaming
spark-utils
Basic framework utilities to quickly start writing production ready Apache Spark applications
Stars: ✭ 25 (+25%)
Mutual labels:  spark-streaming
Reuters-21578-Classification
Text classification with Reuters-21578 datasets using Gensim Word2Vec and Keras LSTM
Stars: ✭ 44 (+120%)
Mutual labels:  nltk
data-processing-pipeline
Real-Time Data Processing Pipeline & Visualization with Docker, Spark, Kafka and Cassandra
Stars: ✭ 79 (+295%)
Mutual labels:  twitter-streaming-api
cassandra.realtime
Different ways to process data into Cassandra in realtime with technologies such as Kafka, Spark, Akka, Flink
Stars: ✭ 25 (+25%)
Mutual labels:  spark-streaming
summarize-webpage
A small NLP SAAS project that summarize a webpage
Stars: ✭ 34 (+70%)
Mutual labels:  nltk
SparkTwitterAnalysis
An Apache Spark standalone application using the Spark API in Scala. The application uses Simple Build Tool(SBT) for building the project.
Stars: ✭ 29 (+45%)
Mutual labels:  twitter-streaming-api
Product-Categorization-NLP
Multi-Class Text Classification for products based on their description with Machine Learning algorithms and Neural Networks (MLP, CNN, Distilbert).
Stars: ✭ 30 (+50%)
Mutual labels:  nltk
curso-IRI
Introduction to Information Retrieval
Stars: ✭ 14 (-30%)
Mutual labels:  nltk
Spark-and-Kafka IoT-Data-Processing-and-Analytics
Final Project for IoT: Big Data Processing and Analytics class. Analyzing U.S nationwide temperature from IoT sensors in real-time
Stars: ✭ 42 (+110%)
Mutual labels:  spark-streaming
gcp-dataprep-bigquery-twitter-stream
Stream Twitter Data into BigQuery with Cloud Dataprep
Stars: ✭ 21 (+5%)
Mutual labels:  twitter-streaming-api

Scalable architecture for real-time Twitter sentiment analysis

This project implements a scalable architecture to monitor and visualize sentiment for a Twitter hashtag in real time. It streams live tweets matching a hashtag from Twitter, performs sentiment analysis on each tweet, and calculates the rolling mean of the sentiments. This mean is continuously sent to connected browser clients and displayed in a sparkline graph.
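To make "rolling mean" concrete, here is a minimal Python 3 sketch of a count-based rolling mean over the most recent scores. It is only an illustration: in this project the averaging is done by the Spark Streaming job, and the README does not state the window size, so `size` (and the class name `RollingMean`) are hypothetical.

```python
from collections import deque


class RollingMean:
    """Mean of the most recent `size` sentiment scores, updated in O(1) per score."""

    def __init__(self, size):
        if size <= 0:
            raise ValueError("size must be positive")
        self.size = size          # hypothetical window size (number of tweets)
        self.window = deque()     # the scores currently inside the window
        self.total = 0.0          # running sum of the scores in the window

    def add(self, score):
        """Add one tweet's score and return the current rolling mean."""
        self.window.append(score)
        self.total += score
        if len(self.window) > self.size:
            # Evict the oldest score so only the last `size` scores count.
            self.total -= self.window.popleft()
        return self.total / len(self.window)


if __name__ == "__main__":
    rm = RollingMean(size=3)
    for s in [1.0, -1.0, 0.5, 0.5]:
        print(round(rm.add(s), 3))  # prints 1.0, 0.0, 0.167 and 0.0 on separate lines
```

Keeping a running sum avoids re-summing the window on every tweet; over very long streams, floating-point drift in `total` can accumulate, so a periodic recomputation from `window` is a reasonable safeguard.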

System design

The diagram below illustrates the different components and the information flow between them (from right to left).

[Figure: system design]

Project breakdown

The project has three parts:

1. Web server

The web server is a Python Flask server. It fetches tweets from Twitter using Tweepy and pushes them into Kafka. A sentiment analyzer picks tweets from Kafka, performs sentiment analysis using NLTK, and pushes the results back into Kafka. The Spark Streaming server (part 3) reads the sentiments, calculates the rolling average, and writes it back to Kafka. Finally, the web server reads the rolling mean from Kafka and sends it to connected clients via Socket.IO. An HTML/JS client displays the live sentiment in a sparkline graph using Google Annotation Charts.

The web server runs each independent task in a separate thread:
Thread 1: fetches tweets from Twitter
Thread 2: performs sentiment analysis on each tweet
Thread 3: reads the rolling mean computed by Spark Streaming
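The thread layout above can be sketched in Python 3 with in-memory queues standing in for Kafka topics. This is not the project's code: `queue.Queue` replaces Kafka, a fixed list replaces the Tweepy stream, `toy_score` is a hypothetical word-list scorer substituted for the NLTK analysis, and a fourth thread, `spark_standin`, replaces the Spark job with a simple cumulative mean. All function names are illustrative.

```python
import queue
import threading

SENTINEL = None  # end-of-stream marker passed down every queue

# Hypothetical toy scorer standing in for the NLTK sentiment analysis:
# +1 for each word in POSITIVE, -1 for each word in NEGATIVE.
POSITIVE = {"good", "great", "love"}
NEGATIVE = {"bad", "awful", "hate"}


def toy_score(text):
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def fetch_tweets(tweets, out_q):
    """Thread 1: stand-in for the Tweepy stream; publishes raw tweets."""
    for tweet in tweets:
        out_q.put(tweet)
    out_q.put(SENTINEL)


def analyze(in_q, out_q):
    """Thread 2: scores each tweet and publishes the score."""
    while True:
        tweet = in_q.get()
        if tweet is SENTINEL:
            out_q.put(SENTINEL)
            return
        out_q.put(toy_score(tweet))


def spark_standin(in_q, out_q):
    """Stand-in for the Spark job: publishes the cumulative mean so far."""
    total, count = 0, 0
    while True:
        score = in_q.get()
        if score is SENTINEL:
            out_q.put(SENTINEL)
            return
        total += score
        count += 1
        out_q.put(total / count)


def forward_means(in_q, sink):
    """Thread 3: reads each mean and forwards it to clients (here, a list)."""
    while True:
        mean = in_q.get()
        if mean is SENTINEL:
            return
        sink.append(mean)


def run_pipeline(tweets):
    # Each queue plays the role of one Kafka topic.
    tweets_q, scores_q, means_q = queue.Queue(), queue.Queue(), queue.Queue()
    emitted = []
    threads = [
        threading.Thread(target=fetch_tweets, args=(tweets, tweets_q)),
        threading.Thread(target=analyze, args=(tweets_q, scores_q)),
        threading.Thread(target=spark_standin, args=(scores_q, means_q)),
        threading.Thread(target=forward_means, args=(means_q, emitted)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return emitted


if __name__ == "__main__":
    print(run_pipeline(["I love this", "this is bad", "great great day"]))
    # prints [1.0, 0.0, 0.6666666666666666]
```

Because each stage only talks to its input and output queues, any stage could be moved into its own process and reconnected through a real broker, which is the property the next paragraph relies on.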

Each of these threads can also run as an independent service, which makes the system scalable and fault-tolerant.

2. Kafka

Kafka acts as a message broker between the modules running within the web server, as well as between the web server and the Spark Streaming server. It provides a scalable, fault-tolerant communication mechanism between independently running services.
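Much of Kafka's scalability comes from splitting a topic into partitions and spreading those partitions across the consumers of a consumer group. The README does not say how this project's topics are partitioned, so the Python 3 sketch below only illustrates the general idea. Kafka's default Java partitioner hashes key bytes with murmur2; `zlib.crc32` is swapped in here purely for illustration, and the round-robin assignment is a simplification of Kafka's assignors. The consumer names are hypothetical.

```python
import zlib


def partition_for(key, num_partitions):
    """Map a message key to a partition (crc32 stand-in for Kafka's murmur2)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign_partitions(num_partitions, consumers):
    """Spread partitions across the consumers of one group, round-robin style."""
    if not consumers:
        raise ValueError("need at least one consumer")
    assignment = {c: [] for c in consumers}
    for p in range(num_partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment


if __name__ == "__main__":
    print(assign_partitions(6, ["analyzer-1", "analyzer-2"]))
    # prints {'analyzer-1': [0, 2, 4], 'analyzer-2': [1, 3, 5]}
    print(partition_for("#python", 6))  # same key always lands on the same partition
```

Adding a third analyzer to the group would simply redistribute the six partitions, which is how independently running sentiment-analysis services could share the load.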

3. Calculating rolling mean of sentiments

A separate Java program reads sentiments from Kafka using Spark Streaming, calculates the rolling average using Spark window operations, and writes the results back to Kafka.
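Spark Streaming's window operations are parameterized by a window length and a slide interval: every slide interval, the operation is applied to the data received during the last window length. The Python 3 sketch below reproduces those semantics on a list of `(timestamp, score)` pairs so the windowing can be checked by hand. It is not the project's Java code, and the window and slide values in the example are hypothetical.

```python
def windowed_means(events, window, slide, end):
    """Mean sentiment per sliding window.

    events: list of (timestamp_seconds, score) pairs.
    For each slide boundary t = slide, 2*slide, ... <= end, averages the scores
    with t - window < timestamp <= t; yields None for an empty window.
    """
    if window <= 0 or slide <= 0:
        raise ValueError("window and slide must be positive")
    results = []
    t = slide
    while t <= end:
        in_window = [score for ts, score in events if t - window < ts <= t]
        mean = sum(in_window) / len(in_window) if in_window else None
        results.append((t, mean))
        t += slide
    return results


if __name__ == "__main__":
    events = [(1, 1.0), (2, -1.0), (4, 0.5), (5, 0.5), (7, -0.5)]
    for t, mean in windowed_means(events, window=4, slide=2, end=8):
        print(t, mean)
```

This sketch recomputes every window from scratch. Spark can instead update a window incrementally with an inverse reduce function (the `reduceByWindow` variant that takes an `invReduceFunc`), which requires checkpointing to be enabled.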

How to run

To run the project:

  1. Download, set up and run Apache Kafka. I use the following commands on OS X, from Kafka's bin directory (each in its own terminal):
       sh zookeeper-server-start.sh ../config/zookeeper.properties
       sh kafka-server-start.sh ../config/server.properties
  2. Install NLTK together with its complete data collection.
  3. Create a Twitter app and set your keys in
       live_twitter_sentiment_analysis/webapp/tweet_ingestion/config.py
  4. Install the Python packages:
       pip install -r live_twitter_sentiment_analysis/webapp/requirements.txt
  5. Run the web server:
       python live_twitter_sentiment_analysis/webapp/main.py
  6. Run the Maven Java project (rolling_average) after installing the Maven dependencies specified in live_twitter_sentiment_analysis/rolling_average/pom.xml. Don't forget to set the checkpoint directory in Main.java.
  7. Open the URL localhost:8001/index.html in your browser.
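Before opening the page, it can help to confirm that each service is actually listening. The Python 3 sketch below assumes the default ports: 2181 for ZooKeeper (`clientPort` in zookeeper.properties) and 9092 for the Kafka broker (server.properties), plus port 8001 from the URL in the last step. If your configuration differs, edit the hypothetical `SERVICES` table.

```python
import socket

# Assumed default ports; adjust to match your configuration files.
SERVICES = {
    "ZooKeeper": ("localhost", 2181),
    "Kafka": ("localhost", 9092),
    "Web server": ("localhost", 8001),
}


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for name, (host, port) in SERVICES.items():
        status = "up" if is_listening(host, port) else "NOT reachable"
        print(f"{name} at {host}:{port}: {status}")
```

A successful TCP connection only shows that something is listening on the port, not that Kafka or the web server is healthy, but it quickly catches the common case of a service that failed to start.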

Output

Here is what the final output looks like in the browser:

[Figure: output]

Note: Tested on Python 2.7.
