
harshkavdikar1 / Tweet-Analysis-With-Kafka-and-Spark

Licence: other
A real-time analytics dashboard for analyzing trending hashtags and @mentions at any location, built with Kafka and Spark Streaming.

Programming Languages

Python, HTML, JavaScript

Projects that are alternatives of or similar to Tweet-Analysis-With-Kafka-and-Spark

litemall-dw
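A big data project based on the open-source Litemall e-commerce project, including front-end event tracking (openresty+lua) and back-end event tracking; a five-layer data warehouse, real-time computing, and user profiling. The big data platform uses CDH 6.3.2 (scripted with vagrant+ansible) and also includes Azkaban workflows.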
Stars: ✭ 36 (+100%)
Mutual labels:  spark-streaming, spark-sql
Spark
.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
Stars: ✭ 1,721 (+9461.11%)
Mutual labels:  spark-streaming, spark-sql
bigdatatutorial
bigdatatutorial
Stars: ✭ 34 (+88.89%)
Mutual labels:  spark-streaming, spark-sql
nodejs-dev-vm
DEPRECATED Simple Node.js Development VM using Vagrant + VirtualBox + Ansible
Stars: ✭ 25 (+38.89%)
Mutual labels:  node-js
debug
A tiny JavaScript debugging utility modelled after Node.js core's debugging technique. Works in Node.js and web browsers
Stars: ✭ 10,554 (+58533.33%)
Mutual labels:  node-js
trifolia-on-fhir
Sister product to Trifolia Workbench that has native support for FHIR resources
Stars: ✭ 23 (+27.78%)
Mutual labels:  node-js
geospark
bring sf to spark in production
Stars: ✭ 53 (+194.44%)
Mutual labels:  spark-sql
E-Voting-App
A simple E-voting Decentralised App using the Ethereum Blockchain, Solidity and the MERN(MongoDB, Express.js, ReactJS, Node.js) stack
Stars: ✭ 84 (+366.67%)
Mutual labels:  node-js
recent-activity
Add your recent activity to your profile readme!
Stars: ✭ 87 (+383.33%)
Mutual labels:  node-js
backend-server
📠 The backend of the Fairfield Programming Association website.
Stars: ✭ 26 (+44.44%)
Mutual labels:  node-js
content-moderation-image-api
An NSFW Image Classification REST API for effortless Content Moderation built with Node.js, Tensorflow, and Parse Server
Stars: ✭ 50 (+177.78%)
Mutual labels:  node-js
spark-twitter-sentiment-analysis
Sentiment Analysis of a Twitter Topic with Spark Structured Streaming
Stars: ✭ 55 (+205.56%)
Mutual labels:  spark-sql
op-mattermost
OpenProject and Mattermost integration
Stars: ✭ 19 (+5.56%)
Mutual labels:  node-js
posthog-node
Official PostHog Node library
Stars: ✭ 18 (+0%)
Mutual labels:  node-js
midtrans-nodejs-client
Official Midtrans Payment API Client for Node JS | https://midtrans.com
Stars: ✭ 124 (+588.89%)
Mutual labels:  node-js
haykal
A Typescript MVC Framework
Stars: ✭ 24 (+33.33%)
Mutual labels:  node-js
biguint-format
Node.js module to format big uint numbers from a byte array or a Buffer
Stars: ✭ 16 (-11.11%)
Mutual labels:  node-js
winston-dev-console
Winston@3 console format aimed to improve development UX
Stars: ✭ 88 (+388.89%)
Mutual labels:  node-js
boilerplate
Boilerplate for @prisma-cms
Stars: ✭ 22 (+22.22%)
Mutual labels:  node-js
Ecommerce
Angular 6 Ecommerce Application POC
Stars: ✭ 46 (+155.56%)
Mutual labels:  highcharts

Tweet Analysis using Kafka and Spark Streaming

A real-time analytics dashboard that visualizes the trending hashtags and @mentions at a given location, using Twitter's real-time streaming API as the data source.

Installation Guide

Download and install Kafka, Spark, Python and npm.

  1. You can refer to the following guide to install Kafka: https://towardsdatascience.com/running-zookeeper-kafka-on-windows-10-14fc70dcc771
  2. Spark can be downloaded from the following link: https://spark.apache.org/downloads.html
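
Before moving on, it can help to confirm that the installed tools are reachable from the command line. The following is a minimal sketch, not part of the project: the executable names in `REQUIRED_TOOLS` are assumptions and may differ on your machine (Kafka's scripts, for instance, are usually run from their own directory rather than from PATH, so they are not checked here).

```python
import shutil

# Assumed executable names; adjust them to match your installation.
REQUIRED_TOOLS = ("python", "npm", "spark-submit")


def find_tools(names=REQUIRED_TOOLS):
    """Map each executable name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools().items():
        print(f"{name}: {path or 'NOT FOUND on PATH'}")
```

Any tool reported as not found either needs to be installed or have its directory added to PATH.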


How to run the code.

  • Create a Kafka topic.
  • Update the conf file with your secret key and access tokens.
  • Install Python dependencies.

    pip install -r requirements.txt

  • Install Node.js dependencies.

    npm install

  • Start Zookeeper. Open cmd and execute:

    zkserver

  • Start Kafka. Go to the Kafka installation directory, ..\kafka_2.11-2.3.1\bin\windows, open cmd there, and execute the following command:

    kafka-server-start.bat C:\ProgramData\Java\kafka_2.11-2.3.1\config\server.properties

  • Run the Python file to fetch tweets.

    python fetch_tweets.py

  • Run the Python file to analyze tweets.

    python analyze_tweets.py

  • Start the npm server.

    npm start

Technology stack

[Image: stack]


| Area | Technology |
|---|---|
| Front-End | HTML5, Bootstrap, CSS3, Socket.IO, highcharts.js |
| Back-End | Express, Node.js |
| Cluster Computing Framework | Apache Spark (Python) |
| Message Broker | Apache Kafka |

Architecture


[Image: architecture]


How it works

  1. Extract data from Twitter's streaming API and put it into a Kafka topic.
  2. Spark listens to this topic, reads the data from it, analyzes it using Spark Streaming, and puts the top 10 trending hashtags and @mentions into another Kafka topic.
  3. Spark Streaming creates a DStream whenever it reads data from Kafka and analyzes it by performing operations such as map, filter, updateStateByKey, countByValue and foreachRDD on the RDDs; the top 10 hashtags and mentions are then obtained from the RDD using Spark SQL.
  4. Node.js picks up this data from the Kafka topic on the server side and emits it to the socket.
  5. The socket pushes the data to the user's dashboard, which is rendered in real time using highcharts.js.
  6. The dashboard is refreshed every 60 seconds.
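
The counting logic in step 3 can be illustrated without a cluster. The sketch below is plain Python, not the project's Spark job. It mimics extracting hashtags and @mentions from tweet text (the map/filter step), counting them per batch, merging those counts into a running total (the role updateStateByKey plays), and selecting the top 10. The function names and the regular expression are assumptions made for illustration.

```python
import re
from collections import Counter

# Assumed token pattern: '#' or '@' followed by word characters,
# and not preceded by a word character (so e-mail addresses are skipped).
TOKEN_RE = re.compile(r"(?<!\w)([#@]\w+)")


def extract_tokens(text):
    """Return the lowercased hashtags and @mentions found in one tweet."""
    return [token.lower() for token in TOKEN_RE.findall(text)]


def count_batch(tweets):
    """Count tokens in one micro-batch of tweet texts."""
    counts = Counter()
    for text in tweets:
        counts.update(extract_tokens(text))
    return counts


def update_state(state, batch_counts):
    """Merge one batch's counts into the running totals and return them."""
    merged = Counter(state)
    merged.update(batch_counts)
    return merged


def top_n(state, prefix, n=10):
    """Return the n most frequent tokens that start with prefix ('#' or '@')."""
    filtered = Counter({t: c for t, c in state.items() if t.startswith(prefix)})
    return filtered.most_common(n)


if __name__ == "__main__":
    batches = [
        ["Loving #Kafka and #Spark with @alice", "More #kafka please"],
        ["#spark streaming demo by @alice and @bob", "email me at a@b.com"],
    ]
    state = Counter()
    for batch in batches:
        state = update_state(state, count_batch(batch))
    print(top_n(state, "#"))
    print(top_n(state, "@"))
```

In the real pipeline each batch would arrive from the Kafka topic, and the resulting top-10 lists would be published to the second topic for Node.js to consume.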


[Image: hashtags]

[Image: mentions]
