
P7h / Stormtweetssentimentd3viz

Licence: apache-2.0
Computes and visualizes the sentiment analysis of tweets of US States in real-time using Storm.

Programming Languages

java

Projects that are alternatives to, or similar to, Stormtweetssentimentd3viz

Bigdata Notebook
Stars: ✭ 100 (+300%)
Mutual labels:  hadoop, storm
Storm Camel Example
Real-time analysis and visualization with Storm-AMQ-Camel-Websockets-Highcharts integration.
Stars: ✭ 28 (+12%)
Mutual labels:  hadoop, storm
Bigdata Notes
Getting-started guide to big data ⭐
Stars: ✭ 10,991 (+43864%)
Mutual labels:  hadoop, storm
Recommendsys
Recommendation project (real-time and offline recommendation)
Stars: ✭ 198 (+692%)
Mutual labels:  hadoop, storm
Javaorbigdata Interview
A compilation of interview knowledge points for Java developers or big data developers
Stars: ✭ 203 (+712%)
Mutual labels:  hadoop, storm
xxhadoop
Data Analysis Using Hadoop/Spark/Storm/ElasticSearch/MachineLearning etc. This is My Daily Notes/Code/Demo. Don't fork, Just star !
Stars: ✭ 37 (+48%)
Mutual labels:  hadoop, storm
qs-hadoop
Learning the big data ecosystem
Stars: ✭ 18 (-28%)
Mutual labels:  hadoop, storm
hadoop-docker-lite
Docker build project to setup a lightweight hadoop cluster containing hadoop, pig, zookeeper, hbase, phoenix, storm, kafka, kafka manager
Stars: ✭ 24 (-4%)
Mutual labels:  hadoop, storm
Bigdata
💎🔥 Big data study notes
Stars: ✭ 488 (+1852%)
Mutual labels:  hadoop
Useractionanalyzeplatform
Big data platform for e-commerce user behavior analysis
Stars: ✭ 645 (+2480%)
Mutual labels:  hadoop
School Of Sre
At LinkedIn, we are using this curriculum for onboarding our entry-level talents into the SRE role.
Stars: ✭ 5,141 (+20464%)
Mutual labels:  hadoop
Streaming Readings
Papers and readings related to streaming systems
Stars: ✭ 554 (+2116%)
Mutual labels:  storm
Winutils
winutils.exe hadoop.dll and hdfs.dll binaries for hadoop windows
Stars: ✭ 657 (+2528%)
Mutual labels:  hadoop
Gis Tools For Hadoop
The GIS Tools for Hadoop are a collection of GIS tools for spatial analysis of big data.
Stars: ✭ 485 (+1840%)
Mutual labels:  hadoop
Hadoop For Geoevent
ArcGIS GeoEvent Server sample Hadoop connector for storing GeoEvents in HDFS.
Stars: ✭ 5 (-80%)
Mutual labels:  hadoop
Pdf
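Programming e-books, e-books, programming books, including C, C#, Docker, Elasticsearch, Git, Hadoop, HeadFirst, Java, Javascript, jvm, Kafka, Linux, Maven, MongoDB, MyBatis, MySQL, Netty, Nginx, Python, RabbitMQ, Redis, Scala, Solr, Spark, Spring, SpringBoot, SpringCloud, TCPIP, Tomcat, Zookeeper, artificial intelligence, big data, concurrent programming, databases, data mining, new interview questions, architecture design, algorithm series, computer science, design patterns, software testing, refactoring and optimization, and many more categories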
Stars: ✭ 12,009 (+47936%)
Mutual labels:  hadoop
Bdp Dataplatform
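Big data ecosystem solution data platform: a big data solution built around big data, data platforms, microservices, machine learning, an online store, automated operations, DevOps, a container deployment platform, and data platform collection, storage, computation, development, and application building.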
Stars: ✭ 456 (+1724%)
Mutual labels:  storm
Floating Elephants
Docker containers for Hadoop.
Stars: ✭ 19 (-24%)
Mutual labels:  hadoop
Bigdataguide
Big data learning: learning big data from scratch, including learning videos for every stage of big data study and interview materials
Stars: ✭ 817 (+3168%)
Mutual labels:  hadoop
Javapdf
🍣 100 Java e-books and technical book PDFs (take pride in downloading and reading them, shame in merely liking and bookmarking them)
Stars: ✭ 609 (+2336%)
Mutual labels:  hadoop

StormTweetsSentimentD3Viz


You might also be interested in my other project, StormTweetsSentimentD3UKViz, an extension of this repo that visualizes Twitter sentiment for counties / regions of the UK using a D3.js Choropleth Map.

Introduction

This repository contains an application built as an example of the Apache Storm distributed framework: it performs sentiment analysis of tweets originating from the U.S. in real-time. The Topology retrieves tweets originating from the US, computes the sentiment score of each State of the United States [based on those tweets], and visualizes the scores in a Choropleth Map using D3.js, running continuously for 10 minutes [in local mode]. You can also explicitly kill the topology by pressing Ctrl+C to exit the application. In addition, a Highcharts column chart visualizes each State and its sentiment value.

Apache Storm is an open source distributed real-time computation system, developed at BackType by Nathan Marz and team. It was open sourced by Twitter [after its acquisition of BackType] in August 2011, and Storm became a top-level Apache project on 29th September, 2014.
This application was developed and tested with Storm v0.8.2 on Windows 7 in local mode, and was later updated and tested with Storm v0.9.3 on 05th January, 2014. The application may or may not work with Storm versions earlier or later than v0.9.3.

This application has been tested in:

  • Local mode on a CentOS virtual machine and on a Microsoft Windows 7 machine.
  • Cluster mode on a private cluster of 4 machines and on an Amazon EC2 environment of 5 machines; all the machines in the private cluster ran Ubuntu, while the EC2 machines ran CentOS.
    • The recent update to Apache Storm v0.9.3 has not been tested in cluster mode.

Features

  • The application retrieves tweets using the Twitter Streaming API (via Twitter4J).
  • It analyses the sentiment of all tweets originating from the US.
  • A tweet contains three different objects that can be used to determine its origin. This application tries all three and prioritizes the location they yield in the following order [high to low]:
    • The coordinates object.
    • The place object.
    • The user object.
  • For reverse geocoding, this application uses the Bing Maps API.
    • For more information and to sign up, please check Getting Started with Bing Maps.
    • Please note that you need a Windows Live account to sign up for a Bing Maps API key.
    • Also, consider opting for the Basic Plan for the Bing Maps API, as it suits this usage better. As of 18th June, 2013, the Basic Plan allows 50k requests per 24 hours.
    • I chose Bing Maps over Google Maps because Google Maps is too restrictive for this usage, with a limit of only 2500 requests per day.
  • This application uses AFINN, which contains a list of words with pre-computed sentiment scores.
    • These scores are used to determine the sentiment of each tweet retrieved using the Streaming API.
  • From the sentiment values, we can identify the happiest and the unhappiest State of the US.
  • For visualization, I use D3 to display the sentiment value of each State in real-time as a color appropriate to that value: a State's color moves from red to green as its sentiment value improves.
  • Another visualization uses Highcharts: a column chart of each State and its sentiment value, which also updates in real-time as the sentiment values change.
  • The codebase has been commented wherever required.
  • The project is compatible with both Eclipse IDE and IntelliJ IDEA. Import it into your favorite IDE [with the Maven plugin installed] to quickly follow the code.
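The location priority and AFINN scoring described above can be sketched in plain Java as follows. This is a minimal, hypothetical sketch, not the project's actual code: plain strings stand in for the Twitter4J coordinates, place, and user objects; AFINN lines are assumed to be "word<TAB>score"; tokenization is simplified so multi-word AFINN phrases are ignored; and the method names (resolveLocation, parseAfinn, score, accumulate, happiest, unhappiest) are illustrative.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch, not the project's actual code. Plain strings stand in for
// the Twitter4J coordinates, place, and user objects of a tweet.
public class TweetSentimentSketch {

    // Returns the first non-blank candidate in priority order: coordinates, place, user.
    static Optional<String> resolveLocation(String coordinates, String place, String user) {
        for (String candidate : new String[] {coordinates, place, user}) {
            if (candidate != null && !candidate.trim().isEmpty()) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    // Parses AFINN-style lines of the form "word<TAB>score" into a lookup map.
    static Map<String, Integer> parseAfinn(List<String> lines) {
        Map<String, Integer> scores = new HashMap<>();
        for (String line : lines) {
            int tab = line.lastIndexOf('\t');
            if (tab > 0) {
                String word = line.substring(0, tab).trim();
                int value = Integer.parseInt(line.substring(tab + 1).trim());
                scores.put(word, value);
            }
        }
        return scores;
    }

    // Sums the AFINN scores of the tweet's single words; unknown words count as 0.
    // Multi-word AFINN phrases are ignored in this simplification.
    static int score(String tweet, Map<String, Integer> afinn) {
        int total = 0;
        for (String word : tweet.toLowerCase(Locale.ROOT).split("[^a-z']+")) {
            total += afinn.getOrDefault(word, 0);
        }
        return total;
    }

    // Adds one tweet's score to the running total of its State.
    static void accumulate(Map<String, Integer> stateScores, String state, int tweetScore) {
        stateScores.merge(state, tweetScore, Integer::sum);
    }

    // State with the highest running total, or null if no State has a score yet.
    static String happiest(Map<String, Integer> stateScores) {
        return stateScores.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElse(null);
    }

    // State with the lowest running total, or null if no State has a score yet.
    static String unhappiest(Map<String, Integer> stateScores) {
        return stateScores.entrySet().stream()
                .min(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElse(null);
    }
}
```

Keeping a running total per State means the happiest and unhappiest States can be read off at any moment, which fits a topology that keeps updating as tweets stream in.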

Demo

D3 Choropleth Visualization

GIF of D3 Choropleth Visualization

Screenshot of D3 Choropleth Visualization

Highcharts Visualization

GIF of Highcharts Visualization

Screenshot of Highcharts Visualization

Configuration

  • Please open config.properties and fill in your own values from the Twitter Developer Page to complete the Twitter API integration for your application.
    • If you have not created a Twitter App before, please create a new one; it provides all the values required in config.properties, which you should then populate carefully.
  • Also add your Bing Maps API key to config.properties, as it is used to reverse geocode a location from its latitude and longitude.
  • Finally, check [but do not modify] the AFINN-111.txt file to see the pre-computed sentiment scores of ~2500 words / phrases.
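A minimal sketch of loading config.properties with java.util.Properties, checking that no required value was left blank, and building a Bing Maps REST "Locations by point" URL for reverse geocoding. The key names used in the usage note (bing.maps.key, twitter.consumer.key, twitter.access.token) are hypothetical; the real names are whatever config.properties defines. Parsing the JSON response (done with Jackson in this project) is not shown.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Properties;

// Hypothetical sketch, not the project's actual code. Property key names passed
// to these methods are assumptions; check config.properties for the real ones.
public class ConfigSketch {

    // Loads key=value pairs, e.g. from a FileReader opened on config.properties.
    static Properties load(Reader reader) throws IOException {
        Properties props = new Properties();
        props.load(reader);
        return props;
    }

    // Returns every required key that is absent or blank, so mistakes surface at startup.
    static List<String> missingKeys(Properties props, String... requiredKeys) {
        List<String> missing = new ArrayList<>();
        for (String key : requiredKeys) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }

    // Builds a Bing Maps REST "Locations by point" request URL for reverse geocoding.
    // The API key is assumed to be URL-safe and is not encoded here.
    static String reverseGeocodeUrl(String bingMapsKey, double latitude, double longitude) {
        // Locale.ROOT keeps '.' as the decimal separator whatever the JVM's default locale is.
        return String.format(Locale.ROOT,
                "http://dev.virtualearth.net/REST/v1/Locations/%f,%f?key=%s",
                latitude, longitude, bingMapsKey);
    }
}
```

For example, with the hypothetical key bing.maps.key=abc123, reverseGeocodeUrl("abc123", 40.7128, -74.006) yields http://dev.virtualearth.net/REST/v1/Locations/40.712800,-74.006000?key=abc123. Failing fast on missing keys is cheaper than discovering a blank credential after the topology has started.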

Dependencies

  • Storm v0.9.3
  • Jackson v1.9.13
  • Spring v4.0.3
  • Camel v2.13.0
  • ActiveMQ Camel v5.9.0
  • Twitter4J v4.0.2
  • Google Guava v18.0
  • Logback v1.1.2

Also, please check pom.xml for more information on the various other dependencies of the project.

Requirements

This project uses Maven to build and run the topology.
You need the following on your machine:

  • Oracle JDK >= 1.8.x
  • Apache Maven >= 3.2.3
  • Python v2.7.x, for hosting the visualization pages on a local web server.
  • Clone this repo and import it as an existing Maven project into either Eclipse IDE or IntelliJ IDEA.
  • This application uses Google Guava for making life simple while using Collections and other generic stuff.
  • This application also uses Jackson for unmarshalling the JSON response received from the Bing Maps API.
  • To execute this project in distributed mode, i.e. on a Storm Cluster, ZooKeeper and the other cluster components must be installed and configured.
    • Follow the steps on Storm Wiki for more details on setting up a Storm Cluster.

The rest of the required frameworks and libraries are downloaded by Maven the first time the build is invoked.

Usage

To build and run this topology, you must use Java 1.8.

Local Mode:

  • All the required frameworks and libraries are downloaded by Maven as required.
  • Local mode can also be run on Windows without installing any additional software or framework.
    Note: Please be sure to clear your temp folder, as each run adds a lot of temporary files to it.
  • In local mode, this application can be run from the command line by invoking:

Either

mvn clean compile exec:java -Dexec.classpathScope=compile -Dexec.mainClass=org.p7h.storm.sentimentanalysis.topology.SentimentAnalysisTopology

or

mvn clean compile package && java -jar target/storm-sentiment-viz-0.1-jar-with-dependencies.jar

  • Start Python SimpleHTTPServer in the web folder of this code repo.

Command:

python -m SimpleHTTPServer

  • For the D3 Choropleth Map visualization, launch a browser [preferably Google Chrome] and point it to index.html hosted on the above Python server.
    • Click on the "Start Viz" button to trigger the D3 Choropleth Map visualization.
    • You can stop the visualization any time by clicking on the "Stop Viz" button.
    • The Map updates whenever Storm analyzes a tweet, showing the sentiment value of each State of the United States of America in real-time.
  • For the Highcharts visualization, launch a browser [preferably Google Chrome] and point it to US__HighchartsViz.html.
    • Click on the "Start Viz" button to trigger the Highcharts visualization.
    • You can also stop this visualization by clicking on the "Stop Viz" button.
    • The chart updates every second, showing the sentiment value of each State of the United States of America in real-time.

Distributed [or Cluster / Production] Mode:

Distributed mode requires a complete and proper Storm Cluster setup. Please check the wiki on the Apache Storm website for setting up a Storm Cluster.
In distributed mode, after starting Nimbus and Supervisors on individual machines, this application can be executed on the master [or Nimbus] machine by invoking the following on the command line:

Command:

storm jar target/storm-sentiment-viz-0.1.jar org.p7h.storm.sentimentanalysis.topology.SentimentAnalysisTopology SentimentAnalysis

Problems

If you find any issues, please report them either by raising an issue here on GitHub or by alerting me on my Twitter handle @P7h. Or, even better, send a pull request. I appreciate your help. Thanks!

License

Copyright © 2013-2015 Prashanth Babu.
Licensed under the Apache License, Version 2.0.
