kaantas / spark-twitter-sentiment-analysis

Licence: other
Sentiment Analysis of a Twitter Topic with Spark Structured Streaming

Programming Languages

python

Projects that are alternatives of or similar to spark-twitter-sentiment-analysis

kafka-twitter-spark-streaming
Counting Tweets Per User in Real-Time
Stars: ✭ 38 (-30.91%)
Mutual labels:  twitter-api, pyspark, apache-kafka
datalake-etl-pipeline
Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
Stars: ✭ 39 (-29.09%)
Mutual labels:  apache-spark, pyspark, spark-sql
spark-structured-streaming-examples
Spark Structured Streaming examples using version 3.0.0
Stars: ✭ 23 (-58.18%)
Mutual labels:  apache-spark, spark-sql, spark-structured-streaming
grasp
Essential NLP & ML, short & fast pure Python code
Stars: ✭ 58 (+5.45%)
Mutual labels:  sentiment-analysis, twitter-api
Live log analyzer spark
Spark application that analyzes Apache access logs and detects anomalies, with an accompanying Medium article.
Stars: ✭ 14 (-74.55%)
Mutual labels:  apache-spark, pyspark
learn-by-examples
Real-world Spark pipelines examples
Stars: ✭ 84 (+52.73%)
Mutual labels:  apache-spark, pyspark
Awesome Kafka
A list about Apache Kafka
Stars: ✭ 397 (+621.82%)
Mutual labels:  apache-spark, apache-kafka
Pyspark Stubs
Apache (Py)Spark type annotations (stub files).
Stars: ✭ 98 (+78.18%)
Mutual labels:  apache-spark, pyspark
Awesome Spark
A curated list of awesome Apache Spark packages and resources.
Stars: ✭ 1,061 (+1829.09%)
Mutual labels:  apache-spark, pyspark
Spark
.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
Stars: ✭ 1,721 (+3029.09%)
Mutual labels:  apache-spark, spark-sql
Azure Cosmosdb Spark
Apache Spark Connector for Azure Cosmos DB
Stars: ✭ 165 (+200%)
Mutual labels:  apache-spark, pyspark
fink-broker
Astronomy Broker based on Apache Spark
Stars: ✭ 18 (-67.27%)
Mutual labels:  apache-spark, apache-kafka
Kafka Storm Starter
Code examples that show how to integrate Apache Kafka 0.8+ with Apache Storm 0.9+ and Apache Spark Streaming 1.1+, while using Apache Avro as the data serialization format.
Stars: ✭ 728 (+1223.64%)
Mutual labels:  apache-spark, apache-kafka
SA-DL
Sentiment Analysis with Deep Learning models. Implemented with Tensorflow and Keras.
Stars: ✭ 35 (-36.36%)
Mutual labels:  sentiment-analysis, twitter-sentiment-analysis
Agile data code 2
Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition
Stars: ✭ 413 (+650.91%)
Mutual labels:  apache-spark, apache-kafka
Awesome Pulsar
A curated list of Pulsar tools, integrations and resources.
Stars: ✭ 57 (+3.64%)
Mutual labels:  apache-spark, apache-kafka
Spark With Python
Fundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (+172.73%)
Mutual labels:  apache-spark, pyspark
Bigdata Playground
A complete example of a big data application using: Kubernetes (kops/aws), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, NodeJS, Angular, GraphQL
Stars: ✭ 177 (+221.82%)
Mutual labels:  apache-spark, twitter-api
spark3D
Spark extension for processing large-scale 3D data sets: Astrophysics, High Energy Physics, Meteorology, …
Stars: ✭ 23 (-58.18%)
Mutual labels:  apache-spark, pyspark
Pyspark Boilerplate
A boilerplate for writing PySpark Jobs
Stars: ✭ 318 (+478.18%)
Mutual labels:  apache-spark, pyspark

Twitter Sentiment Analysis

This project performs sentiment analysis of a chosen Twitter topic using Apache Spark Structured Streaming, Apache Kafka, Python, and the AFINN module. It reports the overall sentiment status of that topic.

For example, you might be curious about the new episode of Game of Thrones and want to know what other people think of it beforehand. The answer can be NEGATIVE, NEUTRAL, or POSITIVE, depending on those opinions.

Code Explanation

  1. Authentication is handled with Python's Tweepy module. You must obtain keys from the Twitter API.
  2. A StreamListener named TweetListener is created for Twitter streaming. It produces data for the Kafka topic named 'twitter'.
  3. The StreamListener also calculates each tweet's sentiment value with the AFINN module and sends this value to the 'twitter' topic.
  4. The produced data is filtered so that it includes only tweets about the desired topic.
  5. A Kafka consumer that reads data from the 'twitter' topic is created.
  6. The consumer also converts the streaming data to structured data, which is placed into a SQL table named 'data'.
  7. The 'data' table has 2 columns, named 'text' and 'senti_val'.
  8. The average of the values in the senti_val column is calculated with pyspark.sql.functions.
  9. A user-defined function named fun is created to fill the status column.
  10. The status column holds POSITIVE, NEUTRAL, or NEGATIVE, and changes in real time according to the avg(senti_val) column.
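
The flow described in the steps above can be sketched in plain Python, without Kafka, Spark, or AFINN installed. This is a minimal illustration under stated assumptions, not the repository's code: the tiny word list is a hypothetical stand-in for the AFINN lexicon, the JSON message layout is assumed, and the thresholds in status_from_average (above zero is POSITIVE, below zero is NEGATIVE) are a guess at what the user-defined function fun does.

```python
import json

# Hypothetical miniature word list standing in for the AFINN lexicon.
# The values are illustrative only.
TOY_LEXICON = {"love": 3, "great": 3, "good": 3,
               "bad": -3, "hate": -3, "boring": -3}


def score_text(text):
    """Sum the valence of every known word, as a lexicon-based scorer does."""
    words = (w.strip(".,!?") for w in text.lower().split())
    return sum(TOY_LEXICON.get(w, 0) for w in words)


def to_message(text):
    """Producer side: pack text and score into one payload (assumed JSON layout)."""
    payload = {"text": text, "senti_val": score_text(text)}
    return json.dumps(payload).encode("utf-8")


def status_from_average(avg):
    """Assumed thresholds: > 0 is POSITIVE, < 0 is NEGATIVE, otherwise NEUTRAL."""
    if avg > 0:
        return "POSITIVE"
    if avg < 0:
        return "NEGATIVE"
    return "NEUTRAL"


def summarize(messages):
    """Consumer side: decode payloads, average senti_val, and label the result."""
    rows = [json.loads(m.decode("utf-8")) for m in messages]
    if not rows:
        return 0.0, "NEUTRAL"
    avg = sum(r["senti_val"] for r in rows) / len(rows)
    return avg, status_from_average(avg)


tweets = ["I love this episode, great ending!", "So boring.", "Just watched it."]
average, status = summarize([to_message(t) for t in tweets])
print(average, status)
```

In the actual project the averaging and labelling run inside Spark Structured Streaming over the 'data' table, so the result updates as new tweets arrive; the batch function summarize above only mirrors that logic for a fixed list of messages.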

Running

  1. Create a Twitter API account and put its keys in twitter_config.py.
  2. Start Apache Kafka:
/bin/kafka-server-start.sh /config/server.properties
  3. Run tweet_listener.py with Python 3 and the desired topic name:
PYSPARK_PYTHON=python3 bin/spark-submit tweet_listener.py "Game of Thrones"
  4. Run twitter_topic_avg_sentiment_val.py with Python 3:
PYSPARK_PYTHON=python3 bin/spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.1.1 twitter_topic_avg_sentiment_val.py
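
The first step does not spell out what twitter_config.py contains. Below is a minimal sketch, assuming the file supplies the four OAuth 1.0a credentials (consumer key, consumer secret, access token, access token secret) that Tweepy authentication needs. The function name load_credentials and the environment variable names are hypothetical, not taken from the repository; reading from the environment is simply one way to keep keys out of version control.

```python
import os

# Hypothetical environment variable names for the four OAuth 1.0a credentials.
CREDENTIAL_NAMES = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_credentials(env=None):
    """Return the credentials as a dict, failing early if any is missing."""
    env = os.environ if env is None else env
    missing = [name for name in CREDENTIAL_NAMES if not env.get(name)]
    if missing:
        raise RuntimeError("Missing Twitter credentials: " + ", ".join(missing))
    # Lower-case keys, e.g. "twitter_consumer_key", for use as Python names.
    return {name.lower(): env[name] for name in CREDENTIAL_NAMES}
```

Failing early on a missing key gives a clear error before the stream listener tries to connect.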