
kaantas / kafka-twitter-spark-streaming

Licence: other
Counting Tweets Per User in Real-Time

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to kafka-twitter-spark-streaming

spark-twitter-sentiment-analysis
Sentiment Analysis of a Twitter Topic with Spark Structured Streaming
Stars: ✭ 55 (+44.74%)
Mutual labels:  twitter-api, pyspark, apache-kafka
Spark-and-Kafka IoT-Data-Processing-and-Analytics
Final Project for IoT: Big Data Processing and Analytics class. Analyzing U.S nationwide temperature from IoT sensors in real-time
Stars: ✭ 42 (+10.53%)
Mutual labels:  pyspark, spark-streaming
TinyFlowerBeds
Educational bot that posts a tiny flower bed on Twitter every few hours. Check it out if you're new to Python and open source!
Stars: ✭ 12 (-68.42%)
Mutual labels:  twitter-api, tweepy
Gimel
Big Data Processing Framework - Unified Data API or SQL on Any Storage
Stars: ✭ 216 (+468.42%)
Mutual labels:  pyspark, spark-streaming
discord-twitter-webhooks
🤖 Stream tweets to Discord
Stars: ✭ 47 (+23.68%)
Mutual labels:  twitter-api, tweepy
TwitterAutoReplyBot
This is a tiny Python script that replies to a specified number of tweets containing a specified hashtag.
Stars: ✭ 33 (-13.16%)
Mutual labels:  twitter-api, tweepy
Pyspark Learning
Updated repository
Stars: ✭ 147 (+286.84%)
Mutual labels:  pyspark, spark-streaming
TwitterPiBot
A Python based bot for Raspberry Pi that grabs tweets with a specific hashtag and reads them out loud.
Stars: ✭ 85 (+123.68%)
Mutual labels:  twitter-api, tweepy
Bigdata Playground
A complete example of a big data application using: Kubernetes (kops/aws), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, NodeJS, Angular, GraphQL
Stars: ✭ 177 (+365.79%)
Mutual labels:  twitter-api, spark-streaming
TwitterScraper
Scrape a User's Twitter data! Bypass the 3,200 tweet API limit for a User!
Stars: ✭ 80 (+110.53%)
Mutual labels:  twitter-api, tweepy
twitter-stream-rs
A Rust library for listening on Twitter Streaming API.
Stars: ✭ 66 (+73.68%)
Mutual labels:  twitter-api
fdp-modelserver
An umbrella project for multiple implementations of model serving
Stars: ✭ 47 (+23.68%)
Mutual labels:  spark-streaming
learn-by-examples
Real-world Spark pipelines examples
Stars: ✭ 84 (+121.05%)
Mutual labels:  pyspark
Spark ALS
Recommendation algorithm implementations based on spark-ml, spark-mllib, and spark-streaming
Stars: ✭ 89 (+134.21%)
Mutual labels:  spark-streaming
twitter api
A Dart wrapper for the Twitter API v1.1
Stars: ✭ 56 (+47.37%)
Mutual labels:  twitter-api
kafkaSaur
Apache Kafka client for Deno
Stars: ✭ 42 (+10.53%)
Mutual labels:  apache-kafka
jgit-spark-connector
jgit-spark-connector is a library for running scalable data retrieval pipelines that process any number of Git repositories for source code analysis.
Stars: ✭ 71 (+86.84%)
Mutual labels:  pyspark
tweet png
A Flutter app to generate beautiful, high-quality screenshots of tweets from Twitter.
Stars: ✭ 51 (+34.21%)
Mutual labels:  twitter-api
terraform-provider-twitter
No description or website provided.
Stars: ✭ 24 (-36.84%)
Mutual labels:  twitter-api
kafkaer
Template based Kafka topic/cluster/ACL management
Stars: ✭ 37 (-2.63%)
Mutual labels:  apache-kafka

Twitter and Spark Streaming with Apache Kafka

This project counts, per user and in real time, the tweets that include the #GoTS7 hashtag.
Each username is printed together with its tweet count.

Code Explanation

  1. Authentication is handled with Python's Tweepy module.
  2. A StreamListener named KafkaPushListener is created for Twitter streaming; it produces the data that the Kafka consumer reads.
  3. The produced data is filtered to tweets that include the Game of Thrones hashtag.
  4. A SparkContext is created to connect to the Spark cluster.
  5. A Kafka consumer that reads from the 'twitter' topic is created.
  6. The number of tweets that include the #GoTS7 hashtag is calculated per user, and the usernames and counts are printed in real time.
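The filtering and counting in steps 3 and 6 can be sketched in plain Python. This is not the project's Spark code: it is a stand-in that assumes each Kafka message is a raw JSON tweet in the Twitter API v1.1 shape (a `text` field and a `user.screen_name` field), and the function names are hypothetical. In the real pipeline the same logic would typically be expressed as a map to `(user, 1)` pairs followed by a per-key sum.

```python
import json
from collections import Counter

# Assumption: the hashtag is matched case-insensitively as a substring,
# so "#GoTS7" and "#gots7" both count (and so would "#gots7finale").
HASHTAG = "#gots7"


def has_hashtag(tweet, hashtag=HASHTAG):
    """Return True if the tweet text mentions the hashtag."""
    return hashtag in (tweet.get("text") or "").lower()


def count_tweets_per_user(raw_messages):
    """Count matching tweets per screen name from raw JSON strings."""
    counts = Counter()
    for raw in raw_messages:
        try:
            tweet = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip messages that are not valid JSON
        if not isinstance(tweet, dict):
            continue  # skip JSON values that are not tweet objects
        user = (tweet.get("user") or {}).get("screen_name")
        if user and has_hashtag(tweet):
            counts[user] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample messages, for illustration only.
    sample = [
        json.dumps({"user": {"screen_name": "alice"}, "text": "Tonight! #GoTS7"}),
        json.dumps({"user": {"screen_name": "alice"}, "text": "Again #gots7"}),
        json.dumps({"user": {"screen_name": "bob"}, "text": "No hashtag here"}),
    ]
    for user, n in count_tweets_per_user(sample).most_common():
        print(user, n)
```

A batch-style function like this only approximates the streaming behaviour; in Spark Streaming the counts would be produced per micro-batch rather than over a fixed list.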

Running

  1. Create a Twitter API account and get the keys for twitter_config.py.
  2. Start Apache Kafka:
     ./kafka/kafka_2.11-0.11.0.0/bin/kafka-server-start.sh ./kafka/kafka_2.11-0.11.0.0/config/server.properties
  3. Run kafka_push_listener.py with Python 3:
     PYSPARK_PYTHON=python3 bin/spark-submit kafka_push_listener.py
  4. Run kafka_twitter_spark_streaming.py with Python 3:
     PYSPARK_PYTHON=python3 bin/spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.2.0 kafka_twitter_spark_streaming.py