GoogleCloudPlatform / dataproc-pubsub-spark-streaming

License: Apache-2.0

In this tutorial, you learn how to deploy an Apache Spark streaming application on Cloud Dataproc and process messages from Cloud Pub/Sub in near real time. The system you build in this scenario generates thousands of random tweets, identifies trending hashtags over a sliding window, saves the results in Cloud Datastore, and displays them on a web page.
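
To make the "trending hashtags over a sliding window" idea concrete, here is a minimal sketch in plain Python. It is not the repository's Spark code and does not use the Spark, Pub/Sub, or Datastore APIs; the class name `SlidingHashtagCounter`, its methods, and the batch-based window are all hypothetical, chosen only to illustrate how counts can be added for new batches and subtracted for batches that leave the window.

```python
import re
from collections import Counter, deque

# Hypothetical pattern: a hashtag is '#' followed by word characters.
HASHTAG = re.compile(r"#(\w+)")


class SlidingHashtagCounter:
    """Illustrative only: counts hashtags over the last `window_batches` batches."""

    def __init__(self, window_batches):
        self.window_batches = window_batches
        self.window = deque()    # per-batch counts currently inside the window
        self.counts = Counter()  # running totals across the window

    def add_batch(self, tweets):
        # Count the (lowercased) hashtags found in the new batch of tweet texts.
        batch = Counter(
            tag.lower() for text in tweets for tag in HASHTAG.findall(text)
        )
        self.window.append(batch)
        self.counts.update(batch)
        # Slide the window: subtract the oldest batch once the window is full.
        if len(self.window) > self.window_batches:
            expired = self.window.popleft()
            self.counts.subtract(expired)
            self.counts = +self.counts  # unary plus drops zero and negative counts

    def top(self, n):
        # Return the n most frequent hashtags as (tag, count) pairs.
        return self.counts.most_common(n)


counter = SlidingHashtagCounter(window_batches=2)
counter.add_batch(["I love #Spark", "#spark and #pubsub"])
counter.add_batch(["#pubsub rocks"])
counter.add_batch(["hello #dataproc"])  # the first batch now leaves the window
print(counter.top(3))
```

After the third batch, the first batch has expired, so its hashtags no longer contribute to the totals. A streaming engine applies the same add-and-subtract idea to time-based windows rather than a fixed number of batches.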

Please refer to the related article for all the steps to follow in this tutorial:

https://cloud.google.com/solutions/using-apache-spark-dstreams-with-dataproc-and-pubsub

Contents of this repository:

  • http_function: JavaScript code for the HTTP function deployed on Cloud Functions.
  • spark: Scala code for the Apache Spark streaming application.
  • tweet-generator: Python code for the randomized tweet generator.

Running the tests

To run the tests:

cd spark
mvn test