jeremylorino / gcp-dataprep-bigquery-twitter-stream

Licence: MIT license
Stream Twitter Data into BigQuery with Cloud Dataprep

Programming Languages

javascript

Twitter for BigQuery

This sample code helps you stream Twitter data into BigQuery and run simple visualizations. It also generates queries that you can run directly in the BigQuery interface or extend for your own applications.

You can also join the Twitter data against other public or private datasets in BigQuery to develop further insights and correlations.

Requirements

Setup & Configuration

To work with Google Cloud and BigQuery, follow the instructions below to create a new project and a service account, and to get your JSON key file.

  • Go to http://console.developers.google.com
  • Click on "Create Project"
  • Open the project dashboard by clicking on the new project
  • Open "APIs & auth->Credentials"
  • Click on "Create new Client ID", "Service account" and "Create Client ID"
  • Note your Service Account email (Under "EMAIL ADDRESS")
  • Generate and store your JSON key (Or save from auto-download)
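
Before starting the loader, it can help to confirm that the file you stored is a usable service account key. The sketch below is one way to do that and is not part of the sample: the helper name validateServiceAccountKey is hypothetical, and the key object is a placeholder. The field names it checks (type, client_email, private_key) are the ones found in Google service account JSON keys.

```javascript
// Hypothetical helper: checks that a parsed key object looks like a
// Google service account JSON key. It does not contact Google.
function validateServiceAccountKey(key) {
  const required = ['type', 'client_email', 'private_key'];
  const missing = required.filter(
    (field) => typeof key[field] !== 'string' || key[field].length === 0
  );
  if (missing.length > 0) {
    throw new Error(`Key file is missing fields: ${missing.join(', ')}`);
  }
  if (key.type !== 'service_account') {
    throw new Error(`Expected type "service_account", got "${key.type}"`);
  }
  // This is the value shown under "EMAIL ADDRESS" in the console.
  return key.client_email;
}

// Placeholder key object for illustration only (not a real credential).
const exampleKey = {
  type: 'service_account',
  client_email: 'loader@example-project.iam.gserviceaccount.com',
  private_key: '-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n',
};

console.log(validateServiceAccountKey(exampleKey));
```

In practice you would parse the downloaded file with JSON.parse(fs.readFileSync(path, 'utf8')) and pass the result to the helper.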

Loading Twitter data into BigQuery from your local machine

Before you can set up BigQuery, you must set up a billing account. To do so:

  • Go to https://console.developers.google.com/billing and add a credit card
  • Back in your project view, click on the gear icon in the top-right and then "Project billing settings"
  • Ensure your project is associated with a billing account

The enclosed sample includes a simple file to stream Tweets into Google Cloud Storage.

  • Go to http://console.developers.google.com
  • Go to your project
  • On the left-hand side, click on "Big Data->BigQuery" to open the BigQuery console
  • Click on the down arrow by the project, select "Create new dataset" and enter "twitter"
  • Run npm install, then npm start, to begin loading data from your local machine
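
BigQuery can load newline-delimited JSON, so one way to picture the loading step is a function that turns each tweet into a single JSON line. The following is a minimal sketch under that assumption. It is not the code that npm start runs, and toRow and toNdjson are hypothetical names. The fields it keeps are the ones the sample queries in this document reference, plus an id_str identifier.

```javascript
// Hypothetical: keep only the fields that the sample queries reference.
function toRow(tweet) {
  const entities = tweet.entities || {};
  return {
    id_str: tweet.id_str,
    created_at: tweet.created_at,
    text: tweet.text,
    source: tweet.source,
    user: { screen_name: tweet.user ? tweet.user.screen_name : null },
    entities: {
      hashtags: (entities.hashtags || []).map((h) => ({ text: h.text })),
      urls: (entities.urls || []).map((u) => ({ url: u.url })),
    },
  };
}

// Serializes tweets as newline-delimited JSON: one row per line.
function toNdjson(tweets) {
  return tweets.map((t) => JSON.stringify(toRow(t)) + '\n').join('');
}

// Made-up tweet for illustration.
const sampleTweet = {
  id_str: '1',
  created_at: 'Wed Oct 10 20:19:24 +0000 2018',
  text: 'Listening to #Ringo',
  source: 'web',
  user: { screen_name: 'example_user', name: 'Example' },
  entities: { hashtags: [{ text: 'Ringo', indices: [13, 19] }], urls: [] },
};

process.stdout.write(toNdjson([sampleTweet]));
```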

When developing on top of the Twitter platform, you must abide by the Developer Agreement & Policy.

Most notably, you must respect the section entitled "Maintain the Integrity of Twitter's Products", including removing all relevant Content with regard to unfavorites, deletes and other user actions.
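
As an illustration of handling deletes, the sketch below picks out the delete notices that the v1.1 streaming API sends (messages of the form {"delete": {"status": {...}}}) and builds a parameterized DELETE statement for each. It is an assumption about how you might approach this, not something the sample is known to do: the function names are hypothetical, and the statement assumes the table has an id_str column, which the schema section here does not confirm.

```javascript
// Returns the tweet ID from a streaming "delete" notice, or null if the
// message is something else (for example, an ordinary tweet).
function deletedTweetId(message) {
  const status = message && message.delete && message.delete.status;
  return status && status.id_str ? status.id_str : null;
}

// Hypothetical: builds a standard SQL DELETE with a named parameter.
// Assumes the table stores the tweet ID in a column called id_str.
function buildDeleteStatement(message, table = 'twitter.tweets') {
  const id = deletedTweetId(message);
  if (id === null) {
    return null;
  }
  return {
    query: `DELETE FROM \`${table}\` WHERE id_str = @id`,
    params: { id },
  };
}

// Example delete notice with made-up IDs.
const notice = {
  delete: { status: { id: 1234, id_str: '1234', user_id: 3, user_id_str: '3' } },
};

console.log(buildDeleteStatement(notice));
```

The returned object has the shape that a query call with named parameters expects, so it could be passed to a BigQuery client once credentials are configured.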

The schema

Sample queries

To help you get started, here are some sample queries. They are written in BigQuery legacy SQL.

Text search

Querying for tweets that contain a specific word or phrase.

SELECT text FROM [twitter.tweets] WHERE text CONTAINS ' something ' LIMIT 10

#Hashtag search

Searching for specific hashtags.

SELECT entities.hashtags.text, HOUR(TIMESTAMP(created_at)) AS create_hour, COUNT(*) AS count FROM [twitter.tweets] WHERE LOWER(entities.hashtags.text) IN ('john', 'paul', 'george', 'ringo') GROUP BY create_hour, entities.hashtags.text ORDER BY entities.hashtags.text ASC, create_hour ASC

Tweet source

Listing the most popular Twitter applications.

SELECT source, COUNT(*) AS count FROM [twitter.tweets] GROUP BY source ORDER BY count DESC LIMIT 1000

Media/URLs shared

Finding content shared on Twitter by listing tweets that include a URL.

SELECT text, entities.urls.url FROM [twitter.tweets] WHERE entities.urls.url IS NOT NULL LIMIT 10

User activity

Users that tweet the most.

SELECT user.screen_name, count(*) as count FROM [twitter.tweets] GROUP BY user.screen_name ORDER BY count DESC LIMIT 10
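
The queries above use legacy SQL syntax (square-bracket table names, CONTAINS), so running them from Node.js requires asking for legacy SQL explicitly. Here is a minimal sketch, assuming the @google-cloud/bigquery package is installed and credentials are configured. The helper names legacyQueryOptions and runLegacyQuery are hypothetical and not part of the sample.

```javascript
// Pure helper: builds the options object for a legacy SQL query.
function legacyQueryOptions(query) {
  return { query, useLegacySql: true };
}

// Assumes @google-cloud/bigquery is installed and that credentials are
// available (for example via GOOGLE_APPLICATION_CREDENTIALS).
async function runLegacyQuery(query) {
  const { BigQuery } = require('@google-cloud/bigquery');
  const bigquery = new BigQuery();
  const [rows] = await bigquery.query(legacyQueryOptions(query));
  return rows;
}

// The "User activity" query from above.
const topUsers =
  'SELECT user.screen_name, count(*) as count FROM [twitter.tweets] ' +
  'GROUP BY user.screen_name ORDER BY count DESC LIMIT 10';

// Uncomment to run against your own project:
// runLegacyQuery(topUsers).then((rows) => console.log(rows));
```

The client library is loaded inside runLegacyQuery so that the options helper can be used and tested without the package or network access.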

To learn more about querying, go to https://cloud.google.com/bigquery/query-reference

Going further

Using BigQuery allows you to combine Twitter data with other public sources of information. Here are some ideas to inspire your next project:

  • Perform sentiment analysis on tweet text and store the results to track worldwide sentiment
  • Cross-reference Twitter data with other public datasets

You can also visit http://demo.redash.io/ to perform queries and visualizations against publicly available data sources.

Additional reading

The following documents serve as additional information on streaming data from Twitter and working with BigQuery.

Credits
