blockchain-etl / blockchain-etl-streaming

Licence: MIT license
Streaming Ethereum and Bitcoin blockchain data to Google Pub/Sub or Postgres in Kubernetes

Blockchain ETL Streaming

Streams the following Ethereum entities to Pub/Sub or Postgres using ethereum-etl stream:

  • blocks
  • transactions
  • logs
  • token_transfers
  • traces
  • contracts
  • tokens
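
As a rough illustration of what a streaming pod might run for these entities, the sketch below prints an `ethereumetl stream` command. It assumes the ethereum-etl CLI's `stream` subcommand with its `--provider-uri`, `--entity-types` and `--output` options, and the singular entity names that CLI uses. The project ID, the node URL and the `build_stream_cmd` helper are hypothetical placeholders, and the chart in this repo may pass different arguments.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints (does not run) an ethereumetl stream command.
# PROJECT_ID and PROVIDER_URI are placeholders, not values taken from this repo.
PROJECT_ID="${PROJECT_ID:-my-gcp-project}"
PROVIDER_URI="${PROVIDER_URI:-http://localhost:8545}"

build_stream_cmd() {
  # $1: comma-separated entity types in ethereum-etl's singular form
  local entities="$1"
  printf 'ethereumetl stream --provider-uri %s --entity-types %s --output projects/%s/topics/crypto_ethereum\n' \
    "$PROVIDER_URI" "$entities" "$PROJECT_ID"
}

# All seven entities from the list above, in singular form
build_stream_cmd "block,transaction,log,token_transfer,trace,contract,token"
```

Printing the command rather than running it keeps the sketch safe to try without a node or a GCP project.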

Streams blocks and transactions to Pub/Sub using bitcoin-etl stream. Supported chains:

  • bitcoin
  • bitcoin_cash
  • dogecoin
  • litecoin
  • dash
  • zcash

Deployment Instructions

  1. Create a cluster:
gcloud container clusters create ethereum-etl-streaming \
--zone us-central1-a \
--num-nodes 1 \
--disk-size 10GB \
--machine-type custom-2-4096 \
--network default \
--subnetwork default \
--scopes pubsub,storage-rw,logging-write,monitoring-write,service-management,service-control,trace
  2. Get kubectl credentials:
gcloud container clusters get-credentials ethereum-etl-streaming \
--zone us-central1-a
  3. Create Pub/Sub topics (use create_pubsub_topics_ethereum.sh). Skip this step if you are streaming to Postgres.
  • "crypto_ethereum.blocks"
  • "crypto_ethereum.transactions"
  • "crypto_ethereum.token_transfers"
  • "crypto_ethereum.logs"
  • "crypto_ethereum.traces"
  • "crypto_ethereum.contracts"
  • "crypto_ethereum.tokens"
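
The contents of create_pubsub_topics_ethereum.sh are not shown here, so the following is only a sketch of how the seven topics could be created with `gcloud pubsub topics create`. The `run`, `topic_names` and `create_topics` helpers are hypothetical, and the commands are printed rather than executed unless `DRY_RUN=0` is set.

```shell
#!/usr/bin/env bash
# Sketch only: the real create_pubsub_topics_ethereum.sh may differ.
# By default the gcloud commands are printed; set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Prints one topic name per line, using the crypto_ethereum prefix from the list above.
topic_names() {
  local entity
  for entity in blocks transactions token_transfers logs traces contracts tokens; do
    echo "crypto_ethereum.${entity}"
  done
}

# Creates (or, in dry-run mode, prints the command for) each topic in turn.
create_topics() {
  local topic
  for topic in $(topic_names); do
    run gcloud pubsub topics create "$topic"
  done
}

create_topics
```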
  4. Create a GCS bucket. Upload a text file containing the block number you want to start streaming from to gs://<YOUR_BUCKET_HERE>/ethereum-etl/streaming/last_synced_block.txt.
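
A minimal sketch of the bucket step, assuming `gsutil` is installed and authenticated. `BUCKET` and `START_BLOCK` are placeholders you choose yourself, and `write_start_block` is a hypothetical helper that refuses anything other than a non-negative integer. The `gsutil` commands are printed unless `DRY_RUN=0` is set.

```shell
#!/usr/bin/env bash
# Sketch only. BUCKET and START_BLOCK are placeholders, not values from this repo.
# Commands touching GCS are printed unless DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"
BUCKET="${BUCKET:-<YOUR_BUCKET_HERE>}"
START_BLOCK="${START_BLOCK:-0}"

run() { if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi; }

# Writes the block number to a local file after checking it is a non-negative integer.
write_start_block() {
  local block="$1" file="$2"
  case "$block" in
    ''|*[!0-9]*)
      echo "block number must be a non-negative integer: '$block'" >&2
      return 1
      ;;
  esac
  printf '%s\n' "$block" > "$file"
}

if write_start_block "$START_BLOCK" last_synced_block.txt; then
  run gsutil mb "gs://${BUCKET}"
  run gsutil cp last_synced_block.txt "gs://${BUCKET}/ethereum-etl/streaming/last_synced_block.txt"
fi
```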

  5. Create an "ethereum-etl-app" service account with the following roles:

    • Pub/Sub Editor
    • Storage Object Admin
    • Cloud SQL Client

Download the key. Create a Kubernetes secret:

kubectl create secret generic streaming-app-key --from-file=key.json=$HOME/Downloads/key.json -n eth
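
The service account step can also be scripted. The sketch below assumes the role IDs `roles/pubsub.editor`, `roles/storage.objectAdmin` and `roles/cloudsql.client` for the three roles listed above; `PROJECT_ID` and the `setup_service_account` helper are hypothetical. Because the secret is created with `-n eth`, the `eth` namespace has to exist before the `kubectl create secret` command runs, so the sketch creates it as well. Commands are printed unless `DRY_RUN=0` is set.

```shell
#!/usr/bin/env bash
# Sketch only. PROJECT_ID is a placeholder; commands are printed unless DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"
PROJECT_ID="${PROJECT_ID:-my-gcp-project}"
SA_NAME="ethereum-etl-app"
SA_EMAIL="${SA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com"

run() { if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi; }

setup_service_account() {
  local role
  run gcloud iam service-accounts create "$SA_NAME" --project "$PROJECT_ID"
  # Role IDs assumed for Pub/Sub Editor, Storage Object Admin and Cloud SQL Client
  for role in roles/pubsub.editor roles/storage.objectAdmin roles/cloudsql.client; do
    run gcloud projects add-iam-policy-binding "$PROJECT_ID" \
      --member "serviceAccount:${SA_EMAIL}" --role "$role"
  done
  # Download a key to the path used by the kubectl create secret command
  run gcloud iam service-accounts keys create "$HOME/Downloads/key.json" \
    --iam-account "$SA_EMAIL"
  # The secret lives in the eth namespace, so create that namespace first
  run kubectl create namespace eth
}

setup_service_account
```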
  6. Install helm (https://github.com/helm/helm#install). The commands in this guide use Helm 2 syntax (helm init, Tiller, and the --name flag):
brew install helm
helm init
bash patch-tiller.sh
  7. Copy the example values directory to the values dir and adjust the files, at minimum setting your bucket and project ID.
  8. Install the ETL apps via helm using the chart from this repo and the values adjusted in the previous step, for example:
helm install --name btc --namespace btc charts/blockchain-etl-streaming --values values/bitcoin/bitcoin/values.yaml
helm install --name bch --namespace btc charts/blockchain-etl-streaming --values values/bitcoin/bitcoin_cash/values.yaml
helm install --name dash --namespace btc charts/blockchain-etl-streaming --values values/bitcoin/dash/values.yaml
helm install --name dogecoin --namespace btc charts/blockchain-etl-streaming --values values/bitcoin/dogecoin/values.yaml
helm install --name litecoin --namespace btc charts/blockchain-etl-streaming --values values/bitcoin/litecoin/values.yaml
helm install --name zcash --namespace btc charts/blockchain-etl-streaming --values values/bitcoin/zcash/values.yaml

helm install --name eth-blocks --namespace eth charts/blockchain-etl-streaming \
--values values/ethereum/values.yaml --values values/ethereum/block_data/values.yaml
helm install --name eth-traces --namespace eth charts/blockchain-etl-streaming \
--values values/ethereum/values.yaml --values values/ethereum/trace_data/values.yaml

Ethereum block and trace data streaming are decoupled for higher reliability.
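
The six Bitcoin-family installs above differ only in release name and values directory, so they can be generated in a loop. This is a sketch using the same Helm 2 style flags as the commands above; the `install_bitcoin_family` helper is hypothetical, and commands are printed unless `DRY_RUN=0` is set.

```shell
#!/usr/bin/env bash
# Sketch only: prints Helm 2 style install commands; set DRY_RUN=0 to run them.
DRY_RUN="${DRY_RUN:-1}"

run() { if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi; }

install_bitcoin_family() {
  local pair release chain
  # release:values-directory pairs taken from the install commands above
  for pair in btc:bitcoin bch:bitcoin_cash dash:dash dogecoin:dogecoin litecoin:litecoin zcash:zcash; do
    release="${pair%%:*}"   # text before the colon
    chain="${pair#*:}"      # text after the colon
    run helm install --name "$release" --namespace btc charts/blockchain-etl-streaming \
      --values "values/bitcoin/${chain}/values.yaml"
  done
}

install_bitcoin_family
```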

To stream to Postgres:

helm install --name eth-postgres --namespace eth charts/blockchain-etl-streaming \
--values values/ethereum/values-postgres.yaml

Refer to https://github.com/blockchain-etl/ethereum-etl-postgres for table schema and initial data load.

  9. Use the describe command to troubleshoot, e.g.:
kubectl describe pods -n btc
kubectl describe node [NODE_NAME]
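
Beyond `describe`, a few other standard kubectl commands can help narrow down a failing pod. The sketch below lists pods that are not in the Running phase, shows recent events, and tails a pod's log. The `inspect_namespace` and `tail_pod_logs` helpers are hypothetical, the pod name is supplied by you, and commands are printed unless `DRY_RUN=0` is set.

```shell
#!/usr/bin/env bash
# Sketch only: prints kubectl commands unless DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"

run() { if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi; }

# Lists pods that are not Running in a namespace, then shows events sorted by time.
inspect_namespace() {
  local ns="$1"
  run kubectl get pods -n "$ns" --field-selector='status.phase!=Running'
  run kubectl get events -n "$ns" --sort-by=.lastTimestamp
}

# Shows the last 50 log lines of a pod; the pod name comes from the caller.
tail_pod_logs() {
  local ns="$1" pod="$2"
  run kubectl logs "$pod" -n "$ns" --tail=50
}

inspect_namespace btc
inspect_namespace eth
```

For example, `DRY_RUN=0 tail_pod_logs eth <POD_NAME>` would fetch the log tail for one pod in the eth namespace.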

Refer to blockchain-etl-dataflow for connecting Pub/Sub to BigQuery.
