# GraphSense Transformation Pipeline
The GraphSense Transformation Pipeline reads raw block data, which is ingested into Apache Cassandra by the graphsense-blocksci / graphsense-bitcoin-etl component. The pipeline computes denormalized views using Apache Spark and stores them back in Cassandra. Access to these denormalized views is then provided by the GraphSense REST interface, which is used by the graphsense-dashboard component.

This component is implemented in Scala using Apache Spark.
## Local Development Environment Setup
### Prerequisites
Make sure Java 8 and sbt >= 1.0 are installed:

```
java -version
sbt about
```
Download, install, and run Apache Spark (version 3.2.1) in `$SPARK_HOME`:

```
$SPARK_HOME/sbin/start-master.sh
```
Download, install, and run Apache Cassandra (version >= 3.11) in `$CASSANDRA_HOME`:

```
$CASSANDRA_HOME/bin/cassandra -f
```
### Ingest Raw Block Data
Run the following script to ingest raw block test data:

```
./scripts/ingest_test_data.sh
```
This should create a keyspace `btc_raw` (tables `exchange_rates`, `transaction`, `block`, `block_transactions`). Check as follows:

```
cqlsh localhost
cqlsh> USE btc_raw;
cqlsh:btc_raw> DESCRIBE tables;
```
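To verify that rows were actually ingested, not just that the tables exist, a quick non-interactive check can be run with cqlsh's `-e` option; the keyspace and table names are taken from the ingest step above:

```shell
# Count ingested blocks in the raw keyspace (requires Cassandra running on localhost)
cqlsh localhost -e "SELECT count(*) FROM btc_raw.block;"
# A non-zero count indicates the test data was ingested
```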
### Execute Transformation Locally
Create the target keyspace for transformed data:

```
cqlsh -f scripts/schema_transformed.cql
```
Compile and test the implementation:

```
sbt test
```
Package the transformation pipeline:

```
sbt package
```
Run the transformation pipeline on localhost:

```
./submit.sh
```

macOS only: make sure `gnu-getopt` is installed (`brew install gnu-getopt`).
Check the running job using the local Spark UI at http://localhost:4040/jobs
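Besides the web UI, a running application can also be inspected from the command line through Spark's monitoring REST API, which is served on the same port while the job is running:

```shell
# List applications known to the local Spark UI's REST API
# (only works while a job is running and port 4040 is up)
curl -s http://localhost:4040/api/v1/applications
```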
## Submit on a Standalone Spark Cluster
Use the `submit.sh` script and specify the Spark master node (e.g., `-s spark://SPARK_MASTER_IP:7077`) and other options:

```
./submit.sh -h
Usage: submit.sh [-h] [-m MEMORY_GB] [-c CASSANDRA_HOST] [-s SPARK_MASTER]
                 [--currency CURRENCY]
                 [--raw_keyspace RAW_KEYSPACE]
                 [--tgt_keyspace TGT_KEYSPACE]
                 [--bucket_size BUCKET_SIZE]
                 [--bech32-prefix BECH32_PREFIX]
                 [--checkpoint-dir CHECKPOINT_DIR]
                 [--coinjoin-filtering]
```
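As an illustration, a cluster submission might look like the following; the master IP, memory setting, and keyspace names are placeholder values for your own setup, not defaults of the script:

```shell
# Hypothetical example values; substitute your own cluster settings
./submit.sh \
  -m 16 \
  -c 192.168.1.10 \
  -s spark://192.168.1.10:7077 \
  --currency BTC \
  --raw_keyspace btc_raw \
  --tgt_keyspace btc_transformed
```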