# GraphSense Transformation Pipeline
The GraphSense Transformation Pipeline reads raw block data, which is ingested into Apache Cassandra by the graphsense-blocksci / graphsense-bitcoin-etl component. The pipeline computes denormalized views using Apache Spark and stores them back in Cassandra. Access to these denormalized views is then provided by the GraphSense REST interface, which is used by the graphsense-dashboard component.

This component is implemented in Scala using Apache Spark.
## Local Development Environment Setup
### Prerequisites
Make sure Java 8 and sbt >= 1.0 are installed:

```
java -version
sbt about
```
Download, install, and run Apache Spark (version 3.2.1) in `$SPARK_HOME`:

```
$SPARK_HOME/sbin/start-master.sh
```
Download, install, and run Apache Cassandra (version >= 3.11) in `$CASSANDRA_HOME`:

```
$CASSANDRA_HOME/bin/cassandra -f
```
### Ingest Raw Block Data
Run the following script to ingest raw block test data:

```
./scripts/ingest_test_data.sh
```
This should create a keyspace `btc_raw` (tables `exchange_rates`, `transaction`, `block`, `block_transactions`). Check as follows:

```
cqlsh localhost
cqlsh> USE btc_raw;
cqlsh:btc_raw> DESCRIBE tables;
```
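To verify that rows were actually ingested, not just that the tables exist, a quick non-interactive check can be run with cqlsh's `-e` option; the keyspace and table names are taken from the ingest step above:

```shell
# Count ingested blocks in the raw keyspace (requires Cassandra running on localhost)
cqlsh localhost -e "SELECT count(*) FROM btc_raw.block;"
# A non-zero count indicates the test data was ingested
```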
### Execute Transformation Locally
Create the target keyspace for transformed data:

```
cqlsh -f scripts/schema_transformed.cql
```
Compile and test the implementation:

```
sbt test
```
Package the transformation pipeline:

```
sbt package
```
Run the transformation pipeline on localhost:

```
./submit.sh
```

macOS only: make sure `gnu-getopt` is installed (`brew install gnu-getopt`).
Check the running job using the local Spark UI at http://localhost:4040/jobs
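Besides the web UI, a running application can also be inspected from the command line through Spark's monitoring REST API, which is served on the same port while the job is running:

```shell
# List applications known to the local Spark UI's REST API
# (only works while a job is running and port 4040 is up)
curl -s http://localhost:4040/api/v1/applications
```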
## Submit on a Standalone Spark Cluster
Use the `submit.sh` script and specify the Spark master node (e.g., `-s spark://SPARK_MASTER_IP:7077`) and other options:

```
./submit.sh -h
Usage: submit.sh [-h] [-m MEMORY_GB] [-c CASSANDRA_HOST] [-s SPARK_MASTER]
                 [--currency CURRENCY]
                 [--raw_keyspace RAW_KEYSPACE]
                 [--tgt_keyspace TGT_KEYSPACE]
                 [--bucket_size BUCKET_SIZE]
                 [--bech32-prefix BECH32_PREFIX]
                 [--checkpoint-dir CHECKPOINT_DIR]
                 [--coinjoin-filtering]
```
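As an illustration, a cluster submission might look like the following; the master IP, memory setting, and keyspace names are placeholder values for your own setup, not defaults of the script:

```shell
# Hypothetical example values; substitute your own cluster settings
./submit.sh \
  -m 16 \
  -c 192.168.1.10 \
  -s spark://192.168.1.10:7077 \
  --currency BTC \
  --raw_keyspace btc_raw \
  --tgt_keyspace btc_transformed
```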