graphsense / graphsense-transformation

Licence: MIT license
GraphSense Transformation Pipeline


The GraphSense Transformation Pipeline reads raw block data that the graphsense-blocksci / graphsense-bitcoin-etl component has ingested into Apache Cassandra. It computes de-normalized views using Apache Spark and stores them back in Cassandra.

Access to computed de-normalized views is subsequently provided by the GraphSense REST interface, which is used by the graphsense-dashboard component.

This component is implemented in Scala using Apache Spark.

Local Development Environment Setup

Prerequisites

Make sure Java 8 and sbt >= 1.0 are installed:

java -version
sbt about
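
The version check can also be scripted. The following is a minimal sketch, not part of the repository: the helper java_major_version is a hypothetical name, and the script assumes that `java -version` prints the version as the first quoted string on its version line.

```shell
#!/bin/sh
# Hypothetical helper: extract the major Java version from a version string,
# e.g. "1.8.0_292" -> 8 (legacy "1.x" scheme) or "11.0.2" -> 11.
java_major_version() {
  version="$1"
  case "$version" in
    1.*) version="${version#1.}" ;;  # strip the legacy "1." prefix
  esac
  # Drop everything from the first ".", "_" or "-" onwards.
  echo "${version%%[._-]*}"
}

if command -v java >/dev/null 2>&1; then
  # `java -version` writes to stderr; take the first quoted string.
  raw="$(java -version 2>&1 | grep 'version' | head -n 1 | sed -E 's/.*"([^"]+)".*/\1/')"
  if [ "$(java_major_version "$raw")" = "8" ]; then
    echo "Java 8 detected ($raw)"
  else
    echo "Expected Java 8, found: $raw" >&2
  fi
else
  echo "java not found on PATH" >&2
fi

# Only checks that sbt is present; `sbt about` reports the actual version.
if command -v sbt >/dev/null 2>&1; then
  echo "sbt found on PATH"
else
  echo "sbt not found on PATH" >&2
fi
```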

Download, install, and run Apache Spark (version 3.2.1) in $SPARK_HOME:

$SPARK_HOME/sbin/start-master.sh

Download, install, and run Apache Cassandra (version >= 3.11) in $CASSANDRA_HOME:

$CASSANDRA_HOME/bin/cassandra -f
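
With -f, Cassandra runs in the foreground, so the remaining steps need a second terminal. Cassandra may also need some time after startup before it accepts CQL connections. The following sketch waits for it, assuming cqlsh is on the PATH and Cassandra listens on localhost; the retry helper and the attempt and delay values are illustrative assumptions.

```shell
#!/bin/sh
# Hypothetical helper: run a command up to $1 times, pausing $2 seconds
# between attempts. Returns 0 as soon as the command succeeds, 1 otherwise.
retry() {
  attempts="$1"
  delay="$2"
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    # Sleep only if another attempt follows.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

if command -v cqlsh >/dev/null 2>&1; then
  # Assumed values: 10 attempts, 3 seconds apart.
  if retry 10 3 cqlsh -e 'DESCRIBE KEYSPACES;' localhost; then
    echo "Cassandra is accepting CQL connections"
  else
    echo "Cassandra did not become reachable on localhost" >&2
  fi
else
  echo "cqlsh not found on PATH" >&2
fi
```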

Ingest Raw Block Data

Run the following script to ingest raw block test data:

./scripts/ingest_test_data.sh

This should create the keyspace btc_raw with the tables exchange_rates, transaction, block, and block_transactions. Verify it as follows:

cqlsh localhost
cqlsh> USE btc_raw;
cqlsh:btc_raw> DESCRIBE tables;
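
The same check can be run non-interactively. The sketch below uses the cqlsh options -k (keyspace) and -e (execute a statement) and compares the output against the four table names listed above. The helper missing_tables is a hypothetical name, and the script assumes that DESCRIBE TABLES prints the table names separated by whitespace.

```shell
#!/bin/sh
# Table names taken from the description of the btc_raw keyspace.
EXPECTED_TABLES="exchange_rates transaction block block_transactions"

# Hypothetical helper: print every expected table that does not appear
# as a whole word in the given DESCRIBE TABLES output.
missing_tables() {
  listing="$1"
  for table in $EXPECTED_TABLES; do
    # Put one name per line, then match whole lines only, so that
    # "block" does not match "block_transactions".
    if ! printf '%s\n' "$listing" | tr -s ' \t' '\n' | grep -qx "$table"; then
      echo "$table"
    fi
  done
}

if command -v cqlsh >/dev/null 2>&1; then
  listing="$(cqlsh -k btc_raw -e 'DESCRIBE TABLES;' localhost 2>/dev/null)"
  missing="$(missing_tables "$listing")"
  if [ -z "$missing" ]; then
    echo "All expected tables exist in btc_raw"
  else
    echo "Missing tables in btc_raw:" $missing >&2
  fi
else
  echo "cqlsh not found on PATH" >&2
fi
```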

Execute Transformation Locally

Create the target keyspace for transformed data:

cqlsh -f scripts/schema_transformed.cql

Compile and test the implementation:

sbt test

Package the transformation pipeline:

sbt package

Run the transformation pipeline on localhost:

./submit.sh

macOS only: make sure gnu-getopt is installed (brew install gnu-getopt).

Check the running job using the local Spark UI at http://localhost:4040/jobs
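
The Spark UI on port 4040 is only available while a Spark application is running. Besides the web pages, Spark exposes a monitoring REST API under /api/v1 on the same port. The following sketch queries it with curl; the helper applications_url is a hypothetical name, and the 5-second timeout is an arbitrary choice.

```shell
#!/bin/sh
SPARK_UI="http://localhost:4040"

# Hypothetical helper: build the URL of the applications endpoint of
# Spark's monitoring REST API, tolerating a trailing slash on the base URL.
applications_url() {
  echo "${1%/}/api/v1/applications"
}

if command -v curl >/dev/null 2>&1; then
  # -f: treat HTTP errors as failures; --max-time: do not hang if no job runs.
  if curl -fsS --max-time 5 "$(applications_url "$SPARK_UI")" 2>/dev/null; then
    echo
  else
    echo "Spark UI not reachable at $SPARK_UI (is a job running?)" >&2
  fi
else
  echo "curl not found on PATH" >&2
fi
```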

Submit on a standalone Spark Cluster

Use the submit.sh script and specify the Spark master node (e.g., -s spark://SPARK_MASTER_IP:7077) and other options:

./submit.sh -h
Usage: submit.sh [-h] [-m MEMORY_GB] [-c CASSANDRA_HOST] [-s SPARK_MASTER]
                 [--currency CURRENCY]
                 [--raw_keyspace RAW_KEYSPACE]
                 [--tgt_keyspace TGT_KEYSPACE]
                 [--bucket_size BUCKET_SIZE]
                 [--bech32-prefix BECH32_PREFIX]
                 [--checkpoint-dir CHECKPOINT_DIR]
                 [--coinjoin-filtering]
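
As an illustration of how these options combine, the sketch below assembles a cluster submission and prints it without running it. Only the flag names come from the usage text above. All values are assumptions: SPARK_MASTER_IP is the placeholder from the text, btc_transformed is an assumed target keyspace name, and submit_command is a hypothetical helper.

```shell
#!/bin/sh
# Illustrative values only; adjust them to your cluster.
SPARK_MASTER="spark://SPARK_MASTER_IP:7077"  # placeholder, replace the IP
CASSANDRA_HOST="localhost"                   # assumed Cassandra host
MEMORY_GB="4"                                # assumed memory setting
CURRENCY="btc"                               # assumed currency code
RAW_KEYSPACE="btc_raw"                       # keyspace created by the ingest step
TGT_KEYSPACE="btc_transformed"               # assumed target keyspace name

# Hypothetical helper: print the submit.sh command line (dry run).
submit_command() {
  echo ./submit.sh \
    -m "$MEMORY_GB" \
    -c "$CASSANDRA_HOST" \
    -s "$SPARK_MASTER" \
    --currency "$CURRENCY" \
    --raw_keyspace "$RAW_KEYSPACE" \
    --tgt_keyspace "$TGT_KEYSPACE"
}

submit_command
```

Printing the command first makes it easy to review the options before copying the line into a terminal in the repository root.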