kasna-cloud / dataflow-fsi-example

Licence: MIT License

Dataflow Financial Services Time-Series Example

This project is an example of how to detect anomalies in financial technical indicators by modeling their expected distribution, and thus to flag when the Relative Strength Index (RSI) is unreliable. RSI is a popular indicator among traders of financial assets, so it is helpful to know when it can be trusted. This example shows how to implement an RSI model using realistic foreign exchange market data, Google Cloud Platform and the Dataflow time-series sample library.

Dashboards

The Dataflow samples library is a fast, flexible library for processing time-series data, and it is particularly suited to financial market data because of that data's large volume. Its ability to generate useful metrics in real time significantly reduces the time and effort needed to build machine learning models and solve problems in the finance domain. This library is used in the metrics generator component of this example, and detailed information on its usage can be found in the docs.

The GCP infrastructure used in this example includes Dataflow, Pub/Sub, BigQuery, Kubernetes Engine, and AI Platform. Further information on components, flows and diagrams can be found in the docs directory.

A great place to start is to run this example in GCP and view the excellent blog for a detailed walk-through of the solution.

Quickstart

Run from laptop

To install:

  1. Create a new project in GCP
  2. Install gcloud and set PROJECT_ID
  3. Execute this script to create the base infrastructure. This will take about 5-10 minutes.
    ./deploy-infra.sh
  4. After this has completed, deploy the pipelines and model by executing the run-app script. This will take about 5 minutes.
    ./run-app.sh
  5. View the Grafana dashboard. The username and password are both your PROJECT_ID, and the dashboard location is shown in the Cloud Console and in the build log output.

Run on Cloud Shell

You can also run this example using Cloud Shell. To begin, log in to the GCP console and select the “Activate Cloud Shell” icon in the top right of your project dashboard. Then run the following:

  1. Clone the repo:
    git clone https://github.com/kasna-cloud/dataflow-fsi-example.git && cd dataflow-fsi-example
  2. Execute this script to create the base infrastructure. This will take about 5-10 minutes.
    ./deploy-infra.sh
  3. After this has completed, deploy the pipelines and model by executing the run-app script. This will take about 5 minutes.
    ./run-app.sh
  4. View the Grafana dashboard. The username and password are both your PROJECT_ID, and the dashboard location is shown in the Cloud Console and in the build log output.

Problem Domain

The Relative Strength Index, or RSI, is a popular financial technical indicator that measures the magnitude of recent price changes to evaluate whether an asset is currently overbought or oversold.
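As a rough illustration of the indicator itself, the sketch below computes RSI with the commonly used Wilder smoothing and a 14-period default. It is not taken from this repo, where metrics are generated by the Java Dataflow pipeline; the function name `rsi` and the handling of a window with no losses are assumptions made for this example.

```python
from typing import List, Optional


def rsi(prices: List[float], period: int = 14) -> Optional[float]:
    """Return the RSI of the latest price, or None if there are too few prices."""
    if len(prices) < period + 1:
        return None
    # Price change between each pair of consecutive observations.
    changes = [b - a for a, b in zip(prices, prices[1:])]
    # Seed the averages with a simple mean over the first `period` changes.
    avg_gain = sum(max(c, 0.0) for c in changes[:period]) / period
    avg_loss = sum(max(-c, 0.0) for c in changes[:period]) / period
    # Apply Wilder's smoothing to any remaining changes.
    for c in changes[period:]:
        avg_gain = (avg_gain * (period - 1) + max(c, 0.0)) / period
        avg_loss = (avg_loss * (period - 1) + max(-c, 0.0)) / period
    if avg_loss == 0.0:
        # Assumption for this sketch: a window with no losses maps to 100.
        return 100.0
    relative_strength = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + relative_strength)
```

With this definition, a steadily rising series gives a value of 100 and a steadily falling series gives 0, which matches the reading of high values as overbought and low values as oversold.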

To detect whether RSI is reliable for a given asset, the modelling approach is as follows. We train an anomaly detection model to learn the expected behaviour of metrics describing the asset when RSI is greater than 70 or less than 30. When an anomaly is detected, the model is indicating that these input metrics are behaving differently from how they usually behave when RSI is greater than 70 or less than 30. In these instances, RSI is not reliable and a trade is not advised. If no anomaly is detected, the metrics are behaving as expected, so you can trust RSI and make a trade.

NOTE: This blog contains general advice only. It was prepared without taking into account your objectives, financial situation, or needs. You should speak to a financial planner before making a financial decision, and you should speak to a licensed ML practitioner before making an ML decision.
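The decision rule in the modelling approach above can be sketched as follows. This is a minimal illustration, assuming the anomaly score is a reconstruction error (mean squared error between the input metrics and a model's reconstruction of them) compared against a threshold. The names `reconstruction_error`, `rsi_signal` and `error_threshold` are hypothetical, and the repo's actual model and scoring may differ.

```python
from typing import Sequence

OVERBOUGHT = 70.0  # RSI thresholds as stated in the text
OVERSOLD = 30.0


def reconstruction_error(actual: Sequence[float], reconstructed: Sequence[float]) -> float:
    """Mean squared error between the input metrics and their reconstruction."""
    if len(actual) != len(reconstructed) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - r) ** 2 for a, r in zip(actual, reconstructed)) / len(actual)


def rsi_signal(rsi_value: float, error: float, error_threshold: float) -> str:
    """Classify an observation using RSI and a hypothetical anomaly threshold."""
    if OVERSOLD <= rsi_value <= OVERBOUGHT:
        return "no_signal"  # RSI is neither above 70 nor below 30
    if error > error_threshold:
        return "rsi_unreliable"  # metrics look anomalous, so do not trust RSI
    return "overbought" if rsi_value > OVERBOUGHT else "oversold"
```

For example, `rsi_signal(80.0, 0.5, 0.1)` is classed as `"rsi_unreliable"`, while the same RSI value with an error of `0.05` is classed as `"overbought"`.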

A deep-dive on the problem domain, data science and model creation is in the Jupyter notebooks in the notebooks folder, which you can run yourself or view on GitHub.

Be sure to view the blog for a detailed walk-through of the solution.

Repo Layout

This repo is organised into folders containing logical functions of the example. A brief description of each is below:

  • app
    • app/bootstrap_models: This is the LSTM TFX model pre-populated with the RSI example so that dashboards can immediately render RSI values. During the run-app.sh deployment of components, this model is uploaded into GCS and a new Cloud Machine Learning model version is created for the inference pipeline to use. This model is then updated by the re-training data pipeline.
    • app/grafana: Contains the visualization configuration used in the Grafana dashboards.
    • app/java: Holds the Dataflow pipeline code that uses the Dataflow samples library. The pipeline creates metrics from the prices stream.
    • app/kubernetes: Directory of deployment manifests for starting the Dataflow pipelines, prices generator and retraining job.
    • app/python: Contains a containerized Python program for:
      • inference and retraining pipelines
      • Pub/Sub to BigQuery pipeline
      • forex generator to create realistic prices
  • docs: Contains further example information and diagrams.
  • infra: Contains the Cloud Build and Terraform code to deploy the GCP infrastructure for this example.
  • notebooks: Contains detailed AI Notebooks that step through the RSI use case from a data science perspective.
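To give a feel for what a synthetic price generator such as the forex generator listed above might do, here is a minimal sketch of a geometric random walk. It is an illustration only: the function `random_walk_prices` and its parameters are hypothetical, and the generator in app/python may use a different model entirely.

```python
import math
import random
from itertools import islice
from typing import Iterator


def random_walk_prices(start: float, volatility: float, seed: int = 0) -> Iterator[float]:
    """Yield an endless stream of prices following a geometric random walk."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    price = start
    while True:
        # Multiplicative steps keep the price strictly positive.
        price *= math.exp(rng.gauss(0.0, volatility))
        yield price


# Take the first five ticks of a hypothetical currency pair starting near 1.10.
ticks = list(islice(random_walk_prices(1.10, 0.0005), 5))
```

Each tick differs from the previous one by a small random percentage, which is a simple way to produce a price series that never goes negative.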

Further information is available in the directory READMEs and the docs directory.

Components

This example can be thought of as two distinct logical functions: one for real-time ingestion of prices and determination of RSI presence, and another for re-training the model to improve prediction.

The logical diagram of the real-time and training components in GCP is here:

Logical diagram

A detailed list of the components and data flows can be found in the FLOWS doc.

Storage Components

  • Three Pub/Sub topics:
    • prices
    • metrics
    • reconerr
  • One BigQuery dataset with three tables, schema defined in table_schemas:
    • prices
    • metrics
    • reconerr
  • One AI Platform Model
  • One Cloud SQL Database for ML Metadata

Compute Components

Deployment

This repo uses Java, Python, Cloud Build, Terraform and other technologies that require configuration. For this example we have chosen to store all configuration values in the config.sh file. You can change any value in this file to modify the behaviour or deployment of the example.

This example is designed to be run in a fresh GCP project and requires at least Owner privileges to the project. All further IAM permissions are set by Cloud Build or Terraform.

Deployment of this example is done in two steps:

  1. Infrastructure deployment into GCP using Cloud Build and Terraform
  2. Application and pipeline deployment using Cloud Build

Both of these Cloud Build steps can be triggered using the deploy-infra.sh and run-app.sh scripts, and require only the gcloud CLI from the Google Cloud SDK to be installed locally.

To install this example repo into your Google Cloud project, follow the instructions in the Quickstart section. If needed, this example can be run using GCP Cloud Shell.

Further information is available in the app and infra directories.

License

This code is licensed under the terms of the MIT license and is available for free.

Links

This repo has been built with the support of Google, Kasna and Eliiza. Links to the relevant documentation, libraries and resources are below:

Contributing

The excellent contributors to this repo are listed in the AUTHORS file and in the git history. If you would like to contribute please see the CODE-OF-CONDUCT and CONTRIBUTING info.
