znly / bazel-cache

License: Apache-2.0
Minimal cloud-oriented Bazel gRPC cache

Programming Languages: Go, Python

Projects that are alternatives of or similar to bazel-cache

Buildbuddy
BuildBuddy is an open source Bazel build event viewer, result store, and remote cache.
Stars: ✭ 182 (+451.52%)
Mutual labels:  cache, grpc, bazel
Kubernetes Nexus
Run Sonatype Nexus Repository Manager OSS on top of Kubernetes (GKE). Includes instructions for automated backups (GCS) and day-to-day usage.
Stars: ✭ 122 (+269.7%)
Mutual labels:  google-cloud-storage, google-cloud
Berglas
A tool for managing secrets on Google Cloud
Stars: ✭ 959 (+2806.06%)
Mutual labels:  google-cloud-storage, google-cloud
Php Ffmpeg Video Streaming
📼 Package media content for online streaming(DASH and HLS) using FFmpeg
Stars: ✭ 246 (+645.45%)
Mutual labels:  google-cloud-storage, google-cloud
server
The ViUR application development framework - legacy version 2.x for Python 2.7
Stars: ✭ 12 (-63.64%)
Mutual labels:  google-cloud-storage, google-cloud
arc gcs
Provides an Arc backend for Google Cloud Storage
Stars: ✭ 48 (+45.45%)
Mutual labels:  google-cloud-storage, google-cloud
Flysystem Google Cloud Storage
Flysystem Adapter for Google Cloud Storage
Stars: ✭ 237 (+618.18%)
Mutual labels:  google-cloud-storage, google-cloud
Rules protobuf
Bazel rules for building protocol buffers and gRPC services (java, c++, go, ...)
Stars: ✭ 206 (+524.24%)
Mutual labels:  grpc, bazel
Drone Cache
A Drone plugin for caching current workspace files between builds to reduce your build times
Stars: ✭ 194 (+487.88%)
Mutual labels:  cache, google-cloud-storage
clj-gcloud-storage
Clojure wrapper for google-cloud-storage Java client.
Stars: ✭ 20 (-39.39%)
Mutual labels:  google-cloud-storage, google-cloud
Drachtio Freeswitch Modules
A collection of open-sourced freeswitch modules that I use in various drachtio applications
Stars: ✭ 73 (+121.21%)
Mutual labels:  google-cloud, grpc
go-bqloader
bqloader is a simple ETL framework to load data from Cloud Storage into BigQuery.
Stars: ✭ 16 (-51.52%)
Mutual labels:  google-cloud-storage, google-cloud
Grpc Gke Nlb Tutorial
gRPC load-balancing on GKE using Envoy
Stars: ✭ 42 (+27.27%)
Mutual labels:  google-cloud, grpc
Laravel Google Cloud Storage
A Google Cloud Storage filesystem for Laravel
Stars: ✭ 415 (+1157.58%)
Mutual labels:  google-cloud-storage, google-cloud
Microservices Demo
Sample cloud-native application with 10 microservices showcasing Kubernetes, Istio, gRPC and OpenCensus.
Stars: ✭ 11,369 (+34351.52%)
Mutual labels:  google-cloud, grpc
Google Cloud Cpp
C++ Client Libraries for Google Cloud Services
Stars: ✭ 233 (+606.06%)
Mutual labels:  google-cloud-storage, google-cloud
Colossus
Colossus — An example microservice architecture for Kubernetes using Bazel, Go, Java, Docker, Kubernetes, Minikube, Gazelle, gRPC, Prometheus, Grafana, and more
Stars: ✭ 917 (+2678.79%)
Mutual labels:  grpc, bazel
Trunk
Make bazel an out of box solution for C++/Java developers
Stars: ✭ 203 (+515.15%)
Mutual labels:  grpc, bazel
storage
Go package for abstracting local, in-memory, and remote (Google Cloud Storage/S3) filesystems
Stars: ✭ 49 (+48.48%)
Mutual labels:  cache, google-cloud-storage
ob bulkstash
Bulk Stash is a docker rclone service to sync, or copy, files between different storage services. For example, you can copy files to or from a remote storage service like Amazon S3 or Google Cloud Storage, or from your laptop to remote storage.
Stars: ✭ 113 (+242.42%)
Mutual labels:  google-cloud-storage, google-cloud

bazel-cache

znly/bazel-cache is a minimal, cloud-oriented Bazel remote cache.

It only supports Bazel's v2 remote execution protocol over gRPC.

It is meant to be deployed serverless-ly (currently on Cloud Run) and backed by object storage (currently Google Cloud Storage). PRs for other platforms are of course welcome. Thanks to Google Cloud Storage Object Lifecycle Management, it features automatic TTL-based garbage collection.

It was inspired by buchgr/bazel-remote's simplicity and accessibility.

Usage

To start the server, simply use the serve command:

$ bazel-cache serve --help
Starts the Bazel cache gRPC server

Usage:
  bazel-cache serve [flags]

Flags:
  -c, --cache string           cache uri
  -h, --help                   help for serve
  -p, --port string            listen address (default ":9092")
  -e, --port_from_env string   get listen port from an environment variable

Global Flags:
  -l, --loglevel zapcore.Level   Log Level (default info)

$ bazel-cache serve -c gcs://MYBUCKET?ttl_days=14
2021-02-28T21:33:41.932+0100	INFO	server/server.go:59	Listening	{"addr": "[::]:9092", "cache": "gcs://MYBUCKET?ttl_days=14"}

Currently only gcs:// and file:// URIs are supported.
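
For a quick local smoke test, the file:// scheme can point the cache at a local directory (a sketch; the exact file:///path form and the directory used here are assumptions):

$ bazel-cache serve -c file:///tmp/bazel-cache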

There are a few options behind --help.

Bazel configuration

Simply add the following to your .bazelrc:

build:cache             --remote_download_minimal
build:cache             --remote_cache=grpcs://MY-CLOUD-RUN-SERVICE.a.run.app

Copy the supplied tools/bazel to tools/bazel in your workspace, then modify it with the correct Cloud Run URL:

CACHE_URL = "https://MY-CLOUD-RUN-SERVICE.a.run.app"

Now, simply add --config=cache to any bazel command.
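
For example, with a hypothetical //my:target:

$ bazel build //my:target --config=cache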

Bazel gRPC remote caching protocol

Bazel implements remote caching over several protocols: HTTP/WebDAV, gRPC, and Google Cloud Storage. We went with the v2 remote execution protocol over gRPC. It is a lot faster than the legacy ones, especially when combined with --remote_download_toplevel, a.k.a. Remote Builds without the Bytes. Additionally, because gRPC is really HTTP/2, throughput-heavy workloads are far less constrained by geographic distance.

As an example, here is a fully cached build of @com_github_apple_swift_protobuf//:SwiftProtobuf with gRPC and HTTPS backed by the same GCS bucket on Bazel 4.0.0:

gRPC

$ bazel build @com_github_apple_swift_protobuf//:SwiftProtobuf --remote_cache=grpcs://mybazelcache-xxxx.xxxx.run.app --remote_download_minimal --remote_max_connections=30 --remote_timeout=30
Target @com_github_apple_swift_protobuf//:SwiftProtobuf up-to-date:
  bazel-bin/external/com_github_apple_swift_protobuf/SwiftProtobuf-Swift.h
  bazel-bin/external/com_github_apple_swift_protobuf/SwiftProtobuf.swiftdoc
  bazel-bin/external/com_github_apple_swift_protobuf/SwiftProtobuf.swiftmodule
  bazel-bin/external/com_github_apple_swift_protobuf/libSwiftProtobuf.a
INFO: Elapsed time: 4.471s, Critical Path: 1.40s
INFO: 136 processes: 122 remote cache hit, 14 internal.
INFO: Build completed successfully, 136 total actions

HTTPS

$ bazel build @com_github_apple_swift_protobuf//:SwiftProtobuf --remote_cache=https://storage.googleapis.com/MYBUCKET --remote_download_minimal --remote_max_connections=30 --remote_timeout=30
Target @com_github_apple_swift_protobuf//:SwiftProtobuf up-to-date:
  bazel-bin/external/com_github_apple_swift_protobuf/SwiftProtobuf-Swift.h
  bazel-bin/external/com_github_apple_swift_protobuf/SwiftProtobuf.swiftdoc
  bazel-bin/external/com_github_apple_swift_protobuf/SwiftProtobuf.swiftmodule
  bazel-bin/external/com_github_apple_swift_protobuf/libSwiftProtobuf.a
INFO: Elapsed time: 10.455s, Critical Path: 3.09s
INFO: 136 processes: 122 remote cache hit, 14 internal.
INFO: Build completed successfully, 136 total actions

The speedup is mostly due to how the gRPC remote protocol works compared to HTTP, and to znly/bazel-cache spawning multiple calls to GCS in parallel on a per-file basis. Also, Cloud Run is much closer to GCS than our CI or developer machines are.

Automatic Garbage Collection

The gcs:// URI can take a ttl_days parameter to enable automatic garbage collection. It is implemented via Google Cloud Storage Object Lifecycle Management by setting a DaysSinceCustomTime condition with a Delete action on the bucket when the server starts.

When a hash is looked up by Bazel, znly/bazel-cache determines whether the object exists by updating its CustomTime. This means that testing whether an object exists (and fetching its metadata, such as its size) and bumping its CustomTime happen in a single pass. Because the CustomTime is bumped on every lookup, objects that haven't been looked up in a while won't get their CustomTime updated and will eventually be deleted by GCS, effectively providing garbage collection for free. This is controlled by the ttl_days URL parameter.
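
The server sets this lifecycle policy up itself at startup; for illustration, the equivalent rule applied by hand with gsutil would look roughly like this (a sketch, assuming ttl_days=14):

$ cat > lifecycle.json <<'EOF'
{
  "rule": [
    {
      "action": {"type": "Delete"},
      "condition": {"daysSinceCustomTime": 14}
    }
  ]
}
EOF
$ gsutil lifecycle set lifecycle.json gs://MYBUCKET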

Deploying on Google Cloud Run

When deployed on Google Cloud Run, TLS, load balancing, scaling (down to zero), and authentication are all handled for you. This is standard Cloud Run really, but here goes:

Creating the GCS Bucket

First, create the GCS bucket in the same region you plan to deploy the container to (for optimised latency). We found the STANDARD class in a single region to be the best performing (YMMV):

# STANDARD is the fastest storage class; -b on enables bucket-wide ACLs
gsutil mb gs://MYBUCKET \
    -p MYPROJECT \
    -c STANDARD \
    -l REGION \
    -b on

Creating a dedicated Service Account

Create a dedicated service account and grant it the roles it needs to read from, write to, and administer the bucket:

gcloud iam service-accounts create bazel-cache --project MYPROJECT

ROLES=(run.invoker storage.objectCreator storage.objectViewer)
for role in "${ROLES[@]}"; do
    gcloud projects add-iam-policy-binding MYPROJECT \
        --member=serviceAccount:bazel-cache@MYPROJECT.iam.gserviceaccount.com \
        --role=roles/${role}
done

Pushing the Image

Pull the znly/bazel-cache image and push it to your project's gcr.io registry:

docker pull znly/bazel-cache:0.0.3
docker tag znly/bazel-cache:0.0.3 gcr.io/MYPROJECT/bazel-cache:0.0.3
docker push gcr.io/MYPROJECT/bazel-cache:0.0.3

Deploying the service

Once everything is done, deploy the bazel-cache service with the proper service account:

gcloud run deploy bazel-cache \
    --service-account=bazel-cache@MYPROJECT.iam.gserviceaccount.com \
    --project=MYPROJECT \
    --region=REGION \
    --platform=managed \
    --port=9092 \
    --cpu=4 \
    --memory=8Gi \
    '--args=serve,--loglevel,INFO,--port,:9092,--cache,gcs://MYBUCKET?ttl_days=30' \
    --image=gcr.io/MYPROJECT/bazel-cache:0.0.3 \
    --concurrency=80

The service should appear in your Cloud Run console. It's done! bazel-cache is running in Cloud Run.
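
The service URL to plug into .bazelrc and tools/bazel can then be retrieved with:

$ gcloud run services describe bazel-cache \
    --project=MYPROJECT \
    --region=REGION \
    --platform=managed \
    --format='value(status.url)'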

Authentication

This one is tricky at the moment. Cloud Run has built-in authentication, which is perfect for this use case.

However, while Bazel supports Google Cloud's Access Tokens via the --google_credentials and --google_default_credentials flags, Cloud Run only supports Identity Tokens as of this writing. Bazel could support Identity Tokens natively, but not until googleapis/google-auth-library-java#469 and bazelbuild/bazel#12135 are fixed.

Thankfully, Bazel's builtin wrapper support makes it possible to invoke gcloud auth print-identity-token and pass the token via the --remote_header flag: when invoked, Bazel looks for an executable at tools/bazel in the workspace and, if it exists, invokes it with a $BAZEL_REAL environment variable pointing to the actual Bazel binary.

You'll need to edit the supplied tools/bazel wrapper (written in Python) with your custom Cloud Run URL. The script looks for a --config=cache flag and invokes gcloud as needed. This is a tiny bit hackish, but it works well enough for now, until native support lands in Bazel or Cloud Run.
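
For reference, a one-off invocation can pass the token by hand, without the wrapper (a sketch; //my:target is a placeholder and gcloud must already be authenticated):

$ bazel build //my:target --config=cache \
    --remote_header="Authorization=Bearer $(gcloud auth print-identity-token)"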
