
allegro / Akubra

Licence: other
A simple solution to keep independent S3 storages in sync

Programming Languages

go
31211 projects - #10 most used programming language

Projects that are alternatives of or similar to Akubra

benji
📁 This library is a Scala reactive DSL for object storage (e.g. S3/Amazon, S3/CEPH, Google Cloud Storage).
Stars: ✭ 18 (-77.22%)
Mutual labels:  storage, s3, ceph
Cortx
CORTX Community Object Storage is 100% open source object storage uniquely optimized for mass capacity storage devices.
Stars: ✭ 426 (+439.24%)
Mutual labels:  s3, storage, object-storage
Infinit
The Infinit policy-based software-defined storage platform.
Stars: ✭ 363 (+359.49%)
Mutual labels:  s3, storage, object-storage
awesome-storage
A curated list of storage open source tools. Backups, redundancy, sharing, distribution, encryption, etc.
Stars: ✭ 324 (+310.13%)
Mutual labels:  storage, s3, ceph
Radosgw Admin4j
A Ceph Object Storage Admin SDK / Client Library for Java ✨🍰✨
Stars: ✭ 50 (-36.71%)
Mutual labels:  s3, storage, ceph
esop
Cloud-enabled backup and restore tool for Apache Cassandra
Stars: ✭ 40 (-49.37%)
Mutual labels:  storage, s3, ceph
Cloudexplorer
Cloud Explorer
Stars: ✭ 170 (+115.19%)
Mutual labels:  s3, sync, storage
S4
S4 is 100% S3 compatible storage, accessed through Tor and distributed using IPFS.
Stars: ✭ 67 (-15.19%)
Mutual labels:  storage, s3, object-storage
Juicefs
JuiceFS is a distributed POSIX file system built on top of Redis and S3.
Stars: ✭ 4,262 (+5294.94%)
Mutual labels:  s3, storage, object-storage
Cloudserver
Zenko CloudServer, an open-source Node.js implementation of the Amazon S3 protocol on the front-end, with backend storage capabilities to multiple clouds, including Azure and Google.
Stars: ✭ 1,167 (+1377.22%)
Mutual labels:  storage, object-storage
Mort
Storage and image processing server written in Go
Stars: ✭ 420 (+431.65%)
Mutual labels:  s3, storage
Oio Sds
High Performance Software-Defined Object Storage for Big Data and AI, that supports Amazon S3 and Openstack Swift
Stars: ✭ 465 (+488.61%)
Mutual labels:  storage, object-storage
Rook
Storage Orchestration for Kubernetes
Stars: ✭ 9,369 (+11759.49%)
Mutual labels:  storage, ceph
Rclone
"rsync for cloud storage" - Google Drive, S3, Dropbox, Backblaze B2, One Drive, Swift, Hubic, Wasabi, Google Cloud Storage, Yandex Files
Stars: ✭ 30,541 (+38559.49%)
Mutual labels:  s3, sync
Edgefs
EdgeFS - decentralized, scalable data fabric platform for Edge/IoT Computing and Kubernetes apps
Stars: ✭ 358 (+353.16%)
Mutual labels:  s3, storage
Weibo Picture Store
🖼 Sina Weibo image hosting (图床) Chrome/Firefox extension, with support for syncing to Weibo Albums (微相册)
Stars: ✭ 624 (+689.87%)
Mutual labels:  sync, storage
Ozone
Scalable, redundant, and distributed object store for Apache Hadoop
Stars: ✭ 330 (+317.72%)
Mutual labels:  s3, storage
S5cmd
Parallel S3 and local filesystem execution tool.
Stars: ✭ 565 (+615.19%)
Mutual labels:  s3, storage
Minio
High Performance, Kubernetes Native Object Storage
Stars: ✭ 30,698 (+38758.23%)
Mutual labels:  s3, storage
Api
SODA API is an open source implementation of SODA API Standards for Data and Storage Management.
Stars: ✭ 795 (+906.33%)
Mutual labels:  storage, ceph

Akubra


Goals

Redundancy

Akubra is a simple solution to keep independent S3 storages in sync - almost real-time, eventually consistent.

The most efficient way to keep synchronized storage clusters that handle a great volume of new objects is to feed them all incoming data at once. That's what Akubra does, with a minimal memory and CPU footprint.

Synchronizing S3 storages offline is almost impossible with a high volume of traffic. It would require keeping track of new objects (or periodic bucket listing), then downloading them and uploading them to the other storage. That is slow, expensive and hard to implement robustly.

The Akubra way is to put files in all storages at once by copying requests to multiple backends. In case one of the clusters rejects a request, Akubra logs that event and synchronizes the troublesome object with an independent process.
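
The core idea can be sketched in a few lines of Go. This is a minimal illustration, not Akubra's actual code: one incoming body is streamed to every backend at once through io.Pipe, so nothing has to be buffered in memory. The function and variable names are hypothetical.

package main

import (
	"io"
	"log"
	"net/http"
)

// replicate streams a single incoming body to every backend at once.
// Error handling is simplified: a backend that fails mid-stream aborts
// the copy here, while the real proxy isolates failures per backend and
// logs them so an independent process can synchronize the object later.
func replicate(req *http.Request, backends []string) {
	writers := make([]io.Writer, len(backends))
	for i, backend := range backends {
		pr, pw := io.Pipe()
		writers[i] = pw
		defer pw.Close() // signals EOF to the backend requests when copying is done
		go func(url string, body io.Reader) {
			out, err := http.NewRequest(req.Method, url+req.URL.Path, body)
			if err != nil {
				log.Printf("building request for %s failed: %v", url, err)
				return
			}
			out.Header = req.Header
			if _, err := http.DefaultClient.Do(out); err != nil {
				log.Printf("backend %s rejected the request: %v", url, err)
			}
		}(backend, pr)
	}
	// One read of the client body feeds all backends the same byte stream.
	io.Copy(io.MultiWriter(writers...), req.Body)
}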

Seamless storage space extension with new storage clusters

Akubra has sharding capabilities. You can easily configure new backends with weights and append them to a region's cluster pool.

Based on the cluster weights, Akubra splits all operations between the clusters in a pool. It also falls back to an older cluster when a requested object does not exist on the target cluster. Such events are logged, so it's possible to rebalance clusters in the background.
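
A rough sketch of the weighted split, in Go. This is illustrative only; the real shard routing is more involved, but the principle is that hashing the object key makes the same key always resolve to the same cluster:

package main

import "hash/crc32"

// Cluster is a simplified stand-in for a configured cluster entry.
type Cluster struct {
	Name   string
	Weight int
}

// pickCluster maps a key onto the pool proportionally to cluster weights
// (assumes the weights sum to more than zero). A cluster with Weight 0,
// e.g. a full older cluster, receives no new writes but can still serve
// reads through the fallback described above.
func pickCluster(key string, pool []Cluster) Cluster {
	total := 0
	for _, c := range pool {
		total += c.Weight
	}
	slot := int(crc32.ChecksumIEEE([]byte(key)) % uint32(total))
	for _, c := range pool {
		slot -= c.Weight
		if slot < 0 {
			return c
		}
	}
	return pool[len(pool)-1]
}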

Multi cloud cost optimization

While every object has to be stored in each storage within a shard, not all storages have to be read from. With load balancing and storage prioritization, Akubra will pick the cheapest one.
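
For the read path, the selection can be pictured as simply as the sketch below; the Cost field is hypothetical and stands in for whatever pricing or latency signal the balancer uses:

// Storage is an illustrative read replica with an associated cost.
type Storage struct {
	Endpoint string
	Cost     float64
}

// cheapestReplica picks the lowest-cost replica; a real balancer would
// also skip unhealthy or overloaded backends.
func cheapestReplica(replicas []Storage) Storage {
	best := replicas[0]
	for _, s := range replicas[1:] {
		if s.Cost < best.Cost {
			best = s
		}
	}
	return best
}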

Build

Prerequisites

You need a Go compiler, version >= 1.8.

Build

In the main directory of this repository run:

make build

Test

make test

Usage of Akubra:

usage: akubra [<flags>]

Flags:
      --help       Show context-sensitive help (also try --help-long and --help-man).
  -c, --conf=CONF  Configuration file e.g.: "conf/dev.yaml"

Example:

akubra -c devel.yaml

How does it work?

Once a request comes to our proxy, we copy all its headers and create pipes for body streaming to each endpoint. If any endpoint returns a positive response, it's immediately returned to the client. If all endpoints return an error, then the first response is passed to the client.
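
In Go terms, the response selection works roughly like the sketch below (a condensed illustration with hypothetical names; body streaming happens through the pipes mentioned above):

package main

import "net/http"

// firstSuccess collects responses as they arrive and returns the first
// positive one; if every endpoint errors out, the first response wins.
// A real proxy would also drain and close the remaining response bodies.
func firstSuccess(requests []*http.Request) *http.Response {
	responses := make(chan *http.Response, len(requests))
	for _, r := range requests {
		go func(r *http.Request) {
			resp, err := http.DefaultClient.Do(r)
			if err != nil {
				responses <- nil
				return
			}
			responses <- resp
		}(r)
	}
	var first *http.Response
	for range requests {
		resp := <-responses
		if resp != nil && resp.StatusCode < 400 {
			return resp // a positive response is returned immediately
		}
		if first == nil {
			first = resp // remember the first (error) response
		}
	}
	return first
}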

If some nodes respond incorrectly, we log which cluster has a problem, whether it happened while storing or reading, and where the erroneous file may be found. In that case we also return a positive response, as stated above.

We also handle the slow endpoint scenario. If there are more connections than the safe limit defined in the configuration, the backend holding most of them is taken out of the pool and an error is logged.
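
The slow-endpoint guard can be pictured as a per-backend connection counter. This is a simplified variant, not the actual implementation: here each backend has its own limit, while Akubra evicts the busiest backend once the global limit is exceeded.

import (
	"log"
	"sync"
)

// guard tracks open connections per backend and refuses new ones once
// the configured safe limit is crossed, mimicking pool eviction.
type guard struct {
	mu    sync.Mutex
	open  map[string]int
	limit int
}

func (g *guard) acquire(backend string) bool {
	g.mu.Lock()
	defer g.mu.Unlock()
	if g.open[backend] >= g.limit {
		log.Printf("backend %s exceeded %d connections, taking it out of the pool", backend, g.limit)
		return false
	}
	g.open[backend]++
	return true
}

func (g *guard) release(backend string) {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.open[backend]--
}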

Configuration

Configuration is read from a YAML configuration file with the following fields:

Service:
  Server:
    BodyMaxSize: 100MB
    MaxConcurrentRequests: 200
    # Listen interface and port e.g. "0:8000", "localhost:9090", ":80"
    Listen: ":7082"
    # Technical endpoint interface
    TechnicalEndpointListen: ":7005"
    # Health check endpoint (for load balancers)
    HealthCheckEndpoint: "/status/ping"
  Client:
    # Additional (non AWS S3 specific) headers the proxy will add to the response
    AdditionalResponseHeaders:
        'Access-Control-Allow-Origin': "*"
        'Access-Control-Allow-Credentials': "true"
        'Access-Control-Allow-Methods': "GET, POST, OPTIONS"
        'Access-Control-Allow-Headers': "DNT,X-CustomHeader,Keep-Alive,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,X-CSRFToken"
        'Cache-Control': "public, s-maxage=600, max-age=600"
    # Additional headers added to the backend request
    AdditionalRequestHeaders:
        'Cache-Control': "public, s-maxage=600, max-age=600"
    # Backends in maintenance mode
    # MaintainedBackends:
    #  - "http://s3.dc2.internal"
    # List request methods to be logged in synclog in case of backend failure
    SyncLogMethods:
      - GET
      - PUT
      - DELETE
    # Transports rules with dedicated timeouts
    Transports:
      -
        Name: TransportDef-Method:GET|POST
        Rules:
          Method: GET|POST
          Path: .*
        Properties:
          MaxIdleConns: 200
          MaxIdleConnsPerHost: 1000
          IdleConnTimeout: 2s
          ResponseHeaderTimeout: 5s
      -
        Name: TransportDef-Method:GET|POST|PUT
        Rules:
          Method: GET|POST|PUT
          QueryParam: acl
        Properties:
          MaxIdleConns: 200
          MaxIdleConnsPerHost: 500
          IdleConnTimeout: 5s
          ResponseHeaderTimeout: 5s
      -
        Name: OtherTransportDefinition
        Rules:
        Properties:
          MaxIdleConns: 300
          MaxIdleConnsPerHost: 600
          IdleConnTimeout: 2s
          ResponseHeaderTimeout: 2s

# List request methods to be logged in synclog in case of backend failure
SyncLogMethods:
  - PUT
  - DELETE
# Configure sharding
Clusters:
  cluster1:
    Backends:
      - http://127.0.0.1:9001
  cluster2:
    Backends:
      - http://127.0.0.1:9002
Regions:
  myregion:
    Clusters:
      - Cluster: cluster1
        Weight: 0
      - Cluster: cluster2
        Weight: 1
    Domains:
      - myregion.internal

Logging:
  Synclog:
    stderr: true
  #  stdout: false  # default: false
  #  file: "/var/log/akubra/sync.log"  # default: ""
  #  syslog: LOG_LOCAL1  # default: LOG_LOCAL1
  #  database:
  #    user: dbUser
  #    password: ""
  #    dbname: dbName
  #    host: localhost
  #    inserttmpl: |
  #      INSERT INTO tablename(path, successhost, failedhost, ts,
  #       method, useragent, error)
  #      VALUES ('new','{{.path}}','{{.successhost}}','{{.failedhost}}',
  #      '{{.ts}}'::timestamp, '{{.method}}','{{.useragent}}','{{.error}}');

  Mainlog:
    stderr: true
  #  stdout: false  # default: false
  #  file: "/var/log/akubra/akubra.log"  # default: ""
  #  syslog: LOG_LOCAL2  # default: LOG_LOCAL2
  #  level: Error   # default: Debug

  Accesslog:
    stderr: true  # default: false
  #  stdout: false  # default: false
  #  file: "/var/log/akubra/access.log"  # default: ""
  #  syslog: LOG_LOCAL3  # default: LOG_LOCAL3

# Enable metrics collection
Metrics:
  # Possible targets: "graphite", "expvar", "stdout"
  Target: graphite
  # Expvar handler listener address
  ExpAddr: ":8080"
  # How often metrics should be released, applicable for "graphite" and "stdout"
  Interval: 30s
  # Graphite metrics prefix path
  Prefix: my.metrics
  # Shall prefix be suffixed with "<hostname>.<process>"
  AppendDefaults: true
  # Graphite collector address
  Addr: graphite.addr.internal:2003
  # Debug includes runtime.MemStats metrics
  Debug: false
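
For orientation, the sketch below shows how a few of these fields could be decoded in Go with gopkg.in/yaml.v2. The struct and field names are illustrative and do not mirror Akubra's internal config types:

package main

import (
	"io/ioutil"
	"log"

	yaml "gopkg.in/yaml.v2"
)

type Config struct {
	Service struct {
		Server struct {
			Listen                  string `yaml:"Listen"`
			TechnicalEndpointListen string `yaml:"TechnicalEndpointListen"`
			HealthCheckEndpoint     string `yaml:"HealthCheckEndpoint"`
		} `yaml:"Server"`
	} `yaml:"Service"`
	Clusters map[string]struct {
		Backends []string `yaml:"Backends"`
	} `yaml:"Clusters"`
}

func main() {
	raw, err := ioutil.ReadFile("conf/dev.yaml")
	if err != nil {
		log.Fatal(err)
	}
	var cfg Config
	if err := yaml.Unmarshal(raw, &cfg); err != nil {
		log.Fatal(err)
	}
	log.Printf("listen=%s clusters=%d", cfg.Service.Server.Listen, len(cfg.Clusters))
}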

Configuration validation for CI

Akubra has a technical HTTP endpoint for configuration validation purposes. It's configured with the TechnicalEndpointListen property.

Example usage

curl -vv -X POST -H "Content-Type: application/yaml" --data-binary @akubra.cfg.yaml http://127.0.0.1:8071/configuration/validate

Possible responses:

* HTTP 200
Configuration checked - OK.

or:

* HTTP 400, 405, 413, 415 and info in body with validation error message

Health check endpoint

This feature is required by load balancers, DNS servers and related systems for health checking. In the configuration YAML there is a HealthCheckEndpoint parameter - it's a URI path for the health check HTTP endpoint.

Example usage

curl -vv -X GET http://127.0.0.1:8080/status/ping

Response:

< HTTP/1.1 200 OK
< Cache-Control: no-cache, no-store
< Content-Type: text/html
< Content-Length: 2
OK

Transports and Rules with dedicated timeouts

This feature improves availability and transmission reliability.

For example, when one specific HTTP method lags, we can set timeouts with a special 'Rule'. Another example: when a user adds big chunks by multi upload, the default timeout needs to be changed with a dedicated 'Transport' and a 'Rule' for this case.

We have 'Rules' for 'Transports' definitions (see the sketch after this list):

  • at least one item is required in the 'Transports' section
  • each 'Rules' section may be empty or contain one or more of the properties (Method, Path, QueryParam)
  • if the 'Rules' section is empty, the transport will match any request
  • when no transport can be matched, an HTTP 500 error code will be sent to the client.
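
A sketch of the matching logic in Go (field names are illustrative; a real implementation would compile the regexps once up front rather than per request):

import (
	"net/http"
	"regexp"
)

// Rules mirrors the YAML rule properties from the example above.
type Rules struct {
	Method     string // regexp on the HTTP method, e.g. "GET|POST"
	Path       string // regexp on the URL path, e.g. ".*"
	QueryParam string // matches when this query parameter is present
}

// matches reports whether a request satisfies every non-empty rule;
// a transport with empty Rules therefore matches any request.
func matches(r Rules, req *http.Request) bool {
	if r.Method != "" && !regexp.MustCompile("^("+r.Method+")$").MatchString(req.Method) {
		return false
	}
	if r.Path != "" && !regexp.MustCompile(r.Path).MatchString(req.URL.Path) {
		return false
	}
	if r.QueryParam != "" {
		if _, ok := req.URL.Query()[r.QueryParam]; !ok {
			return false
		}
	}
	return true
}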

Limitations

  • User credentials have to be identical on every backend
  • We do not support S3 partial uploads