karlicoss / arctee

Licence: other
Atomic tee

Programming Languages

python

Projects that are alternatives of or similar to arctee

ghexport
Export your Github activity: events, repositories, stars, etc.
Stars: ✭ 18 (-18.18%)
Mutual labels:  export, backup, data-liberation
goodrexport
Goodreads data export
Stars: ✭ 16 (-27.27%)
Mutual labels:  export, backup, data-liberation
pockexport
Export/access your Pocket data, including highlights!
Stars: ✭ 124 (+463.64%)
Mutual labels:  export, backup, data-liberation
evernote-backup
Backup & export all Evernote notes and notebooks
Stars: ✭ 104 (+372.73%)
Mutual labels:  export, backup
Qzoneexport
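Qzone export assistant: backs up Qzone posts, blog entries, private diaries, photo albums, videos, message boards, QQ friends, favorites, shares, and recent visitors to files for easy migration and preservation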
Stars: ✭ 456 (+1972.73%)
Mutual labels:  export, backup
Roam To Git
Automatic RoamResearch backup to Git
Stars: ✭ 489 (+2122.73%)
Mutual labels:  export, backup
fb-export
Export (most) of your Facebook data using Node.js and the Graph API.
Stars: ✭ 21 (-4.55%)
Mutual labels:  export, backup
Github records archiver
Backs up a GitHub organization's repositories and all their associated information for archival purposes.
Stars: ✭ 100 (+354.55%)
Mutual labels:  export, backup
Quip Export
Export all folders and documents from Quip
Stars: ✭ 28 (+27.27%)
Mutual labels:  export, backup
Dynein
DynamoDB CLI written in Rust.
Stars: ✭ 126 (+472.73%)
Mutual labels:  export, backup
open2fa
Two-factor authentication app with import/export for iOS and macOS. All codes encrypted with AES 256. FaceID & TouchID support included. Written with love in SwiftUI ❤️
Stars: ✭ 24 (+9.09%)
Mutual labels:  export, backup
connect-backup
A tool to backup and restore AWS Connect, with some useful other utilities too
Stars: ✭ 19 (-13.64%)
Mutual labels:  export, backup
Wikiteam
Tools for downloading and preserving wikis. We archive wikis, from Wikipedia to tiniest wikis. As of 2020, WikiTeam has preserved more than 250,000 wikis.
Stars: ✭ 404 (+1736.36%)
Mutual labels:  export, backup
Node Firestore Import Export
Firestore data import and export
Stars: ✭ 271 (+1131.82%)
Mutual labels:  export, backup
Elasticsearch Dump
Import and export tools for elasticsearch
Stars: ✭ 5,977 (+27068.18%)
Mutual labels:  export, backup
Rexport
Reddit takeout: export your account data as JSON: comments, submissions, upvotes etc. 🦖
Stars: ✭ 87 (+295.45%)
Mutual labels:  export, backup
Flares
Flares 🔥 is a CloudFlare DNS backup tool
Stars: ✭ 156 (+609.09%)
Mutual labels:  export, backup
calcardbackup
calcardbackup: moved to https://codeberg.org/BernieO/calcardbackup
Stars: ✭ 67 (+204.55%)
Mutual labels:  export, backup
browserexport
backup and parse browser history databases (chrome, firefox, safari, and other chrome/firefox derivatives)
Stars: ✭ 54 (+145.45%)
Mutual labels:  export, backup
wechat-export
📃 Export WeChat chat histories to HTML files.
Stars: ✭ 585 (+2559.09%)
Mutual labels:  export

Helper script to run your data exports. It works somewhat like the tee command, but:

  • a: writes output atomically
  • r: supports retrying command
  • c: supports compressing output

You can read more on how it’s used here.

Motivation

Much of the work is common to all data exports, regardless of the source. In the vast majority of cases, you want to fetch some data, save it to a file (e.g. JSON) along with a timestamp, and potentially compress it.

This script aims to minimize the common boilerplate:

  • path argument allows easy ISO8601 timestamping and guarantees atomic writing, so you’d never end up with corrupted exports.
  • --compression lets you compress the output simply by passing the extension. No more tar -zcvf!
  • --retries allows easy exponential backoff in case the service you’re querying is flaky.

Example:

arctee '/exports/rtm/{utcnow}.ical.zstd' --compression zstd --retries 3 -- /soft/export/rememberthemilk.py
  1. runs /soft/export/rememberthemilk.py, trying it up to three times in total if it fails

    The script is expected to dump its result in stdout; stderr is simply passed through.

  2. once the data is fetched it’s compressed as zstd
  3. timestamp is computed and compressed data is written to /exports/rtm/20200102T170015Z.ical.zstd
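
The three steps above could be sketched in Python roughly as follows. This is not arctee's actual code: gzip from the standard library stands in for zstd, and the names run_with_retries, atomic_write and export, as well as the base_delay parameter, are hypothetical.

```python
import gzip
import os
import subprocess
import tempfile
import time
from datetime import datetime, timezone


def run_with_retries(cmd, tries=3, base_delay=1.0):
    """Run cmd and return its stdout, retrying with exponential backoff."""
    for attempt in range(tries):
        # Only stdout is captured; stderr is passed through untouched.
        result = subprocess.run(cmd, stdout=subprocess.PIPE)
        if result.returncode == 0:
            return result.stdout
        if attempt < tries - 1:
            # Wait base_delay, 2 * base_delay, 4 * base_delay, ...
            time.sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"{cmd!r} failed after {tries} tries")


def atomic_write(path, data):
    """Write to a temp file in the target directory, then rename into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        # The final path only ever appears with complete contents.
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise


def export(path_template, cmd, tries=3, base_delay=1.0):
    """Fetch, compress, timestamp and atomically write one export."""
    data = run_with_retries(cmd, tries, base_delay)
    compressed = gzip.compress(data)  # assumption: gzip instead of zstd
    utcnow = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    path = path_template.format(utcnow=utcnow)
    atomic_write(path, compressed)
    return path
```

With this sketch, export('/exports/rtm/{utcnow}.ical.gz', ['/soft/export/rememberthemilk.py']) would mirror the command above. The temporary file is created next to the destination so that the rename stays on one filesystem, which is what keeps the final step atomic on POSIX systems.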

Do you really need a special script for that?

  • why not use date command for timestamps?

    Passing $(date -Iseconds --utc).json as the path works; however, I need it for most of my exports, so it ends up polluting my crontabs.
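
For comparison, expanding a placeholder inside the script takes only a few lines. The sketch below is an illustration, not arctee's implementation: expand_placeholders is a hypothetical helper, the timestamp format is inferred from the example filename above, and the values used for {hostname} and {platform} are assumptions.

```python
import platform
import socket
from datetime import datetime, timezone


def expand_placeholders(template, now=None):
    """Fill {utcnow}, {hostname} and {platform} in a path template."""
    now = now or datetime.now(timezone.utc)
    return template.format(
        utcnow=now.strftime("%Y%m%dT%H%M%SZ"),
        hostname=socket.gethostname(),  # assumption: plain host name
        platform=platform.system(),     # assumption: e.g. 'Linux'
    )


fixed = datetime(2020, 1, 2, 17, 0, 15, tzinfo=timezone.utc)
print(expand_placeholders("/exports/rtm/{utcnow}.ical.zstd", fixed))
# → /exports/rtm/20200102T170015Z.ical.zstd
```

The crontab entry then only needs the quoted template, with no shell substitution.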

Next, I want to do several things one after another here. That sounds like a perfect candidate for pipes, right? Sadly, there are serious caveats:

  • pipe errors don’t propagate. If one part of your pipe fails, it doesn’t fail the whole pipeline.

    That’s a major problem that often leads to unexpected behaviours.

    In bash you can fix this with set -o pipefail. However:

    • default cron shell is /bin/sh. Ok, you can change it to SHELL=/bin/bash, but
    • you can’t set it to /bin/bash -o pipefail

      You’d have to prepend all of your pipes with set -o pipefail, which is quite boilerplaty

  • you can’t use pipes for retrying; you need some wrapper script anyway

    E.g. similar to how you need a wrapper script when you want to stop your program on timeout.

  • it’s possible to use pipes for atomically writing output to a file, however I haven’t found any existing tools to do that

    E.g. I want something like curl https://some.api/get-data | tee --atomic /path/to/data.json.

    If you know any existing tool please let me know!

  • it’s possible to pipe compression

    However due to the above concerns (timestamping/retrying/atomic writing), it has to be part of the script as well.
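
The first caveat in the list above is easy to observe. A small sketch, assuming sh and bash are both available on PATH (pipeline_status is a hypothetical helper, not part of arctee):

```python
import subprocess


def pipeline_status(pipeline, pipefail=False):
    """Return the exit status the shell reports for a pipeline."""
    if pipefail:
        cmd = ["bash", "-o", "pipefail", "-c", pipeline]
    else:
        cmd = ["sh", "-c", pipeline]
    return subprocess.run(cmd, stdout=subprocess.DEVNULL).returncode


# 'false' fails, but the pipeline reports the status of 'cat'.
print(pipeline_status("false | cat"))                 # 0
# With pipefail, the failure of 'false' becomes the pipeline's status.
print(pipeline_status("false | cat", pipefail=True))  # 1
```

A wrapper that runs the export command itself and checks its return code avoids the problem regardless of which shell cron uses.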

It feels that cron isn’t a suitable tool for my needs due to pipe handling and the need for retries, however I haven’t found a better alternative. If you think any of these things can be simplified, I’d be happy to know and remove them in favor of more standard solutions!

Installation

This can be installed with pip by running: pip3 install --user git+https://github.com/karlicoss/arctee

You can also install this manually by installing atomicwrites (pip3 install atomicwrites), then downloading and running arctee.py directly.

Optional Dependencies

  • pip3 install --user backoff

    backoff is a library to simplify backoff and retrying. Only necessary if you want to use --retries.

  • apt install atool

    atool is a tool to create archives in any format. Only necessary if you want to use compression.

Usage

usage: arctee [-h] [-r RETRIES] [-c COMPRESSION] path

Wrapper for automating boilerplate for reliable and regular data exports.

Example: arctee '/exports/rtm/{utcnow}.ical.zstd' --compression zstd --retries 3 -- /soft/export/rememberthemilk.py --user "[email protected]"

Arguments past '--' are the actual command to run.

positional arguments:
  path                  Path with borg-style placeholders. Supported: {utcnow}, {hostname}, {platform}.

                        Example: '/exports/pocket/pocket_{utcnow}.json'

                        (see https://manpages.debian.org/testing/borgbackup/borg-placeholders.1.en.html)

optional arguments:
  -h, --help            show this help message and exit
  -r RETRIES, --retries RETRIES
                        Total number of tries, 1 (default) means only try once. Uses exponential backoff.
  -c COMPRESSION, --compression COMPRESSION
                        Set compression format.

                        See 'man apack' for list of supported formats. In addition, 'zstd' is also supported.