alopezrivera / anchorage

Licence: GPL-3.0
Save your bookmark collection in the Internet Archive, or locally.

Programming Languages

python
powershell

Anchorage

Anchorage consists of a Python library and CLI to save your bookmark collection in bulk, forever: online in the Internet Archive or locally, using ArchiveBox.

Anchorage will automatically retrieve your bookmark collection from your browser of choice, filter out duplicates, local files and entries matching filters of your own making, and archive the rest.

Read on to get started. The full Python API documentation is available here.

Table of Contents

1. Introduction

2. Requirements & Install

3. Anchorage configuration

4. Anchorage CLI

5. Python API: user's guide

5.1 Anchorage configuration

5.2 Bookmark retrieval

5.3 Archiving


1. Introduction

As the internet ages, link rot takes over larger and larger swathes of it, from the tiny to the mighty, from the trivial to the best pieces you ever found: all lost forever. Anchorage is an attempt to make it as easy as possible for you to save the little corner of it you're most fond of, for your own peace of mind and the enjoyment of us all :)

2. Requirements & Install

Beyond Python and Anchorage's dependencies, the only requirement is a working Docker install. Docker is used to run ArchiveBox, via a provided docker-compose file. Without Docker, Anchorage will not be able to archive your collection locally, but it will still be able to save it online in the Internet Archive.
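Before choosing local archiving, it can help to confirm that Docker is reachable. The snippet below is a minimal sketch and not part of Anchorage: `docker_available` is a hypothetical helper that only checks whether a `docker` executable is on your PATH.

```python
import shutil


def docker_available() -> bool:
    """Return True if a `docker` executable is found on PATH.

    Hypothetical helper, not part of Anchorage. Finding the executable
    does not guarantee that the Docker daemon is actually running.
    """
    return shutil.which("docker") is not None


if __name__ == "__main__":
    if docker_available():
        print("Docker found: local and online archiving should both be possible.")
    else:
        print("Docker not found: only online archiving will be available.")
```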

Anchorage can be installed with pip, like any other Python package. Its dependencies will be downloaded automatically.

pip install anchorage

3. Anchorage configuration

To access a browser's bookmarks file, Anchorage stores its location in its configuration file:

~/.anchorage/config.toml

There's an example config.toml in this repo for reference.

To add a new browser, simply add a new top-level key followed by its bookmark file paths. Anchorage only needs the path for the operating system you are running it on.

[<browser name>]
linux = <path>
macos = <path>
windows = <path>

Importantly:

  • Linux and MacOS paths are stored in full.
  • Windows paths are stored from the AppData directory.
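The two rules above can be sketched as a small path resolver. This is an illustration under stated assumptions, not Anchorage's own code: `resolve_bookmark_path` and the sample entry are hypothetical, and the Windows branch assumes the stored path is relative to whatever directory you pass as `appdata`.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Hypothetical config entry mirroring the template above (values are made up).
ENTRY = {
    "linux": "/home/ada/.config/example-browser/Bookmarks",
    "macos": "/Users/ada/Library/Application Support/ExampleBrowser/Bookmarks",
    "windows": "Local/ExampleBrowser/User Data/Default/Bookmarks",
}


def resolve_bookmark_path(entry: dict, system: str, appdata: str = "") -> str:
    """Return the full bookmark file path for `system`.

    `system` is one of "linux", "macos" or "windows". Linux and MacOS paths
    are returned as stored; Windows paths are joined onto `appdata`
    (assumption: the stored path is relative to that directory).
    """
    stored = entry[system]
    if system == "windows":
        return str(PureWindowsPath(appdata) / stored)
    return str(PurePosixPath(stored))


if __name__ == "__main__":
    print(resolve_bookmark_path(ENTRY, "linux"))
    print(resolve_bookmark_path(ENTRY, "windows", "C:/Users/ada/AppData"))
```

Pure path classes are used so the sketch behaves the same regardless of the operating system it runs on.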

The default config.toml contains the bookmark file paths for Google Chrome, Mozilla Firefox, Microsoft Edge and Edge Beta, for Windows only. To use Anchorage on Linux or MacOS, add the bookmark file path of your browser of choice to your config.toml.

Editing the Anchorage config file

The config file can be edited like any other text file. New browsers will automatically be listed in the CLI.

Importantly:

  • Set unknown bookmark file paths to "?". That way the CLI will recognize those as unknown and behave appropriately.
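As a rough illustration of how the "?" marker could be handled, the sketch below lists the browsers whose path is unknown for a given operating system. The `CONFIG` dictionary and `browsers_missing_path` are hypothetical; they only mimic the shape of the config template and do not reflect how the CLI is implemented.

```python
UNKNOWN = "?"

# Hypothetical parsed config: browser name -> {OS name -> stored path}.
CONFIG = {
    "browser_a": {"linux": "?", "macos": "?", "windows": "Local/BrowserA/Bookmarks"},
    "browser_b": {"linux": "/home/ada/.browser_b/bookmarks.json", "macos": "?", "windows": "?"},
}


def browsers_missing_path(config: dict, system: str) -> list:
    """Return the sorted names of browsers whose path for `system` is absent or "?"."""
    return sorted(
        name
        for name, paths in config.items()
        if paths.get(system, UNKNOWN) == UNKNOWN
    )


if __name__ == "__main__":
    for system in ("linux", "macos", "windows"):
        print(system, browsers_missing_path(CONFIG, system))
```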

4. Anchorage CLI

The CLI will guide you through retrieving your bookmarks from your browser of choice, applying filters to your bookmark collection and archiving your bookmarks in the Internet Archive or locally, using ArchiveBox.

To start the CLI open your shell and type

anchorage

You will be asked whether you're ready to proceed. Once you confirm, it will check that all dependencies are present.

1. Config check

If a config file is found, you will be prompted to choose whether to keep the current config or overwrite it with the default one.

2. Browser choice

You will be prompted to choose which browser to retrieve your bookmark collection from. The browser choices are sourced from config.toml. Refer to section 3 to add a missing browser, or to enter the path to your browser's bookmarks file if it is unknown (set to "?").

3. Applying filters to the collection

Filters can be applied to your bookmark collection before archiving. Any or all of four filters can be chosen, one specific for URLs:

  • Local files: remove local URLs (say, PDFs stored on your computer) from the collection.

and three general:

  • Match string: remove bookmark URLs, names or bookmark directories matching a provided string or any string in a string list.
  • Match substring: remove bookmark URLs, names or bookmark directories containing a provided string or any string in a string list.
  • Regex: remove bookmark URLs, names or bookmark directories matching a provided regular expression.

For each general filter you will be prompted to choose whether to apply it to URLs, names, directories, or any combination of them.
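The four filters can be sketched in plain Python as follows. This is an illustration only: the bookmark dictionaries, field names and function names are hypothetical, local URLs are assumed to be those using the `file://` scheme, and the regex filter is assumed to match anywhere in the string (`re.search`), which the description above does not specify.

```python
import re

# Made-up bookmarks; each has a URL, a name and a bookmark directory.
BOOKMARKS = [
    {"url": "https://example.org/docs", "name": "Docs", "dir": "Work"},
    {"url": "file:///home/ada/paper.pdf", "name": "Paper", "dir": "Reading"},
    {"url": "https://example.com/news", "name": "News", "dir": "Daily"},
    {"url": "https://intranet.example.net/wiki", "name": "Wiki", "dir": "Work"},
]


def _as_list(value):
    """Accept either a single string or a list of strings."""
    return [value] if isinstance(value, str) else list(value)


def drop_local_files(bookmarks):
    """Local files: drop bookmarks whose URL uses the file:// scheme."""
    return [b for b in bookmarks if not b["url"].lower().startswith("file://")]


def drop_matching(bookmarks, field, strings):
    """Match string: drop bookmarks whose `field` equals any given string."""
    targets = set(_as_list(strings))
    return [b for b in bookmarks if b[field] not in targets]


def drop_containing(bookmarks, field, substrings):
    """Match substring: drop bookmarks whose `field` contains any given string."""
    parts = _as_list(substrings)
    return [b for b in bookmarks if not any(p in b[field] for p in parts)]


def drop_regex(bookmarks, field, pattern):
    """Regex: drop bookmarks whose `field` matches the pattern anywhere."""
    compiled = re.compile(pattern)
    return [b for b in bookmarks if not compiled.search(b[field])]


if __name__ == "__main__":
    kept = drop_containing(drop_local_files(BOOKMARKS), "url", "intranet")
    print([b["name"] for b in kept])
```

Each function returns a new list, so filters can be chained in any order without modifying the original collection.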

4. Archive choice

You will then be asked to choose whether to archive your collection online or locally.

Online

By default, websites will not be archived if a previous snapshot exists in the Internet Archive. This is to save time: those sites have already been saved at some point. In case you want to save a current snapshot of the whole collection, you will be asked whether to override this and archive all sites in the collection regardless. This may take significantly longer. Based on your choice, you will be given an estimate of the archive time.
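The skip-if-already-archived decision can be sketched with the Wayback Machine's public availability endpoint. Whether Anchorage relies on this endpoint is an assumption; `has_snapshot`, `should_archive` and `fetch_availability` are hypothetical helpers, and only the `overwrite` name is borrowed from the Python API described further down.

```python
import json
import urllib.parse
import urllib.request

AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def has_snapshot(response: dict) -> bool:
    """Return True if an availability response reports an existing snapshot."""
    closest = response.get("archived_snapshots", {}).get("closest", {})
    return bool(closest.get("available"))


def should_archive(response: dict, overwrite: bool = False) -> bool:
    """Archive when forced by `overwrite`, or when no snapshot exists yet."""
    return overwrite or not has_snapshot(response)


def fetch_availability(url: str, timeout: float = 10.0) -> dict:
    """Query the availability endpoint for `url` (performs a network request)."""
    query = urllib.parse.urlencode({"url": url})
    with urllib.request.urlopen(f"{AVAILABILITY_ENDPOINT}?{query}", timeout=timeout) as reply:
        return json.load(reply)


if __name__ == "__main__":
    reply = fetch_availability("https://example.org/")
    print("archive it" if should_archive(reply) else "already saved, skipping")
```

Keeping the decision logic separate from the network call makes it possible to check the behaviour against canned responses without contacting the Internet Archive.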

Local

To archive your collection locally you will be prompted for an archive directory.

5. Run

After a last confirmation the process will begin. A progress bar will show how far along the process is and how many bookmarks have been saved, and will provide a dynamic estimate of the time remaining.
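One simple way to produce such a dynamic estimate is to scale the average time per saved bookmark by the number still pending. The function below is a sketch of that idea only; `estimate_remaining` is a hypothetical name and the CLI may compute its estimate differently.

```python
def estimate_remaining(elapsed_seconds: float, done: int, total: int):
    """Estimate the seconds left, assuming a constant average time per bookmark.

    Returns None until at least one bookmark has been processed, since no
    average is available before that.
    """
    if done <= 0:
        return None
    average = elapsed_seconds / done
    return average * (total - done)


if __name__ == "__main__":
    # 10 of 40 bookmarks saved after 30 seconds.
    print(estimate_remaining(30.0, 10, 40))
```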

5. Python API: user's guide

The full documentation of the Anchorage API is available in the docs site.

5.1 Anchorage configuration

Generate the Anchorage config file with the init function.

from anchorage import init

init()

5.2 Bookmark retrieval

Three methods are relevant:

  • path(<browser>): obtain the path to your chosen browser's bookmarks file (in your OS) from config.toml.
  • load(<path>): read your chosen browser's JSON or JSONLZ4 bookmarks file and return a Python dictionary.
  • bookmarks(<dict>): create an instance of the bookmarks class.

The bookmarks class creates a second bookmarks dictionary more suitable for our intent, and contains methods to filter and loop through the collection. Filters can be applied as seen below.
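To give a feel for what turning a browser's bookmark file into a more convenient structure can involve, here is a standalone sketch. It is not Anchorage's implementation: `detect_format` and `flatten` are hypothetical, the sample data is made up, and only a Chromium-style JSON tree is handled. Firefox's JSONLZ4 files start with the `mozLz40\0` header and need a third-party LZ4 library to decompress, which is not shown here.

```python
import json

MOZLZ4_MAGIC = b"mozLz40\0"  # header of Firefox's LZ4-compressed JSON files


def detect_format(raw: bytes) -> str:
    """Guess whether raw file contents are plain JSON or JSONLZ4."""
    return "jsonlz4" if raw.startswith(MOZLZ4_MAGIC) else "json"


def flatten(node: dict, parents: tuple = ()) -> list:
    """Walk a Chromium-style bookmark tree into a flat list of entries.

    Each entry records the bookmark name, its URL and the directory path
    (folder names joined with "/") under which it was found.
    """
    if node.get("type") == "url":
        return [{"name": node["name"], "url": node["url"], "dir": "/".join(parents)}]
    here = parents + (node["name"],) if node.get("name") else parents
    entries = []
    for child in node.get("children", []):
        entries.extend(flatten(child, here))
    return entries


# Made-up sample in the shape of a Chromium "Bookmarks" file.
SAMPLE = json.loads("""
{"roots": {"bookmark_bar": {"type": "folder", "name": "Bookmarks bar", "children": [
    {"type": "url", "name": "Example", "url": "https://example.org/"},
    {"type": "folder", "name": "Reading", "children": [
        {"type": "url", "name": "Paper", "url": "file:///home/ada/paper.pdf"}
    ]}
]}}}
""")

flat = [entry for root in SAMPLE["roots"].values() for entry in flatten(root)]

if __name__ == "__main__":
    for entry in flat:
        print(entry["dir"], "|", entry["name"], "|", entry["url"])
```

A flat list of this kind is easy to filter and iterate over, which is the sort of convenience a reshaped bookmarks dictionary is meant to provide.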

from anchorage import path, load, bookmarks

collection = bookmarks(load(path(<browser name>)),
                       drop_local_files= <boolean>,
                       drop_dirs=        <string or list of strings>,
                       drop_names=       <string or list of strings>,
                       drop_urls=        <string or list of strings>,
                       drop_dirs_subs=   <string or list of strings>,
                       drop_names_subs=  <string or list of strings>,
                       drop_urls_subs=   <string or list of strings>,
                       drop_dirs_regex=  <string>,
                       drop_names_regex= <string>,
                       drop_urls_regex=  <string>
                       )
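A filled-in version of the call above might look like the sketch below. The filter values are purely illustrative, `build_collection` is a hypothetical wrapper, and the browser name must be whatever key appears in your config.toml. The sketch also assumes that filters left out of the call are simply not applied, which is not stated above.

```python
# Illustrative filter values; adjust them to your own collection.
FILTERS = {
    "drop_local_files": True,                      # skip file:// style entries
    "drop_dirs": ["Trash"],                        # exact directory names to drop
    "drop_urls_subs": ["localhost", "intranet"],   # drop URLs containing these
    "drop_names_regex": r"^\[draft\]",             # drop names starting with [draft]
}


def build_collection(browser_name: str):
    """Load and filter the bookmarks of `browser_name` (a key in config.toml)."""
    # Imported here so this module can be read without Anchorage installed.
    from anchorage import path, load, bookmarks

    return bookmarks(load(path(browser_name)), **FILTERS)
```

Keeping the filters in a dictionary makes it easy to reuse the same selection across browsers.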

5.3 Archiving

Input: bookmarks instance or bookmark dictionary returned by load.

Online

from anchorage import anchor_online

anchor_online(bookmarks, overwrite=<bool>)

The overwrite parameter determines whether to save snapshots of sites already present in the Internet Archive or not.

Locally

from anchorage import anchor_locally

anchor_locally(bookmarks, archive=<dir>)

The archive parameter specifies the directory in which to create the local archive.

Running the ArchiveBox default NGINX server can be done with the following command.

from anchorage import server

server()
