Licence: MIT license

MemGator

A Memento Aggregator CLI and Server in Go.

Features

  • The binary (available for various platforms) can be used as the CLI or run as a Web Service
  • Results available in three formats - Link/JSON/CDXJ
  • TimeMap, TimeGate, and Memento (redirect or description) endpoints
  • Optional streaming of benchmarks over Server-Sent Events (SSE) for realtime visualization and monitoring
  • Good API parity with the main Memento Aggregator service
  • Concurrent - Splits every session into subtasks for parallel execution
  • Parallel - Utilizes all the available CPUs
  • Custom archive list (a local JSON file or a remote URL) - A sample JSON is included in the repository
  • Probability based archive prioritization and limit
  • Configurable automated temporary exclusion of malfunctioning upstream archives
  • Three levels of customizable timeouts for greater control over remote requests
  • Customizable logging and profiling in CDXJ format
  • Customizable endpoint URLs - Helpful in load-balancing
  • Customizable User-Agent to be sent to each archive and User-Agent spoofing
  • Configurable archive failure detection and automatic hibernation
  • CORS support to make it easy to use it from JavaScript clients
  • Memento count exposed in the header that can be retrieved via HEAD request
  • Docker friendly - An image available as oduwsdl/memgator
  • Sensible defaults - Batteries included, but replaceable
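
The concurrency and chronological-ordering features can be pictured with a small fan-out sketch. This is not MemGator's actual code: the `Memento` type, the `Fetcher` signature, and the fake archives are hypothetical names introduced only for illustration. It shows one goroutine per archive, tolerance of a failing archive, and a merged result sorted by datetime.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
	"time"
)

// Memento is a hypothetical record type for this sketch: one archived copy.
type Memento struct {
	Datetime time.Time
	URI      string
}

// Fetcher is a hypothetical stand-in for "ask one archive for its TimeMap".
type Fetcher func(urir string) ([]Memento, error)

// aggregate queries every archive in its own goroutine, skips the ones that
// fail, and returns the merged Mementos in chronological order.
func aggregate(urir string, archives []Fetcher) []Memento {
	var (
		mu     sync.Mutex
		wg     sync.WaitGroup
		merged []Memento
	)
	for _, fetch := range archives {
		wg.Add(1)
		go func(fetch Fetcher) {
			defer wg.Done()
			ms, err := fetch(urir)
			if err != nil {
				return // one failing archive must not break the whole session
			}
			mu.Lock()
			merged = append(merged, ms...)
			mu.Unlock()
		}(fetch)
	}
	wg.Wait()
	sort.Slice(merged, func(i, j int) bool {
		return merged[i].Datetime.Before(merged[j].Datetime)
	})
	return merged
}

// fakeArchive returns a Fetcher that pretends to hold one capture per year.
func fakeArchive(name string, years ...int) Fetcher {
	return func(urir string) ([]Memento, error) {
		var ms []Memento
		for _, y := range years {
			ms = append(ms, Memento{
				Datetime: time.Date(y, 1, 1, 0, 0, 0, 0, time.UTC),
				URI:      fmt.Sprintf("http://%s.example/%d/%s", name, y, urir),
			})
		}
		return ms, nil
	}
}

// broken simulates an archive that is currently unavailable.
func broken(urir string) ([]Memento, error) {
	return nil, fmt.Errorf("archive unavailable")
}

func main() {
	archives := []Fetcher{fakeArchive("a", 2015, 2001), fakeArchive("b", 2010), broken}
	for _, m := range aggregate("http://example.com/", archives) {
		fmt.Println(m.Datetime.Format("20060102150405"), m.URI)
	}
}
```

A mutex-guarded slice keeps the sketch short; a channel that collects per-archive results would work equally well.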

Usage

CLI

The command line interface of MemGator retrieves the TimeMap and the description of the closest Memento (equivalent to the TimeGate) over STDOUT in all supported formats. Logs and benchmarks (in verbose mode), as well as error output, are written to STDERR unless appropriate files are configured. For further details, see the Full Usage section.

$ memgator [options] {URI-R}                            # TimeMap from CLI
$ memgator [options] {URI-R} {YYYY[MM[DD[hh[mm[ss]]]]]} # Description of the closest Memento from CLI

Server

When run as a Web Service, MemGator exposes the following customizable endpoints:

$ memgator [options] server
TimeMap:  http://localhost:1208/timemap/{FORMAT}/{URI-R}
TimeGate: http://localhost:1208/timegate/{URI-R} [Accept-Datetime]
Memento:  http://localhost:1208/memento[/{FORMAT}|proxy]/{DATETIME}/{URI-R}
About:    http://localhost:1208/about
Monitor:  http://localhost:1208/monitor - (Over SSE, if enabled)

  {FORMAT}          => link|json|cdxj
  {DATETIME}        => YYYY[MM[DD[hh[mm[ss]]]]]
  [Accept-Datetime] => Header in RFC1123 format

  • TimeMap endpoint serves an aggregated TimeMap for a given URI-R in accordance with the Memento RFC. Additionally, it makes sure that the Mementos are chronologically ordered. It also provides the TimeMap data serialized in additional experimental formats.
  • TimeGate endpoint allows datetime negotiation via the Accept-Datetime header in accordance with the Memento RFC. A successful response redirects to the closest Memento (to the given datetime) using the Location header. The default datetime is the current time. A successful response also includes a Link header which provides links to the first, last, next, and previous Mementos.
  • Memento endpoint allows datetime negotiation in the request URL itself for clients that cannot easily send custom request headers (as opposed to the TimeGate which requires the Accept-Datetime header). This endpoint behaves differently based on whether the format was specified in the request. It essentially splits the functionality of the TimeGate endpoint as follows:
    • If a format is specified, it returns the description of the closest Memento (to the given datetime) in the specified format. It is essentially the same data that is available in the Link header of the TimeGate response, but as the payload in the format requested by the client.
    • If a format is not specified, it redirects to the closest Memento (to the given datetime) using the Location header.
    • If the term proxy is used instead of a format then it acts like a proxy for the closest original unmodified Memento with added CORS headers.
  • About endpoint reports the list of upstream archives, their status, and values of various configurations of the server.
  • Monitor is an optional endpoint that can be enabled by the --monitor flag when the server is started. If enabled, it provides a stream of the benchmark log over SSE for realtime visualization and monitoring.

NOTE: A fallback endpoint /api is added for compatibility with Time Travel APIs to allow drop-in replacement in existing tools. This endpoint is an alias to the /memento endpoint that returns the description of a Memento.

Download and Install

Download the appropriate binary for your machine and operating system from the releases page. Make the file executable with chmod +x MemGator-BINARY. Run it from the download location, or rename it to memgator and move it into a directory that is in the PATH (such as /usr/local/bin/) to make it available as a command.

Running as a Docker Container

Build a Docker image locally from the source.

$ git clone https://github.com/oduwsdl/MemGator.git
$ cd MemGator
$ docker image build -t oduwsdl/memgator .

Alternatively, pull a published image from one of the two Docker image registries below:

$ docker image pull docker.pkg.github.com/oduwsdl/memgator/memgator
$ docker image pull oduwsdl/memgator

Run MemGator with various options inside a Docker container.

$ docker container run -it --rm oduwsdl/memgator -h
$ docker container run -it --rm oduwsdl/memgator [options] {URI-R}
$ docker container run -it --rm oduwsdl/memgator [options] {URI-R} {YYYY[MM[DD[hh[mm[ss]]]]]}
$ docker container run -d --name=memgator-server -p 1208:1208 oduwsdl/memgator [options] server
$ curl -i http://localhost:1208/about
$ docker container rm -f memgator-server

Full Usage

   _____                  _______       __
  /     \  _____  _____  / _____/______/  |___________
 /  Y Y  \/  __ \/     \/  \  ___\__  \   _/ _ \_   _ \
/   | |   \  ___/  Y Y  \   \_\  \/ __ |  | |_| |  | \/
\__/___\__/\____\__|_|__/\_______/_____|__|\___/|__|

# MemGator ({Version})

A Memento Aggregator CLI and Server in Go

Usage:
  memgator [options] {URI-R}                            # TimeMap from CLI
  memgator [options] {URI-R} {YYYY[MM[DD[hh[mm[ss]]]]]} # Description of the closest Memento from CLI
  memgator [options] server                             # Run as a Web Service

Options:
  -A, --agent=MemGator/{Version} <{CONTACT}>  User-agent string sent to archives
  -a, --arcs=https://git.io/archives          Local/remote JSON file path/URL for list of archives
  -b, --benchmark=                            Benchmark file location - defaults to Logfile
  -c, --contact=https://git.io/MemGator       Comment/Email/URL/Handle - used in the user-agent
  -D, --static=                               Directory path to serve static assets from
  -d, --dormant=15m0s                         Dormant period after consecutive failures
  -F, --tolerance=-1                          Failure tolerance limit for each archive
  -f, --format=Link                           Output format - Link/JSON/CDXJ
  -H, --host=localhost                        Host name - only used in web service mode
  -k, --topk=-1                               Aggregate only top k archives based on probability
  -l, --log=                                  Log file location - defaults to STDERR
  -m, --monitor=false                         Benchmark monitoring via SSE
  -P, --proxy=http://{HOST}[:{PORT}]{ROOT}    Proxy URL - defaults to host, port, and root
  -p, --port=1208                             Port number - only used in web service mode
  -R, --root=/                                Service root path prefix
  -r, --restimeout=1m0s                       Response timeout for each archive
  -S, --spoof=false                           Spoof each request with a random user-agent
  -T, --hdrtimeout=30s                        Header timeout for each archive
  -t, --contimeout=5s                         Connection timeout for each archive
  -V, --verbose=false                         Show Info and Profiling messages on STDERR
  -v, --version=false                         Show name and version

Build

Assuming that Git and Go (version >= 1.14) are installed, the code can be cloned, run, built, and installed using the following commands:

$ git clone https://github.com/oduwsdl/MemGator.git
$ cd MemGator
$ go run main.go
$ go build
$ go install
$ memgator --help
$ memgator http://example.com/

To compile cross-platform binaries, run the crossbuild.sh script:

$ ./crossbuild.sh

This will generate binaries for various OSes and architectures in the /tmp/mgbins directory.

Citing Project

A publication related to this project appeared in the proceedings of JCDL 2016. Please cite it as follows:

Sawood Alam, Michael Nelson. MemGator - A Portable Concurrent Memento Aggregator: Cross-Platform CLI and Server Binaries in Go. In Proceedings of the 16th ACM/IEEE-CS on Joint Conference on Digital Libraries, JCDL 2016, pp. 243-244, Newark, New Jersey, USA, June 2016.

@inproceedings{jcdl-2016:alam:memgator,
  author    = {Sawood Alam and
               Michael L. Nelson},
  title     = {{MemGator - A Portable Concurrent Memento Aggregator}},
  booktitle = {Proceedings of the 16th ACM/IEEE-CS Joint Conference on Digital Libraries},
  series    = {JCDL '16},
  year      = {2016},
  month     = {jun},
  location  = {Newark, New Jersey, USA},
  pages     = {243--244},
  numpages  = {2},
  url       = {http://dx.doi.org/10.1145/2910896.2925452},
  doi       = {10.1145/2910896.2925452},
  isbn      = {978-1-4503-4229-2},
  publisher = {ACM},
  address   = {New York, NY, USA}
}