rcarmo / Python Webarchive

Licence: mit
Create WebKit/Safari .webarchive files on any platform

python-webarchive

This is a quick hack demonstrating how to create WebKit/Safari .webarchive files, inspired by pocket-archive-stream.

Usage

TARGET_URL=http://foo.com python3 main.py

Why .webarchive?

.webarchive is the native web page archive format on the Mac, and is essentially a serialized snapshot of Safari/WebKit state. On a Mac, these files are Spotlight-indexable and can be opened by just about anything that takes a "webpage" as input.

Despite the rising prominence of WARC as the standard web archiving format (which to this day requires plug-ins to be viewable in a browser), I quite like .webarchive. I built this both to demonstrate how to use it and to have a minimally viable archive creator I can deploy as a service.

Anatomy of a .webarchive file

The file format is a nested binary .plist, with roughly the following structure:

{
    "WebMainResource": {
        "WebResourceURL": String(),
        "WebResourceMIMEType": String(),
        "WebResourceResponse": NSKeyedArchiver(NSObject),
        "WebResourceData": Bytes(),
        "WebResourceTextEncodingName": String(optional=True)
    },
    "WebSubresources": [
        {item, item, item...}
    ]
}
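
To make the layout above concrete, here is a minimal sketch that reads a .webarchive and lists the resources it contains. It uses the standard library's plistlib rather than biplist, and the function name summarize_webarchive is hypothetical (it is not part of this project). It assumes each subresource carries the same URL, MIME type, and data keys as the main resource.

```python
import plistlib


def summarize_webarchive(blob):
    """Return (url, mime_type, size_in_bytes) for every resource in a .webarchive.

    `blob` is the raw content of the file; plistlib detects the binary
    plist format on its own. Hypothetical helper, for illustration only.
    """
    archive = plistlib.loads(blob)
    # The main resource comes first, followed by any subresources.
    resources = [archive["WebMainResource"]]
    resources.extend(archive.get("WebSubresources", []))
    return [
        (
            item["WebResourceURL"],
            item["WebResourceMIMEType"],
            len(item["WebResourceData"]),
        )
        for item in resources
    ]
```

On a real file this could be called as summarize_webarchive(open("page.webarchive", "rb").read()), where page.webarchive is a placeholder file name.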

So creating a .webarchive turns out to be fairly straightforward if you simply build a dict with the right structure and then serialize it using biplist (which works on any platform).

The only hitch would be WebResourceResponse (which uses a rather more complex way to encode the HTTP response headers), but fortunately that appears not to be necessary at all.
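
As a rough illustration of the approach described above, the following sketch builds the dict and serializes it to a binary plist. It is not the code in main.py: it swaps biplist for the standard library's plistlib, omits WebResourceResponse as noted, and the function name build_webarchive and its parameters are assumptions made for this example.

```python
import plistlib


def build_webarchive(url, html, subresources=(), encoding="UTF-8"):
    """Serialize a page and its assets into .webarchive bytes.

    `html` is the page body as bytes; `subresources` is an iterable of
    (url, mime_type, data_bytes) tuples. Hypothetical helper, for
    illustration only.
    """
    archive = {
        "WebMainResource": {
            "WebResourceURL": url,
            "WebResourceMIMEType": "text/html",
            "WebResourceData": html,
            "WebResourceTextEncodingName": encoding,
        },
        "WebSubresources": [
            {
                "WebResourceURL": sub_url,
                "WebResourceMIMEType": mime_type,
                "WebResourceData": data,
            }
            for sub_url, mime_type, data in subresources
        ],
    }
    # FMT_BINARY produces the binary plist format described above.
    return plistlib.dumps(archive, fmt=plistlib.FMT_BINARY)
```

Writing the returned bytes to a file with a .webarchive extension gives a file laid out as described above; fetching the page and its assets is left out of this sketch.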
