
seanbreckenridge / browserexport

Licence: MIT license
backup and parse browser history databases (chrome, firefox, safari, and other chrome/firefox derivatives)

Programming Languages

python

Projects that are alternatives of or similar to browserexport

alfred-browser-tabs
🔍 Search browser tabs from Chrome, Brave, Safari, etc..
Stars: ✭ 302 (+459.26%)
Mutual labels:  safari, vivaldi, brave
Qzoneexport
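QQ Zone export assistant for backing up QQ Zone status posts, blogs, private diaries, albums, videos, message board, QQ friends, favorites, shares and recent visitors to files, making them easy to migrate and preserve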
Stars: ✭ 456 (+744.44%)
Mutual labels:  export, backup, chromium
chrome-flags
💐 My personal Chromium-based flags
Stars: ✭ 13 (-75.93%)
Mutual labels:  chromium, vivaldi, brave
Smart-Text-Editor
The text editor that requires only a browser and a keyboard!
Stars: ✭ 60 (+11.11%)
Mutual labels:  safari, google-chrome
Flares
Flares 🔥 is a CloudFlare DNS backup tool
Stars: ✭ 156 (+188.89%)
Mutual labels:  export, backup
privacy-settings
Guide to privacy settings for most major softwares and services.
Stars: ✭ 97 (+79.63%)
Mutual labels:  vivaldi, brave
goodrexport
Goodreads data export
Stars: ✭ 16 (-70.37%)
Mutual labels:  export, backup
Colorblinding
An extension for Google Chrome (and Chromium) that simulates the website as a color vision impaired person would see.
Stars: ✭ 25 (-53.7%)
Mutual labels:  google-chrome, chromium
connect-backup
A tool to backup and restore AWS Connect, with some useful other utilities too
Stars: ✭ 19 (-64.81%)
Mutual labels:  export, backup
calcardbackup
calcardbackup: moved to https://codeberg.org/BernieO/calcardbackup
Stars: ✭ 67 (+24.07%)
Mutual labels:  export, backup
utools-recent-projects
uTools plugin for quickly searching recently opened projects
Stars: ✭ 84 (+55.56%)
Mutual labels:  safari, brave
new-tab
⚡ A high-performance browser new tab page that gets you where you need to go faster.
Stars: ✭ 64 (+18.52%)
Mutual labels:  google-chrome, chromium
Dynein
DynamoDB CLI written in Rust.
Stars: ✭ 126 (+133.33%)
Mutual labels:  export, backup
Github records archiver
Backs up a GitHub organization's repositories and all their associated information for archival purposes.
Stars: ✭ 100 (+85.19%)
Mutual labels:  export, backup
open2fa
Two-factor authentication app with import/export for iOS and macOS. All codes encrypted with AES 256. FaceID & TouchID support included. Written with love in SwiftUI ❤️
Stars: ✭ 24 (-55.56%)
Mutual labels:  export, backup
Rexport
Reddit takeout: export your account data as JSON: comments, submissions, upvotes etc. 🦖
Stars: ✭ 87 (+61.11%)
Mutual labels:  export, backup
Quip Export
Export all folders and documents from Quip
Stars: ✭ 28 (-48.15%)
Mutual labels:  export, backup
crx3
Node.js module to create CRX3 files (web extension package v3 format) for Chromium, Google Chrome and Opera browsers.
Stars: ✭ 39 (-27.78%)
Mutual labels:  google-chrome, chromium
Roam To Git
Automatic RoamResearch backup to Git
Stars: ✭ 489 (+805.56%)
Mutual labels:  export, backup
Elasticsearch Dump
Import and export tools for elasticsearch
Stars: ✭ 5,977 (+10968.52%)
Mutual labels:  export, backup

browserexport


This:

  • locates and backs up browser history by copying the underlying database files to some directory you specify
  • can identify and parse the resulting sqlite files into some common schema

This doesn't aim to offer a way to 'restore' your history (see #16 for discussion); it just denormalizes and merges your history from backed-up databases so it's all available under a common format:

Visit:
  url: the url
  dt: datetime (when you went to this page)
  metadata:
    title: the <title> for this page
    description: the <meta description> tag from this page
    preview_image: 'main image' for this page, often opengraph/favicon
    duration: how long you were on this page

Metadata depends on the data available in each browser (e.g. Firefox has preview images and Chrome has duration, but not vice versa).
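The common schema above can be modelled roughly as follows. This is a minimal sketch using plain dataclasses: the names `Visit` and `Metadata` mirror the listing, but the field types (and the assumption that `duration` is in seconds) are illustrative, not the library's actual definitions.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Metadata:
    # Every field is optional: which ones are filled in depends on the browser
    title: Optional[str] = None
    description: Optional[str] = None
    preview_image: Optional[str] = None
    duration: Optional[int] = None  # assumption: seconds spent on the page


@dataclass
class Visit:
    url: str
    dt: datetime  # when the page was visited
    metadata: Optional[Metadata] = None


# A Chrome-style visit: it has a duration but no preview image
chrome_visit = Visit(
    url="https://example.com",
    dt=datetime(2021, 4, 19, 2, 26, 8, tzinfo=timezone.utc),
    metadata=Metadata(title="Example Domain", duration=42),
)
```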

This currently supports:

  • Firefox
    • Waterfox
    • Firefox Android (pre-2020 schema and current Fenix)
  • Chrome
    • Chromium
    • Brave
    • Vivaldi
  • Safari
  • Palemoon

This can probably extract visits from other Firefox/Chromium-based browsers, but it doesn't know how to locate their databases in order to save them.

Install

python3 -m pip install --user browserexport

Requires python3.7+

Usage

save

Usage: browserexport save [OPTIONS]

  Backs up a current browser database file

Options:
  -b, --browser [chrome|firefox|safari|brave|waterfox|chromium|vivaldi|palemoon]
                                  Browser name to backup history for
  --form-history [firefox]        Browser name to backup form (input field) history for
  --pattern TEXT                  Pattern for the resulting timestamped filename, should include an
                                  str.format replacement placeholder
  -p, --profile TEXT              Use to pick the correct profile to back up. If unspecified, will assume a
                                  single profile  [default: *]
  --path FILE                     Specify a direct path to a database to back up
  -t, --to DIRECTORY              Directory to store backup to  [required]
  --help                          Show this message and exit.  [default: False]

Must specify one of --browser, --form-history or --path

Since browsers typically remove old history over time, I'd recommend backing up your history periodically, like:

$ browserexport save -b firefox --to ~/data/browser_history
$ browserexport save -b chrome --to ~/data/browser_history
$ browserexport save -b safari --to ~/data/browser_history

That copies the sqlite databases which contain your history to the directory given by --to.
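To run those backups on a schedule (from cron, a systemd timer or similar), a small wrapper can loop over your browsers. This sketch only shells out to the `browserexport save -b ... --to ...` command shown above. The browser list and backup directory are placeholders to adjust, and `build_commands`/`run_backups` (with its `dry_run` switch) are hypothetical helpers, not part of browserexport.

```python
import shlex
import subprocess
from pathlib import Path
from typing import List

# Placeholders: the browsers you use and where the backups should go
BROWSERS = ["firefox", "chrome", "safari"]
BACKUP_DIR = Path("~/data/browser_history").expanduser()


def build_commands(browsers: List[str], to: Path) -> List[List[str]]:
    """One `browserexport save` invocation per browser."""
    return [["browserexport", "save", "-b", b, "--to", str(to)] for b in browsers]


def run_backups(dry_run: bool = True) -> None:
    """Print the commands (dry run) or execute them one after another."""
    commands = build_commands(BROWSERS, BACKUP_DIR)
    if not dry_run:
        BACKUP_DIR.mkdir(parents=True, exist_ok=True)
    for cmd in commands:
        if dry_run:
            print(" ".join(shlex.quote(part) for part in cmd))
        else:
            # check=False so one failing browser does not stop the others
            subprocess.run(cmd, check=False)
```

A scheduled job would then call `run_backups(dry_run=False)`.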

If a browser you want to back up is Firefox/Chrome-like (so this would be able to parse it) but this doesn't support locating it yet, you can back it up directly with the --path flag:

$ browserexport save --path ~/.somebrowser/profile/places.sqlite \
  --to ~/data/browser_history

The --pattern argument can be used to change the resulting filename for the browser, e.g. --pattern 'places-{}.sqlite' or --pattern "$(uname)-{}.sqlite". The {} is replaced by the current timestamp.
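The option help calls the result a "timestamped filename", and the debug log below shows a default name of `firefox-20220202181022.sqlite`. Assuming the placeholder receives that 14-digit `%Y%m%d%H%M%S` stamp (the log's 10:10:22 versus the filename's 18:10:22 suggests it is in UTC), filename construction can be sketched as follows. `backup_filename` is a hypothetical helper, not browserexport's own code.

```python
from datetime import datetime, timezone
from typing import Optional


def backup_filename(browser: str, pattern: Optional[str] = None,
                    now: Optional[datetime] = None) -> str:
    """Hypothetical helper: build the filename for one backup run."""
    now = now or datetime.now(timezone.utc)
    # 14-digit stamp, inferred from names like firefox-20220202181022.sqlite
    stamp = now.strftime("%Y%m%d%H%M%S")
    if pattern is None:
        return f"{browser}-{stamp}.sqlite"
    # str.format fills the single {} placeholder in the user's pattern
    return pattern.format(stamp)
```

Because every run gets a new stamp, repeated backups accumulate side by side instead of overwriting each other.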

If this doesn't support a browser you use, feel free to create an issue or contribute a browser file that locates it.

You can pass the --debug flag to show sqlite_backup logs:

$ browserexport --debug save -b firefox --to .
[D 220202 10:10:22 common:87] Glob /home/sean/.mozilla/firefox with */places.sqlite (non recursive) matched [PosixPath('/home/sean/.mozilla/firefox/ew9cqpqe.dev-edition-default/places.sqlite')]
[I 220202 10:10:22 save:18] backing up /home/sean/.mozilla/firefox/ew9cqpqe.dev-edition-default/places.sqlite to /home/sean/Repos/browserexport/firefox-20220202181022.sqlite
[D 220202 10:10:22 core:110] Source database files: '['/tmp/tmpcn6gpj1v/places.sqlite', '/tmp/tmpcn6gpj1v/places.sqlite-wal']'
[D 220202 10:10:22 core:111] Temporary Destination database files: '['/tmp/tmpcn6gpj1v/places.sqlite', '/tmp/tmpcn6gpj1v/places.sqlite-wal']'
[D 220202 10:10:22 core:64] Copied from '/home/sean/.mozilla/firefox/ew9cqpqe.dev-edition-default/places.sqlite' to '/tmp/tmpcn6gpj1v/places.sqlite' successfully; copied without file changing: True
[D 220202 10:10:22 core:64] Copied from '/home/sean/.mozilla/firefox/ew9cqpqe.dev-edition-default/places.sqlite-wal' to '/tmp/tmpcn6gpj1v/places.sqlite-wal' successfully; copied without file changing: True
[D 220202 10:10:22 core:230] Running backup, from '/tmp/tmpcn6gpj1v/places.sqlite' to '/home/sean/Repos/browserexport/firefox-20220202181022.sqlite'
[D 220202 10:10:22 save:14] Copied 1840 of 1840 database pages...
[D 220202 10:10:22 core:246] Executing 'wal_checkpoint(TRUNCATE)' on destination '/home/sean/Repos/browserexport/firefox-20220202181022.sqlite'
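The log shows the save step copying `places.sqlite` and its `-wal` file to a temporary directory, running an SQLite backup into the destination, then executing `wal_checkpoint(TRUNCATE)`. The core of that can be sketched with the standard-library `sqlite3` online backup API. This is a simplified illustration, not sqlite_backup's implementation: `backup_database` is a hypothetical name, and the sketch skips the temporary-copy step that guards against the browser writing while the files are copied.

```python
import sqlite3
from pathlib import Path


def backup_database(src: Path, dest: Path) -> int:
    """Copy src into dest with SQLite's online backup API; return the page count."""
    total_pages = 0

    def progress(status: int, remaining: int, total: int) -> None:
        nonlocal total_pages
        total_pages = total  # mirrors "Copied N of N database pages"

    # Open the source read-only so the original database is never modified
    source = sqlite3.connect(f"{src.resolve().as_uri()}?mode=ro", uri=True)
    target = sqlite3.connect(str(dest))
    try:
        source.backup(target, progress=progress)
        # Fold any write-ahead-log contents into the main file, then truncate it
        target.execute("PRAGMA wal_checkpoint(TRUNCATE)")
    finally:
        source.close()
        target.close()
    return total_pages
```

Using the backup API rather than a plain file copy yields a consistent snapshot even if the source database has uncheckpointed WAL data.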

For Firefox Android, backing up the database from Fenix (at data/data/org.mozilla.fenix/files/places.sqlite) requires a rooted Android phone.

inspect/merge

Usage: browserexport inspect [OPTIONS] SQLITE_DB

  Extracts visits from a single sqlite database

  Provide a history database as the first argument
  Drops you into a REPL to access the data

Options:
  -s, --stream  Stream JSON objects instead of printing a JSON list
  -j, --json    Print result to STDOUT as JSON
  --help        Show this message and exit.

Usage: browserexport merge [OPTIONS] SQLITE_DB...

  Extracts visits from multiple sqlite databases

  Provide multiple sqlite databases as positional arguments, e.g.:
  browserexport merge ~/data/firefox/*.sqlite

  Drops you into a REPL to access the data

Options:
  -s, --stream  Stream JSON objects instead of printing a JSON list
  -j, --json    Print result to STDOUT as JSON
  --help        Show this message and exit.

Logs are hidden by default. To show the debug logs, run export BROWSEREXPORT_LOGS=10 (the value is a Python logging level) or pass the --debug flag.
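`BROWSEREXPORT_LOGS` takes a numeric level from Python's `logging` module, where `10` is `logging.DEBUG` (20 is INFO, 30 is WARNING). A sketch of turning such an environment variable into a logger level; `level_from_env` and its WARNING fallback are assumptions for illustration, not browserexport's own code.

```python
import logging
import os
from typing import Mapping, Optional


def level_from_env(env: Optional[Mapping[str, str]] = None,
                   var: str = "BROWSEREXPORT_LOGS",
                   default: int = logging.WARNING) -> int:
    """Read a numeric logging level (e.g. "10" for DEBUG) from the environment."""
    env = os.environ if env is None else env
    raw = env.get(var)
    if raw is None or not raw.strip().isdigit():
        return default  # unset or non-numeric: keep the quiet default
    return int(raw)


logger = logging.getLogger("example")
logger.setLevel(level_from_env())
```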

As an example:

browserexport --debug merge ~/data/firefox/* ~/data/chrome/*
[D 210417 21:12:18 merge:38] merging information from 24 sources...
[D 210417 21:12:18 parse:19] Reading visits from /home/sean/data/firefox/places-20200828223058.sqlite...
[D 210417 21:12:18 common:40] Chrome: Running detector query 'SELECT * FROM keyword_search_terms'
[D 210417 21:12:18 common:40] Firefox: Running detector query 'SELECT * FROM moz_meta'
[D 210417 21:12:18 parse:22] Detected as Firefox
[D 210417 21:12:19 parse:19] Reading visits from /home/sean/data/firefox/places-20201010031025.sqlite...
[D 210417 21:12:19 common:40] Chrome: Running detector query 'SELECT * FROM keyword_search_terms'
....
[D 210417 21:12:48 common:40] Firefox: Running detector query 'SELECT * FROM moz_meta'
[D 210417 21:12:48 common:40] Safari: Running detector query 'SELECT * FROM history_tombstones'
[D 210417 21:12:48 parse:22] Detected as Safari
[D 210417 21:12:48 merge:51] Summary: removed 3001879 duplicates...
[D 210417 21:12:48 merge:52] Summary: returning 334490 visit entries...
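The log hints at how merging works: each file appears to be identified by running per-browser detector queries in turn until one succeeds, and duplicate visits from overlapping backups are dropped. Both steps can be sketched with the standard `sqlite3` module. The detector queries are copied from the log, but the `DETECTORS` list, the function names, and the choice of `(url, dt)` as the duplicate key are assumptions for illustration.

```python
import sqlite3
from datetime import datetime
from typing import Iterable, List, Optional, Tuple

# Detector queries as they appear in the debug log, tried in this order
DETECTORS = [
    ("Chrome", "SELECT * FROM keyword_search_terms"),
    ("Firefox", "SELECT * FROM moz_meta"),
    ("Safari", "SELECT * FROM history_tombstones"),
]


def detect_browser(conn: sqlite3.Connection) -> Optional[str]:
    """Return the first browser whose detector query runs without error."""
    for name, query in DETECTORS:
        try:
            conn.execute(query)
        except sqlite3.OperationalError:
            continue  # "no such table": not this browser's schema
        return name
    return None


VisitKey = Tuple[str, datetime]  # (url, dt)


def dedupe(visits: Iterable[VisitKey]) -> List[VisitKey]:
    """Keep the first occurrence of each (url, dt) pair, preserving order."""
    seen = set()
    unique = []
    for visit in visits:
        if visit not in seen:
            seen.add(visit)
            unique.append(visit)
    return unique
```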

Use vis to interact with the data

[1] ...

To dump all that info to JSON:

browserexport merge --json ~/data/browser_history/*.sqlite > ./history.json
du -h history.json
67M     history.json
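The JSON dump can then be processed with any JSON tooling. As one example, this sketch counts visits per domain. It assumes the file holds a JSON list of objects with a `url` key, matching the Visit schema above; `load_history` and `visits_per_domain` are hypothetical names.

```python
import json
from collections import Counter
from typing import Any, Dict, List
from urllib.parse import urlparse


def load_history(path: str) -> List[Dict[str, Any]]:
    """Load the list written by `browserexport merge --json`."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def visits_per_domain(visits: List[Dict[str, Any]]) -> Counter:
    """Count visits by hostname, e.g. Counter({'github.com': 2, ...})."""
    return Counter(urlparse(v["url"]).netloc for v in visits)
```

For instance, `visits_per_domain(load_history("history.json")).most_common(10)` lists the ten most visited hosts.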

Or, to create a quick searchable interface, using jq and fzf:

browserexport merge -j --stream ~/data/browsing/*.sqlite | jq '"\(.url)|\(.metadata.description)"' | awk '!seen[$0]++' | fzf
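If jq isn't available, the same pipeline can be written in Python. This sketch reads the --stream output (assumed to be one JSON object per line), formats each visit as `url|description`, and drops repeated lines the way `awk '!seen[$0]++'` does. Unlike the jq filter it prints unquoted strings and an empty description instead of `null`. `search_lines` and `main` are hypothetical names.

```python
import json
import sys
from typing import Iterable, Iterator


def search_lines(stream: Iterable[str]) -> Iterator[str]:
    """Yield unique 'url|description' lines from newline-delimited JSON."""
    seen = set()
    for raw in stream:
        if not raw.strip():
            continue  # skip blank lines
        visit = json.loads(raw)
        # metadata may be missing or null; fall back to an empty description
        description = (visit.get("metadata") or {}).get("description") or ""
        line = f"{visit['url']}|{description}"
        if line not in seen:
            seen.add(line)
            yield line


def main() -> None:
    for line in search_lines(sys.stdin):
        print(line)
```

Saved as a script that ends with a call to `main()` (say, a hypothetical `visits_to_lines.py`), it slots in between the two ends of the pipeline: `browserexport merge -j --stream ~/data/browsing/*.sqlite | python3 visits_to_lines.py | fzf`.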

HPI

If you want to cache the merged results, there is a module in HPI which handles locating, caching and querying them. See setup and module setup.

That uses cachew to automatically cache the merged results, recomputing them whenever you back up new databases.

As a few examples:

$ hpi doctor -S my.browser.all
✅ OK  : my.browser.all
✅     - stats: {'history': {'count': 721951, 'last': datetime.datetime(2021, 4, 19, 2, 26, 8, 29825, tzinfo=datetime.timezone.utc)}}
# supports arbitrary queries, e.g. how many visits did I have in January 2022?
$ hpi query my.browser.all --order-type datetime --after '2022-01-01 00:00:00' --before '2022-01-31 23:59:59' | jq length
50432
# how many github URLs in the past month
$ hpi query my.browser.all --recent 4w -s | jq .url | grep 'github.com' -c
16357
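Both queries filter visits by their `dt` timestamp. The same filtering over visits you have already loaded can be sketched in plain Python. This sketch assumes timezone-aware `dt` values (the `hpi doctor` output shows UTC datetimes) and, for brevity, treats visits as `(url, dt)` pairs. `visits_between` and `recent_matching` are hypothetical helpers, not HPI's query implementation.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional, Tuple

VisitPair = Tuple[str, datetime]  # (url, dt)


def visits_between(visits: Iterable[VisitPair], after: datetime,
                   before: datetime) -> List[VisitPair]:
    """Visits with after <= dt <= before, sorted by datetime."""
    return sorted((v for v in visits if after <= v[1] <= before),
                  key=lambda v: v[1])


def recent_matching(visits: Iterable[VisitPair], needle: str,
                    window: timedelta = timedelta(weeks=4),
                    now: Optional[datetime] = None) -> int:
    """Count visits within `window` of `now` whose URL contains `needle`."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - window
    return sum(1 for url, dt in visits if dt >= cutoff and needle in url)
```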

Library Usage

To save databases:

from browserexport.save import backup_history

# copy the current Firefox history database into ~/data/backups
backup_history("firefox", "~/data/backups")

To merge/read visits from databases:

from browserexport.merge import read_and_merge

# collect the merged visits from every database into a list
visits = list(read_and_merge(["/path/to/database", "/path/to/second/database", "..."]))

If this doesn't support a browser and you wish to extend it quickly without maintaining a fork (or contributing back to this repo), you can pass a Browser implementation (see browsers/all.py and browsers/common.py for more info) to browserexport.parse.read_visits, or programmatically override/add your own browsers as part of the browserexport.browsers namespace package.

Comparisons with Promnesia

A lot of the initial queries/ideas here were taken from promnesia and the browser_history.py script, but creating a package here allows it to be more extensible, e.g. allowing you to override/locate additional databases.

The primary goals of promnesia and this project are quite different. This is a tiny subset of that project: it replaces the sources/browser.py file with a package, while promnesia is an entire system to load data sources and use a browser extension to search/interface with your past data.

Eventually this project may be used in promnesia to replace the browser.py file

Testing

git clone https://github.com/seanbreckenridge/browserexport
cd ./browserexport
pip install '.[testing]'
pytest
flake8 ./browserexport
mypy ./browserexport