
gnur / Demeter

Licence: gpl-3.0
Demeter is a tool for scraping the Calibre web UI.

Programming language: Go


demeter

demeter is a tool for downloading the .epub files you don't have from a Calibre library. It does this by building a database of books it has seen based on some clever algorithms. At least, that's the idea.

(Demeter only allows scraping a host every 12 hours to prevent overloading the server.)
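
As an illustration of the 12-hour limit, the check could look roughly like the sketch below. This is not demeter's actual code: the names canScrape, scrapeInterval, and exampleNow are made up for the example, and treating a host with no recorded scrape as immediately eligible is an assumption.

```go
package main

import (
	"fmt"
	"time"
)

// scrapeInterval mirrors the 12-hour limit described above.
const scrapeInterval = 12 * time.Hour

// exampleNow is an arbitrary fixed timestamp so the demo is reproducible.
var exampleNow = time.Date(2024, time.January, 2, 12, 0, 0, 0, time.UTC)

// canScrape reports whether a host last scraped at lastScrape may be
// scraped again at now. A zero lastScrape is taken to mean "never
// scraped" (an assumption of this sketch, not confirmed by the README).
func canScrape(lastScrape, now time.Time) bool {
	if lastScrape.IsZero() {
		return true
	}
	return now.Sub(lastScrape) >= scrapeInterval
}

func main() {
	fmt.Println(canScrape(exampleNow.Add(-13*time.Hour), exampleNow)) // true
	fmt.Println(canScrape(exampleNow.Add(-1*time.Hour), exampleNow))  // false
	fmt.Println(canScrape(time.Time{}, exampleNow))                   // true
}
```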

Installation and Usage

Download the appropriate demeter binary for your platform from the releases page.

This is a standalone binary; there's no need to install any dependencies.

Move it somewhere in your $PATH so you can call it with demeter.

Add a Host

demeter host add http://example.com:8080

Scrape all hosts, store the results in the directory ./books, and download only files with the pdf extension

demeter scrape run -d books -e pdf

For the rest, use the built-in help.

This tool can be used for whatever you want, enjoy.

important note regarding extensions

The -e flag on the scrape run command only affects that specific run; books are stored in the database without any extension information. In practice, if you switch from -e epub (the default) to -e mobi, only books that are new to the database will be downloaded, in mobi format. Books that were already present will not be re-downloaded in a different format.
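
A minimal sketch of why this happens, assuming the database records each book under an identifier that carries no extension. The seen type and the shouldDownload method are hypothetical and only illustrate the lookup; they are not taken from demeter's source.

```go
package main

import "fmt"

// seen stands in for demeter's internal database: it records books by an
// identifier only, with no extension attached (hypothetical structure).
type seen map[string]bool

// shouldDownload reports whether a book still needs downloading. The
// requested extension plays no part in the lookup, which is why changing
// -e does not re-fetch books that are already known.
func (s seen) shouldDownload(bookID, ext string) bool {
	return !s[bookID]
}

func main() {
	db := seen{}

	// First run with -e epub: book-1 is unknown, so it is fetched and recorded.
	fmt.Println(db.shouldDownload("book-1", "epub")) // true
	db["book-1"] = true

	// Later run with -e mobi: book-1 is skipped, book-2 is new.
	fmt.Println(db.shouldDownload("book-1", "mobi")) // false
	fmt.Println(db.shouldDownload("book-2", "mobi")) // true
}
```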

Database

Demeter builds an internal database that is stored in ~/.demeter/demeter.db.
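
If you want to locate that file from your own Go tooling, the path can be resolved along these lines. The helper name dbPath is made up for this sketch; only the ~/.demeter/demeter.db location comes from the text above.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// dbPath returns the database location under the given home directory
// (hypothetical helper, not part of demeter).
func dbPath(home string) string {
	return filepath.Join(home, ".demeter", "demeter.db")
}

func main() {
	home, err := os.UserHomeDir()
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot determine home directory:", err)
		os.Exit(1)
	}
	fmt.Println(dbPath(home))
}
```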

Scraping

When scraping a host, demeter does the following:

  • Use the API to collect all book ids
  • Check if there are new book ids since the previous scrape
  • Use the API to get the details for all the new book ids
  • Check the internal db to see if a book has already been downloaded
  • Download the book if it hasn't been, and add it to the internal db
  • Mark the host as scraped so it won't do it again within 12 hours
  • If the host failed, mark it as failed and disable it after a while
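
The bookkeeping in the steps above can be sketched as follows. This is a simplified illustration rather than demeter's implementation: the host type, its fields, and the helper names are hypothetical, the maxFailures threshold is an assumption (the text only says "after a while"), and resetting the failure count on success is also assumed.

```go
package main

import "fmt"

// maxFailures is an assumed threshold for disabling a host.
const maxFailures = 3

// host holds the per-host state this sketch needs (hypothetical type).
type host struct {
	URL      string
	KnownIDs map[int]bool // book ids seen in earlier scrapes
	Failures int
	Active   bool
}

// newIDs returns the ids in all that were not seen in a previous scrape.
func (h *host) newIDs(all []int) []int {
	var fresh []int
	for _, id := range all {
		if !h.KnownIDs[id] {
			fresh = append(fresh, id)
		}
	}
	return fresh
}

// toDownload keeps only the books not yet recorded in the internal db.
func toDownload(books []string, downloaded map[string]bool) []string {
	var todo []string
	for _, b := range books {
		if !downloaded[b] {
			todo = append(todo, b)
		}
	}
	return todo
}

// recordResult updates the failure bookkeeping after a scrape attempt and
// disables the host once it has failed maxFailures times in a row.
func (h *host) recordResult(ok bool) {
	if ok {
		h.Failures = 0 // assumption: a success clears earlier failures
		return
	}
	h.Failures++
	if h.Failures >= maxFailures {
		h.Active = false
	}
}

func main() {
	h := &host{
		URL:      "http://example.com:8080",
		KnownIDs: map[int]bool{1: true, 2: true},
		Active:   true,
	}
	fmt.Println(h.newIDs([]int{1, 2, 3, 4})) // [3 4]

	downloaded := map[string]bool{"book-3": true}
	fmt.Println(toDownload([]string{"book-3", "book-4"}, downloaded)) // [book-4]

	for i := 0; i < maxFailures; i++ {
		h.recordResult(false)
	}
	fmt.Println(h.Active) // false
}
```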

all commands

$ demeter -h
demeter is CLI application for scraping calibre hosts and
retrieving books in epub format that are not in your local library.

Usage:
  demeter [command]

Available Commands:
  dl          download related commands
  help        Help about any command
  host        all host related commands
  scrape      all scrape related commands

$ demeter dl -h
download related commands

Usage:
  demeter dl [command]

Aliases:
  dl, download, downloads, dls

Available Commands:
  add          add a number of hashes to the database
  deleterecent delete all downloads from this time period
  list         list all downloads

$ demeter host -h
all host related commands

Usage:
  demeter host [command]

Available Commands:
  add         add a host to the scrape list
  disable     disable a host
  enabled     make a host active
  list        list all hosts
  rm          delete a host
  stats       Get host stats

$ demeter scrape -h
all scrape related commands

Usage:
  demeter scrape [command]

Available Commands:
  run         run all scrape jobs

$ demeter scrape run -h
demeter scrape run -h
Go over all defined hosts and if the last scrape
is old enough it will scrape that host.

Usage:
  demeter scrape run [flags]

Flags:
  -h, --help               help for run
  -d, --outputdir string   path to downloaded books to (default "books")
  -n, --stepsize int       number of books to request per query (default 50)
  -u, --useragent string   user agent used to identify to calibre hosts (default "demeter / v1")
  -w, --workers int        number of workers to concurrently download books (default 10)
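
To give a feel for what the --stepsize and --workers flags control, here is a rough sketch of batching ids into queries and downloading with a fixed pool of workers. The function names batches and downloadAll and the fetch callback are invented for the example; demeter's internals may be organised differently.

```go
package main

import (
	"fmt"
	"sync"
)

// batches splits ids into chunks of at most step items, the way a
// stepsize of 50 limits how many books are requested per query.
func batches(ids []int, step int) [][]int {
	if step <= 0 {
		return nil
	}
	var out [][]int
	for start := 0; start < len(ids); start += step {
		end := start + step
		if end > len(ids) {
			end = len(ids)
		}
		out = append(out, ids[start:end])
	}
	return out
}

// downloadAll runs fetch for every id using a fixed number of workers and
// returns how many fetches succeeded. fetch is a hypothetical callback.
func downloadAll(ids []int, workers int, fetch func(id int) bool) int {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan int)
	var (
		wg sync.WaitGroup
		mu sync.Mutex
		ok int
	)
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for id := range jobs {
				if fetch(id) {
					mu.Lock()
					ok++
					mu.Unlock()
				}
			}
		}()
	}
	for _, id := range ids {
		jobs <- id
	}
	close(jobs)
	wg.Wait()
	return ok
}

func main() {
	ids := make([]int, 120)
	for i := range ids {
		ids[i] = i + 1
	}
	for _, b := range batches(ids, 50) {
		fmt.Println(len(b)) // 50, then 50, then 20
	}
	// Pretend that only even-numbered books download successfully.
	n := downloadAll(ids, 10, func(id int) bool { return id%2 == 0 })
	fmt.Println(n) // 60
}
```

A larger worker count puts more concurrent load on the remote host, so the defaults are a sensible starting point.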