
cornelk / Goscrape

Licence: MIT
Web scraper that can create an offline readable version of a website

Programming Languages

go
31211 projects - #10 most used programming language
golang
3204 projects

Projects that are alternatives of or similar to Goscrape

Botvid 19
Messenger Bot that scrapes for COVID-19 data and periodically updates subscribers via Facebook Messages. Created using Python/Flask, MYSQL, HTML, Heroku
Stars: ✭ 34 (-50.72%)
Mutual labels:  scraper
Karate
Webscraper
Stars: ✭ 45 (-34.78%)
Mutual labels:  scraper
Warta Scrap
Indonesia Index News Crawler, including 10 online media
Stars: ✭ 57 (-17.39%)
Mutual labels:  scraper
Avbook
AV movie management system; crawlers for avmoo, javbus, javlibrary; online AV video library; AV magnet link database; Japanese Adult Video Library, Adult Video Magnet Links - Japanese Adult Video Database
Stars: ✭ 8,133 (+11686.96%)
Mutual labels:  scraper
Django Dynamic Scraper
Creating Scrapy scrapers via the Django admin interface
Stars: ✭ 1,024 (+1384.06%)
Mutual labels:  scraper
Scrapstagram
An Instagram scraper
Stars: ✭ 50 (-27.54%)
Mutual labels:  scraper
Pypatent
Search for and retrieve US Patent and Trademark Office Patent Data
Stars: ✭ 31 (-55.07%)
Mutual labels:  scraper
Pastebin Scraper
Live-scraping pastebin to fight boredom.
Stars: ✭ 66 (-4.35%)
Mutual labels:  scraper
Repository.kodibae
Kodi Bae Repository - Kodi is a registered trademark of the XBMC Foundation. We are not connected to or in any other way affiliated with Kodi - DMCA: [email protected]
Stars: ✭ 45 (-34.78%)
Mutual labels:  scraper
Tangerine
Tangerine Bank scraper
Stars: ✭ 54 (-21.74%)
Mutual labels:  scraper
Chinese Xinhua
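📙 Chinese Xinhua Dictionary database, including xiehouyu (two-part allegorical sayings), idioms, words, and Chinese characters.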
Stars: ✭ 8,705 (+12515.94%)
Mutual labels:  scraper
Public Instagram
Tool to fetch Instagram's public content.
Stars: ✭ 43 (-37.68%)
Mutual labels:  scraper
Pitchfork Npm
An Unofficial Pitchfork Music API client for Node.js
Stars: ✭ 50 (-27.54%)
Mutual labels:  scraper
Serp
Google Search SERP Scraper
Stars: ✭ 40 (-42.03%)
Mutual labels:  scraper
Bad Robo
🐙 Get Daily 400-500 Real Followers 👽 [BadRobo] is Best Instagram Bot Available Now with All Features!. Our BOT did not violate any of Instagram's rules, so you don't have to worry about getting ACTION BLOCK!
Stars: ✭ 59 (-14.49%)
Mutual labels:  scraper
Real Estate Scraper
Web scraper that makes it easier to find real estate in Slovenia.
Stars: ✭ 31 (-55.07%)
Mutual labels:  scraper
Social Scraper
A collection of scripts for crawling data from social networks and Vietnamese-language websites
Stars: ✭ 47 (-31.88%)
Mutual labels:  scraper
Pitchfork
🎶 Unofficial python API for pitchfork.com reviews.
Stars: ✭ 67 (-2.9%)
Mutual labels:  scraper
Scrape
Distributed Scraper
Stars: ✭ 65 (-5.8%)
Mutual labels:  scraper
Anutimetable
Intuitive timetable builder for the Australian National University.
Stars: ✭ 52 (-24.64%)
Mutual labels:  scraper

goscrape

A web scraper built with Golang. It downloads the content of a website or blog and allows you to read it offline.

Features and advantages over existing tools like wget, httrack, Teleport Pro:

  • Free and open source
  • Available for all platforms that Golang supports
  • JPEG and PNG images can be re-encoded at a lower quality to save disk space
  • Excluded URLs will not be fetched (unlike wget)
  • No incomplete temp files are left on disk
  • Asset files that were already downloaded are skipped in later scraper runs
  • Assets from external domains are downloaded automatically
  • Sane default values
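
The image re-encoding feature above can be illustrated with a minimal sketch using only the Go standard library. This is not taken from the goscrape source: the function names reencodeJPEG and sampleJPEG are hypothetical, and keeping the original bytes when re-encoding does not shrink the file is an assumption made here for illustration.

```go
package main

import (
	"bytes"
	"fmt"
	"image"
	"image/color"
	"image/jpeg"
)

// reencodeJPEG (hypothetical helper) decodes JPEG data and encodes it again
// at the given quality (1-100). A quality of 0 returns the input unchanged,
// mirroring the "0 to disable reencoding" wording of the --imagequality flag.
func reencodeJPEG(data []byte, quality int) ([]byte, error) {
	if quality == 0 {
		return data, nil
	}
	img, err := jpeg.Decode(bytes.NewReader(data))
	if err != nil {
		return nil, fmt.Errorf("decoding jpeg: %w", err)
	}
	var buf bytes.Buffer
	if err := jpeg.Encode(&buf, img, &jpeg.Options{Quality: quality}); err != nil {
		return nil, fmt.Errorf("encoding jpeg: %w", err)
	}
	// Assumption for this sketch: keep the original when re-encoding
	// does not actually save space.
	if buf.Len() >= len(data) {
		return data, nil
	}
	return buf.Bytes(), nil
}

// sampleJPEG builds a small in-memory gradient image so the example
// runs without any files on disk.
func sampleJPEG() []byte {
	img := image.NewRGBA(image.Rect(0, 0, 64, 64))
	for y := 0; y < 64; y++ {
		for x := 0; x < 64; x++ {
			img.Set(x, y, color.RGBA{uint8(x * 4), uint8(y * 4), uint8(x ^ y), 255})
		}
	}
	var buf bytes.Buffer
	_ = jpeg.Encode(&buf, img, &jpeg.Options{Quality: 100})
	return buf.Bytes()
}

func main() {
	original := sampleJPEG()
	smaller, err := reencodeJPEG(original, 30)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("original: %d bytes, re-encoded: %d bytes\n", len(original), len(smaller))
}
```

PNG files could be handled the same way with the image/png package, although PNG is lossless, so any size reduction there would have to come from a different strategy than a quality setting.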

Limitations:

  • No GUI version, console only

Installation

You need to have Golang installed. If you do not, follow the guide at https://golang.org/doc/install.

go get github.com/cornelk/goscrape

Usage

goscrape http://website.com

Options

Scrape a website and create an offline browsable version on the disk

Usage:
  goscrape http://website.com [flags]

Flags:
      --config string         config file (default is $HOME/.goscrape.yaml)
  -d, --depth uint            download depth, 0 for unlimited (default 10)
  -x, --exclude stringArray   exclude URLs with PERL Regular Expressions support
  -h, --help                  help for goscrape
  -i, --imagequality int      image quality, 0 to disable reencoding
  -n, --include stringArray   only include URLs with PERL Regular Expressions support
  -o, --output string         output directory to write files to
  -t, --timeout uint          time limit in seconds for each http request to connect and read the request body
  -u, --user string           user[:password] to use for authentication
  -v, --verbose               verbose output
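
How the --include and --exclude flags might interact can be sketched in a few lines of Go. This is an illustration only, not the goscrape implementation: the urlFilter type and its functions are hypothetical, and the precedence (an exclude match always wins, and an empty include list allows everything) is an assumption. The patterns are compiled with Go's regexp package, which uses RE2 syntax and therefore does not support every Perl construct.

```go
package main

import (
	"fmt"
	"regexp"
)

// urlFilter (hypothetical type) holds compiled include and exclude patterns.
type urlFilter struct {
	includes []*regexp.Regexp
	excludes []*regexp.Regexp
}

// compileAll compiles every pattern and reports the first invalid one.
func compileAll(patterns []string) ([]*regexp.Regexp, error) {
	var out []*regexp.Regexp
	for _, p := range patterns {
		re, err := regexp.Compile(p)
		if err != nil {
			return nil, fmt.Errorf("pattern %q: %w", p, err)
		}
		out = append(out, re)
	}
	return out, nil
}

// newURLFilter builds a filter from raw pattern strings, such as the values
// passed to --include and --exclude.
func newURLFilter(includes, excludes []string) (*urlFilter, error) {
	inc, err := compileAll(includes)
	if err != nil {
		return nil, err
	}
	exc, err := compileAll(excludes)
	if err != nil {
		return nil, err
	}
	return &urlFilter{includes: inc, excludes: exc}, nil
}

// allowed reports whether a URL should be fetched.
// Assumed precedence: any exclude match rejects the URL; otherwise the URL
// must match at least one include pattern, if any were given.
func (f *urlFilter) allowed(url string) bool {
	for _, re := range f.excludes {
		if re.MatchString(url) {
			return false
		}
	}
	if len(f.includes) == 0 {
		return true
	}
	for _, re := range f.includes {
		if re.MatchString(url) {
			return true
		}
	}
	return false
}

func main() {
	f, err := newURLFilter([]string{`/blog/`}, []string{`\.pdf$`})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, u := range []string{
		"http://website.com/blog/post",
		"http://website.com/blog/file.pdf",
		"http://website.com/about",
	} {
		fmt.Printf("%-35s allowed=%v\n", u, f.allowed(u))
	}
}
```

Checking the filter before issuing the HTTP request is what would make excluded URLs never be fetched at all, rather than downloaded and then discarded.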
