cornelk / Goscrape
Licence: MIT
Web scraper that can create an offline readable version of a website
Stars: ✭ 69
Projects that are alternatives of or similar to Goscrape
Botvid 19
Messenger Bot that scrapes for COVID-19 data and periodically updates subscribers via Facebook Messages. Created using Python/Flask, MYSQL, HTML, Heroku
Stars: ✭ 34 (-50.72%)
Mutual labels: scraper
Warta Scrap
Indonesia Index News Crawler, including 10 online media
Stars: ✭ 57 (-17.39%)
Mutual labels: scraper
Avbook
AV movie management system; crawler for avmoo, javbus, and javlibrary; online AV video library; AV magnet link database; Japanese Adult Video Library, Adult Video Magnet Links - Japanese Adult Video Database
Stars: ✭ 8,133 (+11686.96%)
Mutual labels: scraper
Django Dynamic Scraper
Creating Scrapy scrapers via the Django admin interface
Stars: ✭ 1,024 (+1384.06%)
Mutual labels: scraper
Pypatent
Search for and retrieve US Patent and Trademark Office Patent Data
Stars: ✭ 31 (-55.07%)
Mutual labels: scraper
Repository.kodibae
Kodi Bae Repository - Kodi is a registered trademark of the XBMC Foundation. We are not connected to or in any other way affiliated with Kodi - DMCA: [email protected]
Stars: ✭ 45 (-34.78%)
Mutual labels: scraper
Public Instagram
Tool to fetch Instagram's public content.
Stars: ✭ 43 (-37.68%)
Mutual labels: scraper
Pitchfork Npm
An Unofficial Pitchfork Music API client for Node.js
Stars: ✭ 50 (-27.54%)
Mutual labels: scraper
Bad Robo
🐙 Get Daily 400-500 Real Followers 👽 [BadRobo] is Best Instagram Bot Available Now with All Features!. Our BOT did not violate any of Instagram's rules, so you don't have to worry about getting ACTION BLOCK!
Stars: ✭ 59 (-14.49%)
Mutual labels: scraper
Real Estate Scraper
Web scraper that makes it easier to find real estate in Slovenia.
Stars: ✭ 31 (-55.07%)
Mutual labels: scraper
Social Scraper
Collection of scripts for crawling data from Vietnamese-language social networks and websites
Stars: ✭ 47 (-31.88%)
Mutual labels: scraper
Pitchfork
🎶 Unofficial python API for pitchfork.com reviews.
Stars: ✭ 67 (-2.9%)
Mutual labels: scraper
Anutimetable
Intuitive timetable builder for the Australian National University.
Stars: ✭ 52 (-24.64%)
Mutual labels: scraper
goscrape
A web scraper built with Golang. It downloads the content of a website or blog and allows you to read it offline.
Features and advantages over existing tools like wget, httrack, Teleport Pro:
- Free and open source
- Available for all platforms that Golang supports
- JPEG and PNG images can be re-encoded at a lower quality to save disk space
- Excluded URLs will not be fetched (unlike wget)
- No incomplete temp files are left on disk
- Downloaded asset files are skipped in a new scraper run
- Assets from external domains are downloaded automatically
- Sane default values
Limitations:
- No GUI version, console only
Installation
You need to have Golang installed. If it is not installed yet, follow the guide at https://golang.org/doc/install.
go get github.com/cornelk/goscrape
Usage
goscrape http://website.com
Options
Scrape a website and create an offline browsable version on the disk
Usage:
goscrape http://website.com [flags]
Flags:
      --config string         config file (default is $HOME/.goscrape.yaml)
  -d, --depth uint            download depth, 0 for unlimited (default 10)
  -x, --exclude stringArray   exclude URLs with PERL Regular Expressions support
  -h, --help                  help for goscrape
  -i, --imagequality int      image quality, 0 to disable reencoding
  -n, --include stringArray   only include URLs with PERL Regular Expressions support
  -o, --output string         output directory to write files to
  -t, --timeout uint          time limit in seconds for each http request to connect and read the request body
  -u, --user string           user[:password] to use for authentication
  -v, --verbose               verbose output
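To make the interplay of the --include and --exclude flags more concrete, here is a minimal sketch of regular-expression URL filtering. It is an illustration under assumptions, not goscrape's code: the `urlFilter` type and its methods are hypothetical, the rule that an exclude match wins over an include match is assumed, and the sketch uses Go's standard `regexp` package, whose RE2 syntax lacks some Perl features such as backreferences and lookaround.

```go
package main

import (
	"fmt"
	"regexp"
)

// urlFilter holds compiled include and exclude patterns.
// Hypothetical type for illustration only.
type urlFilter struct {
	includes []*regexp.Regexp
	excludes []*regexp.Regexp
}

// compileAll compiles each pattern and reports the first invalid one.
func compileAll(kind string, patterns []string) ([]*regexp.Regexp, error) {
	var out []*regexp.Regexp
	for _, p := range patterns {
		re, err := regexp.Compile(p)
		if err != nil {
			return nil, fmt.Errorf("%s pattern %q: %w", kind, p, err)
		}
		out = append(out, re)
	}
	return out, nil
}

func newURLFilter(includes, excludes []string) (*urlFilter, error) {
	inc, err := compileAll("include", includes)
	if err != nil {
		return nil, err
	}
	exc, err := compileAll("exclude", excludes)
	if err != nil {
		return nil, err
	}
	return &urlFilter{includes: inc, excludes: exc}, nil
}

// shouldFetch reports whether a URL passes the filter.
// Assumed precedence: any exclude match rejects the URL; otherwise the URL
// must match an include pattern if at least one include pattern was given.
func (f *urlFilter) shouldFetch(url string) bool {
	for _, re := range f.excludes {
		if re.MatchString(url) {
			return false
		}
	}
	if len(f.includes) == 0 {
		return true
	}
	for _, re := range f.includes {
		if re.MatchString(url) {
			return true
		}
	}
	return false
}

func main() {
	filter, err := newURLFilter([]string{`^http://website\.com/blog/`}, []string{`\.zip$`})
	if err != nil {
		panic(err)
	}
	urls := []string{
		"http://website.com/blog/post-1",
		"http://website.com/blog/archive.zip",
		"http://website.com/about",
	}
	for _, u := range urls {
		fmt.Printf("%-40s fetch=%v\n", u, filter.shouldFetch(u))
	}
}
```

Checking the filter before issuing the request is what allows excluded URLs to be skipped entirely rather than downloaded and discarded.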
Dependencies
- github.com/gorilla/css css file tokenizer
- github.com/h2non/filetype image format identification
- github.com/hashicorp/go-multierror multi error wrapping
- github.com/headzoo/surf virtual web browser
- github.com/PuerkitoBio/goquery HTML document traversal
- github.com/spf13/cobra command line handling
- github.com/spf13/viper configuration
- go.uber.org/zap logging