
cornelk / Goscrape

Licence: mit

Web scraper that can create an offline readable version of a website

Programming language: golang

Labels: scraper

Projects that are alternatives of or similar to Goscrape

Botvid 19

Messenger Bot that scrapes for COVID-19 data and periodically updates subscribers via Facebook Messages. Created using Python/Flask, MYSQL, HTML, Heroku

Stars: ✭ 34 (-50.72%)

Mutual labels: scraper

Karate

Webscraper

Stars: ✭ 45 (-34.78%)

Mutual labels: scraper

Warta Scrap

Indonesia Index News Crawler, including 10 online media

Stars: ✭ 57 (-17.39%)

Mutual labels: scraper

Avbook

AV movie management system; crawlers for avmoo, javbus, and javlibrary; online AV film library; AV magnet link database; Japanese Adult Video Library, Adult Video Magnet Links - Japanese Adult Video Database

Stars: ✭ 8,133 (+11686.96%)

Mutual labels: scraper

Django Dynamic Scraper

Creating Scrapy scrapers via the Django admin interface

Stars: ✭ 1,024 (+1384.06%)

Mutual labels: scraper

Scrapstagram

An Instagram Scraper

Stars: ✭ 50 (-27.54%)

Mutual labels: scraper

Pypatent

Search for and retrieve US Patent and Trademark Office Patent Data

Stars: ✭ 31 (-55.07%)

Mutual labels: scraper

Pastebin Scraper

Live-scraping pastebin to fight boredom.

Stars: ✭ 66 (-4.35%)

Mutual labels: scraper

Repository.kodibae

Kodi Bae Repository - Kodi is a registered trademark of the XBMC Foundation. We are not connected to or in any other way affiliated with Kodi - DMCA: [email protected]

Stars: ✭ 45 (-34.78%)

Mutual labels: scraper

Tangerine

Tangerine Bank scraper

Stars: ✭ 54 (-21.74%)

Mutual labels: scraper

Chinese Xinhua

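📙 Chinese Xinhua Dictionary database, including xiehouyu (two-part allegorical sayings), idioms, words, and Chinese characters.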

Stars: ✭ 8,705 (+12515.94%)

Mutual labels: scraper

Public Instagram

Tool to fetch Instagram's public content.

Stars: ✭ 43 (-37.68%)

Mutual labels: scraper

Pitchfork Npm

An Unofficial Pitchfork Music API client for Node.js

Stars: ✭ 50 (-27.54%)

Mutual labels: scraper

Serp

Google Search SERP Scraper

Stars: ✭ 40 (-42.03%)

Mutual labels: scraper

Bad Robo

🐙 Get Daily 400-500 Real Followers 👽 [BadRobo] is Best Instagram Bot Available Now with All Features!. Our BOT did not violate any of Instagram's rules, so you don't have to worry about getting ACTION BLOCK!

Stars: ✭ 59 (-14.49%)

Mutual labels: scraper

Real Estate Scraper

Web scraper that makes it easier to find real estate in Slovenia.

Stars: ✭ 31 (-55.07%)

Mutual labels: scraper

Social Scraper

A collection of scripts for crawling data from social networks and Vietnamese-language websites

Stars: ✭ 47 (-31.88%)

Mutual labels: scraper

Pitchfork

🎶 Unofficial python API for pitchfork.com reviews.

Stars: ✭ 67 (-2.9%)

Mutual labels: scraper

Scrape

Distributed Scraper

Stars: ✭ 65 (-5.8%)

Mutual labels: scraper

Anutimetable

Intuitive timetable builder for the Australian National University.

Stars: ✭ 52 (-24.64%)

Mutual labels: scraper


goscrape

A web scraper built with Golang. It downloads the content of a website or blog and allows you to read it offline.

Features and advantages over existing tools like wget, httrack, Teleport Pro:

Free and open source
Available for all platforms that Golang supports
JPEG and PNG images can be re-encoded at a lower quality to save disk space
Excluded URLs will not be fetched (unlike wget)
No incomplete temp files are left on disk
Downloaded asset files are skipped in a new scraper run
Assets from external domains are downloaded automatically
Sane default values
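
As an illustration of the image quality feature above, here is a minimal sketch of re-encoding a JPEG at a lower quality using only Go's standard library. It is not taken from goscrape's source; the helper names `reencodeJPEG` and `sampleJPEG` are hypothetical, and the real implementation may handle PNG files and errors differently.

```go
package main

import (
	"bytes"
	"fmt"
	"image"
	"image/color"
	"image/jpeg"
)

// reencodeJPEG decodes a JPEG and encodes it again at the given quality (1-100).
// Hypothetical helper for illustration; not goscrape's actual code.
func reencodeJPEG(data []byte, quality int) ([]byte, error) {
	img, err := jpeg.Decode(bytes.NewReader(data))
	if err != nil {
		return nil, fmt.Errorf("decoding JPEG: %w", err)
	}
	var buf bytes.Buffer
	if err := jpeg.Encode(&buf, img, &jpeg.Options{Quality: quality}); err != nil {
		return nil, fmt.Errorf("encoding JPEG: %w", err)
	}
	return buf.Bytes(), nil
}

// sampleJPEG builds a small synthetic image in memory so the example needs no files.
func sampleJPEG(quality int) ([]byte, error) {
	img := image.NewRGBA(image.Rect(0, 0, 64, 64))
	for y := 0; y < 64; y++ {
		for x := 0; x < 64; x++ {
			img.Set(x, y, color.RGBA{R: uint8(x * 4), G: uint8(y * 4), B: uint8((x * y) % 256), A: 255})
		}
	}
	var buf bytes.Buffer
	if err := jpeg.Encode(&buf, img, &jpeg.Options{Quality: quality}); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	src, err := sampleJPEG(100)
	if err != nil {
		panic(err)
	}
	out, err := reencodeJPEG(src, 20)
	if err != nil {
		panic(err)
	}
	fmt.Printf("original: %d bytes, re-encoded: %d bytes\n", len(src), len(out))
}
```

Lower quality values trade visual fidelity for smaller files, which is presumably why the tool exposes the quality as a setting rather than fixing it.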

Limitations:

No GUI version, console only

Installation

Golang must be installed. If it is not, follow the guide at https://golang.org/doc/install first.

go get github.com/cornelk/goscrape

Usage

goscrape http://website.com

Options

Scrape a website and create an offline browsable version on the disk

Usage:
  goscrape http://website.com [flags]

Flags:
      --config string         config file (default is $HOME/.goscrape.yaml)
  -d, --depth uint            download depth, 0 for unlimited (default 10)
  -x, --exclude stringArray   exclude URLs with PERL Regular Expressions support
  -h, --help                  help for goscrape
  -i, --imagequality int      image quality, 0 to disable reencoding
  -n, --include stringArray   only include URLs with PERL Regular Expressions support
  -o, --output string         output directory to write files to
  -t, --timeout uint          time limit in seconds for each http request to connect and read the request body
  -u, --user string           user[:password] to use for authentication
  -v, --verbose               verbose output
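
To show how include and exclude patterns like those accepted by `--include` and `--exclude` might interact, here is a rough sketch of a URL filter. It rests on several assumptions: patterns are compiled with Go's `regexp` package (RE2 syntax, which does not cover every Perl feature), an exclude match always wins, and an empty include list allows everything. The `urlFilter` type and its methods are hypothetical and not part of goscrape's API.

```go
package main

import (
	"fmt"
	"regexp"
)

// urlFilter holds compiled include and exclude patterns.
// Hypothetical type for illustration; goscrape's internals may differ.
type urlFilter struct {
	include []*regexp.Regexp
	exclude []*regexp.Regexp
}

// compileAll compiles each pattern and reports which one failed, if any.
func compileAll(patterns []string) ([]*regexp.Regexp, error) {
	var out []*regexp.Regexp
	for _, p := range patterns {
		re, err := regexp.Compile(p)
		if err != nil {
			return nil, fmt.Errorf("pattern %q: %w", p, err)
		}
		out = append(out, re)
	}
	return out, nil
}

func newURLFilter(include, exclude []string) (*urlFilter, error) {
	inc, err := compileAll(include)
	if err != nil {
		return nil, err
	}
	exc, err := compileAll(exclude)
	if err != nil {
		return nil, err
	}
	return &urlFilter{include: inc, exclude: exc}, nil
}

// allowed reports whether a URL should be fetched.
// Assumed semantics: exclude wins; an empty include list allows all URLs.
func (f *urlFilter) allowed(url string) bool {
	for _, re := range f.exclude {
		if re.MatchString(url) {
			return false
		}
	}
	if len(f.include) == 0 {
		return true
	}
	for _, re := range f.include {
		if re.MatchString(url) {
			return true
		}
	}
	return false
}

func main() {
	f, err := newURLFilter([]string{`^http://website\.com/blog/`}, []string{`\.zip$`})
	if err != nil {
		panic(err)
	}
	for _, u := range []string{
		"http://website.com/blog/post.html",
		"http://website.com/blog/archive.zip",
		"http://website.com/shop/item.html",
	} {
		fmt.Println(u, f.allowed(u))
	}
}
```

Checking the filter before issuing a request is what would let excluded URLs be skipped entirely rather than downloaded and discarded.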

Dependencies

github.com/gorilla/css css file tokenizer
github.com/h2non/filetype image format identification
github.com/hashicorp/go-multierror multi error wrapping
github.com/headzoo/surf virtual web browser
github.com/PuerkitoBio/goquery HTML document traversal
github.com/spf13/cobra command line handling
github.com/spf13/viper configuration
go.uber.org/zap logging


Stars: ✭ 69
