
kabelsea / go-scrapy

Licence: other
Web crawling and scraping framework for Golang

Programming Languages

  • Go
  • Makefile

Projects that are alternatives to or similar to go-scrapy

Antch
Antch, a fast, powerful and extensible web crawling & scraping framework for Go
Stars: ✭ 198 (+1064.71%)
Mutual labels:  scraping, crawling
wget-lua
Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.
Stars: ✭ 52 (+205.88%)
Mutual labels:  scraping, crawling
Colly
Elegant Scraper and Crawler Framework for Golang
Stars: ✭ 15,535 (+91282.35%)
Mutual labels:  scraping, crawling
Scrapy
Scrapy, a fast high-level web crawling & scraping framework for Python.
Stars: ✭ 42,343 (+248976.47%)
Mutual labels:  scraping, crawling
diffbot-php-client
[Deprecated - Maintenance mode - use APIs directly please!] The official Diffbot client library
Stars: ✭ 53 (+211.76%)
Mutual labels:  scraping, crawling
Awesome Puppeteer
A curated list of awesome puppeteer resources.
Stars: ✭ 1,728 (+10064.71%)
Mutual labels:  scraping, crawling
zcrawl
An open source web crawling platform
Stars: ✭ 21 (+23.53%)
Mutual labels:  scraping, crawling
Easy Scraping Tutorial
Simple but useful Python web scraping tutorial code.
Stars: ✭ 583 (+3329.41%)
Mutual labels:  scraping, crawling
socials
👨‍👩‍👦 Social account detection and extraction in Python, e.g. for crawling/scraping.
Stars: ✭ 37 (+117.65%)
Mutual labels:  scraping, crawling
double-agent
A test suite of common scraper detection techniques. See how detectable your scraper stack is.
Stars: ✭ 123 (+623.53%)
Mutual labels:  scraping, crawling
Dotnetcrawler
DotnetCrawler is a straightforward, lightweight web crawling/scraping library for Entity Framework Core output, built on .NET Core. The library is designed like other strong crawler libraries such as WebMagic and Scrapy, but allows you to extend it for your custom requirements. Medium link: https://medium.com/@mehmetozkaya/creating-custom-web-crawler-with-dotnet-core-using-entity-framework-core-ec8d23f0ca7c
Stars: ✭ 100 (+488.24%)
Mutual labels:  scraping, crawling
crawling-framework
Easily crawl news portals or blog sites using Storm Crawler.
Stars: ✭ 22 (+29.41%)
Mutual labels:  scraping, crawling
Grawler
Grawler is a tool written in PHP that comes with a web interface and automates the task of using Google dorks, scraping the results, and storing them in a file.
Stars: ✭ 98 (+476.47%)
Mutual labels:  scraping, crawling
Linkedin Profile Scraper
🕵️‍♂️ LinkedIn profile scraper returning structured profile data in JSON. Works in 2020.
Stars: ✭ 171 (+905.88%)
Mutual labels:  scraping, crawling
Lulu
[Unmaintained] A simple and clean video/music/image downloader 👾
Stars: ✭ 789 (+4541.18%)
Mutual labels:  scraping, crawling
Memorious
Distributed crawling framework for documents and structured data.
Stars: ✭ 248 (+1358.82%)
Mutual labels:  scraping, crawling
Ferret
Declarative web scraping
Stars: ✭ 4,837 (+28352.94%)
Mutual labels:  scraping, crawling
Headless Chrome Crawler
Distributed crawler powered by Headless Chrome
Stars: ✭ 5,129 (+30070.59%)
Mutual labels:  scraping, crawling
scrape-github-trending
Tutorial for web scraping / crawling with Node.js.
Stars: ✭ 42 (+147.06%)
Mutual labels:  scraping, crawling
scrapy-fieldstats
A Scrapy extension to log items coverage when the spider shuts down
Stars: ✭ 17 (+0%)
Mutual labels:  scraping, crawling


go-scrapy

A Scrapy implementation in Go. (Work in progress)

Overview

go-scrapy is a web crawling framework for Go, used to crawl websites and extract structured data from parsed pages.

Requirements

  • Golang 1.x - 1.9.x
  • Works on Linux, Windows, Mac OSX, BSD

Installation

Install:

go get github.com/kabelsea/go-scrapy

Import:

import scrapy "github.com/kabelsea/go-scrapy/scrapy"

Quickstart

package main

import (
  "log"

  scrapy "github.com/kabelsea/go-scrapy/scrapy"
)

func main() {
  // Init spider configuration
  config := &scrapy.SpiderConfig{
    Name:               "HabraBot",
    MaxDepth:           5,
    ConcurrentRequests: 20,
    StartUrls: []string{
      "https://habrahabr.ru/",
    },
    Rules: []scrapy.Rule{
      // Follow article links without processing them
      {
        LinkExtractor: &scrapy.LinkExtractor{
          Allow:        []string{`^/post/\d+/$`},
          AllowDomains: []string{`^habrahabr\.ru`},
        },
        Follow: true,
      },
      // Pass user profile pages to the ProcessItem handler
      {
        LinkExtractor: &scrapy.LinkExtractor{
          Allow:        []string{`^/users/[^/]+/$`},
          AllowDomains: []string{`^habrahabr\.ru`},
        },
        Handler: ProcessItem,
      },
    },
  }

  // Create a new spider
  spider, err := scrapy.NewSpider(config)
  if err != nil {
    panic(err)
  }

  // Run the spider and wait until it finishes
  spider.Wait()
}

// ProcessItem handles each crawled page matched by a rule
func ProcessItem(resp *scrapy.Response) {
  log.Println("Process item:", resp.Url, resp.StatusCode)
}
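
A handler is where you turn a crawled page into structured data. The sketch below shows one way to extract a page title from raw HTML using only the standard regexp package. It is a minimal illustration: the extractTitle helper is not part of go-scrapy, and feeding it the fetched page body is an assumption, since the example above only shows that scrapy.Response exposes Url and StatusCode.

package main

import (
  "fmt"
  "regexp"
)

// titleRe captures the contents of the first <title> element,
// case-insensitively and across newlines.
var titleRe = regexp.MustCompile(`(?is)<title>(.*?)</title>`)

// extractTitle returns the page title from a raw HTML payload, or an
// empty string when no <title> element is present. Inside a real
// handler you would feed it the fetched page body (hypothetical here;
// check the actual scrapy.Response type for the field that exposes it).
func extractTitle(html []byte) string {
  if m := titleRe.FindSubmatch(html); m != nil {
    return string(m[1])
  }
  return ""
}

func main() {
  page := []byte(`<html><head><title>Example post</title></head></html>`)
  fmt.Println(extractTitle(page)) // Output: Example post
}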

Howto

Please go through the examples to get an idea of how to use this package.

Roadmap

  • Middlewares
  • More examples
  • Tests