jaeles-project / Gospider

License: MIT
Gospider - Fast web spider written in Go

Programming Languages

go
31211 projects - #10 most used programming language

Projects that are alternatives to or similar to Gospider

Go jobs
An introduction to the Golang job market
Stars: ✭ 526 (-32.99%)
Mutual labels:  crawler, spider
Netdiscovery
NetDiscovery is a general-purpose crawler framework/middleware built on Vert.x, RxJava 2, and other frameworks.
Stars: ✭ 573 (-27.01%)
Mutual labels:  crawler, spider
Xsrfprobe
The Prime Cross Site Request Forgery (CSRF) Audit and Exploitation Toolkit.
Stars: ✭ 532 (-32.23%)
Mutual labels:  crawler, spider
Awesome Crawler
A collection of awesome web crawlers and spiders in different languages
Stars: ✭ 4,793 (+510.57%)
Mutual labels:  crawler, spider
Creeper
🐾 Creeper - The Next Generation Crawler Framework (Go)
Stars: ✭ 762 (-2.93%)
Mutual labels:  crawler, spider
Haipproxy
💖 Highly available distributed IP proxy pool, powered by Scrapy and Redis
Stars: ✭ 4,993 (+536.05%)
Mutual labels:  crawler, spider
Xxl Crawler
A distributed web crawler framework (XXL-CRAWLER)
Stars: ✭ 561 (-28.54%)
Mutual labels:  crawler, spider
Gosint
OSINT Swiss Army Knife
Stars: ✭ 401 (-48.92%)
Mutual labels:  crawler, spider
Baiduimagespider
A super-lightweight Baidu image crawler
Stars: ✭ 591 (-24.71%)
Mutual labels:  crawler, spider
Newcrawler
Free Web Scraping Tool with Java
Stars: ✭ 589 (-24.97%)
Mutual labels:  crawler, spider
Learnpython
Basic Python practice code and assorted crawler scripts
Stars: ✭ 451 (-42.55%)
Mutual labels:  crawler, spider
Spidr
A versatile Ruby web spidering library that can spider a site, multiple domains, certain links or infinitely. Spidr is designed to be fast and easy to use.
Stars: ✭ 656 (-16.43%)
Mutual labels:  crawler, spider
Crawly
Crawly, a high-level web crawling & scraping framework for Elixir.
Stars: ✭ 440 (-43.95%)
Mutual labels:  crawler, spider
Grab Site
The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns
Stars: ✭ 680 (-13.38%)
Mutual labels:  crawler, spider
Html2article
HTML page body-text extraction
Stars: ✭ 441 (-43.82%)
Mutual labels:  crawler, spider
Fbcrawl
A Facebook crawler
Stars: ✭ 536 (-31.72%)
Mutual labels:  crawler, spider
Signature algorithm
Request-signing and encryption algorithms for various apps, mini-programs, and websites. Currently includes: Ziroom, Xiaohongshu, Danke Apartment, luckin coffee, and bangkokair (Bangkok Airways)
Stars: ✭ 380 (-51.59%)
Mutual labels:  crawler, spider
Bilili
🍻 bilibili video (including bangumi) and danmaku downloader
Stars: ✭ 379 (-51.72%)
Mutual labels:  crawler, spider
Douyin
A DouYin API for humans, used to crawl popular videos and music
Stars: ✭ 580 (-26.11%)
Mutual labels:  crawler, spider
Icrawler
A multi-threaded crawler framework with many built-in image crawlers.
Stars: ✭ 629 (-19.87%)
Mutual labels:  crawler, spider

GoSpider

GoSpider - Fast web spider written in Go

Want to integrate Gospider painlessly into your recon workflow?

huntersuite

Enjoying this tool? Support its development and take your game to the next level by using HunterSuite.io

Installation

go get -u github.com/jaeles-project/gospider
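
Note: on Go 1.17 and later, go get no longer builds and installs binaries; use go install with an explicit version instead:

go install github.com/jaeles-project/gospider@latest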

Features

  • Fast web crawling
  • Brute force and parse sitemap.xml
  • Parse robots.txt
  • Generate and verify links from JavaScript files
  • Link Finder
  • Find AWS S3 buckets in response sources
  • Find subdomains in response sources
  • Get URLs from Wayback Machine, Common Crawl, VirusTotal, AlienVault
  • Grep-friendly output (see the filtering sketch after this list)
  • Support Burp input
  • Crawl multiple sites in parallel
  • Random mobile/web User-Agent
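
Because the output is grep-friendly, each result can be filtered by type with standard tools. A minimal sketch, assuming the default tagged output in which each line starts with a bracketed tag such as [subdomains] or [aws-s3] (the exact tag names are an assumption and may vary by version):

gospider -s "https://example.com/" -c 10 -d 2 | grep "\[subdomains\]"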

Showcases

(asciicast demo recording)

Usage

Fast web spider written in Go - v1.1.2 by @thebl4ckturtle & @j3ssiejjj

Usage:
  gospider [flags]

Flags:
  -s, --site string            Site to crawl
  -S, --sites string           Site list to crawl
  -p, --proxy string           Proxy (Ex: http://127.0.0.1:8080)
  -o, --output string          Output folder
  -u, --user-agent string      User Agent to use
                               	web: random web user-agent
                               	mobi: random mobile user-agent
                               	or you can set your special user-agent (default "web")
      --cookie string          Cookie to use (testA=a; testB=b)
  -H, --header stringArray     Header to use (Use multiple flag to set multiple header)
      --burp string            Load headers and cookie from burp raw http request
      --blacklist string       Blacklist URL Regex
  -t, --threads int            Number of threads (Run sites in parallel) (default 1)
  -c, --concurrent int         The number of the maximum allowed concurrent requests of the matching domains (default 5)
  -d, --depth int              MaxDepth limits the recursion depth of visited URLs. (Set it to 0 for infinite recursion) (default 1)
  -k, --delay int              Delay is the duration to wait before creating a new request to the matching domains (second)
  -K, --random-delay int       RandomDelay is the extra randomized duration to wait added to Delay before creating a new request (second)
  -m, --timeout int            Request timeout (second) (default 10)
  -B, --base                   Disable all and only use HTML content
      --js                     Enable linkfinder in javascript file (default true)
      --sitemap                Try to crawl sitemap.xml
      --robots                 Try to crawl robots.txt (default true)
  -a, --other-source           Find URLs from 3rd party (Archive.org, CommonCrawl.org, VirusTotal.com, AlienVault.com)
  -w, --include-subs           Include subdomains crawled from 3rd party. Default is main domain
  -r, --include-other-source   Also include other-source's urls (still crawl and request)
      --debug                  Turn on debug mode
  -v, --verbose                Turn on verbose
  -q, --quiet                  Suppress all the output and only show URL
      --no-redirect            Disable redirect
      --version                Check version
  -h, --help                   help for gospider

Example commands

Quiet output

gospider -q -s "https://google.com/"

Run with single site

gospider -s "https://google.com/" -o output -c 10 -d 1

Run with site list

gospider -S sites.txt -o output -c 10 -d 1

Run 20 sites at the same time with 10 bots per site

gospider -S sites.txt -o output -c 10 -d 1 -t 20

Also get URLs from 3rd party (Archive.org, CommonCrawl.org, VirusTotal.com, AlienVault.com)

gospider -s "https://google.com/" -o output -c 10 -d 1 --other-source

Also get URLs from 3rd party (Archive.org, CommonCrawl.org, VirusTotal.com, AlienVault.com) and include subdomains

gospider -s "https://google.com/" -o output -c 10 -d 1 --other-source --include-subs

Use custom headers/cookies

gospider -s "https://google.com/" -o output -c 10 -d 1 --other-source -H "Accept: */*" -H "Test: test" --cookie "testA=a; testB=b"

gospider -s "https://google.com/" -o output -c 10 -d 1 --other-source --burp burp_req.txt
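
The --burp flag reads headers and cookies from a raw HTTP request saved from Burp Suite. As an illustration, a burp_req.txt could look like this (hypothetical values):

GET / HTTP/1.1
Host: google.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64)
Cookie: testA=a; testB=b
Accept: */*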

Blacklist URL/file extensions.

P/S: gospider blacklists .(jpg|jpeg|gif|css|tif|tiff|png|ttf|woff|woff2|ico) by default

gospider -s "https://google.com/" -o output -c 10 -d 1 --blacklist ".(woff|pdf)"
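
Since --blacklist takes a regular expression, the pattern can also be escaped and anchored so that only true file extensions at the end of a URL match, e.g.:

gospider -s "https://google.com/" -o output -c 10 -d 1 --blacklist "\.(woff|pdf)$"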

License

Gospider is made with ♥ by @j3ssiejjj & @thebl4ckturtle and is released under the MIT license.

Donation

(PayPal donation button)
