jaeles-project / Gospider
License: MIT
Gospider - Fast web spider written in Go
Stars: ✭ 785
Programming Languages
go
Projects that are alternatives of or similar to Gospider
Netdiscovery
NetDiscovery is a general-purpose crawler framework/middleware built on frameworks such as Vert.x and RxJava 2.
Stars: ✭ 573 (-27.01%)
Mutual labels: crawler, spider
Xsrfprobe
The Prime Cross Site Request Forgery (CSRF) Audit and Exploitation Toolkit.
Stars: ✭ 532 (-32.23%)
Mutual labels: crawler, spider
Awesome Crawler
A collection of awesome web crawlers and spiders in different languages
Stars: ✭ 4,793 (+510.57%)
Mutual labels: crawler, spider
Creeper
🐾 Creeper - The Next Generation Crawler Framework (Go)
Stars: ✭ 762 (-2.93%)
Mutual labels: crawler, spider
Haipproxy
💖 Highly available distributed IP proxy pool, powered by Scrapy and Redis
Stars: ✭ 4,993 (+536.05%)
Mutual labels: crawler, spider
Xxl Crawler
A distributed web crawler framework (XXL-CRAWLER).
Stars: ✭ 561 (-28.54%)
Mutual labels: crawler, spider
Spidr
A versatile Ruby web spidering library that can spider a site, multiple domains, certain links or infinitely. Spidr is designed to be fast and easy to use.
Stars: ✭ 656 (-16.43%)
Mutual labels: crawler, spider
Crawly
Crawly, a high-level web crawling & scraping framework for Elixir.
Stars: ✭ 440 (-43.95%)
Mutual labels: crawler, spider
Grab Site
The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns
Stars: ✭ 680 (-13.38%)
Mutual labels: crawler, spider
Signature algorithm
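Request-signing or encryption algorithms for various apps, mini programs, and websites. Currently covers: Ziroom (自如), Xiaohongshu (小红书), Danke Apartment (蛋壳公寓), luckin coffee, and bangkokair (Bangkok Airways).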
Stars: ✭ 380 (-51.59%)
Mutual labels: crawler, spider
Bilili
🍻 bilibili video (including bangumi) and danmaku downloader
Stars: ✭ 379 (-51.72%)
Mutual labels: crawler, spider
Douyin
API of DouYin for Humans, used to crawl popular videos and music
Stars: ✭ 580 (-26.11%)
Mutual labels: crawler, spider
Icrawler
A multi-threaded crawler framework with many built-in image crawlers.
Stars: ✭ 629 (-19.87%)
Mutual labels: crawler, spider
GoSpider
GoSpider - Fast web spider written in Go
Want to painlessly integrate Gospider into your recon workflow?
Enjoying this tool? Support its development and take your game to the next level by using HunterSuite.io
Installation
go get -u github.com/jaeles-project/gospider
Features
- Fast web crawling
- Brute force and parse sitemap.xml
- Parse robots.txt
- Generate and verify links from JavaScript files
- Link Finder
- Find AWS S3 buckets in response source
- Find subdomains in response source
- Get URLs from Wayback Machine, Common Crawl, VirusTotal, AlienVault
- Output format that is easy to grep
- Support Burp input
- Crawl multiple sites in parallel
- Random mobile/web User-Agent
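As an illustration of the "find subdomains in response source" feature, the following is a minimal sketch of how hostnames under a root domain could be pulled out of a response body with a regular expression. The helper name `findSubdomains` and the sample HTML are hypothetical, and this is not gospider's actual implementation. The pattern also does not check what follows the root domain, so it is a rough filter rather than a validator.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// findSubdomains returns the unique, lowercased hostnames found in body
// that consist of one or more DNS labels followed by "."+domain.
// Hypothetical helper for illustration only.
func findSubdomains(body, domain string) []string {
	// One or more labels, each followed by a dot, then the escaped root domain.
	pattern := `(?i)(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+` + regexp.QuoteMeta(domain)
	re := regexp.MustCompile(pattern)

	seen := map[string]bool{}
	var out []string
	for _, m := range re.FindAllString(body, -1) {
		host := strings.ToLower(m)
		if !seen[host] {
			seen[host] = true
			out = append(out, host)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	// Sample response source (made up for this example).
	body := `<a href="https://api.example.com/v1">API</a>
<script src="//cdn.example.com/app.js"></script>
<img src="https://API.example.com/logo.png">`

	for _, host := range findSubdomains(body, "example.com") {
		fmt.Println(host)
	}
}
```

Escaping the root domain with `regexp.QuoteMeta` keeps the dots in `example.com` from matching arbitrary characters.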
Usage
Fast web spider written in Go - v1.1.2 by @thebl4ckturtle & @j3ssiejjj

Usage:
  gospider [flags]

Flags:
  -s, --site string              Site to crawl
  -S, --sites string             Site list to crawl
  -p, --proxy string             Proxy (Ex: http://127.0.0.1:8080)
  -o, --output string            Output folder
  -u, --user-agent string        User Agent to use
                                   web: random web user-agent
                                   mobi: random mobile user-agent
                                   or you can set your special user-agent (default "web")
      --cookie string            Cookie to use (testA=a; testB=b)
  -H, --header stringArray       Header to use (Use multiple flag to set multiple header)
      --burp string              Load headers and cookie from burp raw http request
      --blacklist string         Blacklist URL Regex
  -t, --threads int              Number of threads (Run sites in parallel) (default 1)
  -c, --concurrent int           The number of the maximum allowed concurrent requests of the matching domains (default 5)
  -d, --depth int                MaxDepth limits the recursion depth of visited URLs. (Set it to 0 for infinite recursion) (default 1)
  -k, --delay int                Delay is the duration to wait before creating a new request to the matching domains (second)
  -K, --random-delay int         RandomDelay is the extra randomized duration to wait added to Delay before creating a new request (second)
  -m, --timeout int              Request timeout (second) (default 10)
  -B, --base                     Disable all and only use HTML content
      --js                       Enable linkfinder in javascript file (default true)
      --sitemap                  Try to crawl sitemap.xml
      --robots                   Try to crawl robots.txt (default true)
  -a, --other-source             Find URLs from 3rd party (Archive.org, CommonCrawl.org, VirusTotal.com, AlienVault.com)
  -w, --include-subs             Include subdomains crawled from 3rd party. Default is main domain
  -r, --include-other-source     Also include other-source's urls (still crawl and request)
      --debug                    Turn on debug mode
  -v, --verbose                  Turn on verbose
  -q, --quiet                    Suppress all the output and only show URL
      --no-redirect              Disable redirect
      --version                  Check version
  -h, --help                     help for gospider
Example commands
Quiet output
gospider -q -s "https://google.com/"
Run with single site
gospider -s "https://google.com/" -o output -c 10 -d 1
Run with site list
gospider -S sites.txt -o output -c 10 -d 1
Run 20 sites at the same time with 10 bots per site
gospider -S sites.txt -o output -c 10 -d 1 -t 20
Also get URLs from third-party sources (Archive.org, CommonCrawl.org, VirusTotal.com, AlienVault.com)
gospider -s "https://google.com/" -o output -c 10 -d 1 --other-source
Also get URLs from third-party sources (Archive.org, CommonCrawl.org, VirusTotal.com, AlienVault.com) and include subdomains
gospider -s "https://google.com/" -o output -c 10 -d 1 --other-source --include-subs
Use custom header/cookies
gospider -s "https://google.com/" -o output -c 10 -d 1 --other-source -H "Accept: */*" -H "Test: test" --cookie "testA=a; testB=b"
gospider -s "https://google.com/" -o output -c 10 -d 1 --other-source --burp burp_req.txt
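The `--burp` flag loads headers and cookies from a raw HTTP request saved from Burp. As a rough sketch of what reading such a file involves, Go's standard `http.ReadRequest` can parse a raw request and expose its headers and cookies. The helper name `parseRawRequest` and the sample request are hypothetical, and this is not necessarily how gospider parses the file.

```go
package main

import (
	"bufio"
	"fmt"
	"net/http"
	"strings"
)

// parseRawRequest parses a raw HTTP/1.x request and returns its headers
// and cookies. The request must end its header section with a blank line.
// Hypothetical helper for illustration only.
func parseRawRequest(raw string) (http.Header, []*http.Cookie, error) {
	req, err := http.ReadRequest(bufio.NewReader(strings.NewReader(raw)))
	if err != nil {
		return nil, nil, err
	}
	return req.Header, req.Cookies(), nil
}

func main() {
	// Made-up request, similar to what a burp_req.txt file could contain.
	raw := "GET /search?q=test HTTP/1.1\r\n" +
		"Host: example.com\r\n" +
		"Accept: */*\r\n" +
		"Test: test\r\n" +
		"Cookie: testA=a; testB=b\r\n" +
		"\r\n"

	headers, cookies, err := parseRawRequest(raw)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("Accept:", headers.Get("Accept"))
	for _, c := range cookies {
		fmt.Printf("cookie %s=%s\n", c.Name, c.Value)
	}
}
```

Note that `http.ReadRequest` moves the `Host` header into the request's `Host` field, so it does not appear in the returned header map.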
Blacklist URL/file extension.
Note: by default, gospider blacklists .(jpg|jpeg|gif|css|tif|tiff|png|ttf|woff|woff2|ico)
gospider -s "https://google.com/" -o output -c 10 -d 1 --blacklist ".(woff|pdf)"
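A point worth checking when writing a blacklist pattern: in a regular expression, an unescaped `.` matches any character, so `.(woff|pdf)` matches more than file extensions. The sketch below compares that pattern with a stricter one, assuming the blacklist is applied as an unanchored Go `regexp` match against the full URL (an assumption, not confirmed by the text above). The helper `isBlacklisted`, the stricter pattern, and the sample URLs are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// isBlacklisted reports whether url matches pattern anywhere in the string.
// Hypothetical helper; assumes an unanchored regexp match.
func isBlacklisted(pattern, url string) bool {
	return regexp.MustCompile(pattern).MatchString(url)
}

func main() {
	loose := `.(woff|pdf)`            // "." matches any character
	strict := `\.(woff|pdf)(\?|$)`    // literal dot, extension ends the path

	urls := []string{
		"https://example.com/fonts/a.woff",
		"https://example.com/docs/report.pdf?dl=1",
		"https://example.com/pdfs/index.html",
	}
	for _, u := range urls {
		fmt.Printf("%-45s loose=%-5t strict=%t\n",
			u, isBlacklisted(loose, u), isBlacklisted(strict, u))
	}
}
```

The third URL is skipped by the loose pattern because `/pdf` satisfies "any character followed by pdf", while the strict pattern lets it through.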
License
Gospider is made with ♥ by @j3ssiejjj & @thebl4ckturtle and it is released under the MIT license.