dwisiswant0 / galer

Licence: MIT License
A fast tool to fetch URLs from HTML attributes by crawl-in.

Programming Languages

shell
go

Projects that are alternatives of or similar to galer

Jd mask robot
JD.com face-mask stock monitoring crawler (non-Selenium): QR-code login, price check, add to cart, order placement, flash-sale purchase
Stars: ✭ 216 (+56.52%)
Mutual labels:  crawler, spider
Ppspider
Web spider framework built on puppeteer; supports flexible task queues and task scheduling via decorators, convenient data storage (nedb / mongodb), and data visualization with user interaction
Stars: ✭ 237 (+71.74%)
Mutual labels:  crawler, spider
Chromium for spider
dynamic crawler for web vulnerability scanner
Stars: ✭ 220 (+59.42%)
Mutual labels:  crawler, spider
Jssoup
JavaScript + BeautifulSoup = JSSoup
Stars: ✭ 203 (+47.1%)
Mutual labels:  crawler, spider
flink-crawler
Continuous scalable web crawler built on top of Flink and crawler-commons
Stars: ✭ 48 (-65.22%)
Mutual labels:  crawler, spider
Colly
Elegant Scraper and Crawler Framework for Golang
Stars: ✭ 15,535 (+11157.25%)
Mutual labels:  crawler, spider
ZhengFang System Spider
🐛 A small crawler that logs in to the ZhengFang academic affairs management system and scrapes data
Stars: ✭ 21 (-84.78%)
Mutual labels:  crawler, spider
Fooproxy
Robust, efficient, score-based, targeted IP proxy pool with an API service. You can plug in your own collectors to crawl proxy IPs, and it builds a separate database of valid proxies for each of your crawler's target sites. Supports MongoDB 4.0; uses Python 3.7.
Stars: ✭ 195 (+41.3%)
Mutual labels:  crawler, spider
crawler
A simple and flexible web crawler framework for java.
Stars: ✭ 20 (-85.51%)
Mutual labels:  crawler, spider
Magic google
Google search results crawler, get google search results that you need
Stars: ✭ 247 (+78.99%)
Mutual labels:  crawler, spider
Querylist
🕷️ The elegant, progressive PHP crawler framework!
Stars: ✭ 2,392 (+1633.33%)
Mutual labels:  crawler, spider
slime
🍰 A visual crawler platform
Stars: ✭ 27 (-80.43%)
Mutual labels:  crawler, spider
Zhihuspider
Multithreaded Zhihu user crawler, based on Python 3
Stars: ✭ 201 (+45.65%)
Mutual labels:  crawler, spider
Webvideobot
Web crawler.
Stars: ✭ 214 (+55.07%)
Mutual labels:  crawler, spider
Ok ip proxy pool
🍿 Crawler proxy IP pool in Python 🍟 A reasonably OK IP proxy pool
Stars: ✭ 196 (+42.03%)
Mutual labels:  crawler, spider
Laravel Crawler Detect
A Laravel wrapper for CrawlerDetect - the web crawler detection library
Stars: ✭ 227 (+64.49%)
Mutual labels:  crawler, spider
Marmot
💐Marmot | Web Crawler/HTTP protocol Download Package 🐭
Stars: ✭ 186 (+34.78%)
Mutual labels:  crawler, spider
Goribot
[Crawler/Scraper for Golang] 🕷 A lightweight, distributed-friendly Golang crawler framework.
Stars: ✭ 190 (+37.68%)
Mutual labels:  crawler, spider
Fast Lianjia Crawler
A blazing-fast crawler that fetches data directly through the Lianjia API, the fastest in the universe~~ 🚀
Stars: ✭ 247 (+78.99%)
Mutual labels:  crawler, spider
arachnod
High performance crawler for Nodejs
Stars: ✭ 17 (-87.68%)
Mutual labels:  crawler, spider

galer

             __
   __    _ _(_ )   __  _ __ 
 /'_ '\/'_' )| | /'__'( '__)
( (_) ( (_| || |(  ___| |
'\__  '\__,_(___'\____(_)
( )_) |
 \___/'  @dwisiswant0

A fast tool to fetch URLs from HTML attributes by crawl-in. Inspired by @omespino's tweet, which showed that src, href, url, and action values can be extracted by evaluating JavaScript through the Chrome DevTools Protocol.


Installation

from Binary

Installation is easy. Download a prebuilt binary from the releases page, unpack it, and run it, or install with:

▶ (sudo) curl -sSfL https://git.io/galer | sh -s -- -b /usr/local/bin

from Source

If you have the Go 1.15+ compiler installed and configured:

▶ GO111MODULE=on go get github.com/dwisiswant0/galer

from GitHub

▶ git clone https://github.com/dwisiswant0/galer
▶ cd galer
▶ go build .
▶ (sudo) mv galer /usr/local/bin

Usage

Basic Usage

In its simplest form, galer can be run with:

▶ galer -u "http://domain.tld"

Flags

▶ galer -h

This will display help for the tool. Here are all the switches it supports.

Flag               Description
-u, --url          Target to fetch (a single URL or a file containing a list of URLs)
-e, --extension    Show only certain extensions (comma-separated, e.g. js,php)
-c, --concurrency  Concurrency level (default: 50)
--in-scope         Show only in-scope URLs (same host)
-o, --output       Save fetched URLs to a file
-t, --timeout      Maximum time (seconds) allowed for connection (default: 60)
-s, --silent       Silent mode (suppress errors)
-v, --verbose      Verbose mode (show error details unless silent mode is enabled)
-h, --help         Display help
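The `--in-scope` and `-e` filters can be pictured with the sketch below. It assumes that "in-scope" means "same hostname as the target" and that an extension is matched against the URL path (ignoring the query string); galer's actual matching rules may differ. The helper names `sameHost` and `hasExtension` are hypothetical and not part of galer.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
	"strings"
)

// sameHost reports whether candidate has the same hostname as target.
// Assumption: this is what "in-scope" means; subdomains count as out of scope.
func sameHost(target, candidate string) bool {
	t, err := url.Parse(target)
	if err != nil {
		return false
	}
	c, err := url.Parse(candidate)
	if err != nil {
		return false
	}
	return t.Hostname() != "" && strings.EqualFold(t.Hostname(), c.Hostname())
}

// hasExtension reports whether the URL path ends in one of the
// comma-separated extensions (for example "js,php").
func hasExtension(rawURL, extensions string) bool {
	u, err := url.Parse(rawURL)
	if err != nil {
		return false
	}
	ext := strings.TrimPrefix(path.Ext(u.Path), ".")
	if ext == "" {
		return false
	}
	for _, want := range strings.Split(extensions, ",") {
		if strings.EqualFold(ext, strings.TrimSpace(want)) {
			return true
		}
	}
	return false
}

func main() {
	target := "http://domain.tld"
	found := []string{
		"http://domain.tld/static/app.js?v=2",
		"http://cdn.example.com/lib.js",
		"http://domain.tld/index.php",
		"http://domain.tld/about",
	}
	// Keep only same-host URLs with a .js or .php extension.
	for _, u := range found {
		if sameHost(target, u) && hasExtension(u, "js,php") {
			fmt.Println(u)
		}
	}
}
```

Parsing with `net/url` before checking the extension keeps query strings such as `?v=2` from hiding the file type.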

Examples

Single URL

▶ galer -u "http://domain.tld"

URLs from list

▶ galer -u /path/to/urls.txt

from Stdin

▶ cat urls.txt | galer

To chain galer with other tools:

▶ subfinder -d domain.tld -silent | httpx -silent | galer

Library

You can use galer as a library.

▶ go get github.com/dwisiswant0/galer/pkg/galer

For example:

package main

import (
	"fmt"

	"github.com/dwisiswant0/galer/pkg/galer"
)

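func main() {
	// 60 matches the CLI's default timeout (-t).
	cfg := &galer.Config{
		Timeout: 60,
	}
	cfg = galer.New(cfg)

	// Crawl the target and collect the URLs found on the page.
	run, err := cfg.Crawl("https://twitter.com")
	if err != nil {
		panic(err)
	}

	// Print each fetched URL on its own line.
	for _, url := range run {
		fmt.Println(url)
	}
}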

TODOs

  • Allow setting extra HTTP headers
  • Provide a random User-Agent
  • Bypass headless browser
  • Add exclusions for specific extensions

Help & Bugs

If you are still confused or have found a bug, please open an issue. All bug reports are appreciated; some features have not been tested yet due to lack of free time.

License

galer is released under the MIT License. See LICENSE for more details.

Version

The current version is 0.0.2 and still under development.

Pronunciation

id_ID/gäˈlər/: "if you galer, don't sniff the smell, go wash your hands right away, dummy!"

Acknowledgement

  • Omar Espino for the idea, that's why this tool was made!