andreaskoch / Gargantua

Licence: apache-2.0
The fast website crawler

Programming language: Go

Projects that are alternatives of or similar to Gargantua

Axegrinder
Crawl websites for accessibility issues from the command line.
Stars: ✭ 12 (-65.71%)
Mutual labels:  command-line, crawler
Google Images Download
Python Script to download hundreds of images from 'Google Images'. It is a ready-to-run code!
Stars: ✭ 7,815 (+22228.57%)
Mutual labels:  command-line
Universityrecruitment Ssurvey
Using rigorous data to answer the question "What kinds of companies recruit at what kinds of universities?"
Stars: ✭ 30 (-14.29%)
Mutual labels:  crawler
Jc
CLI tool and python library that converts the output of popular command-line tools and file-types to JSON or Dictionaries. This allows piping of output to tools like jq and simplifying automation scripts.
Stars: ✭ 967 (+2662.86%)
Mutual labels:  command-line
Autocrawler
Google, Naver multiprocess image web crawler (Selenium)
Stars: ✭ 957 (+2634.29%)
Mutual labels:  crawler
Clistats
A command line interface tool to compute statistics from a file or the command line.
Stars: ✭ 33 (-5.71%)
Mutual labels:  command-line
Yori
Yori is a CMD replacement shell that supports backquotes, job control, and improves tab completion, file matching, aliases, command history, and more.
Stars: ✭ 948 (+2608.57%)
Mutual labels:  command-line
Trash
macOS command line tool to move files to trash
Stars: ✭ 35 (+0%)
Mutual labels:  command-line
Wonders
🌈 Declarative JavaScript framework to build command-line applications.
Stars: ✭ 34 (-2.86%)
Mutual labels:  command-line
Douyin Crawler
Douyin crawler. Crawls a user's posts and likes through a mobile proxy.
Stars: ✭ 33 (-5.71%)
Mutual labels:  crawler
Verticalize
Simple tool to verticalize text delimited files.
Stars: ✭ 32 (-8.57%)
Mutual labels:  command-line
Vw Crawler
🐞 A simple, lightweight Java crawler framework. With a little knowledge of regular expressions and CSS selectors, you can collect data easily.
Stars: ✭ 32 (-8.57%)
Mutual labels:  crawler
Shell Functools
Functional programming tools for the shell
Stars: ✭ 971 (+2674.29%)
Mutual labels:  command-line
Ps Clone
A clone of unix ps program
Stars: ✭ 30 (-14.29%)
Mutual labels:  command-line
Ncrawler
Web Crawler written in C#
Stars: ✭ 34 (-2.86%)
Mutual labels:  crawler
Toutiaocrawler
Toutiao account crawler example
Stars: ✭ 30 (-14.29%)
Mutual labels:  crawler
Leboncoin Crawler
Crawler for leboncoin.fr
Stars: ✭ 32 (-8.57%)
Mutual labels:  crawler
Nodespider
[DEPRECATED] Simple, flexible, delightful web crawler/spider package
Stars: ✭ 33 (-5.71%)
Mutual labels:  crawler
Ustbcrawlers
Those years I spent crawling USTB. A focused-crawler tutorial that progresses from basic to advanced.
Stars: ✭ 35 (+0%)
Mutual labels:  crawler
Diskover
File system crawler, disk space usage, file search engine and file system analytics powered by Elasticsearch
Stars: ✭ 977 (+2691.43%)
Mutual labels:  crawler

「 gargantua 」

The fast website crawler

You can use「 gargantua 」to quickly and easily

  • warm up your frontend caches
  • perform small load-tests against your publicly available pages
  • measure response times
  • detect broken links

from your command line on Linux, macOS and Windows.
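
To illustrate the idea behind these use cases, here is a minimal Go sketch of a fixed-size worker pool that fetches URLs concurrently, records status codes and response times, and reports broken links. It is not taken from gargantua's source; the names `crawl`, `result`, `httpStatus` and `brokenLinks` are hypothetical, and the sketch runs against a local test server so that it needs no external network access.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sort"
	"sync"
	"time"
)

// result records the outcome of one request (hypothetical type).
type result struct {
	URL        string
	StatusCode int
	Duration   time.Duration
	Err        error
}

// crawl fetches every URL with a fixed number of concurrent workers
// and returns the results sorted by URL.
func crawl(urls []string, workers int, fetch func(string) (int, error)) []result {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan string)
	out := make(chan result)

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for u := range jobs {
				start := time.Now()
				code, err := fetch(u)
				out <- result{URL: u, StatusCode: code, Duration: time.Since(start), Err: err}
			}
		}()
	}

	// Feed the queue, then close the result channel once all workers are done.
	go func() {
		for _, u := range urls {
			jobs <- u
		}
		close(jobs)
		wg.Wait()
		close(out)
	}()

	var results []result
	for r := range out {
		results = append(results, r)
	}
	sort.Slice(results, func(i, j int) bool { return results[i].URL < results[j].URL })
	return results
}

// httpStatus performs a GET request and returns the response status code.
func httpStatus(url string) (int, error) {
	resp, err := http.Get(url)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

// brokenLinks returns the URLs that failed or answered with a 4xx/5xx status.
func brokenLinks(results []result) []string {
	var broken []string
	for _, r := range results {
		if r.Err != nil || r.StatusCode >= 400 {
			broken = append(broken, r.URL)
		}
	}
	return broken
}

func main() {
	// Local stand-in for a website; one path deliberately returns 404.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path == "/protocol.htm" {
			http.NotFound(w, r)
			return
		}
		fmt.Fprintln(w, "ok")
	}))
	defer srv.Close()

	urls := []string{srv.URL + "/", srv.URL + "/faq.html", srv.URL + "/protocol.htm"}
	results := crawl(urls, 5, httpStatus)
	for _, r := range results {
		fmt.Printf("%d  %v  %s\n", r.StatusCode, r.Duration, r.URL)
	}
	fmt.Println("broken:", brokenLinks(results))
}
```

Passing the fetch function as a parameter keeps the pool independent of HTTP, which makes it easy to swap in a stub when testing.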

Animation: gargantua v0.1.0 crawling a website

Note: Press Q to stop the current crawling process.

Usage

Crawl www.sitemaps.org with 5 concurrent workers:

gargantua crawl --url https://www.sitemaps.org/sitemap.xml --workers 5

See also: A short introduction video of gargantua on YouTube

Customize the user-agent

You can specify a customized user agent using the --user-agent argument:

gargantua crawl --url https://www.sitemaps.org/sitemap.xml --workers 5 --user-agent "gargantua bot / iPhone"
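
For readers curious what a custom user agent amounts to at the HTTP level, this small Go sketch sets the `User-Agent` header on a request using the standard `net/http` package. The helper name `newRequest` is hypothetical and the sketch only builds the request without sending it; it is an illustration, not gargantua's implementation.

```go
package main

import (
	"fmt"
	"net/http"
)

// newRequest builds a GET request and, if userAgent is non-empty,
// overrides the default User-Agent header with it.
func newRequest(url, userAgent string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	if userAgent != "" {
		req.Header.Set("User-Agent", userAgent)
	}
	return req, nil
}

func main() {
	req, err := newRequest("https://www.sitemaps.org/sitemap.xml", "gargantua bot / iPhone")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// The request is only inspected here, not sent.
	fmt.Println(req.Method, req.URL, req.UserAgent())
}
```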

Log all requests

You can specify a log file with the --log argument:

gargantua crawl --url https://www.sitemaps.org/sitemap.xml --workers 5 --log "gargantua.log"

Example contents of gargantua.log:

Date and time       #worker   Status Code     Bytes   Response Time   URL                                                          Parent URL
2020/11/05 09:23:14 #001:     200             4403    148.759000ms    https://www.sitemaps.org                                     https://www.sitemaps.org/ko/faq.html
2020/11/05 09:23:14 #002:     200             4403    290.536000ms    http://www.sitemaps.org/                                     https://www.sitemaps.org/ko/faq.html
2020/11/05 09:23:14 #003:     200            45077    283.243000ms    https://www.sitemaps.org/protocol.html                       https://www.sitemaps.org/ko/faq.html
2020/11/05 09:23:14 #004:     404             1245    155.376000ms    https://www.sitemaps.org/protocol.htm                        https://www.sitemaps.org/ko/faq.html
2020/11/05 09:23:14 #005:     200             4403    155.577000ms    https://www.sitemaps.org/index.html                          https://www.sitemaps.org/ko/faq.html
2020/11/05 09:23:14 #001:     200             2591    286.451000ms    http://www.sitemaps.org/schemas/sitemap/0.9/siteindex.xsd    https://www.sitemaps.org/ko/faq.html
2020/11/05 09:23:14 #003:     200            10839    143.738000ms    https://www.sitemaps.org/terms.html                          https://www.sitemaps.org/ko/faq.html
2020/11/05 09:23:14 #005:     200            15681    141.580000ms    https://www.sitemaps.org/faq.html                            https://www.sitemaps.org/ko/protocol.html
2020/11/05 09:23:14 #002:     404             1245    286.175000ms    http://www.sitemaps.org/protocol.htm                         https://www.sitemaps.org/ko/faq.html

Download

You can download binaries for Linux, macOS and Windows from github.com » andreaskoch » gargantua » releases:

Linux:

curl -L https://github.com/andreaskoch/gargantua/releases/download/v0.5.0-alpha/gargantua_linux_amd64 -o gargantua
chmod +x gargantua

macOS:

curl -L https://github.com/andreaskoch/gargantua/releases/download/v0.5.0-alpha/gargantua_darwin_amd64 -o gargantua
chmod +x gargantua

Windows:

curl -L https://github.com/andreaskoch/gargantua/releases/download/v0.5.0-alpha/gargantua_windows_amd64 -o gargantua.exe

Docker Image

There is also a Docker image that you can use to download or run the latest version of gargantua:

andreaskoch/gargantua

docker run --rm andreaskoch/gargantua:latest \
       crawl \
       --verbose \
       --url https://www.sitemaps.org/sitemap.xml \
       --workers 5

Note: When running in Docker, pass the --verbose flag to prevent the command-line UI from loading. Without it, gargantua will fail.

Roadmap

  • Increase the number of workers at runtime
  • Silent mode (only show statistics at the end)
  • CSV mode (print CSV output to stdout)
  • Web-UI
  • Save downloaded data to disk

License

「 gargantua 」is licensed under the Apache License, Version 2.0. See LICENSE for the full license text.
