A web crawler. Supercrawler automatically crawls websites. Define custom handlers to parse content. Obeys robots.txt, rate limits and concurrency limits.

✭ 306

javascript crawler robot sitemap web-crawler

Toapi

Every web site provides APIs.

✭ 3,209

python html api json web flask crawler spider toapi

Go Dork

The fastest dork scanner written in Go.

✭ 274

go golang security crawler infosec bugbounty vulnerability-scanners

Hquery.php

An extremely fast web scraper that parses megabytes of invalid HTML in a blink of an eye. PHP5.3+, no dependencies.

✭ 295

html parser crawler xml fast scraper html-parser xml-parser

Ghcrawler

Crawl GitHub APIs and store the discovered orgs, repos, commits, ...

✭ 293

javascript github data crawler github-api

Python Automation Scripts

Simple yet powerful automation scripts.

✭ 292

python pdf crawler instagram images selenium-webdriver pdf-converter instagram-scraper beautifulsoup

Weixin Spider

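A WeChat official account crawler that collects historical articles, article comments, and article read and "Wow" counts, with a web page for visualization; deployable on Windows servers. Built on Python 3 with flask, mysql, redis, mitmproxy, and pywin32. An efficient WeChat crawler with data updates.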

✭ 287

python python3 api flask wechat crawler spider weixin article mitmproxy

Sasila

A flexible, friendly crawler framework.

✭ 286

python framework http crawler scraping requests crawling

Gospider

A crawler framework implemented in Golang. Users only need to care about page rules; a web management interface is provided. Built on colly.

✭ 285

go golang crawler spider

Crawlertutorial

A minimal crawler tutorial (fetch, parse, search, multiprocessing, API), using PTT as the example.

✭ 282

python api tutorial search crawler spider parse api-wrapper multiprocessing

Scrapy Crawlera

Crawlera middleware for Scrapy

✭ 281

python plugin proxy crawler scrapy scraping

Dotnetspider

DotnetSpider, a .NET Standard web crawling library. It is a lightweight, efficient, and fast high-level web crawling & scraping framework.

✭ 3,233

C# HTML CSS javascript TSQL shell cross-platform crawler distributed dotnetcore

Hacker News Digest

📰 A responsive interface of Hacker News with summaries and thumbnails.

✭ 278

python html machine-learning crawler spider rss article content hacker-news topic

Gopa

[WIP] GOPA, a spider written in Golang, for Elasticsearch. DEMO: http://index.elasticsearch.cn

✭ 277

go elasticsearch crawler spider lightweight scraping web-scraping crawling web-crawler

Sitemap Generator

Easily create XML sitemaps for your website.

✭ 273

javascript google crawler seo sitemap

Rcrawler

An R web crawler and scraper

✭ 274

r crawler scraper webscraping

Arachni

Web Application Security Scanner Framework

✭ 2,942

javascript ruby HTML hacking crawler penetration-testing detection scanner analysis dom security-audit modular web-application xss audit vulnerability-detection sql-injection arachni scanners

Line Bot Tutorial

A LINE bot tutorial using Python and Flask.

✭ 267

python tutorial bot crawler heroku line

Bt Btt

Introduction to the magnet-link site U3C3, with domain updates.

✭ 261

crawler spider download bittorrent tracker magnet magnet-link

Weibo terminator workflow

An updated version of weibo_terminator. This is the workflow version, aimed at getting the job done!

✭ 259

python nlp crawler scraper sentiment-analysis

Tumblr crawler

A multi-threaded crawler for Tumblr.

✭ 258

python crawler tumblr

Spidy

The simple, easy to use command line web crawler.

✭ 257

python python3 crawler crawling web-crawler

Skycaiji

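Skycaiji (蓝天采集器) is a free crawler for collecting and publishing data, developed in PHP and MySQL and deployable on cloud servers. It can collect almost every type of web page, integrates seamlessly with various CMS site builders, publishes data in real time without login, and runs fully automatically with no manual intervention. Among web big-data collection software, it is a fully cross-platform, cloud-based crawler system.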

✭ 1,514

PHP SCSS Less Smarty PLpgSQL crawler spider crawling webcrawler

lightnovel epub

🍭 epub generator for (light) novels. Supported sites: 轻之国度 and 轻小说文库.

✭ 89

python cli opencv crawler scraper ebook lightnovel epub novel uiautomator lk wenku8

galer

A fast tool that fetches URLs from HTML attributes by crawling in.

✭ 138

shell go crawler spider extractor url-parser devtool url-extractor galer waybackurls

octopus

Recursive and multi-threaded broken link checker

✭ 19

javascript crawler links checker broken
