A versatile Ruby web spidering library that can spider a site, multiple domains, certain links or infinitely. Spidr is designed to be fast and easy to use.

Stars: ✭ 656 (-95.78%)

Mutual labels: crawler, spider, scraper

Awesome Python Primer

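An index of high-quality Chinese-language resources for self-taught Python beginners, including books, documentation, and videos, covering crawling, Web, data analysis, and machine learning.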

Stars: ✭ 57 (-99.63%)

Mutual labels: crawler, spider, scraping

crawler-chrome-extensions

Chrome extensions commonly used by crawler developers

Stars: ✭ 53 (-99.66%)

Mutual labels: scraper, spider, scraping

Skycaiji

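Skycaiji (蓝天采集器) is free crawler software for collecting and publishing data, built with PHP and MySQL. It can be deployed on a cloud server, can collect almost any type of web page, integrates seamlessly with various CMS site builders, publishes data in real time without logging in, and runs fully automatically with no manual intervention. It is a fully cross-platform, cloud-based crawler system for web big-data collection.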

Stars: ✭ 1,514 (-90.25%)

Mutual labels: crawler, spider, crawling

diffbot-php-client

[Deprecated - Maintenance mode - use APIs directly please!] The official Diffbot client library

Stars: ✭ 53 (-99.66%)

Mutual labels: scraper, scraping, crawling

Newspaper

News, full-text, and article metadata extraction in Python 3. Advanced docs:

Stars: ✭ 11,545 (-25.68%)

Mutual labels: crawler, scraper, crawling

papercut

Papercut is a scraping/crawling library for Node.js built on top of JSDOM. It provides basic selector features together with features like Page Caching and Geosearch.

Stars: ✭ 15 (-99.9%)

Mutual labels: crawler, scraper, scraping

flink-crawler

Continuous scalable web crawler built on top of Flink and crawler-commons

Stars: ✭ 48 (-99.69%)

Mutual labels: crawler, spider, crawling

arachnod

High performance crawler for Nodejs

Stars: ✭ 17 (-99.89%)

Mutual labels: crawler, scraper, spider

Querylist

🕷️ The progressive PHP crawler framework! An elegant, progressive PHP scraping framework.

Stars: ✭ 2,392 (-84.6%)

Mutual labels: crawler, spider, scraper

proxycrawl-python

ProxyCrawl Python library for scraping and crawling

Stars: ✭ 51 (-99.67%)

Mutual labels: scraper, scraping, crawling

Freshonions Torscraper

Fresh Onions is an open source TOR spider / hidden service onion crawler hosted at zlal32teyptf4tvi.onion

Stars: ✭ 348 (-97.76%)

Mutual labels: crawler, spider, scraper

Xcrawler

A fast, simple, and powerful PHP crawler framework

Stars: ✭ 344 (-97.79%)

Mutual labels: crawler, spider, scraper

Webster

A reliable high-level web crawling & scraping framework for Node.js.

Stars: ✭ 364 (-97.66%)

Mutual labels: crawler, spider, crawling

Creeper

🐾 Creeper - The Next Generation Crawler Framework (Go)

Stars: ✭ 762 (-95.09%)

Mutual labels: crawler, spider, framework

Fbcrawl

A Facebook crawler

Stars: ✭ 536 (-96.55%)

Mutual labels: crawler, spider, scraper

Easy Scraping Tutorial

Simple but useful Python web scraping tutorial code.

Stars: ✭ 583 (-96.25%)

Mutual labels: crawler, scraping, crawling

Newcrawler

Free Web Scraping Tool with Java

Stars: ✭ 589 (-96.21%)

Mutual labels: crawler, spider, scraping

Goose Parser

Universal scraping tool, which allows you to extract data using multiple environments

Stars: ✭ 211 (-98.64%)

Mutual labels: crawler, scraper, scraping

Django Dynamic Scraper

Creating Scrapy scrapers via the Django admin interface

Stars: ✭ 1,024 (-93.41%)

Mutual labels: spider, scraper, scraping

Crawler

A high performance web crawler in Elixir.

Stars: ✭ 781 (-94.97%)

Mutual labels: crawler, spider, scraper

Scrapyrt

HTTP API for Scrapy spiders

Stars: ✭ 637 (-95.9%)

Mutual labels: crawler, scraper, crawling

Scrapit

Scraping scripts for various websites.

Stars: ✭ 25 (-99.84%)

Mutual labels: crawler, spider, scraper

Dataflowkit

Extract structured data from web sites. Web sites scraping.

Stars: ✭ 456 (-97.06%)

Mutual labels: scraper, scraping, crawling

scrapy facebooker

Collection of scrapy spiders which can scrape posts, images, and so on from public Facebook Pages.

Stars: ✭ 22 (-99.86%)

Mutual labels: scraper, spider, scraping

scrapy-distributed

A series of distributed components for Scrapy. Including RabbitMQ-based components, Kafka-based components, and RedisBloom-based components for Scrapy.

Stars: ✭ 38 (-99.76%)

Mutual labels: spider, scraping, crawling

Autoscraper

A Smart, Automatic, Fast and Lightweight Web Scraper for Python

Stars: ✭ 4,077 (-73.76%)

Mutual labels: crawler, scraper, scraping

Arachnid

Powerful web scraping framework for Crystal

Stars: ✭ 68 (-99.56%)

Mutual labels: crawler, spider, crawling

Dotnetcrawler

DotnetCrawler is a straightforward, lightweight web crawling/scraping library with Entity Framework Core output, based on .NET Core. The library is designed like other strong crawler libraries such as WebMagic and Scrapy, but can be extended to fit your custom requirements. Medium link: https://medium.com/@mehmetozkaya/creating-custom-web-crawler-with-dotnet-core-using-entity-framework-core-ec8d23f0ca7c

Stars: ✭ 100 (-99.36%)

Mutual labels: crawler, scraping, crawling

Not Your Average Web Crawler

A web crawler (for bug hunting) that gathers more than you can imagine.

Stars: ✭ 107 (-99.31%)

Mutual labels: crawler, spider, scraper

Awesome Crawler

A collection of awesome web crawlers and spiders in different languages

Stars: ✭ 4,793 (-69.15%)

Mutual labels: crawler, spider, scraper

Gosint

OSINT Swiss Army Knife

Stars: ✭ 401 (-97.42%)

Mutual labels: crawler, spider, scraper

Avbook

AV movie management system; crawlers for avmoo, javbus, and javlibrary; online AV video library; AV magnet link database. Japanese Adult Video Library, Adult Video Magnet Links - Japanese Adult Video Database

Stars: ✭ 8,133 (-47.65%)

Mutual labels: crawler, spider, scraper

Webmagic

A scalable web crawler framework for Java.

Stars: ✭ 10,186 (-34.43%)

Mutual labels: crawler, scraping, framework

Goribot

[Crawler/Scraper for Golang] 🕷 A lightweight, distributed-friendly Golang crawler framework.

Stars: ✭ 190 (-98.78%)

Mutual labels: crawler, spider, scraper

Seleniumcrawler

An example using Selenium webdrivers for Python and the Scrapy framework to create a web scraper that crawls an ASP site

Stars: ✭ 117 (-99.25%)

Mutual labels: scraper, scraping

Decryptlogin

APIs for logging in to some websites using requests.

Stars: ✭ 1,861 (-88.02%)

Mutual labels: crawler, spider

Free proxy website

A collection of websites for obtaining free SOCKS/HTTPS/HTTP proxies

Stars: ✭ 119 (-99.23%)

Mutual labels: crawler, spider

Pspider

A simple, easy-to-use Python crawler framework. QQ discussion group: 597510560

Stars: ✭ 1,611 (-89.63%)

Mutual labels: crawler, spider

Examples Of Web Crawlers

Some interesting examples of Python crawlers that are friendly to beginners, mainly crawling sites such as Taobao, Tmall, WeChat, Douban, and QQ.

Stars: ✭ 10,724 (-30.97%)

Mutual labels: crawler, spider

Awesome Puppeteer

A curated list of awesome puppeteer resources.

Stars: ✭ 1,728 (-88.88%)

Mutual labels: scraping, crawling

Crawlab Lite

Lite version of Crawlab, a crawler management platform

Stars: ✭ 122 (-99.21%)

Mutual labels: crawler, spider

Digger

Digger is a powerful and flexible web crawler implemented in pure Golang

Stars: ✭ 130 (-99.16%)

Mutual labels: crawler, spider

Weibo Topic Spider

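Weibo super-topic crawler with Weibo word-frequency statistics, sentiment analysis, and simple classification; newly added crawled data from the pneumonia super topic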

Stars: ✭ 128 (-99.18%)

Mutual labels: crawler, spider

Mm131

Image crawler for the MM131 website 🚨

Stars: ✭ 129 (-99.17%)

Mutual labels: crawler, spider

Bilibili member crawler

Bilibili user crawler. Yay, it's a crawler!

Stars: ✭ 115 (-99.26%)

Mutual labels: crawler, spider

Squidwarc

Squidwarc is a high fidelity, user scriptable, archival crawler that uses Chrome or Chromium with or without a head

Stars: ✭ 125 (-99.2%)

Mutual labels: crawler, crawling

Udemycoursegrabber

Your will to enroll in a Udemy course is here, but the money isn't? Search no more! This Python program searches for your desired course on more than [insert big number here] websites, compares the last-updated dates, and gives you back the download link of the latest one, with the option to see the others as well!

Stars: ✭ 137 (-99.12%)

Mutual labels: scraper, scraping

1-60 of 2266 similar projects