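A robust, efficient, score-based, target-specific IP proxy pool with an API service. You can plug in your own collectors to crawl proxy IPs at any time, and it builds a separate database of working proxies for each of your crawler's target websites. Supports MongoDB 4.0 and uses Python 3.7.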

Stars: ✭ 195 (-17.72%)

Mutual labels: crawler, spider, mongodb

Webster

a reliable high-level web crawling & scraping framework for Node.js.

Stars: ✭ 364 (+53.59%)

Mutual labels: crawler, spider, puppeteer

Marionette

Selenium alternative for Crystal. Browser manipulation without the Java overhead.

Stars: ✭ 119 (-49.79%)

Mutual labels: puppeteer, headless

Decryptlogin

APIs for logging in to some websites using requests.

Stars: ✭ 1,861 (+685.23%)

Mutual labels: crawler, spider

Docs

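Source code for 《数据采集从入门到放弃》 ("Data Collection: From Getting Started to Giving Up"). Contents: introduction to crawlers, the job market, and interview questions for crawler engineers; introduction to the HTTP protocol; using Requests; introduction to the XPath parser; MongoDB and MySQL; multithreaded crawlers; introduction to Scrapy; introduction to Scrapy-redis; deploying with Docker; managing a Docker cluster with Nomad; querying Docker logs with EFK.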

Stars: ✭ 118 (-50.21%)

Mutual labels: crawler, mongodb

Wendigo

A proper monster for front-end automated testing

Stars: ✭ 121 (-48.95%)

Mutual labels: puppeteer, headless

Apiproject

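[https://www.sofineday.com], a Go project scaffold that integrates best practices (gin + gorm + go-redis + mongo + cors + jwt + the zap JSON logging library (with log collection to Kafka or Mongo) + Kafka message queue + WeChat Pay and Alipay payments via gopay + API encryption + API reverse proxy + Go modules dependency management + headless crawling with chromedp + makefile + binary compression + livereload hot reloading)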

Stars: ✭ 124 (-47.68%)

Mutual labels: spider, headless

Yspider

yspider -- a lightweight crawler system

Stars: ✭ 125 (-47.26%)

Mutual labels: spider, mongodb

Scrapy demo

All kinds of Scrapy demos

Stars: ✭ 128 (-45.99%)

Mutual labels: spider, mongodb

Amazonbigspider

😱 Fully automatic distributed Amazon spider | Distributed product collection and selection across four Amazon international sites | Account: admin, password: adminadmin

Stars: ✭ 140 (-40.93%)

Mutual labels: crawler, spider

Crawler China Mainland Universities

Crawler for the list of universities in mainland China

Stars: ✭ 143 (-39.66%)

Mutual labels: crawler, spider

Rendora

dynamic server-side rendering using headless Chrome to effortlessly solve the SEO problem for modern javascript websites

Stars: ✭ 1,853 (+681.86%)

Mutual labels: crawler, puppeteer

Fp Server

Free proxy server, continuously crawling and providing proxies, based on Tornado and Scrapy. Build your own proxy pool locally.

Stars: ✭ 154 (-35.02%)

Mutual labels: spider, proxy

Weibo Topic Spider

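Weibo super-topic crawler with Weibo word-frequency statistics, sentiment analysis, and simple classification; now also includes data crawled from the pneumonia super topic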

Stars: ✭ 128 (-45.99%)

Mutual labels: crawler, spider

Google Meet Scheduler

😴 Attends classes for you.

Stars: ✭ 150 (-36.71%)

Mutual labels: puppeteer, headless

Secret Agent

The web browser that's built for scraping.

Stars: ✭ 151 (-36.29%)

Mutual labels: proxy, puppeteer

Yispider

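A distributed crawler platform that helps you manage and develop crawlers. It has a built-in set of crawler definition rules (templates): you can use a template to define a crawler quickly, or treat the platform as a framework and develop crawlers by hand. (A hobby project, updated whenever something gets annoying to use.)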

Stars: ✭ 158 (-33.33%)

Mutual labels: crawler, spider

Gain

Web crawling framework based on asyncio.

Stars: ✭ 2,002 (+744.73%)

Mutual labels: crawler, spider

Laravel Crawler Detect

A Laravel wrapper for CrawlerDetect - the web crawler detection library

Stars: ✭ 227 (-4.22%)

Mutual labels: crawler, spider

Examples Of Web Crawlers

Some interesting examples of Python crawlers that are friendly to beginners, mainly crawling sites such as Taobao, Tmall, WeChat, Douban, and QQ.

Stars: ✭ 10,724 (+4424.89%)

Mutual labels: crawler, spider

Proxybroker

Proxy [Finder | Checker | Server]. HTTP(S) & SOCKS 🎭

Stars: ✭ 2,767 (+1067.51%)

Mutual labels: crawler, proxy

Ncov2019 data crawler

Epidemic data crawler and 2019 novel coronavirus data repository: trajectory data, fellow-passenger data, and news reports

Stars: ✭ 175 (-26.16%)

Mutual labels: crawler, spider

Baiducrawler

Sample of using proxies to crawl Baidu search results.

Stars: ✭ 116 (-51.05%)

Mutual labels: crawler, proxy

Crawlab Lite

Lite version of Crawlab, a crawler management platform

Stars: ✭ 122 (-48.52%)

Mutual labels: crawler, spider

Pspider

A simple, easy-to-use Python crawler framework. QQ discussion group: 597510560

Stars: ✭ 1,611 (+579.75%)

Mutual labels: crawler, spider

Squidwarc

Squidwarc is a high fidelity, user scriptable, archival crawler that uses Chrome or Chromium with or without a head

Stars: ✭ 125 (-47.26%)

Mutual labels: crawler, puppeteer

Bilibili member crawler

Bilibili user crawler. Yay~ it's a crawler

Stars: ✭ 115 (-51.48%)

Mutual labels: crawler, spider

Go spider

[Crawler framework (golang)] An awesome concurrent crawler (spider) framework for Go. The crawler is flexible and modular: it can easily be extended into a customized crawler, or you can use only the default crawl components.

Stars: ✭ 1,745 (+636.29%)

Mutual labels: crawler, spider

Mm131

Image crawler for the MM131 website 🚨

Stars: ✭ 129 (-45.57%)

Mutual labels: crawler, spider

Reaction

Mailchimp Open Commerce is an API-first, headless commerce platform built using Node.js, React, GraphQL. Deployed via Docker and Kubernetes.

Stars: ✭ 11,588 (+4789.45%)

Mutual labels: mongodb, headless

Digger

Digger is a powerful and flexible web crawler implemented in pure Go