qwertyuiop6 / Mm131

MM131 website image crawler 🚨

Projects that are alternatives to or similar to Mm131

🕷 CrawlerDetect is a PHP class for detecting bots/crawlers/spiders via the user agent

Stars: ✭ 1,549 (+1100.78%)

Mutual labels: crawler, spider

Golang crawler for the Douban Movie Top 250

Stars: ✭ 114 (-11.63%)

Mutual labels: crawler, spider

Not Your Average Web Crawler

A web crawler (for bug hunting) that gathers more than you can imagine.

Stars: ✭ 107 (-17.05%)

Mutual labels: crawler, spider

A simple, easy-to-use Python crawler framework. QQ discussion group: 597510560

Stars: ✭ 1,611 (+1148.84%)

Mutual labels: crawler, spider

Free proxy website

A collection of websites that provide free SOCKS/HTTPS/HTTP proxies

Stars: ✭ 119 (-7.75%)

Mutual labels: crawler, spider

Async Python 3.6+ web scraping micro-framework based on asyncio

Stars: ✭ 1,366 (+958.91%)

Mutual labels: crawler, spider

Crawler for PKULaw (北大法宝网): http://www.pkulaw.cn/Case/

Stars: ✭ 113 (-12.4%)

Mutual labels: crawler, spider

Puppeteer Walker

a puppeteer walker 🕷 🕸

Stars: ✭ 78 (-39.53%)

Mutual labels: crawler, spider

APIs for logging in to various websites using requests.

Stars: ✭ 1,861 (+1342.64%)

Mutual labels: crawler, spider

Examples Of Web Crawlers

Some very interesting, beginner-friendly Python crawler examples, mainly scraping sites such as Taobao, Tmall, WeChat, Douban, and QQ.

Stars: ✭ 10,724 (+8213.18%)

Mutual labels: crawler, spider

Douyin SDK: data collection and crawling are no longer just a dream

Stars: ✭ 99 (-23.26%)

Mutual labels: crawler, spider

Lite version of Crawlab, a lightweight crawler management platform.

Stars: ✭ 122 (-5.43%)

Mutual labels: crawler, spider

GOPA, a spider written in Go. (NOTE: this project moved to https://github.com/infinitbyte/gopa)

Stars: ✭ 98 (-24.03%)

Mutual labels: crawler, spider

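蓝天采集器 ("Blue Sky Collector") is free data-collection and publishing crawler software built with PHP + MySQL. It can be deployed on a cloud server, can collect almost any type of web page, integrates seamlessly with various CMS site builders, publishes data in real time without logging in, and runs fully automatically with no manual intervention. It is a fully cross-platform cloud crawler system for large-scale web data collection.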

Stars: ✭ 1,514 (+1073.64%)

Mutual labels: crawler, spider

Geziyor, a fast web crawling & scraping framework for Go. Supports JS rendering.

Stars: ✭ 1,246 (+865.89%)

Mutual labels: crawler, spider

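BaiduSpider, a crawler for Baidu search results. It currently supports Baidu web search, Baidu image search, Baidu Zhidao search, Baidu video search, Baidu news search, Baidu Wenku search, Baidu Jingyan search, and Baidu Baike search.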

Stars: ✭ 105 (-18.6%)

Mutual labels: crawler, spider

python crawler spider

Stars: ✭ 70 (-45.74%)

Mutual labels: crawler, spider

Crawler examples

Some classic web crawler projects.

Stars: ✭ 74 (-42.64%)

Mutual labels: crawler, spider

Bilibili member crawler

A Bilibili user crawler. Yay~ it's a crawler!

Stars: ✭ 115 (-10.85%)

Mutual labels: crawler, spider

Digger is a powerful and flexible web crawler implemented in pure Go

Stars: ✭ 130 (+0.78%)

Mutual labels: crawler, spider


MM131 girl-photo batch-download crawler (Python script)

Target site: MM131

Crawled 2,000 photo sets, nearly 100,000 images, 8.5 GB in total (the original screenshot showed them in my Tencent Cloud COS storage).

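The first version parsed each page, extracted the image URLs, and requested them one by one. Later I noticed a pattern in the image URLs: the only variable part is at the end, id/num.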
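Given that pattern, a gallery's image URLs can be generated directly instead of scraped from its pages. A minimal sketch, assuming images in a gallery are numbered 1..count and end in .jpg; the base URL is a hypothetical placeholder (not the real image host) and this is not the script's actual code:

```python
from __future__ import annotations


def image_urls(base_url: str, gallery_id: int, count: int) -> list[str]:
    """Build the URLs for one gallery, assuming images are numbered 1..count.

    Only the trailing id/num part varies; base_url and the .jpg suffix
    are hypothetical placeholders.
    """
    base = base_url.rstrip("/")
    return [f"{base}/{gallery_id}/{num}.jpg" for num in range(1, count + 1)]


if __name__ == "__main__":
    # Hypothetical host; substitute the real image prefix.
    for url in image_urls("https://img.example.com/pic/", 1234, 3):
        print(url)
```

How many images a gallery has still has to come from somewhere (for example its first page); the sketch simply takes count as a parameter.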

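Then I found that by disguising the request headers (a browser User-Agent, plus a Referer pointing to the page the image link appears on), the images can be requested directly. Very convenient~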
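A minimal sketch of such a request using only the standard library (urllib.request), not necessarily the HTTP client the script itself uses. The User-Agent string and example URLs are hypothetical placeholders; the Referer should be the gallery page that embeds the image:

```python
import urllib.request

# Hypothetical browser-like User-Agent string.
DEFAULT_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)


def build_image_request(
    image_url: str, referer: str, user_agent: str = DEFAULT_UA
) -> urllib.request.Request:
    """Return a GET request for image_url with disguised User-Agent and Referer."""
    headers = {"User-Agent": user_agent, "Referer": referer}
    return urllib.request.Request(image_url, headers=headers)


def download_image(image_url: str, referer: str, timeout: float = 10.0) -> bytes:
    """Fetch the raw image bytes (this performs a real network request)."""
    req = build_image_request(image_url, referer)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()
```

Note that urllib normalizes header names with str.capitalize(), so the stored key is "User-agent"; servers treat header names case-insensitively.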

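Combining aiomultiprocess (a library that pairs multiprocessing with coroutines) for asynchronous requests with a concurrent.futures thread pool for concurrent crawling greatly improved crawl speed.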
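A minimal sketch of the thread-pool half of this approach with concurrent.futures (the aiomultiprocess process + coroutine pool is not shown). fetch stands for any function that takes an image URL and returns its bytes, for example one that sends the disguised headers described above; max_workers=8 is an arbitrary assumption, not the script's setting:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable


def crawl_concurrently(
    urls: Iterable[str],
    fetch: Callable[[str], bytes],
    max_workers: int = 8,
) -> Dict[str, bytes]:
    """Run fetch(url) for every URL on a thread pool and collect the results.

    A failed download is reported and skipped instead of aborting the crawl.
    """
    results: Dict[str, bytes] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each submitted future back to the URL it is downloading.
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:  # keep going if one image fails
                print(f"failed: {url} ({exc})")
    return results
```

Threads suit this workload because each download spends most of its time waiting on the network, so the GIL is rarely the bottleneck.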

Usage:

1. Install dependencies (Python 3):

pip install -r requirements.txt

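2. Run the script. The crawler comes in two versions:
On Windows, the multithreaded version is recommended: thread_mm131.py
~~On Linux/OS X, run the multiprocessing + coroutine version aio_mm131.py, or either one works~~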

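<=2019.3.23=>
Updated dependencies to support Python 3.7
<=2018.12.1=>
Automatically fetches the site's latest updates
If a download is interrupted, running it again resumes from where it left off
Automatically picks the appropriate download method for each operating system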
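Resuming and per-OS selection could work roughly as sketched below. This is an assumption about the approach, not the script's code: it assumes images are saved as save_dir/<id>/<num>.jpg, treats any non-empty existing file as finished, and maps Windows to the threaded version and other systems to the aio version, following the usage notes above.

```python
import os
import sys
from typing import Iterable, List


def pending_downloads(urls: Iterable[str], save_dir: str) -> List[str]:
    """Return only the URLs whose target file is missing or empty.

    Hypothetical naming scheme: the last two URL path segments (id/num)
    become save_dir/<id>/<num>.
    """
    pending = []
    for url in urls:
        gallery_id, filename = url.rstrip("/").split("/")[-2:]
        path = os.path.join(save_dir, gallery_id, filename)
        # Treat a non-empty file as already downloaded by a previous run.
        if not (os.path.isfile(path) and os.path.getsize(path) > 0):
            pending.append(url)
    return pending


def pick_download_mode(platform: str = sys.platform) -> str:
    """Choose the threaded version on Windows and the aio version elsewhere."""
    return "thread" if platform.startswith("win") else "aio"
```

Treating any non-empty file as complete can miss an image cut off mid-write; writing to a temporary name and renaming it once the download finishes avoids that.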

Now just run:

python main.py

No time to explain, hop in!!

If you run into problems, feel free to open an issue. Old hands are welcome to star and fork ~


Stars: ✭ 129
