All Projects → qwertyuiop6 → Mm131

qwertyuiop6 / Mm131

MM131网站图片爬取 🚨

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Mm131

Crawler Detect
🕷 CrawlerDetect is a PHP class for detecting bots/crawlers/spiders via the user agent
Stars: ✭ 1,549 (+1100.78%)
Mutual labels:  crawler, spider
Douban Movie
Golang爬虫 爬取豆瓣电影Top250
Stars: ✭ 114 (-11.63%)
Mutual labels:  crawler, spider
Not Your Average Web Crawler
A web crawler (for bug hunting) that gathers more than you can imagine.
Stars: ✭ 107 (-17.05%)
Mutual labels:  crawler, spider
Pspider
简单易用的Python爬虫框架,QQ交流群:597510560
Stars: ✭ 1,611 (+1148.84%)
Mutual labels:  crawler, spider
Free proxy website
获取免费socks/https/http代理的网站集合
Stars: ✭ 119 (-7.75%)
Mutual labels:  crawler, spider
Ruia
Async Python 3.6+ web scraping micro-framework based on asyncio
Stars: ✭ 1,366 (+958.91%)
Mutual labels:  crawler, spider
Pkulaw spider
爬取北大法宝网http://www.pkulaw.cn/Case/
Stars: ✭ 113 (-12.4%)
Mutual labels:  crawler, spider
Puppeteer Walker
a puppeteer walker 🕷 🕸
Stars: ✭ 78 (-39.53%)
Mutual labels:  crawler, spider
Decryptlogin
APIs for loginning some websites by using requests.
Stars: ✭ 1,861 (+1342.64%)
Mutual labels:  crawler, spider
Examples Of Web Crawlers
一些非常有趣的python爬虫例子,对新手比较友好,主要爬取淘宝、天猫、微信、豆瓣、QQ等网站。(Some interesting examples of python crawlers that are friendly to beginners. )
Stars: ✭ 10,724 (+8213.18%)
Mutual labels:  crawler, spider
Douyinsdk
抖音 SDK,数据采集,爬虫抓取不是梦
Stars: ✭ 99 (-23.26%)
Mutual labels:  crawler, spider
Crawlab Lite
Lite version of Crawlab. 轻量版 Crawlab 爬虫管理平台
Stars: ✭ 122 (-5.43%)
Mutual labels:  crawler, spider
Gopa Abandoned
GOPA, a spider written in Go.(NOTE: this project moved to https://github.com/infinitbyte/gopa )
Stars: ✭ 98 (-24.03%)
Mutual labels:  crawler, spider
Skycaiji
蓝天采集器是一款免费的数据采集发布爬虫软件,采用php+mysql开发,可部署在云服务器,几乎能采集所有类型的网页,无缝对接各类CMS建站程序,免登录实时发布数据,全自动无需人工干预!是网页大数据采集软件中完全跨平台的云端爬虫系统
Stars: ✭ 1,514 (+1073.64%)
Mutual labels:  crawler, spider
Geziyor
Geziyor, a fast web crawling & scraping framework for Go. Supports JS rendering.
Stars: ✭ 1,246 (+865.89%)
Mutual labels:  crawler, spider
Baiduspider
BaiduSpider,一个爬取百度搜索结果的爬虫,目前支持百度网页搜索,百度图片搜索,百度知道搜索,百度视频搜索,百度资讯搜索,百度文库搜索,百度经验搜索和百度百科搜索。
Stars: ✭ 105 (-18.6%)
Mutual labels:  crawler, spider
Spider
python crawler spider
Stars: ✭ 70 (-45.74%)
Mutual labels:  crawler, spider
Crawler examples
Some classic web crawler projects.一些经典的爬虫
Stars: ✭ 74 (-42.64%)
Mutual labels:  crawler, spider
Bilibili member crawler
B站用户爬虫 好耶~是爬虫
Stars: ✭ 115 (-10.85%)
Mutual labels:  crawler, spider
Digger
Digger is a powerful and flexible web crawler implemented by pure golang
Stars: ✭ 130 (+0.78%)
Mutual labels:  crawler, spider

MM131妹子图片批量下载爬虫py脚本

爬取网站:MM131

爬了2000套妹子图集 将近10万张,共8.5个G (图为我的腾讯云cos存储

最开始的版本其实是先解析页面再提取url链接逐个请求, 后来发现了图片的url规律: url变量只有末尾的: id/num

然后发现对req header请求头伪装一下UA用户代理和链接所在文档位置Referer 就可以直接就可以对图片进行请求,这就很舒服~

再配合上多进程+协程的一个库aiomultiprocess进行异步请求,concurrent包的futures线程池进行并发爬取,爬取速度效率大幅提升。

Usage:

1.安装依赖(Python3):

pip install -r requirements.txt

运行脚本,爬虫有两个版本
windows建议 运行多线程版本: thread_mm131.py
linux/os x 运行 多进程+协程版本: aio_mm131.py 或前者皆可

  • <=2019.3.23=>
  • 更新依赖支持python3.7
  • <=2018 12.1=>
  • 自动获取网站最新更新
  • 终断下载后再次下载会继续上次的进度
  • 自动选择不同系统合适的下载方法

只需

python main.py

来不及解释了,快上车!!

有问题可以提issue,欢迎老司机们 star,fork ~

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].