qwertyuiop6 / Mm131
Image crawler for the MM131 website 🚨
Stars: ✭ 129
Programming Languages
python
Projects that are alternatives of or similar to Mm131
Crawler Detect
🕷 CrawlerDetect is a PHP class for detecting bots/crawlers/spiders via the user agent
Stars: ✭ 1,549 (+1100.78%)
Mutual labels: crawler, spider
Not Your Average Web Crawler
A web crawler (for bug hunting) that gathers more than you can imagine.
Stars: ✭ 107 (-17.05%)
Mutual labels: crawler, spider
Ruia
Async Python 3.6+ web scraping micro-framework based on asyncio
Stars: ✭ 1,366 (+958.91%)
Mutual labels: crawler, spider
Decryptlogin
APIs for logging in to some websites using requests.
Stars: ✭ 1,861 (+1342.64%)
Mutual labels: crawler, spider
Examples Of Web Crawlers
Some very interesting Python crawler examples that are friendly to beginners, mainly scraping sites such as Taobao, Tmall, WeChat, Douban, and QQ.
Stars: ✭ 10,724 (+8213.18%)
Mutual labels: crawler, spider
Crawlab Lite
Lite version of Crawlab, the crawler management platform.
Stars: ✭ 122 (-5.43%)
Mutual labels: crawler, spider
Gopa Abandoned
GOPA, a spider written in Go. (NOTE: this project moved to https://github.com/infinitbyte/gopa)
Stars: ✭ 98 (-24.03%)
Mutual labels: crawler, spider
Skycaiji
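Skycaiji (蓝天采集器) is free data-collection and publishing crawler software built with PHP + MySQL. It can be deployed on a cloud server, can collect almost every type of web page, integrates seamlessly with various CMS site builders, publishes data in real time without logging in, and runs fully automatically with no manual intervention. It is a fully cross-platform cloud crawler system for large-scale web data collection.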
Stars: ✭ 1,514 (+1073.64%)
Mutual labels: crawler, spider
Geziyor
Geziyor, a fast web crawling & scraping framework for Go. Supports JS rendering.
Stars: ✭ 1,246 (+865.89%)
Mutual labels: crawler, spider
Baiduspider
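BaiduSpider, a crawler for Baidu search results. It currently supports Baidu web search, image search, Zhidao (Q&A) search, video search, news search, Wenku (document) search, Jingyan (how-to) search, and Baike (encyclopedia) search.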
Stars: ✭ 105 (-18.6%)
Mutual labels: crawler, spider
Crawler examples
Some classic web crawler projects.
Stars: ✭ 74 (-42.64%)
Mutual labels: crawler, spider
Digger
Digger is a powerful and flexible web crawler implemented in pure Go.
Stars: ✭ 130 (+0.78%)
Mutual labels: crawler, spider
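MM131 girl-photo batch download crawler (Python script)
Target site: MM131
Crawled 2,000 photo sets, nearly 100,000 images, 8.5 GB in total (pictured: my Tencent Cloud COS storage).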
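The first version parsed each page, extracted the URLs, and requested them one by one. Later I noticed a pattern in the image URLs: the only variable part is the tail, id/num.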
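The id/num pattern means image URLs can be generated instead of scraped. The sketch below shows the idea under stated assumptions: the base URL, the `/<id>/<num>.jpg` layout, and 1-based numbering are hypothetical placeholders, since the README only says that the variable part is `id/num`. The helper names `image_url` and `gallery_urls` are illustrative, not taken from the project.

```python
from __future__ import annotations

# Hypothetical host and path: the README does not give the real image URLs.
BASE_URL = "https://img.example.com/pic"


def image_url(gallery_id: int, num: int) -> str:
    """Build the URL of image `num` in gallery `gallery_id`.

    Assumes the layout BASE_URL/<id>/<num>.jpg, where only the trailing
    id/num part changes from one image to the next.
    """
    return f"{BASE_URL}/{gallery_id}/{num}.jpg"


def gallery_urls(gallery_id: int, count: int) -> list[str]:
    """Return the URLs of images 1..count (1-based numbering is assumed)."""
    return [image_url(gallery_id, n) for n in range(1, count + 1)]


if __name__ == "__main__":
    for url in gallery_urls(5000, 3):
        print(url)
```

Computing URLs this way removes the need to parse every page just to discover image links. The number of images in a gallery still has to come from somewhere (the README does not say how), so here it is simply passed in as `count`.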
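Then I found that if the request headers spoof the User-Agent and set the Referer (the page on which the image link appears), the images can be requested directly, which is very convenient~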
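A minimal sketch of that header trick, assuming the standard library's `urllib.request` for self-containment (the project's own HTTP client may differ). The User-Agent string is a generic browser-like placeholder, and `build_image_request` / `fetch_image` are hypothetical helper names.

```python
import urllib.request

# Placeholder browser-like UA: the README does not list the exact string used.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)


def build_image_request(image_url: str, page_url: str) -> urllib.request.Request:
    """Prepare a GET request for an image with browser-like headers.

    User-Agent makes the request look like it comes from a browser;
    Referer names the page on which the image link appears.
    """
    headers = {"User-Agent": USER_AGENT, "Referer": page_url}
    return urllib.request.Request(image_url, headers=headers)


def fetch_image(image_url: str, page_url: str, timeout: float = 10.0) -> bytes:
    """Download the image bytes (performs real network I/O)."""
    req = build_image_request(image_url, page_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()
```

Keeping request construction separate from the network call makes the headers easy to inspect without touching the network.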
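Combined with aiomultiprocess, a library that pairs multiprocessing with coroutines, for asynchronous requests, and with the thread pool from concurrent.futures for concurrent crawling, crawl speed and efficiency improved dramatically.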
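The thread-pool half of that design can be sketched with `concurrent.futures` alone; the aiomultiprocess (multiprocess + coroutine) variant is not reproduced here. This is a hypothetical sketch: `fetch_all` and the injected `fetch` callable are illustrative names, and `fetch` could be any function that downloads one URL, for example one that sends the spoofed headers.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def fetch_all(
    urls: List[str],
    fetch: Callable[[str], bytes],
    max_workers: int = 8,
) -> Dict[str, bytes]:
    """Run `fetch` over all URLs on a thread pool; return {url: body}.

    `fetch` is injected so the concurrency logic stays independent of the
    HTTP client. Threads suit this I/O-bound work because each worker
    spends most of its time waiting on the network.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order and re-raises worker errors.
        bodies = list(pool.map(fetch, urls))
    return dict(zip(urls, bodies))


if __name__ == "__main__":
    import time

    def fake_fetch(url: str) -> bytes:
        time.sleep(0.1)  # stand-in for network latency
        return url.encode()

    urls = [f"https://img.example.com/pic/1/{n}.jpg" for n in range(1, 9)]
    start = time.perf_counter()
    result = fetch_all(urls, fake_fetch)
    print(len(result), f"{time.perf_counter() - start:.2f}s")
```

With eight workers, the eight simulated 0.1 s downloads overlap instead of running back to back, which is the kind of speedup the paragraph above describes.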
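Usage:
1. Install the dependencies (Python 3):
pip install -r requirements.txt
2. Run a script. The crawler comes in two versions:
- On Windows, the multithreaded version is recommended: thread_mm131.py
- On Linux/OS X, run the multiprocess + coroutine version, aio_mm131.py, or the multithreaded one; either works.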
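- <=2019.3.23=>
  - Updated dependencies to support Python 3.7
- <=2018.12.1=>
  - Automatically fetches the site's latest updates
  - Resumes from the previous progress when an interrupted download is restarted
  - Automatically selects the download method suited to the current operating system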
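The resume and auto-selection features above could work along these lines. This is a hypothetical sketch: the README does not say how progress is stored, so it assumes that a non-empty file already on disk counts as done. `choose_backend` and `pending` are illustrative names, and the labels "thread" and "aio" merely stand for thread_mm131.py and aio_mm131.py.

```python
from __future__ import annotations

import platform
from pathlib import Path
from typing import Iterable, List


def choose_backend(system: str | None = None) -> str:
    """Pick a downloader following the Usage notes.

    Windows -> the multithreaded version ("thread");
    Linux/OS X -> the multiprocess + coroutine version ("aio").
    Returns a label only; dispatching to a script is left out.
    """
    system = system or platform.system()  # "Windows", "Linux", "Darwin", ...
    return "thread" if system == "Windows" else "aio"


def pending(targets: Iterable[Path]) -> List[Path]:
    """Return only the target files that are missing or empty.

    Re-running after an interruption therefore skips finished images
    and continues from where the previous run stopped.
    """
    return [p for p in targets if not (p.exists() and p.stat().st_size > 0)]
```

One caveat of this simple check: a file cut off mid-write is non-empty and would be skipped. Writing each image to a temporary name and renaming it once complete avoids that.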
Now all you need to run is:
python main.py
No time to explain, hop on!!
If you run into problems, feel free to open an issue. Old hands are welcome to star and fork ~
Note that the project description data, including the texts, logos, images, and/or trademarks,
for each open source project belongs to its rightful owner.