A series of distributed components for Scrapy, including RabbitMQ-based, Kafka-based, and RedisBloom-based components.

Stars: ✭ 38 (+137.5%)

Mutual labels: spider, scrapy

scrapy helper

Dynamically configurable crawler

Stars: ✭ 84 (+425%)

Mutual labels: spider, scrapy

Web-Iota

Iota is a web scraper that finds all of the images and links (sub-URLs) on a web page.

Stars: ✭ 60 (+275%)

Mutual labels: spider, scrapy

python-fxxk-spider

A collection of free Python crawler projects

Stars: ✭ 184 (+1050%)

Mutual labels: spider, scrapy

scrapy-admin

A django admin site for scrapy

Stars: ✭ 44 (+175%)

Mutual labels: spider, scrapy


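PttImageSpider: a PTT image crawler and downloader (built with Scrapy)

Downloads the images from an entire PTT board and uses each article's title as the name of the folder they are saved in.

Demo Video - Linux V2 (demo)
Demo Video - Linux V1 (tutorial + demo)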

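Features

Downloads all of the images from a given PTT board
Uses the article title as the folder name
Downloads quickly: about 600 images per minute, an average of 10 per second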
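
Article titles often contain characters that are not valid in folder names, so a title usually needs cleaning before it is used as a path. The source does not show how the project handles this; the following is a minimal sketch, and the helper name `title_to_folder_name`, the replacement character, and the length limit are all assumptions made for illustration.

```python
import re

# Characters that are rejected in file or folder names on common file systems
# (assumed set; the project's own rules are not shown in the source).
_UNSAFE = re.compile(r'[\\/:*?"<>|\x00-\x1f]')


def title_to_folder_name(title, max_length=100):
    """Turn an article title into a safe folder name (hypothetical helper)."""
    # Replace unsafe characters, cap the length, then trim spaces and dots
    # from the ends, since some file systems do not accept them there.
    name = _UNSAFE.sub("_", title)[:max_length].strip().rstrip(". ")
    # Fall back to a fixed name if nothing usable is left.
    return name or "untitled"
```

The result can be joined onto a download root, for example with `os.path.join("images", title_to_folder_name(title))`, where `images` is again only a placeholder directory.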

Usage

scrapy crawl ptt_img_spider

To crawl a different PTT board, edit the following line in PttImageSpider/PttImageSpider/spiders/pttspider.py:

start_urls = ["https://www.ptt.cc/bbs/AKB48/index.html"]

Change the URL to that of the other board, for example:

start_urls = ["https://www.ptt.cc/bbs/NounenRena/index.html"]
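
Both example URLs follow the same pattern, with only the board name changing. As a sketch of how that pattern could be captured in one place, the hypothetical helper `board_index_url` below builds the index URL from a board name. The helper and the board-name check are assumptions and are not part of the project.

```python
import re

# URL pattern taken from the two examples above.
PTT_INDEX_TEMPLATE = "https://www.ptt.cc/bbs/{board}/index.html"


def board_index_url(board):
    """Return the index URL for a PTT board name (hypothetical helper)."""
    # Assumption: board names consist of ASCII letters, digits, "_" and "-".
    if not re.fullmatch(r"[A-Za-z0-9_-]+", board):
        raise ValueError("unexpected board name: {!r}".format(board))
    return PTT_INDEX_TEMPLATE.format(board=board)


# Equivalent to the hand-written list shown above.
start_urls = [board_index_url("NounenRena")]
```

With a helper like this, switching boards means changing a single string rather than editing the full URL.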

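Screenshot

Some boards have a very large number of pages, so if you need to stop the crawl early, press Ctrl + Z to force the program to stop. (On Linux, Ctrl + Z suspends the process rather than terminating it; pressing Ctrl + C asks Scrapy to shut down gracefully, and pressing it a second time forces an immediate exit.)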
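
As an alternative to interrupting a long crawl by hand, Scrapy's built-in CloseSpider extension can end a run automatically. The fragment below is a sketch of what could be added to the project's settings.py; the source does not say whether the project sets these options, and the values are illustrative only.

```python
# settings.py (fragment): illustrative values, not taken from the project.

# Close the spider once this many responses have been crawled.
CLOSESPIDER_PAGECOUNT = 50

# Close the spider after it has been running for this many seconds.
CLOSESPIDER_TIMEOUT = 300
```

The same settings can be supplied for a single run without editing any file, for example `scrapy crawl ptt_img_spider -s CLOSESPIDER_PAGECOUNT=50`.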

Output format

Environment

Ubuntu 12.04
Python 2.7.3
Scrapy 1.0.4

License

MIT license


twtrubiks / PttImageSpider
