
twtrubiks / PttImageSpider

Licence: other
PTT image downloader (scrapes the images from an entire board and uses each article's title as the folder name) (built with Scrapy)

Programming Languages

python

Projects that are alternatives of or similar to PttImageSpider

PTT Beauty Spider
Image-downloading crawler for PTT's Beauty board
Stars: ✭ 47 (+193.75%)
Mutual labels:  spider, download, ptt
OpenScraper
An open source webapp for scraping: towards a public service for webscraping
Stars: ✭ 80 (+400%)
Mutual labels:  spider, scrapy
photo-spider-scrapy
10 photo website spiders: Scrapy crawler code for 10 international stock-photo sites
Stars: ✭ 17 (+6.25%)
Mutual labels:  spider, scrapy
python-spider
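Small Python crawler projects [continuously updated] [Biquge novel downloads, Tweet data scraping, weather lookup, NetEase Cloud Music reverse engineering, Tiantian Fund lookup, Weibo data scraping (cookie generation), Youdao Translate reverse engineering, Qichacha crawler without login, Dianping SVG-encryption cracking, Bilibili user crawler, Lagou crawler without login, Ziroom rental font encryption, Zhihu Q&A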
Stars: ✭ 45 (+181.25%)
Mutual labels:  spider, scrapy
163Music
163music spider by scrapy.
Stars: ✭ 60 (+275%)
Mutual labels:  spider, scrapy
elves
🎊 Design and implementation of a lightweight crawler framework.
Stars: ✭ 322 (+1912.5%)
Mutual labels:  spider, scrapy
ptt-web-crawler
Crawler for the web version of PTT
Stars: ✭ 20 (+25%)
Mutual labels:  scrapy, ptt
small-spider-project
Everyday crawlers
Stars: ✭ 14 (-12.5%)
Mutual labels:  spider, scrapy
Scrapy-Spiders
A Scrapy-based code repository of data-collection crawlers
Stars: ✭ 34 (+112.5%)
Mutual labels:  spider, scrapy
V2EX Spider
V2EX crawler
Stars: ✭ 21 (+31.25%)
Mutual labels:  spider, scrapy
scrapy facebooker
Collection of scrapy spiders which can scrape posts, images, and so on from public Facebook Pages.
Stars: ✭ 22 (+37.5%)
Mutual labels:  spider, scrapy
devsearch
A web search engine built with Python which uses TF-IDF and PageRank to sort search results.
Stars: ✭ 52 (+225%)
Mutual labels:  spider, scrapy
NScrapy
NScrapy is a cross-platform, distributed spider framework for .NET Core that provides an easy way to write your own spider
Stars: ✭ 88 (+450%)
Mutual labels:  spider, scrapy
Scrapy IPProxyPool
Free IP proxy pool. A plugin for the Scrapy crawler framework
Stars: ✭ 100 (+525%)
Mutual labels:  spider, scrapy
toutiao
Crawler for the Toutiao technology news API
Stars: ✭ 17 (+6.25%)
Mutual labels:  spider, scrapy
scrapy-distributed
A series of distributed components for Scrapy. Including RabbitMQ-based components, Kafka-based components, and RedisBloom-based components for Scrapy.
Stars: ✭ 38 (+137.5%)
Mutual labels:  spider, scrapy
scrapy helper
Dynamically configurable crawler
Stars: ✭ 84 (+425%)
Mutual labels:  spider, scrapy
Web-Iota
Iota is a web scraper which can find all of the images and links/suburls on a webpage
Stars: ✭ 60 (+275%)
Mutual labels:  spider, scrapy
python-fxxk-spider
A collection of various free Python crawler projects
Stars: ✭ 184 (+1050%)
Mutual labels:  spider, scrapy
scrapy-admin
A django admin site for scrapy
Stars: ✭ 44 (+175%)
Mutual labels:  spider, scrapy

PttImageSpider: PTT image-downloading crawler (built with Scrapy)

Scrapes the images from an entire PTT board and uses each article's title as the folder name.

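Features

  • Scrapes all the images from a given PTT board
  • Uses each article's title as the folder name
  • Downloads images very quickly: about 600 images per minute, an average of 10 per second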
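Article titles often contain characters that are not valid in folder names (for example "/" or "?"), so the title-as-folder feature needs some clean-up step. The following is a minimal sketch of one way to do it, not the project's actual code: `title_to_folder_name`, its `max_length` parameter, and the "untitled" fallback are all hypothetical names chosen for illustration.

```python
import re


def title_to_folder_name(title, max_length=100):
    """Turn an article title into a string that is safe to use as a folder name.

    Hypothetical helper for illustration; not taken from PttImageSpider.
    """
    # Replace characters that are invalid in file names on common systems.
    name = re.sub(r'[\\/:*?"<>|]', "_", title)
    # Collapse runs of whitespace into single spaces and trim both ends.
    name = re.sub(r"\s+", " ", name).strip()
    # Truncate, then drop trailing dots and spaces, which Windows rejects.
    name = name[:max_length].rstrip(". ")
    # Fall back to a fixed name if nothing usable is left.
    return name or "untitled"


print(title_to_folder_name("[情報] AKB48 10/31 公演"))  # [情報] AKB48 10_31 公演
```

Replacing invalid characters rather than deleting them keeps titles such as "10/31" readable, at the cost of possible name collisions between similar titles.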

Usage

scrapy crawl ptt_img_spider

To scrape a different PTT board, edit the following line in PttImageSpider/PttImageSpider/spiders/pttspider.py:

start_urls = ["https://www.ptt.cc/bbs/AKB48/index.html"]

Change the URL to that of the board you want, for example:

start_urls = ["https://www.ptt.cc/bbs/NounenRena/index.html"]
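If you switch boards often, the index URL can be built from the board name instead of being edited by hand. This is a minimal sketch, assuming every board's index page follows the same pattern as the two URLs above; `board_index_url` is a hypothetical helper and not part of the project.

```python
def board_index_url(board):
    """Return the index-page URL for a PTT board name.

    Hypothetical helper; assumes the URL pattern shown in the examples above.
    """
    return "https://www.ptt.cc/bbs/{}/index.html".format(board)


# Same value as the hand-written example above.
start_urls = [board_index_url("NounenRena")]
print(start_urls)  # ['https://www.ptt.cc/bbs/NounenRena/index.html']
```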

Screenshot of a run

[Screenshot]

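Some boards have a very large number of pages, so if you need to stop the crawl early you can press Ctrl + Z. Note that on Linux, Ctrl + Z only suspends the process. To actually end the crawl, press Ctrl + C: once asks Scrapy to shut down gracefully, and a second time forces an immediate exit.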
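As an alternative to interrupting a long crawl by hand, the crawl can be capped in advance. This is a minimal sketch, assuming the project has a standard Scrapy settings.py and that Scrapy's built-in CloseSpider extension has not been disabled; the value 50 is an arbitrary illustration, not a recommendation from the author.

```python
# settings.py (fragment)
# Ask Scrapy's built-in CloseSpider extension to stop the spider once
# roughly this many responses have been received. Requests already in
# flight may still complete, so the final count can be slightly higher.
CLOSESPIDER_PAGECOUNT = 50
```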

Output format

[Screenshots]

Environment

  • Ubuntu 12.04
  • Python 2.7.3
  • Scrapy 1.0.4

License

MIT license
