itning / DouBanReptile Licence: Apache-2.0 license
A multithreaded crawler for Douban rental groups. After crawling, it automatically generates a Markdown file sorted by time.
Projects that are alternatives of or similar to DouBanReptile
tts-deckconverter Generate card decks for Tabletop Simulator.
Stars : ✭ 27 (-12.9%)
Mutual labels: fyne
douban A Douban movie API based on ThinkPHP 5.1
Stars : ✭ 106 (+241.94%)
Mutual labels: douban
Top15 [EOL] Use Top15 to show the movies, books, and music you recently watched, read, or listened to on your website!
Stars : ✭ 13 (-58.06%)
Mutual labels: douban
doubanIMDb IMDb + Rotten Tomatoes + Wikipedia on Douban Movie
Stars : ✭ 93 (+200%)
Mutual labels: douban
auto-click-auto-fill Auto Click Auto Fill on any web page
Stars : ✭ 111 (+258.06%)
Mutual labels: xpath
brackit Query processor with proven optimizations, ready to use for your document store to query semi-structured data with a JSONiq like extension of XQuery. Can also be used as an ad-hoc in-memory query processor.
Stars : ✭ 28 (-9.68%)
Mutual labels: xpath
douban-movie Get movie info from douban(豆瓣) and display in your terminal
Stars : ✭ 17 (-45.16%)
Mutual labels: douban
ToolsCollection No description or website provided.
Stars : ✭ 20 (-35.48%)
Mutual labels: douban
python-crawler A repository for learning web crawling, suitable for people with no prior experience and friendly to beginners
Stars : ✭ 37 (+19.35%)
Mutual labels: xpath
doubanrobot A simple robot for Douban.com
Stars : ✭ 34 (+9.68%)
Mutual labels: douban
shirokumacafe Douban broadcasts for Shirokuma Cafe (白熊咖啡馆)
Stars : ✭ 21 (-32.26%)
Mutual labels: douban
gosquito gosquito ("go" + "mosquito") is a pluggable tool for data gathering, data processing and data transmitting to various destinations.
Stars : ✭ 25 (-19.35%)
Mutual labels: xpath
dotnet-security-unit-tests A web application that contains several unit tests for the purpose of .NET security
Stars : ✭ 25 (-19.35%)
Mutual labels: xpath
exml Most simple Elixir wrapper for xmerl xpath
Stars : ✭ 23 (-25.81%)
Mutual labels: xpath
fontoxpath A minimalistic XPath 3.1 implementation in pure JavaScript
Stars : ✭ 97 (+212.9%)
Mutual labels: xpath
go-xmldom XML DOM processing for Golang, supports xpath query
Stars : ✭ 38 (+22.58%)
Mutual labels: xpath
OpenScraper An open source webapp for scraping: towards a public service for webscraping
Stars : ✭ 80 (+158.06%)
Mutual labels: xpath
PopClip-Extensions Extensions I made for PopClip.
Stars : ✭ 17 (-45.16%)
Mutual labels: douban
Douban Rental Crawler
Download
https://github.com/itning/DouBanReptile/releases
Build
go build -ldflags="-s -w -H windowsgui" -o ..\bin\main.exe DouBanReptile/cmd
Opening the result file (Markdown) with Typora is recommended.
Screenshots
Usage guide
Make sure the simsun.ttc font file exists in the C:\Windows\Fonts\ directory.
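How do I set the Douban group link?
First search for rentals in some area, for example: 北京租房 (Beijing rentals).
Open the group you want to crawl, for example the first result: 北京租房.
Scroll to the bottom of the page, where there is a "> 更多小组讨论" (more group discussions) link, and click it.
Copy the address from the address bar (from /group through to the end) and paste it into the software's Douban group link setting.
Pasting into the software sometimes crashes it, for reasons that are not yet known. Deleting the existing link in the software before pasting is recommended.
Change the number after start= (50 in this example) to %d.
Done.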
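The %d placeholder suggests the link is used as a format template for the page offset. The following is a minimal sketch of that idea, not the project's actual code: the function name pageURLs, the https://www.douban.com prefix, and the page size of 25 are all assumptions made for illustration.

```go
package main

import "fmt"

// pageURLs expands a group-link template such as
// "/group/554566/discussion?start=%d" into one URL per page.
// The host prefix and the pageSize value are assumptions, not taken
// from the project's source.
func pageURLs(template string, pages, pageSize int) []string {
	urls := make([]string, 0, pages)
	for i := 0; i < pages; i++ {
		// Fill the %d placeholder with the offset of page i.
		path := fmt.Sprintf(template, i*pageSize)
		urls = append(urls, "https://www.douban.com"+path)
	}
	return urls
}

func main() {
	for _, u := range pageURLs("/group/554566/discussion?start=%d", 3, 25) {
		fmt.Println(u)
	}
}
```

Because the template goes through fmt.Sprintf in this sketch, any other percent sign in the link (for example from URL encoding) would have to be written as %%.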
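How do I set exclude (include) keywords?
An exclude keyword removes a rental listing whenever the keyword appears in its title or content.
For example, the default keyword is 限女 (women only): any listing containing 限女生入住, 只限女生, or any other text that includes 限女 is not crawled.
Separate multiple keywords with |. Note that this must be the ASCII (half-width) character.
For example: 限女|短租|整租. With these three keywords set, the software skips any listing whose title or content contains any of them.
Include keywords apply to the title only. For example, with include keyword A: a listing whose title contains A but whose content does not is crawled; a listing whose content contains A but whose title does not is not crawled.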
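The filtering rules above can be sketched roughly as follows. This is an illustrative reading of the description, not the project's implementation: the Post type and the function names are hypothetical, and treating an empty include list as "no include filter" is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// Post is a hypothetical stand-in for one rental listing.
type Post struct {
	Title   string
	Content string
}

// splitKeywords splits a setting such as "限女|短租|整租" on the ASCII
// pipe character and drops empty entries.
func splitKeywords(s string) []string {
	var out []string
	for _, k := range strings.Split(s, "|") {
		if k = strings.TrimSpace(k); k != "" {
			out = append(out, k)
		}
	}
	return out
}

// containsAny reports whether text contains at least one keyword.
func containsAny(text string, keywords []string) bool {
	for _, k := range keywords {
		if strings.Contains(text, k) {
			return true
		}
	}
	return false
}

// keep applies exclude keywords to title and content, and include
// keywords to the title only. An empty include list is assumed to
// mean that no include filter is active.
func keep(p Post, exclude, include []string) bool {
	if containsAny(p.Title, exclude) || containsAny(p.Content, exclude) {
		return false
	}
	return len(include) == 0 || containsAny(p.Title, include)
}

func main() {
	exclude := splitKeywords("限女|短租|整租")
	include := splitKeywords("朝阳")
	posts := []Post{
		{Title: "朝阳主卧出租", Content: "近地铁"},
		{Title: "主卧出租", Content: "朝阳,限女生入住"},
	}
	for _, p := range posts {
		fmt.Println(p.Title, keep(p, exclude, include))
	}
}
```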
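About recognizing the price in the title
The regular expression \b\d{4}\b is used to recognize price information in the title. Because it matches only standalone four-digit numbers, listings priced below 1000 yuan cannot be crawled.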
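A small sketch of how this pattern behaves with Go's regexp package is shown below. The pattern comes from the text above; the function name priceFromTitle and the choice to take the first match are assumptions for illustration.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// priceRe is the pattern quoted in the README: exactly four digits
// with a word boundary on each side.
var priceRe = regexp.MustCompile(`\b\d{4}\b`)

// priceFromTitle returns the first standalone four-digit number found
// in title. Taking the first match is an assumption of this sketch.
func priceFromTitle(title string) (int, bool) {
	m := priceRe.FindString(title)
	if m == "" {
		return 0, false
	}
	n, err := strconv.Atoi(m)
	if err != nil {
		return 0, false
	}
	return n, true
}

func main() {
	for _, t := range []string{"朝阳主卧3500元", "次卧800元", "整租12000元"} {
		price, ok := priceFromTitle(t)
		fmt.Println(t, price, ok)
	}
}
```

Only the first title yields a price: 800 has three digits and 12000 has five, so neither contains a four-digit run bounded on both sides. Any other standalone four-digit number in a title, such as a year, would also be read as a price by this pattern.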
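About sorting the results
Results are sorted by price in ascending order first; listings with the same price are sorted by posting time.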
About opening the result file (.md extension)
Downloading Typora to open it is recommended.
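How do I set the cookie?
Open a Douban group, for example: https://www.douban.com/group/554566/discussion?start=0
Press F12 to open the developer console and click the Console tab.
Type document.cookie and press Enter, then copy the output (do not copy the surrounding double quotes).
Paste the copied content into the program.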
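A string copied from document.cookie is already in the form a Cookie request header expects, so a crawler can plausibly attach it as-is. The sketch below only builds the request and sends nothing; the function name newGroupRequest, the User-Agent value, and the placeholder cookie are assumptions, and the project may handle this differently.

```go
package main

import (
	"fmt"
	"net/http"
)

// newGroupRequest builds a GET request that carries the cookie string
// copied from the browser console. It does not send the request.
func newGroupRequest(url, cookie string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	// The raw "name=value; name2=value2" string goes straight into the header.
	req.Header.Set("Cookie", cookie)
	// A browser-like User-Agent is an assumption, not something the README states.
	req.Header.Set("User-Agent", "Mozilla/5.0")
	return req, nil
}

func main() {
	// The cookie value here is a placeholder, not a real session.
	req, err := newGroupRequest(
		"https://www.douban.com/group/554566/discussion?start=0",
		"bid=EXAMPLE; ck=EXAMPLE",
	)
	if err != nil {
		fmt.Println("building request failed:", err)
		return
	}
	fmt.Println(req.Method, req.URL.String())
	fmt.Println("Cookie:", req.Header.Get("Cookie"))
}
```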
Testing
Operating system    Test result
Windows 7 SP1       OK
Windows 10 1909     OK