itning / DouBanReptile

License: Apache-2.0

A multithreaded crawler for Douban rental groups. After crawling, it automatically sorts the results by time and generates a Markdown file.

Douban Rental Crawler

Download

https://github.com/itning/DouBanReptile/releases

Build

go build -ldflags="-s -w -H windowsgui" -o ..\bin\main.exe DouBanReptile/cmd

The crawl results are written to a Markdown file, which is best opened with Typora.

Usage

Make sure the simsun.ttc font file exists in the C:\Windows\Fonts\ directory.

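How do I set the Douban group link?

1. Search for rentals in an area, for example: 北京租房 (Beijing rentals).
2. Open the group you want to crawl, for example the first result: 北京租房.
3. Scroll to the bottom of the page, find the "> 更多小组讨论" (more group discussions) link, and click it.
4. Copy the address from the address bar (from /group to the end) and paste it into the software's Douban group link setting.
   Pasting sometimes crashes the software for an unknown reason; deleting the existing link in the software before pasting is recommended.
5. Replace the number 50 after start= with %d.
6. Done.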
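To illustrate what the %d placeholder is for, here is a minimal Go sketch of how such a link template could be expanded into page URLs. The function name `pageURLs`, the host prefix, and the step of 25 posts per page are assumptions made for this example, not settings taken from the tool.

```go
package main

import "fmt"

// pageURLs expands a link template containing %d into concrete page URLs.
// host, step, and pages are illustrative parameters (assumptions), not the
// tool's actual settings.
func pageURLs(host, tmpl string, step, pages int) []string {
	urls := make([]string, 0, pages)
	for i := 0; i < pages; i++ {
		// The %d in the template receives the "start" offset of each page.
		urls = append(urls, host+fmt.Sprintf(tmpl, i*step))
	}
	return urls
}

func main() {
	// Template as it would look after step 5 (copied from /group onward).
	tmpl := "/group/554566/discussion?start=%d"
	for _, u := range pageURLs("https://www.douban.com", tmpl, 25, 3) {
		fmt.Println(u)
	}
}
```

Keeping the offset as a placeholder lets one setting describe every page of the discussion list.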
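How do I set exclude (include) keywords?

An exclude keyword drops a rental post whenever the keyword appears in either the title or the body.

For example, the default keyword is 限女 (women only): any post containing 限女, such as 限女生入住 or 只限女生, is skipped.

Separate multiple keywords with |. Note that it must be the ASCII (half-width) character.

For example, with 限女|短租|整租 (women only, short-term rental, whole-unit rental), the software skips any post whose title or body contains any of these three keywords.

Include keywords apply to the title only. For example, with include keyword A: a post whose title contains A but whose body does not is crawled; a post whose body contains A but whose title does not is not crawled.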
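The filtering rules above can be sketched in Go as follows. This is an illustration under stated assumptions, not the tool's code: the helper names are hypothetical, an empty include list is assumed to mean "no include filter", and exclusion is assumed to take precedence over inclusion.

```go
package main

import (
	"fmt"
	"strings"
)

// splitKeywords splits a "|"-separated keyword setting, dropping empty entries.
func splitKeywords(setting string) []string {
	var out []string
	for _, k := range strings.Split(setting, "|") {
		if k = strings.TrimSpace(k); k != "" {
			out = append(out, k)
		}
	}
	return out
}

// containsAny reports whether text contains at least one of the keywords.
func containsAny(text string, keywords []string) bool {
	for _, k := range keywords {
		if strings.Contains(text, k) {
			return true
		}
	}
	return false
}

// keep reports whether a post passes both filters.
func keep(title, body string, exclude, include []string) bool {
	// Exclude keywords are checked against both the title and the body.
	if containsAny(title, exclude) || containsAny(body, exclude) {
		return false
	}
	// Assumption: an empty include list disables the include filter.
	if len(include) == 0 {
		return true
	}
	// Include keywords are checked against the title only.
	return containsAny(title, include)
}

func main() {
	exclude := splitKeywords("限女|短租|整租")
	fmt.Println(keep("次卧出租 限女生入住", "近地铁", exclude, nil))
	fmt.Println(keep("次卧出租 2500", "近地铁", exclude, nil))
}
```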
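About recognizing the price in the title

The regular expression \b\d{4}\b is used to find the price in the title, so listings priced below 1000 yuan cannot be crawled. Because the pattern matches exactly four digits, a price of 10000 or more would not match either.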
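A short Go sketch shows how this pattern behaves on sample titles. The function name `titlePrice` and the choice to take the first match are assumptions for illustration; the README only states the pattern itself.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// priceRe is the pattern quoted in the README: exactly four digits
// delimited by word boundaries.
var priceRe = regexp.MustCompile(`\b\d{4}\b`)

// titlePrice returns the first standalone four-digit number in title.
// Taking the first match is an assumption, not documented behavior.
func titlePrice(title string) (int, bool) {
	m := priceRe.FindString(title)
	if m == "" {
		return 0, false
	}
	n, err := strconv.Atoi(m)
	if err != nil {
		return 0, false
	}
	return n, true
}

func main() {
	for _, t := range []string{"朝阳次卧 2500元/月", "次卧 800元", "整租 12000"} {
		price, ok := titlePrice(t)
		fmt.Println(t, price, ok)
	}
}
```

In Go's regexp package, \b is an ASCII word boundary, so a Chinese character directly next to the digits still counts as a boundary.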
About result ordering

Results are sorted by price in ascending order; posts with the same price are sorted by posting time.
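The two-level ordering can be sketched in Go with a single comparison function. The `Post` type, its fields, and the sample data are hypothetical, and since the README does not say in which direction posting time is ordered, newest-first is assumed here.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// Post is a hypothetical record for one crawled listing.
type Post struct {
	Title  string
	Price  int
	Posted time.Time
}

// sortPosts orders posts by ascending price, breaking ties by posting time.
func sortPosts(posts []Post) {
	sort.SliceStable(posts, func(i, j int) bool {
		if posts[i].Price != posts[j].Price {
			return posts[i].Price < posts[j].Price
		}
		// Assumption: among equal prices, newer posts come first.
		return posts[i].Posted.After(posts[j].Posted)
	})
}

// samplePosts returns made-up data for the demonstration.
func samplePosts() []Post {
	day := func(d int) time.Time {
		return time.Date(2020, time.March, d, 0, 0, 0, 0, time.UTC)
	}
	return []Post{
		{Title: "room A", Price: 3000, Posted: day(1)},
		{Title: "room B", Price: 2500, Posted: day(2)},
		{Title: "room C", Price: 2500, Posted: day(5)},
	}
}

func main() {
	posts := samplePosts()
	sortPosts(posts)
	for _, p := range posts {
		fmt.Println(p.Price, p.Posted.Format("2006-01-02"), p.Title)
	}
}
```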
How do I open the result file (.md extension)?

Downloading Typora is recommended.
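How do I set the cookie?

1. Open a Douban group, for example: https://www.douban.com/group/554566/discussion?start=0
2. Press F12 to open the developer tools and click the Console tab.
3. Type document.cookie and press Enter, then copy the output (do not copy the surrounding double quotes).
4. Paste the copied content into the program.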
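For context, a copied cookie string is typically sent as the Cookie request header. The Go sketch below shows one way to attach it using the standard net/http package; the function name `newRequest` and the placeholder cookie value are assumptions, and this is not necessarily how the tool itself sends requests.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// newRequest builds a GET request that carries the pasted cookie string.
func newRequest(url, cookie string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	// Strip surrounding double quotes in case they were copied by mistake.
	req.Header.Set("Cookie", strings.Trim(cookie, `"`))
	return req, nil
}

func main() {
	// Placeholder value; paste the real document.cookie output instead.
	cookie := "name1=value1; name2=value2"
	req, err := newRequest("https://www.douban.com/group/554566/discussion?start=0", cookie)
	if err != nil {
		fmt.Println("build request:", err)
		return
	}
	// The request is only constructed here; http.DefaultClient.Do(req) would send it.
	fmt.Println(req.Method, req.URL.String())
	fmt.Println("Cookie:", req.Header.Get("Cookie"))
}
```

Note that document.cookie does not expose cookies marked HttpOnly, so the copied string may be a subset of what the browser sends.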

Testing

Operating system	Test result
Windows 7 SP1	OK
Windows 10 1909	OK
