
BUPT-HJM / buptclass

Licence: other
A Node.js spider that collects information about empty classrooms at BUPT

Programming Languages

javascript

Projects that are alternatives of or similar to buptclass

Fee-movie2.0
Integrates several commonly used movie sites to make finding resources easier; still being updated
Stars: ✭ 33 (+13.79%)
Mutual labels:  cheerio, superagent
receipt-manager-app
Receipt parser application written in dart.
Stars: ✭ 140 (+382.76%)
Mutual labels:  tesseract-ocr
Mirai
A website to stream Anime and read Manga for free.. Everything is scraped from sources online and we don't need to actually host any videos or images.
Stars: ✭ 38 (+31.03%)
Mutual labels:  cheerio
covid19-api
Covid19 Data API (JSON) - LIVE
Stars: ✭ 20 (-31.03%)
Mutual labels:  cheerio
web-crawljs
web crawler for Nodejs
Stars: ✭ 20 (-31.03%)
Mutual labels:  cheerio
Cheerio
Fast, flexible, and lean implementation of core jQuery designed specifically for the server.
Stars: ✭ 24,616 (+84782.76%)
Mutual labels:  cheerio
website-to-json
Converts website to json using jQuery selectors
Stars: ✭ 37 (+27.59%)
Mutual labels:  cheerio
ocreval
Update of the ISRI Analytic Tools for OCR Evaluation with UTF-8 support
Stars: ✭ 48 (+65.52%)
Mutual labels:  tesseract-ocr
superdeno
Super-agent driven library for testing Deno HTTP servers.
Stars: ✭ 119 (+310.34%)
Mutual labels:  superagent
muninn
With a simple, flexible and maintainable configuration file, you can parse html and output json according to the schema you specify.
Stars: ✭ 38 (+31.03%)
Mutual labels:  cheerio
VueStudy
Vue.js study series: example code and tutorials
Stars: ✭ 80 (+175.86%)
Mutual labels:  cheerio
scraper
A web scraper starter project
Stars: ✭ 18 (-37.93%)
Mutual labels:  cheerio
flask-ocr
use flask and tesseract to have a basic ocr, also you need opencv2, this code use opencv2 to have a basic image process
Stars: ✭ 27 (-6.9%)
Mutual labels:  tesseract-ocr
arachnod
High performance crawler for Nodejs
Stars: ✭ 17 (-41.38%)
Mutual labels:  cheerio
tesseract-ocr-re
Tesseract 4 OCR Runtime Environment - Docker Container
Stars: ✭ 94 (+224.14%)
Mutual labels:  tesseract-ocr
Personal-Chef
An Self learning AI Chatbot who doesnt let you waste food by recommending awesome Recipies
Stars: ✭ 24 (-17.24%)
Mutual labels:  cheerio
pasting
Publishing tool made in nodejs using deta.
Stars: ✭ 17 (-41.38%)
Mutual labels:  cheerio
MemePolice bot
This is a bot for r/PewdiepieSubmissions. Moderate harmful submissions by applying OCR on graphical content
Stars: ✭ 26 (-10.34%)
Mutual labels:  tesseract-ocr
scrape-github-trending
Tutorial for web scraping / crawling with Node.js.
Stars: ✭ 42 (+44.83%)
Mutual labels:  cheerio
HighlightTranslator
Highlight Translator can help you to translate the words quickly and accurately. By only highlighting, copying, or screenshoting the content you want to translate anywhere on your computer (ex. PDF, PPT, WORD etc.), the translated results will then be automatically displayed before you.
Stars: ✭ 54 (+86.21%)
Mutual labels:  tesseract-ocr

buptclass-spider

This is a spider that scrapes the list of empty study rooms on BUPT's main campus. Forks, stars, and pull requests are welcome.

An introduction to the spider on my personal blog: http://bupt-hjm.github.io/2016/05/29/buptclass/

2016-06-02 code update

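  • Added spider2.js
  • Added two functions that merge the original data, fixing the excessive redundancy in the data generated by spider1.js
  • Usage is unchanged; the supported website has updated its display format to match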
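
The README does not show the data format or the two merge functions, so the following is only a minimal sketch of how such a merge could reduce redundancy. The input shape (one flat record per building, period, and room) and the function name `mergeByBuilding` are assumptions made for illustration, not the code in spider2.js.

```javascript
// Hypothetical input shape: one flat record per (building, period, room).
// The real spider1.js output format is not shown in this README.
function mergeByBuilding(records) {
  const merged = {};
  for (const { building, period, room } of records) {
    if (!merged[building]) merged[building] = {};
    if (!merged[building][period]) merged[building][period] = [];
    // Skip duplicates so repeated records do not inflate the output.
    if (!merged[building][period].includes(room)) {
      merged[building][period].push(room);
    }
  }
  return merged;
}

const sample = [
  { building: 'Building 1', period: 1, room: '101' },
  { building: 'Building 1', period: 1, room: '102' },
  { building: 'Building 1', period: 2, room: '101' },
  { building: 'Building 3', period: 1, room: '305' },
  { building: 'Building 1', period: 1, room: '101' }, // duplicate record
];

console.log(JSON.stringify(mergeByBuilding(sample), null, 2));
```

Grouping by building and period stores each building name and period once instead of once per room, which is where most of the saving in this sketch comes from.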

Before

After optimization


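Usage:

Step 1: Install dependencies and required tools

After git clone, install the dependencies with npm install. Because the project uses node-tesseract and gm for captcha recognition and image processing, the underlying tools must also be installed locally.

Here are the links I collected for them:

Tesseract: an open-source OCR tool

graphicsmagick: an image processing tool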

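Step 2: Add configuration

If you do not plan to use mongodb, you can delete the mongo-related code in spider.js or simply ignore it (an error is thrown, but it does not stop the program from running and writing the JSON output).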
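
The behaviour described above, where a failed database write does not prevent the JSON output, can be sketched as follows. This is an illustration under assumptions, not the code in spider.js: `saveToDb` is a hypothetical stand-in that always fails (as a write would when no local mongod is running), and `saveResults` and the output path are likewise invented for the example.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical stand-in for the MongoDB write; it always fails here,
// as it would when no local database is running.
function saveToDb(data) {
  return Promise.reject(new Error('connect ECONNREFUSED'));
}

async function saveResults(data, jsonPath) {
  try {
    await saveToDb(data);
  } catch (err) {
    // Log and continue: the database is optional.
    console.error('MongoDB write skipped:', err.message);
  }
  // The JSON file is written whether or not the database write succeeded.
  fs.writeFileSync(jsonPath, JSON.stringify(data, null, 2));
  return jsonPath;
}

const out = path.join(os.tmpdir(), 'buptclass-sample.json');
saveResults({ 'Building 1': { 1: ['101'] } }, out)
  .then((p) => console.log('wrote', p));
```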

The following configuration is in spider.js:

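// Initial spider configuration (the student number and password for the academic affairs system login are required)
var url = "http://jwxt.bupt.edu.cn"; // login URL
var db = monk('localhost/byr'); // connect to the local database
var sno = "*********"; // enter your student number here
var password = "*********"; // enter your password here
// Schedule configuration
var rule = new schedule.RecurrenceRule();
rule.hour = 10;
rule.minute = 0;
// (default: every day at 10:00)
// The schedule can also be left as-is: running node executes the spider once immediately at startup, and only afterwards follows the schedule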
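
To make the timing behaviour concrete (one run at startup, then a run every day at 10:00), here is a dependency-free sketch. It does not use the `schedule` library from the configuration above; `msUntilNext` and `runNowThenDaily` are hypothetical helper names introduced only for this illustration.

```javascript
// Milliseconds from `now` until the next occurrence of hour:minute (local time).
function msUntilNext(hour, minute, now = new Date()) {
  const next = new Date(now);
  next.setHours(hour, minute, 0, 0);
  // If that time has already passed today, move to tomorrow.
  if (next <= now) next.setDate(next.getDate() + 1);
  return next - now;
}

// Run `task` once immediately, then again every day at hour:minute.
function runNowThenDaily(task, hour, minute) {
  task();
  const arm = () =>
    setTimeout(() => {
      task();
      arm(); // re-arm for the following day
    }, msUntilNext(hour, minute));
  arm();
}

// Example: at 09:30 local time, the next 10:00 run is 30 minutes away.
const minutes = msUntilNext(10, 0, new Date(2016, 5, 2, 9, 30)) / 60000;
console.log(minutes); // 30
```

Calling `runNowThenDaily(crawl, 10, 0)` with your own `crawl` function would mirror the default rule; it is left uncalled here so the example exits immediately.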

Step 3: Run the program

node spider.js

About error messages

{ [Error: Cannot find module '../build/Release/bson'] code: 'MODULE_NOT_FOUND' }
js-bson: Failed to load c++ bson extension, using pure JS version

For this error, see Automattic/mongoose#2285. It can be ignored.

{ [Error: socket hang up] code: 'ECONNRESET', response: undefined }

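If the captcha has been recognized but no login success message appears and the program stalls, wait a moment and the error above will be reported. Check that the student number and password were entered correctly.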
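
As a small aid, the two errors quoted above could be mapped to hints like this. The function `explainError` is a hypothetical helper, not part of the spider, and the hint for ECONNRESET only restates the advice given in this README.

```javascript
// Maps the two errors quoted in this README to a short hint.
function explainError(err) {
  const message = (err && err.message) || '';
  if (err && err.code === 'MODULE_NOT_FOUND' && message.includes('bson')) {
    return 'Harmless: js-bson falls back to its pure JS version.';
  }
  if (err && err.code === 'ECONNRESET') {
    return 'Login probably failed: check the student number and password.';
  }
  return 'Unrecognized error: ' + (message || String(err));
}

const hangUp = Object.assign(new Error('socket hang up'), { code: 'ECONNRESET' });
console.log(explainError(hangUp));
```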


Planned improvements:

  • Clean up the code style and reduce redundant code
  • Handle asynchronous callbacks more elegantly
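
One common way to tidy up nested callbacks in Node.js is to wrap callback-style functions with `util.promisify` and chain them with async/await (Node 8 or later). The sketch below assumes three hypothetical callback-style steps, `fetchCaptcha`, `recognize`, and `login`, which stand in for the spider's real steps and return canned values.

```javascript
const { promisify } = require('util');

// Hypothetical callback-style steps standing in for the spider's real ones.
function fetchCaptcha(cb) { setImmediate(cb, null, 'captcha-image'); }
function recognize(image, cb) { setImmediate(cb, null, 'AB12'); }
function login(code, cb) { setImmediate(cb, null, { ok: code === 'AB12' }); }

const fetchCaptchaAsync = promisify(fetchCaptcha);
const recognizeAsync = promisify(recognize);
const loginAsync = promisify(login);

// The steps now read top to bottom instead of nesting three callbacks.
async function run() {
  const image = await fetchCaptchaAsync();
  const code = await recognizeAsync(image);
  const session = await loginAsync(code);
  return session.ok;
}

run()
  .then((ok) => console.log('login ok:', ok))
  .catch((err) => console.error('spider failed:', err.message));
```

Errors from any step reach the single `.catch` at the end, which replaces per-callback error checks.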

About the website backed by this spider

Website address:

http://buptclass.com/

Website screenshot (desktop):

Website screenshot (mobile):
