lndj / Lcrawl

Licence: mit

An elegant crawler for the Zhengfang educational administration system (正方教务系统).

Labels

crawler

Projects that are alternatives of or similar to Lcrawl

Crawlerpack

A web data crawler package for Java

Stars: ✭ 99 (-11.61%)

Mutual labels: crawler

Not Your Average Web Crawler

A web crawler (for bug hunting) that gathers more than you can imagine.

Stars: ✭ 107 (-4.46%)

Mutual labels: crawler

Pylinkvalidator

pylinkvalidator is a standalone and pure python link validator and crawler that traverses a web site and reports errors (e.g., 500 and 404 errors) encountered.

Stars: ✭ 109 (-2.68%)

Mutual labels: crawler

Dotnetcrawler

DotnetCrawler is a straightforward, lightweight web crawling/scraping library for .NET Core with Entity Framework Core output. It is designed along the lines of established crawler libraries such as WebMagic and Scrapy, while staying extensible for your custom requirements. Medium link: https://medium.com/@mehmetozkaya/creating-custom-web-crawler-with-dotnet-core-using-entity-framework-core-ec8d23f0ca7c

Stars: ✭ 100 (-10.71%)

Mutual labels: crawler

Crawler

Crawler, HTTP proxy, simulated login!

Stars: ✭ 106 (-5.36%)

Mutual labels: crawler

Scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python.

Stars: ✭ 42,343 (+37706.25%)

Mutual labels: crawler

Douyinsdk

Douyin (抖音) SDK for data collection; crawling and scraping made achievable

Stars: ✭ 99 (-11.61%)

Mutual labels: crawler

Baiduspider

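BaiduSpider, a crawler that scrapes Baidu search results. It currently supports Baidu web search, image search, Zhidao search, video search, news search, Wenku search, Jingyan search, and Baike search.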

Stars: ✭ 105 (-6.25%)

Mutual labels: crawler

Crawler Detect

🕷 CrawlerDetect is a PHP class for detecting bots/crawlers/spiders via the user agent

Stars: ✭ 1,549 (+1283.04%)

Mutual labels: crawler

Linkcrawler

Cross-platform persistent and distributed web crawler 🔗

Stars: ✭ 109 (-2.68%)

Mutual labels: crawler

Andvaranaut

A dungeon crawler

Stars: ✭ 103 (-8.04%)

Mutual labels: crawler

Skycaiji

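Skycaiji (蓝天采集器) is free crawler software for collecting and publishing data, built with PHP and MySQL. It can be deployed on cloud servers, can collect almost any type of web page, integrates with a wide range of CMS site builders, publishes data in real time without logging in, and runs fully automatically with no manual intervention. It is a fully cross-platform, cloud-based crawler system for large-scale web data collection.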

Stars: ✭ 1,514 (+1251.79%)

Mutual labels: crawler

Fawkes

Fawkes is a tool to search for targets vulnerable to SQL Injection. Performs the search using Google search engine.

Stars: ✭ 108 (-3.57%)

Mutual labels: crawler

Ruia

Async Python 3.6+ web scraping micro-framework based on asyncio

Stars: ✭ 1,366 (+1119.64%)

Mutual labels: crawler

Instagram Profilecrawl

💻 Quickly crawl the information (e.g. followers, tags, etc...) of an instagram profile. No login required!

Stars: ✭ 110 (-1.79%)

Mutual labels: crawler

Antispider

Stars: ✭ 99 (-11.61%)

Mutual labels: crawler

Webmagic

A scalable web crawler framework for Java.

Stars: ✭ 10,186 (+8994.64%)

Mutual labels: crawler

Graphquery

GraphQuery is a query language and execution engine tied to any backend service.

Stars: ✭ 112 (+0%)

Mutual labels: crawler

Google Play Scraper

Node.js scraper to get data from Google Play

Stars: ✭ 1,606 (+1333.93%)

Mutual labels: crawler

Lumberjack

An automated website accessibility scanner and cli

Stars: ✭ 109 (-2.68%)

Mutual labels: crawler

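Introduction

A crawler for the Zhengfang educational administration system, implemented in PHP.

Possibly the most elegant crawler for the Zhengfang system.

Installation

Install with composer: composer require lndj/lcrawl

To try the latest features, run: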

git clone https://github.com/lndj/Lcrawl.git
cd Lcrawl
composer install

Note: install composer first.

Example

<?php

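// Require the composer autoload file
require './vendor/autoload.php';

// Assumption: the class lives in the Lndj namespace, matching the lndj/lcrawl package name.
use Lndj\Lcrawl;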

$stu_id = '201201148';
//Notice: This is NOT test account!!!
$password = 'xxxxxxxx';

$user = ['stu_id' => $stu_id, 'stu_pwd' => $password];

$client = new Lcrawl('http://xuanke.lzjtu.edu.cn/', $user);

// Log in. Session caching is not enabled here, so login() must be called explicitly.
$client->login();

// Fetch all data
$all = $client->setUa('Lcrawl Spider V2.0.2')->getAll();


// $client->getSchedule();
// $client->getCet();

You can also set request options such as Referer and Timeout by chaining the corresponding setter calls.
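Chained calls like these usually work because each setter returns the object itself. The sketch below illustrates that pattern with a hypothetical `FluentClient` class. It is not Lcrawl's source code, and the `getOptions()` method, the default timeout, and the header names are assumptions made for the example.

```php
<?php
// Hypothetical stand-in for illustration only; this is not Lcrawl's source code.
final class FluentClient
{
    /** @var array<string, string> Request headers collected so far */
    private array $headers = [];

    /** @var float Timeout in seconds (assumed default) */
    private float $timeout = 5.0;

    public function setUa(string $ua): self
    {
        $this->headers['User-Agent'] = $ua;
        return $this; // returning $this is what makes chaining possible
    }

    public function setReferer(string $referer): self
    {
        $this->headers['Referer'] = $referer;
        return $this;
    }

    public function setTimeOut(float $seconds): self
    {
        $this->timeout = $seconds;
        return $this;
    }

    /** Returns the collected options instead of sending a real request. */
    public function getOptions(): array
    {
        return ['headers' => $this->headers, 'timeout' => $this->timeout];
    }
}

$options = (new FluentClient())
    ->setUa('Lcrawl Spider V2.0.2')
    ->setReferer('http://xuanke.lzjtu.edu.cn/')
    ->setTimeOut(3.0)
    ->getOptions();

print_r($options);
```

Because every setter returns `$this`, the order of the chained calls does not matter, and the final call in the chain (here `getOptions()`, in Lcrawl a method such as `getAll()`) receives a fully configured object.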

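Session cache

You can enable session caching for requests, which effectively reduces the number of sessions opened on the educational administration system.

// Pass a third argument when instantiating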
$client = new Lcrawl('http://xuanke.lzjtu.edu.cn/', $user, true);

$all = $client->setUa('Lcrawl Spider V2.0.2')->setTimeOut(3.0)->getAll();

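Advanced usage

To fetch data for a period after a single login without logging in again, and so reduce the number of requests to the educational administration site, use session caching.

First, pass true as the third argument when instantiating Lcrawl. For example:

// Pass a third argument when instantiating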
$client = new Lcrawl('http://xuanke.lzjtu.edu.cn/', $user, true);

The third argument enables session caching.
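The idea behind session caching can be sketched without the library: keep the session cookie from a successful login together with an expiry time, and log in again only when no valid entry exists. Everything in the sketch below is hypothetical (the function names, the JSON file layout, the 600-second lifetime, and the placeholder cookie); it does not show how Lcrawl stores sessions internally.

```php
<?php
// Hypothetical illustration of session caching; names and file layout are assumptions.

/** Path of the cache file for one student ID inside PHP's temp directory. */
function sessionCachePath(string $stuId): string
{
    return sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'session_sketch_' . md5($stuId) . '.json';
}

/** Stores a session cookie together with its expiry timestamp. */
function saveSession(string $stuId, string $cookie, int $ttlSeconds): void
{
    $payload = ['cookie' => $cookie, 'expires_at' => time() + $ttlSeconds];
    file_put_contents(sessionCachePath($stuId), json_encode($payload), LOCK_EX);
}

/** Returns the cached cookie, or null when nothing is cached or the entry has expired. */
function loadSession(string $stuId): ?string
{
    $path = sessionCachePath($stuId);
    if (!is_file($path)) {
        return null;
    }
    $payload = json_decode((string) file_get_contents($path), true);
    if (!is_array($payload) || ($payload['expires_at'] ?? 0) <= time()) {
        @unlink($path); // drop stale or corrupt entries
        return null;
    }
    return $payload['cookie'] ?? null;
}

// Usage: log in only when no valid session is cached.
$stuId = '201201148';
$cookie = loadSession($stuId);
if ($cookie === null) {
    $cookie = 'ASP.NET_SessionId=example'; // placeholder for the cookie a real login would return
    saveSession($stuId, $cookie, 600);
}
echo $cookie, PHP_EOL;
```

A second run within the lifetime finds the cached entry and skips the login branch, which is the effect the third constructor argument is meant to achieve.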

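This project uses doctrine/cache for caching, which supports essentially all current cache engines.

By default, all caching in Lcrawl uses a file cache whose path depends on PHP's temporary directory. To use a custom cache, do the following: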

use Doctrine\Common\Cache\RedisCache;

$cacheDriver = new RedisCache();

// Create the redis instance
$redis = new Redis();
$redis->connect('redis_host', 6379);

$cacheDriver->setRedis($redis);

// Use redis to cache the session
$client->setCache($cacheDriver);

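You can refer to the official doctrine/cache documentation to replace the default cache configuration in your application:

Taking redis as an example

Install the redis extension first: https://github.com/phpredis/phpredis

Setting login parameters

This SDK uses ysdx_default.aspx as the login URI by default. To use a different URI, capture the login request yourself to obtain its parameters.

When instantiating, pass a fourth argument, $loginParam. The array keys are fixed; each value is the corresponding field name from the captured POST body.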

$loginParam = [
    'viewstate' => '__VIEWSTATE', // name of the hidden field
    'stu_id' => 'TextBox1', // name of the student ID field
    'passwod' => 'TextBox2', // password field
    'role' => 'RadioButtonList1', // role
    'button' => 'Button1' // button
];

$client = new Lcrawl('http://xuanke.lzjtu.edu.cn/', $user, true, $loginParam);

//other code...
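To make the mapping concrete, the sketch below shows how a field-name map like $loginParam could be combined with the values to submit to produce a POST body. The `buildLoginBody()` helper and all values in `$values` are hypothetical and serve only as illustration; Lcrawl's internal implementation may differ.

```php
<?php
// Hypothetical helper for illustration; Lcrawl's internal implementation may differ.

/**
 * Builds a login POST body from a field-name map and the values to submit.
 *
 * @param array<string, string> $loginParam fixed key => form field name
 * @param array<string, string> $values     fixed key => value to send
 */
function buildLoginBody(array $loginParam, array $values): string
{
    $fields = [];
    foreach ($loginParam as $key => $fieldName) {
        // Keys without a value are sent as empty strings.
        $fields[$fieldName] = $values[$key] ?? '';
    }
    return http_build_query($fields);
}

$loginParam = [
    'viewstate' => '__VIEWSTATE',
    'stu_id'    => 'TextBox1',
    'passwod'   => 'TextBox2', // key spelled as in the README
    'role'      => 'RadioButtonList1',
    'button'    => 'Button1',
];

$values = [
    'viewstate' => 'dDwtMTIz',  // placeholder; the real value comes from the login page
    'stu_id'    => '201201148',
    'passwod'   => 'xxxxxxxx',
    'role'      => 'student',   // placeholder value
    'button'    => '',
];

echo buildLoginBody($loginParam, $values), PHP_EOL;
```

Only the right-hand side of the map changes from one school's deployment to another, which is why the keys can stay fixed while the captured field names are passed in as values.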

API

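`getAll()`

Fetches all data, using concurrent requests.

`getSchedule()`

Fetches the class schedule.

`getGrade()`

Fetches grades.

`getCet()`

Fetches CET-4/CET-6 results.

`getExam()`

Fetches the exam timetable.

`setX()` methods

Set the corresponding header value, for example setUa()/setTimeout()/setReferer().

`getX()` methods

Getters.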

Usage in Laravel

Laravel uses predis/predis, so we need to use Doctrine\Common\Cache\PredisCache:

use Doctrine\Common\Cache\PredisCache;

$predis = app('redis')->connection(); // connection($name); $name defaults to `default`
$cacheDriver = new PredisCache($predis);

// Use redis to cache the session
$client->setCache($cacheDriver);

In app('redis')->connection($name) above, $name is the name of a redis connection configured in the Laravel project's database.php config file, which is default unless you changed it: https://github.com/laravel/laravel/blob/master/config/database.php#L118 If you use a different connection, pass its name instead.

License

MIT License

Stars: ✭ 112