mike442144 / seenreq

Licence: MIT license
Generates an object for testing whether a request has already been sent. Here, "request" refers to Mikeal's request library.

Programming Languages

javascript

seenreq

A library for testing whether a URL or request has already been crawled, typically used in a web crawler. It is compatible with request and node-crawler. Versions 1.x and newer have a substantially different API and are not compatible with 0.x versions. Please read the upgrade guide document.

Quick Start

Installation

$ npm install seenreq --save

Basic Usage

const seenreq = require('seenreq');
const seen = new seenreq();

// URL to be normalized
let url = "http://www.GOOGLE.com";
console.log(seen.normalize(url)); // { sign: "GET http://www.google.com/\r\n", options: {} }

// request options to be normalized
let option = {
    uri: 'http://www.GOOGLE.com',
    rupdate: false
};

console.log(seen.normalize(option)); // { sign: "GET http://www.google.com/\r\n", options: { rupdate: false } }

seen.initialize().then(() => {
    return seen.exists(url);
}).then((rst) => {
    console.log(rst[0]); // false if this `request` has never been seen
    return seen.exists(option);
}).then((rst) => {
    console.log(rst[0]); // true if the same `request` was seen before
}).catch((e) => {
    console.error(e);
});

When you call exists, the module normalizes the request itself first and then checks whether it has been seen.
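The normalize-then-check flow can be modelled in a few lines. The following is only a minimal sketch, not seenreq's actual implementation: the class name TinySeen, its Set-based store, and the use of Node's built-in URL parser are all assumptions made for illustration.

```javascript
// Hypothetical in-memory model of "normalize, then check"; not seenreq's code.
class TinySeen {
  constructor() {
    this.signs = new Set(); // signatures of the requests seen so far
  }

  // Accepts a URL string or an options object with `uri` (and optional `method`).
  normalize(req) {
    const opts = typeof req === 'string' ? { uri: req } : req;
    const method = (opts.method || 'GET').toUpperCase();
    // The WHATWG URL parser lowercases the host and adds the root path.
    const href = new URL(opts.uri).href;
    return { sign: `${method} ${href}\r\n` };
  }

  // Resolves to a one-element Boolean array, mirroring the documented shape.
  async exists(req) {
    const { sign } = this.normalize(req);
    const seenBefore = this.signs.has(sign);
    this.signs.add(sign); // remember the signature for the next call
    return [seenBefore];
  }
}

// Two spellings of the same URL: the first call misses, the second one hits.
async function demo() {
  const seen = new TinySeen();
  const first = await seen.exists('http://www.GOOGLE.com');
  const second = await seen.exists({ uri: 'http://www.google.com/' });
  return [first[0], second[0]];
}
```

Because both spellings normalize to the same signature, the second lookup reports the request as already seen.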

Use Redis

seenreq stores keys in memory by default, so memory usage grows as the number of keys increases. Storing keys in Redis solves this problem. Because seenreq uses ioredis as its Redis client, all ioredis options are accepted and supported. First install the Redis repo:

npm install seenreq-repo-redis --save

and then set repo to redis:

const seenreq = require('seenreq');
let seen = new seenreq({
    repo: 'redis', // use redis instead of memory
    host: '127.0.0.1',
    port: 6379,
    clearOnQuit: false // whether to clear the redis cache when dispose() is called, default true.
});

seen.initialize().then(() => {
    // do stuff...
}).catch((e) => {
    console.error(e);
});

Use mongodb

The setup is similar to Redis above. First install:

npm install seenreq-repo-mongo --save

and then set repo to mongo:

const seenreq = require('seenreq');
let seen = new seenreq({
    repo: 'mongo',
    url: 'mongodb://xxx/seenreq',
    collection: 'foor'
});

Class:seenreq

Instance of seenreq

seen.initialize()

Initialize the repo, returns a promise.

seen.normalize(uri|option[,options])

Returns normalized Object: {sign,options}.

seen.exists(uri|option|array[,options])

Returns a promise that resolves to a Boolean array, e.g. [true, false, true, false, false].
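A common use of the Boolean array is to keep only the URLs that have not been crawled yet. The sketch below assumes the flags come back in the same order as the input array; filterUnseen is a hypothetical helper name, not part of seenreq.

```javascript
// Hypothetical helper: `seen` is any object whose exists(array) resolves to
// a Boolean array aligned with the input (an assumption of this sketch).
async function filterUnseen(seen, urls) {
  const flags = await seen.exists(urls);
  // Keep the entries whose flag is false, i.e. not seen before.
  return urls.filter((_, i) => !flags[i]);
}
```

In a crawler, the returned array would be the list of URLs that are still worth queueing.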

seen.dispose()

Disposes of the repo's resources. If you are using a repo other than memory, such as Redis, you should call dispose to release the connection. Returns a promise.
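One way to make sure dispose always runs is to wrap the work in try/finally. This is a sketch of a usage pattern, not something seenreq provides: withSeen and task are hypothetical names, and the only API relied on is that initialize() and dispose() return promises, as described above.

```javascript
// Hypothetical wrapper: initialize the repo, run the task, always dispose.
async function withSeen(seen, task) {
  await seen.initialize();
  try {
    return await task(seen);
  } finally {
    // Runs on success and on error, so the repo connection is released.
    await seen.dispose();
  }
}
```

It could be called as withSeen(new seenreq({ repo: 'redis' }), (seen) => seen.exists(url)), so the Redis connection is closed even if the task rejects.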

Options

  • removeKeys: Array. Keys to ignore during normalization. For instance, in a URL like http://www.xxx.com/index?ts=1442382602504 the ts property is a timestamp; the page is the same whatever its value, so ts should be ignored.
  • stripFragment: Boolean. Remove the fragment at the end of the URL (default true).
  • rupdate: Boolean, short for "repo update". Store the request in the repo so that seenreq can match the same request next time (default true).
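To illustrate what removeKeys and stripFragment mean for deduplication, here is a minimal sketch built on Node's built-in URL class. It is not seenreq's normalization code: normalizeUrl is a hypothetical helper, and sorting the query parameters is an extra assumption of the sketch.

```javascript
// Illustrative only: one way removeKeys and stripFragment could shape a URL.
function normalizeUrl(uri, { removeKeys = [], stripFragment = true } = {}) {
  const url = new URL(uri);
  for (const key of removeKeys) {
    url.searchParams.delete(key); // drop volatile parameters such as timestamps
  }
  url.searchParams.sort(); // assumption: parameter order should not matter
  if (stripFragment) {
    url.hash = ''; // remove "#..." from the end of the URL
  }
  return url.href;
}

const visitOne = normalizeUrl(
  'http://www.xxx.com/index?ts=1442382602504&id=7#top',
  { removeKeys: ['ts'] }
);
const visitTwo = normalizeUrl(
  'http://www.xxx.com/index?id=7&ts=1500000000000',
  { removeKeys: ['ts'] }
);
console.log(visitOne === visitTwo); // true
```

With ts removed and the fragment stripped, both visits reduce to the same string, so a crawler would treat them as one request.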

RoadMap

  • Add a MySQL repo to persist keys to disk.
  • Add key lifetime management.