
ekalinin / robots.js

License: MIT License
Parser for robots.txt for node.js

Programming Languages

javascript
184084 projects - #8 most used programming language
Makefile
30231 projects

Projects that are alternatives to or similar to robots.js

nuxt-humans-txt
🧑🏻👩🏻 "We are people, not machines" - An initiative to know the creators of a website. Contains the information about humans to the web building - A Nuxt Module to statically integrate and generate a humans.txt author file - Based on the HumansTxt Project.
Stars: ✭ 27 (-57.81%)
Mutual labels:  robots-txt, robots
aztarna
aztarna, a footprinting tool for robots.
Stars: ✭ 85 (+32.81%)
Mutual labels:  robots
jsitemapgenerator
Java sitemap generator. This library generates a web sitemap, can ping Google, generate RSS feed, robots.txt and more with friendly, easy to use Java 8 functional style of programming
Stars: ✭ 38 (-40.62%)
Mutual labels:  robots-txt
arguing-robots
🤖 Watch and hear macOS robots argue live in your terminal 🤖
Stars: ✭ 53 (-17.19%)
Mutual labels:  robots
TikTok
Download public videos on TikTok using Python with Selenium
Stars: ✭ 37 (-42.19%)
Mutual labels:  robots
openroberta-lab
The programming environment »Open Roberta Lab« by Fraunhofer IAIS enables children and adolescents to program robots. A variety of different programming blocks are provided to program motors and sensors of the robot. Open Roberta Lab uses an approach of graphical programming so that beginners can seamlessly start coding. As a cloud-based applica…
Stars: ✭ 98 (+53.13%)
Mutual labels:  robots
ultimate-sitemap-parser
Ultimate Website Sitemap Parser
Stars: ✭ 118 (+84.38%)
Mutual labels:  robots-txt
stoqs
Geospatial database visualization software for oceanographic measurement data
Stars: ✭ 31 (-51.56%)
Mutual labels:  robots
multi robot traj planner
An Efficient Multi-Robot Trajectory Planner for Ground Vehicles.
Stars: ✭ 104 (+62.5%)
Mutual labels:  robots
.NetCorePluginManager
.Net Core Plugin Manager, extend web applications using plugin technology enabling true SOLID and DRY principles when developing applications
Stars: ✭ 17 (-73.44%)
Mutual labels:  robots-txt
Nasa-And-Spacex-Cooperation
Theme Outer Space
Stars: ✭ 41 (-35.94%)
Mutual labels:  robots
RedBot
Design files and firmware files for the RedBot robotics board.
Stars: ✭ 22 (-65.62%)
Mutual labels:  robots
penny
3 servos, 10 dollars hexapod
Stars: ✭ 26 (-59.37%)
Mutual labels:  robots
robotstxt-webpack-plugin
A webpack plugin to generate a robots.txt file
Stars: ✭ 31 (-51.56%)
Mutual labels:  robots-txt
youtube-video-maker
📹 A tool for automatic video creation and uploading on YouTube
Stars: ✭ 134 (+109.38%)
Mutual labels:  robots
community-projects
Webots projects (PROTO files, controllers, simulation worlds, etc.) contributed by the community.
Stars: ✭ 20 (-68.75%)
Mutual labels:  robots
robot hacking manual
Robot Hacking Manual (RHM). From robotics to cybersecurity. Papers, notes and writeups from a journey into robot cybersecurity.
Stars: ✭ 169 (+164.06%)
Mutual labels:  robots
summit xl sim
Packages for the simulation of the Summit XL, Summit XL HL and Summit-X (including X-WAM) robots
Stars: ✭ 32 (-50%)
Mutual labels:  robots
linorobot2
Autonomous mobile robots (2WD, 4WD, Mecanum Drive)
Stars: ✭ 97 (+51.56%)
Mutual labels:  robots
blender-robotics-utils
Set of utilities for exporting/controlling your robot in Blender
Stars: ✭ 26 (-59.37%)
Mutual labels:  robots

robots.js

robots.js is a parser for robots.txt files for node.js.

Installation

It's recommended to install via npm:

$ npm install -g robots

Usage

Here's an example of using robots.js:

var robots = require('robots')
  , parser = new robots.RobotsParser();

parser.setUrl('http://nodeguide.ru/robots.txt', function(parser, success) {
  if(success) {
    parser.canFetch('*', '/doc/dailyjs-nodepad/', function (access) {
      if (access) {
        // parse url
      }
    });
  }
});

The default crawler user-agent is:

Mozilla/5.0 (X11; Linux i686; rv:5.0) Gecko/20100101 Firefox/5.0

Here's an example of using a different user-agent and a more detailed callback:

var robots = require('robots')
  , parser = new robots.RobotsParser(
                'http://nodeguide.ru/robots.txt',
                'Mozilla/5.0 (compatible; RobotTxtBot/1.0)',
                after_parse
            );
            
function after_parse(parser, success) {
  if(success) {
    parser.canFetch('*', '/doc/dailyjs-nodepad/', function (access, url, reason) {
      if (access) {
        console.log(' url: '+url+', access: '+access);
        // parse url ...
      }
    });
  }
}

Here's an example of getting the list of sitemaps:

var robots = require('robots')
  , parser = new robots.RobotsParser();

parser.setUrl('http://nodeguide.ru/robots.txt', function(parser, success) {
  if(success) {
    parser.getSitemaps(function(sitemaps) {
      // sitemaps is an array of sitemap URLs
    });
  }
});

Here's an example of getCrawlDelay usage:

var robots = require('robots')
  , parser = new robots.RobotsParser();

// for example:
//
// $ curl -s http://nodeguide.ru/robots.txt
//
// User-agent: Google-bot
// Disallow: /
// Crawl-delay: 2
//
// User-agent: *
// Disallow: /
// Crawl-delay: 2

parser.setUrl('http://nodeguide.ru/robots.txt', function(parser, success) {
  if(success) {
    var GoogleBotDelay = parser.getCrawlDelay("Google-bot");
    // ...
  }
});

An example of passing options to the HTTP request:

var options = {
  headers: {
    Authorization: "Basic " + Buffer.from("username:password").toString("base64")
  }
};

var robots = require('robots')
  , parser = new robots.RobotsParser(null, options);

parser.setUrl('http://nodeguide.ru/robots.txt', function(parser, success) {
  ...
});

API

RobotsParser is the main class. It provides a set of methods to read, parse, and answer questions about a single robots.txt file.

  • setUrl(url, read) - sets the URL referring to a robots.txt file. By default it invokes the read() method. If read is a function, it is called once the remote file has been downloaded and parsed; it takes two arguments: the parser itself and a boolean that is true if the remote file was parsed successfully.
  • read(after_parse) - reads the robots.txt URL and feeds it to the parser
  • parse(lines) - parses the input lines from a robots.txt file (see the sketch after this list)
  • canFetch(userAgent, url, callback) - using the parsed robots.txt, decides whether userAgent can fetch url. Callback function: function callback(access, url, reason) { ... } where:
    • access - whether this url can be fetched. true/false.
    • url - target url
    • reason - reason for the access decision. Object:
      • type - valid values: 'statusCode', 'entry', 'defaultEntry', 'noRule'
      • entry - an instance of lib/Entry.js. Only for types 'entry' and 'defaultEntry'
      • statusCode - HTTP response status code for url. Only for type 'statusCode'
  • canFetchSync(userAgent, url) - using the parsed robots.txt, decides whether userAgent can fetch url. Returns true/false.
  • getCrawlDelay(userAgent) - returns the Crawl-delay for the given userAgent
  • getSitemaps(callback) - gets the Sitemap URLs from the parsed robots.txt (the callback receives an array)
  • getDisallowedPaths(userAgent) - gets the paths explicitly disallowed for the specified user agent AND *
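
For the methods that have no example earlier in this README (parse(), canFetchSync(), getDisallowedPaths() and the reason argument of canFetch()), here is a minimal sketch. It assumes parse() accepts an array of robots.txt lines, as the method description above suggests, and feeds the parser an in-memory file instead of fetching one over HTTP. The robots.txt content, the agent name MyBot and the expected values in the comments are illustrative assumptions, not output captured from the library:

var robots = require('robots')
  , parser = new robots.RobotsParser();

// Hypothetical robots.txt, passed as an array of lines:
parser.parse([
  'User-agent: MyBot',
  'Disallow: /private/',
  'Crawl-delay: 5',
  '',
  'User-agent: *',
  'Disallow: /tmp/',
  'Sitemap: http://example.com/sitemap.xml'
]);

// Synchronous checks against the MyBot group defined above:
console.log(parser.canFetchSync('MyBot', '/index.html'));   // expected: true
console.log(parser.canFetchSync('MyBot', '/private/page')); // expected: false

// Crawl-delay and disallowed paths for the same agent (plus the '*' group):
console.log(parser.getCrawlDelay('MyBot'));                 // expected: 5
console.log(parser.getDisallowedPaths('MyBot'));            // expected: [ '/private/', '/tmp/' ]

// The asynchronous form also reports why access was granted or denied:
parser.canFetch('MyBot', '/private/page', function (access, url, reason) {
  // reason.type should be 'entry' here, since the MyBot group matched
  console.log(access, url, reason.type);
});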

License

See LICENSE file.
