
seantomburke / sitemapper

License: MIT
Parses sitemaps for Node.js

Programming Languages

javascript
184084 projects - #8 most used programming language
typescript
32286 projects

Projects that are alternatives of or similar to sitemapper

express-sitemap-xml
Serve sitemap.xml from a list of URLs in Express
Stars: ✭ 56 (-20%)
Mutual labels:  sitemap, sitemap-xml
ultimate-sitemap-parser
Ultimate Website Sitemap Parser
Stars: ✭ 118 (+68.57%)
Mutual labels:  sitemap, sitemap-xml
grav-plugin-sitemap
Grav Sitemap Plugin
Stars: ✭ 34 (-51.43%)
Mutual labels:  sitemap, sitemap-xml
X.Web.Sitemap
Simple sitemap generator for .NET
Stars: ✭ 66 (-5.71%)
Mutual labels:  sitemap, sitemap-xml
SitemapTools
A sitemap (sitemap.xml) querying and parsing library for .NET
Stars: ✭ 19 (-72.86%)
Mutual labels:  sitemap, sitemap-xml
section-matter
Like front-matter, but allows multiple sections in a single document.
Stars: ✭ 18 (-74.29%)
Mutual labels:  parse
mpq
Decoder/parser of Blizzard's MPQ archive file format
Stars: ✭ 28 (-60%)
Mutual labels:  parse
TIFeedParser
RSS Parser written in Swift
Stars: ✭ 18 (-74.29%)
Mutual labels:  parse
info-bot
🤖 A Versatile Telegram Bot
Stars: ✭ 37 (-47.14%)
Mutual labels:  parse
read-env
🔧 Transform environment variables into JSON object with sanitized values.
Stars: ✭ 60 (-14.29%)
Mutual labels:  parse
vuepress-plugin-sitemap
Sitemap generator plugin for vuepress.
Stars: ✭ 92 (+31.43%)
Mutual labels:  sitemap
lilt
LILT: noun, A characteristic rising and falling of the voice when speaking; a pleasant gentle accent.
Stars: ✭ 18 (-74.29%)
Mutual labels:  parse
how-much
💰 iOS price list app using Firebase, Realm & more
Stars: ✭ 22 (-68.57%)
Mutual labels:  parse
logparser
Easy parsing of Apache HTTPD and NGINX access logs with Java, Hadoop, Hive, Pig, Flink, Beam, Storm, Drill, ...
Stars: ✭ 139 (+98.57%)
Mutual labels:  parse
shape-json
Module used to convert a flat json array into a nested json object with a predefined scheme
Stars: ✭ 31 (-55.71%)
Mutual labels:  parse
pfp-vim
A vim hex-editor plugin that uses 010 templates to parse binary data using pfp
Stars: ✭ 57 (-18.57%)
Mutual labels:  parse
guessit-rest
REST API for guessit
Stars: ✭ 15 (-78.57%)
Mutual labels:  parse
pp-toml
Paul's Parser for Tom's Own Minimal Language
Stars: ✭ 17 (-75.71%)
Mutual labels:  parse
parse-server-test-runner
A tool for programmatically starting Parse Server
Stars: ✭ 18 (-74.29%)
Mutual labels:  parse
json_struct
json_struct is a single header only C++ library for parsing JSON directly to C++ structs and vice versa
Stars: ✭ 279 (+298.57%)
Mutual labels:  parse

Sitemap-parser


Parse a sitemap's XML to get all the URLs for your crawler.

Version 2

Installation

npm install sitemapper --save

Simple Example

const Sitemapper = require('sitemapper');

const sitemap = new Sitemapper();

sitemap.fetch('https://wp.seantburke.com/sitemap.xml').then(function (result) {
  // fetch() resolves to an object of the form { url, sites }
  console.log(result.sites);
});

Examples in ES6

import Sitemapper from 'sitemapper';

(async () => {
  const Google = new Sitemapper({
    url: 'https://www.google.com/work/sitemap.xml',
    timeout: 15000, // 15 seconds
  });

  try {
    const { sites } = await Google.fetch();
    console.log(sites);
  } catch (error) {
    console.log(error);
  }
})();

// or

const sitemapper = new Sitemapper();
sitemapper.timeout = 5000;

sitemapper.fetch('https://wp.seantburke.com/sitemap.xml')
  .then(({ url, sites }) => console.log(`url:${url}`, 'sites:', sites))
  .catch(error => console.log(error));

Options

Pass options to the Sitemapper constructor when instantiating it.

  • requestHeaders: (Object) - Additional request headers (e.g. User-Agent)
  • timeout: (Number) - Maximum timeout in ms for a single URL. Default: 15000 (15 seconds)
  • url: (String) - Sitemap URL to crawl
  • debug: (Boolean) - Enables or disables debug console logging. Default: false
  • concurrency: (Number) - Sets the maximum number of sitemaps crawled concurrently. Default: 10
  • retries: (Number) - Sets the maximum number of retries to attempt after an error response (e.g. 404 or timeout). Default: 0
  • rejectUnauthorized: (Boolean) - If true, throws on invalid certificates, such as expired or self-signed ones. Default: true
  • lastmod: (Number) - Minimum lastmod timestamp allowed for returned URLs

const sitemapper = new Sitemapper({
  url: 'https://art-works.community/sitemap.xml',
  rejectUnauthorized: true,
  timeout: 15000,
  requestHeaders: {
    'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:81.0) Gecko/20100101 Firefox/81.0'
  }
});
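
The `lastmod` option is easiest to picture as a filter over sitemap entries. The sketch below illustrates that idea only; it is not sitemapper's implementation. The helper name `filterByLastmod`, the `{ loc, lastmod }` entry shape, the millisecond timestamp, and the choice to drop entries without a `lastmod` are all assumptions made for illustration.

```javascript
// Hypothetical helper, not part of sitemapper: keep only the URLs whose
// lastmod is at or after minLastmod (assumed to be milliseconds since epoch).
function filterByLastmod(entries, minLastmod) {
  return entries
    .filter((entry) => {
      if (!entry.lastmod) return false; // assumption: undated entries are dropped
      const modified = new Date(entry.lastmod).getTime();
      return !Number.isNaN(modified) && modified >= minLastmod;
    })
    .map((entry) => entry.loc);
}

// Made-up entries standing in for parsed <url> elements
const sampleEntries = [
  { loc: 'https://example.com/old', lastmod: '2020-01-01' },
  { loc: 'https://example.com/new', lastmod: '2023-06-15' },
  { loc: 'https://example.com/undated' },
];

const cutoff = new Date('2022-01-01').getTime();
console.log(filterByLastmod(sampleEntries, cutoff));
```

With the library itself, the cutoff would be passed as the `lastmod` constructor option rather than applied by hand.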

An example using most of the available options:

const sitemapper = new Sitemapper({
  url: 'https://art-works.community/sitemap.xml',
  timeout: 15000,
  requestHeaders: {
    'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:81.0) Gecko/20100101 Firefox/81.0'
  },
  debug: true,
  concurrency: 2,
  retries: 1,
});
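
To make the meaning of `retries` concrete, here is a minimal sketch of retry-on-error logic. It is an illustration under assumptions, not the code sitemapper runs internally: `withRetries` and `flakyFetch` are hypothetical names, and the sketch retries immediately with no delay between attempts.

```javascript
// Hypothetical helper: run an async task, retrying up to `retries` extra
// times after a failure. retries = 0 means a single attempt.
async function withRetries(task, retries = 0) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt += 1) {
    try {
      return await task(attempt);
    } catch (error) {
      lastError = error; // remember the failure; loop again if attempts remain
    }
  }
  throw lastError;
}

// Stand-in for a sitemap request that times out once, then succeeds
let calls = 0;
const flakyFetch = async () => {
  calls += 1;
  if (calls === 1) throw new Error('timeout');
  return ['https://example.com/'];
};

withRetries(flakyFetch, 1)
  .then((sites) => console.log('sites:', sites))
  .catch((error) => console.log(error));
```

With `retries: 0` (the documented default) the first failure would be final; `retries: 1` allows one more attempt.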

Examples in ES5

var Sitemapper = require('sitemapper');

var Google = new Sitemapper({
  url: 'https://www.google.com/work/sitemap.xml',
  timeout: 15000 //15 seconds
});

Google.fetch()
  .then(function (data) {
    console.log(data);
  })
  .catch(function (error) {
    console.log(error);
  });


// or


var sitemapper = new Sitemapper();

sitemapper.timeout = 5000;
sitemapper.fetch('https://wp.seantburke.com/sitemap.xml')
  .then(function (data) {
    console.log(data);
  })
  .catch(function (error) {
    console.log(error);
  });

Version 1

npm install [email protected] --save

Simple Example

var Sitemapper = require('sitemapper');

var sitemapper = new Sitemapper();

sitemapper.getSites('https://wp.seantburke.com/sitemap.xml', function (err, sites) {
  if (err) {
    // Report the failure instead of silently ignoring it
    console.log(err);
    return;
  }
  console.log(sites);
});
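
Code that still depends on the Version 1 callback style can be adapted to promises with a small wrapper. The sketch below assumes only the `getSites(url, callback)` signature shown above; `getSitesAsync` and `fakeSitemapper` are hypothetical names, and the fake object stands in for a real Version 1 instance so the snippet runs without network access.

```javascript
// Hypothetical adapter: wrap a callback-style getSites(url, cb) in a Promise.
function getSitesAsync(sitemapper, url) {
  return new Promise((resolve, reject) => {
    sitemapper.getSites(url, (err, sites) => {
      if (err) reject(err);
      else resolve(sites);
    });
  });
}

// Stand-in with the same getSites signature as the Version 1 example
const fakeSitemapper = {
  getSites(url, callback) {
    callback(null, [`${url}#page-1`, `${url}#page-2`]);
  },
};

getSitesAsync(fakeSitemapper, 'https://example.com/sitemap.xml')
  .then((sites) => console.log(sites))
  .catch((error) => console.log(error));
```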