
benbalter / sitemap-parser

Licence: MIT License
Ruby Gem to parse sitemaps.org compliant sitemaps

Sitemap Parser

Ruby Gem to parse sitemaps.org compliant sitemaps


Usage

Create a new instance of the Parser:

sitemap = SitemapParser.new "http://ben.balter.com/sitemap.xml"

Extract the URLs of the sitemap

sitemap.urls # => Array of Nokogiri XML::Node objects
sitemap.to_a # => Array of url strings
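
Because `to_a` returns plain strings, the result can be processed with ordinary Ruby. The following is a minimal sketch, assuming a hard-coded array stands in for the output of `sitemap.to_a`. The example URLs and the `posts_by_year` helper are hypothetical and not part of the gem:

```ruby
require "uri"

# Stand-in for the result of `sitemap.to_a`; these URLs are made up for illustration.
urls = [
  "http://example.com/",
  "http://example.com/about/",
  "http://example.com/2014/01/02/first-post/",
  "http://example.com/2014/03/04/second-post/",
  "http://example.com/2015/05/06/third-post/"
]

# Hypothetical helper (not part of the gem): group URLs by the four-digit
# year at the start of their path, skipping URLs that have no such prefix.
def posts_by_year(urls)
  urls.each_with_object(Hash.new { |hash, key| hash[key] = [] }) do |url, groups|
    year = URI.parse(url).path[%r{\A/(\d{4})/}, 1]
    groups[year] << url if year
  end
end

grouped = posts_by_year(urls)
puts grouped.keys.inspect
```

With a real sitemap, `urls` would instead be assigned from `sitemap.to_a`.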

Options

Recurse nested sitemaps

sitemap = SitemapParser.new('http://ben.balter.com/sitemap.xml', {recurse: true})

Or, if you want to extract only sitemap URLs matching a given pattern, you can provide a regex that will be used to match each page.

sitemap = SitemapParser.new('http://ben.balter.com/sitemap.xml', {recurse: true, url_regex: /sitemapregex/})
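
To illustrate how `recurse` and `url_regex` might interact, here is a rough sketch that walks an in-memory hash instead of fetching XML. The sample data, the `collect_urls` method, and the choice to apply the regex to the URLs of nested sitemaps are all assumptions made for illustration; the gem's actual implementation may differ.

```ruby
# Hypothetical data: each key is a sitemap URL. An index lists child
# sitemaps under :sitemaps; a regular sitemap lists page URLs under :urls.
sitemaps = {
  "http://example.com/sitemap.xml" => {
    sitemaps: ["http://example.com/sitemap-posts.xml",
               "http://example.com/sitemap-pages.xml"]
  },
  "http://example.com/sitemap-posts.xml" => {
    urls: ["http://example.com/posts/1", "http://example.com/posts/2"]
  },
  "http://example.com/sitemap-pages.xml" => {
    urls: ["http://example.com/about"]
  }
}

# Hypothetical helper, not the gem's API: collect page URLs from a sitemap,
# descending into nested sitemaps only when `recurse` is true and, if
# `url_regex` is given, only into those whose URL matches it.
def collect_urls(sitemaps, sitemap_url, recurse: false, url_regex: nil)
  entry = sitemaps.fetch(sitemap_url)
  urls = entry.fetch(:urls, []).dup
  return urls unless recurse

  entry.fetch(:sitemaps, []).each do |child|
    next if url_regex && !child.match?(url_regex)
    urls.concat(collect_urls(sitemaps, child, recurse: true, url_regex: url_regex))
  end
  urls
end

root = "http://example.com/sitemap.xml"
all_urls  = collect_urls(sitemaps, root, recurse: true)
post_urls = collect_urls(sitemaps, root, recurse: true, url_regex: /posts/)
puts all_urls.length
puts post_urls.inspect
```

In this sketch, omitting `recurse: true` on an index returns an empty array, since the index itself contains no page URLs.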

Typhoeus Options

sitemap = SitemapParser.new('http://ben.balter.com/sitemap.xml', { userpwd: "username:password" })