VIPnytt / SitemapParser

Licence: MIT license
XML Sitemap parser class compliant with the Sitemaps.org protocol.

XML Sitemap parser

An easy-to-use PHP library to parse XML Sitemaps compliant with the Sitemaps.org protocol.

The Sitemaps.org protocol is the leading standard and is supported by Google, Bing, Yahoo, Ask and many others.

Features

  • Basic parsing
  • Recursive parsing
  • String parsing
  • Custom User-Agent string
  • Proxy support

Formats supported

  • XML .xml
  • Compressed XML .xml.gz
  • Robots.txt rule sheet robots.txt
  • Line separated text (disabled by default)
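For readers unfamiliar with the XML format, here is a minimal sketch of what a Sitemaps.org document looks like and how its tags map onto the `$url => $tags` shape used in the examples in this README. It uses PHP's SimpleXML extension directly and is not the library's implementation; the sample URLs and the `$urls` variable are assumptions made for illustration.

```php
<?php
// A minimal sitemap following the Sitemaps.org schema (sample data).
$xml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://example.com/</loc>
    <lastmod>2016-01-01</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
  <url>
    <loc>http://example.com/about</loc>
  </url>
</urlset>
XML;

$doc = simplexml_load_string($xml);

// Build an array keyed by URL; optional tags that are absent become null.
$urls = [];
foreach ($doc->url as $entry) {
    $urls[(string) $entry->loc] = [
        'lastmod'    => isset($entry->lastmod) ? (string) $entry->lastmod : null,
        'changefreq' => isset($entry->changefreq) ? (string) $entry->changefreq : null,
        'priority'   => isset($entry->priority) ? (string) $entry->priority : null,
    ];
}

print_r($urls);
```

Only `loc` is required by the protocol, so code that reads `lastmod`, `changefreq` or `priority` should be prepared for missing values.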

Requirements:

Installation

The library can be installed via Composer. Add this to your composer.json file:

{
    "require": {
        "vipnytt/sitemapparser": "^1.0"
    }
}

Then run composer update.

Getting Started

Basic example

Returns a list of URLs only.

use vipnytt\SitemapParser;
use vipnytt\SitemapParser\Exceptions\SitemapParserException;

try {
    $parser = new SitemapParser();
    $parser->parse('http://php.net/sitemap.xml');
    foreach ($parser->getURLs() as $url => $tags) {
        echo $url . '<br>';
    }
} catch (SitemapParserException $e) {
    echo $e->getMessage();
}

Advanced

Returns all available tags, for both Sitemaps and URLs.

use vipnytt\SitemapParser;
use vipnytt\SitemapParser\Exceptions\SitemapParserException;

try {
    $parser = new SitemapParser('MyCustomUserAgent');
    $parser->parse('http://php.net/sitemap.xml');
    foreach ($parser->getSitemaps() as $url => $tags) {
        echo 'Sitemap<br>';
        echo 'URL: ' . $url . '<br>';
        echo 'LastMod: ' . $tags['lastmod'] . '<br>';
        echo '<hr>';
    }
    foreach ($parser->getURLs() as $url => $tags) {
        echo 'URL: ' . $url . '<br>';
        echo 'LastMod: ' . $tags['lastmod'] . '<br>';
        echo 'ChangeFreq: ' . $tags['changefreq'] . '<br>';
        echo 'Priority: ' . $tags['priority'] . '<br>';
        echo '<hr>';
    }
} catch (SitemapParserException $e) {
    echo $e->getMessage();
}

Recursive

Parses any sitemap detected while parsing, to get a complete list of URLs.

use vipnytt\SitemapParser;
use vipnytt\SitemapParser\Exceptions\SitemapParserException;

try {
    $parser = new SitemapParser('MyCustomUserAgent');
    $parser->parseRecursive('http://www.google.com/robots.txt');
    echo '<h2>Sitemaps</h2>';
    foreach ($parser->getSitemaps() as $url => $tags) {
        echo 'URL: ' . $url . '<br>';
        echo 'LastMod: ' . $tags['lastmod'] . '<br>';
        echo '<hr>';
    }
    echo '<h2>URLs</h2>';
    foreach ($parser->getURLs() as $url => $tags) {
        echo 'URL: ' . $url . '<br>';
        echo 'LastMod: ' . $tags['lastmod'] . '<br>';
        echo 'ChangeFreq: ' . $tags['changefreq'] . '<br>';
        echo 'Priority: ' . $tags['priority'] . '<br>';
        echo '<hr>';
    }
} catch (SitemapParserException $e) {
    echo $e->getMessage();
}
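A robots.txt URL works as a starting point because robots.txt files can declare sitemaps with `Sitemap:` directives. The sketch below is a self-contained illustration of how such directives could be extracted; it is not the library's implementation, and the function name `extractSitemapUrls` and the sample content are assumptions made for this example.

```php
<?php
/**
 * Hypothetical helper (not part of SitemapParser): collect the URLs
 * declared by "Sitemap:" directives in robots.txt content.
 */
function extractSitemapUrls(string $robotsTxt): array
{
    $urls = [];
    // \R matches any line-break sequence (\n, \r\n or \r).
    foreach (preg_split('/\R/', $robotsTxt) as $line) {
        // The directive name is matched case-insensitively.
        if (preg_match('/^\s*sitemap\s*:\s*(\S+)/i', $line, $m)) {
            $urls[] = $m[1];
        }
    }
    return $urls;
}

// Sample robots.txt content (made up for illustration).
$robotsTxt = "User-agent: *\n"
    . "Disallow: /private/\n"
    . "Sitemap: http://example.com/sitemap.xml\n"
    . "sitemap: http://example.com/news.xml.gz\n";

print_r(extractSitemapUrls($robotsTxt));
```

Each URL found this way is itself a sitemap (or sitemap index) that has to be fetched and parsed in turn, which is what recursive parsing automates.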

Parsing of line separated text strings

Note: This is disabled by default to avoid false positives when XML is expected but plain text is fetched instead.

To disable strict mode, pass this configuration as the second constructor parameter: ['strict' => false].

use vipnytt\SitemapParser;
use vipnytt\SitemapParser\Exceptions\SitemapParserException;

try {
    $parser = new SitemapParser('MyCustomUserAgent', ['strict' => false]);
    $parser->parse('https://www.xml-sitemaps.com/urllist.txt');
    foreach ($parser->getSitemaps() as $url => $tags) {
        echo $url . '<br>';
    }
    foreach ($parser->getURLs() as $url => $tags) {
        echo $url . '<br>';
    }
} catch (SitemapParserException $e) {
    echo $e->getMessage();
}
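To make the format concrete: a line-separated sitemap is a plain-text file with one URL per line. The following sketch shows one way such text could be turned into a URL list. It is not the library's implementation; the function name `extractUrlsFromText` and the sample text are assumptions made for this example.

```php
<?php
/**
 * Hypothetical helper (not part of SitemapParser): treat each non-empty
 * line of a plain-text document as a candidate URL and keep the valid ones.
 */
function extractUrlsFromText(string $text): array
{
    $urls = [];
    foreach (preg_split('/\R/', $text) as $line) {
        $line = trim($line);
        // filter_var() returns false for anything that is not a well-formed URL.
        if ($line !== '' && filter_var($line, FILTER_VALIDATE_URL) !== false) {
            $urls[] = $line;
        }
    }
    return $urls;
}

// Sample text (made up for illustration); the second line is not a URL.
$text = "http://example.com/\nnot a url\nhttp://example.com/about\n";

print_r(extractUrlsFromText($text));
```

Because this kind of parsing accepts any text that happens to contain URL-like lines, it is easy to see why it is opt-in rather than the default.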

Additional examples

More examples are available in the examples directory.

Configuration

Available configuration options, with their default values:

$config = [
    'strict' => true, // (bool) Disallow parsing of line-separated plain text
    'guzzle' => [
        // GuzzleHttp request options
        // http://docs.guzzlephp.org/en/latest/request-options.html
    ],
];
$parser = new SitemapParser('MyCustomUserAgent', $config);

If a User-Agent is also set using the GuzzleHttp request options, it takes priority and replaces the User-Agent passed to the constructor.
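As an illustration of the 'guzzle' key, the sketch below passes a few standard GuzzleHttp request options (`proxy`, `timeout` and `headers`) through the configuration array. The proxy address, the timeout value and the second User-Agent string are placeholders chosen for this example, not recommended settings.

```php
use vipnytt\SitemapParser;

$config = [
    'guzzle' => [
        // Route requests through a proxy (hypothetical address).
        'proxy' => 'http://localhost:8125',
        // Total request timeout in seconds (example value).
        'timeout' => 10,
        // A User-Agent set here replaces the one passed to the constructor.
        'headers' => ['User-Agent' => 'MyOtherUserAgent'],
    ],
];
$parser = new SitemapParser('MyCustomUserAgent', $config);
```

With this configuration, requests would be sent with MyOtherUserAgent rather than MyCustomUserAgent.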
