khromov / Sitemap Cache Warmer

Visits pages based on a sitemap to keep your cache warm

Sitemap Cache Warmer

This PHP script crawls URLs based on a sitemap. It keeps your cache warm by visiting all the pages in your sitemap at regular intervals. It also supports sub-sitemaps (sitemap index files).
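As a rough illustration of the difference between a regular sitemap and a sitemap index, the following sketch uses SimpleXML to read either kind. The function name `extractSitemapEntries` and its return shape are hypothetical and are not taken from warm.php; the element names (`sitemapindex`, `sitemap`, `urlset`, `url`, `loc`) come from the sitemap standard.

```php
<?php
// Sketch only: a hypothetical helper, not the code used by warm.php.
function extractSitemapEntries(string $xml): array
{
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $doc = simplexml_load_string($xml);
    libxml_use_internal_errors($previous);

    if ($doc === false) {
        return ['type' => 'invalid', 'locations' => []];
    }

    $locations = [];
    if ($doc->getName() === 'sitemapindex') {
        // A sitemap index lists sub-sitemaps in <sitemap><loc> elements.
        foreach ($doc->sitemap as $entry) {
            $locations[] = trim((string) $entry->loc);
        }
        return ['type' => 'index', 'locations' => $locations];
    }

    // A regular sitemap lists pages in <url><loc> elements.
    foreach ($doc->url as $entry) {
        $locations[] = trim((string) $entry->loc);
    }
    return ['type' => 'urlset', 'locations' => $locations];
}
```

A crawler built on this would fetch each location of an `index` result as another sitemap, and visit each location of a `urlset` result as a page.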

Usage

Rename config.php.example to config.php and change the key parameter to a secret value. Upload the files to your web host, preferably into their own folder (for example, /warm-cache).

Once you have uploaded the script to your web host, you can visit the following URL to traverse a sitemap and visit all of its URLs:

http://example.com/warm-cache/warm.php?key=SECRET_KEY&url=http://example.com/sitemap.xml&sleep=0

Available parameters

  • key - Secret key, as entered in config.php. (Required)
  • url - URL of the root sitemap, usually /sitemap.xml. (Required)
  • sleep - Number of seconds to sleep between requests. Used for throttling on slow hosts. (Optional, defaults to no throttling)
  • from - Number of the URL to start with. (Optional, default is 0)
  • to - Number of the URL to stop at. Useful for testing a few URLs from a large sitemap. (Optional, defaults to the end of the sitemap)
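If you assemble the request URL in code, it can help to percent-encode the parameter values, since the sitemap URL is itself nested inside a query string. The sketch below uses PHP's built-in `http_build_query()`. The helper name `buildWarmUrl` is hypothetical, and the sketch assumes the script reads its parameters from the query string, where PHP decodes the values automatically.

```php
<?php
// Sketch only: buildWarmUrl is a hypothetical helper, not part of the script.
function buildWarmUrl(string $endpoint, array $params): string
{
    // http_build_query() percent-encodes each value, so characters such as
    // "&" or "?" inside the sitemap URL cannot break the outer query string.
    return $endpoint . '?' . http_build_query($params, '', '&');
}

echo buildWarmUrl('http://example.com/warm-cache/warm.php', [
    'key'   => 'SECRET_KEY',
    'url'   => 'http://example.com/sitemap.xml',
    'sleep' => 0,
    'from'  => 10,
    'to'    => 100,
]), PHP_EOL;
// Prints:
// http://example.com/warm-cache/warm.php?key=SECRET_KEY&url=http%3A%2F%2Fexample.com%2Fsitemap.xml&sleep=0&from=10&to=100
```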

Scheduling the crawl

Use cron to schedule the crawls as often as you wish. Here's an example using cURL and crontab to crawl once every hour:

0 * * * * curl "http://example.com/warm-cache/warm.php?key=SECRET_KEY&url=http://example.com/sitemap.xml&sleep=0" >/dev/null 2>&1

If your host provides a cron feature that visits a URL, all you need to do is enter the URL described in the "Usage" section.

Output

The script outputs JSON with statistics about the crawl, for example:

{
    "status": "OK",
    "message": "Processed sitemap: http://example.com/sitemap.xml",
    "count": 4,
    "duration": 9.5575199127197,
    "log": [
        "Processed sub-sitemap: http://example.com/post-sitemap.xml",
        "Processed sub-sitemap: http://example.com/page-sitemap.xml"
    ],
    "visited_urls": [
        "http://example.com/page1/",
        "http://example.com/page2/",
        "http://example.com/page3/",
        "http://example.com/page4/"
    ]
}
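If you want to act on this output from another PHP script, a minimal sketch could look like the following. The helper name `summariseCrawl` is hypothetical. The sketch assumes that a successful crawl reports "status": "OK" as in the example; the values used for failures are not documented here, so anything else is treated as a failure.

```php
<?php
// Sketch only: summariseCrawl is a hypothetical helper.
function summariseCrawl(string $json): string
{
    // Decode to an associative array; invalid JSON yields null.
    $result = json_decode($json, true);

    // Assumption: "OK" marks success, as in the example output.
    if (!is_array($result) || ($result['status'] ?? null) !== 'OK') {
        return 'Crawl failed or returned unexpected output';
    }

    // %F formats the float without depending on the current locale.
    return sprintf(
        'Visited %d URLs in %.1F seconds',
        $result['count'] ?? 0,
        $result['duration'] ?? 0.0
    );
}
```

For the example output, `summariseCrawl()` returns "Visited 4 URLs in 9.6 seconds".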

Reporting inaccessible pages

You can set up an email alert for URLs that cannot be accessed. Modify your config.php like this:

<?php
return array(
    'key' => '9f316c95a356aab49cf5e4fcf3418295', // Secret key to allow traversing sitemaps
    'reportProblematicUrls' => true,
    'reportProblematicUrlsTo' => 'you@example.com' // Placeholder: the address that should receive the reports
);

A URL is reported when it cannot be opened with file_get_contents(). Proper handling of status codes will be added soon.
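The check described above can be sketched roughly as follows. This is an illustration under assumptions, not the script's actual code: the function name `findUnreachable` is hypothetical, and the sketch only collects the failing URLs without sending any mail.

```php
<?php
// Sketch only: findUnreachable is a hypothetical helper.
function findUnreachable(array $urls): array
{
    $failed = [];
    foreach ($urls as $url) {
        // file_get_contents() returns false when the target cannot be opened.
        // The @ operator suppresses the accompanying warning.
        if (@file_get_contents($url) === false) {
            $failed[] = $url;
        }
    }
    return $failed;
}
```

The strict comparison with `false` matters here, because a page that opens successfully but has an empty body returns an empty string, which should not count as a failure.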

Using the CLI

You can also run the script from the CLI to avoid the timeout errors (504) that are common when it runs behind an Nginx server.

php /whatever/you/have/uploaded/it/warm.php url=http://example.com/sitemap.xml sleep=0 key=SECRET_KEY
php /whatever/you/have/uploaded/it/warm.php url=http://example.com/sitemap.xml sleep=0 key=SECRET_KEY from=10 to=100
php /whatever/you/have/uploaded/it/warm.php url=http://example.com/sitemap.xml sleep=0 key=SECRET_KEY to=25

Crawl strategies

If you use a time-based static page cache, you can schedule your crawls at intervals of half the cache expiration time.

For example, if your expiration time is one hour (3600 seconds), you can schedule the crawls to take place every thirty minutes (1800 seconds).

If you have a lot of pages and few visitors, this may increase the load on the server. For low-traffic deployments, use a long cache expiration time (24 hours or more) and invalidate the cache when page content changes.

Requirements

  • SimpleXML
  • allow_url_fopen in php.ini (Enabled on most hosts)

Compatibility

The script has been tested with sitemaps generated by the WordPress plugins Yoast WordPress SEO and Google XML Sitemaps. It should work with any sitemap that conforms to the sitemap standard.
