veliovgroup / jazeee-meteor-spiderable

Licence: other
Fork of Meteor Spiderable with longer timeout, caching, better server handling

spiderable-longer-timeout

About

This is a fork of the standard Meteor spiderable package, with some code merged from the ongoworks:spiderable package. Primarily, it lengthens the timeout to 30 seconds and raises the size limit to 10MB. All results are cached in a Mongo collection, by default for 3 hours (180 minutes).

This package ignores all SSL errors in favor of fetching the page.

This package supports "real response-code" and "real headers": if your route returns a 301 response code with some headers, the package returns the same code and headers. It also supports JavaScript redirects.

PhantomJS, and consequently this package, doesn't support ES6 (ECMAScript 2015). If you are not compiling ES6 to ES5, or you use NPM packages written in ES6 (Meteor doesn't compile NPM packages), rendering will result in blank pages. There is no easy way to solve this with a drop-in package or solution. We recommend prerendering by ostr.io instead, which supports ES6 (ECMAScript 2015) and can be installed with one command.

This package has been tested with iron-router, flow-router, and flow-router-extra.

This package has a built-in caching mechanism. By default it stores results for 3 hours; to change the storage period, set Spiderable.cacheLifetimeInMinutes to another value, in minutes.

Installation

meteor add jazeee:spiderable-longer-timeout

ES6 import

import { Spiderable } from 'meteor/jazeee:spiderable-longer-timeout';

Setup:

SPIDERABLE_FLAGS environment variable

Issues like select: Invalid argument can be easily solved with additional phantomjs process flags (arguments). Default flags:

phantomjs --load-images=no --ssl-protocol=TLSv1 --ignore-ssl-errors=true --web-security=false

SSL/TLS issues:

SPIDERABLE_FLAGS="--ssl-protocol=any"

Caching - minor speed increase (make sure the /data/phantomjs directory exists and is writable):

SPIDERABLE_FLAGS="--disk-cache=true --disk-cache-path=/data/phantomjs"

Cookies and localStorage (make sure the /data/phantomjs directory exists and is writable):

SPIDERABLE_FLAGS="--cookies-file=/data/phantomjs/cookies.txt --local-storage-path=/data/phantomjs"

AppCache (make sure the /data/phantomjs directory exists and is writable):

SPIDERABLE_FLAGS="--offline-storage-path=/data/phantomjs"

XHR and parent <-> child window access:

SPIDERABLE_FLAGS="--local-to-remote-url-access=true"

All flags (make sure the /data/phantomjs directory and the /data/phantomjs/cookies.txt file exist and are writable):

SPIDERABLE_FLAGS="--load-images=false --ssl-protocol=any --ignore-ssl-errors=true --disk-cache=true --disk-cache-path=/data/phantomjs --cookies-file=/data/phantomjs/cookies.txt --local-storage-path=/data/phantomjs --local-to-remote-url-access=true --offline-storage-path=/data/phantomjs --web-security=false"
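As a rough illustration of how such a variable could be turned into phantomjs arguments, here is a minimal sketch. The helper name resolvePhantomFlags is hypothetical and not part of this package, and the sketch assumes that a non-empty SPIDERABLE_FLAGS replaces the default flags entirely, which this README does not state explicitly.

```javascript
// Default flags as listed in this README.
const DEFAULT_FLAGS = [
  '--load-images=no',
  '--ssl-protocol=TLSv1',
  '--ignore-ssl-errors=true',
  '--web-security=false'
];

// Hypothetical helper (not part of the package API): turn the value of
// SPIDERABLE_FLAGS into an argument array for spawning phantomjs.
// Assumption: a non-empty variable replaces the defaults entirely.
function resolvePhantomFlags(envValue) {
  if (typeof envValue !== 'string' || envValue.trim() === '') {
    return DEFAULT_FLAGS.slice(); // return a copy so callers cannot mutate the defaults
  }
  // Flags are separated by whitespace; none of the documented values contain spaces.
  return envValue.trim().split(/\s+/);
}

// Print the command line that would result from the current environment.
const flags = resolvePhantomFlags(process.env.SPIDERABLE_FLAGS);
console.log(['phantomjs'].concat(flags).join(' '));
```

If the flags were merged with the defaults instead of replacing them, the "All flags" example would not need to repeat the default values, but that is only an inference.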

Usage:

# To start process with env.var
SPIDERABLE_FLAGS="--load-images=false --ssl-protocol=any --ignore-ssl-errors=true" meteor

# Set temporary env.var
export SPIDERABLE_FLAGS="--load-images=false --ssl-protocol=any --ignore-ssl-errors=true"

Within Phusion Passenger:

server {
  passenger_env_var SPIDERABLE_FLAGS "--load-images=false --ssl-protocol=any --ignore-ssl-errors=true";
}

isReadyForSpiderable {Boolean}

On both server and client, this flag tells Spiderable that everything is ready. Spiderable waits for Meteor.isReadyForSpiderable to be true, which allows finer control over when content is ready to be published.

Router.onAfterAction( function () {
  if (this.ready()) {
    Meteor.isReadyForSpiderable = true;
  }
});

Options

userAgentRegExps {[RegExp]}

An array of regular expressions matching the user agents of bots that should be served static pages but do not obey the _escaped_fragment_ protocol. Optionally set or extend the Spiderable.userAgentRegExps list.

Spiderable.userAgentRegExps.push(/^vkShare/i);

Default Bots:

  • /360spider/i
  • /adsbot-google/i
  • /ahrefsbot/i
  • /applebot/i
  • /baiduspider/i
  • /bingbot/i
  • /duckduckbot/i
  • /facebookbot/i
  • /facebookexternalhit/i
  • /google-structured-data-testing-tool/i
  • /googlebot/i
  • /instagram/i
  • /kaz\.kz_bot/i
  • /linkedinbot/i
  • /mail\.ru_bot/i
  • /mediapartners-google/i
  • /mj12bot/i
  • /msnbot/i
  • /msrbot/i
  • /oovoo/i
  • /orangebot/i
  • /pinterest/i
  • /redditbot/i
  • /sitelockspider/i
  • /skypeuripreview/i
  • /slackbot/i
  • /sputnikbot/i
  • /tweetmemebot/i
  • /twitterbot/i
  • /viber/i
  • /vkshare/i
  • /whatsapp/i
  • /yahoo/i
  • /yandex/
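To show how a list like this can be applied to an incoming request, here is a minimal sketch. The helper isKnownBot is hypothetical and only illustrates the matching idea; how the package itself performs the check is not shown in this README.

```javascript
// A small subset of the default list above, for illustration only.
const userAgentRegExps = [/googlebot/i, /bingbot/i, /twitterbot/i];

// Hypothetical helper: true when any pattern matches the User-Agent header.
function isKnownBot(userAgent, regExps) {
  if (typeof userAgent !== 'string') return false; // header may be missing
  return regExps.some((re) => re.test(userAgent));
}

// Extending the list works the same way as Spiderable.userAgentRegExps.push(...)
userAgentRegExps.push(/^vkShare/i);

const sample = 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)';
console.log(isKnownBot(sample, userAgentRegExps));
```

Note that none of the patterns use the g flag; a global regular expression keeps state between test() calls and would make repeated checks unreliable.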
cacheLifetimeInMinutes (Cache TTL) {Number}

How long cached Spiderable results should be stored (in minutes). Note:

  • Should be set before Meteor.startup
  • Value should be {Number} in minutes
  • To set a new cache lifetime, you need to drop the createdAt_1 index.
  • Default value: 180 (3 hours)

Spiderable.cacheLifetimeInMinutes = 60; // 1 hour in minutes

To change the cache lifetime, first drop the cache index by running one of the following in the Mongo console:

db.SpiderableCacheCollection.dropIndex('createdAt_1');
/* or */
db.SpiderableCacheCollection.dropIndexes();
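The need to drop the index is easier to see with a small sketch. It assumes the cache expires documents through a MongoDB TTL index on createdAt (the index name createdAt_1 suggests the key { createdAt: 1 }); the helper ttlIndexSpec is hypothetical and not part of the package. MongoDB refuses to create an index with the same keys but different options, so an existing index with the old expireAfterSeconds has to be removed before a new lifetime can take effect.

```javascript
// Hypothetical helper: describe the TTL index a given cache lifetime would need.
// Assumption: the cache relies on a MongoDB TTL index keyed on `createdAt`.
function ttlIndexSpec(cacheLifetimeInMinutes) {
  if (!Number.isFinite(cacheLifetimeInMinutes) || cacheLifetimeInMinutes <= 0) {
    throw new TypeError('cacheLifetimeInMinutes must be a positive number');
  }
  return {
    keys: { createdAt: 1 },
    // MongoDB's TTL option is expressed in seconds, not minutes.
    options: { expireAfterSeconds: Math.round(cacheLifetimeInMinutes * 60) }
  };
}

// The documented default of 3 hours.
console.log(ttlIndexSpec(180));
```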
ignoredRoutes {[String]}

Spiderable.ignoredRoutes is an array of strings: routes that we want to serve statically, but that do not obey the _escaped_fragment_ protocol. This is a server-only parameter. For more info see this thread.

Spiderable.ignoredRoutes.push('/cdn/storage/Files/');

customQuery {Boolean|String}

Spiderable.customQuery appends an additional GET query parameter to the HTTP request. This option can help the client apply different logic to requests from phantomjs and from normal users.

  • If true - Spiderable will append ___isRunningPhantomJS___=true to the query
  • If String - Spiderable will append String=true to the query

Spiderable.customQuery = true;
// or
Spiderable.customQuery = '_fromPhantom_';

// Usage:
Router.onAfterAction( function () {
  // Remember on the client that this page is being rendered by PhantomJS
  if (Meteor.isClient && _.has(this.params.query, '___isRunningPhantomJS___')) {
    Session.set('___isRunningPhantomJS___', true);
  }
});
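The effect of the two forms of customQuery on a requested URL can be sketched as follows. The helper appendCustomQuery is hypothetical, expects an absolute URL, and uses the standard WHATWG URL class; the package's own implementation may build the query differently.

```javascript
// Hypothetical helper mirroring the two documented forms of customQuery.
function appendCustomQuery(url, customQuery) {
  if (!customQuery) return url; // option disabled, leave the URL untouched
  const key = customQuery === true ? '___isRunningPhantomJS___' : String(customQuery);
  const parsed = new URL(url); // WHATWG URL, available globally in Node.js and browsers
  parsed.searchParams.set(key, 'true');
  return parsed.toString();
}

console.log(appendCustomQuery('http://localhost:3000/blogs', true));
console.log(appendCustomQuery('http://localhost:3000/blogs?test=1', '_fromPhantom_'));
```

On the client, the presence of the chosen key in the route's query is then enough to detect a PhantomJS render, as in the usage example above.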
debug {Boolean}

Show or hide the server's console messages. Set Spiderable.debug to true to show them.

  • Default value: false

Spiderable.debug = true;

bufferSize {Number}

Memory allocation for PhantomJS (in bytes). Note:

  • Should be set before Meteor.startup
  • Value should be {Number} in bytes
  • Default value: 10485760 (10MB)

Spiderable.bufferSize = 10 * 1024 * 1024; // 10MB in bytes

requestTimeout {Number}

Request timeout length. Note:

  • Should be set before Meteor.startup
  • Value should be {Number} in milliseconds
  • Default value: 30000 (30 seconds)

Spiderable.requestTimeout = 30 * 1000; // 30 seconds in milliseconds

Response statuses

You can send any response status from phantomjs. This behavior is controlled via a special HTML/Jade comment:

  • 201 - <!-- response:status-code=201 -->
  • 401 - <!-- response:status-code=401 -->
  • 403 - <!-- response:status-code=403 -->
  • 500 - <!-- response:status-code=500 -->

This directive accepts any 3-digit value, so you may return any standard or custom response code.
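How such a directive might be read from rendered HTML can be sketched with a regular expression. The helper extractStatusCode and its fallback of 200 are assumptions made for illustration; they are not taken from the package source.

```javascript
// Hypothetical helper: read the status directive from rendered HTML.
// Returns the code as a number, or the fallback when no directive is found.
// Assumption: 200 is a sensible fallback for pages without a directive.
function extractStatusCode(html, fallback = 200) {
  // Accepts both "<!-- response:status-code=404 -->" and "<!--response:status-code=404-->"
  const match = /<!--\s*response:status-code=(\d{3})\s*-->/.exec(html);
  return match ? Number(match[1]) : fallback;
}

console.log(extractStatusCode('<h1>404</h1><!-- response:status-code=404 -->'));
console.log(extractStatusCode('<p>No directive here</p>'));
```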

Enable default 404 response if you're using Iron-Router
  • Create the template you want to return when a page is not found
  • Set iron router's notFoundTemplate
  • Include the comment <!-- response:status-code=404 --> in your template. This ensures spiderable sends a 404 status code in the response headers
  • Enable iron router's dataNotFound plugin. See below or read more about iron-router plugins

Router.configure({
  notFoundTemplate: '_404'
});

Router.plugin('dataNotFound', {
  notFoundTemplate: Router.options.notFoundTemplate
});

template(name="_404")
  // response:status-code=404
  h1 404
  h3 Oops, page not found
  p Sorry, the page you requested does not exist or was deleted

<template name="_404">
  <!--response:status-code=404-->
  <h1>404</h1>
  <h3>Oops, page not found</h3>
  <p>Sorry, the page you requested does not exist or was deleted</p>
</template>

Enable default 404 response if you're using Flow-Router
  • Create the template you want to return when a page is not found
  • Include the comment <!-- response:status-code=404 --> in your template. This ensures spiderable sends a 404 status code in the response headers
  • Set flow router's notFound property. See below or read more about flow-router not found routes

// With layout
FlowRouter.notFound = {
  action() {
    BlazeLayout.render('_layout', {content: '_404'});
  }
};

// Without layout
FlowRouter.notFound = {
  action() {
    BlazeLayout.render('_404');
  }
};

template(name="_404")
  // response:status-code=404
  h1 404
  h3 Oops, page not found
  p Sorry, the page you requested does not exist or was deleted

<template name="_404">
  <!--response:status-code=404-->
  <h1>404</h1>
  <h3>Oops, page not found</h3>
  <p>Sorry, the page you requested does not exist or was deleted</p>
</template>

Supported redirects
// Plain JavaScript redirects
window.location.href = 'http://example.com/another/page';
window.location.replace('http://example.com/another/page');

// iron-router redirects
Router.go('/another/page');
Router.current().redirect('/another/page');
Router.route('/one', function () {
  this.redirect('/another/page');
});

Important

Set Meteor.isReadyForSpiderable to true when your route has finished, in order to publish the page. Meteor.isRouteComplete=true is deprecated, but it will work until at least 2015-12-31, after which I'll remove it. See the code for details.

Install PhantomJS on your server

If you deploy your application with meteor bundle, you must install phantomjs (http://phantomjs.org) somewhere in your $PATH. If you use Meteor Up, then meteor deploy can do this for you.

Spiderable.originalRequest is also set to the http request. See issue 1.

Testing

Test your site by appending a query to your URLs: URL?_escaped_fragment_= as in http://your.site.com/path?_escaped_fragment_=

curl

curl your localhost, or your host name if you are in production, for example:

curl "http://localhost:3000/?_escaped_fragment_="
curl http://localhost:3000/ -A googlebot

Google Tools: Fetch as Google

Use Fetch as Google tools to scan your site. Tips:

  • Observe your server logs using tail -f or mup logs -f
  • Fetch as Google and observe that it takes 3-5 minutes before displaying results.
    • Use an uncommon URL to help you identify your request in the logs. Consider adding an extra URL query parameter. For example:
# Simple test with test=1 query
curl "http://localhost:3002/blogs?_escaped_fragment_=&test=1"
# Set the date in the query, which will show up in Meteor logs, with a unique date. (Turn on `Spiderable.debug=true`)
TEST=`date "+%Y%m%d-%H%M%S"`; echo $TEST; curl "http://localhost:3000/blogs?_escaped_fragment_=&test=${TEST}"

Interpreting Fetch as Google results:

  • The tool will not actually hit your server right away.
  • It appears to provide a simple scan result without the extra ?_escaped_fragment_= component.
  • Wait several minutes more. Google appears to request the page, which will show up in your logs as Spiderable successfully completed.
  • Search on Google using site:your.site.com
  • Make sure Google lists all relevant pages.
  • Look at Google's cached version of the pages, to make sure it is fully rendered.
  • Make sure that Google sees the pages with all data subscriptions complete.

Testing PhantomJS

PhantomJS can be temperamental, and can be a challenge to work with.

If PhantomJS is failing on your server, you can try running it directly to help debug what is broken.

On the server console, try running phantomjs --version

Also, you can run this package's PhantomJS script. In order to do so, you'd need to find the phantom_script.js file.

# Find phantom_script.js
PHANTOM_SCRIPT=$(find /opt/YOUR_WEB_APP/app/ -name phantom_script.js)
# Verify that you found just one
echo ${PHANTOM_SCRIPT}
# Try running phantomjs with that script
phantomjs --load-images=no --ssl-protocol=TLSv1 --ignore-ssl-errors=true --web-security=false "${PHANTOM_SCRIPT}" http://localhost
# Verify that it succeeded (should return 0)
echo $?

From Meteor's original Spiderable documentation. See notes specific to this branch (above).

spiderable is part of Webapp. It's one possible way to allow web search engines to index a Meteor application. It uses the AJAX Crawling specification published by Google to serve HTML to compatible spiders (Google, Bing, Yandex, and more).

When a spider requests an HTML snapshot of a page the Meteor server runs the client half of the application inside phantomjs, a headless browser, and returns the full HTML generated by the client code.

In order to have links between multiple pages on a site visible to spiders, apps must use real links (e.g. <a href="/about">) rather than simply re-rendering portions of the page when an element is clicked. Apps should render their content based on the URL of the page and can use HTML5 pushState to alter the URL on the client without triggering a page reload. See the Todos example for a demonstration.

When running your page, spiderable will wait for all publications to be ready. Make sure that all of your publish functions either return a cursor (or an array of cursors), or eventually call this.ready(). Otherwise, the phantomjs executions will fail.
