Mediashare / Spider

Licence: MIT license
💫 Spider is a PHP library for crawling websites and scraping information, with easy module integration.

Programming Languages

PHP
Dockerfile

Projects that are alternatives of or similar to Spider

silverstripe-seo
An all-in-one SEO module for SilverStripe 4.1+
Stars: ✭ 35 (+150%)
Mutual labels:  seo-optimization
framework
A stylish PHP application framework crafted using Slim, Twig, Eloquent and Sentinel designed to get you from clone to production in a matter of minutes.
Stars: ✭ 56 (+300%)
Mutual labels:  seo-optimization
Rendora
dynamic server-side rendering using headless Chrome to effortlessly solve the SEO problem for modern javascript websites
Stars: ✭ 1,853 (+13135.71%)
Mutual labels:  seo-optimization
pagespeedParseR
pagespeedParseR is an R wrapper for Google Pagespeed Insights API, that also enables convenient parsing
Stars: ✭ 20 (+42.86%)
Mutual labels:  seo-optimization
php-text-generator
Fast SEO text generator on a mask.
Stars: ✭ 19 (+35.71%)
Mutual labels:  seo-optimization
spiderable-middleware
🤖 Prerendering for JavaScript powered websites. Great solution for PWAs (Progressive Web Apps), SPAs (Single Page Applications), and other websites based on top of front-end JavaScript frameworks
Stars: ✭ 29 (+107.14%)
Mutual labels:  seo-optimization
drupal 8 unset html head link
🤖 Module for unsetting unwanted HTML links (like rel="delete-form", rel="edit-form", etc.) from the head of Drupal 8.x websites. This is a reliable way to improve your position in the SERPs of Google, Yandex, etc.
Stars: ✭ 19 (+35.71%)
Mutual labels:  seo-optimization
stweet
Advanced Python library to scrape Twitter (tweets, users) from the unofficial API
Stars: ✭ 287 (+1950%)
Mutual labels:  scrapper
people-also-ask
People also ask Google scraper. Get as many questions as you need to optimize your site for voice or new content ideas or answering questions about your desired topic.
Stars: ✭ 39 (+178.57%)
Mutual labels:  seo-optimization
Google-rank-tracker
SEO: Python script + shell script and cronjob to check ranks on a daily basis
Stars: ✭ 124 (+785.71%)
Mutual labels:  seo-optimization
magento2-module-seo
Magento 2 Module for Search Engine Optimization
Stars: ✭ 100 (+614.29%)
Mutual labels:  seo-optimization
Silverstripe-SEO
A SilverStripe module to optimise the Meta, crawling, indexing, and sharing of your website content
Stars: ✭ 41 (+192.86%)
Mutual labels:  seo-optimization
event-jekyll-theme
Jekyll Theme package for your event
Stars: ✭ 119 (+750%)
Mutual labels:  seo-optimization
awesome-search-engine-optimization
A curated list of backlink, social signal opportunities, and link building strategies and tactics to help improve search engine results and ranking.
Stars: ✭ 82 (+485.71%)
Mutual labels:  seo-optimization
SeoTags
SeoTags creates all the SEO tags you need, such as meta, link, Twitter card (twitter:), Open Graph (og:), and JSON-LD schema (structured data).
Stars: ✭ 113 (+707.14%)
Mutual labels:  seo-optimization
poke
A simple tool to check your site for broken links, media, iframes, stylesheets, scripts, forms or metadata.
Stars: ✭ 24 (+71.43%)
Mutual labels:  seo-optimization
SEO-Manager-Electron
Generates SEO Report Easily
Stars: ✭ 24 (+71.43%)
Mutual labels:  seo-optimization
DNZ.SEOChecker
SEO Checker and Recommender Plugin (like WordPress Yoast) for ASP.NET Core.
Stars: ✭ 18 (+28.57%)
Mutual labels:  seo-optimization
shopee-inventory-bot
"I Make dropshiper's job easier" ~ Python Shopee Inventory Bot
Stars: ✭ 21 (+50%)
Mutual labels:  scrapper
ecommercetools
EcommerceTools is a Python data science toolkit for ecommerce, marketing science, and technical SEO analysis and modelling and was created by Matt Clarke.
Stars: ✭ 41 (+192.86%)
Mutual labels:  seo-optimization

Spider

Spider is a modular website crawler written in PHP. It lets you retrieve information from, and execute code on, the pages of a website, which can be useful for SEO or security audits. You can use modules created by the community or create your own (written in PHP via a web interface).

What is a Crawler?

A crawler is an indexing robot that automatically explores the pages of a website. A crawler can serve several purposes:

  • Information search & retrieval
  • Validation of the SEO of your website
  • Integration testing
  • Execution of PHP code on several pages in an automated way
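
To make the idea of "automatically exploring the pages of a website" concrete, here is a minimal sketch of a breadth-first crawl. It is not Spider's implementation: the `$pages` array is a hypothetical in-memory stand-in for a website, `extractLinks()` and `crawl()` are illustrative names, and PHP's built-in `DOMDocument` (the `dom` extension) is assumed to be available.

```php
<?php
// Hypothetical in-memory "website": path => HTML.
// A real crawler would fetch each page over HTTP instead.
$pages = [
    '/'      => '<a href="/about">About</a> <a href="/blog">Blog</a>',
    '/about' => '<a href="/">Home</a>',
    '/blog'  => '<a href="/about">About</a> <a href="/missing">Broken</a>',
];

/** Return the trimmed href of every <a> element that has one. */
function extractLinks(string $html): array
{
    $dom = new DOMDocument();
    // Suppress parser warnings caused by imperfect markup.
    @$dom->loadHTML($html);

    $links = [];
    foreach ($dom->getElementsByTagName('a') as $a) {
        $href = trim($a->getAttribute('href'));
        if ($href !== '') {
            $links[] = $href;
        }
    }
    return $links;
}

/** Breadth-first crawl; returns path => 200 (page exists) or 404 (it does not). */
function crawl(array $pages, string $start): array
{
    $queue = [$start];
    $visited = [];
    while ($queue) {
        $path = array_shift($queue);
        if (isset($visited[$path])) {
            continue; // Already explored.
        }
        if (!isset($pages[$path])) {
            $visited[$path] = 404; // Link points to a page that does not exist.
            continue;
        }
        $visited[$path] = 200;
        foreach (extractLinks($pages[$path]) as $link) {
            if (!isset($visited[$link])) {
                $queue[] = $link;
            }
        }
    }
    return $visited;
}

$report = crawl($pages, '/');
print_r($report);
```

The visited map doubles as a small report: every reachable path appears once, and the broken `/missing` link found on `/blog` is flagged with a 404.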

Features

  • Get all links from a website
  • Check HTTP responses
  • Create your own modules (crawl and execute your PHP code)
  • No database, pure PHP
  • Output a JSON file
  • Use the default kernel modules for a basic SEO audit (Metadata, Images, HttpCode, Links...)

Libraries

I would be happy to receive your ideas and contributions to the project 😃

Getting started

Installation

Composer Usage

Use the Spider library in your project and create your own modules.

composer require mediashare/spider
Usage
<?php
// ./index.php
require 'vendor/autoload.php';
use Mediashare\Spider\Entity\Config;
use Mediashare\Spider\Entity\Url;
use Mediashare\Spider\Spider;

// Website Config
$config = new Config();
$config->setWebspider(true); // Crawl the whole website
$config->setPathRequires(['/Kernel/']); // Only crawl these paths
$config->setPathExceptions(['/CodeSnippet/']); // Do not crawl these paths
// Modules
$config->setReportsDir(__DIR__.'/reports/'); // Reports path
$config->setModulesDir(__DIR__.'/modules/'); // Modules path
$config->enableDefaultModule(true); // Enable default SEO kernel modules
$config->removeModule('FileDownload'); // Disable a module
// Console output / dump
$config->setVerbose(true); // Print verbose output
$config->setJson(false); // Print JSON output

// Url
$url = new Url('https://mediashare.fr');

// Run Spider
$spider = new Spider($url, $config);
$result = $spider->run();
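
The `setPathRequires()` and `setPathExceptions()` options above restrict which URLs are crawled. How Spider matches them is not documented here, so the following is only a sketch of one plausible behaviour: `shouldCrawl()` is a hypothetical helper, and plain substring matching against the URL path is an assumption (the real library may use regular expressions or other rules).

```php
<?php
/**
 * Hypothetical helper, not part of Spider: decide whether a URL should be
 * crawled, ASSUMING plain substring matching on the URL path.
 */
function shouldCrawl(string $url, array $requires, array $exceptions): bool
{
    // URLs without a path (e.g. "https://example.com") are treated as "/".
    $path = parse_url($url, PHP_URL_PATH) ?: '/';

    // A match on any exception excludes the URL.
    foreach ($exceptions as $needle) {
        if (strpos($path, $needle) !== false) {
            return false;
        }
    }

    // With no required paths, every remaining URL is allowed.
    if ($requires === []) {
        return true;
    }

    // Otherwise at least one required path must match.
    foreach ($requires as $needle) {
        if (strpos($path, $needle) !== false) {
            return true;
        }
    }
    return false;
}

var_dump(shouldCrawl('https://example.com/Kernel/Modules', ['/Kernel/'], ['/CodeSnippet/']));
var_dump(shouldCrawl('https://example.com/Kernel/CodeSnippet/x', ['/Kernel/'], ['/CodeSnippet/']));
```

In this sketch exceptions take priority over requirements, so a URL matching both lists is skipped.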

Github

git clone https://github.com/Mediashare/Spider
cd Spider
composer install
Execute the code from the console.
bin/console spider:run https://mediashare.fr

Binary file

curl -O https://raw.githubusercontent.com/Mediashare/Spider/master/spider.phar
chmod 755 spider.phar
Execute the code from the console.
./spider.phar spider:run https://mediashare.fr

Modules

Modules are tools created by the community to add features when crawling a website. Adding a module to a crawler automates the execution of code on one or more pages of a website.

Requirements

  • The name of your class needs to be the same as the name of the .php file.
  • The entry point for executing modules is the run() function, so it is mandatory to have a run() function in your module.
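
The two requirements can be checked mechanically. The sketch below is illustrative only: `validateModule()` is a hypothetical helper rather than something Spider provides, the `Title` and `Broken` classes are made-up examples defined inline so the snippet runs on its own, and the namespace is omitted for brevity.

```php
<?php
// Hypothetical module that satisfies both requirements.
// In a real project it would live in ./modules/Title.php.
class Title
{
    public $dom;

    public function run()
    {
        return ['title' => 'placeholder'];
    }
}

// Hypothetical module that violates the second requirement (no run() method).
class Broken
{
    public $dom;
}

/**
 * Hypothetical validator, not part of Spider: returns a list of problems
 * found for the module stored in $file (an empty list means it is valid).
 */
function validateModule(string $file): array
{
    // The class name is expected to equal the file name without ".php".
    $class = pathinfo($file, PATHINFO_FILENAME);

    $errors = [];
    if (!class_exists($class)) {
        $errors[] = "Class $class not found: the class name must match the file name";
    } elseif (!method_exists($class, 'run')) {
        $errors[] = "Class $class has no run() method";
    }
    return $errors;
}

print_r(validateModule('./modules/Title.php'));
print_r(validateModule('./modules/Broken.php'));
```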

Documentation

DomCrawler is a Symfony component for navigating the DOM of HTML and XML documents. See its documentation for details.

Create your own module to execute actions on the scraped data.

bin/console spider:module Href
<?php
// ./modules/Href.php
namespace Mediashare\Modules;

class Href {
    public $dom;

    // Entry point of the module: returns each href with its occurrence count.
    public function run() {
        $links = [];
        foreach ($this->dom->filter('a') as $link) {
            // Strip surrounding whitespace; skip anchors without an href.
            $href = trim($link->getAttribute('href'));
            if ($href === '') {
                continue;
            }
            if (isset($links[$href])) {
                $links[$href]['counter']++;
            } else {
                $links[$href]['counter'] = 1;
            }
        }
        return $links;
    }
}