
vantoozz / Proxy Scraper

License: MIT

Library for scraping free proxy lists

Labels

scraper proxy-list

Projects that are alternatives to or similar to Proxy Scraper

Proxy Scraper

Proxy-Scraper is a simple Perl script for scraping proxies from multiple websites.

Stars: ✭ 24 (-69.23%)

Mutual labels: scraper, proxy-list

proxy-scraper

⭐️ A proxy scraper made using Protractor | Proxy list updates every three hours 🔥

Stars: ✭ 201 (+157.69%)

Mutual labels: scraper, proxy-list

Karate

Webscraper

Stars: ✭ 45 (-42.31%)

Mutual labels: scraper

Jd Autobuy

Python crawler: automatic JD.com login and online rush-buying of products

Stars: ✭ 1,174 (+1405.13%)

Mutual labels: scraper

Proxy List

A list of free, public, forward proxy servers. UPDATED DAILY!

Stars: ✭ 1,125 (+1342.31%)

Mutual labels: proxy-list

Scrapstagram

An Instagram scraper

Stars: ✭ 50 (-35.9%)

Mutual labels: scraper

Pastebin Scraper

Live-scraping pastebin to fight boredom.

Stars: ✭ 66 (-15.38%)

Mutual labels: scraper

Django Dynamic Scraper

Creating Scrapy scrapers via the Django admin interface

Stars: ✭ 1,024 (+1212.82%)

Mutual labels: scraper

Pittapi

An API to easily get data from the University of Pittsburgh

Stars: ✭ 74 (-5.13%)

Mutual labels: scraper

Bad Robo

🐙 Get 400-500 real followers daily 👽 [BadRobo] is the best Instagram bot available now, with all features! Our bot does not violate any of Instagram's rules, so you don't have to worry about getting an action block!

Stars: ✭ 59 (-24.36%)

Mutual labels: scraper

Skraper

Kotlin/Java library and cli tool for scraping posts and media from various sources with neither authorization nor full page rendering (Facebook, Instagram, Twitter, Youtube, Tiktok, Telegram, Twitch, Reddit, 9GAG, Pinterest, Flickr, Tumblr, IFunny, VK, Pikabu)

Stars: ✭ 72 (-7.69%)

Mutual labels: scraper

Warta Scrap

Indonesian news index crawler covering 10 online media outlets

Stars: ✭ 57 (-26.92%)

Mutual labels: scraper

Pitchfork Npm

An Unofficial Pitchfork Music API client for Node.js

Stars: ✭ 50 (-35.9%)

Mutual labels: scraper

Pitchfork

🎶 Unofficial python API for pitchfork.com reviews.

Stars: ✭ 67 (-14.1%)

Mutual labels: scraper

Social Scraper

A collection of scripts for crawling data from social networks and Vietnamese-language websites

Stars: ✭ 47 (-39.74%)

Mutual labels: scraper

Goscraper

Golang pkg to quickly return a preview of a webpage (title/description/images)

Stars: ✭ 72 (-7.69%)

Mutual labels: scraper

Repository.kodibae

Kodi Bae Repository - Kodi is a registered trademark of the XBMC Foundation. We are not connected to or in any other way affiliated with Kodi - DMCA: [email protected]

Stars: ✭ 45 (-42.31%)

Mutual labels: scraper

Tangerine

Tangerine Bank scraper

Stars: ✭ 54 (-30.77%)

Mutual labels: scraper

Scrape

Distributed Scraper

Stars: ✭ 65 (-16.67%)

Mutual labels: scraper

Instascrape

🚀 A fast and lightweight utility and Python library for downloading posts, stories, and highlights from Instagram.

Stars: ✭ 76 (-2.56%)

Mutual labels: scraper


Proxy Scraper

A PHP library for scraping free proxy lists

Quick start

composer require vantoozz/proxy-scraper:~3 guzzlehttp/guzzle:~7 guzzlehttp/psr7 hanneskod/classtools

<?php declare(strict_types=1);

use function Vantoozz\ProxyScraper\proxyScraper;

require_once __DIR__ . '/vendor/autoload.php';

foreach (proxyScraper()->get() as $proxy) {
    echo $proxy . "\n";
}

Older versions

This is version 3 of the library. For version 2, see the v2 branch; for version 1, see the v1 branch.

Upgrade

See the "Upgrade from version 2" section at the end of this document.

Setup

The library requires a PSR-18-compatible HTTP client. To use the library, install any implementation, for example Guzzle:

composer require guzzlehttp/guzzle:~7 guzzlehttp/psr7

All available clients are listed on Packagist: https://packagist.org/providers/psr/http-client-implementation.

Then install the proxy-scraper library itself:

composer require vantoozz/proxy-scraper:~3

Usage

Auto-configuration

The simplest way to start using the library is the proxyScraper() function, which instantiates and configures all the scrapers.

Note that, in addition to guzzlehttp/guzzle:~7 and guzzlehttp/psr7, the auto-configuration function requires the hanneskod/classtools dependency:

composer require guzzlehttp/guzzle:~7 guzzlehttp/psr7 hanneskod/classtools

<?php declare(strict_types=1);

use function Vantoozz\ProxyScraper\proxyScraper;

require_once __DIR__ . '/vendor/autoload.php';

foreach (proxyScraper()->get() as $proxy) {
    echo $proxy . "\n";
}

HTTP Client

If you are not using auto-configuration, you will need an HTTP client.

The library provides the guzzleHttpClient() function, which creates and configures a Guzzle-based client.

<?php declare(strict_types=1);

use Vantoozz\ProxyScraper\Exceptions\ScraperException;

use function Vantoozz\ProxyScraper\guzzleHttpClient;
use function Vantoozz\ProxyScraper\proxyScraper;

require_once __DIR__ . '/vendor/autoload.php';

$httpClient = guzzleHttpClient();

$scraper = proxyScraper($httpClient);

try {
    echo $scraper->get()->current()->getIpv4() . "\n";
} catch (ScraperException $e) {
    echo $e->getMessage() . "\n";
}
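The example above calls current() on the generator returned by get(). If the generator yields nothing, current() returns null, and calling getIpv4() on null raises an Error rather than a ScraperException. A minimal sketch of a guard, using a hypothetical firstOrNull() helper (not part of the library) and plain string sources for brevity:

```php
<?php declare(strict_types=1);

/**
 * Hypothetical helper (not part of proxy-scraper): return the first value
 * a generator yields, or null if it yields nothing.
 *
 * @return mixed
 */
function firstOrNull(Generator $generator)
{
    // valid() runs the generator up to its first yield (or to completion)
    return $generator->valid() ? $generator->current() : null;
}

// A source that yields nothing
function emptySource(): Generator
{
    yield from [];
}

// A source that yields a single proxy string
function oneProxy(): Generator
{
    yield '104.202.117.106:1234';
}

echo firstOrNull(oneProxy()) ?? 'no proxies', "\n";
echo firstOrNull(emptySource()) ?? 'no proxies', "\n";
```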

You can create your own HTTP client by implementing HttpClientInterface:

<?php declare(strict_types=1);

use Vantoozz\ProxyScraper\Exceptions\ScraperException;
use Vantoozz\ProxyScraper\HttpClient\HttpClientInterface;

use function Vantoozz\ProxyScraper\proxyScraper;

require_once __DIR__ . '/vendor/autoload.php';

$httpClient = new class implements HttpClientInterface {
    /**
     * @param string $uri
     * @return string
     */
    public function get(string $uri): string
    {
        // A real client would fetch $uri and return the response body; this stub returns a fixed string
        return "some string";
    }
};

$scraper = proxyScraper($httpClient);

try {
    echo $scraper->get()->current()->getIpv4() . "\n";
} catch (ScraperException $e) {
    echo $e->getMessage() . "\n";
}
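Because scrapers only depend on a get(string $uri): string method, a fixture-backed client can stand in for Guzzle when testing parsing code offline. A minimal sketch, assuming the interface has exactly the shape shown above; HttpClientLike and FixtureHttpClient are hypothetical names, redeclared locally so the snippet runs without the library, and RuntimeException is a placeholder for whatever exception type the library expects from a failing client:

```php
<?php declare(strict_types=1);

// Hypothetical stand-in mirroring HttpClientInterface::get() as shown above,
// redeclared here so this snippet runs without the library installed.
interface HttpClientLike
{
    public function get(string $uri): string;
}

// Serves canned response bodies keyed by URI, so scrapers can be tested offline.
final class FixtureHttpClient implements HttpClientLike
{
    /** @var array<string, string> */
    private $fixtures;

    /** @param array<string, string> $fixtures URI => response body */
    public function __construct(array $fixtures)
    {
        $this->fixtures = $fixtures;
    }

    public function get(string $uri): string
    {
        if (!array_key_exists($uri, $this->fixtures)) {
            // Placeholder exception type; the library may expect its own
            throw new RuntimeException('No fixture for ' . $uri);
        }
        return $this->fixtures[$uri];
    }
}

$client = new FixtureHttpClient(['https://example.com/list' => '198.51.100.1:80']);
echo $client->get('https://example.com/list'), "\n";
```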

You can also configure the scraper and the underlying HTTP client manually:

Single scraper

<?php declare(strict_types=1);

use Vantoozz\ProxyScraper\Scrapers;

use function Vantoozz\ProxyScraper\guzzleHttpClient;

require_once __DIR__ . '/vendor/autoload.php';

$scraper = new Scrapers\UsProxyScraper(guzzleHttpClient());

foreach ($scraper->get() as $proxy) {
    echo $proxy . "\n";
}
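Conceptually, a single scraper downloads one proxy-list page and yields the proxies it finds there. The sketch below illustrates that idea with a hypothetical scrapeProxies() function that reads IPv4/port pairs from adjacent table cells. It is not how UsProxyScraper is implemented, and each real site needs its own parsing; the page is supplied through an injected callable so no network access is needed:

```php
<?php declare(strict_types=1);

/**
 * Hypothetical scraper sketch: fetch a page through $fetch and yield an
 * "ip:port" string for every <td>IPv4</td><td>port</td> pair found in it.
 *
 * @param callable(string): string $fetch
 */
function scrapeProxies(callable $fetch, string $uri): Generator
{
    $html = $fetch($uri);
    // An IPv4-looking cell followed by a numeric port cell, whitespace allowed between cells
    $pattern = '~<td>\s*(\d{1,3}(?:\.\d{1,3}){3})\s*</td>\s*<td>\s*(\d{1,5})\s*</td>~';
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);
    foreach ($matches as [, $ip, $port]) {
        yield $ip . ':' . $port;
    }
}

$fakePage = '<table>'
    . '<tr><td>104.202.117.106</td><td>1234</td><td>US</td></tr>'
    . '<tr><td>192.168.0.1</td> <td>8888</td></tr>'
    . '</table>';

// Ignores $uri and always returns the fake page
$fetch = function (string $uri) use ($fakePage): string {
    return $fakePage;
};

foreach (scrapeProxies($fetch, 'https://example.com/proxies') as $proxy) {
    echo $proxy, "\n";
}
```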

Composite scraper

You can get data from several scrapers at once with CompositeScraper:

<?php declare(strict_types=1);

use Vantoozz\ProxyScraper\Scrapers;

use function Vantoozz\ProxyScraper\guzzleHttpClient;

require_once __DIR__ . '/vendor/autoload.php';

$httpClient = guzzleHttpClient();

$compositeScraper = new Scrapers\CompositeScraper;

$compositeScraper->addScraper(new Scrapers\FreeProxyListScraper($httpClient));
$compositeScraper->addScraper(new Scrapers\CoolProxyScraper($httpClient));
$compositeScraper->addScraper(new Scrapers\SocksProxyScraper($httpClient));

foreach ($compositeScraper->get() as $proxy) {
    echo $proxy . "\n";
}
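A composite scraper can be thought of as a chain of lazy generators. A minimal sketch of that pattern using yield from, with hypothetical names (composite(), $sourceA, $sourceB) and documentation-range IP addresses; it is not the library's CompositeScraper:

```php
<?php declare(strict_types=1);

/**
 * Hypothetical sketch: chain several proxy sources into one lazy stream.
 * Each source is a callable returning a Generator, so a source does no
 * work until iteration reaches it.
 *
 * @param array<int, callable(): Generator> $sources
 */
function composite(array $sources): Generator
{
    foreach ($sources as $source) {
        // Delegate to the inner generator; its keys are passed through unchanged
        yield from $source();
    }
}

$sourceA = function (): Generator {
    yield '198.51.100.1:80';
    yield '198.51.100.2:8080';
};
$sourceB = function (): Generator {
    yield '203.0.113.5:3128';
};

foreach (composite([$sourceA, $sourceB]) as $proxy) {
    echo $proxy, "\n";
}
```

Because yield from keeps the inner generators' keys, collecting the result with iterator_to_array() needs preserve_keys set to false; otherwise entries from different sources that share a key overwrite each other.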

Error handling

Sometimes things go wrong. This example shows how to handle errors while getting data from many scrapers:

<?php declare(strict_types=1);

use Vantoozz\ProxyScraper\Exceptions\ScraperException;
use Vantoozz\ProxyScraper\Ipv4;
use Vantoozz\ProxyScraper\Port;
use Vantoozz\ProxyScraper\Proxy;
use Vantoozz\ProxyScraper\Scrapers;

require_once __DIR__ . '/vendor/autoload.php';

$compositeScraper = new Scrapers\CompositeScraper;

// Set exception handler
$compositeScraper->handleScraperExceptionWith(function (ScraperException $e) {
    echo 'An error occurred: ' . $e->getMessage() . "\n";
});

// Fake scraper throwing an exception
$compositeScraper->addScraper(new class implements Scrapers\ScraperInterface {
    public function get(): Generator
    {
        throw new ScraperException('some error');
    }
});

// Fake scraper with no exceptions
$compositeScraper->addScraper(new class implements Scrapers\ScraperInterface {
    public function get(): Generator
    {
        yield new Proxy(new Ipv4('192.168.0.1'), new Port(8888));
    }
});

// Run composite scraper
foreach ($compositeScraper->get() as $proxy) {
    echo $proxy . "\n";
}

This will output:

An error occurred: some error
192.168.0.1:8888
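The same chaining idea extends to error handling: a failing source is reported to a callback and skipped. A minimal sketch under these assumptions, where compositeWithHandler() is a hypothetical name and RuntimeException stands in for ScraperException; it is not the library's implementation:

```php
<?php declare(strict_types=1);

/**
 * Hypothetical sketch: report a failing source to $onError, skip it and
 * continue with the next source.
 *
 * @param array<int, callable(): iterable> $sources
 * @param callable(Throwable): void $onError
 */
function compositeWithHandler(array $sources, callable $onError): Generator
{
    foreach ($sources as $source) {
        try {
            // Wrap the iteration, not just the call: a generator throws while being iterated
            foreach ($source() as $proxy) {
                yield $proxy;
            }
        } catch (RuntimeException $e) {
            $onError($e);
        }
    }
}

// Yields one proxy, then fails
$broken = function (): Generator {
    yield '198.51.100.7:80';
    throw new RuntimeException('some error');
};
$healthy = function (): Generator {
    yield '203.0.113.9:8888';
};

$stream = compositeWithHandler([$broken, $healthy], function (Throwable $e): void {
    echo 'An error occurred: ', $e->getMessage(), "\n";
});

foreach ($stream as $proxy) {
    echo $proxy, "\n";
}
```

This prints the proxy from the broken source, then the error message, then the proxy from the healthy source: proxies yielded before a failure are still delivered, and a try around the $source() call alone would miss exceptions raised mid-stream.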

Since proxyScraper() returns an instance of CompositeScraper, you can configure exception handling for it in the same way:

<?php declare(strict_types=1);

use Vantoozz\ProxyScraper\Exceptions\ScraperException;
use function Vantoozz\ProxyScraper\proxyScraper;

require_once __DIR__ . '/vendor/autoload.php';

$scraper = proxyScraper();

$scraper->handleScraperExceptionWith(function (ScraperException $e) {
    echo 'An error occurred: ' . $e->getMessage() . "\n";
});

foreach ($scraper->get() as $proxy) {
    echo $proxy . "\n";
}

Validating proxies

You can add validation steps with ValidatorPipeline:

<?php declare(strict_types = 1);

use Vantoozz\ProxyScraper\Exceptions\ValidationException;
use Vantoozz\ProxyScraper\Ipv4;
use Vantoozz\ProxyScraper\Port;
use Vantoozz\ProxyScraper\Proxy;
use Vantoozz\ProxyScraper\Scrapers;
use Vantoozz\ProxyScraper\Validators;

require_once __DIR__ . '/vendor/autoload.php';

$scraper = new class implements Scrapers\ScraperInterface
{
    public function get(): \Generator
    {
        yield new Proxy(new Ipv4('104.202.117.106'), new Port(1234));
        yield new Proxy(new Ipv4('192.168.0.1'), new Port(8888));
    }
};

$validator = new Validators\ValidatorPipeline;
$validator->addStep(new Validators\Ipv4RangeValidator);

foreach ($scraper->get() as $proxy) {
    try {
        $validator->validate($proxy);
        echo '[OK] ' . $proxy . "\n";
    } catch (ValidationException $e) {
        echo '[Error] ' . $e->getMessage() . ': ' . $proxy . "\n";
    }
}

This will output:

[OK] 104.202.117.106:1234
[Error] IPv4 is in private range: 192.168.0.1:8888
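A validator pipeline runs its steps in order and stops at the first failure. A minimal sketch of that behavior, assuming steps are callables that throw on invalid input. validateProxy() and SketchValidationException are hypothetical names, and the range check relies on PHP's built-in filter_var() flags, which may differ from the rules Ipv4RangeValidator actually applies (this sketch also rejects reserved ranges such as loopback):

```php
<?php declare(strict_types=1);

// Hypothetical exception type for this sketch (the library has its own ValidationException)
final class SketchValidationException extends RuntimeException
{
}

/**
 * Hypothetical pipeline: run each step in order; the first failing step throws.
 *
 * @param array<int, callable(string): void> $steps
 */
function validateProxy(string $ip, array $steps): void
{
    foreach ($steps as $step) {
        $step($ip);
    }
}

// One possible range check using PHP's filter flags
$notPrivateOrReserved = function (string $ip): void {
    $flags = FILTER_FLAG_IPV4 | FILTER_FLAG_NO_PRIV_RANGE | FILTER_FLAG_NO_RES_RANGE;
    if (filter_var($ip, FILTER_VALIDATE_IP, $flags) === false) {
        throw new SketchValidationException('IPv4 is in private or reserved range');
    }
};

foreach (['104.202.117.106', '192.168.0.1'] as $ip) {
    try {
        validateProxy($ip, [$notPrivateOrReserved]);
        echo '[OK] ', $ip, "\n";
    } catch (SketchValidationException $e) {
        echo '[Error] ', $e->getMessage(), ': ', $ip, "\n";
    }
}
```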

Metrics

A Proxy object may have metrics (metadata) associated with it.

By default, a Proxy object has a source metric:

<?php declare(strict_types=1);

use Vantoozz\ProxyScraper\Proxy;
use Vantoozz\ProxyScraper\Scrapers;

use function Vantoozz\ProxyScraper\guzzleHttpClient;

require_once __DIR__ . '/vendor/autoload.php';

$scraper = new Scrapers\UsProxyScraper(guzzleHttpClient());

/** @var Proxy $proxy */
$proxy = $scraper->get()->current();

foreach ($proxy->getMetrics() as $metric) {
    echo $metric->getName() . ': ' . $metric->getValue() . "\n";
}

This will output:

source: Vantoozz\ProxyScraper\Scrapers\UsProxyScraper
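Metrics are name/value pairs attached to a proxy. A minimal sketch of such a value object, with hypothetical names (SketchProxy, withMetric()) that are not the library's API; withMetric() returns a modified copy so a proxy shared between consumers is never changed in place:

```php
<?php declare(strict_types=1);

// Hypothetical value object: an IPv4 address, a port and name => value metrics
final class SketchProxy
{
    /** @var string */
    private $ip;
    /** @var int */
    private $port;
    /** @var array<string, string> metric name => value */
    private $metrics = [];

    public function __construct(string $ip, int $port)
    {
        $this->ip = $ip;
        $this->port = $port;
    }

    // Returns a copy with the metric set, leaving the original unchanged
    public function withMetric(string $name, string $value): self
    {
        $copy = clone $this;
        $copy->metrics[$name] = $value;
        return $copy;
    }

    /** @return array<string, string> */
    public function getMetrics(): array
    {
        return $this->metrics;
    }

    public function __toString(): string
    {
        return $this->ip . ':' . $this->port;
    }
}

// A scraper could tag each proxy with its origin, similar to the source metric above
$proxy = (new SketchProxy('104.202.117.106', 1234))->withMetric('source', 'UsProxyScraper');

foreach ($proxy->getMetrics() as $name => $value) {
    echo $name, ': ', $value, "\n";
}
echo $proxy, "\n";
```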

Note: the examples use Guzzle as the HTTP client.

Testing

Unit tests

./vendor/bin/phpunit --testsuite=unit

Integration tests

./vendor/bin/phpunit --testsuite=integration

System tests

php ./tests/systemTests.php

Upgrade from version 2

The biggest difference from version 2 is the HTTP client configuration.

Instead of

$httpClient = new \Vantoozz\ProxyScraper\HttpClient\Psr18HttpClient(
    new \Http\Adapter\Guzzle6\Client(new \GuzzleHttp\Client([
        'connect_timeout' => 2,
        'timeout' => 3,
    ])),
    new \Http\Message\MessageFactory\GuzzleMessageFactory
);

the client should now be instantiated like this:

$httpClient = \Vantoozz\ProxyScraper\guzzleHttpClient();


Stars: ✭ 78
