spatie / Robots Txt

License: MIT

Parse robots.txt, robots meta and headers

Determine if a page may be crawled from robots.txt, robots meta tags and robot headers.

Support us

We invest a lot of resources into creating best-in-class open source packages. You can support us by buying one of our paid products.

We highly appreciate you sending us a postcard from your hometown, mentioning which of our package(s) you are using. You'll find our address on our contact page. We publish all received postcards on our virtual postcard wall.

Installation

You can install the package via composer:

composer require spatie/robots-txt

Usage

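use Spatie\Robots\Robots;

$robots = Robots::create();

// Whether the page may be indexed
$robots->mayIndex('https://www.spatie.be/nl/admin');

// Whether links on the page may be followed
$robots->mayFollowOn('https://www.spatie.be/nl/admin');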
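For context, robots meta tags and robots headers usually carry comma-separated directives such as noindex and nofollow. The sketch below shows one way such a directive string could be interpreted. It is not the package's implementation: the helper names parseRobotsDirectives, allowsIndexing and allowsFollowing are hypothetical, and treating "none" as both noindex and nofollow is an assumption based on common crawler conventions.

```php
<?php

// Hypothetical helper, not part of spatie/robots-txt: split a directive
// string such as "noindex, nofollow" into lowercase tokens.
function parseRobotsDirectives(string $value): array
{
    $tokens = preg_split('/\s*,\s*/', trim(strtolower($value)), -1, PREG_SPLIT_NO_EMPTY);

    return $tokens === false ? [] : $tokens;
}

// Assumption: "noindex" or "none" forbids indexing the page.
function allowsIndexing(string $value): bool
{
    $directives = parseRobotsDirectives($value);

    return !in_array('noindex', $directives, true)
        && !in_array('none', $directives, true);
}

// Assumption: "nofollow" or "none" forbids following links on the page.
function allowsFollowing(string $value): bool
{
    $directives = parseRobotsDirectives($value);

    return !in_array('nofollow', $directives, true)
        && !in_array('none', $directives, true);
}

var_dump(allowsIndexing('noindex, follow'));  // bool(false)
var_dump(allowsFollowing('noindex, follow')); // bool(true)
```

In practice the package performs these checks for you; the sketch only illustrates what kind of signal the two methods above are reporting on.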

You can also specify a user agent:

$robots = Robots::create('UserAgent007');

By default, Robots will look for a robots.txt file at the root of the host, e.g. https://host.com/robots.txt. Another location, either a URL or a local file path, can be specified like so:

$robots = Robots::create()
    ->withTxt('https://www.spatie.be/robots-custom.txt');

$robots = Robots::create()
    ->withTxt(__DIR__ . '/public/robots.txt');
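The default location described above can be derived from the scheme and host of the page URL. The following is a minimal sketch of that derivation, not the package's own code: the function name defaultRobotsTxtUrl is hypothetical, and keeping an explicit port in the result is an assumption.

```php
<?php

// Hypothetical helper, not part of spatie/robots-txt: build the
// conventional robots.txt URL for the host of a given page URL.
function defaultRobotsTxtUrl(string $pageUrl): string
{
    $parts = parse_url($pageUrl);

    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        throw new InvalidArgumentException("Not an absolute URL: {$pageUrl}");
    }

    // Assumption: an explicit port is kept, since it identifies the host.
    $port = isset($parts['port']) ? ':' . $parts['port'] : '';

    return "{$parts['scheme']}://{$parts['host']}{$port}/robots.txt";
}

echo defaultRobotsTxtUrl('https://www.spatie.be/nl/admin'), PHP_EOL;
// prints "https://www.spatie.be/robots.txt"
```

The path and query string of the page are ignored on purpose, because robots.txt applies to the whole host rather than to a single page.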

Testing

composer test

Changelog

Please see CHANGELOG for more information on what has changed recently.

Contributing

Please see CONTRIBUTING for details.

Security

If you discover any security-related issues, please email [email protected] instead of using the issue tracker.

Postcardware

You're free to use this package, but if it makes it to your production environment we highly appreciate you sending us a postcard from your hometown, mentioning which of our package(s) you are using.

Our address is: Spatie, Kruikstraat 22, 2018 Antwerp, Belgium.

We publish all received postcards on our company website.

Credits

License

The MIT License (MIT). Please see License File for more information.
