scrapinghub / Scrapyrt

Licence: bsd-3-clause
HTTP API for Scrapy spiders


Projects that are alternatives of or similar to Scrapyrt

Headless Chrome Crawler
Distributed crawler powered by Headless Chrome
Stars: ✭ 5,129 (+705.18%)
Mutual labels:  crawler, scraper, crawling
Ferret
Declarative web scraping
Stars: ✭ 4,837 (+659.34%)
Mutual labels:  crawler, scraper, crawling
Scrapoxy
Scrapoxy hides your scraper behind a cloud. It starts a pool of proxies to send your requests. Now, you can crawl without thinking about blacklisting!
Stars: ✭ 1,322 (+107.54%)
Mutual labels:  crawler, scraper, scrapy
Dotnetcrawler
DotnetCrawler is a straightforward, lightweight web crawling/scraping library built on .NET Core, with output to Entity Framework Core. The library is designed like other strong crawler libraries such as WebMagic and Scrapy, but can be extended to your custom requirements. Medium link: https://medium.com/@mehmetozkaya/creating-custom-web-crawler-with-dotnet-core-using-entity-framework-core-ec8d23f0ca7c
Stars: ✭ 100 (-84.3%)
Mutual labels:  crawler, scrapy, crawling
Colly
Elegant Scraper and Crawler Framework for Golang
Stars: ✭ 15,535 (+2338.78%)
Mutual labels:  crawler, scraper, crawling
Lulu
[Unmaintained] A simple and clean video/music/image downloader 👾
Stars: ✭ 789 (+23.86%)
Mutual labels:  crawler, scraper, crawling
Newspaper
News, full-text, and article metadata extraction in Python 3. Advanced docs:
Stars: ✭ 11,545 (+1712.4%)
Mutual labels:  crawler, scraper, crawling
Crawly
Crawly, a high-level web crawling & scraping framework for Elixir.
Stars: ✭ 440 (-30.93%)
Mutual labels:  crawler, scraper, crawling
Easy Scraping Tutorial
Simple but useful Python web scraping tutorial code.
Stars: ✭ 583 (-8.48%)
Mutual labels:  crawler, scrapy, crawling
Goribot
[Crawler/Scraper for Golang] 🕷 A lightweight, distributed-friendly Golang crawler framework.
Stars: ✭ 190 (-70.17%)
Mutual labels:  crawler, scraper, scrapy
Linkedin Profile Scraper
🕵️‍♂️ LinkedIn profile scraper returning structured profile data in JSON. Works in 2020.
Stars: ✭ 171 (-73.16%)
Mutual labels:  crawler, scraper, crawling
bots-zoo
No description or website provided.
Stars: ✭ 59 (-90.74%)
Mutual labels:  crawler, scraper, crawling
Ruiji.net
crawler framework, distributed crawler extractor
Stars: ✭ 220 (-65.46%)
Mutual labels:  crawler, scraper, scrapy
Fbcrawl
A Facebook crawler
Stars: ✭ 536 (-15.86%)
Mutual labels:  crawler, scraper, scrapy
Gosint
OSINT Swiss Army Knife
Stars: ✭ 401 (-37.05%)
Mutual labels:  crawler, scraper
Bookcorpus
Crawl BookCorpus
Stars: ✭ 443 (-30.46%)
Mutual labels:  crawler, scraper
Advanced Web Scraping Tutorial
The Zipru scraper developed in the Advanced Web Scraping Tutorial.
Stars: ✭ 384 (-39.72%)
Mutual labels:  scraper, scrapy
Scrapedin
LinkedIn Scraper (currently working 2020)
Stars: ✭ 453 (-28.89%)
Mutual labels:  crawler, scraper
Wechatsogou
A crawler API for WeChat official accounts, based on Sogou WeChat search.
Stars: ✭ 5,220 (+719.47%)
Mutual labels:  crawler, scrapy
Webster
a reliable high-level web crawling & scraping framework for Node.js.
Stars: ✭ 364 (-42.86%)
Mutual labels:  crawler, crawling

Scrapyrt (Scrapy realtime)
==========================

.. image:: https://github.com/scrapinghub/scrapyrt/workflows/CI/badge.svg
   :target: https://github.com/scrapinghub/scrapyrt/actions

.. image:: https://img.shields.io/pypi/pyversions/scrapyrt.svg
   :target: https://pypi.python.org/pypi/scrapyrt

.. image:: https://img.shields.io/pypi/v/scrapyrt.svg
   :target: https://pypi.python.org/pypi/scrapyrt

.. image:: https://img.shields.io/pypi/l/scrapyrt.svg
   :target: https://pypi.python.org/pypi/scrapyrt

.. image:: https://img.shields.io/pypi/dm/scrapyrt.svg
   :target: https://pypistats.org/packages/scrapyrt
   :alt: Downloads count

.. image:: https://readthedocs.org/projects/scrapyrt/badge/?version=latest
   :target: https://scrapyrt.readthedocs.io/en/latest/api.html

Introduction
------------

An HTTP server that provides an API for scheduling `Scrapy <https://scrapy.org/>`_ spiders and making requests with spiders.

Features
--------

* Allows you to easily add an HTTP API to your existing Scrapy project.
* All Scrapy project components (e.g. middleware, pipelines, extensions) are supported out of the box.
* You simply run Scrapyrt in your Scrapy project directory and it starts an HTTP server, allowing you to schedule your spiders and get spider output in JSON format.

Note
----

* The project is not a replacement for `Scrapyd <https://scrapyd.readthedocs.io/en/stable/>`_, `Scrapy Cloud <https://www.zyte.com/scrapy-cloud/>`_, or other infrastructure for running long-running crawls.
* It is not suitable for long-running spiders; it is good for spiders that fetch one response from a website and return that response.

Getting started
---------------

To install Scrapyrt::

    pip install scrapyrt

Now you can run Scrapyrt from within a Scrapy project by simply typing::

    scrapyrt

in the Scrapy project directory.

Scrapyrt will look for a ``scrapy.cfg`` file to determine your project settings, and will raise an error if it cannot find one. Note that you need to have all your project requirements installed.
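For illustration, a minimal ``scrapy.cfg`` of the kind Scrapyrt looks for might be (``myproject`` is a placeholder for your project's settings module):

.. code-block:: ini

    # Minimal scrapy.cfg in the project root; Scrapyrt reads it to
    # locate the project settings. "myproject" is a placeholder name.
    [settings]
    default = myproject.settings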

Scrapyrt supports the endpoint ``/crawl.json``, which can be requested with two methods: GET and POST.

To run the sample toscrape-css spider from `Quotesbot <https://github.com/scrapy/quotesbot>`_, parsing the page about famous quotes::

    curl "http://localhost:9080/crawl.json?spider_name=toscrape-css&url=http://quotes.toscrape.com/"

To run the same spider, allowing only one request and parsing the URL with the callback ``parse_foo``::

    curl "http://localhost:9080/crawl.json?spider_name=toscrape-css&url=http://quotes.toscrape.com/&callback=parse_foo&max_requests=1"
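The same crawl can also be scheduled with a POST request whose body is a JSON document. A minimal sketch in Python follows; the payload fields used here (``request``, ``max_requests``) follow the Scrapyrt documentation for the POST endpoint, so verify them against your installed version, and ``parse_foo`` is the same hypothetical callback as in the GET example:

```python
import json
import urllib.request

# JSON payload for POST /crawl.json: spider_name plus a "request"
# object describing the initial Scrapy request.
payload = {
    "spider_name": "toscrape-css",
    "request": {
        "url": "http://quotes.toscrape.com/",
        "callback": "parse_foo",
    },
    "max_requests": 1,
}

def build_crawl_request(host="localhost", port=9080):
    """Build (but do not send) the POST request for Scrapyrt's /crawl.json."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        f"http://{host}:{port}/crawl.json",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_crawl_request()
# urllib.request.urlopen(req) would return the spider's items as JSON,
# assuming Scrapyrt is running locally on the default port 9080.
```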

Documentation
-------------

Documentation is available on `readthedocs <http://scrapyrt.readthedocs.org/en/latest/index.html>`_.

Support
-------

Open source support is provided here on Github. Please `create a question issue`_ (i.e. an issue with the "question" label).

Commercial support is also available from Zyte_.

.. _create a question issue: https://github.com/scrapinghub/scrapyrt/issues/new?labels=question
.. _Zyte: http://zyte.com

License
-------

ScrapyRT is offered under the `BSD 3-Clause license <https://en.wikipedia.org/wiki/BSD_licenses#3-clause_license_(%22BSD_License_2.0%22,_%22Revised_BSD_License%22,_%22New_BSD_License%22,_or_%22Modified_BSD_License%22)>`_.

Development
-----------

Development is taking place on `Github <https://github.com/scrapinghub/scrapyrt>`_.
