ljanyst / scrapy-do

License: BSD-3-Clause
A daemon for scheduling Scrapy spiders

Programming Languages

python
139335 projects - #7 most used programming language
javascript
184084 projects - #8 most used programming language

Projects that are alternatives of or similar to scrapy-do

service-systemd
Setup a node.js app as systemd service.
Stars: ✭ 35 (-41.67%)
Mutual labels:  daemon
gronx
Lightweight, fast and dependency-free Cron expression parser (due checker), task scheduler and/or daemon for Golang (tested on v1.13 and above) and standalone usage
Stars: ✭ 206 (+243.33%)
Mutual labels:  daemon
conceal-explorer
Conceal Explorer - CCX Block Explorer
Stars: ✭ 26 (-56.67%)
Mutual labels:  daemon
WSD-python
Web Services for Devices (WSD) tools and utilities for cross platform support
Stars: ✭ 22 (-63.33%)
Mutual labels:  daemon
daemonize
Template code for writing UNIX-daemons.
Stars: ✭ 33 (-45%)
Mutual labels:  daemon
ggr-ui
The missing /status API for Ggr
Stars: ✭ 37 (-38.33%)
Mutual labels:  daemon
google-music-manager-uploader
Google Music Manager Uploader module / Easily upload MP3 (folder) to Google Music
Stars: ✭ 21 (-65%)
Mutual labels:  daemon
ccxx
A cross-platform software library for C, C++, Unix, and POSIX. Includes gtest, benchmark, cmake, process lock, daemon, libuv, lua, cpython, re2, json, yaml, mysql, redis, opencv, qt, lz4, oci ... https://hub.docker.com/u/oudream
Stars: ✭ 31 (-48.33%)
Mutual labels:  daemon
ProxySwitcher
Easily enable / disable WiFi proxy on a jailbroken iOS device
Stars: ✭ 55 (-8.33%)
Mutual labels:  daemon
deswappify-auto
automatically swap-in pages when enough memory is available
Stars: ✭ 30 (-50%)
Mutual labels:  daemon
sway-alttab
Simple Alt-Tab daemon for SwayWM/i3. Switches back to previous focused window on Alt-Tab or SIGUSR1
Stars: ✭ 36 (-40%)
Mutual labels:  daemon
installer
remote.it command line installer tool
Stars: ✭ 21 (-65%)
Mutual labels:  daemon
IMDB-Scraper
Scrapy project for scraping data from IMDB with Movie Dataset including 58,623 movies' data.
Stars: ✭ 37 (-38.33%)
Mutual labels:  scrapy-framework
cephgeorep
An efficient unidirectional remote backup daemon for CephFS.
Stars: ✭ 27 (-55%)
Mutual labels:  daemon
git-slack-notify
Sends Slack notifications for new commits in Git repositories
Stars: ✭ 12 (-80%)
Mutual labels:  daemon
numad
numad for debian/ubuntu
Stars: ✭ 15 (-75%)
Mutual labels:  daemon
touchtest
MacOS Touch Bar Control Strip daemon
Stars: ✭ 22 (-63.33%)
Mutual labels:  daemon
dxhd
daky's X11 Hotkey Daemon
Stars: ✭ 80 (+33.33%)
Mutual labels:  daemon
break-time
break timer that forces you to take a break
Stars: ✭ 13 (-78.33%)
Mutual labels:  daemon
g910-gkey-macro-support
GKey support for Logitech G910 Keyboard on Linux
Stars: ✭ 85 (+41.67%)
Mutual labels:  daemon

Scrapy Do

[Travis CI build badge] [Coveralls coverage badge] [PyPI version badge]

Scrapy Do is a daemon that provides a convenient way to run Scrapy spiders. It can run them once, immediately, or periodically at specified time intervals. It was inspired by scrapyd but written from scratch. It comes with a REST API, a command-line client, and an interactive web interface.

Quick Start

  • Install scrapy-do using pip:

    $ pip install scrapy-do
  • Start the daemon in the foreground:

    $ scrapy-do -n scrapy-do
  • Open another terminal window, download Scrapy's quotesbot example, and push the code to the server:

    $ git clone https://github.com/scrapy/quotesbot.git
    $ cd quotesbot
    $ scrapy-do-cl push-project
    +----------------+
    | quotesbot      |
    |----------------|
    | toscrape-css   |
    | toscrape-xpath |
    +----------------+
  • Schedule some jobs:

    $ scrapy-do-cl schedule-job --project quotesbot \
        --spider toscrape-css --when 'every 5 to 15 minutes'
    +--------------------------------------+
    | identifier                           |
    |--------------------------------------|
    | 0a3db618-d8e1-48dc-a557-4e8d705d599c |
    +--------------------------------------+
    
    $ scrapy-do-cl schedule-job --project quotesbot --spider toscrape-css
    +--------------------------------------+
    | identifier                           |
    |--------------------------------------|
    | b3a61347-92ef-4095-bb68-0702270a52b8 |
    +--------------------------------------+
  • See what's going on:

    [Screenshot: the Active Jobs view]

    The web interface is available at http://localhost:7654 by default.
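
The CLI commands above can also be issued against the REST API directly. The sketch below uses only the Python standard library; the endpoint name schedule-job.json and the form field names are assumptions inferred from the CLI flags, so verify them against the Scrapy Do API documentation before relying on them.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:7654"  # default daemon address (see above)

def build_schedule_body(project, spider, when="now"):
    """URL-encoded form body for the (assumed) schedule-job.json endpoint."""
    return urllib.parse.urlencode(
        {"project": project, "spider": spider, "when": when}
    ).encode()

def schedule_job(project, spider, when="now"):
    """POST a job and return its identifier (a UUID, as in the CLI output)."""
    req = urllib.request.Request(
        f"{BASE_URL}/schedule-job.json",
        data=build_schedule_body(project, spider, when),
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["identifier"]
```

With the daemon running, `schedule_job("quotesbot", "toscrape-css", "every 5 to 15 minutes")` would mirror the first `scrapy-do-cl schedule-job` invocation above.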

Building from source

Both of the steps below require nodejs to be installed.

  • Check if things work fine:

    $ pip install -r requirements-dev.txt
    $ tox
  • Build the wheel:

    $ python setup.py bdist_wheel

ChangeLog

Version 0.5.0

  • Rewrite the log handling functionality to resolve duplication issues
  • Bump the JavaScript dependencies to resolve browser caching issues
  • Make the error message on failed spider listing more descriptive (Bug #28)
  • Make sure that the spider descriptions and payloads get handled properly on restart (Bug #24)
  • Clarify the documentation on passing arguments to spiders (Bugs #23 and #27)

Version 0.4.0

  • Migration to the Bootstrap 4 UI
  • Make it possible to add a short description to jobs
  • Make it possible to specify a user-defined payload in each job, passed as a parameter to the Python crawler
  • UI updates to support the above
  • New log viewers in the web UI
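
In Scrapy, job parameters arrive as keyword arguments to the spider's __init__. A minimal sketch of consuming a JSON payload argument, written without a Scrapy dependency so the pattern stands alone; the argument name "payload" is an assumption, so consult the scrapy-do documentation on passing arguments to spiders:

```python
import json

class PayloadMixin:
    """Parse a user-defined JSON payload passed as a spider argument.

    Mix this into a spider class; Scrapy forwards spider arguments as
    keyword arguments to __init__, so the payload shows up here.
    """
    def __init__(self, payload=None, **kwargs):
        super().__init__(**kwargs)
        # Missing payload is treated as an empty configuration.
        self.payload = json.loads(payload) if payload else {}
```

A spider could then read e.g. `self.payload.get("category")` to steer its crawl.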