
binux / Pyspider

Licence: apache-2.0
A Powerful Spider(Web Crawler) System in Python.



Projects that are alternatives to or similar to Pyspider

Sitemap Generator Cli
Creates an XML-Sitemap by crawling a given site.
Stars: ✭ 214 (-98.6%)
Mutual labels:  crawler
👾 Fast and simple video download library and CLI tool written in Go
Stars: ✭ 16,369 (+7.4%)
Mutual labels:  crawler
Strong Web Crawler
Stars: ✭ 238 (-98.44%)
Mutual labels:  crawler
Chromium for spider
dynamic crawler for web vulnerability scanner
Stars: ✭ 220 (-98.56%)
Mutual labels:  crawler
A Swift Web Crawler 🕷
Stars: ✭ 225 (-98.52%)
Mutual labels:  crawler
Crawler-based dynamic detection tool for sensitive files
Stars: ✭ 227 (-98.51%)
Mutual labels:  crawler
Web crawler.
Stars: ✭ 214 (-98.6%)
Mutual labels:  crawler
Magic google
Google search results crawler, get google search results that you need
Stars: ✭ 247 (-98.38%)
Mutual labels:  crawler
Laravel Crawler Detect
A Laravel wrapper for CrawlerDetect - the web crawler detection library
Stars: ✭ 227 (-98.51%)
Mutual labels:  crawler
Web spider framework built on puppeteer. Supports task queues and task scheduling via decorators, convenient data storage (nedb / mongodb), and data visualization with user interaction.
Stars: ✭ 237 (-98.44%)
Mutual labels:  crawler
crawler framework, distributed crawler extractor
Stars: ✭ 220 (-98.56%)
Mutual labels:  crawler
Crawl all unique internal links found on a given website, and extract SEO related information - supports javascript based sites
Stars: ✭ 224 (-98.53%)
Mutual labels:  crawler
Gitee repository: AJay13/ECommerceCrawlers. GitHub repository: DropsDevopsOrg/ECommerceCrawlers. Project showcase platform:
Stars: ✭ 3,073 (-79.84%)
Mutual labels:  crawler
Python Lambda Chrome Automation (naming pending)
Stars: ✭ 219 (-98.56%)
Mutual labels:  crawler
Fast Lianjia Crawler
A blazing-fast crawler that fetches data directly through the Lianjia API, the fastest in the universe~~ 🚀
Stars: ✭ 247 (-98.38%)
Mutual labels:  crawler
Jd mask robot
Stars: ✭ 216 (-98.58%)
Mutual labels:  crawler
Awesome Java Crawler
Stars: ✭ 228 (-98.5%)
Mutual labels:  crawler
Be nice on the web
Stars: ✭ 253 (-98.34%)
Mutual labels:  crawler
Crawler that downloads Weibo images without logging in
Stars: ✭ 247 (-98.38%)
Mutual labels:  crawler
A Kotlin-based testing/scraping/parsing library providing the ability to analyze and extract data from HTML (server & client-side rendered). It places particular emphasis on ease of use and a high level of readability by providing an intuitive DSL. It aims to be a testing lib, but can also be used to scrape websites in a convenient fashion.
Stars: ✭ 231 (-98.48%)
Mutual labels:  crawler

pyspider

A Powerful Spider(Web Crawler) System in Python.

  • Write script in Python
  • Powerful WebUI with script editor, task monitor, project manager and result viewer
  • MySQL, MongoDB, Redis, SQLite, Elasticsearch; PostgreSQL with SQLAlchemy as database backend
  • RabbitMQ, Redis and Kombu as message queue
  • Task priority, retry, periodical, recrawl by age, etc...
  • Distributed architecture, Crawl Javascript pages, Python 2.{6,7}, 3.{3,4,5,6} support, etc...
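As a rough illustration of two of the scheduling features listed above (task priority and recrawl by age), here is a minimal standard-library sketch. It is not pyspider's scheduler: the class `ToyScheduler` and its methods are hypothetical names invented for this example, and it only shows the general idea of a priority queue combined with a freshness check.

```python
import heapq
import time


class ToyScheduler:
    """Illustrative only: a tiny in-memory scheduler, not pyspider's own."""

    def __init__(self):
        self._queue = []         # heap of (-priority, sequence, url)
        self._last_crawled = {}  # url -> timestamp of the last finished crawl
        self._seq = 0            # tie-breaker that keeps insertion order

    def submit(self, url, priority=0, age=0, now=None):
        """Queue url unless it was crawled less than `age` seconds ago."""
        now = time.time() if now is None else now
        last = self._last_crawled.get(url)
        if last is not None and now - last < age:
            return False  # still considered fresh, so skip the recrawl
        heapq.heappush(self._queue, (-priority, self._seq, url))
        self._seq += 1
        return True

    def next_task(self):
        """Pop the highest-priority URL, or None when the queue is empty."""
        if not self._queue:
            return None
        return heapq.heappop(self._queue)[2]

    def mark_done(self, url, now=None):
        """Record when url was last crawled."""
        self._last_crawled[url] = time.time() if now is None else now


if __name__ == "__main__":
    scheduler = ToyScheduler()
    scheduler.submit("low", priority=1, now=0)
    scheduler.submit("high", priority=5, now=0)
    print(scheduler.next_task())  # the higher-priority URL comes out first
```

In this sketch a higher `priority` value is served first, and `age` is the number of seconds during which a previously crawled URL is skipped.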

Release notes:

Sample Code

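from pyspider.libs.base_handler import *


class Handler(BaseHandler):
    # Project-wide crawl settings (empty here).
    crawl_config = {
    }

    @every(minutes=24 * 60)  # run on_start once a day
    def on_start(self):
        # Start URL left empty here; set it to the site to crawl.
        self.crawl('', callback=self.index_page)

    @config(age=10 * 24 * 60 * 60)  # age of 10 days, in seconds
    def index_page(self, response):
        # Follow every link whose href starts with "http".
        for each in response.doc('a[href^="http"]').items():
            self.crawl(each.attr.href, callback=self.detail_page)

    def detail_page(self, response):
        # The returned dict is the result for this page.
        return {
            "url": response.url,
            "title": response.doc('title').text(),
        }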
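The two decorator arguments in the sample are plain arithmetic, and it can help to see what they work out to. The sketch below only converts the literal values into readable durations; the variable names are made up for this example, and the reading of `age` as "how long a fetched page counts as fresh before it is recrawled" is an assumption based on the "recrawl by age" feature, not something the sample itself states.

```python
from datetime import timedelta

# Values copied from the sample handler.
every_minutes = 24 * 60          # argument to @every(minutes=...)
age_seconds = 10 * 24 * 60 * 60  # argument to @config(age=...)

run_interval = timedelta(minutes=every_minutes)
validity = timedelta(seconds=age_seconds)

print(run_interval)  # 1 day, 0:00:00
print(validity)      # 10 days, 0:00:00
```

So `on_start` is scheduled once a day, and the `age` on `index_page` corresponds to ten days.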


WARNING: The WebUI is open to the public by default and can be used to execute any command, which may harm your system. Please use it on an internal network or enable need-auth for the WebUI.
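One way to act on this warning is to generate a JSON config file that turns authentication on. The sketch below assumes, based on my recollection of pyspider's deployment documentation, that the config file accepts a `webui` section with `username`, `password` and `need-auth` keys; check those names against the documentation for your version. The helper `webui_auth_config` and the credentials are hypothetical placeholders.

```python
import json


def webui_auth_config(username, password):
    """Build a config dict that asks the WebUI to require a login.

    The key names are an assumption taken from pyspider's deployment docs.
    """
    return {
        "webui": {
            "username": username,
            "password": password,
            "need-auth": True,
        }
    }


if __name__ == "__main__":
    # Placeholder credentials: replace them before real use.
    print(json.dumps(webui_auth_config("admin", "change-me"), indent=2))
```

If the output is saved as `config.json`, it could then be passed to pyspider with its config option (`pyspider -c config.json`, assuming that flag is available in your version).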





  • a visual scraping interface like portia


Licensed under the Apache License, Version 2.0
