Licence: MPL-2.0
Python Testing Crawler 🐍 🩺 🕷


A crawler for automated functional testing of a web application

Crawling a server-side-rendered web application is a low-cost way to get low-quality test coverage of your JavaScript-light web application.

If you have only partial test coverage of your routes, but still want to protect against silly mistakes, then this is for you.

Features:

  • Selectively spider pages and resources, or just request them
  • Submit forms, and control what values to send
  • Ignore links by source using CSS selectors
  • Fail fast or collect many errors
  • Configurable using straightforward rules

Works with the test clients for Flask (inc Flask-WebTest), Django and WebTest.

Why should I use this?

Here's an example: Flaskr, the Flask tutorial application, has 166 lines of test code to achieve 100% test coverage.

Using Python Testing Crawler in a similar way to the Usage example below, we can hit 73% coverage with very little effort. Disclaimer: of course, this is not the same quality or utility of testing! But it is better than no tests, a complement to hand-written unit or functional tests, and a useful stopgap.

Installation

$ pip install python-testing-crawler

Usage

Create a crawler using your framework's existing test client, tell it where to start and what rules to obey, then set it off:

from python_testing_crawler import Crawler
from python_testing_crawler import Rule, Request, Ignore, Allow

def test_crawl_all():
    client = ...  # your framework's existing test client
    # ... any setup ...
    crawler = Crawler(
        client=client,
        initial_paths=['/'],
        rules=[
            Rule("a", '/.*', "GET", Request()),
        ]
    )
    crawler.crawl()

This will crawl all anchor links to relative addresses beginning with "/". Any exceptions encountered will be collected and presented at the end of the crawl. For more power, see the Rules section below.

If you need to authorise the client's session, e.g. log in, then you should do that before creating the Crawler.

It is also a good idea to create enough data, via fixtures or otherwise, to expose enough endpoints.

How do I set up a test client?

It depends on your framework.
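The per-framework details aren't reproduced here, but as a minimal sketch (the app and route below are placeholders, not part of the library), a Flask application exposes a test client via app.test_client():

```python
from flask import Flask

# A stand-in application; in practice you would import your own app object.
app = Flask(__name__)

@app.route("/")
def index():
    return "<a href='/about'>About</a>"

# Flask's built-in test client, suitable for passing to Crawler(client=...)
client = app.test_client()
response = client.get("/")
assert response.status_code == 200
```

Django projects use django.test.Client(), and WebTest wraps any WSGI app with webtest.TestApp(app).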

Crawler Options

  • initial_paths: list of paths/URLs to start from
  • rules: list of Rules to control the crawler; see below
  • path_attrs: list of attribute names to extract paths/URLs from; defaults to "href" -- include "src" if you want to check e.g. <link>, <script> or even <img> elements
  • ignore_css_selectors: any elements matching this list of CSS selectors will be ignored when extracting links
  • ignore_form_fields: list of form input names to ignore when determining the identity/uniqueness of a form; include CSRF token field names here
  • max_requests: the Crawler will raise an exception if this limit is exceeded
  • capture_exceptions: upon encountering an exception, keep going and fail at the end of the crawl instead of immediately (default True)
  • output_summary: print summary statistics and any captured exceptions and tracebacks at the end of the crawl (default True)
  • should_process_handlers: list of "should process" handlers; see the Handlers section
  • check_response_handlers: list of "check response" handlers; see the Handlers section
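To see why ignore_form_fields matters: the crawler must decide whether two rendered forms are "the same" form, and a per-request CSRF token value would make every rendering look unique. Here is a hypothetical sketch of such an identity computation (not the library's actual code):

```python
def form_identity(action, method, fields, ignore_form_fields=()):
    """Hashable identity for a form: its target, method, and (name, value)
    pairs, excluding fields whose values change on every render."""
    significant = sorted(
        (name, value) for name, value in fields.items()
        if name not in ignore_form_fields
    )
    return (action, method.upper(), tuple(significant))

# Two renderings of the same login form differ only in the CSRF token value;
# ignoring that field makes their identities equal.
first = form_identity("/login", "post",
                      {"username": "", "csrf_token": "abc123"},
                      ignore_form_fields=("csrf_token",))
second = form_identity("/login", "POST",
                       {"username": "", "csrf_token": "xyz789"},
                       ignore_form_fields=("csrf_token",))
assert first == second  # treated as one form, so it is crawled once
```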

Rules

The crawler has to be told what URLs to follow, what forms to post and what to ignore, using Rules.

Rules are made of four parameters:

Rule(<source element regex>, <target URL/path regex>, <HTTP method>, <action to take>)

These are matched against every HTML element that the crawler encounters, with the last matching rule winning.
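Last-match-wins resolution can be sketched in plain Python (Rule here is a simplified stand-in for the library's class, and the exact matching semantics are an assumption):

```python
import re
from collections import namedtuple

# Simplified stand-in for the library's Rule; actions are plain strings here.
Rule = namedtuple("Rule", "source_regex target_regex method action")

def resolve_action(rules, element, url, method):
    """Return the action of the *last* rule matching this element/URL/method,
    or None if no rule matches."""
    chosen = None
    for rule in rules:
        if (re.fullmatch(rule.source_regex, element)
                and re.fullmatch(rule.target_regex, url)
                and rule.method == method):
            chosen = rule.action
    return chosen

rules = [
    Rule("a", "/.*", "GET", "REQUEST"),     # follow all local links...
    Rule("a", "/logout", "GET", "IGNORE"),  # ...but never hit /logout
]
assert resolve_action(rules, "a", "/about", "GET") == "REQUEST"
assert resolve_action(rules, "a", "/logout", "GET") == "IGNORE"
```

Because later rules override earlier ones, you can start with a broad catch-all rule and carve out exceptions below it.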

Actions must be one of the following objects:

  1. Request(only=False, params=None) -- follow a link or submit a form
    • only=True will retrieve a page/resource but not spider its links.
    • the dict params allows you to specify overrides for a form's default values
  2. Ignore() -- do nothing / skip
  3. Allow(status_codes) -- allow an HTTP status in the supplied list, i.e. do not consider it an error.

Example Rules

Follow all local/relative links

HYPERLINKS_ONLY_RULE_SET = [
    Rule('a', '/.*', 'GET', Request()),
    Rule('area', '/.*', 'GET', Request()),
]

Request but do not spider all links

REQUEST_ONLY_EXTERNAL_RULE_SET = [
    Rule('a', '.*', 'GET', Request(only=True)),
    Rule('area', '.*', 'GET', Request(only=True)),
]

This is useful for finding broken links. You can also check <link> tags from the <head> if you include the following rule and set the Crawler's path_attrs to ("href", "src").

Rule('link', '.*', 'GET', Request())

Submit forms with GET or POST

SUBMIT_GET_FORMS_RULE_SET = [
    Rule('form', '.*', 'GET', Request())
]

SUBMIT_POST_FORMS_RULE_SET = [
    Rule('form', '.*', 'POST', Request())
]

Forms are submitted with their default values, unless overridden using Request(params={...}) for a specific form target, or excluded globally using the ignore_form_fields parameter to Crawler (necessary for e.g. CSRF token fields).

Allow some routes to fail

PERMISSIVE_RULE_SET = [
    Rule('.*', '.*', 'GET', Allow([*range(400, 600)])),
    Rule('.*', '.*', 'POST', Allow([*range(400, 600)]))
]

If any HTTP error (400-599) is encountered for any request, allow it; do not error.
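The [*range(400, 600)] expression unpacks the range into a plain list of every 4xx and 5xx status code; a quick sanity check:

```python
# The permissive rule set's status list: all client- and server-error codes.
allowed = [*range(400, 600)]
assert 400 in allowed and 404 in allowed and 599 in allowed
assert 600 not in allowed  # range's end is exclusive
```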

Crawl Graph

The crawler builds up a graph of your web application. It can be interrogated via crawler.graph when the crawl is finished.

See the graph module for the definition of Node objects.

Handlers

Two hook points are provided. These operate on Node objects (see above).

Whether to process a Node

Using should_process_handlers, you can register functions that take a Node and return a bool indicating whether the Crawler should "process" it -- follow the link or submit the form -- or not.

Whether a response is acceptable

Using check_response_handlers, you can register functions that take a Node and a response object (specific to your test client) and return a bool indicating whether the response should constitute an error.

If your function returns True, the Crawler will throw an exception.
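As a sketch of both handler kinds (the Node attribute names and handler signatures here are assumptions; check the library's graph module for the real definitions):

```python
class Node:
    """Minimal stand-in for the crawler's Node; real attributes may differ."""
    def __init__(self, path):
        self.path = path

def skip_admin_pages(node):
    """A 'should process' handler: returning False means the crawler
    will not follow this link or submit this form."""
    return not node.path.startswith("/admin")

def no_error_banner(node, response):
    """A 'check response' handler: returning True means this response
    constitutes an error, even if its status code looked fine."""
    return b"Internal error" in response.data

assert skip_admin_pages(Node("/blog/1")) is True
assert skip_admin_pages(Node("/admin/users")) is False
```

You would pass these as should_process_handlers=[skip_admin_pages] and check_response_handlers=[no_error_banner] when constructing the Crawler.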

Examples

There are currently Flask and Django examples in the tests.

See https://github.com/python-testing-crawler/flaskr for an example of integrating into an existing application, using Flaskr, the Flask tutorial application.
