SerpentAI / Requests Respectful

Licence: other
Minimalist Requests wrapper to work within the rate limits of any number of services simultaneously. Parallel processing friendly.

Programming Languages

python

Projects that are alternatives of or similar to Requests Respectful

Turkce Python Kaynaklari
A page that collects content about the Python programming language written in Turkish.
Stars: ✭ 295 (-29.26%)
Mutual labels:  requests
Robotframework Requests
Robot Framework keyword library wrapper for requests
Stars: ✭ 345 (-17.27%)
Mutual labels:  requests
Mechanicalsoup
A Python library for automating interaction with websites.
Stars: ✭ 3,863 (+826.38%)
Mutual labels:  requests
Renrenbackup
A backup tool for renren.com
Stars: ✭ 309 (-25.9%)
Mutual labels:  requests
Node Request Retry
💂 Wrap NodeJS request module to retry http requests in case of errors
Stars: ✭ 330 (-20.86%)
Mutual labels:  requests
Proxy requests
a class that uses scraped proxies to make http GET/POST requests (Python requests)
Stars: ✭ 357 (-14.39%)
Mutual labels:  requests
Sasila
A flexible, friendly crawler framework
Stars: ✭ 286 (-31.41%)
Mutual labels:  requests
Khttp
Kotlin HTTP requests library. Similar to Python requests.
Stars: ✭ 410 (-1.68%)
Mutual labels:  requests
Webspider
Online address: http://119.23.223.90:8000
Stars: ✭ 340 (-18.47%)
Mutual labels:  requests
Hammer
An Elixir rate-limiter with pluggable backends
Stars: ✭ 366 (-12.23%)
Mutual labels:  rate-limiting
Begoneads
BeGoneAds is a script that puts some popular hosts file lists into the system's hosts file as an ad-blocking measure.
Stars: ✭ 314 (-24.7%)
Mutual labels:  requests
Ex rated
ExRated, the Elixir OTP GenServer with the naughty name that allows you to rate-limit calls to any service that requires it.
Stars: ✭ 328 (-21.34%)
Mutual labels:  rate-limiting
Requests Threads
🎭 Twisted Deferred Thread backend for Requests.
Stars: ✭ 366 (-12.23%)
Mutual labels:  requests
Annon.api
Configurable API gateway that acts as a reverse proxy with a plugin system.
Stars: ✭ 306 (-26.62%)
Mutual labels:  rate-limiting
Bilili
🍻 bilibili video (including bangumi) and danmaku downloader
Stars: ✭ 379 (-9.11%)
Mutual labels:  requests
Dianping textmining
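Text mining of Dianping reviews: a complete mining project covering review data crawling, data cleaning and storage, data analysis, and review sentiment analysis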
Stars: ✭ 289 (-30.7%)
Mutual labels:  requests
Cpr
C++ Requests: Curl for People, a spiritual port of Python Requests.
Stars: ✭ 4,200 (+907.19%)
Mutual labels:  requests
Drissionpage
A module that integrates selenium and requests session, encapsulates common page operations, can achieve seamless switching between the two modes.
Stars: ✭ 409 (-1.92%)
Mutual labels:  requests
Many requests
Dead easy interface for executing many HTTP requests asynchronously. Also provides helper functions for executing embarrassingly parallel async coroutines.
Stars: ✭ 384 (-7.91%)
Mutual labels:  requests
Ratelimit
API Rate Limit Decorator
Stars: ✭ 365 (-12.47%)
Mutual labels:  rate-limiting

requests-respectful

If you know Python, you know Requests. Requests is love. Requests is life. Depending on your use cases, you may come across scenarios where you need to use Requests a lot. Services you consume may have rate-limiting policies in place or you may just happen to be in a good mood and feel like being a good Netizen. This is where requests-respectful can come in handy.

requests-respectful:

  • Is a minimalist wrapper on top of Requests to work within the rate limits of any number of services simultaneously
  • Can scale out of a single thread, single process or even a single machine
  • Enables maximizing your allowed requests without ever going over set limits and having to handle the fallout
  • Proxies Requests HTTP verb methods (for minimal code changes)
  • Works with both Python 2 and 3 and is fully tested
  • Is cool (hopefully?)

Typical requests call

import requests
response = requests.get("http://github.com", params={"foo": "bar"})

Magic requests-respectful call - requests verb methods are proxied!

from requests_respectful import RespectfulRequester

rr = RespectfulRequester()

# This can be done elsewhere but the realm needs to be registered!
rr.register_realm("Github", max_requests=100, timespan=60)

response = rr.get("http://github.com", params={"foo": "bar"}, realms=["Github"], wait=True)

Conservative requests-respectful call - pass a lambda with a requests method call

import requests
from requests_respectful import RespectfulRequester

rr = RespectfulRequester()

# This can be done elsewhere but the realm needs to be registered!
rr.register_realm("Github", max_requests=100, timespan=60)

request_func = lambda: requests.get("http://github.com", params={"foo": "bar"})
response = rr.request(request_func, realms=["Github"], wait=True)

Requirements

  • Redis > 2.8.0 (See FAQ if you are rolling your eyes)

Installation

pip install requests-respectful

Configuration

Default Configuration Values

{
    "redis": {
        "host": "localhost",
        "port": 6379,
        "database": 0
    },
    "safety_threshold": 10,
    "requests_module_name": "requests"
}

Configuration Keys

  • redis: Provides the host, port and database of the Redis instance
  • safety_threshold: A rate-limited exception will be raised at (realm_max_requests - safety_threshold). Prevents going over the limit of services in scenarios where a large number of requests are issued in parallel
  • requests_module_name: Provides the name of the Requests module used in the request lambdas. Should not need to be changed unless you import Requests as another name.
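As a rough illustration of the safety_threshold arithmetic described above, the sketch below computes the request count at which a realm would start raising. The function name effective_limit is hypothetical and not part of the library, and clamping the result at zero is an assumption of this sketch.

```python
def effective_limit(realm_max_requests, safety_threshold=10):
    """Request count at which a rate-limited exception would be raised.

    Follows the formula from the configuration keys:
    realm_max_requests - safety_threshold.
    Clamping at zero is an assumption of this sketch, not documented behavior.
    """
    return max(realm_max_requests - safety_threshold, 0)


# With the default threshold of 10, a realm registered at 100 requests
# would start raising once 90 requests have been counted.
print(effective_limit(100))      # 90
print(effective_limit(100, 25))  # 75
```

A larger threshold leaves more headroom for requests that are already in flight on other workers, at the cost of using less of the allowed quota.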

Overriding Configuration Values

With requests-respectful.config.yml

The library auto-detects the presence of a YAML file named requests-respectful.config.yml at the root of your project and will attempt to load configuration values from it.

Example:

requests-respectful.config.yml

redis:
    host: 0.0.0.0
    port: 6379
    database: 5

safety_threshold: 25

With the configure() class method

If you don't like having an extra file lying around, the library can also be configured at runtime using the configure() class method.

RespectfulRequester.configure(
    redis={"host": "0.0.0.0", "port": 6379, "database": 5},
    safety_threshold=25
)

In both cases, the resulting active configuration would be:

RespectfulRequester._config()

Out[1]: {
    "redis": {
        "host": "0.0.0.0",
        "port": 6379,
        "database": 5
    },
    "safety_threshold": 25,
    "requests_module_name": "requests"
}
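The result above suggests that overrides are layered on top of the defaults, with untouched keys such as requests_module_name kept as they are. A minimal sketch of that kind of merge follows. The merge_config helper is hypothetical, and merging the nested redis section key by key is an assumption rather than confirmed library behavior.

```python
import copy

DEFAULTS = {
    "redis": {"host": "localhost", "port": 6379, "database": 0},
    "safety_threshold": 10,
    "requests_module_name": "requests",
}


def merge_config(defaults, overrides):
    """Return a copy of defaults with overrides applied on top."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key].update(value)  # merge nested sections such as "redis"
        else:
            merged[key] = value
    return merged


active = merge_config(DEFAULTS, {
    "redis": {"host": "0.0.0.0", "port": 6379, "database": 5},
    "safety_threshold": 25,
})
print(active["requests_module_name"])  # requests
```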

Usage

In your quest to use requests-respectful, you should only ever have to bother with one class: RespectfulRequester. Instantiate this class and you can perform all important operations.

Before each example, it is assumed that the following code has already been executed.

from requests_respectful import RespectfulRequester
rr = RespectfulRequester()

Realms

Realms are simply named containers that are provided with a maximum requesting rate. You are responsible for the management (i.e. CRUD) of your realms.

Realms track the HTTP requests that are performed under them and will raise a catchable rate limit exception if you are over their allowed requesting rate.

Fetching the list of Realms

rr.fetch_registered_realms()

This returns a list of currently registered realm names.

Registering a Realm

rr.register_realm("Google", max_requests=10, timespan=1)
rr.register_realm("Github", max_requests=100, timespan=60)
rr.register_realm("Twitter", max_requests=150, timespan=300)

# OR
realm_tuples = [
    ["Google", 10, 1],
    ["Github", 100, 60],
    ["Twitter", 150, 300]
]

rr.register_realms(realm_tuples)

Either of these registers 3 realms:

  • Google at a maximum requesting rate of 10 requests per second
  • Github at a maximum requesting rate of 100 requests per minute
  • Twitter at a maximum requesting rate of 150 requests per 5 minutes
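To compare realms that use different timespans, it can help to express each one as an average rate. The sketch below does this for the three realms above. The requests_per_second helper is hypothetical, and the average is only a point of comparison, since a realm lets you send requests in bursts within its timespan.

```python
realm_tuples = [
    ["Google", 10, 1],
    ["Github", 100, 60],
    ["Twitter", 150, 300],
]


def requests_per_second(max_requests, timespan):
    """Average allowed rate, in requests per second, for one realm."""
    return max_requests / float(timespan)


rates = {name: requests_per_second(max_requests, timespan)
         for name, max_requests, timespan in realm_tuples}

for name, rate in sorted(rates.items()):
    print("%s: %.2f requests/second" % (name, rate))
```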

Updating a Realm

rr.update_realm("Google", max_requests=25, timespan=5)

This updates the maximum requesting rate of Google to 25 requests per 5 seconds.

Getting the maximum requests value of a Realm

rr.realm_max_requests("Google")

This would return 25.

Getting the timespan value of a Realm

rr.realm_timespan("Google")

This would return 5.

Unregistering a Realm

rr.unregister_realm("Google")

This would unregister the Google realm, preventing further queries from executing on it.

Unregistering multiple Realms

rr.unregister_realms(["Google", "Github", "Twitter"])

This would unregister all 3 realms in one operation, preventing further queries from executing on them.

Requesting

Using Requests HTTP verb methods

The library supports proxying calls to the 7 Requests HTTP verb methods (DELETE, GET, HEAD, OPTIONS, PATCH, POST, PUT). Each proxied call is passed through to the matching Requests method, so go crazy with your params, body, headers, auth etc. kwargs. The only major difference is that a realms kwarg is expected. A boolean wait kwarg can also be provided (the behavior is explained later).

These are all valid calls:

rr.get("http://httpbin.org", realms=["HTTPBin"])
rr.post('http://httpbin.org/post', data = {'key':'value'}, realms=["HTTPBin"], wait=True)
rr.put('http://httpbin.org/put', data = {'key':'value'}, realms=["HTTPBin"])
rr.delete('http://httpbin.org/delete', realms=["HTTPBin"])

If not rate-limited, these would return your usual requests.Response object.

Using a request lambda

If you are a purist and prefer not using fancy proxying, you are also allowed to create a lambda of your Requests call and pass it to the request() instance method.

import requests

request_func = lambda: requests.post('http://httpbin.org/post', data = {'key':'value'})
rr.request(request_func, realms=["HTTPBin"], wait=True)

If not rate-limited, this would return your usual requests.Response object.

Multiple realms per request

Starting in 0.2.0, you can have a single request count against multiple realms. The kwarg has been changed from realm to realms and works as you would expect it to.

rr.get("http://httpbin.org", realms=["HTTPBin", "HTTPBinUser123", "HTTPBinServer3"])

The realm kwarg has been deprecated on requesting instance methods. It will still work, with a warning, until 0.3.0.
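One way to picture a request that counts against several realms is an all-or-nothing check: the request goes ahead only if every listed realm still has room, and it is then counted against each of them. The in-memory sketch below illustrates that idea. The try_count_request function and the plain dictionaries are hypothetical stand-ins, and the all-or-nothing behavior is an assumption about the library, which keeps its state in Redis.

```python
def try_count_request(counters, limits, realms):
    """Count one request against every realm, or against none of them.

    counters maps realm name -> requests counted in the current window.
    limits maps realm name -> maximum requests for that window.
    Returns True if the request was counted, False if any realm is full.
    """
    if any(counters.get(realm, 0) >= limits[realm] for realm in realms):
        return False
    for realm in realms:
        counters[realm] = counters.get(realm, 0) + 1
    return True


limits = {"HTTPBin": 3, "HTTPBinUser123": 2}
counters = {}
results = [try_count_request(counters, limits, ["HTTPBin", "HTTPBinUser123"])
           for _ in range(3)]
print(results)  # [True, True, False]
```

The third call is refused because HTTPBinUser123 is full, even though HTTPBin still has room, so the strictest realm decides.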

Handling exceptions

Executing these calls will either return a requests.Response object with the results of the HTTP call or raise a RequestsRespectfulRateLimitedError exception. This means that you'll likely want to catch and handle that exception.

from requests_respectful import RequestsRespectfulRateLimitedError

try:
    response = rr.get("http://httpbin.org", realms=["HTTPBin"])
except RequestsRespectfulRateLimitedError:
    pass  # Possibly requeue that call or wait.
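The "requeue or wait" comment above could be fleshed out along the following lines. This is a sketch, not part of the library: call_with_retry is a hypothetical helper, and RateLimitedError is a local stand-in for RequestsRespectfulRateLimitedError so that the example runs without Redis or network access.

```python
import time


class RateLimitedError(Exception):
    """Stand-in for RequestsRespectfulRateLimitedError in this sketch."""


def call_with_retry(request_func, attempts=3, delay=1.0, sleep=time.sleep):
    """Call request_func, sleeping and retrying while it is rate-limited."""
    for attempt in range(attempts):
        try:
            return request_func()
        except RateLimitedError:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller decide what to do
            sleep(delay)


# Fake request that is rate-limited twice before succeeding.
outcomes = iter([RateLimitedError(), RateLimitedError(), "response"])


def fake_request():
    outcome = next(outcomes)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


slept = []
result = call_with_retry(fake_request, delay=0.5, sleep=slept.append)
print(result, slept)  # response [0.5, 0.5]
```

In real code, request_func would be something like lambda: rr.get("http://httpbin.org", realms=["HTTPBin"]) and the except clause would name the library's own exception.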

The wait kwarg

Both ways of requesting accept a wait kwarg that defaults to False. If it is switched on and the realm is currently rate-limited, the process will block, wait until it is safe to send requests again and then perform the request. Waiting is perfectly fine for scripts or smaller operations but is discouraged for large, multi-realm, parallel tasks (e.g. background tasks such as Celery workers).
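To get a feel for how long a blocking call might wait, the sketch below estimates the delay from the timestamps of requests already counted for a realm. It assumes each request stops counting exactly timespan seconds after it was made. The seconds_until_allowed function is hypothetical, and the library's actual waiting strategy may differ.

```python
def seconds_until_allowed(request_times, max_requests, timespan, now):
    """How long a blocking caller would have to wait before sending again.

    request_times holds the timestamps (in seconds) of requests already
    counted for the realm. Each one stops counting timespan seconds
    after it was made.
    """
    active = sorted(t for t in request_times if t + timespan > now)
    if len(active) < max_requests:
        return 0.0  # under the limit: no waiting needed
    # Enough of the oldest requests must expire to drop below the limit.
    must_expire = active[len(active) - max_requests]
    return must_expire + timespan - now


# Realm of 2 requests per 60 seconds, three requests already counted.
print(seconds_until_allowed([0, 10, 20], 2, 60, now=30))  # 40
```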

Tests

  • Exist? Yes
  • Exhaustive? Yes
  • Facepalm tactics? Yes - Redis calls aren't mocked and google.com gets a few friendly calls

Run them with python -m pytest tests --spec

FAQ

Whoa, whoa, whoa! Redis?!

Yes. The use of Redis allows requests-respectful to go multi-thread, multi-process and even multi-machine while still respecting the maximum requesting rates of registered realms. Operations like Redis' SETEX are key in designing and working with rate-limiting systems. If you are doing Python development, there is a decent chance you already work with Redis, as it is one of the two options to use as Celery's backend and one of the two major caching options in Web development. If not, you can always keep things clean and use a Docker container or even build it from source. Redis has kept a consistent record over the years of being lightweight, solid software.
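The role of expiring keys can be sketched without a Redis server. The class below is a small in-memory stand-in: each request stores a key that expires after the realm's timespan, and the number of live keys is the number of requests in the current window. ExpiringKeyStore and its methods are hypothetical names, and this is an illustration of the general technique rather than the library's actual key layout.

```python
import itertools


class ExpiringKeyStore(object):
    """Tiny in-memory stand-in for a key store with per-key expiry."""

    def __init__(self):
        self._expiry = {}  # key -> absolute expiry time in seconds
        self._ids = itertools.count()

    def record_request(self, realm, timespan, now):
        # One key per request, expiring after the realm's timespan,
        # similar in spirit to SETEX <key> <timespan> <value>.
        key = "%s:%d" % (realm, next(self._ids))
        self._expiry[key] = now + timespan

    def active_requests(self, realm, now):
        # Drop expired keys, then count what is left for the realm.
        self._expiry = {k: t for k, t in self._expiry.items() if t > now}
        prefix = realm + ":"
        return sum(1 for k in self._expiry if k.startswith(prefix))


store = ExpiringKeyStore()
for second in (0, 1, 2):
    store.record_request("Github", timespan=60, now=second)

print(store.active_requests("Github", now=30))  # 3
print(store.active_requests("Github", now=61))  # 1
```

Because the store is shared, any number of workers can record and count requests against the same realm, which is what a central Redis instance provides across processes and machines.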

How is this different than other throttling libraries?

  • Most other libraries will ask you to specify an interval at which to send requests and will literally loop over request()...time.sleep(interval). This one will allow you to send as many as you want, as fast as you want, as long as you are under the maximum requesting rate of your realm.
  • Other libraries don't have the concept of realms and separate requesting rate rules.
  • Other libraries don't scale outside of the process.
  • Most other libraries don't integrate this neatly with Requests.
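The first point above can be put in numbers. The sketch below compares the time spent waiting under fixed-interval throttling with the time spent waiting when requests go out as fast as the limit allows. Both functions are hypothetical, network time is ignored, and the burst model is a simplification that waits one full timespan after every max_requests requests.

```python
def fixed_interval_duration(n_requests, max_requests, timespan):
    """Time spent sleeping when requests are evenly spaced."""
    return max(n_requests - 1, 0) * float(timespan) / max_requests


def burst_duration(n_requests, max_requests, timespan):
    """Time spent waiting when requests go out as fast as the limit allows.

    Simplification: a full window must elapse after every max_requests
    requests; network time is ignored.
    """
    full_windows = max(n_requests - 1, 0) // max_requests
    return float(full_windows * timespan)


# 50 requests against a realm of 100 requests per 60 seconds.
print(fixed_interval_duration(50, 100, 60))  # 29.4
print(burst_duration(50, 100, 60))           # 0.0
```

For a batch smaller than the limit, the burst approach finishes without waiting at all, while the fixed interval still spreads the batch over almost half a minute.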

Roadmap / Contribution Ideas

  • Provide some introspection methods to get live realm stats
  • Create a curses realm stats monitor
  • Provide real-life use cases
  • Read the Docs RST Documentation
  • Mock out the Redis calls in the tests
  • Mock out the Requests calls in the tests