
banditml / offline-policy-evaluation

Licence: other
Implementations and examples of common offline policy evaluation methods in Python.

Programming Languages

python

Projects that are alternatives to or similar to offline-policy-evaluation

  • causal-ml: Must-read papers and resources related to causal inference and machine (deep) learning
    Stars: ✭ 387 (+97.45%) | Mutual labels: counterfactual-learning
  • mathematics-statistics-for-data-science: Mathematical & Statistical topics to perform statistical analysis and tests; Linear Regression, Probability Theory, Monte Carlo Simulation, Statistical Sampling, Bootstrapping, Dimensionality reduction techniques (PCA, FA, CCA), Imputation techniques, Statistical Tests (Kolmogorov Smirnov), Robust Estimators (FastMCD) and more in Python and R.
    Stars: ✭ 56 (-71.43%) | Mutual labels: importance-sampling
  • awesome-offline-rl: An index of algorithms for offline reinforcement learning (offline-rl)
    Stars: ✭ 578 (+194.9%) | Mutual labels: off-policy-evaluation
  • pypmc: Clustering with variational Bayes and population Monte Carlo
    Stars: ✭ 46 (-76.53%) | Mutual labels: importance-sampling
  • adapt: Awesome Domain Adaptation Python Toolbox
    Stars: ✭ 46 (-76.53%) | Mutual labels: importance-sampling
  • bisml: Implementation of the paper: Adaptive BRDF-Oriented Multiple Importance Sampling of Many Lights
    Stars: ✭ 26 (-86.73%) | Mutual labels: importance-sampling
  • minicore: Fast and memory-efficient clustering + coreset construction, including fast distance kernels for Bregman and f-divergences.
    Stars: ✭ 28 (-85.71%) | Mutual labels: importance-sampling

Offline policy evaluation

Implementations and examples of common offline policy evaluation methods in Python. For more information on offline policy evaluation, see this tutorial.

Installation

pip install offline-evaluation
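
To confirm the install worked, the import used throughout this README should succeed (the top-level package is ope, even though the PyPI distribution is named offline-evaluation):

# Smoke test: this import should succeed after `pip install offline-evaluation`.
from ope.methods import doubly_robust

print(doubly_robust)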

Usage

import pandas as pd

from ope.methods import doubly_robust

Get some historical logs generated by a previous policy:

df = pd.DataFrame([
    {"context": {"p_fraud": 0.08}, "action": "blocked", "action_prob": 0.90, "reward": 0},
    {"context": {"p_fraud": 0.03}, "action": "allowed", "action_prob": 0.90, "reward": 20},
    {"context": {"p_fraud": 0.02}, "action": "allowed", "action_prob": 0.90, "reward": 10},
    {"context": {"p_fraud": 0.01}, "action": "allowed", "action_prob": 0.90, "reward": 20},
    {"context": {"p_fraud": 0.09}, "action": "allowed", "action_prob": 0.10, "reward": -20},
    {"context": {"p_fraud": 0.40}, "action": "allowed", "action_prob": 0.10, "reward": -10},
])

Define a function that computes P(action | context) under the new policy:

def action_probabilities(context):
    # New policy: mostly block when fraud risk is high, mostly allow otherwise,
    # keeping a small exploration probability (epsilon) on the other action.
    epsilon = 0.10
    if context["p_fraud"] > 0.10:
        return {"allowed": epsilon, "blocked": 1 - epsilon}
    return {"allowed": 1 - epsilon, "blocked": epsilon}

Conduct the evaluation:

doubly_robust.evaluate(df, action_probabilities)
> {'expected_reward_logging_policy': 3.33, 'expected_reward_new_policy': -28.47}

This means the new policy is expected to perform substantially worse than the logging policy. Instead of A/B testing this new policy online, it would be better to evaluate other candidate policies offline first.
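
For intuition, the logging-policy number is what you get by simply averaging the logged rewards (whether the library computes it exactly this way is an assumption, but the numbers agree):

# Mean of the logged rewards in `df`: (0 + 20 + 10 + 20 - 20 - 10) / 6 ≈ 3.33
print(df["reward"].mean())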

See examples for more detailed tutorials.

Supported methods

  • Inverse propensity scoring
  • Direct method
  • Doubly robust (paper)
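
The README only demonstrates the doubly robust estimator. For reference, here is a minimal sketch of the textbook inverse propensity scoring estimate applied to the logged data above; it is not the library's internal implementation, just the standard re-weighting formula:

import pandas as pd

def ips_estimate(df: pd.DataFrame, action_probabilities) -> float:
    # Re-weight each logged reward by P_new(action | context) / P_logging(action | context),
    # then average. This is the classic inverse propensity scoring estimator.
    weights = df.apply(
        lambda row: action_probabilities(row["context"])[row["action"]] / row["action_prob"],
        axis=1,
    )
    return float((weights * df["reward"]).mean())

# With the `df` and `action_probabilities` defined in the Usage section,
# this yields roughly -23.3, in the same ballpark as the doubly robust estimate above.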