
banditml / offline-policy-evaluation

Licence: other
Implementations and examples of common offline policy evaluation methods in Python.

Programming Languages

python

Projects that are alternatives to or similar to offline-policy-evaluation

  • causal-ml: Must-read papers and resources related to causal inference and machine (deep) learning
    Stars: ✭ 387 (+97.45%) | Mutual labels: counterfactual-learning
  • mathematics-statistics-for-data-science: Mathematical & Statistical topics to perform statistical analysis and tests; Linear Regression, Probability Theory, Monte Carlo Simulation, Statistical Sampling, Bootstrapping, Dimensionality reduction techniques (PCA, FA, CCA), Imputation techniques, Statistical Tests (Kolmogorov Smirnov), Robust Estimators (FastMCD) and more in Python and R.
    Stars: ✭ 56 (-71.43%) | Mutual labels: importance-sampling
  • awesome-offline-rl: An index of algorithms for offline reinforcement learning (offline-rl)
    Stars: ✭ 578 (+194.9%) | Mutual labels: off-policy-evaluation
  • pypmc: Clustering with variational Bayes and population Monte Carlo
    Stars: ✭ 46 (-76.53%) | Mutual labels: importance-sampling
  • adapt: Awesome Domain Adaptation Python Toolbox
    Stars: ✭ 46 (-76.53%) | Mutual labels: importance-sampling
  • bisml: Implementation of the paper: Adaptive BRDF-Oriented Multiple Importance Sampling of Many Lights
    Stars: ✭ 26 (-86.73%) | Mutual labels: importance-sampling
  • minicore: Fast and memory-efficient clustering + coreset construction, including fast distance kernels for Bregman and f-divergences.
    Stars: ✭ 28 (-85.71%) | Mutual labels: importance-sampling

Offline policy evaluation

Implementations and examples of common offline policy evaluation methods in Python. For more information on offline policy evaluation, see this tutorial.

Installation

pip install offline-evaluation
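
To confirm the install worked, the import used throughout this README should succeed (the top-level package is ope, even though the PyPI distribution is named offline-evaluation):

# Smoke test: this import should succeed after `pip install offline-evaluation`.
from ope.methods import doubly_robust

print(doubly_robust)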

Usage

import pandas as pd

from ope.methods import doubly_robust

Get some historical logs generated by a previous policy:

df = pd.DataFrame([
    {"context": {"p_fraud": 0.08}, "action": "blocked", "action_prob": 0.90, "reward": 0},
    {"context": {"p_fraud": 0.03}, "action": "allowed", "action_prob": 0.90, "reward": 20},
    {"context": {"p_fraud": 0.02}, "action": "allowed", "action_prob": 0.90, "reward": 10},
    {"context": {"p_fraud": 0.01}, "action": "allowed", "action_prob": 0.90, "reward": 20},
    {"context": {"p_fraud": 0.09}, "action": "allowed", "action_prob": 0.10, "reward": -20},
    {"context": {"p_fraud": 0.40}, "action": "allowed", "action_prob": 0.10, "reward": -10},
])

Define a function that computes P(action | context) under the new policy:

def action_probabilities(context):
    # New policy: mostly block when fraud risk is high, mostly allow otherwise,
    # keeping a small exploration probability (epsilon) on the other action.
    epsilon = 0.10
    if context["p_fraud"] > 0.10:
        return {"allowed": epsilon, "blocked": 1 - epsilon}
    return {"allowed": 1 - epsilon, "blocked": epsilon}

Conduct the evaluation:

doubly_robust.evaluate(df, action_probabilities)
> {'expected_reward_logging_policy': 3.33, 'expected_reward_new_policy': -28.47}

This means the new policy is expected to perform substantially worse than the logging policy. Instead of A/B testing this new policy online, it would be better to evaluate other candidate policies offline first.
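
For intuition, the logging-policy number is what you get by simply averaging the logged rewards (whether the library computes it exactly this way is an assumption, but the numbers agree):

# Mean of the logged rewards in `df`: (0 + 20 + 10 + 20 - 20 - 10) / 6 ≈ 3.33
print(df["reward"].mean())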

See examples for more detailed tutorials.

Supported methods

  • Inverse propensity scoring
  • Direct method
  • Doubly robust (paper)
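
The README only demonstrates the doubly robust estimator. For reference, here is a minimal sketch of the textbook inverse propensity scoring estimate applied to the logged data above; it is not the library's internal implementation, just the standard re-weighting formula:

import pandas as pd

def ips_estimate(df: pd.DataFrame, action_probabilities) -> float:
    # Re-weight each logged reward by P_new(action | context) / P_logging(action | context),
    # then average. This is the classic inverse propensity scoring estimator.
    weights = df.apply(
        lambda row: action_probabilities(row["context"])[row["action"]] / row["action_prob"],
        axis=1,
    )
    return float((weights * df["reward"]).mean())

# With the `df` and `action_probabilities` defined in the Usage section,
# this yields roughly -23.3, in the same ballpark as the doubly robust estimate above.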