
An Obvious Improvement to an OB Workflow

In a previous post I outlined one way of incorporating objective Bayesian constraints into a simple regression problem. The workflow consisted of a few steps (a rough code sketch follows the list):

  • Fit a standard regressor of your choosing.

  • Use conformal estimation to obtain coverage bounds.

  • Continue with your Bayesian analysis as usual, including constraints for the coverage bounds.
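For reference, here's a rough, hedged sketch of what that earlier workflow might look like in code, using a plain scikit-learn regressor as the throwaway model (the names x_train, y_train, x_cal, y_cal, and x_new are placeholders, not the exact code from that post):

from sklearn.linear_model import LinearRegression
from crepes import WrapRegressor

# Step 1: fit a standard regressor of your choosing (here, OLS)
base = WrapRegressor(LinearRegression())
base.fit(x_train[:, None], y_train)

# Step 2: use conformal estimation to obtain coverage bounds
base.calibrate(x_cal[:, None], y_cal)
bounds = base.predict_int(x_new[:, None], confidence=0.90)

# Step 3: carry these bounds into the Bayesian analysis as constraints
# (e.g., as truncation limits or rejection criteria on the posterior)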


This is fine, but one obvious objection is, "I don't want to have to fit one model just to throw it away and then fit the one I actually want."


This is a fair point, and the solution should have been obvious to me from the start: let's use the Bayesian model itself as the basis for the conformal estimation.


Revisiting the Linear Regression Example


import statsmodels.api as sm
import pymc as pm
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split
import numpy as np
import arviz as az

rng = np.random.RandomState(314)
n_points = 50
x = 12 * rng.rand(n_points)

# True line: y = 3x - 17, with Gaussian noise (sd = 9)
y = 3 * x - 17 + 9 * rng.randn(n_points)
x_data, y_data = x, y
plt.scatter(x, y)

And let's continue with our train/test/calibration splits:


x_train, x_test, y_train, y_test = train_test_split(x, y)
x_prop_train, x_cal, y_prop_train, y_cal = train_test_split(x_train, y_train)

Now, unlike in the previous post, we will not first create a standard MLE model. Instead, we will use our Bayesian model as the underlying conformal estimator.


To make things easy, let's wrap some of the PyMC pieces in a scikit-learn-like interface. This makes using the crepes library simple.


class BayesLinearRegressor:
    """Minimal scikit-learn-style wrapper around a PyMC linear regression."""

    def __init__(self):
        self.model = pm.Model()
        self.idata = None

    def fit(self, X, y, n_samples=1000, tune=1000):
        with self.model:
            # Data container so that new inputs can be swapped in at prediction time
            x = pm.Data('x', X)

            # Weakly informative priors on the line and the noise scale
            intercept = pm.Normal('intercept', mu=0, sigma=100)
            slope = pm.Normal('slope', mu=0, sigma=100)
            y_ = slope * x + intercept
            y_out_sigma = pm.HalfNormal('y_sig', sigma=100)
            y_out = pm.Normal('y', mu=y_, sigma=y_out_sigma, observed=y, shape=x.shape)

            self.idata = pm.sample(n_samples, tune=tune, progressbar=False)
        return self

    def predict(self, X):
        with self.model:
            self.model.set_data('x', X)
            predict_samples = pm.sample_posterior_predictive(
                self.idata, var_names=['y'], progressbar=False)
            # Point estimate: median of the posterior predictive at each input
            preds = az.extract(predict_samples, group='posterior_predictive')['y']
            return preds.median(dim='sample').values

The margins on this blog aren't great, but basically the class holds a PyMC Model and its trace information, and provides fit() and predict() methods. predict() generates a point estimate from the sampled posterior predictive by taking the median. (Change this as you see fit!)
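For instance, if you'd rather use the posterior predictive mean as the point estimate, a hypothetical variant of predict() might look like this (a sketch within the same class, not necessarily a better choice):

    def predict(self, X):
        with self.model:
            self.model.set_data('x', X)
            ppc = pm.sample_posterior_predictive(self.idata, var_names=['y'], progressbar=False)
            # Mean over chains and draws: one value per input point
            return ppc.posterior_predictive['y'].mean(dim=('chain', 'draw')).values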


Now, let's take an initial look at what this regression line looks like:

m = BayesLinearRegressor()
m.fit(x_prop_train, y_prop_train)
x_plot = np.linspace(0, 12, 12)
y_plot = m.predict(x_plot)
plt.scatter(x_data, y_data)
plt.plot(x_plot, y_plot, color="red")

Now, we can conformalize on this point estimator using crepes!


from crepes import WrapRegressor
lr = WrapRegressor(BayesLinearRegressor())
lr.fit(x_prop_train, y_prop_train)

And we're going to include the adaptive DifficultyEstimator as well:

from crepes.extras import DifficultyEstimator
de = DifficultyEstimator()
de.fit(x_prop_train[:, None], y_prop_train, k=25)
sigmas_cal = de.apply(x_cal[:, None])
lr.calibrate(x_cal, y_cal, sigmas=sigmas_cal)
sigmas_test = de.apply(x_plot[:, None])
pred_ints = lr.predict_int(x_plot, sigmas=sigmas_test, confidence=0.90)
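
As a quick sanity check (not strictly part of the workflow), we can also request intervals on the held-out test split and see how often they actually contain y_test; at 90% confidence we'd expect roughly 90% empirical coverage, up to finite-sample noise:

sigmas_holdout = de.apply(x_test[:, None])
test_ints = lr.predict_int(x_test, sigmas=sigmas_holdout, confidence=0.90)
covered = (y_test >= test_ints[:, 0]) & (y_test <= test_ints[:, 1])
print(f"Empirical coverage on the test split: {covered.mean():.2f}")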

Just some minor organization before plotting our result!

import pandas as pd
df_conf = pd.DataFrame()
df_conf['x'] = x_plot
df_conf['lb'] = pred_ints[:, 0]
df_conf['ub'] = pred_ints[:, 1]
y_plot = m.predict(x_plot)
plt.fill_between(df_conf.x, df_conf.lb, df_conf.ub, alpha=0.2, color="yellow")
plt.scatter(x_data, y_data)
plt.plot(x_plot, y_plot, color='red')

Some Thoughts


I think it's pretty cool that one can so simply get frequentist coverage intervals from a Bayesian model, all thanks to conformal prediction! Maybe this isn't so surprising, since conformal estimation makes very minimal demands on its underlying estimator.


I do wonder how this should be interpreted, however. After all, we still have our full Bayesian posterior distribution, not depicted here. From an objective Bayesian perspective, these coverage intervals are supposed to act as constraints on our posterior distribution.


So, I suppose it would be correct, once you have these confidence intervals, to go back to your posterior samples, reject all of the samples that fall outside these confidence bounds, and then recalculate your posterior accordingly, especially if one wants to then conduct inverse inference about model parameters. (I'm currently unsure whether this means that one should then repeat the conformal procedure; you won't gain any additional coverage, but you might get slightly smaller intervals, if that's something you care about.)
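
To make that concrete, here's one hedged sketch of what "rejecting the samples that fail the bounds" could look like, where I (somewhat arbitrarily) treat a posterior draw as failing if its noise-free regression line leaves the conformal interval anywhere on x_plot:

# Keep only posterior draws whose implied line stays inside the 90% bounds
post = az.extract(m.idata)                 # flattened posterior draws
slopes = post['slope'].values              # shape: (n_draws,)
intercepts = post['intercept'].values
lines = slopes[None, :] * x_plot[:, None] + intercepts[None, :]   # (n_x, n_draws)
inside = ((lines >= pred_ints[:, [0]]) & (lines <= pred_ints[:, [1]])).all(axis=0)
print(f"Fraction of draws kept: {inside.mean():.2f}")
print("Constrained slope estimate:", slopes[inside].mean())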


But if your interest is just in the target y variable, perhaps you don't need to touch your posterior samples at all and can instead clamp your inferences to the coverage interval, as appropriate. This should be equivalent to an objective Bayesian constraint that assigns zero density to all uncovered regions, particularly if one is interested in maximum a posteriori estimates. Just be sure to rescale your credence interval levels to account for the confidence level of the bounding interval.
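
If you only care about y, the clamping version is even simpler. As a sketch, reusing pred_ints and the point predictions y_plot from above:

# Clamp the Bayesian point predictions to the 90% conformal interval
y_clamped = np.clip(y_plot, pred_ints[:, 0], pred_ints[:, 1])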


At the very least, performing this procedure gives some empirical performance bounds for whatever Bayesian model you generate, regardless of whether one applies a calibration procedure afterwards. This can help with communicating results to stakeholders who value different aspects of a model's performance and behavior.



