Jake
- Jun 5
- 7 min read

Simulating a Statistics Head-to-Head Rumble: Round Two

In the previous post, I considered a quasi-Dutch Book scenario inspired by a LinkedIn poster. In investigating that scenario, I discovered that the fixed, positive stakes, together with the selection bias of the bookie, introduced an apparent bias into the performance of each of the statistical methods: namely, the game incentivizes the bookie to take bets where a bettor underestimates the 'true', long-run value of the bet. The statistical methods themselves weren't biased in this way; they simply wouldn't be chosen by the bookie except in the cases where they happened to underestimate by chance. This effect skews the previous results as indications of the long-run performance of those methods.

In order to correct this tendency and better align with proper Dutch Books, we should introduce the notion of a negative bet for the bookie.

Negative Bets

By negative bet, I mean an equivalent procedure in which, instead of the bettor purchasing a bet at their fair bet price for the potential stakes, the bookie gives the bettor the difference between the stakes and the bettor's fair bet price, under the agreement that the bookie will claw back the entire stakes if the bet fails. (Note that this is identical to how traditional Dutch Books punish subadditive 'probabilities' with negative stakes.)

To understand this equivalence, consider the following: Let S be the stakes. Let p be the bettor's estimation of the bias of the coin. Let q be the 'true' bias of the coin. Then:

(S-pS) - (1-q)S

= S-pS-S+qS

= -pS+qS

= (q-p)S

Here the (S-pS) term is the initial payment or loan by the bookie, and (1-q)S is the long-run average amount the bookie claws back.
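The algebra above can be checked numerically. The following is a minimal sketch, not the simulation used for this post: the function names, the number of rounds, and the seed are all assumptions made for illustration. It averages the initial payment minus any clawback over many rounds and compares the result with (q-p)S.

```python
import random

def negative_bet_net(S, p, q):
    """Closed form from the derivation: (S - pS) - (1 - q)S = (q - p)S."""
    return (q - p) * S

def simulate_negative_bet(S, p, q, n_rounds=200_000, seed=0):
    """Average, over many rounds, of the initial payment minus any clawback."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_rounds):
        payment = S - p * S              # handed over by the bookie up front
        fails = rng.random() >= q        # the bet fails with probability 1 - q
        clawback = S if fails else 0.0   # entire stakes clawed back on failure
        total += payment - clawback
    return total / n_rounds

if __name__ == "__main__":
    S, p, q = 1.0, 0.6, 0.5
    print(negative_bet_net(S, p, q), simulate_negative_bet(S, p, q))
```

With enough rounds the simulated average should sit close to the closed-form value, and both are zero when p equals q.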

This last line is the negative of (p-q)S, which is zero at a perfectly calibrated bet. In the previous simulation, the bookie tended to take bets where p-q > 0, that is, where the bettor's estimate p exceeded the 'true' bias q. In the case of a negative bet, the bookie is incentivized in the opposite direction: to take bets where p falls below q.

By having the bookie choose between positive and negative bets with equal probability (the sign of the bet is ~Bern(0.5)), we further improve the scenario by removing the bias previously introduced by the bookie's behavior. Note that this alteration moves us one step further from the original scenario considered, but it hopefully provides a clearer view into the underlying issues at play.
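A single round of this mixed scheme can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the bookie is assumed to know the 'true' bias q, and the skip rule (take the bet only when the expected gain is positive) is an assumed stand-in for the bookie's selective behavior. The sign convention follows the text, with an expected bookie gain of (p-q)S on a positive bet and (q-p)S on a negative one.

```python
import random

def bookie_expected_gain(S, p, q, negative):
    """Expected bookie gain under the sign convention stated in the text (assumed)."""
    return (q - p) * S if negative else (p - q) * S

def bookie_round(S, p, q, rng):
    """One round: draw the bet sign from a fair coin, then take or skip the bet."""
    negative = rng.random() < 0.5        # sign of the bet ~ Bern(0.5)
    gain = bookie_expected_gain(S, p, q, negative)
    taken = gain > 0                     # hypothetical rule: skip unprofitable bets
    return negative, taken, (gain if taken else 0.0)

if __name__ == "__main__":
    rng = random.Random(42)
    rounds = [bookie_round(1.0, 0.7, 0.5, rng) for _ in range(10_000)]
    print(sum(taken for _, taken, _ in rounds) / len(rounds))
```

Under these assumptions, a bettor with p > q is taken only on positive rounds and a bettor with p < q only on negative rounds, so each kind of miscalibration is exposed about half the time.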

Hypotheses Redux

The hypothesis here is that introducing negative bets corrects the selection bias against, for example, the Frequentist Confidence Upper Bound (FUB) bettor, and that the bookie's average rate of gain no longer places the Oracle bettor in the middle of the performers but instead at the lowest position, as the hardest to exploit.

Results and Discussion

Considering the results below, my hypothesis seems at least partially incorrect. While the results do seem to correct some of the selection bias in the previous scenario (as depicted in graphs not displayed here), they do little to improve the ranking of FUB in terms of win rate or of the Oracle in terms of the bookie gain rate.

I suspect that some selection bias still exists relative to the betting strategies, as the bookie is still able to skip placing bets. But this is the kind of behavior we want in the bookie, so it does not seem entirely eliminable for our purposes -- we're interested in understanding the exploitability of these strategies, after all.

What we do see, again, is that Objective Bayes (OB) and OB-adjacent methods are highly competitive in terms of head-to-head wins against all other methods, and that they are not particularly exploitable when compared to frequentist methods, including the MLE (plug-in).

We also again see the problems that can arise with subjectivist Bayesian methods, which can still perform poorly even with empirical calibration, depending on the extremity of their prior. Most practicing Bayesians today have some rules of thumb against such unwarranted prior confidence, but OB provides a more particular and well-motivated set of desiderata for selection.

Closing Remarks

I've found the last few experiments to be enlightening and educational. It's certainly interesting to see which claims of theory and intuition bear out in practice. As I continue to learn more about various statistical methodologies, I think I'll revisit such simulations in the future to test long-run expected results, where that criterion is relevant. Recall that subjectivist Bayesian methods are not particularly interested in long-run expected performance as a criterion of evaluation in the first place, so complaining about the poor long-run performance of subjectivist statistical methods is a bit misplaced. But as Bayesian methods make more headway into the sciences, Bayesian practitioners need to recognize that, especially in scientific and industrial contexts, long-run performance is relevant. As such, subjectivist methods may not be appropriate in those cases. This is why contemporary, scientifically-minded Bayesianism has moved in more objectivist directions for the last several decades.

The classic objections to frequentism still stand as strong as ever, however. Frequentist methods offer empirical reliability, but they answer the wrong questions in the first place. After all, when Joe gets screened for cancer, he doesn't want to know how often people like Joe have cancer: he wants to know how likely it is that he has cancer. Insofar as frequentist information is used for managing practical decisions, a sleight of hand takes place, yielding implicit Bayesian readings that frequentist interpretations do not support. Objective Bayesianism provides a coherent framework that makes sense of why this sleight of hand is actually rational and where frequentist information fits within a broader Bayesian rationality. OB allows Bayesians to take up available empirical information that subjectivists leave on the table and allows frequentists to answer the correct questions for decision making without losing their long-run performance bounds.

A relevant practical question, though, is this: since frequentism is at least approximately correct in most scientific contexts, is there a practical benefit to going through the trouble of producing OB results? That, however, will be a question for another time.

References

Berger, J., Bernardo, J., & Sun, D. (2024). Objective Bayesian Inference. World Scientific.

Williamson, J. (2010). In Defence of Objective Bayesianism. Oxford University Press.

Appendix

Head-to-Head Results

Columns, in order: Random, Oracle, Uncalibrated Bayes, Extreme Sub Bayes, Freq Plug In, Freq LB, Freq UB, Freq Random, Calibrated Bayes, High Alpha Cal Bayes, Calib Ext Sub Bayes, High Alpha CESB, Objective Bayes, High Alpha OB

Random: Oracle, Random, FLB, FUB, HCB, CESB, HCESB, HOB
Oracle: Oracle
Uncalibrated Bayes: Oracle, FLB, HCB, HOB
Extreme Sub Bayes: Random, Oracle, FLB, FUB, HCB, CESB, HCESB, HOB
Freq Plug In: Oracle, Inc
Freq LB: FLB, Oracle, FLB, Inc, FLB, Inc, FLB, Inc, FLB, Inc
Freq UB: FUB, Oracle, FUB, FLB, HCB, Inc, HOB
Freq Random: Oracle, Inc, HCB, HOB
Calibrated Bayes: Oracle, FLB, HCB, Inc, HOB
High Alpha Cal Bayes: HCB, Oracle, HCB, Inc, HCB, Inc
Calib Ext Sub Bayes: CESB, Oracle, CESB, FLB, Inc, HCB, CESB, HOB
High Alpha CESB: HCESB, Oracle, HCESB, FLB, Inc, HCB, CESB, HOB
Objective Bayes: Oracle, Inc, HCB, Inc
High Alpha OB: HOB, Oracle, HOB, Inc, HOB, Inc, HOB, Inc

Win Counts

Method	Win Counts	Win Counts with Inc
Oracle	13	13
Freq Plug In	9	10.5
High Alpha Cal Bayes	9	10.5
High Alpha OB	8	10
Freq LB	7	9.5
Objective Bayes	7	8.5
Calibrated Bayes	7	7.5
Freq Random	5	5.5
Uncalibrated Bayes	5	5
Calib Ext Sub Bayes	3	3.5
Freq UB	2	3
High Alpha CESB	2	2.5
Random	1	1
Extreme Sub Bayes	0	0

Bookie Gain Rates

	Random	Oracle	Uncalibrated Bayes	Extreme Sub Bayes	Freq Plug In	Freq LB	Freq UB	Freq Random	Calibrated Bayes	High Alpha Cal Bayes	Calib Ext Sub Bayes	High Alpha CESB	Objective Bayes	High Alpha OB
Random		35.55	36.92	41.67	36.36	31.11	43.75	36.67	37.14	36.67	37.50	37.50	36.67	36.67
Oracle	35.55		9.44	36.00	8.33	6.67	2.00	12.50	9.17	8.57	16.67	16.67	8.80	8.33
Uncalibrated Bayes	36.92	9.44		36.67	2.06	3.86	18.46	10.00	0.53	2.00	16.25	16.67	0.55	2.06
Extreme Sub Bayes	41.67	36.00	36.67		36.36	27.50	50.00	38.33	36.67	36.36	20.00	20.00	36.36	36.00
Freq Plug In	36.36	8.33	2.06	36.36		0.00	18.18	8.93	1.56	0.00	18.75	15.83	1.56	0.00
Freq LB	31.11	6.67	3.86	27.50	0.00		0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.00
Freq UB	43.75	2.00	18.46	50.00	18.18	0.00		18.33	20.50	17.92	34.00	34.00	18.33	18.33
Freq Random	36.67	12.50	10.00	38.33	8.93	0.00	18.33		9.29	8.67	17.14	17.14	9.29	8.89
Calibrated Bayes	37.14	9.17	0.53	36.67	1.56	0.00	20.50	9.29		1.50	16.67	15.71	0.00	1.50
High Alpha Cal Bayes	36.67	8.57	2.00	36.36	0.00	0.00	17.92	8.67	1.50		15.71	15.71	1.50	0.00
Calib Ext Sub Bayes	37.50	16.67	16.25	20.00	18.75	0.00	34.00	17.14	16.67	15.71		0.00	13.75	16.25
High Alpha CESB	37.50	16.67	16.67	20.00	15.83	0.00	34.00	17.14	15.71	15.71	0.00		16.67	15.83
Objective Bayes	36.67	8.80	0.55	36.36	1.56	0.00	18.33	9.29	0.00	1.50	13.75	16.67		0.00
High Alpha OB	36.67	8.33	2.06	36.00	0.00	0.00	18.33	8.89	1.50	0.00	16.25	15.83	0.00

Average Bookie Gain Rate

Method	Rate
Freq LB	5.318071
Objective Bayes	11.03681
High Alpha OB	11.06667
High Alpha Cal Bayes	11.12413
Freq Plug In	11.37961
Calibrated Bayes	11.55629
Uncalibrated Bayes	11.95981
Oracle	13.74633
Freq Random	15.01339
High Alpha CESB	17.05671
Calib Ext Sub Bayes	17.13004
Freq UB	22.60077
Extreme Sub Bayes	34.7634
Random	37.24441
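These averages appear to be the row means of the Bookie Gain Rates table, leaving out the empty diagonal cell. The sketch below checks that reading against two rows copied from that table; the function name is hypothetical, and small differences from the listed values are presumably due to the table entries being rounded to two decimals.

```python
# Off-diagonal entries copied from the Oracle and Freq LB rows of the
# Bookie Gain Rates table.
oracle_row = [35.55, 9.44, 36.00, 8.33, 6.67, 2.00, 12.50,
              9.17, 8.57, 16.67, 16.67, 8.80, 8.33]
freq_lb_row = [31.11, 6.67, 3.86, 27.50] + [0.00] * 9

def average_gain_rate(row):
    """Mean of a method's gain rates against its 13 opponents (assumed definition)."""
    return sum(row) / len(row)

if __name__ == "__main__":
    print(f"Oracle:  {average_gain_rate(oracle_row):.2f}")   # listed as 13.74633
    print(f"Freq LB: {average_gain_rate(freq_lb_row):.2f}")  # listed as 5.318071
```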
