In the previous post, I considered a quasi-Dutch Book scenario inspired by a LinkedIn poster. While investigating that scenario, I discovered that the fixed, positive stakes, together with the bookie's selection bias, introduced an apparent bias into the performance of each statistical method: the game incentivizes the bookie to take bets where a bettor underestimates the 'true', long-run value of the bet. The statistical methods themselves weren't biased in this way, but the bookie simply wouldn't choose them except when they happened to err in that direction by chance. This effect skews the previous results as indications of the long-run performance of those methods.
In order to correct this tendency and better align with proper Dutch Books, we should introduce the notion of a negative bet for the bookie.
Negative Bets
By a negative bet, I mean a procedure in which, instead of the bettor purchasing a bet at their fair bet price for the potential stakes, the bookie follows an equivalent procedure: it gives the bettor the difference between the stakes and their fair bet price, under the agreement that the bookie will claw back the entire stakes if the bet fails. (Note that this is identical to how traditional Dutch Books punish subadditive 'probabilities' with negative stakes.)
To understand this equivalence, consider the following: Let S be the stakes. Let p be the bettor's estimation of the bias of the coin. Let q be the 'true' bias of the coin. Then:
(S-pS) - (1-q)S
= S - pS - S + qS
= -pS + qS
= (q-p)S
Here the (S-pS) term is the amount the bookie initially gives (or loans) the bettor, and (1-q)S is the long-run average amount the bookie claws back, since the stakes S are reclaimed with probability 1-q.
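As a quick check on the algebra, the sketch below evaluates the first line of the derivation with exact fractions, compares it with (q-p)S, and then estimates the same quantity by simulation. The function names and the example values of S, p, and q are illustrative assumptions and are not taken from the original simulation.

```python
from fractions import Fraction
import random


def negative_bet_expected_net(S, p, q):
    """First line of the derivation: upfront payment minus expected clawback."""
    upfront = S - p * S               # amount the bookie hands over at the start
    expected_clawback = (1 - q) * S   # stakes S reclaimed with probability 1 - q
    return upfront - expected_clawback


def simulate_negative_bet(S, p, q, n_trials, seed=0):
    """Monte Carlo average of (upfront payment - realized clawback)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        bet_fails = rng.random() >= q   # the bet succeeds with probability q
        total += (S - p * S) - (S if bet_fails else 0.0)
    return total / n_trials


# Illustrative values (assumptions, not from the post).
S, p, q = Fraction(10), Fraction(3, 5), Fraction(7, 10)
exact = negative_bet_expected_net(S, p, q)
assert exact == (q - p) * S   # agrees with the last line of the derivation

estimate = simulate_negative_bet(10.0, 0.6, 0.7, n_trials=200_000)
print(exact, round(estimate, 2))
```

The exact value and the simulated average should agree up to sampling noise, which shrinks as n_trials grows.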
This last line is the negative of (p-q)S, and both are zero for a perfectly calibrated bet (p = q). In the previous simulation, the bookie tended to take bets where p-q > 0, that is, where the bettor's estimate p exceeded the true bias q. With a negative bet, the bookie is incentivized in the opposite direction: to take bets where p falls below q.
By allowing the bookie to make positive and negative bets with equal probability (bet type ~ Bern(0.5)), we further improve the scenario by removing the bias previously introduced by the bookie's behavior. Note that this alteration moves the setup one step further from the original scenario considered, but it hopefully provides a clearer view into the underlying issues at play.
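A minimal sketch of this selection rule, following the sign convention stated above, might look like the following. The bookie's own estimate `b` of the coin's bias is a hypothetical stand-in for whatever information the bookie has, and all function names are illustrative rather than taken from the simulation code.

```python
import random


def draw_bet_sign(rng):
    """Return +1 (positive bet) or -1 (negative bet) with equal probability."""
    return 1 if rng.random() < 0.5 else -1


def bookie_expected_gain(bet_sign, p, b, S=1.0):
    """Bookie's expected gain under the sign convention in the text.

    p is the bettor's estimate of the bias; b is the bookie's own estimate
    (hypothetical). Positive bets pay off for the bookie when p > b,
    negative bets when p < b.
    """
    return bet_sign * (p - b) * S


def bookie_accepts(bet_sign, p, b, S=1.0):
    """The bookie may skip: it only takes bets it expects to profit from."""
    return bookie_expected_gain(bet_sign, p, b, S) > 0


rng = random.Random(42)
signs = [draw_bet_sign(rng) for _ in range(10_000)]
share_positive = signs.count(1) / len(signs)
print(f"share of positive bets: {share_positive:.3f}")
```

Because the bet type is drawn before the bookie decides, an overestimating bettor and an underestimating bettor are each exposed on roughly half of the offered bets, though the bookie can still decline any individual bet.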
Hypotheses Redux
The hypothesis here is that introducing negative bets corrects the selection bias against, for example, the Frequentist Confidence Upper Bound (FUB) bettor, and that the bookie's average rate of gain no longer places the Oracle bettor in the middle of the performers but instead places it in the lowest position, as the hardest to exploit.
Results and Discussion
Considering the results below, my hypothesis seems at least partially incorrect. While the results obtained indeed seem to correct some of the selection bias in the previous scenario (as depicted in graphs not displayed here), they do not much improve the ranking of FUB in terms of win rate or of the Oracle in terms of the bookie gain rate.
I suspect that some selection bias still exists relative to the betting strategies, as the bookie is still able to skip placing bets, but this is the kind of behavior we want in the bookie, so this does not seem entirely eliminable for our purposes; we're interested in understanding the exploitability of these strategies, after all.
What we do see, however, is that, again, Objective Bayes (OB) and OB-adjacent methods are highly competitive in terms of head-to-head wins against all other methods, and that OB and OB-adjacent methods are not particularly exploitable when compared to frequentist methods, including the MLE (plug-in).
We also again see the problems that can arise with subjectivist Bayesian methods, which can still perform poorly even with empirical calibration, depending on the extremity of their prior. Most practicing Bayesians today have some rules of thumb against such unwarranted prior confidence, but OB provides a more particular and well-motivated set of desiderata for selection.
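To illustrate how much an extreme prior can dominate a small sample, here is a small sketch comparing posterior means for a coin's bias. The priors and data are assumptions chosen for illustration: Beta(50, 1) stands in for an overconfident subjective prior, and Beta(0.5, 0.5) is the Jeffreys prior for a Bernoulli parameter, a common objective choice. The priors actually used in the simulation are not specified here.

```python
def beta_posterior_mean(alpha, beta, heads, flips):
    """Posterior mean of a coin's bias under a Beta(alpha, beta) prior."""
    return (alpha + heads) / (alpha + beta + flips)


heads, flips = 3, 10   # illustrative data (assumption)

extreme = beta_posterior_mean(50.0, 1.0, heads, flips)   # strongly opinionated prior
jeffreys = beta_posterior_mean(0.5, 0.5, heads, flips)   # Jeffreys prior for a Bernoulli
plug_in = heads / flips                                  # MLE, for comparison

print(f"extreme prior: {extreme:.3f}, Jeffreys: {jeffreys:.3f}, MLE: {plug_in:.3f}")
```

With only ten flips, the extreme prior keeps the estimate near 0.87 while the Jeffreys posterior mean sits near 0.32, close to the plug-in value of 0.3.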
Closing Remarks
I've found the last few experiments enlightening and educational. It's certainly interesting to see which claims of theory and intuition bear out in practice. As I continue to learn more about various statistical methodologies, I think I'll revisit such simulations in the future to test long-run expected results, insofar as that criterion is relevant. Recall that subjectivist Bayesian methods are not particularly interested in long-run expected performance as a criterion of evaluation in the first place, so complaining about the poor long-run performance of subjectivist statistical methods is a bit misplaced. But as Bayesian methods make more headway into the sciences, Bayesian practitioners need to recognize that, especially in scientific and industrial contexts, long-run performance is relevant. As such, subjectivist methods may not be appropriate in those cases. This is why contemporary, scientifically-minded Bayesianism has moved in more objectivist directions for the last several decades.
The classic objections to frequentism still stand as strong as ever, however. Frequentist methods offer empirical reliability but they answer the wrong questions in the first place. After all, when Joe gets screened for cancer, he doesn't want to know how often people like Joe have cancer: he wants to know how likely it is he has cancer. Insofar as frequentist information is used for managing practical decisions, a sleight of hand takes place, yielding implicit Bayesian readings despite being unsupported by frequentist interpretations. Objective Bayesianism provides a coherent framework that makes sense of why this sleight of hand is actually rational and where frequentist information fits within a broader Bayesian rationality. OB allows Bayesians to take up available empirical information that subjectivists leave on the table and allows frequentists to answer the correct questions for decision making without losing their long-run performance bounds.
A relevant practical question, though, is this: since frequentism is at least approximately correct in most scientific contexts, is there a practical benefit to going through the trouble of producing OB results? That, however, will be a question for another time.
References
Berger, J., Bernardo, J., & Sun, D. (2024). Objective Bayesian Inference. World Scientific.
Williamson, J. (2010). In Defence of Objective Bayesianism. Oxford University Press.
Appendix
Head-to-Head Results
Method | Random | Oracle | Uncalibrated Bayes | Extreme Sub Bayes | Freq Plug In | Freq LB | Freq UB | Freq Random | Calibrated Bayes | High Alpha Cal Bayes | Calib Ext Sub Bayes | High Alpha CESB | Objective Bayes | High Alpha OB |
Random | - | Oracle | UB | Random | FP | FLB | FUB | FR | CB | HCB | CESB | HCESB | OB | HOB |
Oracle | Oracle | - | Oracle | Oracle | Oracle | Oracle | Oracle | Oracle | Oracle | Oracle | Oracle | Oracle | Oracle | Oracle |
Uncalibrated Bayes | UB | Oracle | - | UB | FP | FLB | UB | UB | CB | HCB | UB | UB | OB | HOB |
Extreme Sub Bayes | Random | Oracle | UB | - | FP | FLB | FUB | FR | CB | HCB | CESB | HCESB | OB | HOB |
Freq Plug In | FP | Oracle | FP | FP | - | Inc | FP | FP | FP | Inc | FP | FP | FP | Inc |
Freq LB | FLB | Oracle | FLB | FLB | Inc | - | FLB | Inc | FLB | Inc | FLB | FLB | Inc | Inc |
Freq UB | FUB | Oracle | UB | FUB | FP | FLB | - | FR | CB | HCB | Inc | Inc | OB | HOB |
Freq Random | FR | Oracle | UB | FR | FP | Inc | FR | - | CB | HCB | FR | FR | OB | HOB |
Calibrated Bayes | CB | Oracle | CB | CB | FP | FLB | CB | CB | - | HCB | CB | CB | Inc | HOB |
High Alpha Cal Bayes | HCB | Oracle | HCB | HCB | Inc | Inc | HCB | HCB | HCB | - | HCB | HCB | HCB | Inc |
Calib Ext Sub Bayes | CESB | Oracle | UB | CESB | FP | FLB | Inc | FR | CB | HCB | - | CESB | OB | HOB |
High Alpha CESB | HCESB | Oracle | UB | HCESB | FP | FLB | Inc | FR | CB | HCB | CESB | - | OB | HOB |
Objective Bayes | OB | Oracle | OB | OB | FP | Inc | OB | OB | Inc | HCB | OB | OB | - | Inc |
High Alpha OB | HOB | Oracle | HOB | HOB | Inc | Inc | HOB | HOB | HOB | Inc | HOB | HOB | Inc | - |
Win Counts
Method | Win Counts | Win Counts with Inc |
Oracle | 13 | 13 |
Freq Plug In | 9 | 10.5 |
High Alpha Cal Bayes | 9 | 10.5 |
High Alpha OB | 8 | 10 |
Freq LB | 7 | 9.5 |
Objective Bayes | 7 | 8.5 |
Calibrated Bayes | 7 | 7.5 |
Freq Random | 5 | 5.5 |
Uncalibrated Bayes | 5 | 5 |
Calib Ext Sub Bayes | 3 | 3.5 |
Freq UB | 2 | 3 |
High Alpha CESB | 2 | 2.5 |
Random | 1 | 1 |
Extreme Sub Bayes | 0 | 0 |
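The second column appears to be the win count plus half a point for each "Inc" result. That reading is an inference from the numbers, not something stated in the post. A minimal sketch of the tally under that assumption, using the Freq Plug In and High Alpha OB rows from the head-to-head table (the function name is illustrative):

```python
def win_counts(own_label, outcomes, inc_label="Inc", inc_credit=0.5):
    """Return (wins, wins with partial credit for 'Inc' results).

    outcomes lists the head-to-head result against each other method.
    The half-point credit for 'Inc' is an assumption inferred from the table.
    """
    wins = sum(1 for outcome in outcomes if outcome == own_label)
    incs = sum(1 for outcome in outcomes if outcome == inc_label)
    return wins, wins + inc_credit * incs


# Rows copied from the head-to-head table (diagonal omitted).
freq_plug_in_row = ["FP", "Oracle", "FP", "FP", "Inc", "FP", "FP",
                    "FP", "Inc", "FP", "FP", "FP", "Inc"]
high_alpha_ob_row = ["HOB", "Oracle", "HOB", "HOB", "Inc", "Inc", "HOB",
                     "HOB", "HOB", "Inc", "HOB", "HOB", "Inc"]

fp_counts = win_counts("FP", freq_plug_in_row)
hob_counts = win_counts("HOB", high_alpha_ob_row)
print(fp_counts, hob_counts)
```

Both rows reproduce the corresponding entries in the table above: 9 and 10.5 for Freq Plug In, 8 and 10 for High Alpha OB.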
Bookie Gain Rates
Method | Random | Oracle | Uncalibrated Bayes | Extreme Sub Bayes | Freq Plug In | Freq LB | Freq UB | Freq Random | Calibrated Bayes | High Alpha Cal Bayes | Calib Ext Sub Bayes | High Alpha CESB | Objective Bayes | High Alpha OB |
Random | - | 35.55 | 36.92 | 41.67 | 36.36 | 31.11 | 43.75 | 36.67 | 37.14 | 36.67 | 37.50 | 37.50 | 36.67 | 36.67 |
Oracle | 35.55 | - | 9.44 | 36.00 | 8.33 | 6.67 | 2.00 | 12.50 | 9.17 | 8.57 | 16.67 | 16.67 | 8.80 | 8.33 |
Uncalibrated Bayes | 36.92 | 9.44 | - | 36.67 | 2.06 | 3.86 | 18.46 | 10.00 | 0.53 | 2.00 | 16.25 | 16.67 | 0.55 | 2.06 |
Extreme Sub Bayes | 41.67 | 36.00 | 36.67 | - | 36.36 | 27.50 | 50.00 | 38.33 | 36.67 | 36.36 | 20.00 | 20.00 | 36.36 | 36.00 |
Freq Plug In | 36.36 | 8.33 | 2.06 | 36.36 | - | 0.00 | 18.18 | 8.93 | 1.56 | 0.00 | 18.75 | 15.83 | 1.56 | 0.00 |
Freq LB | 31.11 | 6.67 | 3.86 | 27.50 | 0.00 | - | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
Freq UB | 43.75 | 2.00 | 18.46 | 50.00 | 18.18 | 0.00 | - | 18.33 | 20.50 | 17.92 | 34.00 | 34.00 | 18.33 | 18.33 |
Freq Random | 36.67 | 12.50 | 10.00 | 38.33 | 8.93 | 0.00 | 18.33 | - | 9.29 | 8.67 | 17.14 | 17.14 | 9.29 | 8.89 |
Calibrated Bayes | 37.14 | 9.17 | 0.53 | 36.67 | 1.56 | 0.00 | 20.50 | 9.29 | - | 1.50 | 16.67 | 15.71 | 0.00 | 1.50 |
High Alpha Cal Bayes | 36.67 | 8.57 | 2.00 | 36.36 | 0.00 | 0.00 | 17.92 | 8.67 | 1.50 | - | 15.71 | 15.71 | 1.50 | 0.00 |
Calib Ext Sub Bayes | 37.50 | 16.67 | 16.25 | 20.00 | 18.75 | 0.00 | 34.00 | 17.14 | 16.67 | 15.71 | - | 0.00 | 13.75 | 16.25 |
High Alpha CESB | 37.50 | 16.67 | 16.67 | 20.00 | 15.83 | 0.00 | 34.00 | 17.14 | 15.71 | 15.71 | 0.00 | - | 16.67 | 15.83 |
Objective Bayes | 36.67 | 8.80 | 0.55 | 36.36 | 1.56 | 0.00 | 18.33 | 9.29 | 0.00 | 1.50 | 13.75 | 16.67 | - | 0.00 |
High Alpha OB | 36.67 | 8.33 | 2.06 | 36.00 | 0.00 | 0.00 | 18.33 | 8.89 | 1.50 | 0.00 | 16.25 | 15.83 | 0.00 | - |
Average Bookie Gain Rate
Method | Rate |
Freq LB | 5.318071 |
Objective Bayes | 11.03681 |
High Alpha OB | 11.06667 |
High Alpha Cal Bayes | 11.12413 |
Freq Plug In | 11.37961 |
Calibrated Bayes | 11.55629 |
Uncalibrated Bayes | 11.95981 |
Oracle | 13.74633 |
Freq Random | 15.01339 |
High Alpha CESB | 17.05671 |
Calib Ext Sub Bayes | 17.13004 |
Freq UB | 22.60077 |
Extreme Sub Bayes | 34.7634 |
Random | 37.24441 |
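These averages appear to be the mean of each method's thirteen pairwise entries in the Bookie Gain Rates table. That reading is inferred from the numbers, not stated in the post. A short sketch under that assumption, using two rows copied from the table, with small differences expected because the pairwise entries are rounded to two decimals:

```python
from statistics import mean

# Pairwise bookie gain rates copied from the table (diagonal omitted).
pairwise_gain_rates = {
    "Freq LB": [31.11, 6.67, 3.86, 27.50, 0.00, 0.00, 0.00,
                0.00, 0.00, 0.00, 0.00, 0.00, 0.00],
    "Oracle": [35.55, 9.44, 36.00, 8.33, 6.67, 2.00, 12.50,
               9.17, 8.57, 16.67, 16.67, 8.80, 8.33],
}

# Averages reported in the table above.
reported = {"Freq LB": 5.318071, "Oracle": 13.74633}

# Assumption: the reported average is the plain mean of the row.
row_means = {method: mean(rates) for method, rates in pairwise_gain_rates.items()}

for method, value in row_means.items():
    print(f"{method}: row mean {value:.4f}, reported {reported[method]}")
```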