
Simulating a Statistics Head-to-Head Rumble: Round Two

In the previous post, I considered a quasi-Dutch Book scenario inspired by a LinkedIn poster. While investigating that scenario, I discovered that the combination of fixed, positive stakes and the bookie's selection bias introduced an apparent bias into the performance of each statistical method: the game incentivizes the bookie to take bets where a bettor underestimates the 'true', long-run value of the bet. The statistical methods themselves weren't biased in this way; they simply weren't chosen by the bookie except in the cases where they happened to err in that direction by chance. This effect skews the previous results as indications of the long-run performance of those methods.


In order to correct this tendency and better align with proper Dutch Books, we should introduce the notion of a negative bet for the bookie.


Negative Bets


By a negative bet, I mean the following: instead of the bettor purchasing a bet at their fair price for potential stakes, the bookie gives the bettor the difference between the stakes and that fair price, under the agreement that the bookie will claw back the entire stakes if the bet fails. (Note that this is identical to how traditional Dutch Books punish subadditive 'probabilities' with negative stakes.)


To understand this equivalence, consider the following: Let S be the stakes, let p be the bettor's estimation of the bias of the coin, and let q be the 'true' bias of the coin. The bettor's expected net gain from a negative bet is then:

(S - pS) - (1 - q)S
= S - pS - S + qS
= -pS + qS
= (q - p)S

where the (S - pS) term is the initial loan from the bookie, and (1 - q)S is the long-run average amount the bookie claws back (the stakes S times the failure rate 1 - q). This is the bettor's expected net gain from a negative bet, and it equals the bettor's expected net gain from a positive bet purchased at price pS (namely qS - pS), which establishes the equivalence.


This last line is the negative of (p - q)S, which is zero for a perfectly calibrated bet. In the previous simulation, the bookie tended to take bets where p - q > 0, that is, where the bettor's estimate p exceeded the true bias q. With a negative bet, the bookie is incentivized in the opposite direction: to take bets where the true bias q exceeds the bettor's estimate p.
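As a quick sanity check on this equivalence, here is a minimal Python sketch (not the simulation code from these posts; S, p, and q are illustrative values only) that estimates the bettor's average gain under both bet types by Monte Carlo:

```python
import random

def positive_bet(S, p, q, rng):
    """Bettor pays p*S up front and receives S if the coin lands heads (probability q)."""
    payout = S if rng.random() < q else 0.0
    return payout - p * S  # bettor's net gain

def negative_bet(S, p, q, rng):
    """Bookie advances S - p*S; the entire stakes S are clawed back if the bet fails."""
    clawback = S if rng.random() >= q else 0.0
    return (S - p * S) - clawback  # bettor's net gain

rng = random.Random(0)
S, p, q, n = 1.0, 0.7, 0.6, 200_000  # illustrative values only

pos = sum(positive_bet(S, p, q, rng) for _ in range(n)) / n
neg = sum(negative_bet(S, p, q, rng) for _ in range(n)) / n
print(f"positive bet, bettor's mean gain: {pos:+.4f}")  # both approach (q - p)*S = -0.10
print(f"negative bet, bettor's mean gain: {neg:+.4f}")
```

Both averages converge to (q - p)S; the two forms differ only in which direction of miscalibration the bookie can select for.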


By allowing the bookie to choose between positive and negative bets at random with equal probability (~Bern(0.5)), we remove the bias previously introduced by the bookie's behavior. This change takes us one step further from the original scenario, but it should provide a clearer view of the underlying issues at play.
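As a rough sketch of this alteration (again, not the actual simulation code; among other details, it omits the bookie's ability to decline a bet), a single round with the Bern(0.5) choice might look like this:

```python
import random

def bookie_round(S, p, q, rng):
    """One round: the bookie flips a fair coin (~Bern(0.5)) to choose between a
    positive and a negative bet with the bettor, then the wagered-on coin is flipped.
    Returns the bookie's net gain for the round."""
    heads = rng.random() < q  # outcome of the coin being bet on
    if rng.random() < 0.5:
        # positive bet: bettor pays p*S; bookie pays out S if the bet succeeds
        return p * S - (S if heads else 0.0)
    else:
        # negative bet: bookie advances S - p*S and claws back S if the bet fails
        return (S if not heads else 0.0) - (S - p * S)

rng = random.Random(1)
S, p, q, n = 1.0, 0.7, 0.6, 200_000  # illustrative values only
avg = sum(bookie_round(S, p, q, rng) for _ in range(n)) / n
print(f"bookie's average gain per round: {avg:+.4f}")  # approaches (p - q)*S = +0.10
```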


Hypotheses Redux


The hypothesis here is that introducing negative bets corrects the selection bias against, for example, the Frequentist Confidence Upper Bound bettor, and that the bookie's average rate of gain no longer places the Oracle bettor in the middle of the pack but instead in the lowest position, as the hardest bettor to exploit.


Results and Discussion


Considering the results below, my hypothesis seems at least partially incorrect. While the new results do seem to correct some of the selection bias in the previous scenario (as depicted in graphs not displayed here), the change does not much improve the ranking of FUB in terms of win rate, or of the Oracle in terms of the bookie's gain rate.


I suspect that some selection bias still exists relative to the betting strategies, since the bookie is still able to skip placing bets. But this is the kind of behavior we want in a bookie, so it does not seem entirely eliminable for our purposes -- we're interested in understanding the exploitability of these strategies, after all.


What we do see, however, is that OB and OB-adjacent methods are again highly competitive in head-to-head wins against all other methods, and that they are not particularly exploitable compared to frequentist methods, including the MLE (plug-in).


We also see again the problems that can arise with subjectivist Bayesian methods, which can still perform poorly, even with empirical calibration, depending on the extremity of their prior. Most practicing Bayesians today have rules of thumb against such unwarranted prior confidence, but OB provides a more specific and well-motivated set of desiderata for prior selection.
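As a toy illustration of how prior extremity can swamp the data (this is hypothetical, ignores the empirical-calibration step, and the Beta priors and flip counts are made up for the example):

```python
# Toy illustration: a heavily concentrated Beta prior barely moves after 20 flips,
# while a flat prior lands near the observed frequency.
heads, tails = 12, 8  # hypothetical observed flips; the 'true' bias in this toy example is 0.6

priors = {
    "flat prior Beta(1, 1)": (1, 1),
    "extreme prior Beta(200, 2)": (200, 2),
}
for name, (a, b) in priors.items():
    posterior_mean = (a + heads) / (a + b + heads + tails)  # Beta-Binomial conjugacy
    print(f"{name}: posterior mean = {posterior_mean:.3f}")
# flat prior    -> ~0.591
# extreme prior -> ~0.955, still far from 0.6
```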


Closing Remarks


I've found the last few experiments enlightening and educational. It's certainly interesting to see which claims of theory and intuition bear out in practice. As I continue to learn more about various statistical methodologies, I expect I'll revisit such simulations in the future to test long-run expected results where that criterion is relevant. Recall that subjectivist Bayesian methods are not particularly interested in long-run expected performance as a criterion of evaluation in the first place, so complaining about the poor long-run performance of subjectivist statistical methods is a bit misplaced. But as Bayesian methods make more headway into the sciences, Bayesian practitioners need to recognize that, especially in scientific and industrial contexts, long-run performance is relevant. As such, subjectivist methods may not be appropriate in those cases. This is why contemporary, scientifically minded Bayesianism has moved in more objectivist directions over the last several decades.


The classic objections to frequentism still stand as strong as ever, however. Frequentist methods offer empirical reliability, but they answer the wrong questions in the first place. After all, when Joe gets screened for cancer, he doesn't want to know how often people like Joe have cancer: he wants to know how likely it is that he has cancer. Insofar as frequentist information is used to manage practical decisions, a sleight of hand takes place, yielding implicitly Bayesian readings that frequentist interpretations do not support. Objective Bayesianism provides a coherent framework that makes sense of why this sleight of hand is actually rational and of where frequentist information fits within a broader Bayesian rationality. OB allows Bayesians to take up available empirical information that subjectivists leave on the table, and it allows frequentists to answer the right questions for decision making without losing their long-run performance bounds.


A relevant practical question, though, is this: since frequentist answers are at least approximately correct in most scientific contexts, is there a practical benefit to going through the trouble of producing OB results? That, however, is a question for another time.


 

References


Berger, J., Bernardo, J., & Sun, D. (2024). Objective Bayesian Inference. World Scientific.


Williamson, J. (2010). In Defence of Objective Bayesianism. Oxford University Press.


 

Appendix

Head-to-Head Results



Each cell gives the winner of the head-to-head matchup between the row method and the column method ("Inc" marks an inconclusive matchup). Abbreviations: UB = Uncalibrated Bayes, ESB = Extreme Sub Bayes, FP = Freq Plug In, FLB = Freq LB, FUB = Freq UB, FR = Freq Random, CB = Calibrated Bayes, HCB = High Alpha Cal Bayes, CESB = Calib Ext Sub Bayes, HCESB = High Alpha CESB, OB = Objective Bayes, HOB = High Alpha OB.

| | Random | Oracle | UB | ESB | FP | FLB | FUB | FR | CB | HCB | CESB | HCESB | OB | HOB |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Random | - | Oracle | UB | Random | FP | FLB | FUB | FR | CB | HCB | CESB | HCESB | OB | HOB |
| Oracle | Oracle | - | Oracle | Oracle | Oracle | Oracle | Oracle | Oracle | Oracle | Oracle | Oracle | Oracle | Oracle | Oracle |
| Uncalibrated Bayes | UB | Oracle | - | UB | FP | FLB | UB | UB | CB | HCB | UB | UB | OB | HOB |
| Extreme Sub Bayes | Random | Oracle | UB | - | FP | FLB | FUB | FR | CB | HCB | CESB | HCESB | OB | HOB |
| Freq Plug In | FP | Oracle | FP | FP | - | Inc | FP | FP | FP | Inc | FP | FP | FP | Inc |
| Freq LB | FLB | Oracle | FLB | FLB | Inc | - | FLB | Inc | FLB | Inc | FLB | FLB | Inc | Inc |
| Freq UB | FUB | Oracle | UB | FUB | FP | FLB | - | FR | CB | HCB | Inc | Inc | OB | HOB |
| Freq Random | FR | Oracle | UB | FR | FP | Inc | FR | - | CB | HCB | FR | FR | OB | HOB |
| Calibrated Bayes | CB | Oracle | CB | CB | FP | FLB | CB | CB | - | HCB | CB | CB | Inc | HOB |
| High Alpha Cal Bayes | HCB | Oracle | HCB | HCB | Inc | Inc | HCB | HCB | HCB | - | HCB | HCB | HCB | Inc |
| Calib Ext Sub Bayes | CESB | Oracle | UB | CESB | FP | FLB | Inc | FR | CB | HCB | - | CESB | OB | HOB |
| High Alpha CESB | HCESB | Oracle | UB | HCESB | FP | FLB | Inc | FR | CB | HCB | CESB | - | OB | HOB |
| Objective Bayes | OB | Oracle | OB | OB | FP | Inc | OB | OB | Inc | HCB | OB | OB | - | Inc |
| High Alpha OB | HOB | Oracle | HOB | HOB | Inc | Inc | HOB | HOB | HOB | Inc | HOB | HOB | Inc | - |

Win Counts

| Method | Win Counts | Win Counts with Inc (Inc counted as 0.5) |
|---|---|---|
| Oracle | 13 | 13 |
| Freq Plug In | 9 | 10.5 |
| High Alpha Cal Bayes | 9 | 10.5 |
| High Alpha OB | 8 | 10 |
| Freq LB | 7 | 9.5 |
| Objective Bayes | 7 | 8.5 |
| Calibrated Bayes | 7 | 7.5 |
| Freq Random | 5 | 5.5 |
| Uncalibrated Bayes | 5 | 5 |
| Calib Ext Sub Bayes | 3 | 3.5 |
| Freq UB | 2 | 3 |
| High Alpha CESB | 2 | 2.5 |
| Random | 1 | 1 |
| Extreme Sub Bayes | 0 | 0 |

Bookie Gain Rates


Bookie's rate of gain for each pairing (row method vs. column method); abbreviations as in the head-to-head table above.

| | Random | Oracle | UB | ESB | FP | FLB | FUB | FR | CB | HCB | CESB | HCESB | OB | HOB |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Random | - | 35.55 | 36.92 | 41.67 | 36.36 | 31.11 | 43.75 | 36.67 | 37.14 | 36.67 | 37.50 | 37.50 | 36.67 | 36.67 |
| Oracle | 35.55 | - | 9.44 | 36.00 | 8.33 | 6.67 | 2.00 | 12.50 | 9.17 | 8.57 | 16.67 | 16.67 | 8.80 | 8.33 |
| Uncalibrated Bayes | 36.92 | 9.44 | - | 36.67 | 2.06 | 3.86 | 18.46 | 10.00 | 0.53 | 2.00 | 16.25 | 16.67 | 0.55 | 2.06 |
| Extreme Sub Bayes | 41.67 | 36.00 | 36.67 | - | 36.36 | 27.50 | 50.00 | 38.33 | 36.67 | 36.36 | 20.00 | 20.00 | 36.36 | 36.00 |
| Freq Plug In | 36.36 | 8.33 | 2.06 | 36.36 | - | 0.00 | 18.18 | 8.93 | 1.56 | 0.00 | 18.75 | 15.83 | 1.56 | 0.00 |
| Freq LB | 31.11 | 6.67 | 3.86 | 27.50 | 0.00 | - | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| Freq UB | 43.75 | 2.00 | 18.46 | 50.00 | 18.18 | 0.00 | - | 18.33 | 20.50 | 17.92 | 34.00 | 34.00 | 18.33 | 18.33 |
| Freq Random | 36.67 | 12.50 | 10.00 | 38.33 | 8.93 | 0.00 | 18.33 | - | 9.29 | 8.67 | 17.14 | 17.14 | 9.29 | 8.89 |
| Calibrated Bayes | 37.14 | 9.17 | 0.53 | 36.67 | 1.56 | 0.00 | 20.50 | 9.29 | - | 1.50 | 16.67 | 15.71 | 0.00 | 1.50 |
| High Alpha Cal Bayes | 36.67 | 8.57 | 2.00 | 36.36 | 0.00 | 0.00 | 17.92 | 8.67 | 1.50 | - | 15.71 | 15.71 | 1.50 | 0.00 |
| Calib Ext Sub Bayes | 37.50 | 16.67 | 16.25 | 20.00 | 18.75 | 0.00 | 34.00 | 17.14 | 16.67 | 15.71 | - | 0.00 | 13.75 | 16.25 |
| High Alpha CESB | 37.50 | 16.67 | 16.67 | 20.00 | 15.83 | 0.00 | 34.00 | 17.14 | 15.71 | 15.71 | 0.00 | - | 16.67 | 15.83 |
| Objective Bayes | 36.67 | 8.80 | 0.55 | 36.36 | 1.56 | 0.00 | 18.33 | 9.29 | 0.00 | 1.50 | 13.75 | 16.67 | - | 0.00 |
| High Alpha OB | 36.67 | 8.33 | 2.06 | 36.00 | 0.00 | 0.00 | 18.33 | 8.89 | 1.50 | 0.00 | 16.25 | 15.83 | 0.00 | - |


Average Bookie Gain Rate


| Method | Rate |
|---|---|
| Freq LB | 5.318071 |
| Objective Bayes | 11.03681 |
| High Alpha OB | 11.06667 |
| High Alpha Cal Bayes | 11.12413 |
| Freq Plug In | 11.37961 |
| Calibrated Bayes | 11.55629 |
| Uncalibrated Bayes | 11.95981 |
| Oracle | 13.74633 |
| Freq Random | 15.01339 |
| High Alpha CESB | 17.05671 |
| Calib Ext Sub Bayes | 17.13004 |
| Freq UB | 22.60077 |
| Extreme Sub Bayes | 34.7634 |
| Random | 37.24441 |

