
Information Theory and Maximum Entropy Inference

Information theory provides a constructive criterion for setting up probability distributions on the basis of partial knowledge, and leads to a type of statistical inference which is called the maximum-entropy estimate. It is the least biased estimate possible on the given information; i.e., it is maximally noncommittal with regard to missing information.

Jaynes, E. T. (1957). Information Theory and Statistical Mechanics. Physical Review, 106(4), 620–630. https://doi.org/10.1103/physrev.106.620


Objective Bayesians have been looking toward Information Theory as a foundation for statistics for over 67 years, starting less than a decade after Shannon's initial publication of the theory in 1948, yet most practitioners are still taught that Bayesianism must be rooted in subjectivity. While Jaynes' explication of maximum-entropy inference ignores Calibration (in the sense discussed by Williamson 2010), it is clearly not subjective in the sense of de Finetti or Savage; there is no role for personal judgement in the inference: it is as fixed as any physical parameter might hope to be, except that this parameter is informational.


Jaynes saw probabilistic inference as a kind of generalization of logic, eschewing the nervous, scientistic demand for empirical observability of frequentist probabilities (which are not really observable anyway [1]) in favor of consistency of reasoning and inference under constraint.
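
To make that concrete, here is a minimal sketch of Jaynes-style maximum-entropy inference on his classic Brandeis dice example: the only information is that a six-sided die has a long-run mean of 4.5 (a fair die would give 3.5), and the distribution is fixed by that constraint alone. The SciPy root-finding approach below is just one convenient way to solve for the Lagrange multiplier; nothing here is specific to any dataset.

```python
# Maximum-entropy distribution over die faces 1..6 subject to a mean constraint.
# The MaxEnt solution has the exponential-family form p_i proportional to exp(lam * i),
# where lam is the Lagrange multiplier chosen to satisfy the constraint.
import numpy as np
from scipy.optimize import brentq

faces = np.arange(1, 7)
target_mean = 4.5  # Jaynes' Brandeis dice constraint; 3.5 would recover the uniform die

def mean_given_lambda(lam):
    """Mean of the exponential-family solution p_i ~ exp(lam * i)."""
    w = np.exp(lam * faces)
    p = w / w.sum()
    return p @ faces

# Solve for the multiplier that matches the mean constraint.
lam = brentq(lambda l: mean_given_lambda(l) - target_mean, -10.0, 10.0)
weights = np.exp(lam * faces)
p = weights / weights.sum()

entropy = -(p * np.log(p)).sum()
print("MaxEnt probabilities:", np.round(p, 4))
print("Mean:", round(p @ faces, 4), "| Entropy (nats):", round(entropy, 4))
```

Given only the constraint, the output is the least-biased distribution consistent with it; any other choice would smuggle in information we do not have.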


Williamson's strand of Objective Bayesianism brings Jaynes' maximum-entropy inference back together with a Bayesian basis for empirical Calibration. This contrasts with Gelman's brand of pragmatic Bayesianism, which uses model checking to validate calibration expectations (Gelman et al. 2013) from within an otherwise subjective-ish Bayesian statistical practice (Gelman & Shalizi 2012 notwithstanding).


I'm interested in both of these approaches: I think Williamson's approach has more philosophical appeal and coherence, but Gelman's approach closely resembles my own Bayesian practice and seems more practicable given the current state of Bayesian computational frameworks. After all, a posterior predictive check (see the sketch below) is easier to motivate to an uninitiated practitioner using PyMC or Stan than MaxEnt/MinXEnt inference with unfamiliar constraints, which may have to be homebrewed using PyTensor or TensorFlow or something like them. However, given that frequentist approaches are typically much more computationally viable, Williamson's leveraging of frequentist modeling may be the more practical choice in some circumstances. I would like to run some comparisons for myself here as well.
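
For instance, a posterior predictive check takes only a few lines in PyMC. The sketch below assumes PyMC >= 5 and uses invented toy data; the test statistic (each replicated dataset's mean compared to the observed mean) is just one illustrative choice, not a recommendation.

```python
# A minimal posterior predictive check: fit a Beta-Bernoulli model, simulate
# replicated datasets from the posterior, and compare a test statistic
# against the observed data. Data and model are toy examples for illustration.
import numpy as np
import pymc as pm

rng = np.random.default_rng(0)
data = rng.binomial(1, 0.3, size=50)  # invented binary data, not from any real study

with pm.Model() as model:
    p = pm.Beta("p", alpha=1.0, beta=1.0)          # flat prior on the Bernoulli rate
    obs = pm.Bernoulli("obs", p=p, observed=data)
    idata = pm.sample(1000, tune=1000, chains=2, progressbar=False)
    ppc = pm.sample_posterior_predictive(idata, progressbar=False)

# Test statistic: fraction of replicated datasets whose mean is at least the observed mean.
reps = ppc.posterior_predictive["obs"].values.reshape(-1, len(data))
bayes_p = (reps.mean(axis=1) >= data.mean()).mean()
print("Posterior predictive p-value for the mean:", round(bayes_p, 3))
```

The check is easy to explain because it stays inside the generative model the practitioner has already written down, which is exactly the pedagogical advantage over homebrewed entropy optimization.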


 

[1] This should be obvious. Consider a biased coin with p(Heads) = 1/π. Any finite sequence of Bernoulli experiments can only yield rational estimates for the Bernoulli parameter, but the posited parameter is transcendental. Hence the need for an idealization, like a long-run limit. Some take comfort in the idea that even if such parameters are not observable, they are nonetheless broadly empirical in that they can be tested (using frequentist testing) as parameters of a data-generating process (e.g. Spanos 2013).
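
A toy simulation makes the point; the sample size and seed below are arbitrary, and the only claim is that the observed relative frequency is always an exact rational number, so no finite experiment can ever return 1/π itself.

```python
# Every finite run of Bernoulli trials yields a relative frequency k/n,
# which is exactly rational; the posited parameter 1/pi is transcendental.
import math
import random
from fractions import Fraction

random.seed(0)
p_true = 1 / math.pi          # the posited transcendental bias
n = 10_000
k = sum(random.random() < p_true for _ in range(n))

estimate = Fraction(k, n)     # the observed frequency, exactly
print("Observed frequency:", estimate, "=", float(estimate))
print("Posited parameter :", p_true)
print("Exact match is impossible: k/n is rational, 1/pi is not.")
```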


I think that this puts the lie to the demand for the kind of radical empiricism often found in the sciences: if we must agree that objectivity is broader than observability, then why should we continue to cling to the stats-cultural assumption that anything except unobservable-but-still-totally-"empirical" frequencies is illegitimate? Rubbing elbows with Positivism isn't exactly my conception of learning from bad ideas. I think this assumption is an unfortunate remnant of the Bayesian abandonment of Calibration just as the sciences turned to Calibration-centric frequentist inference for the appearance of legitimacy in noisy inference during the early 20th century. (I am not a historian, and history was my worst subject, but I believe Clayton's book has relevant information to this effect. That, and my own admitted bias.) A reunion of Bayesianism with Calibration is well overdue.


 

Resources


Clayton, A. (2022). Bernoulli’s Fallacy: Statistical Illogic and the Crisis of Modern Science. Columbia University Press.


Gelman, A., & Shalizi, C. R. (2012). Philosophy and the practice of Bayesian statistics. British Journal of Mathematical and Statistical Psychology, 66(1), 8–38. https://doi.org/10.1111/j.2044-8317.2011.02037.x


Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2013). Bayesian Data Analysis (3rd ed.). Chapman & Hall/CRC.


Spanos, A. (2013). A frequentist interpretation of probability for model-based inductive inference. Synthese, 190, 1555–1585. https://doi.org/10.1007/s11229-011-9892-x


Williamson, J. (2010). In Defence of Objective Bayesianism. Oxford University Press.

