Rational Ignorance: optimal learning from complex mechanistic models
Abstract: In statistical lore, a model with too many parameters causes problems by overfitting to limited data. One might imagine that a Bayesian approach with a suitably chosen prior would solve this problem; after all, a Bayesian update should be precisely correct, neither under- nor over-fitting from incomplete data. However, the most commonly used uninformative prior, Jeffreys prior, does not achieve this. Instead, models must be carefully pruned to remove extraneous parameters before constructing Jeffreys prior, lest the results become biased. But why?
In this talk I will present recent work in which we trace this problem to a structure that is ubiquitous in multiparameter models, and demonstrate that an optimal prior can completely avoid introducing this type of bias. We advocate choosing the prior that maximizes the expected information, that is, the mutual information between the parameters and their expected data. In the limit of infinite data, in which even irrelevant parameters can be inferred, this protocol yields Jeffreys prior. But with finite data the optimal prior is very different: it lies on lower-dimensional edges of the full model, which are themselves reduced models, and thereby ignores irrelevant parameters. Jeffreys prior fails by reweighting this optimal prior by the irrelevant co-volume. We show that variation in this co-volume introduces enormous bias and is the reason that Jeffreys prior cannot be widely used.
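
As a rough illustration of a prior that maximizes mutual information, the sketch below uses a toy model that is an assumption of this example and not taken from the talk: a coin with unknown bias theta, observed for m flips. It finds the maximizing prior on a grid with the Blahut-Arimoto iteration, a standard algorithm for this kind of optimization that is swapped in here for convenience; the work described may use a different method. The grid is uniform in the angle phi with theta = sin^2(phi), so that equal weights on the grid approximate Jeffreys prior for this model. All function names are hypothetical.

```python
import numpy as np
from math import comb


def binomial_likelihood(thetas, m):
    """Rows: grid values of theta. Columns: outcomes k = 0..m heads."""
    k = np.arange(m + 1)
    coeff = np.array([comb(m, int(j)) for j in k], dtype=float)
    th = thetas[:, None]
    return coeff * th**k * (1.0 - th) ** (m - k)


def _kl_rows(lik, px, log=np.log):
    """KL divergence of each row p(x|theta) from the marginal p(x)."""
    with np.errstate(divide="ignore", invalid="ignore"):
        log_ratio = np.where(lik > 0, log(lik / px), 0.0)
    return np.sum(lik * log_ratio, axis=1)


def mutual_information(prior, lik):
    """Mutual information I(theta; x) in bits for a prior on the grid."""
    px = prior @ lik  # marginal distribution of the data
    return float(prior @ _kl_rows(lik, px, log=np.log2))


def blahut_arimoto(lik, n_iter=2000):
    """Grid prior maximizing I(theta; x), by Blahut-Arimoto updates.

    Assumption: a fixed iteration count instead of a convergence test.
    """
    prior = np.full(lik.shape[0], 1.0 / lik.shape[0])
    for _ in range(n_iter):
        px = prior @ lik
        # Multiplicative update: reweight by exp(KL(p(x|theta) || p(x))).
        prior = prior * np.exp(_kl_rows(lik, px))
        prior /= prior.sum()
    return prior


# Toy setup (assumed): m = 5 flips, 41 grid points uniform in phi.
m = 5
phi = np.linspace(0.0, np.pi / 2, 41)
thetas = np.sin(phi) ** 2
lik = binomial_likelihood(thetas, m)

jeffreys = np.full(len(thetas), 1.0 / len(thetas))  # uniform in phi
optimal = blahut_arimoto(lik)

print("I(Jeffreys) in bits:", mutual_information(jeffreys, lik))
print("I(optimal)  in bits:", mutual_information(optimal, lik))
print("theta values carrying > 1% of the optimal weight:",
      thetas[optimal > 0.01])
```

Rerunning with larger m shows how the weight of the maximizing prior redistributes over the grid as more data become available. The one-parameter coin has no irrelevant directions, so this sketch only illustrates the optimization itself, not the co-volume effect discussed above.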