Asked by ACR

Who introduced the term hyperparameter?

I am trying to find the earliest use of the term hyperparameter. It is currently used in machine learning, but it must have had earlier uses in statistics or optimization theory. Even the multivolume Lexikon der Mathematik (Springer) does not have this term.

So far, the earliest use I can trace is from 1972: D. V. Lindley and A. F. M. Smith, "Bayes Estimates for the Linear Model", Journal of the Royal Statistical Society, Series B (Methodological), Vol. 34, No. 1 (1972), pp. 1-41. Link

The authors introduce the term hyperparameter with a footnote:

In the present paper we study situations where we have exchangeable prior knowledge and assume this exchangeability described by a mixture. In the example this implies $E\left(\theta_i\right)=\mu$, say, a common value for each $i$. In other words there is a linear structure to the parameters analogous to the linear structure supposed for the observations $\mathbf{y}$. If we add the premise that the distribution from which the $\theta_i$ appear as a random sample is normal, the parallelism between the two stages, for $\mathbf{y}$ and $\boldsymbol{\theta}$, becomes closer. In this paper we study the situation in which the parameters of the general linear model themselves have a general linear structure in terms of other quantities which we call hyperparameters. $\dagger$ In this simple example there is just one hyperparameter, $\mu$.

Footnote

$\dagger$ We believe we have borrowed this terminology from I. J. Good but are unable to trace the reference.
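The quoted passage describes a two-stage hierarchical model: the observations depend on parameters $\theta_i$, which in turn are drawn from a common distribution governed by the hyperparameter $\mu$. A minimal sketch of that setup (not from the paper; the numerical values for $\mu$, $\tau$, $\sigma$ and the group sizes are illustrative assumptions) shows how the posterior mean for each $\theta_i$ shrinks toward the hyperparameter:

```python
import numpy as np

# Illustrative sketch of the two-stage model in the quoted passage:
# each parameter theta_i is itself drawn from a common normal prior
# whose mean mu is the "hyperparameter". All numbers here are assumed.
rng = np.random.default_rng(0)

mu, tau = 5.0, 1.0       # hyperparameter mu and prior spread tau
sigma = 2.0              # observation noise
n_groups, n_obs = 4, 50

# Stage 2: theta_i ~ N(mu, tau^2) -- exchangeable prior on the parameters
theta = rng.normal(mu, tau, size=n_groups)

# Stage 1: y_ij ~ N(theta_i, sigma^2) -- the observations
y = rng.normal(theta[:, None], sigma, size=(n_groups, n_obs))

# Posterior mean of theta_i (with mu, tau, sigma known): a precision-weighted
# average of the group sample mean and the hyperparameter mu ("shrinkage").
prec_prior = 1.0 / tau**2
prec_data = n_obs / sigma**2
post_mean = (prec_prior * mu + prec_data * y.mean(axis=1)) / (prec_prior + prec_data)

# Each posterior mean lies between its group's sample mean and mu.
for i in range(n_groups):
    lo, hi = sorted((y[i].mean(), mu))
    assert lo <= post_mean[i] <= hi
```

In Lindley and Smith's example there is a single hyperparameter $\mu$; the shrinkage toward it is exactly the "parallelism between the two stages" the passage mentions.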

I. J. Good was a statistician turned philosopher, but a Google Scholar search gives no indication that he introduced the term in the 1960s or earlier.