
I was reading some basic texts on machine learning where you build a Gaussian model of a generative process from a vector of available data points. To give the context and notation, assume $x_1, x_2, \dots, x_n\in\mathbb{R}$ are independent data points drawn from a Gaussian distribution. You have to estimate $\mu$ (the mean) and $\sigma>0$ (the standard deviation) from these known data points.

Using maximum likelihood estimation, the problem is basically

$$\max_{\mu, \sigma}\prod_{i=1}^n f_G(x_i)$$ where $f_G(x_i)$ is the Gaussian PDF with mean $\mu$ and SD $\sigma$. The solution is easy: the sample mean and the (biased, divisor-$n$) sample SD of the data points give the optimum.
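As a quick numerical illustration (a sketch, not part of the original question; the data and parameter values are arbitrary), the closed-form MLE can be checked in R. Note that the MLE of $\sigma$ uses a divisor of $n$, not the $n-1$ used by R's `sd`:

```r
set.seed(1)
x <- rnorm(50, mean = 3, sd = 2)      # simulated data points (arbitrary values)

mu_hat <- mean(x)                     # MLE of mu: the sample mean
sg_hat <- sqrt(mean((x - mu_hat)^2))  # MLE of sigma: divisor n, not n-1

# Log-likelihood under the Gaussian model; the MLE maximizes it
loglik <- function(mu, sg) sum(dnorm(x, mu, sg, log = TRUE))
loglik(mu_hat, sg_hat) >= loglik(mu_hat + 0.1, sg_hat)  # TRUE at the optimum
```

Working on the log scale avoids numerical underflow from multiplying many small densities.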

But I am interested in a more general question: can I calculate the joint probability density of $\mu$ and $\sigma$ given the data points? That is, is there any way to calculate

$$f(\mu, \sigma \mid x_1, x_2, \cdots, x_n)=\frac{f(\mu, \sigma, x_1, x_2, \cdots, x_n)}{f(x_1, x_2, \cdots, x_n)}?$$

Of course, we assume throughout that the underlying generative process is Gaussian, but I am stuck with the PDFs. Do I need any additional assumption to answer this question?

  • $\begingroup$ The parameters $\mu$ and $\sigma$ are unknown constants, so (absent a Bayesian context) I'm not sure how to interpret your last displayed equation. I think you meant to ask for PDFs of estimators, not parameters. I tried to give some relevant distributional information in my Answer to get you on the right track. $\endgroup$
    – BruceET
    Commented Mar 3, 2018 at 1:38

1 Answer


The standard distribution theory for this model with $X_1, X_2, \dots, X_n$ a random sample from $\mathsf{Norm}(\mu, \sigma)$ is as follows:

$$\bar X \sim \mathsf{Norm}(\mu, \sigma/\sqrt{n}),$$ $$\frac{\sum_{i=1}^n(X_i - \mu)^2}{\sigma^2} \sim \mathsf{Chisq}(n),$$ $$\frac{(n-1)S^2}{\sigma^2} \sim \mathsf{Chisq}(n-1),$$ $$ T = \frac{\bar X - \mu}{S/\sqrt{n}} \sim \mathsf{T}(n-1),$$ where $\bar X = \frac 1 n \sum_{i=1}^n X_i,\,$ $E(\bar X) = \mu;\,$ $S^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar X)^2,\,$ $E(S^2) = \sigma^2.$ And finally, for normal data (only), $\bar X$ and $S^2$ are stochastically independent random variables, even though they are not functionally independent.

$\mathsf{Chisq}$ denotes a chi-squared distribution with the designated degrees of freedom, and $\mathsf{T}$ denotes Student's t distribution with the designated degrees of freedom. You can find formal definitions and density functions of these distributions on the relevant Wikipedia pages.

The first displayed relationship is most often used when $\sigma$ is known and $\mu$ is to be estimated by $\bar X.$ The second relationship is most often used when $\mu$ is known and $\sigma^2$ is to be estimated by $\frac 1 n \sum_{i=1}^n(X_i - \mu)^2.$ These relationships are easily shown using standard probability formulas, moment generating functions, and the definition of the chi-squared distribution.
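For instance, the second relationship (with $\mu$ known) can be checked with a short simulation. This is a sketch mirroring the setup used later in this answer, not part of the original text; a $\mathsf{Chisq}(n)$ variable should have mean $n$ and variance $2n$:

```r
set.seed(2018)
m <- 10^5; n <- 5; mu <- 100; sg <- 10
MAT <- matrix(rnorm(m*n, mu, sg), nrow = m)  # m samples of size n
w <- rowSums((MAT - mu)^2)/sg^2              # m values of the known-mu statistic
mean(w)   # aprx n = 5
var(w)    # aprx 2n = 10, consistent with Chisq(5)
```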

The last two displayed relationships and the independence of $\bar X$ and $S^2$ are often used when both $\mu$ and $\sigma$ are unknown. Then ordinarily, $\mu$ is estimated by $\bar X,\,$ $\sigma^2$ by $S^2,\,$ and $\sigma$ by $S$ (even though $E(S) < \sigma$). Proofs are more advanced and are discussed in mathematical statistics texts.
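The $T$ relationship can be checked by simulation as well (a sketch with the same arbitrary parameter values as below, not part of the original answer): the fraction of simulated $T$ values beyond the 97.5% quantile of $\mathsf{T}(n-1)$ in either tail should be about 5%.

```r
set.seed(2018)
m <- 10^5; n <- 5; mu <- 100; sg <- 10
MAT <- matrix(rnorm(m*n, mu, sg), nrow = m)   # m samples of size n
a <- rowMeans(MAT); s <- apply(MAT, 1, sd)    # m sample means and SDs
t.stat <- (a - mu)/(s/sqrt(n))                # m values of T
mean(abs(t.stat) > qt(0.975, n - 1))          # aprx 0.05 if T ~ T(4)
```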


For the special case $n = 5,\, \mu = 100,\, \sigma=10$ a simulation in R statistical software of 100,000 samples suggests (but of course does not prove) that $\bar X \sim \mathsf{Norm}(\mu, \frac{\sigma}{\sqrt{n}}),\,$ $Q = \frac{(n-1)S^2}{\sigma^2} \sim \mathsf{Chisq}(4),$ and that $\bar X$ and $S$ are independent. The code below the figure also illustrates $E(\bar X) = 100,\,$ $E(S) < 10,\,$ $E(S^2) = 100,$ and $r \approx 0,$ within the margin of simulation error (accuracy to two, maybe three significant digits).

[Figure: three simulation panels: histogram of sample means with the $\mathsf{Norm}(100, 10/\sqrt{5})$ density, histogram of $Q$ with the $\mathsf{Chisq}(4)$ density, and a scatterplot of sample SDs against sample means illustrating independence.]

set.seed(3218)  # retain for exactly same simulation; delete for fresh run
m = 10^5;  n = 5;  mu = 100;  sg = 10
MAT = matrix(rnorm(m*n, mu, sg), nrow=m)  # m x n matrix: 10^5 samples of size 5
a = rowMeans(MAT)   # m sample means (averages)
s = apply(MAT, 1, sd);  q = (n-1)*s^2/sg^2  # m sample SD's and values of Q
mean(a)
## 100.0139     # aprx E(x-bar) = 100
mean(s);  mean(s^2)    
## 9.412638     # aprx E(S) < 10
## 100.3715     # aprx E(S^2) = 100
cor(a, s)
## -0.00194571  # approx r = 0

par(mfrow=c(1,3))  # enable 3 panels per plot
hist(a, prob=T, col="skyblue2", xlab="Sample Mean", main="Normal Dist'n of Sample Mean")
  curve(dnorm(x, mu, sg/sqrt(n)), add=T, lwd=2, col="red")
hist(q, prob=T, col="skyblue2", ylim=c(0,.18), xlab="Q", main="CHISQ(4)")
  curve(dchisq(x, n-1), add=T, lwd=2, col="red")
plot(a, s, pch=".", xlab="Sample Means", ylab="Sample SD", main="Illustrating Indep")
par(mfrow=c(1,1))
