
Question: I am trying to measure the nugget effect, which is parameterized by $(1-\lambda)$ in the following variance-covariance matrix used to describe the multivariate normal distribution of my $n$ observations: $\sigma_{x}^{2}[\lambda A_{n}+(1-\lambda)\mathbb{I}_{n}]$, where $A_{n}$ is perfectly known, has all 1s on its diagonal, and is positive semi-definite, and $\mathbb{I}_{n}$ is the identity matrix. The mean vector $\mu_{x}$ of the distribution is just the same number repeated $n$ times. So far I have used the maximum likelihood (ML) estimate and the restricted maximum likelihood (REML) estimate, but both of them are biased. For my problem, $\lambda$ is restricted to $[0,1]$; however, because I am using off-the-shelf implementations, that restriction was lifted for my REML estimates (i.e. negative values were allowed), although ideally it would not be. In any case, you can see from the simulation results that both estimators give biased estimates of the true $\lambda$. Is there an unbiased estimator I can use for $\lambda$? Is there a minimum-MSE estimator for it that might differ from the ML estimate?
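In case it is useful, here is a minimal sketch of the kind of ML fit I mean, assuming the usual approach of profiling out $\mu_{x}$ and $\sigma_{x}^{2}$ and doing a bounded 1-D search over $\lambda\in[0,1]$ (this is not my actual code; `profile_negloglik` and `ml_lambda` are just illustrative names):

```python
# Illustrative sketch only (not my production code): ML estimate of lambda
# restricted to [0, 1], with mu and sigma^2 profiled out analytically.
import numpy as np
from scipy.optimize import minimize_scalar

def profile_negloglik(lam, x, A):
    """-2 * profile log-likelihood of lambda (additive constants dropped)."""
    n = len(x)
    V = lam * A + (1.0 - lam) * np.eye(n)        # correlation matrix at this lambda
    _, logdet = np.linalg.slogdet(V)             # log|V|; V may be near-singular as lam -> 1
    ones = np.ones(n)
    mu_hat = (ones @ np.linalg.solve(V, x)) / (ones @ np.linalg.solve(V, ones))  # GLS mean
    r = x - mu_hat
    sigma2_hat = r @ np.linalg.solve(V, r) / n   # ML estimate of sigma^2 given lambda
    return n * np.log(sigma2_hat) + logdet

def ml_lambda(x, A):
    """ML estimate of lambda, constrained to [0, 1] by a bounded 1-D search."""
    res = minimize_scalar(profile_negloglik, bounds=(0.0, 1.0),
                          args=(x, A), method="bounded")
    return res.x
```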

Simulation Results: For each value of $\lambda$ in $\{0.0, 0.1, 0.2, \ldots, 0.9, 1.0\}$ I simulated 5000 datasets from my model with known $\mu_{x}$, $\sigma_{x}^{2}$, $\lambda$, and $A_{n}$, and then fit both methods (ML, REML). In the table, I report the mean of the error, its variance, and the MSE (mean squared error).

Note: For this table, error is estimated minus true, so if the estimate is 1.0 and the truth is 0.9, the error is +0.1.
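Here is a minimal sketch of the Monte Carlo loop itself (again illustrative rather than my exact code; it reuses `ml_lambda` from the sketch above, and `mu_x`, `sigma2_x`, and `A` are placeholders for my known values):

```python
# Illustrative sketch of the simulation: draw datasets from the model,
# refit lambda, and summarize error = estimate - truth.
import numpy as np

def simulate_errors(lam_true, mu_x, sigma2_x, A, n_sims=5000, seed=0):
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    cov = sigma2_x * (lam_true * A + (1.0 - lam_true) * np.eye(n))
    errors = np.empty(n_sims)
    for s in range(n_sims):
        x = rng.multivariate_normal(np.full(n, mu_x), cov)
        errors[s] = ml_lambda(x, A) - lam_true   # ml_lambda() from the sketch above
    # mean(error), variance(error), MSE -- the three rows reported per method below
    return errors.mean(), errors.var(), np.mean(errors ** 2)
```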

| Lambda used in simulation | 0.0 | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 | 1.0 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ML: mean(error) | 0.0096 | -0.076 | -0.11 | -0.059 | 0.050 | 0.15 | 0.21 | 0.21 | 0.17 | 0.093 | -0.00081 |
| ML: variance(error) | 0.00086 | 0.0022 | 0.0084 | 0.017 | 0.018 | 0.012 | 0.0049 | 0.0015 | 0.00028 | 3.8e-05 | 4.9e-06 |
| ML: MSE | 0.00096 | 0.0081 | 0.020 | 0.021 | 0.021 | 0.034 | 0.049 | 0.047 | 0.029 | 0.0088 | 5.6e-06 |
| REML: mean(error) | 0.0058 | -0.068 | -0.082 | -0.027 | 0.074 | 0.17 | 0.22 | 0.22 | 0.17 | 0.094 | -0.00033 |
| REML: variance(error) | 0.0031 | 0.0050 | 0.011 | 0.017 | 0.017 | 0.011 | 0.0045 | 0.0013 | 0.00027 | 3.7e-05 | 6.0e-06 |
| REML: MSE | 0.0031 | 0.0097 | 0.017 | 0.017 | 0.022 | 0.039 | 0.052 | 0.048 | 0.029 | 0.0088 | 6.1e-06 |

If you are interested, further details on the biological problem my data come from can be found here and here, but I am happy to answer any questions; you do not need to read those links.

  • The answer depends on your data. For instance, when you have replicates at zero distance, the usual estimator of their variance (via ANOVA) is an unbiased estimator of the measurement error (from which you can obtain, by subtraction, an estimate of the nugget); but if you don't have replicates, all estimators depend on your model and it's unlikely there would be an unbiased estimator.
    – whuber, Commented Jun 30 at 18:18
  • I don't have any replicates, but some pairs of points are closer together than others; perhaps I could run simulations where I use the k point pairs that are closest together, or alternatively I could pick a distance threshold. What issues should I watch out for if I take this approach, and how would I justify it to a reviewer who says that, because I'm not using the MLE, my estimate isn't trustworthy/useful? I'm assuming, for example, that even if my new estimator has a lower MSE, it would still be invalid to use it for hypothesis testing. (I sketch the replicate-based idea in code below these comments for reference.)
    – Commented Jun 30 at 19:53
  • I'd also note that even in the setting with replicates, the unbiased estimator won't be strictly non-negative. If it were, it would have to always estimate exactly zero when the truth was zero, and it's not going to do that.
    – Commented Jun 30 at 23:26
  • When you only have pairs of nearby points, you cannot distinguish measurement error from the nugget effect. It's not even possible to talk about being unbiased, because you can't even estimate the nugget unless you employ a specific variogram model.
    – whuber, Commented Jul 1 at 12:33
