
I perform Bayesian inference on a mixture model in which $\mu$ is the mixture weight of the feature component:

$p(x \mid \mu, \theta) = \mu\, p_{f}(x \mid \theta) + (1-\mu)\, p_{nf}(x \mid \theta)$

I have a prior $p(\theta)$ and an implicit likelihood $\mathcal{N}(f(\mu, \theta), \epsilon)$, where $f$ is a deterministic simulator and $\epsilon$ is the noise scale.

So basically, $p_{f}(x|\theta) = \mathcal{N} (f_{f}(\mu, \theta), \epsilon)$ and $p_{nf}(x|\theta) = \mathcal{N} (f_{nf}(\mu, \theta), \epsilon)$.
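
For concreteness, evaluating this mixture log-likelihood looks roughly like the following (a minimal sketch; `f_f`, `f_nf`, and `eps` are placeholder names for the two simulator branches and the noise scale):

```python
import numpy as np
from scipy.stats import norm

def log_mixture_likelihood(x, mu, theta, f_f, f_nf, eps):
    """log p(x | mu, theta) for the two-component mixture,
    with each component a Gaussian centred on a simulator output."""
    log_pf = norm.logpdf(x, loc=f_f(mu, theta), scale=eps)    # feature model
    log_pnf = norm.logpdf(x, loc=f_nf(mu, theta), scale=eps)  # no-feature model
    # Weight the components in log space (log-sum-exp for stability);
    # np.log(0) -> -inf is harmless here when mu is exactly 0 or 1.
    return np.logaddexp(np.log(mu) + log_pf, np.log(1.0 - mu) + log_pnf)
```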

I calculate

$T = -2\left(\log p(x \mid \mu=0,\, \theta(\mu=0)) - \log p(x \mid \mu^*, \theta^*)\right)$

for fixed $x$, evaluating the likelihood through Bayes' rule as $p(x \mid \mu, \theta) = p(\mu, \theta \mid x)\, p(x) / p(\mu, \theta)$; since $x$ is fixed, the evidence $p(x)$ is a constant that cancels in the difference of logs. The posterior is amortized, so I can condition it on any simulated observation. I have only one real observation $x_{obs}$, and I want to place its $p$-value on the $\chi^2$ distribution of $T$ over simulated $x$'s, in order to assess which model fits it better.

Here,

$\theta(\mu=0) = \arg\max_{\theta}\, p(x \mid \mu=0, \theta)$ (null hypothesis)

$\mu^{*}, \theta^{*} = \arg\max_{\mu,\theta}\, p(x \mid \mu, \theta)$ (full model)

I have access to the amortized posterior distribution $p(\mu, \theta \mid x)$, which I can sample from and whose density I can evaluate.
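
Since I only have the posterior, I evaluate the likelihood up to the constant $\log p(x)$ and maximize it numerically. Roughly (a sketch, not my exact code; `posterior_log_prob` stands in for the density of my trained flow and `log_prior` for the joint prior density, both hypothetical names):

```python
import numpy as np
from scipy.optimize import minimize

def log_lik_rel(mu, theta, x, posterior_log_prob, log_prior):
    """log p(x | mu, theta) up to the constant log p(x):
    log p(mu, theta | x) - log p(mu, theta).  The constant cancels in T."""
    params = np.concatenate(([mu], np.atleast_1d(theta)))
    return posterior_log_prob(params, x) - log_prior(params)

def llr_statistic(x, theta_init, full_init, posterior_log_prob, log_prior):
    # Null model: mu fixed at 0, maximize over theta only.
    null = minimize(
        lambda th: -log_lik_rel(0.0, th, x, posterior_log_prob, log_prior),
        theta_init)
    # Full model: maximize jointly over (mu, theta), with mu kept in [0, 1].
    full = minimize(
        lambda p: -log_lik_rel(p[0], p[1:], x, posterior_log_prob, log_prior),
        full_init,
        bounds=[(0.0, 1.0)] + [(None, None)] * (len(full_init) - 1))
    # T = -2 (max log L_null - max log L_full)
    return 2.0 * (null.fun - full.fun)
```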

I calculate the LLR statistic for several $x_{obs}$ and for several simulated $x$, obtained by sampling $\theta$ from its prior and setting $x = f(\mu, \theta)$. According to https://towardsdatascience.com/the-likelihood-ratio-test-463455b34de9, $T$ should then follow a $\chi^2$ distribution with degrees of freedom equal to the difference in the number of free parameters between the full model and the null model.
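
The calibration loop then looks roughly like this (again a sketch with hypothetical helpers: `prior_sample` draws $(\mu, \theta)$ from the prior, `simulate` returns $f(\mu, \theta)$ plus Gaussian noise, `x_obs` is my real observation, and `llr_statistic` is from the sketch above):

```python
import numpy as np
from scipy.stats import chi2

n_sims = 1700  # roughly the number of LRs behind my first plot

# Reference distribution of T from simulated observations
# (drawing (mu, theta) from the prior, as described above; note that
# Wilks' theorem strictly describes T for data generated under the null).
T_sim = np.empty(n_sims)
for i in range(n_sims):
    mu, theta = prior_sample()           # (mu, theta) ~ prior
    x = simulate(mu, theta)              # x = f(mu, theta) + Gaussian noise
    T_sim[i] = llr_statistic(x, theta_init, full_init,
                             posterior_log_prob, log_prior)

# Expected: chi^2 with dof = (# free params, full) - (# free params, null).
dof = 6
T_obs = llr_statistic(x_obs, theta_init, full_init,
                      posterior_log_prob, log_prior)
p_asymptotic = chi2.sf(T_obs, df=dof)    # p-value against the chi^2 reference
p_montecarlo = np.mean(T_sim >= T_obs)   # p-value against the simulated T's
```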

However, this is what I obtain for two separate models, each with a parameter difference of 6:

[two plots: histograms of the simulated LLR statistic for each model]

The first model gives something close to a $\chi^2$ with 19 degrees of freedom; the second does not look like a $\chi^2$ at all. I would expect a $\chi^2$ with 6 degrees of freedom in both cases. What am I missing?

  • Your distribution in the second plot doesn't look even remotely "uniform," so please explain what you mean by this and how you are reading your plot. Note, too, that the theory is purely asymptotic, which means it's important to disclose how much data you have. Finally, the result is not expected to follow a chi-squared distribution when the MLEs are on the boundary, which easily can happen with mixture models, so it's useful to display summary statistics or graphics about the estimates.
    – whuber
    Commented Jun 5 at 22:53
  • I agree, it does not look uniform; by that I only meant that it doesn't look like a $\chi^2$. I have around 1700 LRs for the first plot and 150 for the second. Could you please elaborate on "MLEs are on the boundary"? Also, what summary statistics or graphics would be useful to say more? My bad for the naive questions; I'm not a stats expert at all.
    Commented Jun 5 at 23:09
  • 1. "I do not have access to the likelihood." How can this be? Your first equation is the probability of $x$ given the parameters, which is the same as the likelihood of the parameters given $x$. 2. "I calculate the LLR statistic for several x by sampling from the prior of theta as x= f(μ, theta)." This is not even remotely like how you calculate a likelihood ratio, so I am not surprised you aren't getting distributions that look like those of the likelihood ratio. 3. Please format your math using MathJax.
    – jbowman
    Commented Jun 6 at 1:03
  • 1. Well, basically I have an implicit likelihood: I train a normalizing flow to obtain the posterior distribution directly, using pairs $(f(\theta) + \text{Gaussian noise},\ \theta)$. I guess in some sense I do have access to the likelihood; let me change that. I implement the mixture by sampling from a Bernoulli distribution with a certain probability (a free parameter), and the outcome decides which model the network is trained on, so the network sees varying proportions of the two models.
    Commented Jun 6 at 7:52
  • 2. "This is not even remotely like how you calculate a likelihood ratio": basically, I calculate the likelihood ratio using the equation $T = -2\left(\log p(x \mid \mu=0, \theta(\mu=0)) - \log p(x \mid \mu^*, \theta^*)\right)$. I do this for several $x$ and plot the distribution; I obtain each $x$ by sampling from the prior and plugging the draw into the simulator. If this is what you understood, could you please tell me how I can generate the $\chi^2$ distribution?
    Commented Jun 6 at 7:57
