Consider this simple example where I have 2d data that looks like this:
I'm trying to find the parameters of a Normal distribution that maximize the difference between two likelihood functions $$L_1-L_2=\prod^{n_i}_{i=1}f(x_i|\mu,\sigma^2)-\prod^{n_k}_{k=1}f(x_k|\mu,\sigma^2).$$ Here, the $x_i$ may be the black points and the $x_k$ may be the red points. I'm looking for $\theta=\{\mu, \sigma^2\}$ that makes the joint density of the black points as large as possible while keeping the joint density of the red points as small as possible.
I already tried another approach: maximizing the ratio $$\frac{L_1}{L_2}=\frac{\prod^{n_i}_{i=1}f(x_i|\mu,\sigma^2)}{\prod^{n_k}_{k=1}f(x_k|\mu,\sigma^2)},$$ which led me to $$\hat\mu=\frac{\sum^{n_i}_{i=1}x_i-\sum^{n_k}_{k=1}x_k}{n_i-n_k}$$ and $$\hat\sigma^2=\frac{\sum^{n_i}_{i=1}(x_i-\hat\mu)^2-\sum^{n_k}_{k=1}(x_k-\hat\mu)^2}{n_i-n_k}.$$ However, $\hat\sigma^2$ can quickly become negative, which confused me. (I worked with the univariate case so as not to complicate things further.)
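To make the problem with the ratio estimators concrete, here is a tiny numerical check with made-up univariate data (not the data from my plot): whenever the "red" points are more spread out than the "black" ones, the $\hat\sigma^2$ above comes out negative.

```python
import numpy as np

# Tiny deterministic example (made-up numbers, just to illustrate):
x = np.array([-1.0, 0.0, 1.0])   # "black" points: joint density should be large
y = np.array([-3.0, 3.0])        # "red" points: joint density should be small

n_i, n_k = len(x), len(y)

# Stationary points of L1/L2 derived above
mu_hat = (x.sum() - y.sum()) / (n_i - n_k)
sigma2_hat = (((x - mu_hat) ** 2).sum()
              - ((y - mu_hat) ** 2).sum()) / (n_i - n_k)

print(mu_hat)      # 0.0
print(sigma2_hat)  # -16.0: the red points are more spread out than the
                   # black ones, so the "variance" estimate goes negative
```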
So now I'm trying to maximize the difference between the two likelihood functions stated at the beginning. My question is: is there any mathematical "trick" that simplifies this problem? For the derivation of the standard ML estimators, taking logs simplifies the problem a lot, but here it does not seem to help, since the objective is a difference of products rather than a pure product.
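Absent a closed form, the difference objective can at least be explored numerically. A minimal sketch with `scipy.optimize.minimize`, again on my own toy data (parameterizing by $\log\sigma$ so that $\sigma>0$ is enforced automatically; `neg_diff` is just an illustrative name):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Toy univariate data (illustrative only)
x = np.array([-1.0, 0.0, 1.0])   # "black" points
y = np.array([-3.0, 3.0])        # "red" points

def neg_diff(params):
    """Negative of L1 - L2, with params = (mu, log sigma)."""
    mu, log_sigma = params
    sigma = np.exp(log_sigma)
    l1 = np.prod(norm.pdf(x, loc=mu, scale=sigma))  # joint density, black points
    l2 = np.prod(norm.pdf(y, loc=mu, scale=sigma))  # joint density, red points
    return -(l1 - l2)

# Derivative-free search, since the objective is smooth but has no
# convenient log transform
res = minimize(neg_diff, x0=np.array([0.0, 0.0]), method="Nelder-Mead")
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])
print(mu_hat, sigma_hat, -res.fun)
```

This obviously doesn't answer the question of whether an analytic simplification exists, but it is useful for checking whether the difference objective even has a sensible interior maximum for a given data set.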
Any hint towards related questions/works/papers is also appreciated!