
I'm currently reading the paper "Distance-based detection of out-of-distribution silent failures for Covid-19 lung lesion segmentation", available here: https://arxiv.org/abs/2208.03217

In brief, they use a latent representation of the input to decide whether it is out of distribution, based on the Mahalanobis distance between that latent representation and the training data distribution, which is therefore modeled as a multivariate normal distribution.
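To fix notation, here is a minimal sketch of how I understand the scoring (toy NumPy code on synthetic latents, not the paper's actual pipeline): fit a Gaussian to the training latents, then score a new latent vector by its Mahalanobis distance to that Gaussian.

```python
import numpy as np

# Toy sketch of my understanding of the setup (not the paper's code):
# fit a Gaussian to the training latents, then score a new latent by its
# Mahalanobis distance to that Gaussian.
rng = np.random.default_rng(0)
d = 16                                    # latent dimension (arbitrary here)
train_latents = rng.normal(size=(1000, d))

mu = train_latents.mean(axis=0)
cov = np.cov(train_latents, rowvar=False)
cov_inv = np.linalg.inv(cov)

def mahalanobis(z):
    """Mahalanobis distance of a latent vector z to the training Gaussian."""
    diff = z - mu
    return float(np.sqrt(diff @ cov_inv @ diff))

print(mahalanobis(rng.normal(size=d)))        # in-distribution-like input
print(mahalanobis(rng.normal(size=d) + 5.0))  # shifted input, larger distance
```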

I understand what the Mahalanobis distance represents, and that its square should follow a chi-squared distribution with $d$ degrees of freedom if $x$ follows a $d$-dimensional normal distribution. So the common way of building a confidence region is to find $\alpha$ such that $P(X \le \alpha) = 0.95$ with $X \sim \chi^2_d$, for example, and that makes total sense. But in the paper it feels as if they choose the threshold so that 95% of the training data has a smaller distance, and then any input whose Mahalanobis distance is larger is classified as out of distribution.
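To make the difference I am describing concrete, here is a toy comparison (again my own sketch on synthetic data, not the paper's code) between the chi-squared quantile threshold and the empirical 95th percentile of the training distances. If the latents really were Gaussian, the two thresholds should roughly agree.

```python
import numpy as np
from scipy import stats

# Compare the two thresholding choices on toy Gaussian latents:
# the chi^2 quantile vs the empirical 95th percentile of training distances.
rng = np.random.default_rng(0)
d = 16
train_latents = rng.normal(size=(1000, d))
mu = train_latents.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(train_latents, rowvar=False))

diffs = train_latents - mu
sq_dists = np.einsum("ij,jk,ik->i", diffs, cov_inv, diffs)  # squared Mahalanobis distances

chi2_threshold = stats.chi2.ppf(0.95, df=d)        # theoretical 95% quantile
empirical_threshold = np.quantile(sq_dists, 0.95)  # what I read the paper as doing

print(chi2_threshold, empirical_threshold)  # close to each other if the Gaussian assumption holds
```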

My question is then: isn't it odd not to use the chi-squared distribution? Falling back on an empirical percentile seems like an admission that a normal distribution does not describe the data in the first place, in which case using the Mahalanobis distance would make little sense, no?

(Bonus question, not strictly needed) To check whether a Gaussian distribution was a reasonable choice in the first place, a chi-squared goodness-of-fit test would have been natural, but since the variables are continuous, is there a way to do that?

Thank you in advance,
