Comparing the output distribution of two ML models

Consider a regression task (e.g. predicting house prices) with given training and test sets.

We start by constructing a linear regression model, in which we assume $y_i=x_i^T\beta+\epsilon_i$ with $E[\epsilon_i]=0$ and usually $\epsilon\sim\mathcal{N}(0,\sigma^2I)$. Since we know the true value of the dependent variable $y$, we can define the residuals as $e_i=y_i-\hat y_i$. The residuals are estimates of the "true" errors $\epsilon_i$. All of this is standard in the definition of GLMs.
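As a minimal sketch of the definitions above, the snippet below fits ordinary least squares with NumPy and computes the residuals $e_i=y_i-\hat y_i$. The data are synthetic and purely hypothetical (a stand-in for the house-price data), and the function name `ols_residuals` is illustrative. These are in-sample residuals; for the test set one would apply the same fitted $\hat\beta$ to the test-set design matrix.

```python
import numpy as np

def ols_residuals(X, y):
    """Fit y = b0 + X @ beta + eps by ordinary least squares.

    Returns (beta_hat, residuals), where beta_hat[0] is the intercept.
    """
    X1 = np.column_stack([np.ones(len(X)), X])          # prepend an intercept column
    beta_hat, *_ = np.linalg.lstsq(X1, y, rcond=None)   # least-squares estimate of beta
    y_hat = X1 @ beta_hat                               # fitted values \hat{y}_i
    return beta_hat, y - y_hat                          # residuals e_i = y_i - \hat{y}_i

# Hypothetical synthetic data (not from the question): 3 features, Gaussian noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
beta_true = np.array([2.0, -1.0, 0.5])
y = 10.0 + X @ beta_true + rng.normal(scale=1.0, size=500)

beta_hat, e = ols_residuals(X, y)
print(np.round(beta_hat, 2))   # approximately [10, 2, -1, 0.5]
print(e.mean())                # ~0 up to rounding, because the model has an intercept
```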

Next, we construct another model (e.g. XGBoost) for the same task using the same data. We can calculate the residuals for this model in the same manner.

Now, with the two models at hand, we would like to assess whether the models have the same output distribution, that is, whether the two sets of residuals have the same distribution. Of course, the two distributions will never be exactly identical (I handle this with an equivalence-testing approach), but we can test them for a certain degree of similarity. I can think of some tests for location and dispersion measures, but as we all know, that's not enough.
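For the location part of the comparison, the equivalence-testing idea can be sketched as a paired TOST (two one-sided tests) on the mean difference of the residuals. This assumes both models are scored on the same test observations, so the residuals pair up row by row; the margin `delta`, the function name `paired_tost` and the simulated residuals are hypothetical, and the t-tests rely on the differences being roughly normal or the sample being large. As the question says, this covers only the mean, not the shape of the distributions.

```python
import numpy as np
from scipy import stats

def paired_tost(e_a, e_b, delta):
    """TOST for equivalence of mean residuals: H1 is |mean(e_a - e_b)| < delta.

    Returns the TOST p-value; equivalence is declared at level alpha if p < alpha.
    Needs SciPy >= 1.6 for the `alternative` argument of ttest_1samp.
    """
    d = np.asarray(e_a, dtype=float) - np.asarray(e_b, dtype=float)  # paired differences
    # Test 1: H0 mean(d) <= -delta  vs  H1 mean(d) > -delta
    p_lower = stats.ttest_1samp(d, -delta, alternative="greater").pvalue
    # Test 2: H0 mean(d) >= +delta  vs  H1 mean(d) < +delta
    p_upper = stats.ttest_1samp(d, delta, alternative="less").pvalue
    return max(p_lower, p_upper)

# Hypothetical residuals of two models on the same 1000 test rows.
rng = np.random.default_rng(42)
e_a = rng.normal(size=1000)
e_b_close = e_a + rng.normal(scale=0.5, size=1000)          # same mean, extra noise
e_b_shifted = e_a + 1.0 + rng.normal(scale=0.5, size=1000)  # mean shifted by 1

print(paired_tost(e_a, e_b_close, delta=0.1))    # small: means equivalent within +/-0.1
print(paired_tost(e_a, e_b_shifted, delta=0.1))  # near 1: equivalence not shown
```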

Unlike with GLMs, for other models (especially ensembles) we can't make assumptions about the residual distribution, which is the main issue here. If both models have normal residuals with close enough $\mu,\sigma$, that's one thing; if the parameters differ substantially, that's another story. And if one model has (for example) normal residuals $\sim\mathcal{N}(0,1)$ and the other has uniform residuals $\sim\mathcal{U}[-3,3]$, I want to be able to spot this. The same applies to differences in the tails: a sub-exponential distribution need not be sub-Gaussian. I think you get the point.
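The claim that location and dispersion checks are not enough can be made concrete with a small simulation. Here the uniform is rescaled to $\mathcal{U}[-\sqrt3,\sqrt3]$ (an assumption, chosen instead of the question's $\mathcal{U}[-3,3]$ so that mean and variance match $\mathcal{N}(0,1)$ exactly). A mean/variance comparison then sees nothing, while the excess kurtosis (0 for the normal, $-1.2$ for the uniform) and the two-sample Kolmogorov-Smirnov statistic separate the two samples. The samples here are independent simulated draws, not residuals from real models.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 5000
e_norm = rng.normal(0.0, 1.0, size=n)   # N(0, 1): mean 0, variance 1
a = np.sqrt(3.0)
e_unif = rng.uniform(-a, a, size=n)     # U[-sqrt(3), sqrt(3)]: mean 0, variance a^2/3 = 1

for name, e in [("normal", e_norm), ("uniform", e_unif)]:
    # stats.kurtosis uses Fisher's definition by default, so a normal gives ~0
    print(f"{name:8s} mean={e.mean():+.3f}  sd={e.std(ddof=1):.3f}  "
          f"excess kurtosis={stats.kurtosis(e):+.2f}")

# The KS statistic is the largest vertical gap between the two empirical CDFs.
ks = stats.ks_2samp(e_norm, e_unif)
print(f"KS D={ks.statistic:.3f}, p={ks.pvalue:.2g}")
```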

I had some thoughts, such as:

Using nonparametric two-sample tests for distributions (Kolmogorov-Smirnov or Anderson-Darling), although both are heavily criticized
Estimating the Kullback-Leibler divergence (KLD) and then doing inference on it (a possible problem: its sampling distribution is known only in some cases)
Computing the empirical CDFs and then measuring the area between them (but how do I test it? I can't simply use the Raju method for ECDFs)
Drawing a QQ plot for the sets of residuals, but then I'm not sure what to do with deviations from the identity line
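Several of these ideas are closely related, which a short sketch can show. In one dimension the area between the two ECDFs is exactly the 1-Wasserstein distance (`scipy.stats.wasserstein_distance`), and for two samples of equal size it also equals the mean absolute vertical distance of the QQ-plot points from the line $y=x$, since both reduce to the average gap between sorted values. Because residuals of two models on the same test set are paired, a permutation test that swaps the two models' residuals within each observation is one way to attach a p-value to the ECDF area; its null (exchangeability within each pair) is stronger than "same marginal distribution". `scipy.stats.ks_2samp` and `scipy.stats.anderson_ksamp` assume independent samples, so they appear only for comparison. The function names and simulated residuals are hypothetical.

```python
import numpy as np
from scipy import stats

def ecdf_area(a, b):
    """Area between the two empirical CDFs (= 1-Wasserstein distance in 1-D)."""
    return stats.wasserstein_distance(a, b)

def qq_deviation(a, b):
    """Mean and max absolute distance of QQ points from y = x (equal-size samples)."""
    if len(a) != len(b):
        raise ValueError("qq_deviation expects samples of equal size")
    gaps = np.abs(np.sort(a) - np.sort(b))   # QQ points are (a_(i), b_(i))
    return gaps.mean(), gaps.max()

def paired_permutation_pvalue(e_a, e_b, statistic=ecdf_area, n_perm=999, seed=0):
    """Permutation p-value for `statistic`, swapping residuals within each observation.

    Null hypothesis: each pair (e_a[i], e_b[i]) is exchangeable.
    """
    rng = np.random.default_rng(seed)
    e_a = np.asarray(e_a, dtype=float)
    e_b = np.asarray(e_b, dtype=float)
    observed = statistic(e_a, e_b)
    exceed = 0
    for _ in range(n_perm):
        swap = rng.random(e_a.size) < 0.5            # fair coin per observation
        perm_a = np.where(swap, e_b, e_a)
        perm_b = np.where(swap, e_a, e_b)
        exceed += int(statistic(perm_a, perm_b) >= observed)
    return observed, (exceed + 1) / (n_perm + 1)     # add-one so p is never 0

# Hypothetical residuals: same mean and variance, different shape.
rng = np.random.default_rng(7)
n = 2000
e_a = rng.normal(size=n)
e_b = rng.uniform(-np.sqrt(3), np.sqrt(3), size=n)

area, p_perm = paired_permutation_pvalue(e_a, e_b)
qq_mean, qq_max = qq_deviation(e_a, e_b)
print(f"ECDF area = {area:.4f}, QQ mean gap = {qq_mean:.4f}, QQ max gap = {qq_max:.3f}")
print(f"paired permutation p = {p_perm:.3f}")

# Independent-sample tests, for comparison only (they ignore the pairing).
print(stats.ks_2samp(e_a, e_b).pvalue)
print(stats.anderson_ksamp([e_a, e_b]).statistic)
```

For equivalence rather than difference, one could instead bootstrap a confidence interval for the ECDF area (resampling observation pairs) and check whether its upper bound falls below a pre-chosen margin; choosing that margin is a substantive decision the data cannot make.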

Any ideas?

edited Nov 13, 2023 at 18:05

asked Nov 13, 2023 at 14:08

Spätzle


"Have the same distribution" has various meanings depending on what probability assumptions you make and on the details of selecting the models. Could you be more specific about those elements of your question?
– whuber ♦
Commented Nov 13, 2023 at 14:14
Also, the answer is basically assured to be that, no, the distributions are not the same. Is there some kind of downstream task for which you need, say, a p-value?
– Dave
Commented Nov 13, 2023 at 14:20

@Dave Of course there will always be some difference; that's why I use equivalence testing (rather than significance testing). Still, I find it rather difficult to establish a statistical starting point to work with.
– Spätzle
Commented Nov 13, 2023 at 14:57
@whuber I want to make as few assumptions as possible. Sure, under a normality assumption it is easy to work with, but we cannot assume that. I already have a test for equivalent means of the residuals, so that's a nice start. Knowing essentially nothing about the distributions, I must go nonparametric, hence KS / AD / KLD / ECDF or maybe a QQ plot; but then again, I'm not sure how I should deal with the deviations in the QQ plot (i.e. how to measure the distance from the line $y=x$ and what to do with it).
– Spätzle
Commented Nov 13, 2023 at 15:30

The problem is that the procedure you use partially determines the distribution of the residuals. You can't make much progress unless you specify some strong conditions. But regardless, unless the procedures always give identical results, by construction they will induce (at least slightly) different distributions of the residuals. In this sense the question is meaningless. Perhaps it ought to be rephrased as a question of how the residual distributions might differ due to the fitting procedure.
– whuber ♦
Commented Nov 13, 2023 at 16:31


Tags: distributions, nonparametric, residuals, model-comparison, empirical-cumulative-distr-fn
