
I have a dataset with the loss rate of each contract as the dependent variable. As independent variables I have country (four values), profession (five values) and income (a continuous variable). I apply binning and create a few monotonic buckets for each independent variable. Then I apply target encoding, using the mean loss rate of each bucket as the independent variable in the regression.
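
A minimal sketch of the target-encoding step described above, assuming each contract already carries a bucket label: every label is replaced by the mean loss rate of all contracts that share it. The function name and the toy data are hypothetical.

```python
from collections import defaultdict


def target_encode(buckets, loss_rates):
    """Replace each contract's bucket label with the mean loss rate of its bucket.

    buckets:    bucket label of each contract (hypothetical input format)
    loss_rates: loss rate of each contract, in the same order
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for b, y in zip(buckets, loss_rates):
        totals[b] += y
        counts[b] += 1
    means = {b: totals[b] / counts[b] for b in totals}
    # Note: each contract's own loss rate is part of its bucket mean
    return [means[b] for b in buckets]


# Hypothetical toy data: country bucket and loss rate of five contracts
country_bucket = [1, 1, 2, 2, 2]
loss = [0.02, 0.04, 0.10, 0.12, 0.08]
print(target_encode(country_bucket, loss))
```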

Then I run a multiple linear regression. For example, if contract 1 comes from a country in bucket 1, has a profession in bucket 1, and falls into income bucket 2, the model is:

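$$\text{loss rate}_1 = \alpha + \beta_1 \, \overline{LR}^{\,\text{country}}_{1} + \beta_2 \, \overline{LR}^{\,\text{profession}}_{1} + \beta_3 \, \overline{LR}^{\,\text{income}}_{2}$$

where $\overline{LR}^{\,\text{country}}_{1}$ is the mean loss rate of all contracts from countries in bucket 1, $\overline{LR}^{\,\text{profession}}_{1}$ is the mean loss rate of all contracts with professions in bucket 1, and $\overline{LR}^{\,\text{income}}_{2}$ is the mean loss rate of all contracts within income bucket 2.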
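
A minimal pure-Python sketch of how a model of this form is estimated: each row of the design matrix holds an intercept plus the three target-encoded features, and $\alpha, \beta_1, \beta_2, \beta_3$ come from the normal equations. The `ols` helper and the toy numbers are hypothetical; a statistics library would also report standard errors.

```python
def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    X: list of rows [1, country_mean, profession_mean, income_mean]
    y: list of contract loss rates
    Returns [alpha, beta1, beta2, beta3].
    """
    p = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(r[i] * r[j] for r in X) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(X, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    # Only the diagonal remains in the first p columns
    return [A[i][p] / A[i][i] for i in range(p)]


# Hypothetical encoded rows: [intercept, country mean, profession mean, income mean]
X_demo = [
    [1, 0.03, 0.05, 0.02],
    [1, 0.03, 0.08, 0.06],
    [1, 0.10, 0.05, 0.06],
    [1, 0.10, 0.08, 0.02],
    [1, 0.03, 0.05, 0.06],
]
y_demo = [0.035, 0.055, 0.085, 0.080, 0.045]
alpha, beta1, beta2, beta3 = ols(X_demo, y_demo)
print(alpha, beta1, beta2, beta3)
```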

How should I interpret the coefficients and results of this regression? And is there a bias, given that the mean loss rate of each bucket depends on the loss rate of every contract in that bucket, including the contract being predicted?
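
One common way to keep a contract's own loss rate out of its encoded feature is leave-one-out target encoding: each contract gets the mean loss rate of the *other* contracts in its bucket. This is a swapped-in technique, not the procedure described above, and the fallback for single-contract buckets (the mean of all other contracts) is an assumption; the function name is hypothetical.

```python
from collections import defaultdict


def loo_target_encode(buckets, loss_rates):
    """Leave-one-out target encoding (assumes at least two contracts overall).

    Each contract is encoded with the mean loss rate of the other contracts
    in its bucket, so its own loss rate never enters its own feature.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for b, y in zip(buckets, loss_rates):
        totals[b] += y
        counts[b] += 1
    grand_total, n_all = sum(loss_rates), len(loss_rates)
    encoded = []
    for b, y in zip(buckets, loss_rates):
        if counts[b] > 1:
            encoded.append((totals[b] - y) / (counts[b] - 1))
        else:
            # Singleton bucket: fall back to the mean of all other contracts (assumption)
            encoded.append((grand_total - y) / (n_all - 1))
    return encoded


print(loo_target_encode([1, 1, 2, 2, 2], [0.02, 0.04, 0.10, 0.12, 0.08]))
```

Leave-one-out encoding still depends on the contract's own loss rate in a weaker way (within a bucket, a higher loss rate gives a lower encoded value), so out-of-fold encoding, where bucket means are computed on other folds of the data, is a frequently used alternative.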

  • Why are you binning and exactly how are you doing that?
    – Peter Flom
    Commented May 6 at 8:57
  • I apply binning to capture possible non-linear relationships. I group the contracts based on mean loss rate (e.g., countries 1, 2, and 6 have similar mean loss rates, so all contracts from these countries go into one bucket; countries 3 and 7 have a higher mean loss rate, so their contracts go into a second bucket, ...). What bothers me is that I am not sure it makes much sense to predict the loss rate of a particular contract with the mean loss rate of the bucket it belongs to.
    – Vit123
    Commented May 6 at 9:22
  • That's not a good method to catch nonlinearity. Much better to use a spline.
    – Peter Flom
    Commented May 6 at 10:23
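
The grouping described in the comments (countries with similar mean loss rates share a bucket) can be sketched as follows. The exact rule is not stated, so this assumes a hypothetical one: sort categories by mean loss rate and open a new bucket whenever the next mean exceeds the previous one by more than a threshold `gap`. The function name and data are hypothetical.

```python
from collections import defaultdict


def bucket_categories(categories, loss_rates, gap):
    """Group categories (e.g. countries) into buckets ordered by mean loss rate.

    Hypothetical rule: sort categories by mean loss rate and start a new bucket
    whenever the next mean is more than `gap` above the previous one.
    Returns {category: bucket_number}; bucket 1 has the lowest loss rates.
    """
    totals, counts = defaultdict(float), defaultdict(int)
    for c, y in zip(categories, loss_rates):
        totals[c] += y
        counts[c] += 1
    means = {c: totals[c] / counts[c] for c in totals}
    buckets, bucket, prev = {}, 1, None
    for c in sorted(means, key=means.get):
        if prev is not None and means[c] - prev > gap:
            bucket += 1
        buckets[c] = bucket
        prev = means[c]
    return buckets


# Hypothetical: one contract per country, loss rates as in the comment's example
print(bucket_categories([1, 2, 6, 3, 7], [0.02, 0.025, 0.03, 0.10, 0.11], gap=0.03))
```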
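
Following the spline suggestion, income could enter the regression through spline columns instead of income buckets. A minimal sketch of a restricted (natural) cubic spline basis in truncated-power form, assuming hand-picked knots and income measured in thousands; the function name and knot values are hypothetical, the basis is left unscaled, and established libraries provide tested spline transforms.

```python
def rcs_basis(x, knots):
    """Restricted cubic spline basis for a single value x.

    Returns [x, S_1(x), ..., S_{k-2}(x)] for k knots. The implied curve is
    cubic between knots and linear below the first and beyond the last knot.
    """
    t = sorted(knots)
    k = len(t)
    if k < 3:
        raise ValueError("need at least 3 knots")

    def pos3(u):
        # Truncated cube (u)_+^3
        return u ** 3 if u > 0 else 0.0

    d = t[-1] - t[-2]
    row = [x]
    for j in range(k - 2):
        row.append(
            pos3(x - t[j])
            - pos3(x - t[-2]) * (t[-1] - t[j]) / d
            + pos3(x - t[-1]) * (t[-2] - t[j]) / d
        )
    return row


# Hypothetical knots for income in thousands (e.g. chosen at quantiles)
knots = [20, 40, 60, 90]
for income in (15, 50, 120):
    print(income, rcs_basis(income, knots))
```

Each returned column gets its own coefficient in the regression, so the shape of the income effect is estimated from the data rather than fixed by bucket boundaries.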