
Say I have a binary response that I want to model with logistic regression on covariates $x$. Fitting with PROC LOGISTIC gives MLE coefficients for the model

$$ \text{logit}(\pi) = \alpha + \beta' x $$

where $\pi = \pi(x) = \mathbb{P}(y=1|x)$.

If we build this model on a training dataset, we can use OUTMODEL to save the model information. If we then want predicted probabilities on a new dataset, we can use INMODEL and the SCORE statement to score it.
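For concreteness, a minimal sketch of that workflow (the dataset names `train` and `new`, the response `y`, and the covariates `x1`, `x2` are placeholders):

```sas
/* Fit on the training data and save the model information. */
proc logistic data=train outmodel=logit_store;
   model y(event='1') = x1 x2;
run;

/* Score a new dataset using the saved model. */
proc logistic inmodel=logit_store;
   score data=new out=scored;
run;
```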

Now, I've noticed that there is an option PRIOR/PRIOREVENT within the SCORE statement, which allows one to specify a prior probability for the event $y=1$. I understand why this is useful, for example when the class proportions in the training dataset are very different from the true proportions in the population (e.g. rare-disease classification).
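With PRIOREVENT, the scoring step above would look something like this (the value 0.01 is just an illustrative prior):

```sas
proc logistic inmodel=logit_store;
   score data=new out=scored_adj priorevent=0.01;
run;
```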

SAS documentation says "By specifying the correct priors, the posterior probabilities are adjusted appropriately." So using this option should adjust the predicted probabilities.

My question is: how is this adjustment made?

I have the following theory, inspired by this article, but I can't find anything in the SAS documentation to confirm it:

Let $p_\text{train}$ be the proportion of events in the training data, and let $p_\text{prior}$ be the prior probability of the event. Then at the log-odds level, we make the adjustment: $$ \alpha + \beta' x + \left(\log\left(\frac{p_\text{prior}}{1-p_\text{prior}}\right) - \log\left(\frac{p_\text{train}}{1-p_\text{train}}\right)\right). $$ In other words, our adjusted predicted (posterior) probabilities become $$ \text{logit}^{-1}\left(\alpha + \beta' x + \log\left(\frac{p_\text{prior}}{1-p_\text{prior}}\right) - \log\left(\frac{p_\text{train}}{1-p_\text{train}}\right)\right). $$
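If this theory is right, the adjustment could be reproduced by hand from the unadjusted scores along these lines (a sketch only; it assumes the unadjusted event probability is in the SCORE output variable `P_1`, and the proportions 0.30 and 0.01 are made up):

```sas
data adjusted;
   set scored;                          /* unadjusted scores from SCORE      */
   p_train = 0.30;                      /* assumed training event proportion */
   p_prior = 0.01;                      /* assumed true prior                */
   xbeta   = log(P_1 / (1 - P_1));      /* recover alpha + beta'x            */
   offset  = log(p_prior/(1 - p_prior)) - log(p_train/(1 - p_train));
   p_adj   = 1 / (1 + exp(-(xbeta + offset)));
run;
```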

EDIT: I found here that SAS gives the adjusted probabilities as: $$ \mathbb{P}(y=i|x) = \frac{\mathbb{P}_0(y=i|x)\cdot \frac{p_n(y=i)}{p_0(y=i)}}{\sum_j \mathbb{P}_0(y=j|x)\cdot \frac{p_n(y=j)}{p_0(y=j)}}, $$ where $p_0(y=i)$ is the old class probability (from the proportion in the training set) and $p_n(y=i)$ is the new prior specified with PRIOR, and $\mathbb{P}_0$ denotes the predicted probabilities without adjustments.
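Written out for the binary case, that documented formula amounts to the following (again, the class proportions here are assumptions for illustration, and `P_1`, `P_0` are the unadjusted scored probabilities):

```sas
data reweighted;
   set scored;
   p0_1 = 0.30;  p0_0 = 0.70;           /* old (training) class probabilities */
   pn_1 = 0.01;  pn_0 = 0.99;           /* new priors                         */
   num   = P_1 * (pn_1 / p0_1);
   den   = num + P_0 * (pn_0 / p0_0);
   p_adj = num / den;                   /* adjusted posterior for y = 1       */
run;
```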

I find this expression interesting, and I think it might actually be equivalent to the answer I suggested above, but I don't have a proof. However, one can take this expression, use it to calculate the odds, simplify, and apply Bayes' rule to find that the adjusted odds equal the data likelihood ratio multiplied by the prior odds of $y$. Given the article linked earlier, this suggests to me that they are equivalent.
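Making that odds calculation explicit for the binary case (a sketch, assuming $p_0(y=1)=p_\text{train}$ and $p_n(y=1)=p_\text{prior}$): taking the ratio of the two adjusted probabilities, the normalizing sum cancels, giving $$ \frac{\mathbb{P}(y=1|x)}{\mathbb{P}(y=0|x)} = \frac{\mathbb{P}_0(y=1|x)}{\mathbb{P}_0(y=0|x)} \cdot \frac{p_\text{prior}/(1-p_\text{prior})}{p_\text{train}/(1-p_\text{train})}. $$ Since $\log\frac{\mathbb{P}_0(y=1|x)}{\mathbb{P}_0(y=0|x)} = \alpha + \beta'x$, taking logs gives $$ \text{logit}\,\mathbb{P}(y=1|x) = \alpha + \beta'x + \log\left(\frac{p_\text{prior}}{1-p_\text{prior}}\right) - \log\left(\frac{p_\text{train}}{1-p_\text{train}}\right), $$ which looks exactly like the adjustment I suggested above.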
