
Say I have a binary response that I want to model with logistic regression on covariates $x$. Fitting with PROC LOGISTIC gives MLE coefficients for the model

$$ \text{logit}(\pi) = \alpha + \beta' x $$

where $\pi = \pi(x) = \mathbb{P}(y=1|x)$.

If we build this model on a training dataset, we can use OUTMODEL to save the model information. If we then want predicted probabilities on a new dataset, we can use INMODEL and the SCORE statement to score it.
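For concreteness, a minimal sketch of that workflow (the dataset names `train` and `new`, the response `y`, and the covariates `x1`, `x2` are placeholders):

```sas
/* Fit on the training data and save the model information. */
proc logistic data=train outmodel=logit_store;
   model y(event='1') = x1 x2;
run;

/* Score a new dataset using the saved model. */
proc logistic inmodel=logit_store;
   score data=new out=scored;
run;
```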

Now, I've noticed that there is an option PRIOR/PRIOREVENT within the SCORE statement, which allows one to specify a prior probability for the event $y=1$. I understand why this is useful, for example when the class proportions in the training dataset are very different from the true proportions in the population (e.g. rare-disease classification).
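With PRIOREVENT, the scoring step above would look something like this (the value 0.01 is just an illustrative prior):

```sas
proc logistic inmodel=logit_store;
   score data=new out=scored_adj priorevent=0.01;
run;
```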

SAS documentation says "By specifying the correct priors, the posterior probabilities are adjusted appropriately." So using this option should adjust the predicted probabilities.

My question is: how is this adjustment made?

I have the following theory, inspired by this article, but I can't find anything in the SAS documentation to confirm it:

Let $p_\text{train}$ be the proportion of events in the training data, and let $p_\text{prior}$ be the prior probability of the event. Then at the log-odds level, we make the adjustment: $$ \alpha + \beta' x + \left(\log\left(\frac{p_\text{prior}}{1-p_\text{prior}}\right) - \log\left(\frac{p_\text{train}}{1-p_\text{train}}\right)\right). $$ In other words, our adjusted predicted (posterior) probabilities become $$ \text{logit}^{-1}\left(\alpha + \beta' x + \log\left(\frac{p_\text{prior}}{1-p_\text{prior}}\right) - \log\left(\frac{p_\text{train}}{1-p_\text{train}}\right)\right). $$
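If this theory is right, the adjustment could be reproduced by hand from the unadjusted scores along these lines (a sketch only; it assumes the unadjusted event probability is in the SCORE output variable `P_1`, and the proportions 0.30 and 0.01 are made up):

```sas
data adjusted;
   set scored;                          /* unadjusted scores from SCORE      */
   p_train = 0.30;                      /* assumed training event proportion */
   p_prior = 0.01;                      /* assumed true prior                */
   xbeta   = log(P_1 / (1 - P_1));      /* recover alpha + beta'x            */
   offset  = log(p_prior/(1 - p_prior)) - log(p_train/(1 - p_train));
   p_adj   = 1 / (1 + exp(-(xbeta + offset)));
run;
```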

EDIT: I found here that SAS gives the adjusted probabilities as: $$ \mathbb{P}(y=i|x) = \frac{\mathbb{P}_0(y=i|x)\cdot \frac{p_n(y=i)}{p_0(y=i)}}{\sum_j \mathbb{P}_0(y=j|x)\cdot \frac{p_n(y=j)}{p_0(y=j)}}, $$ where $p_0(y=i)$ is the old class probability (from the proportion in the training set) and $p_n(y=i)$ is the new prior specified with PRIOR, and $\mathbb{P}_0$ denotes the predicted probabilities without adjustments.
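Written out for the binary case, that documented formula amounts to the following (again, the class proportions here are assumptions for illustration, and `P_1`, `P_0` are the unadjusted scored probabilities):

```sas
data reweighted;
   set scored;
   p0_1 = 0.30;  p0_0 = 0.70;           /* old (training) class probabilities */
   pn_1 = 0.01;  pn_0 = 0.99;           /* new priors                         */
   num   = P_1 * (pn_1 / p0_1);
   den   = num + P_0 * (pn_0 / p0_0);
   p_adj = num / den;                   /* adjusted posterior for y = 1       */
run;
```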

I find this expression interesting, and I think it might actually be equivalent to the answer I suggested above, but I don't have a proof. However, one can take this expression, use it to calculate the odds, simplify, and apply Bayes' rule to find that the adjusted odds equal the data likelihood ratio multiplied by the prior odds of $y$. Given the article linked earlier, this suggests to me that they are equivalent.
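Making that odds calculation explicit for the binary case (a sketch, assuming $p_0(y=1)=p_\text{train}$ and $p_n(y=1)=p_\text{prior}$): taking the ratio of the two adjusted probabilities, the normalizing sum cancels, giving $$ \frac{\mathbb{P}(y=1|x)}{\mathbb{P}(y=0|x)} = \frac{\mathbb{P}_0(y=1|x)}{\mathbb{P}_0(y=0|x)} \cdot \frac{p_\text{prior}/(1-p_\text{prior})}{p_\text{train}/(1-p_\text{train})}. $$ Since $\log\frac{\mathbb{P}_0(y=1|x)}{\mathbb{P}_0(y=0|x)} = \alpha + \beta'x$, taking logs gives $$ \text{logit}\,\mathbb{P}(y=1|x) = \alpha + \beta'x + \log\left(\frac{p_\text{prior}}{1-p_\text{prior}}\right) - \log\left(\frac{p_\text{train}}{1-p_\text{train}}\right), $$ which looks exactly like the adjustment I suggested above.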
