When using a SCORE statement in PROC LOGISTIC in SAS, I can get fit statistics with FITSTAT. My response variable is binary.
I want to get log likelihood, but looking at this documentation, I'm actually not sure if SAS is using the right formula for log likelihood. This is the formula given in the SAS documentation (my data doesn't have any frequency or weights columns): $$ \log L =\sum_i f_i w_i \log(\hat\pi_i). $$
In particular, the $\hat\pi_i$ in the formula is described as the 'predicted probability of the observation'. Does this mean the 'corrected' probability, i.e. the predicted probability of the observed outcome $y_i$? See this page for a reference. It seems particularly suspicious when you look at the binary Brier score formula in the SAS documentation, which seems to suggest that $\hat\pi_i$ is not the corrected probability but simply the probability that $y_i=1$ (though perhaps I'm reading that wrong; I'm not sure about all the $f_i$, $w_i$, $n_i$, and $r_i$ variables).
My understanding is that the log likelihood should be: $$ \sum_i \left[ y_i \log(p_i) + (1-y_i)\log(1-p_i) \right] $$ where $p_i$ is the predicted probability that $y_i=1$. Or, equivalently, $$ \sum_i \log(P_i) $$ where $P_i$ is the 'corrected' probability (i.e. $P_i = p_i$ if $y_i=1$ and $P_i = 1-p_i$ if $y_i=0$).
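Because $y_i \in \{0,1\}$, only one of the two terms survives for each observation, which is why the two forms agree. With no frequency or weight variables ($f_i = w_i = 1$), the SAS formula reduces to $\sum_i \log(\hat\pi_i)$, so it matches the log likelihood when $\hat\pi_i$ is read as $P_i$:

$$
\begin{aligned}
y_i \log(p_i) + (1-y_i)\log(1-p_i)
  &= \begin{cases} \log(p_i) & \text{if } y_i = 1 \\ \log(1-p_i) & \text{if } y_i = 0 \end{cases}
   = \log(P_i), \\
\sum_i f_i w_i \log(\hat\pi_i)\Big|_{f_i = w_i = 1}
  &= \sum_i \log(\hat\pi_i) = \sum_i \log(P_i) \quad \text{when } \hat\pi_i = P_i .
\end{aligned}
$$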
So basically I'm not sure I trust SAS here. After posting this I will try to check it on a small dataset, but I would appreciate any insight.
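A check along those lines could start with a hand computation like the sketch below. It assumes made-up outcomes `y` and made-up predicted probabilities `p` of $y_i = 1$; the helper names are hypothetical and nothing here calls SAS. It computes the Bernoulli form, the 'corrected' form, and the value the SAS formula would give if $\hat\pi_i$ were always $P(y_i = 1)$, so you can see which one matches the number FITSTAT reports for the same predictions.

```python
import math

# Hypothetical helpers for checking the formulas by hand; these do not call SAS.

def loglik_bernoulli(y, p):
    """Sum of y*log(p) + (1 - y)*log(1 - p), where p[i] is the predicted P(y_i = 1)."""
    return sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi) for yi, pi in zip(y, p))

def loglik_observed_outcome(y, p):
    """Sum of log(P_i), where P_i = p_i if y_i = 1 and 1 - p_i if y_i = 0."""
    return sum(math.log(pi if yi == 1 else 1 - pi) for yi, pi in zip(y, p))

def loglik_event_prob_only(p):
    """Sum of log(p_i) regardless of y: the SAS formula if pi_hat were always P(y = 1)."""
    return sum(math.log(pi) for pi in p)

if __name__ == "__main__":
    # Made-up binary outcomes and predicted P(y = 1) for five observations.
    y = [1, 0, 1, 1, 0]
    p = [0.8, 0.3, 0.6, 0.9, 0.2]

    ll = loglik_bernoulli(y, p)
    print("Bernoulli form:        ", ll)
    print("Observed-outcome form: ", loglik_observed_outcome(y, p))
    print("Event-probability only:", loglik_event_prob_only(p))
    print("-2 log L:              ", -2 * ll)
```

With a real model, `p` would be replaced by the predicted probabilities that the SCORE statement writes out for each scored observation. If the FITSTAT value agrees with the first two numbers rather than the third, SAS is using the predicted probability of the observed outcome.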
…, but it then also takes the predicted probability of the observed outcome as $\pi_i$, which makes the two formulae equivalent (apart from the normalization constant that is included in the default reported fit statistics).