2
$\begingroup$

I am an MBA student trying to understand how to calculate the confidence interval for an odds ratio. Our professor gave us this link (https://www.ncbi.nlm.nih.gov/books/NBK431098/), which contains the formula for the confidence interval of the odds ratio:

  • Upper 95% CI = e ^ [ln(OR) + 1.96 sqrt(1/a + 1/b + 1/c + 1/d)]
  • Lower 95% CI = e ^ [ln(OR) - 1.96 sqrt(1/a + 1/b + 1/c + 1/d)]
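
As a quick illustration of how these two formulas are evaluated, here is a hand calculation with made-up counts (a = 10, b = 20, c = 90, d = 80 are assumptions for the example only, not numbers from the linked page), taking OR = (a/c)/(b/d) = ad/(bc):

$$ \mathrm{OR} = \frac{10 \times 80}{20 \times 90} \approx 0.444, \qquad \ln(\mathrm{OR}) \approx -0.811, \qquad \sqrt{\frac{1}{10}+\frac{1}{20}+\frac{1}{90}+\frac{1}{80}} = \frac{5}{12} \approx 0.417 $$

$$ \text{Lower 95% CI} = e^{-0.811 - 1.96 \times 0.417} \approx 0.196, \qquad \text{Upper 95% CI} = e^{-0.811 + 1.96 \times 0.417} \approx 1.006 $$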

I tried to work out the same formula but I could not get the same answer. For example:

Suppose a "Treatment" results in "a" people "Dying" and "c" people "Surviving", where "n1" people were given the treatment. And suppose the "Control" results in "b" people "Dying" and "d" people "Surviving", where "n2" people were in the control group.

  • The Odds Ratio = (a/c)/(b/d)

Since the confidence interval is related to the variance, we need to find the variance of "(a/c)/(b/d)". In other words, what is VAR((a/c)/(b/d))? Using the law of total variance, I think VAR((a/c)/(b/d)) = VAR(a/c) + VAR(b/d). And since these are basically proportions, I think that (based on the formula for the variance of a proportion):

  • VAR(Odds Ratio) = VAR((a/c)/(b/d)) = VAR(a/c) + VAR(b/d) = [((a/c)(1-a/c))/n1] + [((b/d)(1-b/d))/n2]

But as we can see, my formula does not match the formula in the link.

Can someone help me out and show me what I might be doing incorrectly?

$\endgroup$
2

2 Answers

2
$\begingroup$

Here is the derivation using the delta method. Let's look at the familiar $2\times2$-Table below.

2x2 Table

Suppose that $\theta = f(p_{11},p_{12},p_{21},p_{22})$ where $p_{ij}$ is defined as in the table above.

The odds ratio is defined as $$ \theta=\mathrm{OR}=\dfrac{p_{11}p_{22}}{p_{21}p_{12}} $$

We want to derive the variance of its estimator $\hat{\theta}$. The multivariable version of the delta method is: $$ \operatorname{Var}(\hat{\theta})\approx \nabla f(p_{11}, p_{12}, p_{21}, p_{22})\cdot \operatorname{Cov}(p_{11}, p_{12}, p_{21}, p_{22})\cdot \nabla f(p_{11}, p_{12}, p_{21}, p_{22})^{T} $$ where $\nabla$ is the gradient (a row vector). That is: $$ \nabla f(p_{11}, p_{12}, p_{21}, p_{22}) = \left(\frac{\partial f}{\partial\,p_{11}}, \ldots,\frac{\partial f}{\partial\,p_{22}}\right) $$

We work on the log scale, so we want to estimate $$ \operatorname{Var}(\log(\mathrm{OR}))=\operatorname{Var}\left[\log\left(\frac{p_{11}p_{22}}{p_{21}p_{12}}\right)\right] $$

Let the function $f$ now be the log odds ratio: $$ f = \log(p_{11}) + \log(p_{22}) - \log(p_{12}) - \log(p_{21}) $$

The gradient $\nabla f$ is $$ \nabla f = \left(\frac{1}{p_{11}},-\frac{1}{p_{12}},-\frac{1}{p_{21}},\frac{1}{p_{22}}\right) $$

The variance-covariance matrix of the estimated proportions for a multinomial distribution with $c=4$ categories is $$ \Sigma=\frac{1}{n}\left( \begin{array}{cccc} \left(1-p_{11}\right) p_{11} & -p_{11} p_{12} & -p_{11} p_{21} & -p_{11} p_{22} \\ -p_{11} p_{12} & \left(1-p_{12}\right) p_{12} & -p_{12} p_{21} & -p_{12} p_{22} \\ -p_{11} p_{21} & -p_{12} p_{21} & \left(1-p_{21}\right) p_{21} & -p_{21} p_{22} \\ -p_{11} p_{22} & -p_{12} p_{22} & -p_{21} p_{22} & \left(1-p_{22}\right) p_{22} \\ \end{array} \right) $$

Then $\nabla f\,\Sigma$ equals $$ \nabla f\,\Sigma=\frac{1}{n}\times \left[1, -1, -1, 1\right] $$

Now we need $(\nabla f\,\Sigma)\times \nabla f^{T}$, which equals: $$ (\nabla f\,\Sigma)\times \nabla f^{T}=\frac{1}{n}\times \left[\frac{1}{p_{11}} +\frac{1}{p_{12}}+\frac{1}{p_{21}}+\frac{1}{p_{22}}\right] $$

Substituting the MLEs $\widehat{p_{ij}}=n_{ij}/n$ finally yields $$ \widehat{\operatorname{Var}}(\log(\mathrm{OR}))=\frac{1}{n_{11}}+\frac{1}{n_{12}}+\frac{1}{n_{21}}+\frac{1}{n_{22}} $$

So the approximate standard error of the odds ratio on the log scale is $$ \widehat{\operatorname{SE}}(\log(\mathrm{OR}))=\sqrt{\widehat{\operatorname{Var}}(\log(\mathrm{OR}))}=\sqrt{\frac{1}{n_{11}}+\frac{1}{n_{12}}+\frac{1}{n_{21}}+\frac{1}{n_{22}}} $$

So an approximate two-sided confidence interval of level $1-\alpha$ for the odds ratio on the original scale is $$ \mathrm{CI}=\exp\left(\log(\widehat{\mathrm{OR}})\pm z_{1-\alpha/2}\times \widehat{\operatorname{SE}}(\log(\mathrm{OR}))\right) $$
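
The derivation can be checked numerically. The sketch below is only an illustration: the counts are made up, and `or_wald_ci` is a hypothetical helper name, not a function from any package. It builds the gradient and the multinomial covariance matrix explicitly, confirms that the quadratic form equals the sum of reciprocal cell counts, and then computes the resulting interval.

```r
# Made-up cell counts in the order n11, n12, n21, n22 (assumption for illustration)
n <- c(10, 20, 90, 80)
N <- sum(n)
p <- n / N                                    # estimated cell probabilities

# Gradient of f = log(p11) - log(p12) - log(p21) + log(p22)
grad <- c(1 / p[1], -1 / p[2], -1 / p[3], 1 / p[4])

# Multinomial covariance matrix of the estimated proportions
Sigma <- (diag(p) - outer(p, p)) / N

# Delta-method variance versus the closed-form sum of reciprocals
var_delta <- drop(grad %*% Sigma %*% grad)
var_woolf <- sum(1 / n)
print(all.equal(var_delta, var_woolf))

# Hypothetical helper: Wald interval for the odds ratio of a 2x2 table
or_wald_ci <- function(n11, n12, n21, n22, level = 0.95) {
  log_or <- log((n11 * n22) / (n12 * n21))
  se <- sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)
  z <- qnorm(1 - (1 - level) / 2)
  c(or = exp(log_or),
    lower = exp(log_or - z * se),
    upper = exp(log_or + z * se))
}

ci <- or_wald_ci(10, 20, 90, 80)
print(round(ci, 3))
```

With these counts the odds ratio is about 0.44 and the 95% interval runs from roughly 0.20 to 1.01. Note that the formula is undefined when any cell count is zero.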

$\endgroup$
3
  • $\begingroup$ @CoolSerdash: This is a really cool answer and I want to spend some time trying to understand it! $\endgroup$
    – stats_noob
    Commented Sep 11, 2022 at 19:21
  • $\begingroup$ When you wrote "We want to derive the variance of θ. The multivariable version of the delta method is" ... can you please explain how you were able to use the delta method to figure out that Variance(theta) = Del(p11, p12, p21, p22) * cov(p11, p12, p21, p22) * Del-Transpose(p11, p12, p21, p22)? $\endgroup$
    – stats_noob
    Commented Sep 11, 2022 at 19:23
  • $\begingroup$ @MBA_Grad_Student_2022 Thanks. I'm not sure what you're asking. The multivariate delta method is derived on Wikipedia, for example. Then it's just a matter of calculating the "ingredients" of the delta method. $\endgroup$ Commented Sep 11, 2022 at 20:02
2
$\begingroup$

There is actually a section on this in the book Practical Guide to Logistic Regression by Joseph Hilbe, on pages 25-26. The book derives a function for this, which is also available in the LOGIT package as the toOR function.

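# Convert logistic-regression coefficients to odds ratios with Wald 95% CIs
toOR <- function(object, ...) {
  coef <- object$coef                  # estimated log odds ratios
  se <- sqrt(diag(vcov(object)))       # standard errors on the log scale
  zscore <- coef / se
  or <- exp(coef)                      # odds ratios
  delta <- or * se                     # delta-method SE of the odds ratio
  pvalue <- 2 * pnorm(abs(zscore), lower.tail = FALSE)
  loci <- coef - qnorm(.975) * se      # Wald limits on the log scale
  upci <- coef + qnorm(.975) * se
  ortab <- data.frame(or, delta, zscore,
                      pvalue, exp(loci), exp(upci))  # back-transformed limits
  round(ortab, 4)
}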

If you load the LOGIT library and the medpar dataset that ships with it, you can test this yourself with the code below:

library(LOGIT)
data(medpar)  # medpar comes from the LOGIT package
smlogit <- glm(died ~ white + los + factor(type),
               family = binomial, data = medpar)
toOR(smlogit)

This gives the 95% confidence intervals you want in the two rightmost columns:

                  or  delta  zscore pvalue exp.loci. exp.upci.
(Intercept)   0.4885 0.1065 -3.2855 0.0010    0.3186    0.7490
white         1.3569 0.2835  1.4610 0.1440    0.9010    2.0436
los           0.9635 0.0075 -4.7747 0.0000    0.9488    0.9783
factor(type)2 1.5163 0.2184  2.8900 0.0039    1.1433    2.0109
factor(type)3 2.5345 0.5789  4.0716 0.0000    1.6198    3.9657

dipetkov has also kindly mentioned an alternative from the 'broom' package in the comments, if you are interested.
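
For a single 2x2 table, the regression approach and the closed-form formula from the question give the same Wald interval. The sketch below uses only base R (no LOGIT) and made-up counts, which are assumptions for illustration: it fits a logistic regression with the group indicator as the only predictor and compares the standard error of its coefficient with sqrt(1/a + 1/b + 1/c + 1/d).

```r
# Made-up 2x2 counts: one row per group (assumption for illustration)
tab <- data.frame(
  treated  = c(1, 0),
  died     = c(10, 20),
  survived = c(90, 80)
)

# Logistic regression on grouped data; the 'treated' coefficient is the log odds ratio
fit <- glm(cbind(died, survived) ~ treated, family = binomial, data = tab)

# Wald intervals on the odds-ratio scale (confint.default uses estimate +/- z * SE)
wald <- exp(cbind(or = coef(fit), confint.default(fit)))
print(round(wald, 4))

# Standard error from the model versus the closed-form expression
se_glm <- sqrt(diag(vcov(fit)))["treated"]
se_woolf <- sqrt(1 / 10 + 1 / 20 + 1 / 90 + 1 / 80)
print(c(glm = unname(se_glm), closed_form = se_woolf))
```

The two standard errors should agree up to the convergence tolerance of the fit, so the `treated` row of `wald` matches the interval from the formula in the question.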

$\endgroup$
3
  • $\begingroup$ [The following is a personal preference, not a rule] The LOGIT package has been removed from CRAN. This means that it's no longer supported, so I'd suggest not relying on such packages. You also don't explain where to get the medpar dataset from. $\endgroup$
    – dipetkov
    Commented Sep 11, 2022 at 7:45
  • $\begingroup$ Here is an alternative: Use the broom package. This should work: broom::tidy(smlogit, conf.int = TRUE, exponentiate = TRUE, conf.method = "Wald"). You might be interested in learning about other types of confidence intervals, e.g., conf.method = "profile". $\endgroup$
    – dipetkov
    Commented Sep 11, 2022 at 7:45
  • 1
    $\begingroup$ I meant to say in my answer that the dataset comes from the LOGIT package. Regardless, I shared the function here so you don't need the library. Thank you for the insight on the tidy function from broom though. $\endgroup$ Commented Sep 11, 2022 at 8:26
