
I am an MBA student trying to understand how to calculate the confidence interval for an odds ratio. Our professor gave us this link (https://www.ncbi.nlm.nih.gov/books/NBK431098/), which contains the formula for the confidence interval of the odds ratio:

  • Upper 95% CI = e ^ [ln(OR) + 1.96 sqrt(1/a + 1/b + 1/c + 1/d)]
  • Lower 95% CI = e ^ [ln(OR) - 1.96 sqrt(1/a + 1/b + 1/c + 1/d)]
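
As a quick numerical illustration of the two formulas above, here is a minimal R sketch. The counts are made up for illustration, and the function name `or_wald_ci` and its argument names are hypothetical (they simply stand for the cells a, b, c, d); the odds ratio is taken as (a/c)/(b/d).

```r
# Wald-type confidence interval for an odds ratio from 2x2 counts.
# n_a, n_b, n_c, n_d are hypothetical names for the cells a, b, c, d.
or_wald_ci <- function(n_a, n_b, n_c, n_d, level = 0.95) {
  or <- (n_a / n_c) / (n_b / n_d)                           # odds ratio
  se_log_or <- sqrt(1 / n_a + 1 / n_b + 1 / n_c + 1 / n_d)  # SE of ln(OR)
  z <- qnorm(1 - (1 - level) / 2)                           # about 1.96 for 95%
  c(or = or,
    se_log_or = se_log_or,
    lower = exp(log(or) - z * se_log_or),
    upper = exp(log(or) + z * se_log_or))
}

# Illustrative counts only: a = 20, b = 10, c = 80, d = 90
res <- or_wald_ci(20, 10, 80, 90)
print(round(res, 4))
```

Note that the interval is built symmetrically around ln(OR) and then exponentiated, so it is not symmetric around the odds ratio itself.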

I tried to work out the same formula but I could not get the same answer. For example:

Suppose a "Treatment" results in "a" people "Dying" and "c" people "Surviving" - there are "n1" people who were given the treatment. And suppose the "Control" results in "b" people "Dying" and "d" people "Surviving" - there are "n2" people who were given the control.

  • The Odds Ratio = (a/c)/(b/d)

Since the confidence interval is related to the variance, we need to find the variance of "(a/c)/(b/d)", that is, VAR((a/c)/(b/d)). Using the law of total variance, I think VAR((a/c)/(b/d)) = VAR(a/c) + VAR(b/d). And since these are basically proportions, I think that (based on the formula for the variance of a proportion):

  • VAR(Odds Ratio) = VAR((a/c)/(b/d)) = VAR(a/c) + VAR(b/d) = [((a/c)(1-a/c))/n1] + [((b/d)(1-b/d))/n2]

But as we can see, my formula does not match the formula in the link.

Can someone help me out and show me what I might be doing incorrectly?


Here is the derivation using the delta method. Let's look at the familiar $2\times2$-Table below.

2x2 Table

Suppose that $\theta = f(p_{11},p_{12},p_{21},p_{22})$ where $p_{ij}$ is defined as in the table above.

The odds ratio is defined as

$$ \theta=\mathrm{OR}=\dfrac{p_{11}p_{22}}{p_{21}p_{12}} $$

We want to derive the variance of its estimate. The multivariable version of the delta method is:

$$ \operatorname{Var}(\hat{\theta})\approx \nabla f(p_{11}, p_{12}, p_{21}, p_{22})\cdot \operatorname{Cov}(p_{11}, p_{12}, p_{21}, p_{22})\cdot \nabla f(p_{11}, p_{12}, p_{21}, p_{22})^{T} $$

where $\nabla$ is the gradient vector. That is:

$$ \nabla f(p_{11}, p_{12}, p_{21}, p_{22}) = \left(\frac{\partial f}{\partial\,p_{11}}, \ldots,\frac{\partial f}{\partial\,p_{22}}\right) $$

Here we work on the log scale, so we want to estimate

$$ \operatorname{Var}(\log(\mathrm{OR}))=\operatorname{Var}\left[\log\left(\frac{p_{11}p_{22}}{p_{21}p_{12}}\right)\right] $$

Let the function $f$ be

$$ f = \log(p_{11}) + \log(p_{22}) - \log(p_{12}) - \log(p_{21}) $$

The gradient $\nabla f$, with components ordered as $(p_{11}, p_{12}, p_{21}, p_{22})$, is

$$ \nabla f = \left(\frac{1}{p_{11}},-\frac{1}{p_{12}},-\frac{1}{p_{21}},\frac{1}{p_{22}}\right) $$

The variance-covariance matrix for a multinomial distribution with $c=4$ categories is

$$ \Sigma=\frac{1}{n}\left( \begin{array}{cccc} \left(1-p_{11}\right) p_{11} & -p_{11} p_{12} & -p_{11} p_{21} & -p_{11} p_{22} \\ -p_{11} p_{12} & \left(1-p_{12}\right) p_{12} & -p_{12} p_{21} & -p_{12} p_{22} \\ -p_{11} p_{21} & -p_{12} p_{21} & \left(1-p_{21}\right) p_{21} & -p_{21} p_{22} \\ -p_{11} p_{22} & -p_{12} p_{22} & -p_{21} p_{22} & \left(1-p_{22}\right) p_{22} \\ \end{array} \right) $$

Then $\nabla f\,\Sigma$ equals

$$ \nabla f\,\Sigma=\frac{1}{n}\times \left[1, -1, -1, 1\right] $$

Now we need $(\nabla f\,\Sigma)\times \nabla f^{T}$, which equals:

$$ (\nabla f\,\Sigma)\times \nabla f^{T}=\frac{1}{n}\times \left[\frac{1}{p_{11}} +\frac{1}{p_{12}}+\frac{1}{p_{21}}+\frac{1}{p_{22}}\right] $$

Substituting the MLEs $\widehat{p_{ij}}=n_{ij}/n$ finally yields

$$ \widehat{\operatorname{Var}}(\log(\operatorname{OR}))=\frac{1}{n_{11}}+\frac{1}{n_{12}}+\frac{1}{n_{21}}+\frac{1}{n_{22}} $$

So the approximate standard error for the odds ratio on the log scale is

$$ \widehat{\operatorname{SE}}(\log(\operatorname{OR}))=\sqrt{\widehat{\operatorname{Var}}(\log(\operatorname{OR}))}=\sqrt{\frac{1}{n_{11}}+\frac{1}{n_{12}}+\frac{1}{n_{21}}+\frac{1}{n_{22}}} $$

So an approximate two-sided confidence interval of level $1-\alpha$ for the odds ratio on the original scale is

$$ \mathrm{CI}=\exp\left(\log(\operatorname{OR})\pm z_{1-\alpha/2}\times \operatorname{SE}(\log(\operatorname{OR}))\right) $$
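
The algebra in this derivation can be checked numerically. The following is a minimal base-R sketch under assumed values: the cell probabilities `p`, the sample size `n`, the seed and the number of simulated tables are all illustrative choices, not taken from the question. It evaluates $\nabla f\,\Sigma\,\nabla f^{T}$ directly, compares it with the closed form $\frac{1}{n}\sum 1/p_{ij}$, and then compares both with the empirical variance of the log odds ratio over simulated multinomial tables.

```r
# Hypothetical cell probabilities (order: p11, p12, p21, p22) and sample size
p <- c(p11 = 0.2, p12 = 0.3, p21 = 0.1, p22 = 0.4)
n <- 500

# Multinomial covariance of the estimated proportions: (diag(p) - p p^T) / n
Sigma <- (diag(p) - tcrossprod(p)) / n

# Gradient of f = log(p11) - log(p12) - log(p21) + log(p22)
grad <- c(1 / p[["p11"]], -1 / p[["p12"]], -1 / p[["p21"]], 1 / p[["p22"]])

grad_Sigma <- drop(grad %*% Sigma)                 # should equal c(1, -1, -1, 1) / n
var_delta  <- drop(t(grad) %*% Sigma %*% grad)     # delta-method variance
var_closed <- sum(1 / (n * p))                     # (1/n) * sum(1 / p_ij)

# Simulation check: variance of the log odds ratio across random tables
set.seed(1)
counts <- rmultinom(20000, size = n, prob = p)     # 4 x 20000 matrix of cell counts
log_or <- log(counts[1, ] * counts[4, ] / (counts[2, ] * counts[3, ]))
var_sim <- var(log_or)

print(c(delta = var_delta, closed = var_closed, simulated = var_sim))
```

The first two numbers agree up to floating-point error; the simulated variance is only expected to be close, since the delta method is a first-order approximation.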

There is actually a section on this in the book Practical Guide to Logistic Regression by Joseph Hilbe, on pages 25-26. The book derives a function there that is also available in the LOGIT package as the toOR function.

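toOR <- function(object, ...) { 
  coef <- object$coef                              # coefficients on the log-odds scale
  se <- sqrt(diag(vcov(object)))                   # their standard errors
  zscore <- coef / se                              # Wald z statistics
  or <- exp(coef)                                  # odds ratios
  delta <- or * se                                 # delta-method SE of the odds ratios
  pvalue <- 2*pnorm(abs(zscore),lower.tail=FALSE)  # two-sided p-values
  loci <- coef - qnorm(.975) * se                  # 95% limits on the log-odds scale
  upci <- coef + qnorm(.975) * se 
  ortab <- data.frame(or, delta, zscore,
                      pvalue, exp(loci), exp(upci))  # limits back-transformed to the OR scale
  round(ortab, 4)
}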

If you load the LOGIT library and the medpar dataset from their package, you can test this yourself with the following code:

library(LOGIT); data(medpar)
smlogit <- glm(died ~ white + los + factor(type), 
               family = binomial, data = medpar) 
toOR(smlogit)

This gives you the confidence intervals you want in the two rightmost columns.

                  or  delta  zscore pvalue exp.loci. exp.upci.
(Intercept)   0.4885 0.1065 -3.2855 0.0010    0.3186    0.7490
white         1.3569 0.2835  1.4610 0.1440    0.9010    2.0436
los           0.9635 0.0075 -4.7747 0.0000    0.9488    0.9783
factor(type)2 1.5163 0.2184  2.8900 0.0039    1.1433    2.0109
factor(type)3 2.5345 0.5789  4.0716 0.0000    1.6198    3.9657
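
To connect this regression-based interval with the 2x2 formula from the question: for a logistic regression with a single binary predictor, the standard error of the slope should coincide with sqrt(1/a + 1/b + 1/c + 1/d). Here is a minimal sketch using only base R (not the LOGIT package); the counts and the column names `treated`, `died` and `survived` are hypothetical.

```r
# Hypothetical 2x2 counts, one row per group (illustrative only)
tab <- data.frame(
  treated  = c(1, 0),
  died     = c(20, 10),
  survived = c(80, 90)
)

# Logistic regression on grouped data: response is cbind(successes, failures)
fit <- glm(cbind(died, survived) ~ treated, family = binomial, data = tab)

or_glm     <- exp(coef(fit))[["treated"]]            # odds ratio from the model
se_glm     <- sqrt(diag(vcov(fit)))[["treated"]]     # SE of the log odds ratio
se_formula <- sqrt(1 / 20 + 1 / 10 + 1 / 80 + 1 / 90)

# Wald interval on the log-odds scale, back-transformed to the OR scale
ci_glm <- exp(confint.default(fit)["treated", ])

print(c(or = or_glm, se_glm = se_glm, se_formula = se_formula))
print(ci_glm)
```

`confint.default()` is used here because it returns Wald limits, which is the same construction as `coef ± qnorm(.975) * se` in toOR.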

Dipetkov has also kindly mentioned in the comments an alternative from the 'broom' package if you are interested as well.
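
For reference, a minimal sketch of what the broom route might look like. I am assuming the comment refers to `broom::tidy()` with `exponentiate = TRUE` and `conf.int = TRUE`; the model below uses the built-in `mtcars` data purely as a stand-in, not the medpar example. As far as I know, for `glm` objects broom takes its limits from `confint()`, which are profile-likelihood intervals, so they can differ slightly from the Wald intervals that toOR reports.

```r
# Stand-in logistic regression on a built-in dataset (illustrative only)
fit <- glm(am ~ wt, family = binomial, data = mtcars)

# Only run the broom part if the package is installed
if (requireNamespace("broom", quietly = TRUE)) {
  # Odds ratios with confidence limits in columns conf.low and conf.high
  or_table <- broom::tidy(fit, exponentiate = TRUE, conf.int = TRUE)
  print(or_table)
}
```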

