How to calculate the p-value of a log-odds ratio, given that the variance depends on the observed frequencies?

Question

I am a bit confused about how people calculate p-values for odds ratios.

The log-odds ratio (LOR) for a contingency table with two entries is $L = \log \frac{p_{1}}{p_{0}}$ and has a natural plug-in estimator using sampled frequencies: $\hat{L} = \log \frac{n_{1}}{n_{0}}$. This estimator has asymptotic standard error $\sqrt{\frac{1}{n_1} + \frac{1}{n_0}}$, which allows you to assign confidence intervals to the estimated LOR. If you also want to assign a p-value to the observed sample LOR, then you'd need the standard error under the null hypothesis of a LOR of zero, which in this case, since $n_1+n_0=N$ and $n_1 = n_0 = N/2$ under the null, is equal to $\frac{2}{\sqrt{N}}$. This is independent of the population parameters since it only depends on the total number of samples, which makes it a pivotal statistic. This means you can center the distribution at zero to calculate probabilities under the null hypothesis of a LOR of zero, and assign p-values. No problems there.
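To make the two-cell case concrete, here is a minimal Python sketch of the calculation described above. The function name and the counts (60 and 40) are made up for illustration; the p-value uses the normal approximation with the null standard error $2/\sqrt{N}$, while the confidence interval uses the standard error evaluated at the observed counts.

```python
import math

def two_cell_lor_test(n1, n0):
    """Normal-approximation test of H0: log(p1/p0) = 0 for two cell counts.

    Hypothetical helper for illustration; both counts must be positive.
    """
    total = n1 + n0
    lor = math.log(n1 / n0)
    # Standard error at the observed counts, used for the confidence interval.
    se_observed = math.sqrt(1 / n1 + 1 / n0)
    # The same formula evaluated at n1 = n0 = N/2, i.e. under the null.
    se_null = 2 / math.sqrt(total)
    z = lor / se_null
    # Two-sided standard normal tail probability: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    ci = (lor - 1.96 * se_observed, lor + 1.96 * se_observed)
    return lor, se_null, z, p_value, ci

# Made-up counts, purely for illustration.
lor, se_null, z, p_value, ci = two_cell_lor_test(60, 40)
print(f"LOR = {lor:.3f}, z = {z:.3f}, p = {p_value:.4f}")
print(f"approx. 95% CI: ({ci[0]:.3f}, {ci[1]:.3f})")
```

Note that the null standard error depends only on $N = n_1 + n_0$, which is the point made above.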

However, the LOR for a contingency table with four entries is $L = \log \frac{p_{11}p_{00}}{p_{10}p_{01}}$ and has a natural plug-in estimator using sampled frequencies: $\hat{L} = \log \frac{n_{11}n_{00}}{n_{10}n_{01}}$. This estimator has asymptotic standard error $\sqrt{\frac{1}{n_{11}} + \frac{1}{n_{00}} + \frac{1}{n_{01}} + \frac{1}{n_{10}}}$.

While this still allows you to construct a confidence interval, it is (if I understand correctly) no longer a pivotal statistic: the variance depends on the observed frequencies and thus the population parameters.

Still, I see people calculate p-values associated with nonzero LORs (see for example this discussion: "How to calculate the p.value of an odds ratio in R?"). How is that possible? Am I missing something? Are there hidden assumptions?

EdM · Accepted Answer · 2023-05-28 19:38:12Z

If you use a likelihood-based binomial regression, as suggested by Frank Harrell and Ben Bolker on the page you cite, or use log-linear analysis of counts in a contingency table, the p-values are based on the asymptotic normality of the maximum-likelihood estimator. The test statistic is then an (asymptotically) pivotal z-statistic, the estimate divided by its estimated standard error, from which confidence intervals can also be calculated. There remains a question of whether there are enough cases to be close enough to asymptotic normality, but that's an issue for all maximum-likelihood estimation.
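As a rough illustration of what such likelihood-based tests compute for a single 2x2 table, here is a minimal Python sketch using only the standard library. The function name and the counts are hypothetical. It relies on two facts that hold for a saturated model with one binary predictor: the Wald z-statistic is the sample LOR divided by $\sqrt{1/n_{11} + 1/n_{10} + 1/n_{01} + 1/n_{00}}$, and the likelihood-ratio statistic is $G^2$ against the independence fit, referred to a chi-square distribution with 1 degree of freedom. Regression software would normally do this for you; this is a sketch, not a substitute.

```python
import math

def wald_and_lr_test(n11, n10, n01, n00):
    """Test H0: log odds ratio = 0 in a 2x2 table (all counts must be > 0).

    Hypothetical helper; returns (lor, se, z, p_wald, g2, p_lr).
    """
    lor = math.log((n11 * n00) / (n10 * n01))
    se = math.sqrt(1 / n11 + 1 / n10 + 1 / n01 + 1 / n00)
    z = lor / se
    # Two-sided standard normal tail probability for the Wald statistic.
    p_wald = math.erfc(abs(z) / math.sqrt(2))

    # Likelihood-ratio statistic: G^2 = 2 * sum(obs * log(obs / expected)),
    # with expected counts computed under independence (LOR = 0).
    n = n11 + n10 + n01 + n00
    rows = (n11 + n10, n01 + n00)
    cols = (n11 + n01, n10 + n00)
    observed = ((n11, n10), (n01, n00))
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            g2 += 2 * observed[i][j] * math.log(observed[i][j] / expected)
    g2 = max(g2, 0.0)  # guard against tiny negative rounding error
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_lr = math.erfc(math.sqrt(g2 / 2))
    return lor, se, z, p_wald, g2, p_lr

# Made-up counts, purely for illustration.
lor, se, z, p_wald, g2, p_lr = wald_and_lr_test(20, 10, 10, 20)
print(f"LOR = {lor:.3f}, SE = {se:.3f}, z = {z:.3f}, Wald p = {p_wald:.4f}")
print(f"G^2 = {g2:.3f}, LR p = {p_lr:.4f}")
```

The Wald and likelihood-ratio p-values generally differ somewhat in finite samples and agree more closely as the counts grow, which is the asymptotic-normality caveat mentioned above.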

Agresti devotes Chapter 3 of the second edition of Categorical Data Analysis to "Inference for Contingency Tables." Sections 3.5 and 3.6 discuss relative advantages of different methods for small samples, where the highly discrete nature of the data poses particular problems.
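For small samples, one widely used alternative that does not lean on asymptotic normality is Fisher's exact test, which conditions on the table margins. I am assuming here that it is among the small-sample methods meant; the sketch below is a minimal standard-library Python version with a hypothetical function name and made-up counts. It uses one common two-sided convention, summing the probabilities of all tables that are no more likely than the observed one.

```python
import math

def fisher_exact_two_sided(n11, n10, n01, n00):
    """Two-sided Fisher exact p-value for a 2x2 table (hypothetical helper).

    Conditions on both margins; sums hypergeometric probabilities of all
    tables that are no more probable than the observed table.
    """
    row1 = n11 + n10
    row2 = n01 + n00
    col1 = n11 + n01
    n = row1 + row2

    def prob(k):
        # Probability that the top-left cell equals k, given fixed margins.
        return math.comb(row1, k) * math.comb(row2, col1 - k) / math.comb(n, col1)

    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    p_observed = prob(n11)
    # Small relative tolerance so ties are not lost to floating-point rounding.
    total = sum(
        p for p in (prob(k) for k in range(k_min, k_max + 1))
        if p <= p_observed * (1 + 1e-9)
    )
    return min(1.0, total)

# Made-up small table, purely for illustration.
p_exact = fisher_exact_two_sided(3, 1, 1, 3)
print(f"Fisher exact two-sided p = {p_exact:.4f}")
```

Because the reference distribution here is discrete, the achievable p-values come in steps, which is one of the small-sample difficulties noted above.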
