
1. If I toss a coin 10 times and 9 of the tosses come up tails, is the coin fair?

My method: the null hypothesis is that the coin is fair ($p_0=p_1=\dfrac12$), and the alternative hypothesis is that it is not fair. The test statistic is the number of tails.

The p-value is then $C_{10}^9(\dfrac12)^9(\dfrac12)+C_{10}^{10}(\dfrac12)^{10}$.

But I was told by others that I should use a two-sided p-value, so the result should be doubled (i.e. $[C_{10}^9(\dfrac12)^9(\dfrac12)+C_{10}^{10}(\dfrac12)^{10}]\times 2$). Is he right?
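For reference, a quick numerical check of both versions (a minimal Python sketch; I'm assuming `scipy` is available, and `binomtest` is scipy's exact binomial test):

```python
from scipy.stats import binom, binomtest

# One-sided p-value: P(X >= 9) for X ~ Binomial(10, 1/2)
p_one = binom.sf(8, 10, 0.5)   # sf(8) = P(X > 8) = P(X >= 9)
print(p_one)                   # 11/1024 ≈ 0.0107

# The doubled (two-sided) version suggested to me:
print(2 * p_one)               # 22/1024 ≈ 0.0215

# scipy's exact binomial test agrees with the doubled value here,
# because p = 1/2 makes the null distribution symmetric:
print(binomtest(9, 10, 0.5, alternative="two-sided").pvalue)
```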

2. Now consider rolling a die 10 times, where the face 1 comes up 9 times and the face 6 comes up once. Is the die fair?

My method: the null hypothesis is that the die is fair ($p_1=p_2=p_3=p_4=p_5=p_6=\dfrac16$), and the alternative hypothesis is that it is not fair. The test statistic is the count of the most frequent face.

The p-value is then $[C_{10}^9(\dfrac16)^9(\dfrac56)+C_{10}^{10}(\dfrac16)^{10}]\times 6$ (i.e. the probability that some face appears at least 9 times). Is that right?
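And a corresponding sketch for the die (same assumptions as the coin example above):

```python
from scipy.stats import binom

# P(one particular face shows at least 9 times in 10 rolls):
p_face = binom.sf(8, 10, 1/6)  # = C(10,9)(1/6)^9(5/6) + (1/6)^10

# Only one face can reach 9 of 10 rolls, so the six events are
# disjoint and multiplying by 6 is exact, not just a union bound:
print(6 * p_face)              # ≈ 5.06e-06
```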

  • Any p-value depends on (a) the null hypothesis; (b) the alternative hypothesis; and (c) the test statistic. Please tell us what you have assumed about (a), (b), and (c).
    – whuber
    Commented Aug 26, 2022 at 21:42
  • Thank you for the additional information. But please tell us what your test statistic is: my hope is that in doing that you will see how to answer both parts of your question.
    – whuber
    Commented Aug 26, 2022 at 22:13
  • @whuber Thank you very much, but I'm not an expert at statistics. I only know the vague concept of a p-value. Can 'tails appear at least 9 times' and 'one face of the die appears at least 9 times' be counted as a statistic?
    – user900476
    Commented Aug 26, 2022 at 22:19
  • That's why we have threads covering those concepts. See stats.stackexchange.com/questions/31 about p-values. A similar search turns up answers to your first question at stats.stackexchange.com/questions/171451.
    – whuber
    Commented Aug 26, 2022 at 22:26
  • @whuber Thank you. I revised my question. Do you think the test statistics are reasonable?
    – user900476
    Commented Aug 26, 2022 at 22:32

1 Answer


You're doing a one-tailed test with the coin, even though the original question of interest was about fairness, for which the alternative would not be restricted to a preponderance of tails.

That is, you seem to have chosen a specific direction to test based on what you found in the data -- the same data you want to use in the test. This means the p-values you calculate will not be correct. Such data snooping is a particular form of p-hacking; see https://en.wikipedia.org/wiki/Testing_hypotheses_suggested_by_the_data for a specific discussion of this issue.

Your associate is correct: if you wanted to test the fairness of the coin, you'd double the p-value you calculated.

The easy way to see the issue is to simulate a large number of experiments in which $H_0$ is true and find that your rejection rate (at some chosen significance level) is considerably higher than the level you specified, because you chose the direction of the test post hoc -- though in the coin example you could easily do the calculations by hand. You can avoid specifying a significance level and work directly with the p-value if you keep its definition in mind.
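For instance, a minimal simulation along those lines (Python with numpy; the "reject if the post-hoc direction reaches 9 of 10" rule is my paraphrase of the situation in the question):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.binomial(10, 0.5, size=100_000)  # tails counts, fair coin (H0 true)

# Level actually attained by the pre-specified one-sided rule (X >= 9):
print(np.mean(x >= 9))                   # ≈ 11/1024 ≈ 0.011

# Choosing the direction after seeing the data: reject if *either*
# side reaches 9 -- the rejection rate is double the one-sided claim:
print(np.mean((x >= 9) | (x <= 1)))      # ≈ 22/1024 ≈ 0.021
```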

Similar comments apply to your dice example (you're choosing which specific face to test for an excess of by looking at the data); formulated in terms of just that test statistic, yes, you would multiply by 6.

However, there are many ways for a die to be unfair that your specific example glosses over; your test will only tend to have good power if the specific form of unfairness is an excess on a single face. If, for example, you would want to call the die unfair when it is biased to produce an excess of all the values from 4 to 6, or to produce very few 1s, then your test statistic may not be a good choice; it won't have much power against those kinds of biases. Your specific test statistic would make sense if your original question of interest had been not about unfairness in general but about that very specific form of bias, specified before seeing the data.

A common omnibus test for fairness against all alternatives would be the chi-squared test (multinomial goodness of fit).
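As a sketch of how that might look here (using scipy's `chisquare`; note that with only 10 rolls the expected counts of $10/6 \approx 1.7$ fall far below the usual rule of thumb of 5, so the asymptotic p-value is only rough):

```python
from scipy.stats import chisquare

observed = [9, 0, 0, 0, 0, 1]      # counts of faces 1..6 from the question
res = chisquare(observed)          # expected counts default to uniform
print(res.statistic, res.pvalue)   # X^2 ≈ 39.2 on 5 df; p-value is tiny

# Caveat: with expected counts of 10/6 each, the chi-squared
# approximation is unreliable; an exact or simulated multinomial
# p-value would be preferable at n = 10.
```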

Typically, too, you'd want far more than 10 rolls to test a die for fairness unless you're only concerned about huge effect sizes. (If your sample sizes will be very small, it does help to pre-specify a specific alternative, when that makes sense in the context.)

For the die, I'd suggest something well over a hundred rolls, and likely several hundred. Similarly with the coin: ten tosses will only pick up very severely biased coins or coin-tossing methods. I'd look at something closer to a hundred tosses; even then you'll tend to miss deviations you might find of concern, depending on why the coin needs to be fair.
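To give a feel for that, a small power simulation (Python; the 60:40 bias is just an example effect size I chose, not anything from the question):

```python
import numpy as np
from scipy.stats import binomtest

rng = np.random.default_rng(1)

def power(n, p_true, alpha=0.05, sims=2_000):
    """Estimate power of the exact two-sided binomial test of p = 1/2."""
    rejections = sum(
        binomtest(int(rng.binomial(n, p_true)), n, 0.5).pvalue <= alpha
        for _ in range(sims)
    )
    return rejections / sims

# A 60:40 biased coin is essentially undetectable in 10 tosses and is
# caught only about half the time even in 100 tosses:
print(power(10, 0.6))    # ≈ 0.05
print(power(100, 0.6))   # ≈ 0.45
```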


The correct procedure is to formulate your null and alternative hypotheses before you even collect the data you want to use for inference, so that the data cannot influence what you decide to test. If you really want to use some data to choose your hypotheses, then use different data (data that did not participate in the choice of hypotheses) to test them.

  • Thank you! I want to directly compare my p-value to the significance level of 0.05. But I was told by others that I cannot directly set a significance level without specifying its rejection region. Is he right? But if I set a specific rejection region, the significance level cannot be 0.05 unless I introduce a randomized test, which I know nothing about. To avoid a randomized test, I select the rejection region to be $\{X=0,1,2,8,9,10\}$, where $X$ is the number of tails in the 10 trials, and the significance level is $\dfrac{112}{2^{10}}$.
    – user900476
    Commented Aug 28, 2022 at 23:55
  • But in this case I don't need to calculate a p-value, since $X=9$ falls directly into the rejection region. Is that right? So in this case I should decide my rejection region first instead of calculating a p-value, right?
    – user900476
    Commented Aug 28, 2022 at 23:55
  • You can either make your decision rule by setting a rejection region for your statistic (in this case $X$), chosen so that you attain some suitable type I error rate, or you can choose an available significance level (for suitably defined rejection regions) and compare $p$ with $\alpha$. When correctly carried out, the two approaches are equivalent; it doesn't really matter which you do. You can ignore all that and simply compare $p$ with some arbitrary significance level like $0.05$, but you won't actually have a $5\%$ test. In some cases the actual significance level may be far lower. ...ctd
    – Glen_b
    Commented Aug 29, 2022 at 0:11
  • ctd ... I've seen people (including in published papers) ignore the discreteness of some statistics and carry out tests with a type I error rate of exactly $0$ (and, naturally, power of $0$; i.e. they cannot reject no matter what sample is observed). ... Naturally, whether you're setting an explicit rejection region or comparing $p$ to an available $\alpha$, the choice of which parts of the sample space make up the rejection region matters (you want power to identify effects of interest).
    – Glen_b
    Commented Aug 29, 2022 at 0:13
