$\begingroup$

I have a list of 1000 samples from a distribution. The lower limit of the 95% CI is the 2.5th percentile, which in this case is zero. I would also like to test the hypothesis that a sample from the distribution would be negative. I tried counting how many of the samples were negative and dividing by the number of samples. Because the lower bound of the 95% CI is zero, I expected to get 2.5%, but I actually got 12%.

How can I test the hypothesis that a sample from the distribution would be positive please?

In Python:

```python
import numpy as np
import pandas as pd

print('Generate the sample data')
data = pd.DataFrame({'A':[1]*14+[0]*2+[0]*3,
                     'B':[1]*14+[1]*2+[0]*3})

print('sample size: ',len(data))
print('')
print('A B X')
print('1 1',len(data[((data.A==1)&(data.B==1))]))
print('1 0',len(data[((data.A==1)&(data.B==0))]))
print('0 1',len(data[((data.A==0)&(data.B==1))]))
print('0 0',len(data[((data.A==0)&(data.B==0))]))
print('')

# Results
Lower = {}
Media = {}
Upper = {}

# Control Parameters
Runs_Max = 1000
Runs = range(Runs_Max)

BS = len(data)
print('bootstrap size: ',BS)

# Results
I_R = []
    
for R in Runs:
        
    # Bootstrap
    BooP = data.sample(BS, replace=True)
    
    # Data
    X_11 = len(BooP[((BooP.A==1)&(BooP.B==1))])
    X_10 = len(BooP[((BooP.A==1)&(BooP.B==0))])
    X_01 = len(BooP[((BooP.A==0)&(BooP.B==1))])
    X_00 = len(BooP[((BooP.A==0)&(BooP.B==0))])
    
    # Improvement (I) = pB/pA-1
    if X_11+X_10 == 0:
        I_x = 10101 # approx infinity!
    else:
        I_x = (X_11+X_01)/(X_11+X_10)-1
    
    # Results
    I_R.append(I_x)
    
    # CI
    Lower[R] = np.percentile(I_R,  2.5)
    Media[R] = np.percentile(I_R, 50  )
    Upper[R] = np.percentile(I_R, 97.5)

# Take the percentiles computed after the final run
last = Runs_Max - 1
Low = Lower[last]
Med = Media[last]
Hig = Upper[last]

print('I = ',Med,Low,Hig)

print('Hypothesis test')
I_N = [i for i in I_R if i <= 0]
Hyp = len(I_N)/len(I_R)
print('Hyp = ',Hyp)
```
$\endgroup$
  • $\begingroup$ For what quantity are you calculating the confidence interval? It’s not clear from your Python code. $\endgroup$
    – Dave
    Commented Feb 26, 2021 at 11:16
  • $\begingroup$ Thanks Dave, it's the quantity described in this question, pB/pA-1: stats.stackexchange.com/questions/506994/… $\endgroup$
    – R. Cox
    Commented Feb 26, 2021 at 11:27
  • $\begingroup$ In the code, it's the quantity called "I" for Increase $\endgroup$
    – R. Cox
    Commented Feb 26, 2021 at 11:29

1 Answer

$\begingroup$

The reverse hypothesis came out at 12% because 12% of the bootstrap estimates were exactly zero. My hypothesis test should have used "<", not "<=". In Python:

```python
print('Hypothesis test')
I_N = [i for i in I_R if i < 0]
Hyp = len(I_N)/len(I_R)
print('Hyp =',Hyp)

print('min =',min(I_R))
```

Which gives:

```
Hypothesis test
Hyp = 0.0
min = 0.0
```
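
To see where the 12% came from, it can help to split the bootstrap estimates into negative, exactly zero and positive. The following is only a sketch and not the code above: it redraws the resamples with NumPy index arrays rather than `DataFrame.sample`, the names `rng`, `idx` and `improvement` and the seed are my own choices, and it uses `np.inf` where the original uses the 10101 placeholder.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed is an arbitrary choice, for reproducibility only

# Example data from the question:
# 14 rows (A=1, B=1), 2 rows (A=0, B=1), 3 rows (A=0, B=0)
A = np.array([1] * 14 + [0] * 5)
B = np.array([1] * 16 + [0] * 3)

n = len(A)
runs = 1000

# Draw all bootstrap resamples at once: each row of idx holds n row indices
idx = rng.integers(0, n, size=(runs, n))
sum_A = A[idx].sum(axis=1)
sum_B = B[idx].sum(axis=1)

# Improvement I = pB/pA - 1; the common denominator n cancels.
# np.inf marks resamples that contain no A == 1 rows.
improvement = np.full(runs, np.inf)
ok = sum_A > 0
improvement[ok] = sum_B[ok] / sum_A[ok] - 1

frac_negative = np.mean(improvement < 0)
frac_zero = np.mean(improvement == 0)
frac_positive = np.mean(improvement > 0)
print('negative:', frac_negative)
print('zero:    ', frac_zero)
print('positive:', frac_positive)
```

The "<=" test counts the first two fractions together, while the "<" test counts only the first, so the gap between the two results is the share of estimates that are exactly zero.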

Figure 1, example data

Figure 2, Reverse Hypothesis (RH) against number of Runs for example I

Figure 3, Reverse Hypothesis (RH) against number of Runs for example II

Bootstrapping is conditional on the original sample. The true probability that pB<pA cannot really be zero, because the sample may not be representative of the population. The proportion of bootstrap samples for which pB<pA is, however, zero, and this strongly indicates that pB is not less than pA.
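
For the example data, the dependence on the original sample can be worked out by hand. The sketch below is my own check, with hypothetical variable names, and assumes the 19-row example from the question. A resample gives I < 0 only if it holds more (A=1, B=0) rows than (A=0, B=1) rows. The original sample has no (A=1, B=0) rows, so no resample can contain one and the bootstrap proportion of negatives is zero by construction. I is exactly zero when a resample contains no (A=0, B=1) rows but at least one A == 1 row.

```python
n = 19      # rows in the example data
n_11 = 14   # rows with A == 1 and B == 1
n_01 = 2    # rows with A == 0 and B == 1
n_10 = 0    # rows with A == 1 and B == 0
n_00 = 3    # rows with A == 0 and B == 0

# No (1,0) rows in the sample means no resample can have pB < pA
p_negative = 0.0 if n_10 == 0 else None  # None: not derived here for other data

# P(no (0,1) row in n draws with replacement)
p_no_01 = ((n - n_01) / n) ** n

# Subtract resamples made only of (0,0) rows, where pA == 0 and I is undefined
p_only_00 = (n_00 / n) ** n

p_zero = p_no_01 - p_only_00
print('P(I < 0)  =', p_negative)
print('P(I == 0) =', round(p_zero, 4))
```

This gives a probability of roughly 0.12 for I == 0, which matches the 12% that the "<=" test reported.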

$\endgroup$
  • $\begingroup$ Is this really a hypothesis test and can bootstrapping be used for hypothesis testing? $\endgroup$
    – R. Cox
    Commented Feb 27, 2021 at 13:20
