I have a list of 1000 samples from a distribution. I can find the lower limit of the 95% CI because it is the 2.5th percentile; in this case it is zero. I would also like to test the hypothesis that a sample from the distribution would be negative. I tried counting how many of the samples were negative (`i <= 0` in the code below) and dividing by the number of samples. Because the lower bound of the 95% CI is zero, I expected to get 2.5%, but I actually got 12%.
How can I test the hypothesis that a sample from the distribution would be positive?
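
A toy example shows the kind of mismatch I mean. The numbers here are made up for illustration (the array `values` is hypothetical and is not my bootstrap output): when many values are tied at exactly zero, the 2.5th percentile can be zero while the share of values less than or equal to zero is far above 2.5%.

```python
import numpy as np

# Hypothetical discrete sample: 120 exact zeros, 880 positive values, no negatives.
values = np.array([0.0] * 120 + [0.5] * 880)

# Lower bound of the 95% percentile interval.
lower = np.percentile(values, 2.5)

# Share of strictly negative values versus share of values <= 0.
frac_negative = np.mean(values < 0)
frac_nonpositive = np.mean(values <= 0)

print('lower bound      :', lower)
print('share  < 0       :', frac_negative)
print('share <= 0       :', frac_nonpositive)
```

In this toy case the lower bound is 0, no value is negative, and 12% of the values are less than or equal to zero, so the strict and non-strict counts give very different answers.
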
In Python:
```python
import numpy as np
import pandas as pd

print('Generate the sample data')
data = pd.DataFrame({'A': [1]*14 + [0]*2 + [0]*3,
                     'B': [1]*14 + [1]*2 + [0]*3})
print('sample size: ', len(data))
print('')
print('A B X')
print('1 1', len(data[((data.A==1)&(data.B==1))]))
print('1 0', len(data[((data.A==1)&(data.B==0))]))
print('0 1', len(data[((data.A==0)&(data.B==1))]))
print('0 0', len(data[((data.A==0)&(data.B==0))]))
print('')

# Results
Lower = {}
Media = {}
Upper = {}

# Control Parameters
Runs_Max = 1000
Runs = range(Runs_Max)
BS = len(data)
print('bootstrap size: ', BS)

# Results
I_R = []
for R in Runs:
    # Bootstrap: resample the rows with replacement
    BooP = data.sample(BS, replace=True)
    # Data: counts for each (A, B) combination
    X_11 = len(BooP[((BooP.A==1)&(BooP.B==1))])
    X_10 = len(BooP[((BooP.A==1)&(BooP.B==0))])
    X_01 = len(BooP[((BooP.A==0)&(BooP.B==1))])
    X_00 = len(BooP[((BooP.A==0)&(BooP.B==0))])
    # Improvement (I) = pB/pA-1
    if X_11+X_10 == 0:
        I_x = 10101  # approx infinity!
    else:
        I_x = (X_11+X_01)/(X_11+X_10)-1
    # Results
    I_R.append(I_x)
    # CI: running percentiles over the replicates collected so far
    Lower[R] = np.percentile(I_R, 2.5)
    Media[R] = np.percentile(I_R, 50)
    Upper[R] = np.percentile(I_R, 97.5)

# Percentiles after the final run
Low = Lower[max(list(Lower.keys()))]
Med = Media[max(list(Lower.keys()))]
Hig = Upper[max(list(Lower.keys()))]
print('I = ', Med, Low, Hig)

print('Hypothesis test')
# Share of replicates with I <= 0 (this also counts exact zeros)
I_N = [i for i in I_R if i <= 0]
Hyp = len(I_N)/len(I_R)
print('Hyp = ', Hyp)
```