$\begingroup$
n   X
1   0,77289
2   -0,20690
3   -1,62976
4   2,13931
5   -0,05032
6   1,62602
7   0,03347
8   1,23017
9   0,94318
10  -0,13439
11  1,27475
12  -0,46833
13  -1,29183
14  0,15840
15  -0,21400
16  0,96476
17  0,44186
18  0,13076
19  0,64583
20  -1,17897
21  -0,46450
22  0,49462
23  -0,82661
24  0,06210
25  -0,06504
26  -1,16634
27  -1,55248
28  -3,31522
29  -0,30336
30  0,62031
31  -0,21778
32  -2,10164
33  0,09509
34  -0,18172
35  0,87899
36  0,82714
37  0,54116
38  -1,40146
39  -1,89213
40  0,14927
41  -0,10478
42  -0,02299
43  0,93190
44  -0,46364
45  1,27699
46  0,74645
47  -0,27361
48  -1,07804
49  0,23890
50  -1,49626
51  1,04261
52  -0,60959
53  -1,59934
54  2,99478
55  0,02980
56  0,67092
57  0,76343
58  0,71883
59  -0,78345
60  -0,71834
61  0,52169
62  -0,58672
63  0,09481
64  0,15371
65  -0,81942
66  -0,59766
67  -1,24847
68  1,03972
69  -0,36787
70  0,70877
71  1,06798
72  1,39779
73  -0,40106
74  0,66422
75  -0,05959
76  -1,55657
77  0,64989
78  0,37951
79  0,90888
80  1,32459
81  0,12405
82  -0,63045
83  0,22994
84  -0,26462
85  0,19480
86  0,66986
87  -1,42513
88  -0,49099
89  -0,82704
90  -0,08567
91  -0,91654
92  -1,32165
93  -1,86533
94  -1,02460
95  0,17815
96  0,52362
97  -0,07853
98  -0,57974
99  -0,75904
100 -0,00847
101 0,79662
102 -1,84104
103 0,14173
104 -0,25872
105 0,41303
106 0,81888
107 -0,34082
108 -0,67793
109 0,62701
110 0,17294
111 0,00622
112 0,04105
113 -0,65415
114 -0,73899
115 1,62401
116 -1,10286
117 -0,56838
118 3,11644
119 1,88621
120 -0,25325
121 -0,24249
122 -1,45579
123 0,42472
124 0,45616
125 -0,48841
126 0,83233
127 -0,16433
128 1,38730
129 0,76333
130 -0,52836
131 1,06204
132 0,41206
133 0,59054
134 1,91364
135 0,93663
136 0,28697
137 -0,68507
138 0,64951
139 -0,21817
140 -0,29455
141 0,36043
142 -0,01763
143 -0,87450
144 -0,07201
145 -0,15924
146 0,20941
147 -0,47654
148 -0,85141
149 -0,73608
150 1,00008
151 -0,95990
152 -0,43059
153 -1,85411
154 1,75877
155 0,81461
156 1,12794
157 -1,09949
158 0,48556
159 1,73074
160 2,02996
161 -1,35557
162 0,55896
163 -1,28307
164 -0,47946
165 0,11027
166 0,48298
167 1,40146
168 -0,43227
169 1,92572
170 -0,72628
171 1,56955
172 0,65008
173 1,17774
174 0,27308
175 0,82124
176 0,83514
177 0,23379
178 -0,06833
179 -0,00233
180 0,12182
181 -1,15345
182 -0,25942
183 0,13763
184 0,56102
185 -0,94772
186 -1,78449
187 -1,33570
188 -0,40206
189 0,67082
190 0,13767
191 1,15434
192 1,47822
193 0,31850
194 -0,16100
195 -0,10134
196 -1,32883
197 -0,55789
198 -0,49393
199 -0,72997
200 0,07370
201 1,10159
202 -0,14544
203 0,47226
204 -0,30343
205 0,07638
206 -0,40837
207 0,31547
208 0,12794
209 -0,14003
210 0,20300
211 0,52942
212 0,21920
213 -0,43748
214 0,92144
215 0,22184
216 -0,20253
217 0,12143
218 0,64640
219 -0,01541
220 0,43218
221 -1,58710
222 -0,67725
223 0,94760
224 -1,28603
225 -0,48058
226 1,38991
227 0,16358
228 1,23918
229 0,38495
230 -2,12082
231 0,01939
232 -0,75405
233 -1,26057
234 -1,46557
235 0,73979
236 -1,08436
237 -0,45896
238 -0,41678
239 0,75080
240 1,97168
241 -0,01327
242 -1,18512
243 0,09635
244 -0,22311
245 1,13665
246 -0,53039
247 1,69011
248 -0,30147
249 1,41778
250 -1,72800
251 -0,40845
252 0,53330
253 0,30271
254 -1,92718
255 -0,35065
256 -1,31563
257 1,43857
258 0,28414
259 0,19013
260 0,31850
261 2,01402
262 -0,28202
263 -0,27099
264 0,59739
265 -1,90653
266 -0,05568
267 0,67735
268 0,35187
269 0,37869
270 -0,03914
271 -0,69399
272 -1,71123
273 -0,68788
274 0,95012
275 -1,89905
276 0,05990
277 -0,18624
278 0,02050
279 -0,14538
280 0,68226
281 -0,81152
282 -0,14722
283 0,05653
284 -0,10012
285 0,39576
286 -0,65206
287 1,49837
288 0,85990
289 1,40269
290 0,24237
291 -0,32974
292 0,11843
293 -0,58618
294 0,01496
295 -0,16980
296 0,28697
297 -1,11360
298 -1,50641
299 0,18398
300 -0,58981

The problem is the following: a) Using the data with 300 observations, test the null hypothesis that the variable X has a normal distribution. Use Pearson's chi-squared criterion, and divide the data into 10 intervals such that there are 30 observations in each.

The real issue is that our statistics lecturer spent only about 5-10 minutes at the end of the lecture talking about this (about dividing into intervals and so on), but he did not provide a numerical example of how the test should be performed or how to calculate the expected count for each interval, which is the point I did not understand well. How do I solve this problem?

$\endgroup$
  • $\begingroup$ Sorry, when I plugged in the data it was in column form. $\endgroup$
    – Ilya
    Commented Feb 5, 2017 at 11:07
  • $\begingroup$ I know that working with data in this form is inconvenient (sorry again), but in an answer I am more interested in the process of the test, so you may just write the steps for performing the test, with no reference to the data (later I would be able to apply it myself with the help of Excel). $\endgroup$
    – Ilya
    Commented Feb 5, 2017 at 11:13
  • $\begingroup$ See en.wikipedia.org/wiki/Pearson%27s_chi-squared_test under "Test for fit of a distribution". $\endgroup$
    – Roland
    Commented Feb 5, 2017 at 12:59

1 Answer

$\begingroup$

To save myself some serious tedium transcribing your data, provided in a 'user hostile' format, and to avoid doing your actual homework problem, I will generate 300 observations of my own, and comment on some of the steps you need to take. (I use R statistical software throughout.)

Listing of Data. Data, sorted from smallest to largest. (Partial list, to save space.)

x = sort(round(rnorm(300),3))
x
   [1] -2.754 -2.663 -2.622 -2.523 -2.335 -2.243 -2.062 -2.057 -2.024 -1.941
  [11] -1.918 -1.838 -1.820 -1.759 -1.733 -1.732 -1.700 -1.698 -1.644 -1.635
  [21] -1.520 -1.457 -1.402 -1.370 -1.349 -1.343 -1.299 -1.268 -1.256 -1.247
  [31] -1.236 -1.225 -1.189 -1.184 -1.153 -1.152 -1.141 -1.131 -1.118 -1.110
  [41] -1.108 -1.088 -1.081 -1.067 -1.040 -1.010 -0.997 -0.994 -0.992 -0.987
  [51] -0.947 -0.935 -0.921 -0.906 -0.898 -0.888 -0.882 -0.880 -0.875 -0.843
  [61] -0.839 -0.819 -0.815 -0.808 -0.794 -0.793 -0.783 -0.775 -0.773 -0.772
  [71] -0.767 -0.759 -0.759 -0.758 -0.721 -0.716 -0.706 -0.703 -0.703 -0.701
 ...
 [251]  0.950  0.958  1.001  1.019  1.026  1.068  1.086  1.174  1.200  1.208
 [261]  1.209  1.218  1.228  1.231  1.240  1.242  1.245  1.296  1.310  1.315
 [271]  1.321  1.363  1.403  1.406  1.407  1.419  1.422  1.432  1.457  1.459
 [281]  1.467  1.503  1.513  1.515  1.567  1.653  1.690  1.789  1.813  1.822
 [291]  1.941  1.989  2.028  2.117  2.166  2.245  2.273  2.568  2.954  3.477

Determining Intervals and Observed Counts. How do you divide your data into 10 intervals (or categories or bins) of about 30 observations each? You can count down the sorted list, or you can find the 'deciles' that do the job.

quantile(x, (1:9)/10)
    10%     20%     30%     40%     50%     60%     70%     80%     90% 
-1.2371 -0.8398 -0.6150 -0.3306 -0.0790  0.1808  0.4649  0.7982  1.3156 

According to these numbers, there should be about 30 observations below -1.2371, and another 30 between -1.2371 and -0.8398. Looking at the sorted list, I see 31 in the 1st group and 29 in the 2nd. The last group has 30. These give you the observed counts.
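
To avoid miscounting by hand, here is a minimal R sketch along these lines (the object names brks and obs are just illustrative choices, not part of the assignment) that forms the ten bins from the sample deciles and tabulates the observed counts:

brks = c(-Inf, quantile(x, (1:9)/10), Inf)  # bin boundaries: -Inf, the nine deciles, Inf
obs = table(cut(x, breaks = brks))          # observed count in each of the 10 bins
obs                                         # each count should be about 30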

Estimating Parameters and Computing Expected Counts. Next you need to find the expected counts. Which normal distribution will you use? It seems that you have to estimate $\mu$ and $\sigma$ from the data. For my data, I get $\hat \mu = \bar X = -0.035$ and $\hat \sigma = S_X = 1.029.$

mean(x);  sd(x)
[1] -0.03507
[1] 1.028687

What probability would $\mathsf{Norm}(\mu = -0.035, \sigma = 1.029)$ put into each of the ten intervals $(-\infty, -1.2371),\ (-1.2371, -0.8398), \dots, (1.3156, \infty)?$ You can compute that. For the second interval (using software or printed normal tables), the probability is $0.0957,$ and the expected count in that interval is $0.0957(300) \approx 28.71.$

p = diff(pnorm(c(-1.2371, -0.8398), -0.035, 1.029)); p; p*300
## 0.09571375
## 28.71412
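
Rather than repeating that calculation interval by interval, a short sketch (assuming the brks cutpoints defined above, with $-\infty$ and $\infty$ as the outer endpoints) computes all ten probabilities and expected counts at once:

p.all = diff(pnorm(brks, mean(x), sd(x)))   # probability the fitted normal puts in each bin
e.all = 300 * p.all                         # expected counts; all should comfortably exceed 5
round(e.all, 2)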

Computing the GOF Statistic and Performing the Test. Next, the formula for the chi-squared goodness-of-fit (GOF) statistic is $$Q = \sum_{i=1}^{10} \frac{(X_i - E_i)^2}{E_i},$$ where the $X_i$ and $E_i$ are the observed and (unrounded) expected counts, respectively. If the expected counts all exceed 5, then $Q$ is approximately distributed as $\mathsf{Chisq}(\nu = 7).$ The degrees of freedom would be $\nu = 10 - 1 = 9$ if we had been given a particular normal distribution, but we have estimated the two parameters $\mu$ and $\sigma$ from the data, so the degrees of freedom are reduced to $\nu = 10 - 1 - 2 = 7.$

You will reject a fit to normal at the 5% level of significance if $Q > 14.07.$ Otherwise, you will say that the data are 'consistent' with a normal distribution.

qchisq(.95, 7)
## 14.06714
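
For completeness, a minimal sketch putting the pieces together (it reuses the obs and e.all objects sketched above, which are my own illustrative names):

Q = sum((obs - e.all)^2 / e.all)   # chi-squared GOF statistic
Q
Q > qchisq(0.95, 7)                # TRUE would mean rejecting normality at the 5% level
1 - pchisq(Q, 7)                   # equivalently, the p-value on 7 degrees of freedom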

Notes: Because my fake data were generated as normal, the expected counts in the ten 'bins' are all about 30, closely matching the observed counts, so $Q$ is small, and we cannot reject a fit to normal.

I doubt your instructor will regard the fact that he was in a hurry during the lecture on this topic as a signal that you don't need to study your text to master this test. I hope this explanation helps.

$\endgroup$
  • $\begingroup$ I have got it already, but thanks anyway. $\endgroup$
    – Ilya
    Commented Feb 6, 2017 at 15:37
