0
$\begingroup$

A normal distribution is said to have a symmetric, bell-shaped curve with mean = median = mode, where the data values are typically drawn from a large sample that is representative of the population.

But real-world data is often skewed. Take, for example, income in a third-world country (including unaccounted, and hence unrecorded, income), as recorded only by a 'suitable' survey.

A normal distribution, however, is symmetric, and this property cannot hold for the example above of a large survey-based sample of people's incomes in a country. The skewness is inherent, however large the sample might be, even when it covers a significant proportion of the population.

Also, for such a distribution, the property mean = median = mode does not hold.
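To make the point concrete, here is a minimal sketch in Python. It assumes a lognormal distribution as a stand-in for income (an illustrative choice, not real survey data), and the function name `sample_mean_and_median` is made up for this example. It suggests that the gap between mean and median does not shrink as the sample grows.

```python
import random
import statistics

def sample_mean_and_median(n, seed=0):
    """Draw n values from a lognormal(0, 1) distribution (an assumed,
    illustrative stand-in for income) and return (mean, median)."""
    rng = random.Random(seed)
    data = [rng.lognormvariate(0.0, 1.0) for _ in range(n)]
    return statistics.fmean(data), statistics.median(data)

# A larger sample estimates the mean and median more precisely,
# but the two do not move closer to each other.
for n in (1_000, 100_000):
    mean, median = sample_mean_and_median(n)
    print(f"n={n}: mean={mean:.3f}, median={median:.3f}")
```

For this particular distribution the population mean is $e^{1/2}\approx 1.65$ and the population median is $1$, so the sample values should settle near those two different numbers.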

But given the possibility of a very large sample size, can such a distribution be approximated by the normal distribution?

How can certain distributions be treated as normal distributions?

There seems to be a need to rely on the central limit theorem, which is stated to hold even if the original variables are not normally distributed.

The article states that the conditions 'commonly' required (i.e., they can also be relaxed) are that the random variables be independent and identically distributed.
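A small simulation may help separate two things here: the distribution of the data itself and the distribution of sample means. The sketch below assumes i.i.d. draws from an exponential distribution (chosen only because it is clearly skewed, not because it models income); the helper names `skewness` and `skew_of_sample_means` are made up for this example.

```python
import random
import statistics

def skewness(xs):
    """Population-style sample skewness: the mean of the cubed z-scores."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs, mu=m)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)

def skew_of_sample_means(n, reps=5_000, seed=0):
    """Skewness of the means of `reps` independent samples of size n,
    each drawn i.i.d. from an exponential distribution with rate 1."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(reps)
    ]
    return skewness(means)

# n = 1 reproduces the skew of the raw data; larger n averages it away.
for n in (1, 10, 100):
    print(f"n={n}: skewness of sample means = {skew_of_sample_means(n):.2f}")
```

Under these assumptions the skewness of the sample means falls roughly like $2/\sqrt{n}$, while the raw data stay as skewed as before. The theorem concerns the means (or sums), not the individual observations.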

For the case at hand, there can be many quantitative factors (e.g., different taxation systems across regions, loose implementation of rules, flawed governance, corruption, qualification, etc.) and qualitative factors such as years of experience.

These factors can also be assumed to be independent, though they might not be entirely so.

But a much more thorough look is needed at how the above distribution can be approximated by a normal one.

I request responses and links that are more detailed and accessible than the one above.

$\endgroup$
15
  • 5
    $\begingroup$ A normal distribution is symmetrical. Not all observed distributions are symmetrical. Income, for example, is generally skewed right (a long tail of high incomes) due to income inequality. $\endgroup$
    – pancini
    Commented Apr 22 at 21:44
  • 1
    $\begingroup$ Many distributions can be sort of approximated by a normal curve, even if they're not actually normal. Income is certainly not normally distributed, since it can't be negative and usually is not symmetric, but the mean and standard deviation still give some rough idea. $\endgroup$
    – aschepler
    Commented Apr 22 at 21:53
  • 2
    $\begingroup$ There is so much written and proven about normal distributions that answers all of these questions that it makes little sense to rehash it on this site. $\endgroup$ Commented Apr 22 at 22:29
  • 1
    $\begingroup$ No, what I want to emphasize is that $\text{Dist}(X+Y)\neq\text{Dist}(X)+\text{Dist}(Y)$. $\endgroup$
    – Gonçalo
    Commented Apr 22 at 23:41
  • 1
    $\begingroup$ The CLT gives sufficient conditions for sampling means to be Gaussian. Sampling errors are often Gaussian too, for completely different reasons. There are also some other places one expects Gaussian distributions; here is one example. $\endgroup$
    – J.G.
    Commented Apr 24 at 18:39

1 Answer

1
$\begingroup$

It is a soft question, with a soft answer.

Consider this:

When we take $\pi=22/7$, it is good enough for school work and even some technical work.
We might take $\pi=355/113$ to get more accuracy in calculations.
According to some sources, NASA uses $3.141592653589793$, which is still not exact, though good enough for rocketry.
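As a rough check of how much accuracy each of these choices buys, the sketch below compares them against Python's `math.pi`. Note the assumption: `math.pi` is itself only a double-precision value, and the dictionary and function names are made up for this example.

```python
import math

# The approximations quoted above.
APPROXIMATIONS = {
    "22/7": 22 / 7,
    "355/113": 355 / 113,
    "3.141592653589793": 3.141592653589793,
}

def relative_error(value):
    """Relative error of `value` against math.pi, which is itself
    only a double-precision approximation of pi."""
    return abs(value - math.pi) / math.pi

for name, value in APPROXIMATIONS.items():
    print(f"{name:>18}: relative error {relative_error(value):.2e}")
```

The last entry shows an error of zero only because the 16-digit value and `math.pi` are the same double-precision number; it is still not exact in the mathematical sense.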

Now we might ask: how can such "small" approximations work out well when millions of digits of $\pi$ are available? The answer is that the "target output" makes us choose the approximation. Practical "target output" cases such as electrical engineering, chemistry, machine learning and statistics might make do with $3.14159$.

Likewise:

When we have a large data set and want to analyze it, we might use highly accurate, moderately accurate, or inaccurate distributions: whichever gives a good "prediction" with the least effort wins.

It appears that in most cases where "a certain distribution has been treated as a normal distribution", the prediction, or whatever parameter was wanted, came out with suitable accuracy. Hence there was no need to use a more accurate distribution.
When the "prediction" is wide of the target, the statistician will of course have to use some other, more accurate distribution.
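Here is a hedged illustration of a prediction that goes wide of the target. It assumes lognormal data as a stand-in for a skewed quantity such as income (not real data), fits a normal distribution by matching mean and standard deviation with `statistics.NormalDist.from_samples`, and compares two predictions against the sample itself. The function name and the threshold of 10 are arbitrary choices for this example.

```python
import random
from statistics import NormalDist

def compare_normal_fit(n=100_000, seed=0):
    """Fit a normal distribution to skewed (lognormal) data and compare
    its predicted probabilities with the empirical fractions."""
    rng = random.Random(seed)
    data = [rng.lognormvariate(0.0, 1.0) for _ in range(n)]
    fit = NormalDist.from_samples(data)  # matches sample mean and std dev
    return {
        "fitted_mean": fit.mean,
        # Lognormal values are always positive, so this fraction is 0.
        "empirical_below_zero": sum(x < 0 for x in data) / n,
        "normal_below_zero": fit.cdf(0.0),
        # Fraction of very large values: the long right tail.
        "empirical_above_10": sum(x > 10 for x in data) / n,
        "normal_above_10": 1.0 - fit.cdf(10.0),
    }

for key, value in compare_normal_fit().items():
    print(f"{key}: {value:.5f}")
```

In this setup the fitted normal reproduces the mean by construction, but it assigns a sizeable probability to impossible negative values and understates the share of very large ones. Whether that matters depends on which of these quantities is the target.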

SUMMARY:

The suitability of the normal distribution depends on the target use.
When the target prediction is not accurate enough, the normal distribution is rejected as unsuitable and another distribution is tried.

$\endgroup$
