The normal distribution is described as having a symmetric, bell-shaped curve with mean = median = mode, where the data values are typically drawn from a large sample that is representative of the population.
But real-world data are often skewed. Take, for example, the incomes (including unaccounted, and hence unrecorded, income) of people in a third-world country, as captured only by a 'suitable' survey.
A normal distribution is symmetric, and that property cannot hold for such a large survey-based sample of people's incomes in a country. The skewness is inherent: it persists however large the sample is, even when the sample covers a significant proportion of the population.
Also, for such a distribution, the property mean = mode = median does not hold.
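To make the point about persistent skewness concrete, here is a minimal simulation sketch. It assumes, purely for illustration, that incomes follow a log-normal distribution (a common textbook stand-in, not something established for the survey described here); the helper `sample_skewness`, the parameters and the sample sizes are hypothetical choices. Under these assumptions the mean should stay above the median, and the skewness should stay clearly positive, however large the sample gets.

```python
import numpy as np

def sample_skewness(x):
    """Moment-based sample skewness (0 for a symmetric distribution)."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    return float(np.mean(centered**3) / np.mean(centered**2) ** 1.5)

rng = np.random.default_rng(0)
summary = {}
for n in (1_000, 100_000, 1_000_000):
    # Assumption: log-normal "incomes"; real income data may behave differently.
    incomes = rng.lognormal(mean=0.0, sigma=1.0, size=n)
    summary[n] = (float(incomes.mean()),
                  float(np.median(incomes)),
                  sample_skewness(incomes))
    mean, median, skew = summary[n]
    print(f"n={n:>9,}  mean={mean:.3f}  median={median:.3f}  skewness={skew:.2f}")
```

A larger sample only estimates the skewed shape more precisely; it does not make the shape itself more symmetric.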
Still, given the possibility of a very large sample size, can such a distribution be approximated by the normal distribution?
More generally, how can certain distributions be treated as normal?
There seems to be a need to rely on the central limit theorem, which is stated to hold even if the original variables are not normally distributed.
The article states that the conditions are 'commonly' (i.e., they can be relaxed too) that the random variables be independent and identically distributed.
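As a rough illustration of what the central limit theorem addresses, the sketch below looks at the distribution of sample means rather than of individual incomes, since the theorem concerns sums or means of independent, identically distributed variables with finite variance. It assumes a log-normal stand-in for income; the names `sample_skewness`, `skew_of_means` and `reps`, and the sample sizes, are hypothetical choices. Under these assumptions the skewness of the sample means should shrink as the sample size n grows, while the individual values remain skewed.

```python
import numpy as np

def sample_skewness(x):
    """Moment-based sample skewness (0 for a symmetric distribution)."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    return float(np.mean(centered**3) / np.mean(centered**2) ** 1.5)

rng = np.random.default_rng(1)
reps = 4_000  # number of independent samples drawn for each sample size
skew_of_means = {}
for n in (1, 30, 500):
    # Each row is one i.i.d. sample of n assumed log-normal "incomes";
    # taking the mean of each row gives `reps` sample means.
    samples = rng.lognormal(mean=0.0, sigma=1.0, size=(reps, n))
    sample_means = samples.mean(axis=1)
    skew_of_means[n] = sample_skewness(sample_means)
    print(f"n={n:>3}  skewness of sample means={skew_of_means[n]:.2f}")
```

The case n=1 is just the raw data, so its skewness reflects the underlying distribution; only the averages for larger n move toward a symmetric, bell-like shape.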
For the case at hand, there can be many quantitative factors (e.g., different taxation systems across regions, loose implementation of rules, flawed governance, corruption, qualifications, etc.) and qualitative factors such as years of experience.
These factors can be assumed to be independent too, though they might not be entirely so.
But a much more thorough look is needed at whether the above distribution can be approximated by a normal one.
Requesting responses and links that are more detailed and accessible than the one above.