I have a question about a non-Gaussian distributed parameter that can only take certain values in a defined interval.
I have to define this parameter starting from a set of its values, and in the end I may use only an average value and a tolerance. I am wondering whether the mean value should be calculated over the whole set or only over the values inside the tolerance.
Let me explain my situation in more detail. I have to consider only 84% of the original set of values (this is incorrect! +/-1.5*sigma is 86.6%), cutting the same percentage from the head and from the tail, and the values that remain should be the ones that give me the estimate I am looking for. In the case of a Gaussian I would use the avg value and +/- 1.5 * standard deviation to obtain my parameter and its tolerance (yes, in that case I would be a little higher than 84%, but I'm really looking for 84% of the values; again 86.6, not 84).
This picture is incorrect. The percentage should be 86.6%
In the current case I must decide whether to calculate the avg value (weighted by the probability of occurrence of each value) on the whole set or on the "cut" set. I must also decide whether it is better to calculate the tolerance as the maximum deviation of the 8th/92nd percentiles (really the 6.7th and the 93.3rd) from the avg value, as the average of the two deviations, or in some other way. I am not sure about this either.
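To make the alternatives concrete, here is a minimal Python sketch that computes each candidate side by side. The values and probabilities are hypothetical (they are not my real data), the function names are only illustrative, and the percentiles are taken without interpolation, so whole values are kept or cut rather than splitting probability mass at exactly 6.7% and 93.3%.

```python
def weighted_mean(values, probs):
    """Mean of values weighted by (possibly unnormalized) probabilities."""
    total = sum(probs)
    return sum(v * p for v, p in zip(values, probs)) / total


def weighted_quantile(values, probs, q):
    """Smallest value whose cumulative probability reaches q (no interpolation)."""
    total = sum(probs)
    cum = 0.0
    for v, p in sorted(zip(values, probs)):
        cum += p / total
        if cum >= q:
            return v
    return max(values)


def summarize(values, probs, lo_q=0.067, hi_q=0.933):
    """Compare mean on the whole set vs. the cut set, and two tolerance rules."""
    lo = weighted_quantile(values, probs, lo_q)
    hi = weighted_quantile(values, probs, hi_q)
    mean_all = weighted_mean(values, probs)
    # "Cut" set: keep only values between the two percentiles (inclusive).
    kept = [(v, p) for v, p in zip(values, probs) if lo <= v <= hi]
    mean_cut = weighted_mean([v for v, _ in kept], [p for _, p in kept])
    dev_low, dev_high = mean_all - lo, hi - mean_all
    return {
        "lo": lo,
        "hi": hi,
        "mean_all": mean_all,
        "mean_cut": mean_cut,
        "tol_max": max(dev_low, dev_high),    # maximum deviation from the mean
        "tol_avg": (dev_low + dev_high) / 2,  # average of the two deviations
    }


# Hypothetical, right-skewed example data (assumption, not the real parameter).
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
probs = [0.02, 0.05, 0.10, 0.20, 0.25, 0.15, 0.10, 0.06, 0.04, 0.03]

result = summarize(values, probs)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

One thing the sketch makes visible: the average of the two deviations always equals half the distance between the two percentiles, (hi - lo) / 2, so it does not depend on which mean is chosen. The maximum deviation does depend on the mean and guarantees that the interval mean +/- tolerance covers both percentiles.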
Below is a chart of Values vs. Probability for my parameter (in this case the avg value has been calculated on the original set):
This picture is incorrect. The percentiles should be 6.7 and 93.3
The blue line is a trendline made with Excel; each column includes all the values between the one shown on the x-axis and the next one. This representation may not be the best one, but it helps to show the shape of the distribution.
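The corrected figures in the notes above (86.6% coverage, 6.7th and 93.3rd percentiles) can be checked with a short Python sketch. It assumes an exactly Gaussian parameter and uses only the standard library; `normal_cdf` is just a helper name chosen for this example.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


k = 1.5  # half-width of the interval, in standard deviations

coverage = normal_cdf(k) - normal_cdf(-k)  # fraction inside mean +/- k*sigma
lower_pct = 100.0 * normal_cdf(-k)         # percentile at mean - k*sigma
upper_pct = 100.0 * normal_cdf(k)          # percentile at mean + k*sigma

print(f"coverage within +/-{k} sigma: {100 * coverage:.1f}%")
print(f"percentiles: {lower_pct:.1f} and {upper_pct:.1f}")
```

This confirms that +/- 1.5 standard deviations corresponds to about 86.6% of a Gaussian, not 84%. If exactly 84% were required, the cut points would be the 8th and 92nd percentiles instead.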
Which of these options is the most appropriate?