1
$\begingroup$

I've got a big list of data that is split by industry, and 'keyword' within this. I'm trying to do an average per industry, as well as an average for overall, and wanted to check what is the most accurate way of finding averages here. I've checked other similar questions and couldn't quite be sure of the answer.

So I have data split as follows:

So the first Overall Average in the Score column is the average of all the numbers overall, the second is average after doing an average of each keyword, the third is the average of all the scores within an industry, and the final column is per keyword grouping per industry.

Sample data

Obviously each of these comes up with a different average, but I want to be able to say:

  • The average for finance is XX
  • The average overall is XX

There will not necessarily be the same number within each industry or keyword.

Could you please let me know which way is the most acccurate to show an overall average, and an average of each industry?

Please let me know if that's not clear. Thank you in advance.

$\endgroup$

2 Answers 2

1
$\begingroup$

You are right to suspect that you can't simply average averages.

You have computed the average for the industries as

Finance    5
Grocers    2
Football 4.6

You cannot then average those by adding and dividing by 3, since the industries have different weights. There are $5$ entries for Finance but only $2$ for grocers and $4$ for football. The correct way to average them is $$ \frac{5 \times 5 + 2 \times 3 + 4 \times 4.5} {5 + 2 + 4}. $$

Then you will get the same overall average: $4.7$, so no new information.

You can report the averages per industry separately, you just can't average them.

By the way, if you do any of that you should stick to one digit after the decimal point. All the others are phony precision.

$\endgroup$
0
$\begingroup$

It highly depends on what kind of average you calculate and what exactly you want to say with this statistic. Are your averages per industrie already weighted and do you want to have the same weighting on the total averages? If your industrie figures just state the average per observed company, you should probably weigh the averages per industry with the number of companies observed per industry to get:

average_per_observed_companie= (sum over avg_industrie * num_observed_companies_industrie) / (sum over num_observed_companies_industrie) average_per_observed_companie= (sum over avg_industrie * num_observed_companies_industrie) / (sum over num_observed_companies_industrie)

But if you rather (for example) want to state numbers of an average company found in the wild, you also have to take into account how the companies are distributed over all the industries (probably something you need to get external data for). So there can't be no plain answer for your question, since it depends on what the statistic is intended for.

$\endgroup$

You must log in to answer this question.

Not the answer you're looking for? Browse other questions tagged .