$\begingroup$

I have a list of documents, each belonging to a category. Each document has n pages. Each page is analyzed with an OCR engine, and the time to process each page is saved into a table.

So, I have

CATEGORY | DOC | PAGE | TIME
1        | 1   | 1    | 2
1        | 1   | 2    | 3
1        | 2   | 1    | 3
1        | 3   | 3    | 4
1        | 3   | 2    | 4
2        | 5   | 1    | 5
...

Now, I am grouping these documents to calculate the average time to process their pages. For instance,

CATEGORY | DOC | AVG TIME
1        | 1   | 2.5
1        | 2   | 3
...

Now I'd like to know the actual average time to process a page, per category. But adding these per-document averages and dividing by the number of docs in that category does not make sense to me.

How can I do this? Can I calculate a weighted average in this case, and if so, how?

Thanks

$\endgroup$
  • $\begingroup$ The numbers in the second table are insufficient to calculate a per-page average. You need an additional column with the number of pages in each document. The average is then total time divided by total pages or, put otherwise, the average of the per-document average times weighted by the number of pages. $\endgroup$
    – dxiv
    Commented Mar 2, 2017 at 20:42

1 Answer

$\begingroup$

You need a weighted average, with each document's number of pages as the weight. This recovers the total time for all pages and divides it by the total number of pages: $$\text{Average}=\frac{\sum_{\text{DOC}}\text{AvgTime}\times\text{pages}}{\sum_{\text{DOC}}\text{pages}}$$
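
To make the formula concrete, here is a minimal Python sketch using only the rows visible in the question's table (the question elides further rows with "...", so the numbers below cover just those rows). The function names `per_doc_stats` and `weighted_category_average` are made up for illustration. It first rebuilds the per-document averages together with the page counts, then weights each average by its page count.

```python
from collections import defaultdict

# Visible rows from the question's table: (category, doc, page, time).
# Rows hidden behind "..." in the question are not included here.
rows = [
    (1, 1, 1, 2),
    (1, 1, 2, 3),
    (1, 2, 1, 3),
    (1, 3, 3, 4),
    (1, 3, 2, 4),
    (2, 5, 1, 5),
]


def per_doc_stats(rows):
    """Return {(category, doc): (avg_time, page_count)}."""
    totals = defaultdict(lambda: [0.0, 0])  # [summed time, page count]
    for category, doc, _page, time in rows:
        totals[(category, doc)][0] += time
        totals[(category, doc)][1] += 1
    return {key: (total / pages, pages) for key, (total, pages) in totals.items()}


def weighted_category_average(doc_stats):
    """Average the per-document averages per category, weighted by page count."""
    weighted_sum = defaultdict(float)
    page_total = defaultdict(int)
    for (category, _doc), (avg_time, pages) in doc_stats.items():
        weighted_sum[category] += avg_time * pages  # recovers the document's total time
        page_total[category] += pages
    return {c: weighted_sum[c] / page_total[c] for c in weighted_sum}


doc_stats = per_doc_stats(rows)
category_avg = weighted_category_average(doc_stats)
print(category_avg)
```

For category 1 this gives (2.5·2 + 3·1 + 4·2) / 5 = 16 / 5 = 3.2, which is the same as summing the five page times directly. The plain average of the three document averages would be (2.5 + 3 + 4) / 3 ≈ 3.17, which is why the unweighted version felt wrong.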

$\endgroup$
  • $\begingroup$ Sweet! thanks @Andrei $\endgroup$ Commented Mar 2, 2017 at 21:03
