Is there a way to incrementally calculate (or estimate) the average of a vector (a set of numbers) without knowing their count in advance?

For example, you have a = [4 6 3 9 4 12 4 18] and you want an estimate of the average, but you don't have all the values at hand, so you want a running average without keeping all the previous values available.

5 Answers


You need to keep at least two pieces of information: the number of terms so far and the running mean (or something equivalent to it).

Let's suppose the $n$th component of the vector is $a_n$ and the running mean up to that point is $m_n$, so $$m_n= \frac{1}{n}\sum_{i=1}^n a_i.$$

Starting with $m_0=0$, you can use the obvious single pass $$m_n = \frac{(n-1)m_{n-1}+a_n}{n}$$ but precision errors are likely to be smaller if you use the equivalent $$m_n = m_{n-1} + \frac{a_{n}-m_{n-1}}{n}.$$
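A minimal Python sketch of the second update (the function and variable names are mine, not from the answer):

```python
def running_mean(values):
    """Incrementally compute the mean without storing the values."""
    m = 0.0  # running mean, m_0 = 0
    n = 0    # number of terms seen so far
    for a in values:
        n += 1
        m += (a - m) / n  # m_n = m_{n-1} + (a_n - m_{n-1}) / n
    return m

print(running_mean([4, 6, 3, 9, 4, 12, 4, 18]))  # ≈ 7.5
```

Only the count and the current mean are kept, as described above.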

  • Paul (Nov 18, 2013): Can you elaborate on why precision errors are smaller in the second expression? Does the first expression contain a subtraction of nearly equal values?
  • Henry (Nov 18, 2013): If $n$ is large, it is likely that $(n-1)m_{n-1}$ will be much larger than $a_n$, making the addition less precise.
  • castarco (Jan 8, 2014): What if $n$ is so big that (using finite-precision floating-point numbers) $\frac{a_n - m_{n-1}}{n}$ gives us zero?
  • Henry (Jan 9, 2014): @castarco Suppose you have $n-k$ terms with mean $m_{n-k}$ and $k$ terms with mean $m_k$, with $0 \ll k \ll n$. Then your new combined mean would be $m_{n-k}+ \dfrac{k}{n}(m_k -m_{n-k})$.
  • Henry (Jun 2, 2017): @Adam: something like $m_n = m_{n-1} + \dfrac{w_n(a_{n}-m_{n-1})}{w_n + \sum_{i=1}^{n-1} w_i}$; instead of keeping the number of terms so far, you keep the sum of weights so far.
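The weighted variant from the last comment can be sketched in Python like this (names are mine); only the running mean and the running sum of weights are kept:

```python
def weighted_running_mean(pairs):
    """pairs: iterable of (value, weight). Returns the weighted mean."""
    m = 0.0     # running weighted mean
    wsum = 0.0  # sum of weights seen so far
    for a, w in pairs:
        wsum += w
        m += w * (a - m) / wsum  # m_n = m_{n-1} + w_n (a_n - m_{n-1}) / sum(w)
    return m

# with equal weights this reduces to the ordinary running mean
print(weighted_running_mean([(4, 1), (6, 1), (3, 1), (9, 1)]))  # ≈ 5.5
```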

Intuition

It's an easy question and @Henry basically answered it. However, I think it would be nice to add some intuition for the second equation:

$$\mu_k=\mu_{k-1}+\frac{x_k-\mu_{k-1}}{k}$$

The idea is to represent the new value $x_k$ by a part that is equal to the previous mean $\mu_{k-1}$ plus the remaining part $x_k-\mu_{k-1}$. The new mean $\mu_k$ then contains $k$ copies of the previous mean plus the remaining part of the new value, all divided by the total count. Okay, so how do we get there?

Derivation

Let's say you already have the mean $\mu_{k-1}$ of the elements $x_0,...,x_{k-1}$. Of course, we can easily incorporate an additional value $x_k$ by undoing the division, adding the new value, and redoing the scaling (but with the new count):

$$\mu_k=\frac{(k-1)\,\mu_{k-1}+x_k}{k}$$

Now, let's represent the new value by the previous mean and the difference to the previous mean:

$$x_k=\mu_{k-1}+(x_k-\mu_{k-1})$$

Putting that into our first equation:

$$\mu_k=\frac{(k-1)\,\mu_{k-1}+\mu_{k-1}+(x_k-\mu_{k-1})}{k}$$

Look how we now got $k$ times the previous mean:

$$\mu_k=\frac{k\,\mu_{k-1}+(x_k-\mu_{k-1})}{k}$$

The nice thing is that we don't have to undo and redo the division on the previous mean anymore:

$$\mu_k=\mu_{k-1}+\frac{x_k-\mu_{k-1}}{k}$$

As you can see, we still have to divide the difference between the new value and the previous mean by the total count. Since the new value already comes in "sum space", there is nothing to undo on it; we just divide.

Application

An application of this form of incremental mean can be found in the field of reinforcement learning where a value function is averaged over many experienced rewards. In this setting, you can think of $x_k-\mu_{k-1}$ as a temporal-difference error term. Since we usually don't know the total number of experiences here, we multiply by a small learning rate rather than dividing by $k$.
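The constant-step-size variant used in reinforcement learning can be sketched as follows (`ema_update` and `alpha` are illustrative names, not from any standard API):

```python
def ema_update(mu, x, alpha=0.1):
    """One constant-step-size update: mu <- mu + alpha * (x - mu).

    This is an exponential moving average: recent values get more
    weight, which suits streams whose statistics drift over time."""
    return mu + alpha * (x - mu)

mu = 0.0
for reward in [4, 6, 3, 9]:
    mu = ema_update(mu, reward)
print(mu)
```

With `alpha = 1/k` this recovers the exact incremental mean; a fixed `alpha` trades exactness for responsiveness to non-stationary data.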

  • (Feb 21, 2023): This substitution of $x_k$ is contrived. Just apply the distributive law.

Yep, easy to do - I call it the travelling mean :)

Note that you will need to keep track of (1) the 'count' of values, (2) the previous 'mean' and (3) the 'new value'. The algorithm, in words:

(('previous mean' * ('count' - 1)) + 'new value') / 'count'

The benefit is that you have a running mean. You can do the same with the 'standard deviation'; it's just a little more complex.
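The standard-deviation extension mentioned here is commonly done with Welford's online algorithm; a minimal sketch (not part of the original answer):

```python
import math

def running_mean_std(values):
    """Welford's online algorithm: one pass, no stored values."""
    count, mean, m2 = 0, 0.0, 0.0  # m2 = sum of squared deviations
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)   # uses the mean *after* the update
    std = math.sqrt(m2 / count) if count else 0.0  # population std
    return mean, std

mean, std = running_mean_std([4, 6, 3, 9, 4, 12, 4, 18])
print(mean, std)
```

This is more numerically stable than accumulating the sum of squares directly, which can catastrophically cancel when the mean is large relative to the spread.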

  • (Apr 15, 2020): But this has the same problem as above, right? If n becomes very large, the (previous mean) * (count - 1) product could cause an overflow (in computers).
  • Boris B. (Sep 14, 2020): @CharlesHarring Exactly. If mean * count doesn't overflow your data type, then a simple running total won't overflow either, and will be more accurate to boot.

Here is a general solution to a related problem: a running average over a window of fixed length. We calculate the average from the $m$-th up to the $n$-th element of the incoming data.
Given that:

  • $a_{n}$ is the $n$-th element of the incoming data,
  • $a_{m}$ is the oldest element included in the average,
  • $\widehat{a}_{m:n}$ is the average of the elements $a_m$ through $a_n$,

if we initially have $a_{m}, a_{m+1}, a_{m+2}, \ldots, a_{n}$, the average of these $n-m+1$ elements can be easily calculated. That is $$ \widehat{a}_{m:n} = \frac{a_{m}+a_{m+1}+a_{m+2}+\ldots+a_{n}}{n-m+1} $$

Next, when the $(n+1)$-th element arrives, we want the average over the same number of elements, now running from $a_{m+1}$ to $a_{n+1}$:
$$ \widehat{a}_{m+1:n+1} = \frac{a_{m+1} + a_{m+2} + \ldots + a_{n} + a_{n+1}}{n-m+1} $$

Then from the first equation $$ \widehat{a}_{m:n}(n-m+1) - a_{m} = a_{m+1}+a_{m+2}+\ldots+a_{n} $$

Substituting this into the second equation, $$ \widehat{a}_{m+1:n+1} = \widehat{a}_{m:n} + \frac{a_{n+1} - a_{m}}{n-m+1} $$

So, to dynamically update the average when a new element arrives, we need the previous average $\widehat{a}_{m:n}$, the oldest element of the previous window $a_{m}$, and the number of elements in the window, $n-m+1$.
I hope you will find this solution inspiring. Please write if you find any errors in the above reasoning.
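A Python sketch of this fixed-window update (names are mine). Note that dropping the oldest element does require keeping the current window around, here in a deque:

```python
from collections import deque

def window_means(stream, size):
    """Yield the mean of the last `size` elements, updated incrementally."""
    window = deque()
    mean = 0.0
    for x in stream:
        if len(window) < size:
            window.append(x)
            mean += (x - mean) / len(window)  # grow phase: ordinary running mean
        else:
            oldest = window.popleft()
            window.append(x)
            mean += (x - oldest) / size       # slide phase: the update derived above
        yield mean

print(list(window_means([4, 6, 3, 9, 4], 3)))
```

A deque gives O(1) appends and pops from both ends, so each new element costs constant time regardless of the window size.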

  • (Feb 10, 2014): I have to mention that I didn't manage to find any practical application for this average. I thought about using it for calculating a complexity-efficient average over a big amount of data (let's say $n-m$ = 1000), but in the end I gave up. If you know how to obtain the oldest element $a_{m}$ without keeping the whole window in an array, please let me know.
  • Ali (Feb 10, 2014): So, what happens in the case where you are going one step at a time and don't want to store any past information? You have the previous average and one new element; what else do you need to keep, the total count of elements and the first one?
  • (Feb 12, 2014): Yes. You need at least to know how many elements are included in the average, the first element in the average, and the average from the previous step. Note that the count of elements in the average can be a constant number, so at least one piece of information is given "for free" - you don't have to calculate it when new data arrives.
  • (Aug 28, 2019): @MichalWolodzko This seems like a very dumb objection, but I'm just working through the equations in my head. Say we average from m = 5 to n = 7: that gives three terms in the numerator (a_5, a_6, and a_7), so the denominator must be the number of samples being averaged, not n - m = 2. Am I missing something obvious?
  • (Aug 28, 2019): By the way, I have a practical application for this formula. I'm doing moving-average calculations in a real-time system where I want to incrementally update the moving average as old values expire out of the averaging window, without recalculating the entire average each time a new value arrives.

Here is a solution similar to @Michal Wolodzko's but with a constant batch size $w$. I developed this because I need to take an average of a large sequence of high-resolution images. With this method, I can load $w$ images at a time and update the running average of the images. This is useful for computing a "background" state which can be used for further image analysis. The following derivation is for a sequence of numbers, but you can easily generalize it to a sequence of images with pixel intensities stored in $m \times n$ arrays.

Consider a sequence of numbers $x_0,...,x_{N-1}$.

Define the following:

  • $N$ length of the sequence
  • $w$ width or batch size
  • $N_b = \lfloor N/w \rfloor$ number of batches
  • $N_r = N \bmod w$ remaining elements that don't fit evenly into a batch
  • $\overline{S}_k = \frac{1}{w}\sum_{i=kw}^{(k+1)w-1} x_i$ average over batch $k$, where $0\leq k \leq N_b-1$
  • $\overline{S}_{r} = \frac{1}{N_r}\sum_{i=N_b w}^{N-1} x_i$ average over the remaining elements
  • $\overline{S}_{0:k} = \frac{1}{k}\sum_{i=0}^{k-1} \overline{S}_i$ average of the first $k$ batch averages
  • $\overline{S} = \frac{1}{N}\sum_{i=0}^{N-1}x_i$ the average over all the elements

Note that $$ \begin{align*} \overline{S}_{0:k} &= \frac{1}{k} \sum_{i=0}^{k-1} \overline{S}_i\\ &= \frac{1}{kw}\left(\overline{S}_0 w + \overline{S}_1 w + \cdots + \overline{S}_{k-1} w\right)\\ &= \text{average of $\{x_i\}$ up to element $kw$} \end{align*} $$ so we can get the running average $\overline{S}_{0:k}$ by just averaging the batch averages.

Now, we don't want to store all the batch averages. Instead, we want to update the running average with each newly computed batch average. Consider $$ \begin{align*} \overline{S}_{0:k+1}&=\frac{1}{k+1}\sum_{i=0}^k \overline{S}_i\\ &= \frac{1}{k+1}\left(\sum_{i=0}^{k-1} \overline{S}_i + \overline{S}_k\right)\\ &= \frac{1}{k+1}\left(k\,\overline{S}_{0:k} + \overline{S}_k\right)\\ &= \frac{k}{k+1}\overline{S}_{0:k} + \frac{1}{k+1}\overline{S}_k \end{align*} $$ Thus we can update the running average by taking a weighted average of what we've computed so far and the new batch average. The last step is to fold in the awkward number of remaining elements $N_r$ (when $N_r > 0$): $$ \begin{align*} \overline{S} &= \frac{1}{N}\left(\overline{S}_{0:N_b}N_bw + \overline{S}_rN_r\right)\\ &= \frac{N_bw}{N}\overline{S}_{0:N_b} + \frac{N_r}{N}\overline{S}_r \end{align*} $$ Overall, this method is nice because there is no risk of overflow (no excessively large sum is ever computed) and no need to load all the numbers (or images) into memory at once.

Here's some Python code demonstrating the method.

import numpy as np
import math

# replace this with a sequence of images;
# only need to load (w) at a time
x = np.random.rand(37)

w = 5                 # batch size
N = len(x)            # length of the sequence
Nb = math.floor(N/w)  # number of batch averages
Nr = N % w            # remaining elements to average

Srun = 0.0  # running average
for k in range(Nb):
    Sk = np.mean(x[k*w:(k+1)*w])      # batch average
    Srun = Srun*(k/(k+1)) + Sk/(k+1)  # updated running average

# fold in the remaining elements (skip when w divides N evenly)
if Nr > 0:
    Sr = np.mean(x[Nb*w:])            # average over the remaining elements
    Srun = Srun*(Nb*w/N) + Sr*(Nr/N)  # final running average

# compare the running average
print('Numpy: ', x.mean())
print('Running: ', Srun)

with output

Numpy:  0.4537805990791784
Running:  0.45378059907917856
