41
$\begingroup$

What would be the ideal way to find the mean and standard deviation of a signal for a real-time application? I'd like to be able to trigger a controller when a signal is more than 3 standard deviations off of the mean for a certain amount of time.

I'm assuming a dedicated DSP would do this pretty readily, but is there any "shortcut" that may not require something so complicated?

$\endgroup$
4
  • $\begingroup$ Do you know anything about the signal? Is it stationary? $\endgroup$
    – user42
    Commented Dec 3, 2011 at 7:49
  • $\begingroup$ @Tim Let's say that it's stationary. For my own curiosity, what would be the ramifications of a non-stationary signal? $\endgroup$
    – jonsca
    Commented Dec 3, 2011 at 8:39
  • 4
    $\begingroup$ If it's stationary, you could simply compute a running mean and standard deviation. Things would be more complicated if the mean and standard deviation varied with time. $\endgroup$
    – user42
    Commented Dec 3, 2011 at 9:16
  • 6
    $\begingroup$ Very related: en.wikipedia.org/wiki/… $\endgroup$ Commented Dec 4, 2011 at 12:34

5 Answers

44
$\begingroup$

There's a flaw in Jason R's answer, which is discussed in Knuth's "Art of Computer Programming", Vol. 2. The problem arises when the standard deviation is a small fraction of the mean: the calculation of E(x^2) - (E(x))^2 suffers from severe sensitivity to floating-point rounding errors.

You can even try this yourself in a Python script:

ofs = 1e9
A = [ofs + x for x in [1, -1, 2, 3, 0, 4.02, 5]]
A2 = [x * x for x in A]
print((sum(A2) / len(A)) - (sum(A) / len(A)) ** 2)

I get -128.0 as an answer, which clearly isn't valid, since the math predicts that the result (a variance) must be nonnegative.

Knuth cites an approach (Welford's method, as noted in the comments below) for calculating running mean and standard deviation, which goes something like this:

 initialize:
    m = 0;
    S = 0;
    n = 0;

 for each incoming sample x:
    prev_mean = m;
    n = n + 1;
    m = m + (x-m)/n;
    S = S + (x-m)*(x-prev_mean);

and then after each step, the value of m is the mean, and the standard deviation can be calculated as sqrt(S/n) or sqrt(S/(n-1)), depending on which is your favorite definition of standard deviation.

The equation I wrote above is slightly different from the one in Knuth, but it's computationally equivalent.

Update: here is the above formula coded up in Python, demonstrating that you get a nonnegative answer (hopefully one that's close to the correct value).

test1.py:

import math

def stats(x):
  # Welford-style running update: numerically stable even when the
  # standard deviation is a tiny fraction of the mean.
  n = 0
  S = 0.0
  m = 0.0
  for x_i in x:
    n = n + 1
    m_prev = m
    m = m + (x_i - m) / n
    S = S + (x_i - m) * (x_i - m_prev)
  return {'mean': m, 'variance': S/n}

def naive_stats(x):
  # Textbook E(x^2) - (E(x))^2: prone to catastrophic cancellation.
  S1 = sum(x)
  n = len(x)
  S2 = sum([x_i**2 for x_i in x])
  return {'mean': S1/n, 'variance': (S2/n - (S1/n)**2)}

x1 = [1, -1, 2, 3, 0, 4.02, 5]
x2 = [x + 1e9 for x in x1]

print("naive_stats:")
print(naive_stats(x1))
print(naive_stats(x2))

print("stats:")
print(stats(x1))
print(stats(x2))

result:

naive_stats:
{'mean': 2.0028571428571427, 'variance': 4.0114775510204073}
{'mean': 1000000002.0028572, 'variance': -128.0}
stats:
{'mean': 2.0028571428571431, 'variance': 4.0114775510204073}
{'mean': 1000000002.0028571, 'variance': 4.0114775868357446}

You'll note that there's still some rounding error, but it's not bad, whereas naive_stats just pukes.


edit: Just noticed Belisarius's comment citing Wikipedia which does mention the Knuth algorithm.

$\endgroup$
6
  • 1
    $\begingroup$ +1 for the detailed answer with example code. This approach is superior to the one indicated in my answer when a floating-point implementation is needed. $\endgroup$
    – Jason R
    Commented Jan 20, 2012 at 6:46
  • 1
    $\begingroup$ One might also check this for a C++ implementation: johndcook.com/standard_deviation.html $\endgroup$ Commented May 13, 2013 at 16:43
  • 1
    $\begingroup$ yep, that's it. He uses the exact equations Knuth uses. You can optimize somewhat and avoid having to check for initial iteration vs. subsequent iterations if you use my method. $\endgroup$
    – Jason S
    Commented May 13, 2013 at 17:41
  • 1
    $\begingroup$ "Knuth cites an approach (I don't remember the name of the inventor) for calculating running mean" -- it's Welford's method, by the way. $\endgroup$
    – Jason S
    Commented Mar 24, 2016 at 17:30
  • 1
    $\begingroup$ Based on this, you can imagine a single-line JavaScript function that calculates average: const getAvg = values => values.reduce((m, x, i) => m + (x - m) / (i + 1), 0) $\endgroup$ Commented Jun 9, 2022 at 19:38
16
$\begingroup$

What would be the ideal way to find the mean and standard deviation of a signal for a real-time application? I'd like to be able to trigger a controller when a signal is more than 3 standard deviations off of the mean for a certain amount of time.

The right approach in situations like this is typically to compute an exponentially weighted running average and standard deviation. In an exponentially weighted average, the estimates of the mean and variance are biased towards the most recent samples, giving you estimates of the mean and variance over roughly the last $\tau$ seconds, which is probably what you want, rather than the usual arithmetic average over all samples ever seen.

In the frequency domain, an "exponentially weighted running average" is simply a real pole. It is simple to implement in the time domain.

Time domain implementation

Let mean and meansq be the current estimates of the mean and mean of the square of the signal. On every cycle, update these estimates with the new sample x:

% update the estimate of the mean and the mean square:
mean = (1-a)*mean + a*x
meansq = (1-a)*meansq + a*(x^2)

% calculate the estimate of the variance:
var = meansq - mean^2;

% and, if you want standard deviation:
std = sqrt(var);

Here $0 < a < 1$ is a constant that determines the effective length of the running average. How to choose $a$ is described below in "analysis".

What is expressed above as an imperative program may also be depicted as a signal-flow diagram:

[Figure: signal-flow diagram of the single-pole exponentially weighted averaging filter]

Analysis

The above algorithm computes $y_i = a x_i + (1-a) y_{i-1}$ where $x_i$ is the input at sample $i$, and $y_i$ is the output (i.e. estimate of the mean). This is a simple, single-pole IIR filter. Taking the $z$ transform, we find the transfer function $$H(z) = \frac{a}{1-(1-a)z^{-1}}$$.

Condensing the IIR filters into their own blocks, the diagram now looks like this:

[Figure: block diagram with each IIR filter condensed into a single block]

To go to the continuous domain, we make the substitution $z = e^{s T}$ where $T$ is the sample time and $f_s = 1/T$ is the sample rate. Solving $1-(1-a)e^{-sT}=0$, we find that the continuous system has a pole at $s = \frac{1}{T} \log (1-a)$.

Choose $a$ by placing this pole at $s = -2\pi/\tau$, i.e. at a corner frequency of $1/\tau$ Hz, where $\tau$ is the desired averaging time: setting $\frac{1}{T}\log(1-a) = -2\pi/\tau$ gives $$ a = 1 - \exp\left(-2\pi\frac{T}{\tau}\right)$$
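As a concrete illustration, here is a minimal Python sketch of the whole recipe, including the 3-sigma trigger from the question. The sample rate, averaging time constant, and hold duration are assumptions chosen for illustration, not values from the answer:

import math

fs = 1000.0                  # sample rate in Hz (assumed)
T = 1.0 / fs                 # sample period
tau = 0.25                   # averaging time constant in seconds (assumed)
a = 1.0 - math.exp(-2.0 * math.pi * T / tau)

mean = 0.0                   # running estimate of E[x]
meansq = 0.0                 # running estimate of E[x^2]

def process(x):
    # One single-pole filter update; returns (mean, std) estimates.
    global mean, meansq
    mean = (1.0 - a) * mean + a * x
    meansq = (1.0 - a) * meansq + a * x * x
    var = max(meansq - mean * mean, 0.0)  # clamp tiny negative rounding errors
    return mean, math.sqrt(var)

hold = 0
min_hold = int(0.1 * fs)     # "a certain amount of time": 100 ms (assumed)

def triggered(x):
    # True once |x - mean| has exceeded 3 std devs for min_hold samples.
    global hold
    m, s = process(x)
    hold = hold + 1 if abs(x - m) > 3.0 * s else 0
    return hold >= min_hold

Note that the estimates need a few time constants' worth of samples to warm up before the trigger output is meaningful.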


$\endgroup$
7
  • 1
    $\begingroup$ Could you explain how $a$ determines the length of the running average? And what value of $a$ should be used? The specification 0 > a > 1 is impossible to meet. $\endgroup$ Commented Jan 20, 2012 at 2:02
  • $\begingroup$ This is similar to Jason R's approach. This method will be less accurate but a little faster and lower on memory. This approach ends up using an exponential window. $\endgroup$
    – schnarf
    Commented Jan 20, 2012 at 4:44
$\begingroup$ Woops! Of course I meant 0 < a < 1. If your system has sampling time T and you'd like an averaging time constant tau, then choose a = 1 - exp(-2*pi*T/tau). $\endgroup$
    – nibot
    Commented Feb 29, 2012 at 20:51
  • $\begingroup$ I think there may be a mistake in here. The single-pole filters don't have 0 dB gain at DC and since you are applying one filter in the linear domain and one in the squared domain the gain error is different for E<x> and E<x^2>. I'll elaborate more in my answer $\endgroup$
    – Hilmar
    Commented Mar 1, 2012 at 13:22
  • $\begingroup$ It does have 0 dB gain at DC. Substitute z=1 (DC) into H(z) = a/(1-(1-a)/z) and you get 1. $\endgroup$
    – nibot
    Commented Mar 1, 2012 at 13:27
7
$\begingroup$

A method I've used before in an embedded processing application is to maintain accumulators of the sum and sum-of-squares of the signal of interest:

$$ A_{x,i} = \sum_{k=0}^{i}x[k] = A_{x,i-1} + x[i], A_{x,-1} = 0 $$

$$ A_{x^2,i} = \sum_{k=0}^{i}x^2[k] = A_{x^2,i-1} + x^2[i], A_{x^2,-1} = 0 $$

Also, keep track of the current time instant $i$ in the above equations (that is, note the number of samples that you've added into the accumulators). Then, the sample mean and standard deviation at time $i$ are:

$$ \tilde\mu = \frac{A_{x,i}}{i+1} $$

$$ \tilde\sigma = \sqrt{\frac{A_{x^2,i}}{i+1} - \tilde\mu^2} $$

or you can use:

$$ \tilde\sigma = \sqrt{\frac{A_{x^2,i}}{i} - \tilde\mu^2} $$

depending upon which standard deviation estimation method you prefer. These equations are based on the definition of the variance:

$$ \sigma^2 = \operatorname{E}(X^2) - (\operatorname{E}(X))^2 $$

I've used these successfully in the past (although I was only concerned with variance estimation, not standard deviation), although you do have to be careful about the numeric types you use to hold the accumulators if you're going to be summing over a long period of time; you don't want overflow.

Edit: In addition to the above comment on overflow, it should be noted that this is not a numerically robust algorithm when implemented in floating-point arithmetic, potentially causing large errors in the estimated statistics. Look at Jason S's answer for a better approach in that case.
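For reference, here is a minimal Python sketch of the accumulator method, using integer samples in the spirit of the fixed-point setting described above. The 12-bit ADC readings are made up for illustration; Python integers cannot overflow, but on an embedded target the accumulators must be sized with the overflow caveat above in mind:

import math

acc_x = 0    # A_x: running sum of x[k]
acc_x2 = 0   # A_{x^2}: running sum of x[k]^2
n = 0        # number of samples accumulated so far

def update(x):
    # Fold one new sample into the accumulators; O(1) per sample.
    global acc_x, acc_x2, n
    acc_x += x
    acc_x2 += x * x
    n += 1

def stats():
    # Sample mean and std dev via sigma^2 = E(X^2) - (E(X))^2.
    mean = acc_x / n
    var = acc_x2 / n - mean * mean
    return mean, math.sqrt(var)

for sample in [512, 498, 530, 505, 519]:  # e.g. 12-bit ADC readings (assumed)
    update(sample)
print(stats())  # -> (512.8, ~11.09)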

$\endgroup$
9
  • 1
    $\begingroup$ Maybe rewriting it as $A_{x,i}=x[i]+A_{x,i-1},\ A_{x,0}=x[0]$ will make it clear that you need to add only two numbers at each step, lest someone implement it as summing all $i$ elements of $x$ at each step. $\endgroup$ Commented Dec 6, 2011 at 15:18
  • $\begingroup$ Yes, that's better. I tried to rewrite to make the recursive implementation more clear. $\endgroup$
    – Jason R
    Commented Dec 6, 2011 at 16:12
  • 2
    $\begingroup$ -1 when I have enough rep to do so: this has numerical problems. See Knuth vol. 2 $\endgroup$
    – Jason S
    Commented Jan 20, 2012 at 0:23
$\begingroup$ There seem to be a couple of typos here. Why is the mean being subtracted under the square root sign for $\sigma$? It should be $\mu^2$ to match the displayed equation $\sigma^2 = E(X^2) - (E(X))^2$, no? Also, though I won't vote down this answer, I agree with Jason S that there can be numerical problems in this approach. $\endgroup$
  • 2
    $\begingroup$ @JasonS: I would disagree that the technique is inherently flawed, although I agree with your point that it is not a numerically robust method when implemented in floating point. I should have been more clear that I've used this successfully in an application that used integer arithmetic. Integer (or fixed-point implementations of fractional numbers) arithmetic doesn't suffer from the issue you pointed out that causes loss of precision. In that context, it is a suitable method that requires fewer operations per sample. $\endgroup$
    – Jason R
    Commented Jan 20, 2012 at 6:45
4
$\begingroup$

Jason R's and nibot's answers differ in one important aspect: Jason R's method calculates the std dev and mean over the whole signal seen so far (its accumulators start at zero and never forget), while nibot's is a "running" calculation, i.e. it weights more recent samples more strongly than samples from the distant past.

Since the application seems to require std dev and mean as a function of time, nibot's method is probably the more appropriate one (for this specific application). However, the really tricky part is getting the time weighting right. nibot's example uses a simple single-pole filter.

The proper way to describe this is that we get an estimate of $E[x]$ by filtering $x[n]$ and an estimate of $E[x^2]$ by filtering $x[n]^2$. Estimation filters are typically lowpass filters. These filters should be scaled to have 0 dB gain at 0 Hz; otherwise there is a constant gain error.

The choice of lowpass filter can be guided by what you know about your signal and the time resolution you need for your estimation. Lower cutoff frequencies and higher order will result in better accuracy but slower response time.

To complicate things further, one filter is applied in the linear domain and another in the squared domain. Squaring significantly changes the spectral content of the signal, so you may want to use a different filter in the squared domain.

Here is an example of how to estimate mean, RMS, and std dev as functions of time.

%% example
fs = 44100; n = fs; % 44.1 kHz sample rate, 1 second of data
% signal: white noise plus a low-frequency drift at 5 Hz
x = randn(n,1) + sin(2*pi*(0:n-1)'*5/fs);
% mean estimation filter: since we are looking for effects in the 5 Hz range,
% we use maybe a 25 Hz filter, 2nd order so it's not too sluggish
[b,a] = butter(2,25*2/fs);
xmeanEst = filter(b,a,x);
% now we estimate x^2; since squaring roughly doubles the frequency content,
% we use twice the bandwidth
[b,a] = butter(2,50*2/fs);
x2Est = filter(b,a,x.^2);
% std deviation estimate: sqrt(E<x^2> - E<x>^2), clamped so filter
% transients can't drive the sqrt argument negative
xstd = sqrt(max(x2Est - xmeanEst.^2, 0));
% and plot it
h = plot([x, xmeanEst, sqrt(x2Est), xstd]);
grid on;
legend('x','E<x>','sqrt(E<x^2>)','Std dev');
set(h(2:4),'Linewidth',2);
$\endgroup$
4
  • 1
$\begingroup$ The filter in my answer corresponds to y1 = filter(a,[1 a-1],x);. $\endgroup$
    – nibot
    Commented Mar 1, 2012 at 14:05
  • 2
    $\begingroup$ Good point on the distinction between the running statistics and the statistics of the overall sample. My implementation could be modified to compute running statistics by accumulating over a moving window, which can be done efficiently also (at each time step, subtract the time sample that just slid out of the window from each accumulator). $\endgroup$
    – Jason R
    Commented Mar 1, 2012 at 14:13
  • $\begingroup$ nibot, sorry you are right and I was wrong. I'll correct this right away $\endgroup$
    – Hilmar
    Commented Mar 1, 2012 at 17:52
  • 1
    $\begingroup$ +1 for suggesting different filtering for x and x^2 $\endgroup$
    – nibot
    Commented Mar 1, 2012 at 20:57
3
$\begingroup$

Similar to the accepted answer above (Jason S's), and likewise derived from the formula in Knuth (Vol. 2, p. 232), one can also derive a formula to replace a value, i.e. remove one value and add a new one in a single step. According to my tests, the one-step replace delivers better precision than the two-step remove/add version.

The code below is in Java; mean and s are member variables that get updated in place, corresponding to m and S in Jason's post above. The value count refers to the window size n.

/**
 * Replaces the value {@code x} currently present in this sample with the
 * new value {@code y}. In a sliding window, {@code x} is the value that
 * drops out and {@code y} is the new value entering the window. The sample
 * count remains constant with this operation.
 * 
 * @param x
 *            the value to remove
 * @param y
 *            the value to add
 */
public void replace(double x, double y) {
    final double deltaYX = y - x;
    final double deltaX = x - mean;
    final double deltaY = y - mean;
    mean = mean + deltaYX / count;
    final double deltaYp = y - mean;
    final double countMinus1 = count - 1;
    s = s - count / countMinus1 * (deltaX * deltaX - deltaY * deltaYp) - deltaYX * deltaYp / countMinus1;
}
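For comparison, here is the same replace update transcribed into Python as a sliding-window class; the deque bookkeeping is my addition for illustration and is not part of the original Java:

from collections import deque
from math import sqrt

class SlidingStats:
    # Sliding-window mean/std dev via the one-step replace update above.
    def __init__(self, values):
        self.window = deque(values)
        self.count = len(values)  # fixed window size n
        self.mean = sum(values) / self.count
        self.s = sum((v - self.mean) ** 2 for v in values)

    def replace(self, y):
        x = self.window.popleft()  # value dropping out of the window
        self.window.append(y)      # value entering the window
        delta_yx = y - x
        delta_x = x - self.mean
        delta_y = y - self.mean
        self.mean += delta_yx / self.count
        delta_yp = y - self.mean
        cm1 = self.count - 1
        self.s -= (self.count / cm1) * (delta_x * delta_x - delta_y * delta_yp) \
                  + delta_yx * delta_yp / cm1

    def std(self):
        return sqrt(self.s / (self.count - 1))  # sample standard deviation

For example, SlidingStats([1.0, 2.0]) followed by replace(3.0) leaves the window at [2.0, 3.0] with mean 2.5 and std sqrt(0.5), as expected.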
$\endgroup$
