29
$\begingroup$

What differences or other criteria can be used to help decide between using overlap-add and overlap-save for filtering? Both overlap-add and overlap-save are described as algorithms for doing FFT based fast convolution of data streams with FIR filter kernels. What are the latency, computational efficiency or caching locality (etc.) differences, if any? Or are they the same?

$\endgroup$

2 Answers 2

31
$\begingroup$

Essentially, OS is slightly more efficient since it does not require the addition of the overlapping transients. However, you may want to use OA if you need to reuse the FFTs with zero-padding rather than repeated samples.

Here is a quick overview from an article I wrote a while ago

Fast convolution refers to the blockwise use of circular convolution to accomplish linear convolution. Fast convolution can be accomplished by OA or OS methods. OS is also known as “overlap- scrap” . In OA filtering, each signal data block contains only as many samples as allows circular convolution to be equivalent to linear convolution. The signal data block is zero-padded prior to the FFT to prevent the filter impulse response from “wrapping around” the end of the sequence. OA filtering adds the input-on transient from one block with the input-off transient from the previous block. In OS filtering, shown in Figure 1, no zero-padding is performed on the input data, thus the circular convolution is not equivalent to linear convolution. The portions that “wrap around” are useless and discarded. To compensate for this, the last part of the previous input block is used as the beginning of the next block. OS requires no addition of transients, making it faster than OA.

$\endgroup$
5
  • $\begingroup$ Great article! = ) $\endgroup$
    – Phonon
    Commented Jun 29, 2012 at 0:52
  • $\begingroup$ There may be some optimizations in the way the DFT over the zero-padded portion of the OA buffer is computed, that give an edge to the OA method. This would depend on your processor and FFT package. Also, you could feasibly write your own FFT algorithm specifically for the OA that takes into account the zero-pad. $\endgroup$
    – orodbhen
    Commented Oct 10, 2016 at 17:35
  • $\begingroup$ @orodbhen, do you know of any such FFT package? $\endgroup$ Commented Oct 10, 2016 at 18:14
  • $\begingroup$ @MarkBorgerding In OpenCV you can specify the number of zero rows, but that's specific to 2D. As far as what implicit optimizations are present in that or other FFT packages, I don't know. I can think of a lot of cases where a custom FFT to exploit sparseness would be helpful but I haven't gone down that road myself. Not yet. $\endgroup$
    – orodbhen
    Commented Oct 11, 2016 at 18:59
  • 2
    $\begingroup$ Good thing you quoted because the link is broken :( $\endgroup$
    – user541686
    Commented May 10, 2019 at 21:08
-1
$\begingroup$

OS efficiency can be tweaked even more by converting the filter impulse response from causal to non-causal. This Octave code snippet from Wikimedia shows two methods, as well as the usual causal-filter method:

  if edge_effects_at_end_of_filter_output
   %Rotate the filter coefficients
   %Simple way
   %  H = fft([h(M) zeros(1,N-M) h(1:M-1)]);  
   %Fancy way
      H = fft( circshift([h zeros(1,N-M)], -(M-1)) );
      Xb = (1:L);                       % indices of good output
  else
      H = fft(h,N);
      Xb = M-1 + (1:L);                 % indices of good output
  endif

The advantage is seen in the difference between the "indices of good output". Indices of $(1:L)$ mean that the inverse DFT output buffer of length $L+M-1$ can be positioned within the overall output array, overlapping the previous output segment by $M-1$. So $M-1$ previous outputs are simply overwritten (discarded), and there is no need to copy the $L$ "good" outputs from one buffer to another.

$\endgroup$

Not the answer you're looking for? Browse other questions tagged or ask your own question.