11
$\begingroup$

For the STFT, we impose a window of a certain size onto the original signal and then perform an FFT on each window. The uncertainty in frequency and time is determined by the width of the window; however, I can't understand the point of having overlapping windows...

If we have a signal, for instance, why can't we just divide it into 6 chunks (non-overlapping windows) and then perform an FFT on each of those chunks?

Let me make this clearer with my application. I will mostly be dealing with a 60 Hz power-line wave, and occasionally we want to monitor a 180 Hz transient on the power line. Since the signal will be mostly periodic, should I use a window then?

$\endgroup$
5
  • $\begingroup$ Because it's a convolution: Equivalence between "windowed Fourier transform" and STFT as convolutions/filtering $\endgroup$ Commented Jun 2, 2023 at 19:49
  • $\begingroup$ Naw, if it's for analysis only and not analysis-synthesis, there really isn't any convolution going on except in the Goertzel sense of the word. If you're doing fast convolution, that's sorta related but not directly the same thing as the STFT, although you could do overlap-add fast convolution using a complementary Hann window on the input. But it wouldn't be as efficient as regular-old fast convolution. $\endgroup$ Commented Jun 3, 2023 at 0:46
  • $\begingroup$ @robertbristow-johnson You need to stop talking smack behind people's backs. STFT is fully equivalent to bandpass convolutions (rather cross-correlations), and I proved it in code in the linked post, and it can be proven mathematically fairly easily. Not just equivalent but more accurate in time-frequency. If you're not after time-frequency and have alternate uses, that's fair, but STFT is by chief motivation time-frequency. Again, gatekeeping terminology like a politician - looks much like the beef with Jazz. The way you object more impartially is, "not necessarily convolution". $\endgroup$ Commented Jun 5, 2023 at 9:57
  • $\begingroup$ //"You need to stop talking smack behind people's backs."// - - - what smack behind whose back? please explain. - - - //" STFT is fully equivalent to bandpass convolutions"// - - - as is the Goertzel algorithm. - - - //" (rather cross-correlations)"// difference between cross-correlation and convolution is time-reversal of one of them. $\endgroup$ Commented Jun 5, 2023 at 15:36
  • $\begingroup$ @robertbristow-johnson Arguing against others without notifying them. I also saw the periodicity jab with Hilmar, but that doesn't matter, it just adds up with this and two other @-less replies. -- Ok, I thought Goertzel is some sarcasm reference. It's sort of worse because it implies, even after seeing my linked post, you don't get the role of convolutions, or how time-frequency works. STFT can be evaluated at any frequency, FFT is just for speed and convenient inversion. -- If you think conv vs CC has just to do with compute here, you're strongly mistaken. $\endgroup$ Commented Jun 9, 2023 at 11:50

3 Answers

12
$\begingroup$
  1. We almost always want to apply some kind of window function in order to minimize the effect of leakage, which is why the rectangular window (i.e. no windowing at all) is rarely used in practice.

  2. Almost any tapering function decreases to zero at its boundaries.

Because of this taper, samples near the frame edges are heavily attenuated and their information is effectively lost. To recover it, you usually process with 50% overlap, so that the samples at the edge of one frame fall near the center of the next.


  3. Another point is that if you apply the inverse STFT, you should use a complementary window, i.e. one whose overlapped copies sum to 1, such as a Hann window with 50% overlap.
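As a rough illustration of the complementary-window point, the following NumPy sketch sums shifted copies of a periodic Hann window. The window length of 256, the frame counts, and the helper name `overlap_weight` are arbitrary choices made for this illustration and do not come from the answer.

```python
import numpy as np

def overlap_weight(window, hop, n_frames):
    """Sum n_frames copies of `window`, one placed every `hop` samples."""
    n = len(window)
    total = np.zeros(hop * (n_frames - 1) + n)
    for k in range(n_frames):
        total[k * hop : k * hop + n] += window
    return total

N = 256
# Periodic Hann window: w[n] = 0.5 - 0.5*cos(2*pi*n/N)
hann = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(N) / N)

# Six back-to-back frames versus eleven frames with 50% overlap (same total span)
no_overlap = overlap_weight(hann, hop=N, n_frames=6)
half_overlap = overlap_weight(hann, hop=N // 2, n_frames=11)

# Ignore the first and last half-window, where only one frame contributes
interior = slice(N // 2, len(half_overlap) - N // 2)

print("min weight, no overlap:  ", no_overlap.min())
print("weight range, 50% overlap:", half_overlap[interior].min(), half_overlap[interior].max())
```

Without overlap the total weight drops to zero at every frame boundary; with 50% overlap it stays flat at 1 away from the two ends of the signal, which is what "summing to 1" refers to.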

In summary: yes, you should pretty much always use windowing in your applications.
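To relate this to the 60 Hz / 180 Hz case from the question, here is a minimal sketch of what happens when a short 180 Hz burst lands on a frame boundary. Everything specific in it is an assumption for illustration: the 1920 Hz sample rate, the 128-sample window, the burst position and amplitude, and the helper `stft_mag`.

```python
import numpy as np

def stft_mag(x, n_win, hop):
    """Magnitude STFT using a periodic Hann window and no padding."""
    w = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n_win) / n_win)
    starts = range(0, len(x) - n_win + 1, hop)
    frames = np.array([x[s:s + n_win] * w for s in starts])
    return np.abs(np.fft.rfft(frames, axis=1))  # shape (n_frames, n_win//2 + 1)

fs = 1920      # assumed sample rate in Hz
n_win = 128    # gives 15 Hz bin spacing: 60 Hz -> bin 4, 180 Hz -> bin 12
n = np.arange(1536)

# Steady 60 Hz tone plus a 32-sample 180 Hz burst centred on a frame boundary
x = np.sin(2 * np.pi * 60 * n / fs)
burst = slice(496, 528)
x[burst] += 0.2 * np.sin(2 * np.pi * 180 * n[burst] / fs)

bin_180 = round(180 * n_win / fs)
peak_no_overlap = stft_mag(x, n_win, hop=n_win)[:, bin_180].max()
peak_half_overlap = stft_mag(x, n_win, hop=n_win // 2)[:, bin_180].max()

print("180 Hz peak, no overlap: ", peak_no_overlap)
print("180 Hz peak, 50% overlap:", peak_half_overlap)
```

With back-to-back frames the burst only ever sits under the tapered edges of a window, so its 180 Hz peak is much smaller than with 50% overlap, where one frame is centred on it.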

For more comprehensive information, please refer to this great paper:

Heinzel G. - Spectrum and spectral density estimation by the DFT, including a comprehensive list of window functions and some new flat-top windows

$\endgroup$
7
  • $\begingroup$ Thank you very much for your help! The MATLAB algorithm that I am looking at actually starts the 1st window with its centre at the beginning of the time axis, but in your drawing it is the detail of the first window. Can I assume that the reason they do this is that they don't want to lose the information at the beginning of the time axis? $\endgroup$
    – kuku
    Commented Nov 25, 2014 at 16:00
  • $\begingroup$ The picture is only an example; you can see that there are no values on the time axis. The way you describe is perfectly OK. Anyway, it's just the addition of one extra frame. $\endgroup$
    – jojeck
    Commented Nov 25, 2014 at 16:59
  • $\begingroup$ Read the paper. Read the paper. Read the paper. Section 12. That is all. $\endgroup$
    – Andy Piper
    Commented Aug 20, 2019 at 17:11
  • $\begingroup$ @AndyPiper: you mean 10, right? $\endgroup$
    – jojeck
    Commented Aug 20, 2019 at 18:06
  • $\begingroup$ Section 12 is a cookbook for all of this stuff and the best place to start in my view. Section 10 is specifically about overlap. But it's all great :) $\endgroup$
    – Andy Piper
    Commented Aug 21, 2019 at 8:12
1
$\begingroup$

why overlapping the window?

Because otherwise you lose information, a ton of it. The STFT is equivalent to convolutions (rather, cross-correlations) with windowed complex sinusoids, i.e. bandpass filtering. For:

  • Spectrograms (modulus): the loss is greatest, and in every sense. With maximum overlap, by contrast, the STFT modulus is invertible up to a global phase shift, which is a strong form of inversion, unlike the DFT/FFT modulus. This strong inversion, and the STFT's robustness properties, are exclusively enabled by overlapping in both domains.
  • Extracting phase/amplitude/frequency vs time: there's tremendous aliasing, worst for phase.
  • Non-time-frequency uses (phase vocoding, analysis/synthesis): without modulus, the STFT is perfectly invertible as long as hop_size <= window_size (NOLA), which makes a lot of algorithms possible, but said algorithms may still require analytic information (phase, amplitude, etc), which is aliased or not mapped.

hop_size, i.e. window_size - noverlap (the gap between successive windows), is the subsampling factor along time. n_fft, the size of the frequency dimension, plays the inverse role along frequency: a larger n_fft samples frequency more finely. Simple example:

The spectrogram hence wrongly suggests a pure sine where we have strong F.M., despite hop_len=64 STFT being perfectly invertible.
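The original figure for this example is not reproduced here. As a stand-in, the sketch below builds a different, deliberately contrived case of the same effect: a complex FM tone whose modulation period equals the hop, so every frame sees the same thing. The signal parameters, the 128-sample window, and the helper `spectral_centroid_per_frame` are assumptions made for this illustration and are not the answer's own code.

```python
import numpy as np

def spectral_centroid_per_frame(x, n_win, hop):
    """Power-weighted mean DFT bin of each Hann-windowed frame."""
    w = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n_win) / n_win)
    starts = range(0, len(x) - n_win + 1, hop)
    frames = np.array([x[s:s + n_win] * w for s in starts])
    power = np.abs(np.fft.fft(frames, axis=1)) ** 2
    bins = np.arange(n_win)
    return (power * bins).sum(axis=1) / power.sum(axis=1)

n = np.arange(2048)
mod_period = 64  # FM period in samples (assumed), chosen equal to the coarse hop

# Complex FM tone: carrier at 0.25 cycles/sample, frequency swinging by +/- 0.125
phase = 2 * np.pi * 0.25 * n + 8.0 * np.sin(2 * np.pi * n / mod_period)
x = np.exp(1j * phase)

coarse = spectral_centroid_per_frame(x, n_win=128, hop=64)  # 50% overlap
fine = spectral_centroid_per_frame(x, n_win=128, hop=8)

print("centroid spread in bins, hop=64:", np.ptp(coarse))
print("centroid spread in bins, hop=8: ", np.ptp(fine))
```

At hop 64 every frame is identical up to a phase factor, so the spectrogram is constant in time and the modulation is invisible along the time axis. At hop 8 the per-frame centroid moves by several bins, which reveals the frequency variation that the coarser time sampling aliased away.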

Further reading

$\endgroup$
7
  • 1
    $\begingroup$ Hay, O, are you Bruce S.? Just curious. $\endgroup$ Commented Jun 5, 2023 at 16:45
  • $\begingroup$ Who? You mean to tell me you never checked my profile or Github? I upsampled your face! $\endgroup$ Commented Jun 6, 2023 at 16:31
  • $\begingroup$ It was your NOLA link. I thought that might have been your NOLA paper. $\endgroup$ Commented Jun 6, 2023 at 16:38
  • $\begingroup$ Ah. Well the guy's 500 years old and I finished college like... not that long ago, be blasted my endless delays. I did manage to squeeze out a paper and go to war with all coauthors before it published. $\endgroup$ Commented Jun 6, 2023 at 16:44
  • $\begingroup$ So you're John? $\endgroup$ Commented Jun 6, 2023 at 17:12
0
$\begingroup$

You could think of an N-point windowed block DFT/FFT (STFT) as a set of N complex FIR filters (running convolution) where we keep only every Mth output. The question then becomes, not «why do we have overlap», but rather «why do we decimate filter outputs».

Because we can. Because we have sufficient information. And because even with the complexity reduction of using FFTs, running a new one for each input sample would often add too much compute cost.
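The filter-bank view above can be checked numerically for a single bin. The following is a minimal NumPy sketch, with N, M, the bin index k, and the random test signal all chosen arbitrarily for illustration. Note that the running filter is written as a cross-correlation with a windowed complex sinusoid, which is a convolution with the time-reversed, conjugated kernel.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1024)
N, M, k = 64, 16, 5  # window length, hop (decimation factor), DFT bin

# Periodic Hann window
w = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(N) / N)

# Block view: window one frame every M samples and take bin k of its DFT
starts = np.arange(0, len(x) - N + 1, M)
stft_bin = np.array([np.fft.fft(x[s:s + N] * w)[k] for s in starts])

# Filter view: correlate with a windowed complex sinusoid at every sample...
kernel = w * np.exp(2j * np.pi * k * np.arange(N) / N)
running = np.correlate(x.astype(complex), kernel, mode="valid")
# ...then keep only every M-th output
decimated = running[::M]

print("block STFT matches decimated filter output:", np.allclose(stft_bin, decimated))
```

Setting M = 1 gives the full running filter output (maximum overlap), while M = N gives back-to-back blocks with no overlap; anything in between is the same filter output sampled more or less densely in time.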

$\endgroup$
6
  • $\begingroup$ "Because we have sufficient information" No we don't. Title and body ask why there's overlap at all, not why there's decimation at all. Being invertible doesn't mean being useful. $\endgroup$ Commented Jun 6, 2023 at 16:35
  • $\begingroup$ Not sure that I understand where you are coming from there. Convolution is «maximum overlap». Back to back block processing is zero overlap. Partially overlapped block processing could be viewed as either decimated convolution, or overlapped block processing, both views are equally valid? $\endgroup$
    – Knut Inge
    Commented Jun 6, 2023 at 20:46
  • $\begingroup$ If I didn't have a post on this very Q&A that directly refutes your answer and answers the questions you're asking me, I'd respond differently. What you're asking is for me to put extra time, repeating what I said in a compressed form - "free work". That's neither reasonable, nor does it change the fact that, me or who else, put out easily comprehensible information that's pointed to you that you're not consulting. And generally the way to ask such questions where expecting an explanation makes sense is by opening a new post. $\endgroup$ Commented Jun 12, 2023 at 20:31
  • $\begingroup$ But ok, let's do this once: OP asks why overlap at all, meaning why not hop_size = window_size. You say "we have sufficient information". Amplitude, phase, frequency over time - gone. Spectrogram (not STFT) invertibility, gone. There's a ton that is lost, and there's always some loss with non-unity hop (and for spectrogram, even unity), and this loss can be measured. To imply it's just about compute cost is way off. Your answer goes from "incorrect" to "incomplete" by specifying you don't refer to max hop. It's also not "decimate filter outputs" but subsample/downsample; no extra filtering. $\endgroup$ Commented Jun 12, 2023 at 20:31
  • $\begingroup$ The term «decimation» does not always imply filtering, but if you convolve with an N-sample filter (bank) instead of using an FFT, you have to decimate in order to obtain equivalence. I can sketch it in Matlab code if my words are unclear? $\endgroup$
    – Knut Inge
    Commented Jun 12, 2023 at 20:54
