$\begingroup$

Many podcast apps allow you to listen to podcasts faster than the speed at which they were recorded (typically at x1.25, x1.5, x1.75, and x2 speeds).

If these apps are simply replacing the sound's waveform $A(t)$ by $A(k t)$, where $k$ is the nominal speed multiplier, then the Fourier transform $\tilde{A}(\omega)$ changes to $\frac{1}{k} \tilde{A}(\omega/k)$, so every frequency in the spectrum gets multiplied by the same factor $k$. Since the interval between two notes (in octaves) is the base-2 log of their frequency ratio, this means that every note gets shifted up in pitch by the same interval, namely $\log_2 k$ octaves. Specifically, for playback at x1.25 speed this would increase the pitch of every note by about a major third (exactly a major third in just intonation), for x1.5 speed by about a perfect fifth, for x1.75 speed by a little less than a minor seventh, and for x2 speed by exactly one octave.
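As a quick check of those intervals, here is a minimal Python sketch that converts a speed multiplier into equal-tempered semitones under the naive $A(kt)$ model. The function name `interval_semitones` is only an illustrative choice, not something taken from any podcast app.

```python
import math

def interval_semitones(k: float) -> float:
    """Pitch shift in equal-tempered semitones when every frequency is multiplied by k.

    One octave (k = 2) is 12 semitones, so the shift is 12 * log2(k).
    """
    return 12 * math.log2(k)

# Speed multipliers commonly offered by podcast players (from the question).
for k in (1.25, 1.5, 1.75, 2.0):
    print(f"x{k}: {interval_semitones(k):+.2f} semitones")
```

For reference, an equal-tempered major third is 4 semitones, a perfect fifth is 7, and a minor seventh is 10, so the printed values can be compared directly against the intervals named above.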

When you fast-forward an audiocassette tape, you do indeed get the "Alvin and the Chipmunks" phenomenon that all the sounds are dramatically higher in pitch. But when I listen to a podcast on higher speed, the voices don't sound noticeably higher to me. They just sound like people talking faster than usual, but at the same pitch. (I.e. it sounds like fewer periods of sound oscillation fit into each spoken word, but the instantaneous frequency sounds about the same. Of course, I can't directly hear the large number of sound oscillations over a human-scale interval.)

It's a little hard for me to separate out the effects of people talking fast from the pitch of their voice - they certainly don't sound normal when played at x2 speed - but I would have thought that an increase in pitch of one octave would be clearly noticeable.

Is it just the case that x2 isn't enough of a speedup for the higher pitch to be clearly audible? Or are these podcast apps doing some fancy sound processing where they somehow keep the instantaneous pitch the same but speed up the audible speaking rate? It seems to me that this would require some kind of separation of time scales, where they find a way to compress the waveform's "slow envelope" (the audible speed at which whole phonemes and words are being formed) while leaving the instantaneous "fast frequency" (the instantaneous pitch) unchanged. Do podcast apps do something like this?

$\endgroup$
  • $\begingroup$ I noticed and tested this on Substack's built-in podcast player, but I think I've also noticed it on Apple Podcasts and various web sites' unnamed podcast players. $\endgroup$
    – tparker
    Commented Apr 25, 2023 at 0:54
  • $\begingroup$ Welcome to digital sound processing… $\endgroup$
    – Jon Custer
    Commented Apr 25, 2023 at 1:00
  • $\begingroup$ Also: en.wikipedia.org/wiki/Audio_time_stretching_and_pitch_scaling This is a computing question, not physics. EE SE probably has someone who knows more. $\endgroup$
    – DKNguyen
    Commented Apr 25, 2023 at 1:25
  • $\begingroup$ It used to be like that; that was how we got Alvin and the Chipmunks. As others here have already pointed out, players now compensate for as much as possible of the distortion you correctly describe. $\endgroup$ Commented Apr 25, 2023 at 2:34
  • $\begingroup$ Signal Processing SE is probably the best place for this if you want to know the details of how this is done. $\endgroup$
    – Puk
    Commented Apr 25, 2023 at 3:12

1 Answer

$\begingroup$

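The podcast apps do indeed use digital signal processing, usually called time stretching or time-scale modification, that increases the playback speed while leaving the pitch where it was. Equivalently, it speeds the audio up and shifts the pitch back down by the same factor. This yields normal-sounding speech that is sped up.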

Rush Limbaugh used to use this "speed-talk" speech processing technique in his broadcast shows to "make room" for commercial breaks while still delivering his full monologue within the time limits of his program.

This is also how the fine-print disclaimers get squeezed into the end of a commercial message. In this case, the speedup is so great as to be right at the limit of intelligibility while still meeting the legal requirement of airing the disclaimer message.

This technique is also used in music processing software to do pitch correction without affecting the length of a song, or to slow down a song to facilitate learning it while maintaining proper pitch.
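To make the difference concrete, here is a minimal pure-Python sketch contrasting naive resampling, which raises the pitch, with a bare-bones overlap-add (OLA) time stretch, which keeps it. This is an illustration under stated assumptions and not the algorithm of any particular app: the names `resample_naive`, `time_stretch_ola`, and `estimate_freq`, the 8000 Hz sample rate, and the frame and hop sizes are all choices made for this example. The hop sizes are picked so that overlapping frames happen to line up in phase for the 200 Hz test tone. Real speech needs an extra alignment step (as in WSOLA) or a phase vocoder to avoid audible artifacts.

```python
import math

RATE = 8000  # samples per second (assumed for this toy example)

def resample_naive(x, k):
    """Read samples at positions k*n with linear interpolation: A(t) -> A(k t).
    The result is k times shorter and every frequency is multiplied by k."""
    n_out = int((len(x) - 1) / k)
    out = []
    for n in range(n_out):
        pos = n * k
        i = int(pos)
        frac = pos - i
        out.append((1 - frac) * x[i] + frac * x[i + 1])
    return out

def time_stretch_ola(x, k, frame=320, hop_out=80):
    """Plain overlap-add: take windowed frames every hop_out*k input samples
    and lay them down every hop_out output samples. Each frame is copied
    unchanged, so the fast oscillation inside it keeps its frequency, while
    the slow envelope is compressed by roughly the factor k."""
    hop_in = int(round(hop_out * k))
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame) for i in range(frame)]
    n_frames = (len(x) - frame) // hop_in + 1
    out_len = (n_frames - 1) * hop_out + frame
    out = [0.0] * out_len
    norm = [0.0] * out_len  # summed window weights, used to undo the overlap gain
    for m in range(n_frames):
        a = m * hop_in   # where the frame starts in the input
        s = m * hop_out  # where the frame starts in the output
        for i in range(frame):
            out[s + i] += window[i] * x[a + i]
            norm[s + i] += window[i]
    return [o / w if w > 1e-8 else 0.0 for o, w in zip(out, norm)]

def estimate_freq(x, rate=RATE):
    """Rough frequency estimate from the number of upward zero crossings."""
    crossings = sum(1 for a, b in zip(x, x[1:]) if a < 0 <= b)
    return crossings * rate / len(x)

# Half a second of a 200 Hz tone, then "x2 speed" both ways.
tone = [math.sin(2 * math.pi * 200 * n / RATE) for n in range(RATE // 2)]
fast_naive = resample_naive(tone, 2.0)
fast_ola = time_stretch_ola(tone, 2.0)

print("lengths:", len(tone), len(fast_naive), len(fast_ola))
print("estimated Hz:", round(estimate_freq(fast_naive)), round(estimate_freq(fast_ola)))
```

Both outputs are about half as long as the input, but only the naively resampled one comes out near 400 Hz; the overlap-add version stays near 200 Hz. Passing a factor below 1 (for example `0.5`) slows the audio down in the same way, which corresponds to the song-learning use mentioned above.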

$\endgroup$
