36
$\begingroup$

Why can't you hear music well over a telephone line?

I was asked this question in an interview for a university study placement and I unfortunately had no idea.

I was given the hint that the telephone sampling rate is 8000 samples per second.

$\endgroup$
2
  • 13
    $\begingroup$ @user13107: It's not related to hearing per se. It has to do with the technical limitations of the phone and the network itself. What is heard has nothing to do with the cochlea, for instance. $\endgroup$ Commented Mar 20, 2014 at 15:53
  • 2
    $\begingroup$ Danny, the pieces you added in the last couple edits would be more suitable as comments, not as part of the question. (Well really there's no need to link to an answer to this question in a comment on the question itself) Please don't put them into the question again. $\endgroup$
    – David Z
    Commented Mar 23, 2014 at 18:11

7 Answers

47
$\begingroup$

The hint given by the interviewer is a red herring. The limitation you're hearing has been part of the phone network since long before digital sampling had any part in the telephone system. And it applies even in a local phone call where the signal is never digitized.

It is related to the fact that the connection from a land-line phone in your house or office back to the "central office" of the phone company is essentially a continuous connection through a pair of wires. There are typically no active circuits such as amplifiers, repeaters, digitizers, or other electronics involved.

Given the technology of 100 years ago when the phone network was first designed, a connection of this length could really only carry a very limited bandwidth. The engineers who designed the network did numerous experiments to determine just what frequencies needed to be conveyed for people to understand each other's regular speech, and designed the network only to be sure those frequencies were transmitted. They didn't add any costly components to the system if they weren't needed to achieve this goal.

For example, they might have used passive filters to "emphasize" high frequencies in circuits that were a bit longer than average (and so naturally tend to cut out the high frequencies), or to cut off high frequencies in circuits that were shorter than average, to ensure that all users get, as far as possible, the same connection quality.

Later, when they started using multiplexing to connect multiple voice circuits through a single wire (for inter-city connections, for example), the limited bandwidth allowed them to carry more connections on a single wire, and at that point the bandwidth limitation would have been deliberately enforced by filtering to ensure that conversations didn't cross-talk with each other.

Finally, when digital sampling and digital transmission were introduced into the network, the sampling theorem limitations discussed in the other answers came into play. Fortuitously, the bandwidth limitations introduced in the early days of analog telephone networks allowed digitization to be done at really low bitrates without degrading the signal quality below what it had been all along, and again this allows more conversations to be carried on a given wire in the network.
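To put a number on "really low bitrates", here is a minimal arithmetic sketch. It assumes 8-bit samples, as in the G.711 standard named elsewhere in this thread, and uses the T-1 trunk format mentioned in the comments (24 voice channels plus 8 kbit/s of framing). The variable and function names are illustrative only.

```python
# Assumption: 8-bit samples (G.711-style PCM) at the 8000 samples/s
# telephone rate given in the question.
SAMPLE_RATE_HZ = 8000
BITS_PER_SAMPLE = 8


def channel_bitrate(sample_rate_hz, bits_per_sample):
    """Raw PCM bitrate of one voice channel, in bit/s."""
    return sample_rate_hz * bits_per_sample


voice_bps = channel_bitrate(SAMPLE_RATE_HZ, BITS_PER_SAMPLE)

# A T-1 trunk multiplexes 24 such channels and adds 8 kbit/s of framing.
T1_CHANNELS = 24
T1_FRAMING_BPS = 8000
t1_bps = T1_CHANNELS * voice_bps + T1_FRAMING_BPS

print(f"one voice channel: {voice_bps} bit/s")
print(f"T-1 line rate:     {t1_bps} bit/s")
```

Doubling the audio bandwidth would double the per-channel bitrate and so halve the number of calls a trunk of fixed capacity could carry.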

Edit

I want to summarize with a key point that I previously posted in a comment on another answer:

The digital sampling rate (and later, compression methods) used in digital telephony was chosen to match the characteristics of the analog phone network, not the other way around.

$\endgroup$
12
  • 8
    $\begingroup$ +1 for getting across that the 4kHz bandwidth of the line was already a property before digital. This allowed the first applications of digital signals to be used on the trunks. Remnants of that can be seen in the specifications for ISDN and T-1 services, where the available bit rates are suspicious multiples of 8-bit samples at 8 kHz. Those services were originally created for trunk lines, and when introduced long distance calls got better in quality due to the noise immunity of the digital signals compared to all the older analog solutions. $\endgroup$
    – RBerteig
    Commented Mar 19, 2014 at 21:08
  • 3
    $\begingroup$ "They didn't add any costly components to the system if they weren't needed to achieve this goal." Exactly. And for a long time, the microphone was a carbon-granule type, so that put a bit of a limit on achievable sound quality too. (None of which is the answer the OP was supposed to come up with, ref the hint he was given, but still . . ) $\endgroup$
    – peterG
    Commented Mar 19, 2014 at 23:15
  • $\begingroup$ @peterG, Ha ha, I didn't see that bit about the hint from the interviewer --- guess they were fishing for a particular answer, even if it's not really the "true" answer. $\endgroup$
    – The Photon
    Commented Mar 19, 2014 at 23:54
  • $\begingroup$ Isn't the same network now carrying internet signals? With a bandwidth of the order of 10 Mbps we can easily stream not only good quality audio, but a full video! I do not understand why the phone quality has still to be so crappy. $\endgroup$
    – DarioP
    Commented Mar 20, 2014 at 9:21
  • 1
    $\begingroup$ @DannyRancher, because the question is why the telephone system is the way it is. And the reason why is the result of the historical progression of technology as I outlined in my answer. Furthermore, if you call your next door neighbor, there is very likely no digital processing being used on that call, but you still will not be able to transmit music well through that connection. I am emphasizing that the interviewer's hint is misleading as to why the phone system is the way it is. $\endgroup$
    – The Photon
    Commented Mar 26, 2014 at 1:39
26
$\begingroup$

According to Wikipedia the frequency range of the plain old telephone service is 300Hz to 3.4kHz. So any music you listen to will be missing the low frequencies and missing the high frequencies. If you remember back to the last time you heard hold music on the phone you'll probably remember that it sounded a bit muffled, but I have to say that it's still recognisable i.e. you can identify what music is being played. I'd be annoyed if my Hi-Fi sounded like that, but the music isn't totally mangled.

In my youth I used to be a Hi-Fi enthusiast, and the manufacturers' technical specs would boast that their equipment had a flat frequency spectrum from around 20Hz to 20kHz. The problem with reproducing this in a phone system is that, as DisplayName mentions in their answer, carrying a frequency $f$ over a digital network requires a sampling frequency of at least $2f$; otherwise you get aliasing. Providing bandwidth costs money and reduces call capacity (i.e. fewer calls per optic fibre), so phone backbones use a sampling frequency of only 8kHz, and hence the highest permissible frequency is 4kHz. The upper limit is a bit lower than this because it's hard to engineer audio filters with very sharp cutoffs. The 3.4kHz limit I mentioned above is presumably to ensure that no frequency near 4kHz gets through.
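The numbers in the paragraph above can be checked with a short sketch. The function names are made up for illustration; the figures (8 kHz sampling, 3.4 kHz upper band edge, 20 kHz Hi-Fi limit) are the ones quoted in this answer.

```python
def nyquist_limit_hz(sample_rate_hz):
    """Highest frequency that can be represented at a given sampling rate."""
    return sample_rate_hz / 2


def min_sample_rate_hz(highest_freq_hz):
    """Lowest sampling rate that can represent a given highest frequency."""
    return 2 * highest_freq_hz


PHONE_FS_HZ = 8000      # telephone sampling rate
POTS_UPPER_HZ = 3400    # upper edge of the telephone voice band
HIFI_UPPER_HZ = 20000   # upper edge quoted for Hi-Fi equipment

phone_limit = nyquist_limit_hz(PHONE_FS_HZ)
guard_band = phone_limit - POTS_UPPER_HZ       # room left for the filter roll-off
hifi_fs = min_sample_rate_hz(HIFI_UPPER_HZ)
rate_ratio = hifi_fs / PHONE_FS_HZ             # how much faster Hi-Fi must sample

print(f"phone Nyquist limit: {phone_limit} Hz, guard band: {guard_band} Hz")
print(f"Hi-Fi needs at least {hifi_fs} Hz, i.e. {rate_ratio}x the phone rate")
```

So carrying Hi-Fi audio would need at least five times the sample rate of a phone channel, before even considering bits per sample.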

Whether such a large frequency range is required for music playback is debatable. At a recent hearing checkup I was told I cannot hear anything above 12kHz (too many Black Sabbath gigs in my youth) but music on my Hi-Fi still sounds fine to me.

$\endgroup$
4
  • $\begingroup$ This limitation has been built into the phone system since before digital technology was used. Can you explain why? $\endgroup$
    – The Photon
    Commented Mar 19, 2014 at 20:33
  • 1
    $\begingroup$ I deleted an inappropriate comment and following discussion. $\endgroup$
    – David Z
    Commented Mar 19, 2014 at 20:41
  • 1
    $\begingroup$ @ThePhoton: good point, it's dangerously easy to forget the world hasn't always been digital. I won't update my answer though since you've given a thorough description. From an inauspicious start I think we now have an excellent set of answers to the question. $\endgroup$ Commented Mar 20, 2014 at 7:09
  • $\begingroup$ The requirement is not actually that a 3400Hz filter block anything above 4kHz, but rather that for any frequency f over 4kHz, the combined attenuation at f and 8000-f be adequate; filter designers thus have about 1kHz of transition band to play with, rather than just 500Hz. $\endgroup$
    – supercat
    Commented Mar 20, 2014 at 22:59
13
$\begingroup$

Have a look into the Nyquist theorem. The sampling frequency needs to be at least double the highest frequency being sampled. That is why, given that the human ear can hear up to ca. 20kHz, the CD samples at 44.1kHz.

Wikipedia Nyquist-Shannon Theorem

What do we hear instead if we do listen to (originally) 5 Hz to 20 kHz music through the phone? Is everything above 8 kHz simply gone or is there another effect? E.g., will 14 kHz be audible somehow (but differently) at 7 kHz?

Or in other words: "What is happening to the frequencies that are above the Nyquist threshold?"

The frequencies are missing. As simple as that. Not present. What our ear does instead is remember what should be there, based on experience. So when you talk over the phone to somebody you know, your brain adds what must be there. Still, I noticed that the first time I did this my brain gave me the real info (lacking frequencies) and only later learned that it can just fake the rest, based on its knowledge of the other person's voice. See Wikipedia:CELP, which uses a similar approach for audio compression.
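As a rough illustration of why out-of-band frequencies are simply absent rather than audible in some altered form, consider the low-pass filtering that the comments in this thread describe ahead of the sampler. The sketch below evaluates the standard Butterworth low-pass magnitude response. The 3400 Hz cutoff and the filter order of 8 are assumptions chosen for illustration, not the specification of any real telephone filter.

```python
import math


def butterworth_gain_db(freq_hz, cutoff_hz, order):
    """Magnitude response of an ideal Butterworth low-pass filter, in dB."""
    gain = 1.0 / math.sqrt(1.0 + (freq_hz / cutoff_hz) ** (2 * order))
    return 20.0 * math.log10(gain)


CUTOFF_HZ = 3400   # assumed cutoff, matching the 3.4 kHz band edge
ORDER = 8          # assumed filter order, purely illustrative

for f in (1000, 3400, 7000, 14000):
    print(f"{f:>6} Hz: {butterworth_gain_db(f, CUTOFF_HZ, ORDER):8.1f} dB")
```

Under these assumptions a 1 kHz tone passes essentially untouched, the response is about 3 dB down at the cutoff, and a 14 kHz tone is attenuated by more than 90 dB, which for listening purposes means it is gone.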

If you want to know more about the reasons for the 8kHz sampling rate, you can again use Wikipedia: see Wikipedia:PSTN; the standard used is G.711. Also, Sampling Frequency and Human Speech, which I have not read yet, goes into what you need as a minimum for human speech, including graphs and explanations. Finally, you can look into Wikipedia:MP3 in order to understand psychoacoustics. Hint: a beat masks things that come after it, for example, so that stuff can be dropped since you don't hear it. And other nice things. :D

$\endgroup$
9
  • $\begingroup$ Could you possibly explain what we hear instead if we do listen to (originally) $5\text{ Hz}$ to $20\text{ kHz}$ music through the phone? Is everything above $8\text{ kHz}$ simply gone or is there another effect? E.g., will $14\text{ kHz}$ be audible somehow (but differently) at $7\text{ kHz}$? $\endgroup$
    – Řídící
    Commented Mar 19, 2014 at 17:56
  • $\begingroup$ The frequencies are missing. As simple as that. Not present. What our ear does instead is remember what should be there, based on experience. So when you talk to somebody, you know over the phone your brain adds what must be there. Still I noticed that the first time I did this my brain gave me the real info (lacking frequencies) and only later learned that it can just fake the rest, based on the knowledge of the voice of the opponent. See en.wikipedia.org/wiki/Code_Excited_Linear_Prediction CELP which uses a similar approach for compression. $\endgroup$ Commented Mar 19, 2014 at 18:07
  • 2
    $\begingroup$ @DisplayName Should add (edit in) that info to your answer, I think it's relevant. $\endgroup$
    – Kyle Oman
    Commented Mar 19, 2014 at 18:09
  • 2
    $\begingroup$ Sampling rates are a red herring here. They have no bearing on why an originally analog system is limited to under 4 kHz bandwidth. It's a matter of long, unamplified twisted pairs running all the way back to the CO, and what was a reasonable design to get a human voice across this system. There was no reason to need the fidelity to carry music, so it wasn't built in. $\endgroup$
    – Phil Perry
    Commented Mar 20, 2014 at 15:22
  • 2
    $\begingroup$ ...and it's still very limited in range and bandwidth, because it has to be compatible with the existing network, in particular the wires. No magic, just some compression tricks you can do with digital. $\endgroup$
    – Phil Perry
    Commented Mar 21, 2014 at 16:28
2
$\begingroup$

This is due to signal processing, not physics. Telephone carriers apply aggressive compression optimized for recording only speech well. The AMR codec, still in use, dates to 1999 and achieves up to about 13 kbit/s. Any other codec would not record music well at that bitrate, either. Even MIDI consumes more data.

$\endgroup$
4
  • 2
    $\begingroup$ This limitation has been there since before the phone company ever considered applying compression to digital signals or even digitizing the signals on their network. And it does relate to physics, specifically the bandwidth of the analog connections in the network. The compression schemes used are designed to match the characteristics of the existing network, not the other way around. $\endgroup$
    – The Photon
    Commented Mar 19, 2014 at 22:41
  • $\begingroup$ AMR is a mobile phone codec. Fixed line telephone carriers don't apply compression. Fixed line bandwidth is cheaper than the computation costs. Furthermore, mobile phones have multi-codec support. It would be easier to support music on mobile phones; just signal that you will use a high-quality codec. $\endgroup$
    – MSalters
    Commented Mar 21, 2014 at 12:12
  • $\begingroup$ Mystifying. Where is analog still used, and what would naturally impose cutoff frequencies of 300 and 3500 Hz? I skimmed Nyquist and Shannon, but they don't discuss specific engineering limitations of their time. The common standard for digital telephony is μ-law G.711 PCM, from 1972. That filters the entire communication if some links can't agree on a higher standard. $\endgroup$
    – user130144
    Commented Mar 21, 2014 at 22:48
  • 1
    $\begingroup$ @user130144, Land line phones still often have an analog connection to the central office. Mine at home is relying on lines that were installed 50-80 years ago. $\endgroup$
    – The Photon
    Commented Mar 22, 2014 at 17:01
0
$\begingroup$

Phone companies only built the phone to carry voice frequencies. Bass and tweeter frequencies are generally out of the range of what the phones were built to do. I used to listen to a radio show where, when a caller called in with some lame joke, they would play crickets chirping to the person on the phone. It took them a long while and several awkward moments before they figured out that the caller on the phone could not hear the crickets but radio listeners could. So they did an on-air test and patched crickets to phone and phone to broadcast. Sure enough, the crickets were almost completely blocked out by the phone system.

$\endgroup$
5
  • 2
    $\begingroup$ That's actually quite a nice anecdote. But it doesn't really answer the question much beyond that the phones weren't built for music. What happens to those frequencies? Are they indeed completely muted (as your anecdote seems to suggest)? If so, why? $\endgroup$
    – Řídící
    Commented Mar 19, 2014 at 18:02
  • 1
    $\begingroup$ @GlenTheUdderboat the original filters were designed to pass "normal" voice-range frequencies and nothing more, thus maximizing conversation clarity while minimizing total required bandwidth. Even in those long-ago analog days, bandwidth meant power. $\endgroup$ Commented Mar 19, 2014 at 18:35
  • $\begingroup$ @CarlWitthoft I never realised there was an actual (and intentional) filter (which would completely explain the "built to"; as in actual suppression). Do you (perhaps) have a reference of some kind? $\endgroup$
    – Řídící
    Commented Mar 19, 2014 at 18:41
  • $\begingroup$ @GlenTheUdderboat A few relevant comments pop up in this page: cnx.org/content/m15683/latest/?collection=col10503/latest . $\endgroup$ Commented Mar 19, 2014 at 18:53
  • 1
    $\begingroup$ @GlenTheUdderboat: This is commonly understood by electronic engineers. The absence of such filters causes aliasing. Sampling a 5 kHz signal at 8 kHz produces a signal which is indistinguishable from a 3 kHz signal. Therefore, every analog input to be sampled is always filtered first. In the really old days, the lines themselves acted as such a filter. $\endgroup$
    – MSalters
    Commented Mar 21, 2014 at 12:21
-1
$\begingroup$

There are a few different reasons. Let's consider only the digital channel.

  1. Only a band-limited signal is used. G.711 uses a sampling rate of 8kHz, which leaves a usable bandwidth of 4kHz for the voice. That is OK for voice telephony but almost unusable for music. Other codecs use different bandwidths; for example, G.722 (wideband telephony) uses a sampling rate of 16 kHz, for an effective usable bandwidth of ~8 kHz. This sounds much better.

  2. Cell phone codecs are a special case. These are so-called hybrid codecs, highly optimized for voice transmission. They use various models of the vocal tract, which are excited by a highly reduced form of your voice signal. If you are into this stuff, look for: Baseband-RELP, GSM Fullrate Codec, CELP. But beware: this is heavy stuff.

$\endgroup$
-3
$\begingroup$

By the Nyquist theorem, a telephone can correctly transmit only frequencies below half the sampling rate (called the Nyquist frequency); so with a sampling rate of 8000 samples per second it will only transmit sounds with frequencies below 4000Hz correctly.

The fundamental frequency (the pitch you hear) of the human voice is in the range of 80 to 1100 Hz. Harmonic frequencies (component frequencies that are integer multiples of the fundamental frequency) of the human voice may be much higher. Therefore a sampling rate of 8000 samples per second is sufficient to transmit human voices without many issues (harmonics may still exceed the Nyquist frequency).
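To make the point about harmonics concrete, the following sketch counts how many harmonics of a given fundamental fit below the 4000 Hz Nyquist frequency. The 80 Hz and 1100 Hz fundamentals are the range limits quoted above; the 200 Hz value is an arbitrary mid-range example added for illustration.

```python
NYQUIST_HZ = 4000  # half of the 8000 samples/s telephone rate


def harmonics_below(fundamental_hz, limit_hz):
    """Count harmonics k * fundamental_hz (k = 1, 2, ...) strictly below limit_hz."""
    count = 0
    k = 1
    while k * fundamental_hz < limit_hz:
        count += 1
        k += 1
    return count


for f0 in (80, 200, 1100):
    n = harmonics_below(f0, NYQUIST_HZ)
    print(f"fundamental {f0:>4} Hz: {n} harmonics below {NYQUIST_HZ} Hz")
```

A low voice keeps dozens of harmonics, while a note at the top of the range keeps only the first three, which is one way to see why high-pitched and harmonically rich sounds suffer most.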

When frequencies above the Nyquist frequency are transmitted, such as in the case of transmitting music, aliasing occurs. This causes distortion. This is detailed in the diagram below.

[Figure: aliasing]

The red line is the original signal. The blue dots represent the times at which samples of the original signal are taken. The blue line is the signal reconstructed from those samples, which were taken at an insufficient sampling rate. As you can see, it has been distorted from the red signal and it now has a lower frequency: a frequency lower than the Nyquist frequency of the sampling rate.
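The same effect can be checked numerically. The sketch below assumes an idealized sampler with no anti-aliasing filter in front of it (real digitizing circuits normally have one). It samples a 5000 Hz tone at 8000 samples per second and compares the result with a 3000 Hz tone sampled the same way. The helper names are hypothetical.

```python
import math

FS_HZ = 8000  # telephone sampling rate


def sample_sine(freq_hz, n_samples, fs_hz=FS_HZ):
    """Samples of sin(2*pi*freq*t) taken at t = n / fs for n = 0 .. n_samples-1."""
    return [math.sin(2 * math.pi * freq_hz * n / fs_hz) for n in range(n_samples)]


def alias_frequency(freq_hz, fs_hz=FS_HZ):
    """Frequency in [0, fs/2] that a sampled tone at freq_hz cannot be told apart from."""
    folded = freq_hz % fs_hz
    return fs_hz - folded if folded > fs_hz / 2 else folded


tone = sample_sine(5000, 16)
alias = sample_sine(alias_frequency(5000), 16)

# Since 5000 = 8000 - 3000, each sample of the 5000 Hz tone equals the
# matching sample of the 3000 Hz tone with its sign flipped, so the sum
# of the two sequences should vanish up to rounding error.
max_diff = max(abs(a + b) for a, b in zip(tone, alias))

print(f"5000 Hz sampled at {FS_HZ} Hz looks like {alias_frequency(5000)} Hz")
print(f"largest mismatch after the sign flip: {max_diff:.2e}")
```

Once sampled, the two tones differ only by a phase inversion, so nothing downstream can tell that the original was above the Nyquist frequency.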

I wrote some simple Matlab code that lets you hear aliasing for yourself.

WARNING: Turn volume on speakers/headset right down before execution.

% Aliasing in Matlab.
% http://physics.stackexchange.com/questions/104281/why-cant-you-hear-music-well-over-a-telephone-line

fs = 8000; % sampling rate (Hz)

nyquistfrequency = fs / 2; % Nyquist frequency (Hz)

freq = [1000;
        2000;
        3500; % ^ these frequencies will play fine
        4500; % v these frequencies will experience aliasing and distort to a frequency lower than the Nyquist frequency
        6000;
        7000 ]; % frequencies (Hz)

duration = 1; % duration of each tone (s)

numberofsamples = ceil( duration * fs ); % number of samples

sample_times = (1 : numberofsamples) / fs; % sample instants (s)

for i = 1 : numel(freq)
  currentfrequency = freq(i) % current frequency (no semicolon, so it is displayed)
  simplesound = sin( 2 * pi * currentfrequency * sample_times ); % create sound
  sound( simplesound, fs ) % play sound (used instead of wavplay, which is not available in newer Matlab releases)
  pause( duration + 0.2 ) % sound() returns immediately, so wait for the tone to finish
end
$\endgroup$
5
  • 1
    $\begingroup$ Can someone explain why this was downvoted? $\endgroup$
    – hadsed
    Commented Mar 25, 2014 at 22:03
  • $\begingroup$ You are witnessing the authoritarian physics.stackexchange community at work :) $\endgroup$ Commented Mar 25, 2014 at 22:18
  • 1
    $\begingroup$ I didn't realize you were the OP. It seems.. like bad form to mark your own answer as the accepted one if you're the asker. That said, I don't see why someone would downvote unless there was incorrect information, in which case they should also leave a comment because now people like me are just sitting here confused. $\endgroup$
    – hadsed
    Commented Mar 26, 2014 at 0:59
  • $\begingroup$ In my opinion, this answer is 100% on topic and 100% correct (hence I marked it so). The other answers talking about the history of the phone exchange are completely irrelevant to my question! Did you run my matlab code yet? $\endgroup$ Commented Mar 26, 2014 at 1:03
  • 2
    $\begingroup$ Digitizing circuits almost invariably will use a filter before digitizing in order to prevent aliasing (see for example anti-aliasing filter). So rather than having high frequencies show up as low frequencies, they are attenuated (in the simplest case, with an RC filter). That makes much of this answer simply wrong - or rather, not applicable to the question. May I suggest you consider accepting a more correct answer, or edit your own? $\endgroup$
    – Floris
    Commented Oct 29, 2015 at 13:03
