22
$\begingroup$

According to my textbook, this is how it works

All of this just doesn’t make sense though.

I mean, doesn’t the amplitude represent the loudness and the frequency the pitch? Aren’t they completely independent from each other?

Is the book just lacking information or am I just not getting something?

I probably don’t have enough insight into how this works; I know the material at AS Level (for the Brits out there), so roughly high school. If you use some advanced explanations, please give me some links so I can pick up the knowledge I need before actually reading your answer lol.

$\endgroup$
6
  • $\begingroup$ A number of comments removed. To answer the question, please post an answer. $\endgroup$
    – rob
    Commented Oct 26, 2023 at 20:46
  • $\begingroup$ Is this the A-Level Computer Science textbook by Heathcote and Heathcote? $\endgroup$
    – wizzwizz4
    Commented Oct 27, 2023 at 17:14
  • $\begingroup$ dsp.stackexchange.com $\endgroup$ Commented Oct 29, 2023 at 8:34
  • $\begingroup$ @wizzwizz4 yes it is lol :). It's the PG Online AQA A-AS Level book. The AQA spec is very ambiguous, it states that students must be able to: Describe the principles of operation of: an analogue to digital converter (ADC) and a digital to analogue converter (DAC). $\endgroup$
    – RedP
    Commented Oct 29, 2023 at 23:04
  • $\begingroup$ @RedP That textbook has many other errors (e.g. the Fully-qualified domain name page is totally wrong and they attribute map-reduce to Google): you could probably get 10k across the network by picking any given page, asking questions about the nonsense on it, and repeating. Aside from the OO programming examples (they have a rather idiosyncratic understanding of what OO is), AQA will give you the marks for writing correct answers, so it won't harm you to learn correct things. $\endgroup$
    – wizzwizz4
    Commented Oct 29, 2023 at 23:16

12 Answers

56
$\begingroup$

"Amplitude" is the wrong word. The amplitude of a periodic function is the difference between its greatest value and its least value. Cross out "amplitude" from your textbook, and pencil in, "instantaneous value." For a strictly periodic function, "amplitude" is just a single number. For a signal, it's usually regarded as a value that changes slowly over time.

What the ADC hardware measures is not slowly-changing amplitude. The ADC measures the instantaneous value of the signal, and it measures it tens of thousands of times every second.
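To make the distinction concrete, here is a small illustrative sketch (Python/NumPy; the sample rate, tone, and variable names are all made up for the example): the envelope is the slowly changing "amplitude", while what the ADC stores is the rapidly sampled instantaneous value.

```python
import numpy as np

# Illustrative sketch only: a 440 Hz tone whose loudness (the slowly changing
# "amplitude", here a fade-in over one second) is separate from the rapidly
# varying instantaneous value that an ADC actually samples.
fs = 48_000                               # assumed sample rate, samples per second
t = np.arange(fs) / fs                    # one second of sample instants
envelope = t                              # slowly changing amplitude: 0 -> 1
signal = envelope * np.sin(2 * np.pi * 440 * t)

print(signal[:4])            # a few instantaneous values, changing every 1/48000 s
print(envelope[fs // 2])     # the "amplitude" halfway through (~0.5)
```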

There are actually two blocks missing from the diagram. In between the left-hand "amplifier" block and the "ADC" block should be a block labelled "anti-aliasing filter," and in between the "DAC" block and the right-hand "amplifier" block should be a block labelled "reconstruction filter".*

The whole system, from one end to the other, is not meant merely to reproduce the "amplitude" and the "frequencies" of the original signal. It's meant to faithfully reproduce every detail of the original waveform that your ears are able to perceive.


* Those filters are always present. Some part of the circuit and/or the transducers will be performing those functions regardless of whether or not the engineers who designed the electronic circuits were aware of it. The engineers who build crappy sound reproduction systems may not be aware of anti-aliasing and reconstruction filters, but the engineers who design the high-quality stuff absolutely know all about them.

In a system with inadequate or un-designed reconstruction filtering, you may hear high-pitched "quantization noise", and in a system with an inadequate or un-designed anti-aliasing filter, you may hear other weird tones and artifacts.

$\endgroup$
17
  • 25
    $\begingroup$ I think it is pretty common to use the word 'amplitude' to mean 'instantaneous value'. Although ... a quick Google search does agree with you, that it's a misuse of the word. Today I learned. $\endgroup$ Commented Oct 24, 2023 at 23:25
  • 4
    $\begingroup$ Re ""amplitude" is just a single number": Yet, it is called amplitude modulation (AM): "In amplitude modulation, the amplitude (signal strength) of the wave is varied in proportion to that of the message signal, such as an audio signal.". Even if it is not the instantaneous value, at least it varies (is a function of time) $\endgroup$ Commented Oct 25, 2023 at 0:14
  • 7
    $\begingroup$ @PeterMortensen, I said, "amplitude...changes slowly over time." Well, the amplitude that we're talking about in amplitude modulation, is the amplitude of the carrier wave, and changing that at a few kilohertz is "slowly" when you compare it to the carrier frequency. $\endgroup$ Commented Oct 25, 2023 at 0:30
  • 8
    $\begingroup$ @rexkogitan, ohhhh so the “current value” is the height of the wave at the instant? $\endgroup$
    – RedP
    Commented Oct 25, 2023 at 7:50
  • 3
    $\begingroup$ For the function $f(t) = A \sin(\omega t + \varphi)$, isn't its amplitude $A$, and not $2A$, as your definition would suggest? I.e. isn't it the maximum deviation from the mean signal value, and not between its two extremes? $\endgroup$
    – Igor F.
    Commented Oct 26, 2023 at 7:59
43
$\begingroup$

A slight clarification on Solomon Slow's complete answer:

If you sample the original signal frequently enough, the instantaneous values you measure at each of those tiny time slices will actually contain the frequency information which you can then recover by re-assembling the sample slices and playing them back in sequence.

For this to be true, the sampling rate must meet a mathematical condition called the Nyquist sampling criterion which basically states that to capture the frequency content you must sample it at a frequency at least twice that of the highest-frequency signal component you wish to catch. So for a digital audio recorder to detect a 20,000 Hz frequency, it has to sample the waveform at about 40,000 Hz.
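As a rough illustration of why the factor of two matters (a NumPy sketch with made-up numbers, not a statement about any particular recorder): a tone above half the sample rate produces exactly the same samples as a lower-frequency alias, so the original frequency can no longer be recovered.

```python
import numpy as np

# Illustrative sketch: at a 40 kHz sample rate, a 30 kHz tone (above the
# 20 kHz Nyquist limit) yields exactly the same samples as a 10 kHz tone.
fs = 40_000
t = np.arange(16) / fs
above_nyquist = np.sin(2 * np.pi * 30_000 * t)
alias         = -np.sin(2 * np.pi * 10_000 * t)
print(np.allclose(above_nyquist, alias))   # True: the samples can't tell them apart
```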

$\endgroup$
11
  • 7
    $\begingroup$ The Nyquist criterion is incredible. As long as we don't care about sounds over eg. 20kHz, then we can reproduce the original waveform exactly by sampling at ~40kHz. Unlike eg. video, sampling at a higher rate provides no additional benefit, because we already have a perfect reproduction! At least theoretically - in the real world, due to the way low-pass filters work, we actually sample at 44.1kHz in order to faithfully reproduce <=20kHz sounds. $\endgroup$ Commented Oct 24, 2023 at 23:43
  • 6
    $\begingroup$ @BlueRaja - Danny Pflughoeft: Re "we can reproduce the original waveform exactly": Doesn't that require an ideal filter (which doesn't exist in a causal form. And require signals of infinite lengths. Or an output latency of infinite time)? Perhaps "exactly" should be qualified? $\endgroup$ Commented Oct 25, 2023 at 0:24
  • 13
    $\begingroup$ @PeterMortensen, You are correct. Nyquist's theory is just that: It's theory. In a practical system, the cutoff of the anti-aliasing and reconstruction filters are somewhat below the theoretical Nyquist frequency so as to leave some room for imperfections in the filters. In a more sophisticated, practical system, the digital signals are oversampled so that the filtering can be partly done in the digital domain, where high-order filters are less expensive to implement. $\endgroup$ Commented Oct 25, 2023 at 0:35
  • 2
    $\begingroup$ Note that Wikipedia refers to this as the Nyquist-Shannon Sampling Theorem, and I was taught it at as the Shannon Sampling Theorem so I believe this is a not-uncommon term for the same thing. $\endgroup$
    – James_pic
    Commented Oct 25, 2023 at 8:52
  • 2
    $\begingroup$ "a frequency at least twice that of the highest-frequency signal component you wish to catch" More than, not at least. $\endgroup$ Commented Oct 26, 2023 at 1:04
30
$\begingroup$

Your instinct is correct, there are two degrees of freedom here so we have to measure two quantities. The one you are missing is the timestamp. We are measuring both the voltage of the signal and the time that that voltage occurs. As long as when we reproduce the signal we space the voltage measurements out with the same time steps as the original, we'll get back the frequency.
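A tiny sketch of that idea (illustrative numbers only; the rates and names are invented): the stored values plus the time step between them together determine the pitch, and the same values replayed with a different spacing give a different frequency.

```python
import numpy as np

# Illustrative sketch: the recording is (time step, list of values).
values = np.sin(2 * np.pi * np.arange(100) / 20)   # 20 samples per cycle

dt_original = 1 / 8_000        # assumed spacing at which the values were measured
dt_halved   = 1 / 16_000       # same values replayed with half the spacing

print(1 / (20 * dt_original))  # 400.0 Hz: the frequency we actually recorded
print(1 / (20 * dt_halved))    # 800.0 Hz: same values, tighter spacing, an octave higher
```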

$\endgroup$
2
  • 9
    $\begingroup$ While the other answers are more complete, I believe this one most clearly explains the thing that is missing from @RedP's understanding. $\endgroup$
    – JakeRobb
    Commented Oct 25, 2023 at 14:29
  • 1
    $\begingroup$ Each sample is a degree of freedom. $\endgroup$ Commented Oct 26, 2023 at 1:04
5
$\begingroup$

Your ears only hear one thing: the air pressure (technically, the difference between instantaneous and average air pressures) as a function of time: $P(t)$. To lightly edit your textbook:

During the process of converting an analogue sound into a digital recording, a microphone converts the sound energy into electrical energy. The analogue to digital converter (ADC) samples the analogue data at least 40,000 times per second, measuring the amplitude of the waveform at each instant and converting it to a binary value according to the resolution or audio bit depth being used for each sample.

Why the edits matter:

As was pointed out in another answer, young, healthy humans hear frequencies between 20 and 20,000 Hz, and by sampling at double the maximum, we can faithfully reconstruct high-frequency waveforms up to 20,000 Hz. I've described the sampling rate without the word frequency to clarify that it's not an audio phenomenon, despite having the same units. By describing $P(t)$ as a waveform, we can distinguish it from a single sinusoid. Also note that audio waves are defined at instants (points in time) rather than points in space.

There is a lot of difficult math which helps us understand the relationship between $P(t)$ and frequency. The important point for your question is that the frequency information is fully encoded in $P(t)$. Given $P(t)$, we can completely determine the frequency and phase information, and given the frequency and phase information, we can completely determine $P(t)$: they are not independent.
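To illustrate that the frequency content really is sitting inside $P(t)$, here is a NumPy sketch with invented values (the sample rate, tones, and threshold are purely for illustration): sample a two-tone pressure signal and read the tones back out with a Fourier transform.

```python
import numpy as np

# Illustrative sketch: P(t) sampled at 44.1 kHz, containing two pure tones.
fs = 44_100
t = np.arange(fs) / fs                                       # one second of instants
p = 0.7 * np.sin(2 * np.pi * 440 * t) + 0.3 * np.sin(2 * np.pi * 1_000 * t)

# The frequencies were never stored separately, yet a Fourier transform of the
# sampled P(t) recovers them.
spectrum = np.abs(np.fft.rfft(p)) / (fs / 2)
freqs = np.fft.rfftfreq(fs, d=1 / fs)
print(freqs[spectrum > 0.1])                                 # [ 440. 1000.]
```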

By the way, truly understanding all the math in the two links would get you a long way through university math physics, so don't sweat all the details yet - they're just for background.

$\endgroup$
5
  • $\begingroup$ "Your ears only hear one thing: the air pressure (technically, the difference between instantaneous and average air pressures) as a function of time: P(t)" No, the cilia in the ear each resonate at a different frequency, and so measure the frequency, not the pressure. $\endgroup$ Commented Oct 26, 2023 at 1:06
  • $\begingroup$ @Acccumulation And what vibrates the cilia? Also, cilia is plural. So you meant to say 'the cilium in the ear each resonate ... ' which is a true statement, but they don't vibrate in a vacuum $\endgroup$
    – user121330
    Commented Oct 26, 2023 at 2:03
  • $\begingroup$ @user121330 "the cilia in the ear each resonate at a different frequency" is correct, even though (even because) cilia is plural, just as "the phones in the church each ring at a different time" is correct. If the singular cilium were used, resonate would have to have an s on the end, and each (of one cilium) would be nonsense. $\endgroup$
    – LarsH
    Commented Oct 26, 2023 at 12:36
  • $\begingroup$ But I agree with you that at one level, the ear (as bounded at the eardrum) only responds to changes in air pressure (which are internally detected as vibration frequencies by the stereocilia). You could also look at the stereocilium level and say that the ear only responds to frequencies (and amplitudes). Also I just learned that these stereocilia in the inner ear are distinct from cilia (en.wikipedia.org/wiki/Stereocilia). $\endgroup$
    – LarsH
    Commented Oct 26, 2023 at 12:49
  • $\begingroup$ @Acccumulation I don’t think the cilia resonate at all. Different frequencies cause humps at different parts of the basilar membrane (like resonance) and the cilia at those humps are triggered. But I feel like the inner ear is a demodulating mechanism, not a detecting mechanism. The outer and middle ear form the detecting mechanism and that mechanism is only sensitive to pressure changes. $\endgroup$ Commented Oct 27, 2023 at 3:38
4
$\begingroup$

The textbook is using "amplitude" differently from the way you are thinking of it. You are correct that one use of "amplitude" is to describe half of the peak-to-peak distance of a sine wave. But, the text is using amplitude to mean the value the wave takes at some instant in time.

To answer your broader question, think about what the computer is actually doing. The first audio recorders were physical: sound would vibrate a membrane that was attached to a stylus, which would etch the motion of the membrane into a rotating mass of wax. The sound could be replayed by putting the stylus into the groove, and rotation of the wax would cause the stylus to shake as it followed the wavy groove and the membrane would transmit that motion back into the air producing sound. Nothing in that apparatus knows about frequency or sine waves. It is enough to record a time-history of the motion of the air/membrane/stylus.

The computer is simply recording the motion of a membrane in a microphone (perhaps the changing voltage of a piezoelectric transducer as it is deformed by the membrane, or the capacitance between a membrane and a plate as the motion of the air pushes them closer and further apart), then reproducing that motion in another membrane: a speaker (typically through electromagnetic actuation). If it samples the motion quickly enough, the collection of instants in time is indistinguishable from the original motion to human ears -- sort of like how video represents motion as a collection of still images rapidly played back. But, whereas the threshold for visual continuity is somewhere in the tens of hertz (frames per second), the threshold for audio continuity is in the tens of thousands of hertz (sample rate).

In high school, you would have learned about audio in a continuous context. The textbook is discussing it in a discrete context. It's no surprise you found it confusing: if you had continued through college studying acoustics, you probably would have taken more than one course teaching you the math behind analysing discrete representations of signals. It's a big topic!

$\endgroup$
1
  • 2
    $\begingroup$ The last sentence of your first paragraph is the key point. What is being sampled is displacement at regular intervals (of half the periodic time or less). $\endgroup$ Commented Oct 25, 2023 at 21:29
4
$\begingroup$

There are many great answers but since I really love this subject, I want to take an approach from the physical notion of sound wave, with the intention to complement the other great answers.

Sound is a physical phenomenon caused by the vibration of an object, which causes particles of the medium (usually air) to generate longitudinal waves. These waves propagate in all directions from the source of the sound (the vibrating object), and they affect the normal atmospheric pressure, creating changes in atmospheric pressure over time and thus generating a waveform. These changes are what we perceive as sound; they carry all the information of the sound. So if you wanted to reproduce a specific sound, all you would need is to reproduce/imitate those changes of pressure over time, nothing more and nothing less.

So, let's say a sound is being generated at a fixed point $A$, and we have a microphone at another point $B$ near the source. This means that we are fixing a point in space and registering/measuring/recording those changes of atmospheric pressure we spoke about in the last paragraph.

[Figure: a pure-tone source generating pressure variations in the surrounding air. Labels: sound source (pure tone); normal atmospheric pressure (silence); maximum atmospheric pressure (PM); minimum atmospheric pressure (Pm).]

Now we encounter a technical difficulty: these changes in atmospheric pressure are continuous, and computers and circuits can't deal well with a continuous magnitude. This is where the ADC and the sampling rate come into play, as well as the Nyquist sampling criterion niels nielsen spoke about.

So the idea is that your ADC has an infinity of options (pressure values over time), but we choose only certain instants of time with their respective pressure values. If we sample at, for example, 44.1 kHz, what we are telling the ADC to do is the following: every $\frac{1}{44100}$ of a second, register the value of the pressure level. So every second we have 44100 values of the pressure, taken at the instants $\frac{n}{44100}$ for $n \in \{1,2,...,44100\}$. This is how you obtain your digital sampled wave.


Finally you could say: ok, but where is the frequency here? Well, as I said, all you need is the waveform in order to capture all the information necessary to store and reproduce a given sound wave. The frequency analysis can be done later using only the waveform, through the awesome tools of Fourier analysis like the spectrogram.
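A small sketch of that last step (Python with NumPy and SciPy; the test signal and parameters are invented for illustration): store nothing but the sampled waveform, then do the frequency analysis afterwards with a spectrogram.

```python
import numpy as np
from scipy.signal import spectrogram

# Illustrative sketch: two seconds of pressure samples at 44.1 kHz,
# 440 Hz for the first second and 880 Hz for the second.
fs = 44_100
t = np.arange(2 * fs) / fs                      # the sample instants n / 44100
p = np.sin(2 * np.pi * 440 * t)
p[fs:] = np.sin(2 * np.pi * 880 * t[fs:])

# The frequency analysis is done later, purely from the stored waveform.
f, times, power = spectrogram(p, fs=fs, nperseg=4096)
print(f[np.argmax(power[:, 0])])                # ~440 Hz near the start
print(f[np.argmax(power[:, -1])])               # ~880 Hz near the end
```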

$\endgroup$
2
$\begingroup$

I mean, doesn’t the amplitude represent the loudness and the frequency the pitch?

Yes, the amplitude is how loud the signal is. But, a low pitch can be loud and a high pitch can have the same "loudness" (or volume level).

So there is something different between an equal volume high pitch and low pitch.

That difference is how fast the sound wave is changing from high to low. A high frequency's wave changes faster from high to low than a low frequency's wave.

How do computers store sound waves just by sampling the amplitude of a wave and not the frequency?

So sampling the signal's value allows us to see not just the highs and lows (the loudness) but also how close together in time those highs and lows are (the frequency), because we know how fast we sampled the signal.
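A short sketch of that (NumPy, with made-up tones and rates): from the same list of samples you can read off both how high the highs are (loudness) and how far apart they are in time (pitch), because the sample rate is known.

```python
import numpy as np

# Illustrative sketch: two tones with different loudness and pitch, same sample rate.
fs = 44_100
t = np.arange(fs) / fs
quiet_high = 0.1 * np.sin(2 * np.pi * 2_000 * t)   # quiet, high-pitched
loud_low   = 0.9 * np.sin(2 * np.pi * 100 * t)     # loud, low-pitched

for s in (quiet_high, loud_low):
    loudness = np.max(np.abs(s))                                   # how high the highs are
    peaks = np.flatnonzero((s[1:-1] > s[:-2]) & (s[1:-1] > s[2:])) + 1
    pitch = fs / np.mean(np.diff(peaks))                           # how close together they are
    print(f"loudness ~ {loudness:.1f}, pitch ~ {pitch:.0f} Hz")
```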

And as others have noted, the minimum rate to sample a signal to accurately capture that signal's frequency is the signal frequency * 2.

$\endgroup$
1
$\begingroup$

As others have explained, each sample has a value and a timestamp, and this provides the necessary information to perfectly reconstruct any band-limited signal where all its frequency components are below half the sample rate (the Nyquist frequency).

(This assumes infinite-precision samples and zero jitter in the sample timing. In practice, leaving some headroom in the sample rate and using 16-bit integer PCM samples makes it work very well for all signals a bit below half the sample rate. With dithering, the reconstruction part can even work ok for signal amplitudes down to less than 1 unit in the last place of your samples which would otherwise be lost to quantization.)

To get a feel for how this works, see it in practice with an analog signal generator, ADC, computer, DAC, and analog oscilloscope + spectrum analyzer. An excellent 23-minute video lecture, Digital Show and Tell by "Monty" Montgomery of xiph.org does this for you and explains what's going on in an easy-to-understand way that avoids and debunks some common misconceptions. It's very much worth your time. (Monty is the guy who developed the Ogg/Vorbis audio compression format, and was involved with Opus, as well as founding Xiph.org, so naturally the video is available in a variety of open formats :P.)

The ball-and-stick (lollipop) representation of the samples, and fitting a curve to them, is useful in understanding that a sample is taken at one instant, not held as a flat voltage across a whole time interval (i.e. a stair-step output). For signals anywhere near the Nyquist frequency, a stair-step model would be far from correct. (And it isn't band-limited; a true stair-step has frequency components extending up to infinity.) Monty covers that in the first 8 minutes of his video.

Ball-and-stick samples with a sine curve reconstructed from them
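For a numerical taste of the same point (a NumPy sketch with toy numbers, not Monty's code): Whittaker-Shannon (sinc) interpolation rebuilds the smooth curve between the sample instants from nothing but the samples, which is exactly what a stair-step hold fails to do.

```python
import numpy as np

# Illustrative sketch: reconstruct a band-limited signal between its samples
# with Whittaker-Shannon (sinc) interpolation.
fs = 8                                       # toy sample rate (Hz)
n = np.arange(-200, 200)                     # sample indices
x = np.sin(2 * np.pi * 3 * n / fs)           # a 3 Hz tone, below the 4 Hz Nyquist limit

t = np.linspace(0, 1, 101)                   # instants in between the samples
recon = np.array([np.sum(x * np.sinc(fs * ti - n)) for ti in t])
print(np.max(np.abs(recon - np.sin(2 * np.pi * 3 * t))))   # tiny: the samples fix the whole curve
```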


P.S. yes, the main reason I wrote this answer was to link this video. It's so good for building a qualitative understanding of the subject that IMO it's worth an answer, with enough of a text answer to sidestep the rules against link-only answers. I highly encourage everyone to watch it if they're curious about the subject, or just want to enjoy a well-presented demo using real physical hardware by an expert in the subject. Subtitles are available in a few languages. There's also a transcript / text article version, https://wiki.xiph.org/Videos/Digital_Show_and_Tell with some math formatting and images where needed, including the one above.

$\endgroup$
0
$\begingroup$

There were several answers showing how the sound is transformed into a voltage signal. In fact there is only one sampled variable versus time. The signal can be harmonic, with an amplitude and a frequency, but in general the signal is whatever it is: it can be time-limited, so a pitch or frequency is present only during a certain amount of time. What are the frequency and amplitude of this signal? There is a mathematical instrument called the Fourier transform, which takes a time-dependent signal and calculates its frequency decomposition (there might be several pitches); for each frequency it gives an amplitude, and the resulting signal is the sum of those components. So, to sum up, the frequency and amplitude information is, in a sense, included in the signal. All you need is the time-dependent signal.
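As a rough sketch of that decomposition (NumPy, with an invented two-pitch signal; the rates and threshold are purely illustrative): the Fourier transform of the time-dependent samples gives an amplitude for each frequency present.

```python
import numpy as np

# Illustrative sketch: a signal containing two pitches; the DFT gives the
# amplitude of each frequency component.
fs = 8_000
t = np.arange(fs) / fs
signal = 0.5 * np.sin(2 * np.pi * 200 * t) + 0.2 * np.sin(2 * np.pi * 350 * t)

amplitudes = np.abs(np.fft.rfft(signal)) / (fs / 2)
freqs = np.fft.rfftfreq(fs, d=1 / fs)
for f, a in zip(freqs[amplitudes > 0.05], amplitudes[amplitudes > 0.05]):
    print(f"{f:.0f} Hz with amplitude {a:.2f}")    # 200 Hz / 0.50 and 350 Hz / 0.20
```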

$\endgroup$
0
$\begingroup$

Perhaps it might help to consider how sound is heard, and how it can be mechanically recorded on a vinyl record.

Sound is just the change of air pressure, measured at a point.

So, we hear by the diaphragm of our eardrum mechanically displacing in and out, pushed and pulled by soundwaves. This displacement is one-dimensional, like any drum-skin: in or out. How fast it wiggles in or out is the pitch. How far it wiggles in or out is the volume. But at any moment in time, our eardrum is only in one position, with a certain displacement from its rest position.

To record it mechanically on a vinyl record, we take the vibrations of a similar mechanical diaphragm, and connect it to a needle, which scratches a surface in a way that exactly reproduces those vibrations. Like a heart monitor or a seismograph, but for sound waves.

Only one wiggly line scratched into a surface stores all frequencies and amplitudes (pitches and volumes) of the sound. Well, OK, it can get a bit cleverer for stereo, but let's ignore that for now!

There isn't one line for each frequency: there's just one single wavy line, or "waveform". For any point in time, all that line has "stored" for all of the sounds in a musical recording, is the diaphragm's displacement from the rest position.

Just as with our eardrums, the displacement doesn't contain any information about frequency, or about what instrument was playing, or anything. Just "how loud that sound was, right then."

But the changing values of that one displacement value over time is how all the subtlety and nuance of a myriad different sounds in a music track are heard. All the instruments and singing, all stored in that one wavy line.

Are we exactly reproducing the sound, as it truly was? No, there will be elements of the sound that are too high-frequency to move the diaphragm, since the diaphragm has mass. But as long as the diaphragm-and-needle combination is at least as sensitive to air movement as the diaphragm of the human eardrum, it'll reproduce enough of the sound to replay the whole spectrum of audible sound.

So, what if we want to store that wiggly wave as a sequence of discrete numbers?

We'd slice that wiggly line up into parts and write down the displacement at each point.

If we do it often enough, we should be able to send those values to control a speaker diaphragm to move forward and backwards by the amounts we'd recorded, and it'll move in the same path as the scratched line we recorded earlier.

How big should the numbers be, that we use to write down the displacement? Well, if we figure out the smallest change in displacement our recording equipment (or our ears) can detect, that should be the value of '1' in our recording. It doesn't necessarily even need to be linear: maybe our ears don't respond linearly to changes in volume, in which case the volume difference between "0" and "1" can be different to that between "500" and "501".

How often should we write the numbers down? Well, there's no point recording/sampling more often than the fastest the diaphragm (or our ears) can move between two adjacent numbers, since the values can't change more often than that.
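A small sketch of the "value of '1'" idea (invented numbers; the 16-bit depth is just a common convention in digital audio, not something this passage commits to): each displacement is rounded to the nearest step, and the step size sets the smallest change the recording can represent.

```python
import numpy as np

# Illustrative sketch: quantize displacements in the range -1..1 to 16-bit steps.
bits = 16
step = 2 / (2 ** bits)                      # the displacement change worth "1" in the recording

displacement = np.array([0.000012, 0.250000, 0.250010, -0.731234])
codes = np.round(displacement / step).astype(int)       # the integers actually written down
recovered = codes * step                                # what playback reconstructs

print(codes)
print(np.max(np.abs(recovered - displacement)) <= step / 2)   # True: error at most half a step
```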

$\endgroup$
0
$\begingroup$

Digital audio can be viewed the same way as analog audio: digital audio is a sampling of the amplitude over time. Physically it is generally not instantaneous, because that would require an infinite sample rate; each sample is an averaged energy over a time delta of 1/sample-rate. You can learn more by looking into the sampling theorem.

If we take enough samples and look at them in a 2D graph of amplitude against time (the time domain), we can see that the discrete samples form a wave with a height (amplitude) and a width (wavelength). The wavelength of a sine wave is related to the wave's frequency.

If we add more waves of different frequencies (wavelengths) we can get more complicated waves and hear multiple frequencies. And we can move from the time domain into the frequency domain by transforming the samples using a discrete Fourier transform. This is a little complicated, but important in general for audio analysis. Needless to say, the frequencies of digital audio cannot be determined from one or two samples; we need multiple samples over time to measure a frequency.

Which is to say, in order to hear a certain pitch the amplitude needs to be moving up and down at a given frequency, but amplitude alone is not a pitch.

$\endgroup$
-1
$\begingroup$

The issue is you do not understand the principles of sound waves.

A sound wave is completely described in two dimensions, by a sine wave. The sine function takes the time and outputs the amplitude. So hopefully any discrete approximation is clear from there.

$\endgroup$
13
  • 1
    $\begingroup$ For example it has NOTHING to do with frequencies. You could have a wave, that never repeats at a single frequency. There is fundamentally no notion of frequency necessarily. Yet, it produces a specific, precise sound that will be encoded. $\endgroup$ Commented Oct 27, 2023 at 2:05
  • $\begingroup$ There is no question that an audio sound wave is precisely defined by amplitudes at points in time. $\endgroup$ Commented Oct 27, 2023 at 2:07
  • 5
    $\begingroup$ I have a degree in mathematics and I’ve studied physics, acoustics, and psychoacoustics for decades and I don’t understand this answer. So I’m not sure it’s as good an answer as you seem to think it is. Perhaps the thinking behind this answer is great, and maybe it’s just the words used to explain the thinking are unclear. Either way, I’d suggest a bit of humility and re-assessment of what you’ve written to see if you can expand and clarify. $\endgroup$ Commented Oct 27, 2023 at 3:45
  • 1
    $\begingroup$ A single pure tone can be represented by a sine wave with a given frequency. But a more general sound can't be represented by a single sine wave; the overpressure as a function of time is some more complicated function. This distinction is, it seems to me, part of the OP's confusion, and a good answer would have to address that. $\endgroup$ Commented Oct 27, 2023 at 11:32
  • $\begingroup$ This is like discussion here, of course frequencies are important. But the core principle of audio is amplitudes at points in time. $\endgroup$ Commented Oct 29, 2023 at 9:18
