
(question rewritten integrating bits of information from answers, plus making it more concise.)

I use analyser=audioContext.createAnalyser() in order to process audio data, and I'm trying to understand the details better.

I choose an fftSize, say 2048, then I create an array buffer of 2048 floats with Float32Array, and then, in an animation loop (called 60 times per second on most machines, via window.requestAnimationFrame), I do

analyser.getFloatTimeDomainData(buffer);

which will fill my buffer with 2048 floating point sample data points.

When the handler is called the next time, 1/60 second has passed. To calculate how much that is in units of samples, we have to divide it by the duration of 1 sample, and get (1/60)/(1/44100) = 735. So the next handler call takes place (on average) 735 samples later.

So there is overlap between subsequent buffers, like this:

[figure: buffer overlap]

We know from the spec (search for 'render quantum') that everything happens in "chunk sizes" which are multiples of 128. So (in terms of audio processing), one would expect that the next handler call will usually be either 5*128 = 640 samples later, or else 6*128 = 768 samples later, those being the multiples of 128 closest to 735 samples = (1/60) second.
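The arithmetic above can be sketched as a small helper. The function name expectedDeltaCandidates is made up for illustration, and the 60 Hz frame rate and 44100 Hz sample rate are the assumptions used in the text, not values read from the browser.

```javascript
// Sketch: how many samples pass per animation frame, and which multiples
// of the render quantum (128) bracket that number.
// Assumes a steady frame rate; real requestAnimationFrame timing jitters.
function expectedDeltaCandidates(sampleRate, frameRate, quantum) {
  const samplesPerFrame = sampleRate / frameRate;
  const below = Math.floor(samplesPerFrame / quantum) * quantum;
  const above = Math.ceil(samplesPerFrame / quantum) * quantum;
  return { samplesPerFrame, below, above };
}

const candidates = expectedDeltaCandidates(44100, 60, 128);
console.log(candidates); // samplesPerFrame: 735, below: 640, above: 768
```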

Calling this amount "Δ-samples", how do I find out what it is (during each handler call), 640 or 768 or something else?

Reliably, like this:

Consider the 'old buffer' (from previous handler call). If you delete "Δ-samples" many samples at the beginning, copy the remainder, and then append "Δ-samples" many new samples, that should be the current buffer. And indeed, I tried that, and that is the case. It turns out "Δ-samples" often is 384, 512, 896. It is trivial but time consuming to determine "Δ-samples" in a loop.
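A minimal sketch of that loop, assuming the shift is always a multiple of the render quantum. The name findDeltaSamples is hypothetical; it compares the tail of the old buffer with the head of the new one for each candidate shift.

```javascript
// Sketch: find by how many samples `curr` is shifted relative to `prev`.
// Only shifts that are multiples of `quantum` are tried (an assumption
// based on the 128-sample render quantum).
function findDeltaSamples(prev, curr, quantum = 128) {
  const n = prev.length;
  for (let delta = 0; delta < n; delta += quantum) {
    let match = true;
    // prev with `delta` samples removed at the front must equal the head of curr
    for (let j = 0; j + delta < n; j++) {
      if (prev[j + delta] !== curr[j]) {
        match = false;
        break;
      }
    }
    if (match) return delta;
  }
  return -1; // no overlap found (shift >= buffer length, or data differs)
}

// Usage with synthetic data: a ramp signal, second buffer taken 640 samples later.
const signal = Float32Array.from({ length: 4096 }, (_, i) => i);
const prevBuf = signal.subarray(0, 2048);
const currBuf = signal.subarray(640, 640 + 2048);
console.log(findDeltaSamples(prevBuf, currBuf)); // 640
```

Note that this returns the smallest matching shift, so silent or periodic input can give a misleading result.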

I would like to compute "Δ-samples" without performing that loop.

One would think the following would work:

(audioContext.currentTime - (value of audioContext.currentTime the last time the handler ran))/(duration of 1 sample)

I tried that (see code below where I also "stitch together" the various buffers, trying to reconstruct the original buffer), and, surprise, it works about 99.9% of the time in Chrome, and about 95% of the time in Firefox.

I also tried audioContext.getOutputTimestamp().contextTime, which does not work in Chrome, and works 9?% in Firefox.

Is there any way to find "Δ-samples" (without looking at the buffers), which works reliably?

Second question: the "reconstructed" buffer (all the buffers from callbacks stitched together) and the original sound buffer are not exactly the same. There is some difference (small, but noticeable, and more than the usual "rounding error"), and it is bigger in Firefox.

Where does that come from? As I understand the spec, those should be the same.

var soundFile = 'https://mathheadinclouds.github.io/audio/sounds/la.mp3';
var audioContext = null;
var isPlaying = false;
var sourceNode = null;
var analyser = null;
var theBuffer = null;
var reconstructedBuffer = null;
var soundRequest = null;
var loopCounter = -1;
var FFT_SIZE = 2048;
var rafID = null;
var buffers = [];
var timesSamples = [];
var timeSampleDiffs = [];
var leadingWaste = 0;

window.addEventListener('load', function() {
  soundRequest = new XMLHttpRequest();
  soundRequest.open("GET", soundFile, true);
  soundRequest.responseType = "arraybuffer";
  //soundRequest.onload = function(evt) {}
  soundRequest.send();
  var btn = document.createElement('button');
  btn.textContent = 'go';
  btn.addEventListener('click', function(evt) {
    goButtonClick(this, evt)
  });
  document.body.appendChild(btn);
});

function goButtonClick(elt, evt) {
  initAudioContext(togglePlayback);
  elt.parentElement.removeChild(elt);
}

function initAudioContext(callback) {
  audioContext = new AudioContext();
  audioContext.decodeAudioData(soundRequest.response, function(buffer) {
    theBuffer = buffer;
    callback();
  });
}

function createAnalyser() {
  analyser = audioContext.createAnalyser();
  analyser.fftSize = FFT_SIZE;
}

function startWithSourceNode() {
  sourceNode.connect(analyser);
  analyser.connect(audioContext.destination);
  sourceNode.start(0);
  isPlaying = true;
  sourceNode.addEventListener('ended', function(evt) {
    sourceNode = null;
    analyser = null;
    isPlaying = false;
    loopCounter = -1;
    window.cancelAnimationFrame(rafID);
    console.log('buffer length', theBuffer.length);
    console.log('reconstructedBuffer length', reconstructedBuffer.length);
    console.log('audio callback called counter', buffers.length);
    console.log('root mean square error', Math.sqrt(checkResult() / theBuffer.length));
    console.log('lengths of time between requestAnimationFrame callbacks, measured in audio samples:');
    console.log(timeSampleDiffs);
    console.log(
      timeSampleDiffs.filter(function(val) {
        return val === 384
      }).length,
      timeSampleDiffs.filter(function(val) {
        return val === 512
      }).length,
      timeSampleDiffs.filter(function(val) {
        return val === 640
      }).length,
      timeSampleDiffs.filter(function(val) {
        return val === 768
      }).length,
      timeSampleDiffs.filter(function(val) {
        return val === 896
      }).length,
      '*',
      timeSampleDiffs.filter(function(val) {
        return val > 896
      }).length,
      timeSampleDiffs.filter(function(val) {
        return val < 384
      }).length
    );
    console.log(
      timeSampleDiffs.filter(function(val) {
        return val === 384
      }).length +
      timeSampleDiffs.filter(function(val) {
        return val === 512
      }).length +
      timeSampleDiffs.filter(function(val) {
        return val === 640
      }).length +
      timeSampleDiffs.filter(function(val) {
        return val === 768
      }).length +
      timeSampleDiffs.filter(function(val) {
        return val === 896
      }).length
    )
  });
  myAudioCallback();
}

function togglePlayback() {
  sourceNode = audioContext.createBufferSource();
  sourceNode.buffer = theBuffer;
  createAnalyser();
  startWithSourceNode();
}

function myAudioCallback(time) {
  ++loopCounter;
  if (!buffers[loopCounter]) {
    buffers[loopCounter] = new Float32Array(FFT_SIZE);
  }
  var buf = buffers[loopCounter];
  analyser.getFloatTimeDomainData(buf);
  var now = audioContext.currentTime;
  var nowSamp = Math.round(audioContext.sampleRate * now);
  timesSamples[loopCounter] = nowSamp;
  var j, sampDiff;
  if (loopCounter === 0) {
    console.log('start sample: ', nowSamp);
    reconstructedBuffer = new Float32Array(theBuffer.length + FFT_SIZE + nowSamp);
    leadingWaste = nowSamp;
    for (j = 0; j < FFT_SIZE; j++) {
      reconstructedBuffer[nowSamp + j] = buf[j];
    }
  } else {
    sampDiff = nowSamp - timesSamples[loopCounter - 1];
    timeSampleDiffs.push(sampDiff);
    var expectedEqual = FFT_SIZE - sampDiff;
    for (j = 0; j < expectedEqual; j++) {
      if (reconstructedBuffer[nowSamp + j] !== buf[j]) {
        console.error('unexpected error', loopCounter, j);
        // debugger;
      }
    }
    for (j = expectedEqual; j < FFT_SIZE; j++) {
      reconstructedBuffer[nowSamp + j] = buf[j];
    }
    //console.log(loopCounter, nowSamp, sampDiff);
  }
  rafID = window.requestAnimationFrame(myAudioCallback);
}

function checkResult() {
  var ch0 = theBuffer.getChannelData(0);
  var ch1 = theBuffer.getChannelData(1);
  var sum = 0;
  var idxDelta = leadingWaste + FFT_SIZE;
  for (var i = 0; i < theBuffer.length; i++) {
    var samp0 = ch0[i];
    var samp1 = ch1[i];
    var samp = (samp0 + samp1) / 2;
    var check = reconstructedBuffer[i + idxDelta];
    var diff = samp - check;
    var sqDiff = diff * diff;
    sum += sqDiff;
  }
  return sum;
}

In the above snippet, I do the following. I load with XMLHttpRequest a 1 second mp3 audio file from my github.io page (I sing 'la' for 1 second). After it has loaded, a button is shown, saying 'go', and after pressing that, the audio is played back by putting it into a bufferSource node and then calling .start on that. The bufferSource is then fed to our analyser, et cetera.


I also have the snippet code on my github.io page - makes reading the console easier.

1
  • experiments I made have shown that if the "Δ-samples", computed as the question elaborates, is off, it's always too low, never too high, and the amount by which it's too low is always a multiple of 128. Commented Apr 4, 2020 at 2:03

3 Answers

2

I think the AnalyserNode is not what you want in this situation. You want to grab the data and keep it synchronized with raf. Use a ScriptProcessorNode or AudioWorkletNode to grab the data. Then you'll get all the data as it comes. No problems with overlap, or missing data or anything.

Note also that the clocks for raf and audio may be different and hence things may drift over time. You'll have to compensate for that yourself if you need to.
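As a rough sketch of the "grab all the data as it comes" idea: the class below only shows the accumulation step, written as plain JavaScript so it can run anywhere. The name BlockCollector and its methods are hypothetical. In a real setup one would presumably call push() with each block received from an AudioWorkletProcessor's process() callback (for example forwarded through its port) or from a ScriptProcessorNode's onaudioprocess handler; that wiring is assumed here and not shown.

```javascript
// Hypothetical helper: appends incoming audio blocks back to back,
// so there is neither overlap nor missing data between blocks.
class BlockCollector {
  constructor(capacity) {
    this.samples = new Float32Array(capacity);
    this.length = 0; // number of samples stored so far
  }

  // Append one block; returns how many samples were actually stored
  // (fewer than block.length once the capacity is exhausted).
  push(block) {
    const room = this.samples.length - this.length;
    const n = Math.min(room, block.length);
    this.samples.set(block.subarray(0, n), this.length);
    this.length += n;
    return n;
  }

  // Samples appended since index `from`, e.g. since the last animation frame.
  since(from) {
    return this.samples.subarray(from, this.length);
  }
}

// Usage: three 128-sample blocks arrive, then a frame callback reads what is new.
const collector = new BlockCollector(44100);
let lastRead = 0;
for (let k = 0; k < 3; k++) {
  collector.push(new Float32Array(128).fill(k));
}
const fresh = collector.since(lastRead);
lastRead = collector.length;
console.log(fresh.length); // 384
```

Here the amount of new data per animation frame is simply the number of samples pushed since the previous read, so no overlap needs to be detected.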

4
  • I'm confused. ScriptProcessorNode has .onaudioprocess to which you feed a callback function, which is already being called periodically. So my first guess would be that you do everything in that callback function. So where does raf (requestAnimationFrame) come in? What would I need that for? It might make total sense what you're saying, but without example code, I'm not quite understanding what you mean. As for the different clocks, yes, indeed. I'm trying to use the "audio time" only. Are you suggesting the same or something different? Commented Apr 2, 2020 at 16:32
  • I have the impulse to accept your answer - because I think you're right. I really should be using ScriptProcessorNode or AudioWorkletNode. As for ScriptProcessorNode, I checked and confirmed: no overlap. Then again, I'd like to know if it's possible to reliably find those buffer overlap amounts of AnalyserNode without looking into the buffers. Still hoping that I'm wrong, and that it's possible. Commented Apr 2, 2020 at 17:33
  • For the first question, you can probably buffer the data received from ScriptProcessorNode, and when raf is called, grab the appropriate set of data from the buffer. Or you could just update the graph whenever you have a new buffer. I don't do graphics/raf, so I'm not really knowledgeable here. Commented Apr 2, 2020 at 21:50
  • For your second question, I think you pretty much have to examine the data from an AnalyserNode. The timing from raf isn't perfect and neither is the timing of what data you get from an AnalyserNode because the audio thread is running independently of the main thread it can update data at unexpected times. (But, of course, not while you're reading the data out!) Commented Apr 2, 2020 at 21:52
2

Unfortunately there is no way to find out the exact point in time at which the data returned by an AnalyserNode was captured. But you might be on the right track with your current approach.

All the values returned by the AnalyserNode are based on the "current-time-domain-data". This is basically the internal buffer of the AnalyserNode at a certain point in time. Since the Web Audio API has a fixed render quantum of 128 samples I would expect this buffer to evolve in steps of 128 samples as well. But currentTime usually evolves in steps of 128 samples already.

Furthermore the AnalyserNode has a smoothingTimeConstant property. It is responsible for "blurring" the returned values. The default value is 0.8. For your use case you probably want to set this to 0.

EDIT: As Raymond Toy pointed out in the comments, the smoothingTimeConstant only has an effect on the frequency data. Since the question is about getFloatTimeDomainData(), it will have no effect on the returned values.

I hope this helps but I think it would be easier to get all the samples of your audio signal by using an AudioWorklet. It would definitely be more reliable.

6
  • ah, spec mentions blackman window. That explains a lot - such as the blurring, at least potentially. Thank you! I looked at smoothingTimeConstant, and fiddled around with that. It had no effect whatsoever. Also, I conjectured right away that Firefox might have a different smoothingTimeConstant, which would explain the higher rms error in FF. But not so - it's also 0.8 in FF, just as in Chrome. Strange. spec calls 128 the 'render quantum', good point. Do you have example code for AudioWorklet? Commented Apr 2, 2020 at 12:20
  • 2
    The Blackman window and smoothingTimeConstant only apply when you want the frequency data. The time domain data is not modified in any way. Commented Apr 2, 2020 at 15:33
  • Thanks Raymond. I edited the answer to mention that the smoothingTimeConstant will have no effect. Commented Apr 3, 2020 at 14:05
  • 1
    Sorry mathheadinclouds, for the misleading info on the smoothingTimeConstant. The Chrome team has created some useful demos which show how the AudioWorklet can be used. googlechromelabs.github.io/web-audio-samples/audio-worklet Commented Apr 3, 2020 at 14:07
  • 1
    Firefox support is on the way. It's enabled in Nightly already. You could also use a polyfill like standardized-audio-context, GoogleChromeLabs/audioworklet-polyfill or jariseon/audioworklet-polyfill. They all use the AudioWorklet if it is available and otherwise fall back to the ScriptProcessorNode. Commented Apr 4, 2020 at 9:37
0

I'm not really following your math, so I can't tell exactly what you had wrong, but you seem to be looking at this in an overly complicated way.

The fftSize doesn't really matter here; what you want to calculate is how many samples have passed since the last frame.

To calculate this, you just need to

  • Measure the time elapsed since the last animation frame.
  • Divide this time by the duration of a single sample frame.

The duration of a single sample frame is simply 1 / context.sampleRate.
So really all you need is (currentTime - previousTime) / (1 / sampleRate), i.e. (currentTime - previousTime) * sampleRate, and you'll find the index in the last buffer where the data starts being repeated in the new one.

And only then, if you want the index in the new frame you'd subtract this index from the fftSize.
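The two steps above, plus the final subtraction, can be sketched as pure functions. The names deltaSamplesFromTimes and newDataIndex are made up for illustration; the timestamps are assumed to be in seconds on the context's time base, whichever clock they were read from.

```javascript
// Sketch: number of samples elapsed between two timestamps (in seconds).
function deltaSamplesFromTimes(previousTime, currentTime, sampleRate) {
  // Dividing by the duration of one sample (1 / sampleRate)
  // is the same as multiplying by sampleRate.
  return Math.round((currentTime - previousTime) * sampleRate);
}

// Sketch: index in the new buffer where not-yet-seen data starts.
function newDataIndex(deltaSamples, fftSize) {
  // If more than fftSize samples elapsed, the whole buffer is new.
  return Math.max(0, fftSize - deltaSamples);
}

// Usage: two reads taken 768 samples apart at 44100 Hz, with fftSize 2048.
const delta = deltaSamplesFromTimes(0, 768 / 44100, 44100);
console.log(delta, newDataIndex(delta, 2048)); // 768 1280
```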

Now for why you sometimes have gaps, it's because AudioContext.prototype.currentTime returns the timestamp of the beginning of the next block to be passed to the graph.
The one we want here is AudioContext.prototype.getOutputTimestamp().contextTime, which represents the timestamp of now, on the same base as currentTime (i.e. the creation of the context).

(function loop(){requestAnimationFrame(loop);})();
(async()=>{
  const ctx = new AudioContext();
  
  const buf = await fetch("https://upload.wikimedia.org/wikipedia/en/d/d3/Beach_Boys_-_Good_Vibrations.ogg").then(r=>r.arrayBuffer());
  const aud_buf = await ctx.decodeAudioData(buf);
  const source = ctx.createBufferSource();
  source.buffer = aud_buf;
  source.loop = true;
  
  const analyser = ctx.createAnalyser();
  const fftSize = analyser.fftSize = 2048;
  source.connect( analyser );
  source.start(0);
  
  // for debugging we use two different buffers
  const arr1 = new Float32Array( fftSize );
  const arr2 = new Float32Array( fftSize );

  const single_sample_dur = (1 / ctx.sampleRate);
  console.log( 'single sample duration (ms)', single_sample_dur * 1000);

  onclick = e => {
    if( ctx.state === "suspended" ) {
      ctx.resume();
      return console.log( 'starting context, please try again' );
    }
    
    console.log( '-------------' );
    
    requestAnimationFrame( () => {
      // first frame
      const time1 = ctx.getOutputTimestamp().contextTime;
      analyser.getFloatTimeDomainData( arr1 );
      
      requestAnimationFrame( () => {
        // second frame
        const time2 = ctx.getOutputTimestamp().contextTime;
        analyser.getFloatTimeDomainData( arr2 );
                
        const elapsed_time = time2 - time1;
        console.log( 'elapsed time between two frame (ms)', elapsed_time * 1000 );
        
        const calculated_index = fftSize - Math.round( elapsed_time / single_sample_dur );
        console.log( 'calculated index of new data', calculated_index );

        // for debugging we can just search for the first index where the data repeats
        const real_time = fftSize - arr1.indexOf( arr2[ 0 ] );
        // indexOf returns -1 when no overlap is found, which pushes real_time past fftSize
        const real_index = real_time > fftSize ? 0 : real_time;
        console.log( 'real index', real_index );
        
        if( calculated_index !== real_index ) {
          console.error( 'different' );
        }
       
      });
    });
  };
  document.body.classList.add('ready');

})().catch( console.error );
body:not(.ready) pre { display: none; }
<pre>click to record two new frames</pre>

10
  • @mathheadinclouds: please then consider deleting your "too loud" comments and removing the downvote . Commented Apr 2, 2020 at 12:36
  • @HovercraftFullOfEels I agree the comments should be gone; as for the downvote, they can deal with it as they wish. Nobody should tell them what to do with it.
    – Kaiido
    Commented Apr 2, 2020 at 12:42
  • @mathheadinclouds comments are ephemeral here. They're here to tell the author about some problem with the content they wrote. You did that, that was fair use of the comment. You did that in a quite aggressive manner, that was not fine, but no offense from me. Now keep it in mind next time. Sometimes I force myself a good night of sleep before responding here. However, the message of these comments has been heard, I edited my question with I think what you need, or at least, in a better way than it was. Your comments don't apply anymore, you can delete them.
    – Kaiido
    Commented Apr 2, 2020 at 12:46
  • @mathheadinclouds if you now have other concerns on the edit, then feel free to write new comments, but I'll handle them only in 12hrs from now.
    – Kaiido
    Commented Apr 2, 2020 at 12:55
  • 1
    indeed, I tried it also, and audioContext.getOutputTimestamp().contextTime works on Firefox - most of the time. Strange that whatever you do, it works only most of the time. Maybe it will work next year. Commented Apr 4, 2020 at 0:20
