43
\$\begingroup\$

Given some event in a game, what is the maximum delay before the audio plays such that the player will still properly associate the audio with that event (and not perceive lag)?

\$\endgroup\$
5
  • \$\begingroup\$ Not much. I'd guess it has to be less than 1/10 of a second. Though personally, I might notice it if it were more than a few frames at 60 FPS. \$\endgroup\$
    – Almo
    Commented May 13, 2014 at 18:27
  • \$\begingroup\$ Don't forget that in most cases the rendered output will have some lag too, some of which will come from the monitor. It can take over 100 ms for the result of player input to be displayed on screen. See anandtech.com/show/2803 \$\endgroup\$
    – Adam
    Commented May 13, 2014 at 22:22
  • 1
    \$\begingroup\$ It's around 20 milliseconds when playing an instrument, around 80 milliseconds when you're a listener. This is just my personal experience, your mileage may vary. \$\endgroup\$
    – rwols
    Commented May 13, 2014 at 23:57
  • \$\begingroup\$ More than any specific time, you need consistency. As long as everything has the same delay, you can be within reason. If everything is 100ms late you may not really notice it, but if some sounds are near instant and the rest are 100ms late or something in between, then you will notice. \$\endgroup\$
    – 0xFADE
    Commented May 14, 2014 at 20:24
  • \$\begingroup\$ If you are in any way interested in some sort of realistic behaviour, you could consider some delay for events far away from the listener as something positive. \$\endgroup\$
    – Darkwings
    Commented May 7, 2016 at 11:30

6 Answers

52
\$\begingroup\$

The following results were measured for lip synchronization, which is considered to be "the most noticeable a/v sync error".


Wikipedia says

For television applications, audio should lead video by no more than 15 milliseconds and audio should lag video by no more than 45 milliseconds. For film, acceptable lip sync is considered to be no more than 22 milliseconds in either direction.


The Media and Acoustics Perception Lab says

The results of the experiment determined that the average audio leading threshold for a/v sync detection was 185.19 ms, with a standard deviation of 42.32 ms


The ATSC says

At first glance it seems loose: +90 ms to -185 ms as a “Window of Acceptability”

and

  • Undetectable from -100 ms to +25 ms
  • Detectable at -125 ms & +45 ms
  • Becomes unacceptable at -185 ms & +90 ms

(- Sound delayed, + Sound advanced)
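
As a rough illustration, the ATSC figures quoted above can be turned into a small lookup. This is only a sketch: the function name `classify_av_offset`, the inclusive treatment of the thresholds, and the "borderline" label for offsets that fall between the quoted thresholds are assumptions made for this example, not part of the ATSC document.

```python
def classify_av_offset(offset_ms: float) -> str:
    """Classify an audio/video offset using the ATSC figures quoted above.

    Sign convention (as in the quote): negative = sound delayed,
    positive = sound advanced.
    """
    if -100 <= offset_ms <= 25:
        return "undetectable"
    if offset_ms <= -185 or offset_ms >= 90:
        return "unacceptable"
    if offset_ms <= -125 or offset_ms >= 45:
        return "detectable"
    # Assumption: the quote gives no label for the gaps between thresholds.
    return "borderline"


for offset in (-50, -150, -200, 30, 60):
    print(offset, classify_av_offset(offset))
```

Note how asymmetric the window is: a sound that arrives 60 ms early is already classed as detectable, while one that arrives 60 ms late is not.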


To conclude

The results aren't so far from one another. It seems that the maximum acceptable delay is around 150ms, which is 9 frames at 60 frames per second.
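
For reference, the conversion from milliseconds to frames is simple arithmetic. A minimal sketch (the helper name `ms_to_frames` is made up for this example):

```python
def ms_to_frames(delay_ms: float, fps: float = 60.0) -> float:
    """Convert a delay in milliseconds to a number of frames at the given frame rate."""
    return delay_ms * fps / 1000.0


print(ms_to_frames(150))  # 9.0
print(ms_to_frames(45))   # 2.7
```

At 60 frames per second one frame lasts about 16.7 ms, so the 45 ms figure from the Wikipedia quote is a little under 3 frames.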

\$\endgroup\$
5
  • 3
    \$\begingroup\$ "If you have a delay, it should be the video that is delayed." seems that it should be reversed, the ATSC article clearly states that people expect/tolerate sound happening a bit after the sight (since in real life sound lags sight by approx. 1 ms per foot of distance), but don't associate events properly if video event happens after the sound. \$\endgroup\$
    – Peteris
    Commented May 13, 2014 at 20:35
  • \$\begingroup\$ You are right, I completely misunderstood. Thank you. (I edited) \$\endgroup\$
    – Heckel
    Commented May 13, 2014 at 20:46
  • 1
    \$\begingroup\$ I can tell you from personal experience that this even varies between ears in the same person. I have a rare vestibular condition that actually causes my brain to process auditory stimulation in my left ear measurably delayed vs. the right ear. On a bad day this causes dizziness, but most of the time it is tolerable. So yes, this is extremely subjective. \$\endgroup\$ Commented May 13, 2014 at 21:21
  • \$\begingroup\$ Where do you get 150ms? Your sources clearly average around 45ms. \$\endgroup\$
    – mrr
    Commented May 15, 2014 at 0:25
  • \$\begingroup\$ Wikipedia says 45ms, but it isn't necessarily the most reliable source. The second source says 185.19 ms and the third 125ms until it becomes noticeable. Can you quote the source to help me understand where I am wrong? \$\endgroup\$
    – Heckel
    Commented May 15, 2014 at 6:12
9
\$\begingroup\$

It depends on the event

Feeling that, say, an explosion you see and hear is a single event will have the tolerances described in other answers - no more than ~50ms; some people may be more sensitive (e.g. musicians), so I'd suggest aiming for 30ms, or no more than 2 frames at 60fps.

I believe that the perceived distance should affect those tolerances. People expect distant sounds to be slightly delayed, since in real life sound lags sight by approximately 1ms per foot of distance. So an explosion on a zoomed-out RTS game 'map' might have a larger tolerance for sound lag than the player firing their own gun in an FPS.
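
To put numbers on the "1ms per foot" rule of thumb, here is a small sketch. It assumes a speed of sound of 343 m/s (dry air at roughly 20 °C); the constant and function names are made up for this example.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # assumption: dry air at about 20 degrees C
METERS_PER_FOOT = 0.3048


def propagation_delay_ms(distance_m: float,
                         speed_m_per_s: float = SPEED_OF_SOUND_M_PER_S) -> float:
    """Time for sound to travel distance_m, in milliseconds."""
    return 1000.0 * distance_m / speed_m_per_s


print(round(propagation_delay_ms(METERS_PER_FOOT), 2))  # about 0.89 ms for one foot
print(round(propagation_delay_ms(100.0)))               # about 292 ms at 100 m
```

So an explosion 100 m away would, in real life, be heard almost a third of a second after it is seen, far more than the tolerances discussed for nearby events.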

Specialized cases, such as getting the proper feel for a music/rhythm game, may require much tighter tolerances: 15-20ms or even lower. For example, if the player hears both the "input action" (such as singing into a mic or banging a plastic instrument) and a sound generated by your system for the same event, then a 50ms lag will cause the "original" and "played" sounds to mix weirdly.

In addition, keep in mind the lag between the start of the audio file and the "event" inside that audio file. In many audio clips the "event" won't be right at the start: you may have a lightning-strike sound where the 'strike' happens 200ms after the beginning, which would be obvious to everyone, and pretty much all sound files, even a drum hit, will have some delay there.
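
One way to check a clip for this kind of built-in lead-in is to look for the first sample that reaches some fraction of the clip's peak. The following is a minimal sketch under stated assumptions: samples are already decoded to floats, the 50% threshold is an arbitrary choice, and `onset_offset_ms` is a made-up helper, not a function from any audio library.

```python
from typing import Sequence


def onset_offset_ms(samples: Sequence[float], sample_rate: int,
                    threshold_ratio: float = 0.5) -> float:
    """Return the time (ms) of the first sample whose magnitude reaches
    threshold_ratio * peak. Returns 0.0 for an empty or silent clip."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return 0.0
    threshold = peak * threshold_ratio
    for index, sample in enumerate(samples):
        if abs(sample) >= threshold:
            return index * 1000.0 / sample_rate
    return 0.0  # not reached: the peak sample always meets the threshold


# Synthetic clip: 8820 silent samples (200 ms at 44.1 kHz), then a transient.
clip = [0.0] * 8820 + [1.0, 0.8, 0.5, 0.2]
print(onset_offset_ms(clip, 44100))  # 200.0
```

Once the offset is known, the clip can be trimmed, or playback can be triggered that much earlier when the event is predictable.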

Don't measure averages - look at the worst case

Sight and hearing are deeply connected in human perception, and if one of them stutters relative to the other, it will be perceptible. It's not okay if most of the time it's very fast but occasionally there's a 0.2 second delay while something is loading - people will notice such situations. This is why audio is often kept running on a separate thread, isolated from the other activities and just receiving rapid notifications about which preloaded clips should be played.
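
A quick sketch of why the average can mislead. The latency samples and the 50 ms budget below are invented for illustration, and `latency_report` is a hypothetical helper:

```python
from statistics import mean
from typing import Dict, Sequence


def latency_report(samples_ms: Sequence[float],
                   budget_ms: float = 50.0) -> Dict[str, float]:
    """Summarize measured event-to-sound latencies (non-empty input expected)."""
    return {
        "mean_ms": mean(samples_ms),
        "max_ms": max(samples_ms),
        "over_budget": sum(1 for s in samples_ms if s > budget_ms),
    }


# Invented measurements: mostly fast, with one stall while something loads.
measured = [12.0, 14.0, 11.0, 13.0, 200.0, 12.0]
report = latency_report(measured)
print(report)
```

Here the mean stays under the 50 ms budget even though one event was 200 ms late, which is exactly the case players will notice.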

\$\endgroup\$
5
\$\begingroup\$

Any situation where the player causes the sound (music games, guns in an FPS) needs very low delay: the player has sent an impulse to make it happen at that moment, so, like a musician hearing their instrument delayed, they will be particularly aware of very small delays. Sound engineers fret about recording delays below 5 ms ruining the "groove".

The Journal of the American Academy of Audiology states that people (not just musicians), when listening to their own voice delayed, are aware of delays as short as 3 ms, and a delay of longer than 10 ms was objectionable 90% of the time.

Humans use the time delay between their ears for directional information, and thus must be able to process and extract information from delays below 1 ms.
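
A back-of-the-envelope check of that sub-millisecond figure, using a deliberately simplified straight-line model. The 0.2 m ear spacing and the 343 m/s speed of sound are assumed round numbers, and the function name is made up for this example:

```python
import math

SPEED_OF_SOUND_M_PER_S = 343.0  # assumption: air at about 20 degrees C
EAR_SPACING_M = 0.2             # assumption: rough distance between the ears


def interaural_delay_ms(angle_deg: float,
                        ear_spacing_m: float = EAR_SPACING_M,
                        speed_m_per_s: float = SPEED_OF_SOUND_M_PER_S) -> float:
    """Arrival-time difference between the ears for a distant source at
    angle_deg from straight ahead (simple path-difference model)."""
    path_difference_m = ear_spacing_m * math.sin(math.radians(angle_deg))
    return 1000.0 * path_difference_m / speed_m_per_s


print(round(interaural_delay_ms(90.0), 2))  # about 0.58 ms, source directly to one side
print(round(interaural_delay_ms(10.0), 2))  # about 0.1 ms, source slightly off-centre
```

Even the largest difference in this model is well under 1 ms.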

The 185.19 ms quoted above is irrelevant as it is referring to a leading sound error, and anyhow, to what people found acceptable when passively watching a film, not actively engaged in a game.

\$\endgroup\$
5
\$\begingroup\$

The accepted answer here mainly discusses perception of audio synchronization in passively watching video. In these cases, the audience can't easily pin down exactly when the audio should play except by attending to telltale signs in the video. This means they have limited anticipation of the sound.

There are two important cases in games where this low-anticipation assumption doesn't hold:

  1. When the player themselves caused the sound (as SamB points out), so from the moment they form the intent to press the button they know exactly when they expect to hear the sound.

  2. When the sound is supposed to land on a periodic beat, as in music games or anything with a ticking timer/counter, this rhythm allows the player to anticipate the next sound and notice if it plays out of time.

In this talk from GDC 2013, Mathieu Pavageau argues that players can perceive differences in synch precision above about 5ms, much less forgiving than the examples from lip synching would suggest. Check out the sections "Time Perception Examples" and "Example of Ubisoft Games" to hear it for yourself. You can hear the Rayman Origins menu doesn't sound "laggy" per se when synched within 16 ms (video frame), but when synched within 5 ms it sounds noticeably better & tighter.

Pavageau advocates using a low-level audio callback to get this kind of sub-frame precision if you want tight-feeling rhythmic gameplay of this variety.
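
To illustrate what sub-frame precision means in practice, here is a sketch of the bookkeeping an audio callback could do. It is not taken from the talk: it assumes a 48 kHz output rate and 256-sample blocks, and all names are hypothetical. The idea is to express event times in samples and start mixing each sound at its exact offset inside the block being filled, rather than at the start of the next video frame.

```python
from typing import List, Sequence

SAMPLE_RATE = 48_000  # assumption: output sample rate in Hz
BLOCK_SIZE = 256      # assumption: samples requested per audio callback


def seconds_to_sample(t_seconds: float, sample_rate: int = SAMPLE_RATE) -> int:
    """Convert a time in seconds to an absolute sample index."""
    return round(t_seconds * sample_rate)


def offsets_in_block(block_start: int, block_size: int,
                     event_samples: Sequence[int]) -> List[int]:
    """For events that fall inside this block, return the offset (in samples)
    from the start of the block at which each sound should begin."""
    block_end = block_start + block_size
    return [s - block_start for s in event_samples if block_start <= s < block_end]


beat = seconds_to_sample(0.5)                    # a beat scheduled at 0.5 s
block_start = (beat // BLOCK_SIZE) * BLOCK_SIZE  # block that contains the beat
print(beat, block_start, offsets_in_block(block_start, BLOCK_SIZE, [beat]))
# 24000 23808 [192]
```

With these assumed numbers a whole block lasts about 5.3 ms, and the beat starts 192 samples (4 ms) into it, so timing is resolved to a single sample instead of a 16 ms video frame.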

\$\endgroup\$
2
\$\begingroup\$

For games which require a person to react to audio cues, every millisecond by which the sound is delayed will cause the person's response to likewise be delayed. Someone who is simply watching a movie or cut-scene may not notice too much if the audio and video aren't exactly in sync, but it's often important and sometimes critical that audio be in sync with what the player is expected to be doing.

\$\endgroup\$
-1
\$\begingroup\$

In theory, any delay above 50ms can be noticeable when a sound is associated with pictures, and from about 25ms you can start hearing a sound and its delayed copy as two separate sounds. So I'd highly recommend staying under 50ms, and if you can stay somewhere between 5ms and 15ms, even better.

I hope this will help you!

https://en.wikipedia.org/wiki/Delayed_Auditory_Feedback

\$\endgroup\$
5
  • \$\begingroup\$ This answer doesn't add any new advice not already present in the existing answers, so it's in danger of coming off as just a plug or advertisement for your company's contact info. StackExchange is not intended for promoting services, so I'd recommend removing that portion (folks can still look you up by your username), and adding more detail about why you'd recommend particular timings beyond what's covered in the existing answers. \$\endgroup\$
    – DMGregory
    Commented Feb 13, 2017 at 5:03
  • \$\begingroup\$ None of the answers that we saw seemed right to us. We are a team of sound engineers, and acoustics is the first thing we learned. Some answers were saying over 100ms, others were saying -100s & +85s; how is that even an answer? -50ms or +50ms, it is still 50ms of difference between the action & the sound. We are only trying to help; if giving our email is that offensive, we'll remove it. \$\endgroup\$ Commented Feb 13, 2017 at 5:09
  • \$\begingroup\$ See, for example, Peteris's answer from 3 years ago, which gives the same absolute upper cap of 50 ms and recommends lower as this answer does, or the reference to the Mathieu Pavageau talk recommending 5ms as the ideal target. That seems to cover the gamut of what's contained in this answer, unless you want to expand on the recommendations? For example, if there are details from the Wikipedia link that you feel are relevant, it's good practice to at least summarize them in the text of the answer (in case the linked page changes in the future). \$\endgroup\$
    – DMGregory
    Commented Feb 13, 2017 at 5:21
  • \$\begingroup\$ Ah, sorry about that. We didn't read all the answers; we just skipped a few, then said what we know and backed it up with a Wikipedia link. We are still newbies to the forum; we're trying to give some help on sound-related problems, but we didn't find much haha \$\endgroup\$ Commented Feb 13, 2017 at 5:25
  • \$\begingroup\$ No worries. Coaching new users is one of the reasons these comments exist. :) You'll get the hang of StackExchange answers pretty quickly - it just means thinking of them as long-term reference resources, rather than forum replies. \$\endgroup\$
    – DMGregory
    Commented Feb 13, 2017 at 5:27
