$\begingroup$

Is it possible that a call to ID3D11DeviceContext::Map on a dynamic vertex buffer with D3D11_MAP_WRITE_DISCARD causes my application to wait for something?

I'm using a dynamic buffer to draw my UI, and I noticed strange behavior when drawing a complex UI after a time-consuming effect such as SSAO:

START FRAME
...

Draw (full-screen quad with SSAO shader);

for(int i=0; i<N; i++)
{
    Update dynamic buffer;
    Draw;
}

...

END FRAME

Updating the dynamic buffer usually takes a negligible amount of time, but if N is high enough (~100), then for some values of i the Map call takes a noticeable amount of time, roughly proportional to the GPU time of the SSAO shader, as if some form of synchronization were taking place.

Is this possible? Why?

$\endgroup$

1 Answer

$\begingroup$

Yes, Map can force synchronization in some situations.

In D3D11, the driver handles GPU command recording, submission and synchronization. When you make D3D11 calls, generally they don't get submitted to the GPU immediately; rather, the driver tries to batch up a large amount of work into a command buffer and then submit it all at once (because submission is expensive).

What this means for your vertex buffer is that every time you use MAP_WRITE_DISCARD, the driver is internally "renaming" your vertex buffer and giving you a new, fresh buffer into which to write. It can't actually overwrite the previous buffer because the work that uses that buffer hasn't executed on the GPU yet. So each time you do a discard, you're effectively allocating more memory, and the driver is buffering up all those vertex buffers and draw commands for the GPU to access at a later time.

The driver uses various hints and heuristics to decide when to submit work to the GPU. Usually it submits when you call Flush or Present, but another likely heuristic is the amount of allocated memory that is waiting for the GPU to consume it before it can be reclaimed. So at a certain point, once you've done a large enough number of discards, the driver decides it had better submit the pending work and wait for it to finish so that it can reclaim the memory for all those buffers. That wait is the stall you are seeing inside Map.

To work around this, you can help out the driver by managing the memory yourself. Allocate a vertex buffer large enough to hold all N draw calls' worth of data, fill it all in with a single Map, then do the draw calls using the StartVertexLocation parameter to point each one at its corresponding data within the buffer. Or even better, combine all the draw calls into one large one (assuming you don't need to change states between them).
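As a rough illustration of the single-Map approach, here is a minimal sketch of the CPU-side bookkeeping. The `UiVertex` layout, the `DrawRange` struct and the `PackBatches` helper are hypothetical names made up for this example, and the D3D11 calls appear only as comments so the snippet compiles without the Windows SDK. The idea is just to concatenate all per-draw vertex data and remember each draw's offset, which is what you would then pass as StartVertexLocation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical UI vertex layout; the real one depends on your input layout.
struct UiVertex {
    float x, y;   // position
    float u, v;   // texture coordinates
};

// One entry per draw call: the two arguments of ID3D11DeviceContext::Draw.
struct DrawRange {
    std::uint32_t vertexCount;          // VertexCount
    std::uint32_t startVertexLocation;  // StartVertexLocation
};

// Concatenates all batches into `packed` and records where each one starts.
std::vector<DrawRange> PackBatches(
    const std::vector<std::vector<UiVertex>>& batches,
    std::vector<UiVertex>& packed)
{
    packed.clear();
    std::vector<DrawRange> ranges;
    ranges.reserve(batches.size());

    for (const auto& batch : batches) {
        // Draw takes 32-bit counts and offsets, so guard against overflow.
        assert(packed.size() + batch.size() <=
               std::numeric_limits<std::uint32_t>::max());

        DrawRange range;
        range.vertexCount = static_cast<std::uint32_t>(batch.size());
        range.startVertexLocation = static_cast<std::uint32_t>(packed.size());
        ranges.push_back(range);

        packed.insert(packed.end(), batch.begin(), batch.end());
    }
    return ranges;
}

// Per frame, assuming `vertexBuffer` is a dynamic buffer at least
// packed.size() * sizeof(UiVertex) bytes large:
//
//   D3D11_MAPPED_SUBRESOURCE mapped;
//   context->Map(vertexBuffer, 0, D3D11_MAP_WRITE_DISCARD, 0, &mapped);
//   memcpy(mapped.pData, packed.data(), packed.size() * sizeof(UiVertex));
//   context->Unmap(vertexBuffer, 0);
//
//   for (const DrawRange& r : ranges)
//       context->Draw(r.vertexCount, r.startVertexLocation);
```

With this layout there is one discard per frame instead of N, so the driver only has to rename the buffer once. Whether that fully removes the stall depends on the driver's heuristics, which are not documented.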

$\endgroup$
  • $\begingroup$ Thanks a lot for the detailed answer! However, I don't fully understand the proposed solution: if I were to batch all the N Map commands into a single Map flagged with MAP_WRITE_DISCARD shouldn't I get the exact same result (synchronization) if the heuristic for submission is based on the amount of memory and the dynamic buffer I was mapping was used in some pending command? Is your solution based on using a "brand new" buffer/a buffer not involved in some pending work for the N draw calls? $\endgroup$ Commented Sep 15, 2022 at 7:44
  • $\begingroup$ @leoneruggiero It might be based on some combination of the number of individual buffers and the total amount of memory, and maybe other factors as well; it's impossible to tell what the exact heuristic is. However, using one larger buffer and mapping it with MAP_WRITE_DISCARD once per frame should give the driver a better hint about what you're doing. $\endgroup$ Commented Sep 16, 2022 at 18:22
