$\begingroup$

I am currently experimenting with unbounded descriptor arrays and ultimately want to use bindless descriptors. I wrote a test application that draws 100,000 simple object instances, where the transform for each object is pulled from a buffer array.

#pragma pack_matrix(row_major)

struct VertexData 
{
    float4 Position : SV_POSITION;
}; 

struct VertexInput
{
    float3 Position : POSITION;
};

struct CameraData
{
    float4x4 ViewProjection;
};

struct InstanceData
{
    float4x4 Transform;
};

ConstantBuffer<CameraData> camera : register(b0, space0);
StructuredBuffer<InstanceData> instanceBuffer[] : register(t0, space1);

VertexData main(in VertexInput input, uint id : SV_InstanceID)
{
    VertexData vertex;

    InstanceData instance = instanceBuffer[id].Load(0);
    float4 position = mul(float4(input.Position, 1.0), instance.Transform);
    vertex.Position = mul(position, camera.ViewProjection);
 
    return vertex;
}

I am using DXC to compile this shader to both SPIR-V and DXIL. Under Vulkan, the draw call takes roughly 2 ms; under D3D, it takes around 10 ms. I tried eliminating the buffer access by hashing the instance ID and generating the transform for each object from the hash, and (as expected) the draw calls were then equally fast under both APIs. I then tried using a ByteAddressBuffer instead of a StructuredBuffer. This made no difference in Vulkan (which is expected, since the two are essentially the same in SPIR-V), but in D3D performance dropped even further, to around 40 ms per draw call. Now I am wondering what could cause such behavior. I tried looking at the IL, but I can't really tell how the two compare.

What could cause this discrepancy? Is there any way to profile this? All the debuggers I tried showed only the GPU times mentioned above and nothing more. Also, some Nsight Graphics features are not supported on my GPU (1080 Ti), so I could not test this completely.

$\endgroup$

1 Answer

$\begingroup$

Are you sure the buffer isn't just defined as a host-visible buffer? If it is allocated in CPU memory, the data might have to be transferred to the GPU a lot. I use ByteAddressBuffer a lot and it's not that slow for me, so maybe the buffer is just not allocated properly.

$\endgroup$
  • $\begingroup$ Hi! Thanks for the answer. The buffer is allocated in the default heap (D3D12_HEAP_TYPE_DEFAULT). For reference: D3D12_RESOURCE_DIMENSION_BUFFER, Width = elements * elementSize, D3D12_TEXTURE_LAYOUT_ROW_MAJOR and DXGI_FORMAT_UNKNOWN... unless I forgot something, since it's been a while. I still haven't figured this one out, though! Also note that the slowdown only occurs under D3D, not under Vulkan (AFAIR, "host visible" is Vulkan terminology). $\endgroup$
    – Carsten
    Commented Oct 4, 2023 at 12:42
