I am building a volumetric ray marching shader in HLSL in Unreal Engine, based on Ryan Brucks’ work: https://shaderbits.com/blog/creating-volumetric-ray-marcher

I am trying to add some additional noise to the shader using a tiling 3D noise texture, and I've noticed something I don't understand and can't find an explanation for, probably because I'm not exactly sure what I should be searching for to get to the bottom of the problem.

The position inside the ray-marched volume is normalized, so I can use that value directly to read into the 3D texture, and when I do, I get decent performance. But as soon as I start multiplying the position to increase the UV tiling (and the frequency of the noise), performance drops roughly linearly with the amount of tiling. What confuses me is that I'm not changing the number of ray march steps or any other value; each step simply reads the 3D texture at a different point.
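Roughly, the sampling inside the march loop looks like this (a sketch with illustrative names; `NoiseTex`, `NoiseSampler`, and `TileScale` stand in for my actual inputs):

```hlsl
// CurPos is the normalized [0,1] position inside the volume.
// Increasing TileScale is what causes the slowdown.
float TileScale = 8.0;
float density = NoiseTex.SampleLevel(NoiseSampler, CurPos * TileScale, 0).r;
```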

The only thing I can imagine is that this is something fundamental to do with how texture reads work, or something to do with GPU caching, and that reading from the texture in this way does so in a manner that is somehow less efficient.

So I would like to understand the nature of the problem, but also understand if there is some way around it.

1 Answer

It sounds like you are experiencing a loss of texture cache hit rate on your 3D texture.

As with any modern processor, GPUs have a cache hierarchy. Texture units typically read from a local L1 cache, then a global L2 cache, then from DRAM. For a texture, each cache line will typically contain a small block of adjacent texels. So, when you sample a texture at a certain location, you bring a small region surrounding that sample point into cache. If other texture samples (e.g. the corresponding samples in nearby fragments, or later sequential samples within the same shader) hit regions that are already in cache, they will be served much faster than if the texture unit has to go down the cache hierarchy to get the data.

So, sampling the texture at spread-out sample points (more than a few texels apart from each other) will likely be a lot slower than sampling at closely spaced points. This is exacerbated for 3D textures as opposed to 2D ones, because there are many more texels within a given distance of an initial sample point, and thus the region brought in with a cache line will tend to be smaller in diameter for 3D than for 2D.

As a concrete example, suppose your texture is 8 bits per texel and a cache line is 64 bytes: a single cache line could then fit an 8×8 region of a 2D texture, but only a 4×4×4 region of a 3D texture. So, as soon as your texture samples get to be more than about 4 texels apart, you would hit a performance cliff.

In ray marching, the samples spread farther from each other the farther they are from the camera, so this perf cliff can occur at different distances depending on the texture scaling. Scaling up the tiling also spreads every sample proportionally farther apart in texel space, which would explain the roughly linear performance drop you saw.

This effect is one of the reasons why we have mipmaps. It's not just to prevent aliasing; it's also for performance. Mipmapping tries to match the texel size to the distance between adjacent texture samples, so it helps maintain higher cache hit rates as the texture gets farther away from the camera (and thus the distance in UV space between adjacent samples grows larger).

If you don't have mipmaps on your noise texture, that's the first thing I would try. You can also try narrowing the texture format (e.g. 8 bits per channel instead of 16, or 1 channel instead of 4) or using a compressed format if possible, either of which increases the number of texels that fit in a cache line.
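One wrinkle: inside a dynamic ray march loop, automatic mip selection can't be relied on, because hardware derivatives are taken between neighbouring pixels rather than between successive steps of your loop. A common workaround is to pick the mip level explicitly from the sample spacing, along the lines of this sketch (names such as `StepSize`, `TileScale`, and `NoiseTexSize` are illustrative, not engine built-ins):

```hlsl
// Approximate distance between successive samples, measured in texels
// of the noise texture.
float texelsPerStep = StepSize * TileScale * NoiseTexSize;

// Choose the mip whose texel size roughly matches that spacing.
float mip = max(0.0, log2(texelsPerStep));
float density = NoiseTex.SampleLevel(NoiseSampler, CurPos * TileScale, mip).r;
```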

An alternative approach is to evaluate a noise function in the shader instead of reading a texture. This can be quite fast, and the cost of the math instructions can often be hidden behind the shader's other texture fetches anyway.
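For example, a hash-based 3D value noise takes only a handful of instructions. This is a sketch, not a drop-in for your material; `hash3` and `valueNoise3D` are illustrative names, and the hash constants follow a commonly used pattern:

```hlsl
// Cheap hash from a 3D lattice point to a pseudo-random value in [0,1).
float hash3(float3 p)
{
    p = frac(p * 0.3183099 + 0.1);
    p *= 17.0;
    return frac(p.x * p.y * p.z * (p.x + p.y + p.z));
}

// Trilinearly interpolated value noise over the integer lattice.
float valueNoise3D(float3 x)
{
    float3 i = floor(x);
    float3 f = frac(x);
    f = f * f * (3.0 - 2.0 * f); // smoothstep fade curve

    return lerp(
        lerp(lerp(hash3(i + float3(0,0,0)), hash3(i + float3(1,0,0)), f.x),
             lerp(hash3(i + float3(0,1,0)), hash3(i + float3(1,1,0)), f.x), f.y),
        lerp(lerp(hash3(i + float3(0,0,1)), hash3(i + float3(1,0,1)), f.x),
             lerp(hash3(i + float3(0,1,1)), hash3(i + float3(1,1,1)), f.x), f.y),
        f.z);
}
```

Tiling it to a period is a matter of wrapping the lattice coordinate (e.g. `i = fmod(i, Period)`) before hashing, and several octaves can be summed for fractal detail at whatever cost you can afford per step.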
