
I've heard that fetching a texture is quite an expensive operation, but how expensive? Something like ten times the cost of an arithmetic multiplication?

For instance, there is the 3D look-up table (3D-LUT) technique for processing an image, which requires only a single 3D texture fetch per pixel:

http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter24.html

Even if the conversion could be achieved with only a few matrix and vector products in the shader, can I expect the 3D-LUT to still be useful in terms of performance?

3D-LUT vs. matrix/vector product is just an example. What I want to know is a general way to estimate the overhead of a texture fetch before measuring the actual running time. Or is there something like a 'cheat sheet' for GLSL overhead?
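
To make the comparison concrete, here is a minimal sketch of the two alternatives I have in mind (the uniform names such as uColorLUT and uColorMatrix are made up for illustration, and it is written in old-style GLSL with texture3D and gl_FragColor):

    // Illustrative fragment shader sketch; uniform names are hypothetical.
    uniform sampler2D uImage;       // source image being processed
    uniform sampler3D uColorLUT;    // e.g. a 32x32x32 color cube
    uniform mat3      uColorMatrix; // the same conversion expressed as arithmetic
    uniform vec3      uColorOffset;

    varying vec2 vTexCoord;

    void main()
    {
        vec3 src = texture2D(uImage, vTexCoord).rgb;

        // Alternative A: one dependent 3D texture fetch (the GPU Gems 2 3D-LUT approach).
        vec3 viaLUT  = texture3D(uColorLUT, src).rgb;

        // Alternative B: a 3x3 matrix multiply plus an offset, pure ALU work.
        vec3 viaMath = uColorMatrix * src + uColorOffset;

        gl_FragColor = vec4(viaLUT, 1.0);   // or vec4(viaMath, 1.0)
    }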

  • This will vary per vendor, per card, and per OS. You won't get a single good answer. If you have particular hardware you're working with, you should profile it there. Is there a reason to think that texture fetches are a bottleneck for your app? In my experience, the overhead of setting up textures to be processed dwarfs the overhead of the shaders. Commented Sep 14, 2013 at 15:25
  • I'll say this much right now: GPU Gems 2 is a very old book. Back when it was written, normalization cubemaps were still a legitimate thing on some hardware (yes, sampling from a cube map was quicker than normalizing a vector with arithmetic on some hardware back then). These days instructions are a lot cheaper than memory fetches; the same is true in the CPU world - nobody uses lookup tables for trig functions on modern CPUs anymore (see the sketch after these comments). Whatever the case, it really depends on your use case: if you can substitute enough instructions with one lookup, you might have a win. Commented Sep 14, 2013 at 18:40
  • @user1118321 I don't target specific hardware. More precisely, my target is every device from the major vendors that supports OpenGL 2.0 or higher. Like I said, what I really want to know is a general method to estimate the overhead from the GLSL code. If there's no general answer, as you said, then testing on one specific piece of hardware is of little use, and I would have to test every hardware/driver combination, which is not realistic.
    – slyx
    Commented Sep 15, 2013 at 9:17
  • @AndonM.Coleman I didn't know that. Taking cache misses into account, it seems that dynamic texture fetches could be an extremely bad idea. I have a LUT for finding weight functions for interpolation, but maybe I should rethink it. Thank you.
    – slyx
    Commented Sep 15, 2013 at 10:51
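
The normalization-cubemap trick mentioned in the comments boils down to something like the following hedged sketch (the sampler name uNormCubeMap is made up; on modern hardware the arithmetic version is the one you want):

    // Hypothetical sketch of lookup vs. arithmetic normalization.
    uniform samplerCube uNormCubeMap; // cube map whose texels store normalized directions remapped to [0,1]

    // Old trick: one cube-map fetch replaces the normalization arithmetic.
    vec3 normalizeViaLUT(vec3 v)
    {
        return textureCube(uNormCubeMap, v).rgb * 2.0 - 1.0;
    }

    // Modern approach: plain arithmetic (dot product, inverse square root, multiply).
    vec3 normalizeViaALU(vec3 v)
    {
        return v * inversesqrt(dot(v, v));
    }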

1 Answer


It always depends. One concern with texture fetches is memory bandwidth and, closely related to that, cache efficiency. The caching is optimized for the case where neighboring fragments access neighboring texels. I did some benchmarks, and in such a scenario I did not notice any difference between nearest-neighbor sampling, bilinear filtering, and using texelFetch directly.

In another scenario, where I used a 3D texture as a color lookup, cache efficiency became a major concern, and the image content actually influences the performance significantly. My benchmark scenario was post-processing a 1920x1080 frame: with quite cache-friendly content (a screenshot of MS Office) I measured about 0.35 ms for the operation, while with an image containing random noise the processing time went up to ~4 ms (with bilinear filtering) or ~2 ms (nearest/texelFetch).
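
For illustration, the kind of post-processing shader I am describing looks roughly like this (uniform names are invented, and it assumes GLSL 1.30+ because of texelFetch); the uFrame read is the cache-friendly access, while the uLUT read depends on the image content:

    // Rough sketch only; names are invented.
    #version 130
    uniform sampler2D uFrame;  // the 1920x1080 frame being post-processed
    uniform sampler3D uLUT;    // 3D color lookup table
    out vec4 fragColor;

    void main()
    {
        // Neighboring fragments read neighboring texels of uFrame: cache-friendly.
        vec3 src = texelFetch(uFrame, ivec2(gl_FragCoord.xy), 0).rgb;

        // The 3D fetch address depends on the image content, so noisy input
        // scatters the reads across the LUT and hurts the cache.
        fragColor = texture(uLUT, src);
    }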

Of course, this depends very much on your specific scenario and the hardware, so the only advice I can give is: benchmark/profile it.

  • Actually, I've already tested with my code and I couldn't find any difference, so I think memory bandwidth is not a bottleneck on my GPU, at least. But as you said, the concrete result depends on the device and driver. That's why I want to know a 'general way to estimate', even if it is not exact. Anyway, from your answer I see now that cache misses are a big issue for texture fetching. Thank you.
    – slyx
    Commented Sep 15, 2013 at 10:55
  • @xylosper: The really big thing you have to consider when dealing with memory fetches on GPUs is that their memory has considerably higher latency than a CPU's; they trade latency for bandwidth. GPUs usually hide this by scheduling other shader invocations to run while memory is being fetched, and that works really well for highly parallel data tasks like vertex shading, but when you start fetching texture data from random/dependent locations it can really start to expose the underlying latency. So, as you say, the cache is very important, even more important than on CPUs. Commented Sep 15, 2013 at 13:44
  • Using nearest versus bilinear shouldn't affect speed much, since modern hardware fetches 4 texels regardless, and lerping (if it's not done in hardware) is practically free.
    – geometrian
    Commented Oct 28, 2013 at 4:51
