
What is the best method to copy pixels from one texture to another?

I've found several ways to do this. For instance, there's glCopyImageSubData(), but it is only core since OpenGL 4.3 and my target version is OpenGL 2.1, so I cannot use it. Also, because performance is very important, glGetTexImage() is not an option, since it reads the pixels back into CPU memory. I'm handling video frames as textures, so I have to make copies about 30 to 60 times per second.

The options I have found are:

  1. Create an FBO for the source texture and copy into the destination texture using glCopyTexSubImage2D().
  2. Create FBOs for the source and destination textures and blit between them with glBlitFramebuffer().
  3. Create an FBO for the destination texture and render the source texture into it.

You can ignore the cost of creating the FBOs, because they will be created only once.

Please don't just post something like "it depends; do your own benchmark." I'm not targeting a single GPU. If it does depend, please tell me what it depends on and how.

Furthermore, because it is very difficult to measure the timing of OpenGL calls, what I want is not a quantitative result. I need advice about which methods I should avoid.

If you know a better method to copy textures, please let me know about it too.

Thank you for reading.

  • I think that, given the GL 2.1 limitation (FBOs aren't a core feature there either, but they are widely available as extensions), there are no other good alternatives to the ones you listed. I also think that the performance differences between all three options will be rather small.
    – derhass
    Commented Jun 1, 2014 at 15:43
  • Timing GL calls is actually easy in modern GL. If your driver supports timer queries (most 2.1 implementations will not), you can begin a query in the command queue before your command and end it after your command. Wait a few frames to get the result of that query and you will have a very accurate measure of how long the command took. You cannot use it on older drivers, but if you just want a general idea of the performance differences between the calls on a modern implementation, this could help.
    Commented Jun 1, 2014 at 18:06
  • You can get the result of the timer query immediately if you want, but that will force a pipeline stall. So it depends on what you are using the timer query for in the long run. If you want to do performance measurement with minimal impact on run-time performance, only query the result after it becomes available. If you want to know the cost of a command immediately, go ahead and get the result without doing other work while it becomes available (in other words, force a CPU/GPU sync). In your case, I think the second approach will work fine.
    Commented Jun 1, 2014 at 18:33
  • @derhass Good to know that I've found all the methods! Thank you.
    – slyx
    Commented Jun 2, 2014 at 1:21
  • @AndonM.Coleman I didn't know that. In fact, I'm using Qt, and Qt already provides a wrapper class for that! I'm testing with it now and will post the results later. Thanks.
    – slyx
    Commented Jun 2, 2014 at 1:22
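The comments above describe wrapping the copy in a timer query and reading the result a few frames later, so the CPU never waits on the GPU. One common way to organize that is a small ring of query objects, one slot per frame in flight. The C sketch below shows only the index bookkeeping; the ring size and the function names are hypothetical. In real code each slot would hold a query object from glGenQueries(), bracketed with glBeginQuery(GL_TIME_ELAPSED, ...) and glEndQuery(GL_TIME_ELAPSED) around the copy and read back with glGetQueryObjectui64v(), which needs GL 3.3 or the ARB_timer_query extension rather than plain 2.1.

```c
#include <assert.h>

/* Hypothetical ring size: how many frames a query may stay in flight. */
#define QUERY_RING_SIZE 4u

/* Slot whose timer query is begun and ended around this frame's copy. */
static unsigned write_slot(unsigned long frame)
{
    return (unsigned)(frame % QUERY_RING_SIZE);
}

/* Slot to read back this frame: the oldest outstanding query, issued
 * QUERY_RING_SIZE - 1 frames ago. It is read just before it gets reused
 * on the next frame. Returns -1 while the ring is still filling up. */
static int read_slot(unsigned long frame)
{
    if (frame + 1 < QUERY_RING_SIZE)
        return -1;
    int slot = (int)((frame + 1) % QUERY_RING_SIZE);
    /* Sanity check: this slot was written exactly RING_SIZE - 1 frames ago. */
    assert((unsigned)slot == write_slot(frame - (QUERY_RING_SIZE - 1)));
    return slot;
}
```

If a result is still not ready after that many frames, a larger ring helps; availability can also be checked with glGetQueryObjectiv(..., GL_QUERY_RESULT_AVAILABLE, ...) before reading, which avoids the forced CPU/GPU sync mentioned in the comment.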

2 Answers


Since I didn't know about timer queries, I hadn't thought of benchmarking. Now I can do my own benchmarks. I measured the time taken by 100 operations and repeated that five times. The cost of creating the FBOs is not included.

- S=source texture, D=destination texture, SF=FBO of S, DF=FBO of D
- operation = copying one texture to another
- op/s = number of operations per second (average); larger is better
  1. Create DF and render S into DF using a simple passthrough shader

    • 945.656op/s (105.747ms for 100 operations)
    • 947.293op/s (105.564ms for 100 operations)
    • 949.099op/s (105.363ms for 100 operations)
    • 949.324op/s (105.338ms for 100 operations)
    • 948.215op/s (105.461ms for 100 operations)
  2. Create SF and use glCopyTexSubImage2D() to copy into D

    • 937.263op/s (106.694ms for 100 operations)
    • 940.941op/s (106.277ms for 100 operations)
    • 941.722op/s (106.188ms for 100 operations)
    • 941.145op/s (106.254ms for 100 operations)
    • 940.997op/s (106.270ms for 100 operations)
  3. Create DF and SF and use glBlitFramebuffer()

    • 828.172op/s (120.748ms for 100 operations)
    • 843.612op/s (118.538ms for 100 operations)
    • 845.377op/s (118.290ms for 100 operations)
    • 847.024op/s (118.060ms for 100 operations)
    • 843.303op/s (118.581ms for 100 operations)
  4. Create DF and SF and use glCopyPixels()

    • 525.711op/s (190.219ms for 100 operations)
    • 523.396op/s (191.060ms for 100 operations)
    • 537.605op/s (186.010ms for 100 operations)
    • 538.560op/s (185.680ms for 100 operations)
    • 553.059op/s (180.813ms for 100 operations)
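Each op/s figure is simply the number of copies divided by the measured time: 100 copies in t ms give 100 × 1000 / t op/s. The C sketch below (array and function names are hypothetical, chosen for illustration) recomputes that from the millisecond timings listed above and averages the five runs of each method. Recomputed per-run values can differ from the listed op/s in the third decimal place, presumably because the listed times are rounded.

```c
#include <assert.h>
#include <stddef.h>

/* Times in ms for 100 copies, five runs per method, as listed above. */
static const double passthrough_ms[5] = {105.747, 105.564, 105.363, 105.338, 105.461};
static const double copytexsub_ms[5]  = {106.694, 106.277, 106.188, 106.254, 106.270};
static const double blit_ms[5]        = {120.748, 118.538, 118.290, 118.060, 118.581};
static const double copypixels_ms[5]  = {190.219, 191.060, 186.010, 185.680, 180.813};

/* op/s for one run in which `ops` operations took `elapsed_ms` milliseconds. */
static double ops_per_second(int ops, double elapsed_ms)
{
    assert(elapsed_ms > 0.0);
    return ops * 1000.0 / elapsed_ms;
}

/* Arithmetic mean of the per-run op/s values over `runs` runs. */
static double mean_ops_per_second(int ops, const double *elapsed_ms, size_t runs)
{
    assert(runs > 0);
    double sum = 0.0;
    for (size_t i = 0; i < runs; ++i)
        sum += ops_per_second(ops, elapsed_ms[i]);
    return sum / (double)runs;
}
```

For example, mean_ops_per_second(100, passthrough_ms, 5) comes out at roughly 948 op/s and the same call on copypixels_ms at roughly 536 op/s. The sketch averages per-run op/s; converting the mean time instead gives nearly the same number here, because the runs vary so little.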

Performance comparison

passthrough shader ~ glCopyTexSubImage2D > glBlitFramebuffer >> glCopyPixels

So the simple passthrough shader shows the best performance for copying textures, and glCopyTexSubImage2D is only slightly slower. FBO blitting is fast enough, but slower than both the shader and glCopyTexSubImage2D. glCopyPixels, from which I didn't expect a good result, performed the worst, as I expected.
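To relate these rates to the 30 to 60 copies per second the question needs, they can be inverted: at about 948 op/s (the average of the five passthrough-shader runs) one copy costs roughly 1.05 ms, so 60 copies take about 6% of each second, and even glCopyPixels at about 536 op/s would use roughly 11%. The C sketch below does that arithmetic. The function names are hypothetical, and it assumes the per-copy cost measured here carries over to the asker's texture sizes and GPU, which the benchmark does not report.

```c
#include <assert.h>

/* Average cost of one copy in milliseconds, given a measured op/s rate. */
static double ms_per_copy(double ops_per_second)
{
    assert(ops_per_second > 0.0);
    return 1000.0 / ops_per_second;
}

/* Fraction of each second spent copying when `copies_per_second` copies
 * are needed (e.g. one per video frame) at the given op/s rate. */
static double copy_load(double copies_per_second, double ops_per_second)
{
    assert(ops_per_second > 0.0);
    return copies_per_second / ops_per_second;
}
```

With any of the measured methods the copy load stays well below the frame budget, which is consistent with the comments' point that the differences between the GPU-side methods are small compared with avoiding a CPU readback.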

  • What about copying depth attachments?
    – Tara
    Commented Feb 3, 2016 at 19:26
  • Which hardware did you use?
    Commented Feb 18, 2020 at 15:59

We ultimately went with rendering a quad into the target. When using minimal shaders, lowp precision, etc., the performance difference between the various methods that use the GPU to do the copy is slight, and this approach gives the most flexibility.

However, if you can avoid pure copy operations entirely, for example by turning an operation that mutates one of your copies into one that reads the original, applies the mutation and writes a new copy in a single pass, that will of course be much faster.

  • I'm not sure I understood your answer. In the first paragraph, do you mean that the third option, with a minimal shader, is the most flexible and easiest to use?
    – slyx
    Commented Jun 2, 2014 at 1:19
