64

I've been researching processors and graphics cards, and I discovered that GPUs are way faster than CPUs. I read in one article that a 2-year-old Nvidia GPU outperformed a 3.2 GHz Intel Core i7 processor by 14 times in certain circumstances. If GPUs are that fast, why don't developers use them for every function in a game? Is it possible for GPUs to do anything other than graphics?

11
  • 18
    If you're in a game where you are offloading everything to the GPU, and your CPU is hardly doing anything, then you can get a performance increase by putting some of the load back on the CPU.
    – Tetrad
    Commented Sep 9, 2011 at 19:07
  • 3
    your GPU is maybe better than your CPU, but I don't think your video card is better than your mainboard (and I won't compare the OS to the driver, lol)
    – e-MEE
    Commented Sep 9, 2011 at 21:11
  • 28
    "The GPU is faster than the CPU" is a false myth that many people are led to believe after seeing benchmarks based on problems specifically geared toward GPUs (this class of problems is called "embarrassingly parallel problems"); see my answer to this Super User question: Why are we still using CPUs instead of GPUs?
    – Lie Ryan
    Commented Sep 9, 2011 at 23:14
  • 5
    This is a very nice question and answer to this problem - Why aren't we programming on the GPU?
    – Tomas
    Commented Sep 10, 2011 at 6:54
  • 5
    One benefit is that every computer has a CPU :)
    – Tim Holt
    Commented Sep 10, 2011 at 16:37

10 Answers

53

"I've read that F1 cars are faster than those we drive on the streets... why people don't use F1 cars then?" Well... The answer to this question is simple: F1 cars can't break or turn as fast as most cars do (the slowest car could beat an F1 in that case). The case of GPUs is very similar, they are good at following a straight line of processing, but they are not so good when it comes to choosing different processing paths.

A program executed on the GPU makes sense when it must be executed many times in parallel, for instance when you have to blend all the pixels from Texture A with the pixels from Texture B and put the result in Texture C. On a CPU, this task would be processed as something like this:

for (int i = 0; i < nPixelCount; i++)
     TexC[i] = TexA[i] + TexB[i];

But this is slow when you have to process a lot of pixels, so instead of using the code above, the GPU just uses this one:

     TexC[i] = TexA[i] + TexB[i];

and then it populates all the cores with this program (essentially copying the program to each core), assigning a value of i to each one. This is where the magic of the GPU comes in: it makes all cores execute the program at the same time, performing a lot of operations much faster than the linear CPU program could.
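
As a rough, hedged sketch of what that looks like in practice, here is how the same per-pixel blend might be written as a CUDA kernel (the kernel name and launch configuration are invented for illustration; the one-liner above becomes the kernel body, and the hardware hands a different i to each thread):

    // Each GPU thread computes one pixel; the loop over i disappears
    // because nPixelCount threads run the same body in parallel.
    __global__ void blendKernel(const float* TexA, const float* TexB,
                                float* TexC, int nPixelCount)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;  // this thread's pixel index
        if (i < nPixelCount)                            // guard the excess threads
            TexC[i] = TexA[i] + TexB[i];
    }

    // Host side: launch enough 256-thread blocks to cover every pixel.
    // blendKernel<<<(nPixelCount + 255) / 256, 256>>>(dTexA, dTexB, dTexC, nPixelCount);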

This way of working is fine when you have to process a large number of small inputs in the same way, but it is really bad when you have to make a program that has conditional branching. So now let's see what the CPU does when it comes to a condition check:

  • 1: Execute the program until the first logical operation
  • 2: Evaluate
  • 3: Continue executing from the memory address that results from the comparison (as with a JNZ asm instruction)

For the CPU this is as fast as setting an index, but for the GPU it is a lot more complicated. Because the power of the GPU comes from executing the same instruction at the same time (its cores are SIMD units), the cores must be synchronized to take advantage of the chip architecture. Having to prepare the GPU to deal with branches implies, more or less:

  • 1: Make a version of the program that follows only branch A, populate this code in all cores.
  • 2: Execute the program until the first logical operation
  • 3: Evaluate all elements
  • 4: Continue processing all elements that took branch A; enqueue all elements that chose path B (for which there is no program in the core!). Now all the cores that chose path B will be IDLE!! The worst case is a single core executing while every other core just waits.
  • 5: Once all As are finished processing, activate the branch B version of the program (by copying it from the memory buffers to some small core memory).
  • 6: Execute branch B.
  • 7: If required, blend/merge both results.

This method may vary based on a lot of things (i.e. some very small branches can run without the need for this distinction), but now you can see why branching is an issue. The GPU caches are very small; you can't simply execute a program from VRAM in a linear way. The GPU has to copy small blocks of instructions to the cores to be executed, and if you have enough branches your GPU will spend more time stalled than executing code, which makes no sense when executing a program that only follows one branch, as most programs do, even if running in multiple threads. Compared to the F1 example, this would be like having to open braking parachutes at every corner, then get out of the car and pack them back inside until the next corner you want to turn at or the next red light (most likely the next corner).
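
To make the branch divergence concrete, here is a hedged CUDA sketch (kernel and data names are made up): when threads of the same SIMD group disagree on a condition, the hardware effectively runs the two sides one after the other, idling the threads that did not take the side currently executing.

    __global__ void divergentKernel(const float* in, float* out, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;

        // Threads in the same warp/wavefront that disagree on this test
        // cannot run both paths at once: branch A executes while the
        // branch-B threads idle, then branch B executes while A idles.
        if (in[i] > 0.0f)
            out[i] = sqrtf(in[i]);      // branch A
        else
            out[i] = -2.0f * in[i];     // branch B
    }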

Then of course there is the problem that other architectures are so good at logical operations: far cheaper, more reliable, standardized, better known, more power-efficient, and so on. Newer video cards are hardly compatible with older ones without software emulation; they use different asm instructions even between models from the same manufacturer. And, for the time being, most computer applications do not require this type of parallel architecture, and even when they do need it, they can use it through standard APIs such as OpenCL (as mentioned by eBusiness) or through the graphics APIs. Probably in some decades we will have GPUs that can replace CPUs, but I don't think it will happen any time soon.

I recommend the AMD APP documentation, which explains a lot about their GPU architecture; I also read about NVIDIA's in the CUDA manuals, which helped me a lot in understanding this. I still don't understand some things and I may be mistaken; someone who knows more can probably either confirm or refute my statements, which would be great for us all.

6
  • 6
    weird analogy but it's a good point that the fastest isn't always the fastest.
    – Lie Ryan
    Commented Sep 10, 2011 at 1:18
  • 1
    Thanks! I think it's an interesting topic because it ties many game programming concepts to the way the hardware works, which is somewhat forgotten in the land of today's high-level languages. There are some other things I would like to add, such as the "protected mode" capabilities of CPUs, memory bus speed, etc., but writing the answer already took some time, so I will try to update it later. I hope this clarifies some of the technical drawbacks of executing everything on the GPU. Commented Sep 10, 2011 at 3:14
  • 6
    The analogy would be far better if it were accurate. F1 cars have tremendous braking abilities which allow them to maintain high speed further into a curve instead of starting to brake well in advance. Cornering at high speed is also better thanks to high downforce, although the turning radius probably isn't great for parking lots. Better reasons might include the lack of storage space, rear-view mirror, air conditioning, cruise control, protection from the elements, passenger seats, suspension and ground clearance to handle poor roads, or various other things common in passenger vehicles. Commented Sep 10, 2011 at 16:42
  • 5
    @Pablo Ariel I'm responding to the statement: "F1 cars can't brake or turn as fast as most cars do". You suggest that F1 cars can only accelerate in a straight line, and aren't very good in turns or during deceleration. But F1 cars actually can brake far more quickly than "most cars", and are excellent at high-speed cornering. Commented Sep 10, 2011 at 23:16
  • 5
    The analogy is more accurate if you think of dragsters rather than F1 cars. Commented Jun 27, 2013 at 16:43
32

GPUs are very good at parallel tasks. Which is great... if you're doing parallel tasks.

Games are about the least parallelizable kind of application. Think about the main game loop. The AI (let's assume the player is handled as a special case of the AI) needs to respond to collisions detected by the physics. Therefore, it must run afterwards. Or at the very least, the physics needs to call AI routines within the boundary of the physics system (which is generally not a good idea for many reasons). Graphics can't run until physics has run, because physics is what updates the positions of objects. Of course, AI needs to run before rendering as well, since AI can spawn new objects. Sound needs to run after AI and player controls.

In general, games can thread themselves in very few ways. Graphics can be spun off in a thread; the game loop can shove a bunch of data at the graphics thread and say: render this. It can do some basic interpolation, so that the main game loop doesn't have to be in sync with the graphics. Sound is another thread; the game loop says "play this", and it is played.
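
As a very rough sketch of that hand-off (plain host-side C++; the FrameData type and function names are made up), the game loop publishes a snapshot of render state and the graphics thread consumes it, so the two only meet at the exchange point:

    #include <atomic>
    #include <mutex>
    #include <thread>

    struct FrameData { /* positions, animation state, ... */ };   // hypothetical

    FrameData pendingFrame;            // latest snapshot for the renderer
    std::mutex frameMutex;
    std::atomic<bool> running{true};

    void renderThread() {
        while (running) {
            FrameData local;
            {
                std::lock_guard<std::mutex> lock(frameMutex);
                local = pendingFrame;  // grab the most recent snapshot
            }
            // draw 'local'; interpolation lets rendering run out of
            // lockstep with the game loop
        }
    }

    void gameLoop() {
        std::thread renderer(renderThread);
        while (running) {
            // update AI, physics, sound triggers ...
            std::lock_guard<std::mutex> lock(frameMutex);
            pendingFrame = FrameData{ /* snapshot of current state */ };
        }
        renderer.join();
    }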

After that, it all starts to get painful. If you have complex pathing algorithms (such as for RTS's), you can thread those. It may take a few frames for the algorithms to complete, but they'll be concurrent at least. Beyond that, it's pretty hard.

So you're looking at 4 threads: game, graphics, sound, and possibly long-term AI processing. That's not much. And that's not nearly enough for GPUs, which can have literally hundreds of threads in flight at once. That's what gives GPUs their performance: being able to utilize all of those threads at once. And games simply can't do that.

Now, perhaps you might be able to go "wide" for some operations. AIs, for instance, are usually independent of one another. So you could process several dozen AIs at once. Right up until you actually need to make them dependent on each other. Then you're in trouble. Physics objects are similarly independent... unless there's a constraint between them and/or they collide with something. Then they become very dependent.

Plus, there's the fact that the GPU simply doesn't have access to user input, which as I understand is kind of important to games. So that would have to be provided. It also doesn't have direct file access or any real method of talking to the OS; so again, there would have to be some kind of way to provide this. Oh, and all that sound processing? GPUs don't emit sounds. So those have to go back to the CPU and then out to the sound chip.

Oh, and coding for GPUs is terrible. It's hard to get right, and what is "right" for one GPU architecture can be very, very wrong for another. And that's not even just switching from AMD to NVIDIA; that could be switching from a GeForce 250 to a GeForce 450. That's a change in the basic architecture. And it could easily make your code not run well. C++ and even C aren't allowed; the best you get is OpenCL, which is sort of like C but without some of the niceties. Like recursion. That's right: no recursion on GPUs.

Debugging? Oh I hope you don't like your IDE's debugging features, because those certainly won't be available. Even if you're using GDB, kiss that goodbye. You'll have to resort to printf debugging... wait, there's no printf on GPUs. So you'll have to write to memory locations and have your CPU stub program read them back.

That's right: manual debugging. Good luck with that.
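
For what it's worth, here is a hedged CUDA sketch of that "write to memory locations and read them back" style of debugging (the buffer and kernel names are invented; newer toolkits have since gained device-side printf and proper debuggers, but the idea below still works):

    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void kernelUnderTest(const float* in, float* out, float* dbg, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;
        float v = in[i] * 0.5f;   // the computation being debugged
        out[i] = v;
        dbg[i] = v;               // stash the intermediate value for inspection
    }

    int main()
    {
        const int n = 8;
        float hostIn[n] = {1, 2, 3, 4, 5, 6, 7, 8}, hostDbg[n];
        float *dIn, *dOut, *dDbg;
        cudaMalloc(&dIn,  n * sizeof(float));
        cudaMalloc(&dOut, n * sizeof(float));
        cudaMalloc(&dDbg, n * sizeof(float));
        cudaMemcpy(dIn, hostIn, n * sizeof(float), cudaMemcpyHostToDevice);

        kernelUnderTest<<<1, n>>>(dIn, dOut, dDbg, n);

        // The CPU "stub" reads the debug buffer back and prints it.
        cudaMemcpy(hostDbg, dDbg, n * sizeof(float), cudaMemcpyDeviceToHost);
        for (int i = 0; i < n; ++i)
            printf("thread %d wrote %f\n", i, hostDbg[i]);

        cudaFree(dIn); cudaFree(dOut); cudaFree(dDbg);
        return 0;
    }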

Also, those helpful libraries you use in C/C++? Or perhaps you're more of a .NET guy, using XNA and so forth. Or whatever. It doesn't matter, since you can't use any of them on the GPU. You must code everything from scratch. And if you have an already existing codebase, tough: time to rewrite all of that code.

So yeah. It's horrible to actually do for any complex kind of game. And it wouldn't even work, because games just aren't parallel enough for it to help.

1
  • Is it possible for GPU programming to become as accessible as programming for the CPU or are there limitations preventing this? Commented Apr 3, 2021 at 12:11
21

"Why?" is not so easy to answer. It's important to note that GPUs are specialized processors which are not really intended for generalized use like a regular CPU. Because of this specialization, it's not surprising that a GPU can outperform a CPU for the things it was specifically designed (and optimized) for, but that doesn't necessarily mean it can replace the full functionality and performance of a generalized CPU.

I suspect that developers don't do this for a variety of reasons, including:

  • They want the graphics to be as fast and as high-quality as possible, and using valuable GPU resources for other work could interfere with this.

  • GPU-specific code may have to be written, and this will likely introduce additional complexity to the overall programming of the game (or application) at hand.

  • A GPU normally doesn't have access to resources like network cards, keyboards, mice, and joysticks, so it's not possible for it to handle every aspect of the game anyway.

In answer to the second part of your question: Yes, there are other uses. For example, projects like SETI@Home (and probably other BOINC projects) are using GPUs (such as those by nVidia) for high-speed complex calculations:

  Run SETI@home on your NVIDIA GPU
  http://setiathome.berkeley.edu/cuda.php

(I like your question because it poses an interesting idea.)

0
18

CPUs are more flexible: they are generally easier to program, and they can run single threads a lot faster.

While modern GPUs can be programmed to solve pretty much any task they only gain a speed advantage when they can utilize their parallel architecture. This is usually the case with highly repetitive "simple" tasks. A lot of the code we write is branching too unpredictably to run efficiently on a GPU.

On top of all this, you could end up spending a lot of time optimizing the code for different graphics chips. While OpenCL is available to make the same code run across a lot of different graphics chips, you'll trade some of the speed advantage for this luxury.

From a game programmer perspective, we'd generally also want our game to run on computers with lesser graphics cards. Some of the integrated chips don't have the required programmability, but if they do they are so slow that they won't beat the processor by a very big margin, even for the kind of jobs they ought to be good at. And of course if you did tap into a low end GPU for a game you would take dearly needed processing power from the graphics rendering.

Indeed the prospects are great, but when you are making a game rather than cracking passwords the practical issues in most cases outweigh the benefits.

6

GPUs are very difficult to program. Try searching for how to sort a list on a GPU; many theses have explored how to do it.

Using a CPU with one thread is easy, using multiple threads is more difficult, using many computers with a parallel library such as PVM or MPI is hard, and using a GPU is the hardest.

4

Beyond what Randolf Richardson answered, there are certain functions that GPU processors can't handle by themselves. For example, some graphics memory management commands are processed by the CPU since the GPU can't handle them.

And there is one other big reason: the GPU is designed for multithreaded calculation. This means GPU makers can easily add cores whenever they want to increase computational power. But there are many tasks that can't be divided into smaller problems, such as computing the n-th number in the Fibonacci series, where each value depends on the ones before it. In these situations the CPU is much faster, since it is more optimized for single-threaded tasks.
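
As a minimal illustration of that kind of inherently serial task (an iterative Fibonacci in plain C-style code), each step needs the result of the previous ones, so there is nothing to hand out to thousands of GPU threads:

    // n-th Fibonacci number: every iteration depends on the previous two
    // values, so the loop cannot be split across parallel cores.
    unsigned long long fibonacci(int n)
    {
        unsigned long long a = 0, b = 1;   // F(0), F(1)
        for (int i = 0; i < n; ++i) {
            unsigned long long next = a + b;
            a = b;
            b = next;
        }
        return a;                          // F(n)
    }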

4

There are a lot of answers suggesting that GPUs are only faster because they handle tasks in parallel. This is overstating the issue a little. GPUs can be more efficient for other reasons, such as being able to have more restrictive memory access, not having to support as many data types, being able to have a more efficient instruction set, etc. Early GPUs could still only draw 1 pixel at a time, but it was the fact that they could do 1 every cycle that was important.

The real difference is because they are 2 different types of machine that are customised to perform well on different categories of task which seem similar but are actually quite different. It's like comparing an aeroplane to a car. The aeroplane has a much higher top speed but has more restrictions on how it can be used. On the occasions where you can make the same journey with either kind, the aeroplane seems superior.

3
  • The analogy about the aeroplane is a very good one (+1), but with regard to CPUs supporting different data types that's actually more of a higher-level-language concept as CPUs (at least in the Intel space) tend to just deal with data in very basic forms (e.g., bits, bytes, words, dwords, etc.). There are some tight-loop instructions to scan or copy data that is terminated with a zero byte, but the data in these instances isn't really recognized by the CPU as being a particular type (other than being a zero-terminated chunk of data in the context of these loops). Commented Sep 10, 2011 at 16:25
  • @Randolf: CPUs have different instructions and registers that deal with different low level data types (e.g. signed vs. unsigned, integral vs. floating point). This is the case on 8086 and indeed most modern architectures, and it doesn't come entirely for free.
    – Kylotan
    Commented Sep 10, 2011 at 20:35
  • I'm sure they still do a lot of linear processing in the underlying architecture. From the programming side it takes just an instruction to the GPU, but the cores don't execute exactly in parallel because of their dependence on other hardware which is not parallel, such as reading from memory; the GPU can probably provide data to only a single core at a time. Commented Sep 12, 2011 at 17:37
3

Developers do use GPUs for all the functions they're good at. They use CPUs for all the functions they're good at. What makes you think they don't?

GPUs are good at tasks that can be massively parallelized and require massive amounts of computation with either low memory requirements or high temporal correlation with only small amounts of decision making. This includes rendering images, physics simulations (particles, collision, cloth, water, reflection), and so on. So this is precisely what modern games use the GPU for.

CPUs are good at tasks that do not parallelize well and require massive amounts of decision making. They can tolerate high memory requirements even with only moderate temporal correlation. This includes artificial intelligence, user interface, disk and network I/O, and so on. So this is precisely what modern games use the CPU for.

1

Readback is another reason I can think of to occasionally prefer the CPU. Not in terms of bandwidth (as GPU->CPU bandwidth is not so much an issue on modern hardware) but in terms of stalling the pipeline. If you need to fetch back results from a computation and do something interesting or useful with them, using the GPU is not a wise choice (in the general case - there will be special cases where it can remain appropriate) as reading back will always require the GPU to stop whatever it is doing, flush all pending commands, and wait for the readback to complete. This can kill performance to the extent that it not only wipes out the benefit of using the GPU, but may actually be considerably slower.
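
As a hedged CUDA sketch of that stall (the kernel and buffer names are illustrative), the launch itself is asynchronous, but the copy back to the CPU has to wait for everything queued before it:

    #include <cuda_runtime.h>

    __global__ void computeKernel(float* data, int n)   // stand-in for real GPU work
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] *= 2.0f;
    }

    void updateWithReadback(float* dData, float* hostData, int n)
    {
        // The launch is asynchronous: the CPU queues the work and moves on.
        computeKernel<<<(n + 255) / 256, 256>>>(dData, n);

        // The readback is where the stall happens: this call waits for the
        // kernel (and everything queued before it) to finish, then blocks
        // the CPU until the copy completes.
        cudaMemcpy(hostData, dData, n * sizeof(float), cudaMemcpyDeviceToHost);

        // Only now can gameplay code act on hostData.
    }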

0

This is an old thread, but a recently published paper may answer this question. The paper, published in ACM Computing Surveys in 2015, shows that CPUs and GPUs each have their own unique advantages and hence makes a case for moving away from the "CPU vs GPU debate" toward a "CPU-GPU collaborative computing" paradigm.

A Survey of CPU-GPU Heterogeneous Computing Techniques

