
Questions tagged [gpgpu]

For questions related to the usage of graphics processing units for computation outside of the traditional graphics pipeline but still somewhat related to computer graphics.

1 vote
2 answers
302 views

Has general purpose GPU computing been used before compute shaders were available?

Today we have tools such as Nvidia's CUDA and OpenCL to perform general purpose computing on the GPU (GPGPU). Seeing that traditional shaders are specifically used for generating graphics by filling a ...
Entangled Superposition
1 vote
0 answers
133 views

GL_OUT_OF_MEMORY Error when glDispatchCompute takes longer

I built a simple ray tracer that makes use of OpenGL compute shaders and traces ".obj" files. The results are passed to the host program via glMapBufferRange after the computation finishes. ...
herrmutig
0 votes
0 answers
145 views

How to achieve higher FPS for my GPU pathtracer?

So I've written a GPU-based pathtracer using OpenCL-GL interoperability. The system uses the mega-kernel approach instead of a wavefront one, as I intended it as educational software for other ...
gallickgunner
1 vote
0 answers
59 views

Why does the cache working set per multiprocessor for texture memory on Nvidia GPUs have a variable size?

I saw it here: https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#features-and-technical-specifications__technical-specifications-per-compute-capability . I don't know if it also happens ...
alvaro9650
1 vote
1 answer
662 views

Is `groupshared` memory stored in L2 cache of GPU?

The article says that the L1 cache is shared by work items in the same work group (a.k.a. SM) and the L2 cache is shared by different work groups. In Direct3D, it seems that a thread group (which is specified ...
Cu2S
  • 167
1 vote
0 answers
70 views

Why don't discretization errors occur with compute-shaded kernel filters?

An efficient compute-shaded image filter would be emitted with (screenX / [kernel width], screenY / [kernel height], 1) groups and one kernel in each group, allowing texels to pass into groupshared ...
Paul Ferris
1 vote
0 answers
191 views

Fastest way to bucket triangles into a grid?

What is the fastest known method for bucketing triangles into an unbounded regular 3D grid? Specifically, I need an array of buckets. Random queries (which bucket is here) are not necessary, as this ...
Taylor
  • 151
0 votes
2 answers
4k views

Is it possible to emulate Vulkan on a non-Vulkan-compatible GPU?

I don't think there is much to explain here, since the question is pretty much in the title, but I'll try to explain myself better: my current laptop's GPU does not support Vulkan, so I was wondering ...
4 votes
1 answer
582 views

Using multiply and accumulate of 4x4 matrices for ray-triangle intersection tests on GPU

Is it possible to gain a performance boost using the new 4x4 MAD from NVIDIA's tensor cores for ray-triangle intersection tests? Really there are two questions: Is it possible to modify some of the ray-...
Tomilov Anatoliy
3 votes
1 answer
120 views

How to avoid slowdown with 25-30 students running simple GPU kernels on four GeForce GTX 650 Ti cards?

So I'm teaching a crash course in CUDA that teaches students how to write good GPU code (CUDA 7.5 in this case). The kernels they will be running will do matrix multiply on 2048x2048 floating point ...
lil' wing
2 votes
0 answers
293 views

Using GPU instead of CPU in Scala

I wrote a program that displays points expressed in 3D in a 2D canvas, using perspective projection. The aim is to display a cube. Each face of the cube is drawn by linearly interpolating the points ...
JarsOfJam-Scheduler
0 votes
1 answer
333 views

Process of a compute shader in OpenGL

I'm curious about compute shaders in OpenGL. Let's assume the number of points (vec4) is 900 and the work group size (= the number of work items) is 256. Then we would have four work groups because ...
shashack
  • 523
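
The work group count mentioned in the excerpt above follows from a ceiling division. A minimal sketch of that arithmetic, assuming a one-dimensional dispatch; the function name `num_work_groups` is hypothetical and not taken from the question:

```python
def num_work_groups(num_items: int, local_size: int) -> int:
    """Smallest number of work groups whose combined invocations cover num_items."""
    # Ceiling division using integers only.
    return (num_items + local_size - 1) // local_size


points = 900        # number of vec4 points, from the excerpt
local_size = 256    # work items per work group, from the excerpt

groups = num_work_groups(points, local_size)
# Invocations in the last group that fall past the end of the data.
surplus = groups * local_size - points

print(groups, surplus)
```

Because the last group is only partly filled, a shader dispatched this way would typically need a bounds check on its global invocation index so the surplus invocations do no work.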
0 votes
0 answers
233 views

C++/OpenGL program crashes after return of glDispatchCompute function (TDR related?)

I'm doing some parallel computing using my GPU. When I first used glDispatchCompute, the program crashed after a few seconds, which I found out was due to TDR. I deactivated TDR using the registry key ...
Mario
  • 109
3 votes
0 answers
215 views

What aspects of GPU architecture are computer graphics programmers expected to be intimately familiar with? [closed]

I am an aspiring CG programmer and would like to know what some of the more nuanced aspects of computer architecture are. I have already taken several introductory arch courses where we've covered ...
RomanLarionov
2 votes
0 answers
45 views

Optimization Strategies for FFT sound transformations using GPGPU

I want to run audio FFT transformations on a GPU using, possibly, OpenCL. What are the best optimization strategies for: converting audio signals to FFT; transfer them to the graphics card; compute ...
tmm88
  • 21
