
Questions tagged [gpu]

The tag has no usage guidance.

0 votes
1 answer
27 views

How do I decide the specifications for a compute cluster that I need for a fine-tuning task in Azure ML?

I was using a Standard_NC48ads_A100_v4 compute cluster with 2 nodes to fine-tune a Phi-3-mini-128k-instruct model. The training data was around 25 MB, and I set the training batch size to 10. ...
S R • 11
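As a rule of thumb, the cluster spec here is driven by the model's parameter count rather than by a 25 MB dataset. For full fine-tuning with Adam in mixed precision, a common estimate is roughly 16 bytes per parameter (fp16 weights and gradients plus fp32 optimizer state and master weights), before activations; taking Phi-3-mini at about 3.8B parameters:

    3.8 \times 10^{9}\ \text{params} \times 16\ \text{bytes/param} \approx 61\ \text{GB per model replica, excluding activations}

That is already close to the ~80 GB of a single A100 on this SKU once activations at batch size 10 and long context are added, which is why parameter-efficient methods (e.g. LoRA) or optimizer-state sharding across the GPUs (e.g. DeepSpeed/ZeRO) are usually what make the spec fit; the node count then follows from the throughput you want, not from the data size.
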
3 votes
0 answers
136 views

How to package and distribute a TensorFlow GPU desktop application

I am developing a desktop application that utilises TensorFlow. The aim of the application is to let users easily train a given model and use it for inference within the app. I want to support ...
turnip • 1,677
3 votes
3 answers
830 views

How does cuRAND use a GPU to accelerate random number generation? Don't those require a state?

My understanding is that every PRNG or QRNG requires a state to prevent the next item in its sequence from being too predictable, which is sensible, as they're all running on deterministic hardware. ...
Michael Macha
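The short answer for cuRAND's device API is that each thread keeps its own generator state: curand_init seeds every thread into a different subsequence of the same generator, so no single sequential state has to be shared across the GPU. A minimal sketch, assuming the CUDA toolkit's curand_kernel.h device API; kernel and variable names are illustrative:

    #include <cstdio>
    #include <curand_kernel.h>

    // Each thread owns a private generator state; there is no shared, sequential
    // state on the GPU. curand_init(seed, subsequence, offset, &state) puts
    // thread tid on its own statistically independent subsequence.
    __global__ void fill_uniform(float *out, int n, unsigned long long seed)
    {
        int tid = blockIdx.x * blockDim.x + threadIdx.x;
        if (tid >= n) return;

        curandState state;                     // per-thread state
        curand_init(seed, tid, 0, &state);
        out[tid] = curand_uniform(&state);     // next value from this thread's stream
    }

    int main()
    {
        const int n = 1024;
        float *d_out, h_out[4];
        cudaMalloc(&d_out, n * sizeof(float));
        fill_uniform<<<(n + 255) / 256, 256>>>(d_out, n, 1234ULL);
        cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
        printf("%f %f %f %f\n", h_out[0], h_out[1], h_out[2], h_out[3]);
        cudaFree(d_out);
        return 0;
    }

For heavy use the states are normally initialized once in a setup kernel and kept in global memory, so the relatively expensive curand_init is not repeated on every generation call.
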
2 votes
1 answer
195 views

Docker and GPU-based computations. Feasible? [closed]

Recently I ran across this question and this nvidia-docker project, which is an Nvidia Docker implementation, and it made me wonder where, why, and how this scheme makes sense. I found out some ...
Suncatcher
1 vote
1 answer
163 views

What tools exist to determine the speed-up a GPU will have on an algorithm?

Basically, I am wondering what sort of speed-up I will get by parallelizing an algorithm to work with GPUs. I am wondering if someone has implemented queueing theory/Amdahl's law with a UI or if everyone ...
Robert Baron • 1,132
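Short of a dedicated tool, a quick upper bound comes straight from Amdahl's law: if a fraction p of the runtime can be parallelized and the GPU gives that fraction an effective speed-up of s, the overall speed-up is bounded by

    S(p, s) = \frac{1}{(1 - p) + p/s}

For example, with p = 0.9 and s = 100, S is about 1 / (0.1 + 0.009), roughly 9.2x: the serial 10% dominates long before the GPU's core count does. A realistic estimate also has to account for host-to-device transfer time and whether the kernel is memory- or compute-bound, which is why people usually end up profiling a small prototype rather than trusting a formula alone.
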
6 votes
1 answer
4k views

Definition and usage of "warp" in parallel / GPU programming

I have come across the word "warp" in a few places but haven't seen a thorough definition (there's no Wikipedia page on it either). A brief definition is found here: In the SIMT paradigm, threads ...
Lance • 2,615
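On NVIDIA hardware a warp is the group of 32 threads that the streaming multiprocessor schedules together and that execute each instruction in lockstep (the SIMT model), and CUDA exposes this directly through warp-level primitives. A minimal sketch, assuming CUDA 9+ for __shfl_down_sync; names are illustrative:

    #include <cstdio>

    // Sums 32 values using only warp-level shuffles: the 32 threads of a warp run
    // in lockstep, so they can exchange register values without shared memory.
    __global__ void warp_sum(const float *in, float *out)
    {
        int lane = threadIdx.x;          // 0..31 within the warp
        float v = in[lane];

        // Tree reduction across the warp: offsets 16, 8, 4, 2, 1.
        for (int offset = 16; offset > 0; offset >>= 1)
            v += __shfl_down_sync(0xffffffff, v, offset);

        if (lane == 0) *out = v;         // lane 0 holds the warp's total
    }

    int main()
    {
        float h_in[32], *d_in, *d_out, h_out;
        for (int i = 0; i < 32; ++i) h_in[i] = 1.0f;

        cudaMalloc(&d_in, sizeof(h_in));
        cudaMalloc(&d_out, sizeof(float));
        cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);

        warp_sum<<<1, 32>>>(d_in, d_out);      // one block of exactly one warp
        cudaMemcpy(&h_out, d_out, sizeof(float), cudaMemcpyDeviceToHost);
        printf("sum = %f\n", h_out);           // expect 32.0

        cudaFree(d_in);
        cudaFree(d_out);
        return 0;
    }

Because the 32 lanes advance together, the loop needs no __syncthreads(); divergence within a warp (an if that splits the lanes) is what serializes execution, and is why warp-aware code tries to keep branches uniform across each warp.
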
0 votes
1 answer
431 views

Example of parallel sum algorithm on GPU

I'm trying to imagine how you would go about implementing summation (or reduction?) on a parallel architecture, and I'm having a difficult time. I'm thinking specifically in terms of WebGL arrays of vectors ...
Lance • 2,615
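The usual pattern is a tree reduction: each thread loads one element into shared memory, then at every step half of the remaining threads add in a partner's value, so a block of N threads finishes in log2(N) steps. A minimal CUDA sketch of the idea (in WebGL the same passes would be expressed as successive render-to-texture draws); names are illustrative and blockDim.x is assumed to be a power of two:

    #include <cstdio>

    // Each block reduces blockDim.x elements to one partial sum in shared memory.
    __global__ void block_sum(const float *in, float *partial, int n)
    {
        extern __shared__ float buf[];
        int tid = threadIdx.x;
        int idx = blockIdx.x * blockDim.x + tid;

        buf[tid] = (idx < n) ? in[idx] : 0.0f;     // pad the tail with zeros
        __syncthreads();

        // Tree step: the lower half of the active threads adds in the upper half.
        for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
            if (tid < stride)
                buf[tid] += buf[tid + stride];
            __syncthreads();
        }

        if (tid == 0)
            partial[blockIdx.x] = buf[0];          // one partial sum per block
    }

    int main()
    {
        const int n = 1 << 20, threads = 256, blocks = (n + threads - 1) / threads;
        float *h_in = new float[n], *h_partial = new float[blocks];
        for (int i = 0; i < n; ++i) h_in[i] = 1.0f;          // expected total: n

        float *d_in, *d_partial;
        cudaMalloc(&d_in, n * sizeof(float));
        cudaMalloc(&d_partial, blocks * sizeof(float));
        cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);

        block_sum<<<blocks, threads, threads * sizeof(float)>>>(d_in, d_partial, n);

        // Final pass over the (small) partial array; could also be a second launch.
        cudaMemcpy(h_partial, d_partial, blocks * sizeof(float), cudaMemcpyDeviceToHost);
        double total = 0.0;
        for (int b = 0; b < blocks; ++b) total += h_partial[b];
        printf("sum = %.0f (expected %d)\n", total, n);

        cudaFree(d_in); cudaFree(d_partial);
        delete[] h_in; delete[] h_partial;
        return 0;
    }

The per-block partials are reduced again (by relaunching the kernel over them or on the CPU once they are few), so the whole sum takes a logarithmic number of passes rather than a single sequential loop.
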
4 votes
3 answers
2k views

Parallel processing a Tree on GPU

I have seen a few papers on parallel/GPU processing of trees, but after briefly looking through them I wasn't able to grasp what they did. The closest to a helpful explanation was found in ...
Lance • 2,615
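The approach most such papers boil down to is level-synchronous processing: flatten the tree into arrays, keep a frontier of node indices, and launch one kernel per level with one thread per frontier node. A minimal sketch of that idea on a hand-built seven-node tree, assuming a CSR-style layout (child_start/child_count indexing into a children array); all names and the per-node "work" are illustrative:

    #include <cstdio>

    // Tree flattened into arrays: node i's children are
    // children[child_start[i] .. child_start[i] + child_count[i]).
    __global__ void process_level(const int *frontier, int frontier_size,
                                  const int *child_start, const int *child_count,
                                  const int *children, float *value,
                                  int *next_frontier, int *next_size)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= frontier_size) return;

        int node = frontier[i];
        value[node] *= 2.0f;                                // placeholder per-node work

        // Append this node's children to the next level's frontier.
        int base = atomicAdd(next_size, child_count[node]);
        for (int c = 0; c < child_count[node]; ++c)
            next_frontier[base + c] = children[child_start[node] + c];
    }

    int main()
    {
        // Seven-node tree: root 0 with children 1 and 2; 1 has 3,4; 2 has 5,6.
        int h_child_start[7] = {0, 2, 4, 6, 6, 6, 6};
        int h_child_count[7] = {2, 2, 2, 0, 0, 0, 0};
        int h_children[6]    = {1, 2, 3, 4, 5, 6};
        float h_value[7]     = {1, 1, 1, 1, 1, 1, 1};
        int root = 0;

        int *d_start, *d_count, *d_children, *d_frontier, *d_next, *d_next_size;
        float *d_value;
        cudaMalloc(&d_start, sizeof(h_child_start));
        cudaMalloc(&d_count, sizeof(h_child_count));
        cudaMalloc(&d_children, sizeof(h_children));
        cudaMalloc(&d_value, sizeof(h_value));
        cudaMalloc(&d_frontier, 7 * sizeof(int));
        cudaMalloc(&d_next, 7 * sizeof(int));
        cudaMalloc(&d_next_size, sizeof(int));
        cudaMemcpy(d_start, h_child_start, sizeof(h_child_start), cudaMemcpyHostToDevice);
        cudaMemcpy(d_count, h_child_count, sizeof(h_child_count), cudaMemcpyHostToDevice);
        cudaMemcpy(d_children, h_children, sizeof(h_children), cudaMemcpyHostToDevice);
        cudaMemcpy(d_value, h_value, sizeof(h_value), cudaMemcpyHostToDevice);
        cudaMemcpy(d_frontier, &root, sizeof(int), cudaMemcpyHostToDevice);

        // One launch per tree level; the frontier and its successor swap each time.
        int frontier_size = 1;
        while (frontier_size > 0) {
            int zero = 0;
            cudaMemcpy(d_next_size, &zero, sizeof(int), cudaMemcpyHostToDevice);
            process_level<<<(frontier_size + 255) / 256, 256>>>(
                d_frontier, frontier_size, d_start, d_count, d_children,
                d_value, d_next, d_next_size);
            cudaMemcpy(&frontier_size, d_next_size, sizeof(int), cudaMemcpyDeviceToHost);
            int *tmp = d_frontier; d_frontier = d_next; d_next = tmp;
        }

        cudaMemcpy(h_value, d_value, sizeof(h_value), cudaMemcpyDeviceToHost);
        printf("root value after traversal: %f\n", h_value[0]);
        return 0;
    }

One launch per level means the tree's depth, not its size, bounds the number of serial steps; the width of each level is what the GPU parallelizes over, which is why this works well for wide, shallow trees and poorly for degenerate, list-like ones.
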
0 votes
1 answer
2k views

Feature of a CPU needed to run JavaScript fast

This is more of a computer engineering question, but what feature of a CPU makes it run JavaScript fast? I used to access the internet with an AMD Phenom II with 6 cores, and I could almost have as ...
Dehbop • 169
7 votes
0 answers
194 views

Incorporating existing 2D OpenCL/OpenGL application in 3D scene

There is an existing real-time, scientific visualization application that uses OpenCL and OpenGL to render complex 2D graphs. My goal is to incorporate this application into a 3D rendered scene. At ...
Liam Kelly
2 votes
1 answer
202 views

Large Scale Machine Learning vs Traditional HPC Hardware

I've spent the last few days working with TensorFlow for the first time as part of a natural language processing assignment for my degree. It's been interesting (fun isn't the right word) trying to ...
HJCee • 165
43 votes
7 answers
10k views

In software programming, would it be possible to have both CPU and GPU loads at 100%?

This is a general question on a subject I've found interesting as a gamer: CPU/GPU bottlenecks and programming. If I'm not mistaken, I've come to understand that both CPU and GPU calculate stuff, but ...
Azami • 549
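It is possible, because kernel launches are asynchronous with respect to the host: the CPU gets control back immediately after the launch and can keep doing its own work until it explicitly synchronizes. A minimal CUDA sketch; the two busy-work loops are purely illustrative:

    #include <cstdio>

    // Long-running GPU busy-work: every thread grinds through arithmetic.
    __global__ void gpu_work(float *out, int iters)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        float x = i * 0.001f;
        for (int k = 0; k < iters; ++k)
            x = x * 1.0000001f + 0.5f;
        out[i] = x;
    }

    // CPU busy-work the host runs while the kernel is still executing.
    double cpu_work(long iters)
    {
        double acc = 0.0;
        for (long k = 0; k < iters; ++k)
            acc += 1.0 / (k + 1.0);
        return acc;
    }

    int main()
    {
        const int n = 1 << 20;
        float *d_out;
        cudaMalloc(&d_out, n * sizeof(float));

        // The launch returns immediately; the GPU crunches in the background...
        gpu_work<<<n / 256, 256>>>(d_out, 100000);

        // ...while the CPU does its own heavy lifting on this thread (or several).
        double acc = cpu_work(200000000L);

        // Only now block until the GPU is done.
        cudaDeviceSynchronize();
        printf("cpu result %f, gpu finished\n", acc);
        cudaFree(d_out);
        return 0;
    }

Whether both actually sit at 100% in practice then comes down to keeping the GPU fed (enough work per launch, transfers overlapped via streams) while the CPU threads have independent work of their own, which is exactly the balancing act game engines play.
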
1 vote
1 answer
443 views

Does this workload fit GPUs?

I have a perfectly parallel function that would run great on a machine with 1024 cores and 4GB RAM. There's quite a lot of branching (doing set union and traversing structs). There is no communication ...
Drathier • 2,863
8 votes
1 answer
3k views

Is hardware-accelerated GUI data kept on the GPU?

I am doing some research into how most hardware-accelerated GUI libraries work. I actually only care about their rendering backends here. I am trying to figure out what would be the best way ...
Gerharddc • 191
-1 votes
1 answer
336 views

Linking kernel voids without a CPU pass (compute shaders)

Is it possible to pass data between compute shader kernels (voids) without having to create a new buffer and a CPU link (using Unity with a C# interface)? For example, I have a kernel with position data on a set ...
Jamie Nicholl-Shelley
