I spent a couple of months at the beginning of this year learning about GPU programming by trying to optimize inference for Cheng Chi’s awesome Diffusion Policy paper. I was able to improve inference time for the convolutional U-Net by ~3.4x over PyTorch eager mode and ~2.65x over PyTorch compile mode! For anyone interested in GPU optimizations for deep learning, I wrote a 9-part blog post that builds up from the physical structure of DRAM/SRAM cells all the way to integrating custom CUDA kernels in PyTorch: https://lnkd.in/dBMSqh4g I also have a Twitter thread of the most interesting tidbits here: https://lnkd.in/db_hEqbD

This video (requires audio) is unrelated to the Diffusion inference stuff but imo, more amusing… I was able to get my Nvidia RTX 3090’s inductor coils to play ‘Twinkle Twinkle Little Star’ using kernels (GPU programs) that modulate power draw at the right frequencies!

What’s happening here is that each kernel launch triggers a surge of in-rush current in the GPU’s inductor coils. The Lorentz force due to the changing current (proportional to the change in current divided by the change in time, i.e. dI/dt) causes the coil to move slightly. If we play with the kernel launch frequencies, we can vibrate the coils and produce noises in the audible range. Unfortunately we can’t make sounds lower than 2000 Hz: at lower launch frequencies the ‘change in time’ part of the equation becomes too large, so dI/dt shrinks and the resulting vibration is too weak to make audible noise. So we end up with Twinkle Twinkle shifted up many octaves 😀
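For the curious, here’s a minimal sketch (not my actual code; all names are made up for illustration) of the scheduling idea: each note’s pitch becomes a kernel-launch frequency, and since the coils only sing above ~2000 Hz, each note is raised by whole octaves until it clears that floor. One kernel launch per period means one in-rush current surge per period.

```python
# Hypothetical sketch: turn the opening of 'Twinkle Twinkle Little Star'
# into a kernel-launch schedule. NOTE_HZ, MELODY, and the function names
# are illustrative, not from the real implementation.

AUDIBLE_FLOOR_HZ = 2000.0

# Equal-temperament frequencies (Hz) for the notes we need, around middle C.
NOTE_HZ = {"C4": 261.63, "G4": 392.00, "A4": 440.00}

# Opening phrase: C C G G A A G, quarter notes except the final half note.
MELODY = [("C4", 0.25), ("C4", 0.25), ("G4", 0.25), ("G4", 0.25),
          ("A4", 0.25), ("A4", 0.25), ("G4", 0.5)]

def shift_above_floor(freq_hz, floor_hz=AUDIBLE_FLOOR_HZ):
    """Raise the pitch by octaves (doubling) until it clears the floor."""
    while freq_hz < floor_hz:
        freq_hz *= 2.0
    return freq_hz

def launch_schedule(melody):
    """Return (launch_freq_hz, launch_period_s, beats) for each note.
    Launching one busy kernel every `launch_period_s` seconds would
    modulate the power draw (and the coil vibration) at `launch_freq_hz`."""
    schedule = []
    for note, beats in melody:
        f = shift_above_floor(NOTE_HZ[note])
        schedule.append((f, 1.0 / f, beats))
    return schedule

for f, period, beats in launch_schedule(MELODY):
    print(f"{f:8.2f} Hz -> launch every {period * 1e6:7.1f} us for {beats} beats")
```

On the GPU side, a real driver loop would busy-wait between launches of a short dummy kernel so that the launch cadence (and hence the current surges) tracks these periods.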
This is written in great detail! Kudos!
Holy shit man this is incredible - I thought it’d be a quick read but this is a LOT!
That’s amazing!
Thanks for sharing
LOL good stuff!
Epic
Unbelievable Stuff...🖤
This is awesome!
Great work
This is true engineering stuff right here. How do you make a VC-backable start up with this knowledge