
So I do research in machine learning and use a remote GPU server for my work. What I usually do when I come to work is access the server with an SSH client and run my alias hi command which is:

alias hi='conda activate userconda; export CUDA_VISIBLE_DEVICES=1; alias hi'

The server is usually shared among two to three people and has two GPUs, with IDs 0 and 1.

What I'm wondering is, would there be some kind of way to automatically determine which GPU ID to assign to the environment variable CUDA_VISIBLE_DEVICES based on which GPU isn't being used? Right now my alias is hard coded to be CUDA_VISIBLE_DEVICES=1, but it would be more convenient if the program could do that automatically.

I was thinking maybe there could be a way to use the output of nvidia-smi, but I'm not sure if that would be the right approach.

Thanks!

1 Answer


nvidia-smi reports the bus ID of each active video card:

nvidia-smi 
Sat Apr 11 04:02:56 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.64.00    Driver Version: 440.64.00    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 2070    On   | 00000000:01:00.0  On |                  N/A |
| 45%   33C    P8    11W / 175W |    252MiB /  7974MiB |      4%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1089      G   /usr/lib/xorg/Xorg                           106MiB |
|    0      1355      G   /usr/bin/kwin_x11                             79MiB |
|    0      1361      G   /usr/bin/krunner                               2MiB |
|    0      1364      G   /usr/bin/plasmashell                          56MiB |
|    0      5152      G   /usr/lib/firefox/firefox                       2MiB |
+-----------------------------------------------------------------------------+

Here, 00000000:01:00.0 is the device's bus ID. You can run nvidia-smi and grep for the bus ID of the busy card, then assign the other device, whose bus ID you can also get with lspci.
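A simpler variant of the same idea, assuming your driver's nvidia-smi supports the --query-gpu flag: instead of grepping the human-readable table, ask nvidia-smi directly for each GPU's index and memory usage, and pick the least-loaded one. The pick_gpu helper below is a hypothetical name, not part of any standard tool:

```shell
# Sketch: pick the GPU index with the least memory currently in use.
# Assumes `nvidia-smi --query-gpu` is available (present in modern drivers).
pick_gpu() {
  nvidia-smi --query-gpu=index,memory.used \
             --format=csv,noheader,nounits |
    sort -t, -k2 -n |   # sort rows by memory used, ascending
    head -n1 |          # take the least-loaded GPU's row
    cut -d, -f1         # keep only the GPU index
}

export CUDA_VISIBLE_DEVICES="$(pick_gpu)"
```

If this works on your setup, you could fold it into your alias, e.g. alias hi='conda activate userconda; export CUDA_VISIBLE_DEVICES="$(pick_gpu)"'.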
