Azure VM Loaded runtime CuDNN library: 8.2.4 but source was compiled with: 8.6.0

Question

I tried to fit a Keras model in a notebook on a Microsoft Azure Machine Learning Studio GPU machine and received an error similar to the one described here:

2023-04-27 09:56:21.098249: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:417] Loaded runtime CuDNN library: 8.2.4 but source was compiled with: 8.6.0.  CuDNN library needs to have matching major version and equal or higher minor version. If using a binary install, upgrade your CuDNN library.  If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration.
2023-04-27 09:56:21.099011: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at pooling_ops_common.cc:412 : UNIMPLEMENTED: DNN library is not found.
2023-04-27 09:56:21.099050: I tensorflow/core/common_runtime/executor.cc:1197] [/job:localhost/replica:0/task:0/device:GPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): UNIMPLEMENTED: DNN library is not found.
     [[{{node model_2/max_pooling1d_6/MaxPool}}]]
2023-04-27 09:56:21.100704: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:417] Loaded runtime CuDNN library: 8.2.4 but source was compiled with: 8.6.0.  CuDNN library needs to have matching major version and equal or higher minor version. If using a binary install, upgrade your CuDNN library.  If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration.
2023-04-27 09:56:21.101366: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at pooling_ops_common.cc:412 : UNIMPLEMENTED: DNN library is not found.
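The rule quoted in the log (matching major version, equal or higher minor version) can be written as a small check. This is a minimal Python sketch. `parse_version` and `cudnn_is_compatible` are hypothetical helper names, and the function encodes only the rule as worded in the error message, not every check TensorFlow performs.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as "8.2.4" into (8, 2, 4)."""
    return tuple(int(part) for part in text.strip().split("."))


def cudnn_is_compatible(runtime: str, compiled: str) -> bool:
    """Rule from the TensorFlow log: same major version, runtime minor >= compiled minor.

    Both strings need at least a major.minor part, e.g. "8.2" or "8.2.4".
    """
    rt_major, rt_minor = parse_version(runtime)[:2]
    cp_major, cp_minor = parse_version(compiled)[:2]
    return rt_major == cp_major and rt_minor >= cp_minor


if __name__ == "__main__":
    # Versions from the log: runtime 8.2.4, TensorFlow compiled against 8.6.0.
    print(cudnn_is_compatible("8.2.4", "8.6.0"))  # False, because minor 2 < 6
    print(cudnn_is_compatible("8.6.0", "8.6.0"))  # True
```

By the same rule, a later 8.x runtime (for example 8.9) also satisfies a build compiled against 8.6.0, whereas a runtime with a different major version does not.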

What is the fix for this on Azure machines?

ladams · Accepted Answer · 2023-05-17 16:13:46Z

This was a royal pain in the arse to fix. I don't know why Microsoft haven't bumped the cuDNN version from 8.2, and the included conda environment with tensorflow doesn't work.

Essentially, we need to manually install either an older version of tensorflow or a newer version of cuDNN. As no version of tensorflow is compatible with cuDNN 8.2, we are forced to upgrade cuDNN.

The solution that works is as follows:

At the time of writing, you want cuDNN version 8.6 (for TF 2.12.x). Get the cuDNN download link from here on your client computer, but stop the download so you can copy a link that includes an auth key.

Paste the link into the export URL line below.
Copy and paste the whole script into a terminal on your running compute instance.
Wait 5 minutes ☕️

export URL="PASTE-LINK-HERE"
# ==== DOWNLOAD CUDNN ====
curl -L "$URL" -o ./cudnn-linux-x86_64-8.6.0.163_cuda11-archive.tar.xz
tar -xvf ./cudnn-linux-x86_64-8.6.0.163_cuda11-archive.tar.xz
# ==== INSTALL CUDNN ====
sudo cp ./cudnn-*-archive/include/cudnn*.h /usr/local/cuda/include 
sudo cp -P ./cudnn-*-archive/lib/libcudnn* /usr/local/cuda/lib64 
sudo chmod a+r /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn*
# ==== CONFIGURE DYNAMIC RUNTIME BINDINGS ==== 
sudo ldconfig
# ==== INSTALL CONDA ENV ==== 
conda create -n "tfgpu" python=3.10 -y
conda activate tfgpu
conda install -c conda-forge cudatoolkit=11.8.0 ipykernel -y
python3 -m pip install nvidia-cudnn-cu11==8.6.0.163 "tensorflow==2.12.*"
mkdir -p $CONDA_PREFIX/etc/conda/activate.d
echo 'CUDNN_PATH=$(dirname $(python -c "import nvidia.cudnn;print(nvidia.cudnn.__file__)"))' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/:$CUDNN_PATH/lib' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
source $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
python3 -m ipykernel install --user --name tfgpu --display-name "Python (tf-cudnn8.6)"
# ==== VERIFY ==== 
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
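To see which cuDNN the dynamic loader actually picks up after running the script, you can call cuDNN's `cudnnGetVersion()` through `ctypes` and compare it with the installed header. This is a sketch under assumptions: it expects a cuDNN 8.x library named `libcudnn.so.8`, the header at `/usr/local/cuda/include/cudnn_version.h` (where the commands above copy it), and the cuDNN 8 version encoding MAJOR*1000 + MINOR*100 + PATCHLEVEL. The helper names are hypothetical.

```python
import ctypes
import re
from pathlib import Path
from typing import Optional, Tuple

# Assumed locations, matching the install commands above (cuDNN 8.x only).
HEADER = Path("/usr/local/cuda/include/cudnn_version.h")
LIBRARY = "libcudnn.so.8"


def decode_cudnn8_version(code: int) -> Tuple[int, int, int]:
    """cuDNN 8.x reports MAJOR*1000 + MINOR*100 + PATCHLEVEL, e.g. 8204 -> (8, 2, 4)."""
    return code // 1000, (code % 1000) // 100, code % 100


def header_version(text: str) -> Optional[Tuple[int, int, int]]:
    """Read the CUDNN_MAJOR/MINOR/PATCHLEVEL #defines from cudnn_version.h contents."""
    values = {}
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(rf"#define\s+{name}\s+(\d+)", text)
        if match is None:
            return None
        values[name] = int(match.group(1))
    return values["CUDNN_MAJOR"], values["CUDNN_MINOR"], values["CUDNN_PATCHLEVEL"]


def loaded_version(library: str = LIBRARY) -> Tuple[int, int, int]:
    """Let the dynamic loader resolve libcudnn, then ask it for its version."""
    lib = ctypes.CDLL(library)  # raises OSError if the library cannot be found
    lib.cudnnGetVersion.restype = ctypes.c_size_t
    return decode_cudnn8_version(lib.cudnnGetVersion())


if __name__ == "__main__":
    if HEADER.exists():
        print("header :", header_version(HEADER.read_text()))
    try:
        print("loaded :", loaded_version())
    except OSError as err:
        print("could not load", LIBRARY, "->", err)
```

Run it from the same environment TensorFlow uses (for example the "Python (tf-cudnn8.6)" kernel), since LD_LIBRARY_PATH can differ between a terminal and a notebook kernel. If the loaded version still reports 8.2.x, an older libcudnn is most likely being found first on the loader's search path.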

I tested this on the TensorFlow MNIST example.

I hope this helps!

Thanks for answering. Do you think the suggestion below works too? It seems to be simpler... · Gideon Kogan, commented May 17, 2023 at 16:23
I'll give it a go and let you know; I don't expect it to work, as I tried something similar, but I may be wrong. Does it work for you? · ladams, commented May 17, 2023 at 16:29
He said that it worked, but it seems to him that the GPU does not provide the expected performance. Since I expect the GPU to make things faster, I am not sure it all works the same with our method and his. · Gideon Kogan, commented May 17, 2023 at 16:48
So I can confirm your friend's solution does work, and it does look like it hits the GPU. However, you're then stuck with tensorflow 2.4.1 (released in January 2021, which may be why it's less performant), and you will need to do what I did at some point if you ever want to upgrade tensorflow to a more modern version. · ladams, commented May 17, 2023 at 17:00

Gideon Kogan · Accepted Answer · 2023-05-17 20:37:22Z


In any notebook, run:

!conda create -n cuda_env python=3.8 numpy scipy pandas scikit-learn matplotlib jupyter ipykernel cudatoolkit=10.1 -c anaconda -y
!conda run -n cuda_env pip install tensorflow-gpu==2.4.1
!conda run -n cuda_env pip install keras==2.4.3
!conda run -n cuda_env python -m ipykernel install --user --name cuda_env --display-name "Python (CUDA)"

This creates a kernel named "Python (CUDA)" that you can select later.
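Because `!` commands run in a shell started by whichever kernel the notebook is currently using, it is worth confirming that, after switching to the new kernel, the notebook really runs the interpreter from `cuda_env`. A minimal sketch, assuming conda's default layout in which a named environment lives in a directory named after it (for example `.../envs/cuda_env`); `conda_env_name` is a hypothetical helper.

```python
import os
import sys


def conda_env_name(prefix: str) -> str:
    """Return the last path component of an interpreter prefix.

    Assumes conda's default layout, e.g. '/some/path/envs/cuda_env' -> 'cuda_env'.
    """
    return os.path.basename(os.path.normpath(prefix))


if __name__ == "__main__":
    # Run this in a notebook cell after switching to the "Python (CUDA)" kernel.
    print("interpreter:", sys.executable)
    print("environment:", conda_env_name(sys.prefix))
```

If the printed environment is not `cuda_env`, the notebook is still running on a different kernel.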

edited May 17, 2023 at 20:37

answered May 17, 2023 at 16:22

Gideon Kogan

Tagged: azure, keras, jupyter-notebook, virtual-machine, cudnn
