
How to install CUDA and TensorFlow GPU on Ubuntu 16.04

I remember when, 4 years ago, I was trying to configure CUDA on a laptop with Ubuntu 14.04 and Nvidia Optimus technology - it was quite a tough process. Several times I messed things up so badly that it was easier to reinstall the whole OS and start fresh than to try to undo everything.

Luckily, times have changed and nowadays the process is much easier and better documented. I was positively surprised by how much all the CUDA-related tools have matured over time. So... let's start.

Initial checks

Make a few initial checks to ensure that everything is in place:

GPU device

Execute in a terminal:

lspci | grep -i nvidia

And you should see something similar to this:

01:00.0 VGA compatible controller: NVIDIA Corporation Device 1c20 (rev a1)

Linux version

Run:

uname -m && cat /etc/*release

The first line of the output shows the architecture, which should be:

x86_64

The remaining lines describe your Ubuntu release.

GCC

Run:

gcc --version

This checks whether your system has GCC installed. In my case, the output looked like this:

gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Preparation

Kernel headers

Install kernel headers:

sudo apt-get update
sudo apt-get install -y linux-headers-$(uname -r)
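
As a quick optional check, confirm that the headers matching your running kernel are now installed:

# The package should be listed with status "ii" (installed)
dpkg -l "linux-headers-$(uname -r)"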

Remove old Nvidia packages

First, locate what you have already installed:

dpkg --get-selections | grep cuda | egrep 'install$'

The command above may return nothing, or a list of packages like this:

...
cuda-cufft-dev-7-5			install
cuda-curand-7-5				install
cuda-curand-dev-7-5			install
cuda-cusolver-7-5			install
cuda-cusolver-dev-7-5			install
cuda-cusparse-7-5			install
cuda-cusparse-dev-7-5			install
cuda-driver-dev-7-5			install
cuda-license-7-5			install
...

For each package found, run:

sudo dpkg --remove <package-name>
# Example:
sudo dpkg --remove cuda-curand-7-5
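
If the list is long, here is a sketch of a one-liner that removes every matching package in one go (review the list from the previous command before running it):

# Feed the package names from the query above into dpkg --remove
dpkg --get-selections | grep cuda | egrep 'install$' | awk '{print $1}' | xargs -r sudo dpkg --remove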

If you want to read more about the initial checks and preparations we have just made, you can find the documentation on Nvidia's website here.

Install CUDA

Disable Nouveau driver

First, check whether your system uses it. Run:

lsmod | grep nouveau

If it prints anything, it means that you have to disable the driver. To do so, create the file /etc/modprobe.d/blacklist-nouveau.conf and add the following lines to it:

blacklist nouveau
options nouveau modeset=0
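
If you prefer to create the file straight from the terminal, a tee-based equivalent looks like this:

# Writes the two blacklist lines without opening an editor
sudo tee /etc/modprobe.d/blacklist-nouveau.conf > /dev/null <<'EOF'
blacklist nouveau
options nouveau modeset=0
EOF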

Once the file is in place, regenerate the kernel initramfs:

sudo update-initramfs -u

Check device nodes

Check that the device files /dev/nvidia* exist and have the correct (0666) file permissions. If not, run the following script as root to create them:

#!/bin/bash

/sbin/modprobe nvidia

if [ "$?" -eq 0 ]; then
  # Count the number of NVIDIA controllers found.
  NVDEVS=`lspci | grep -i NVIDIA`
  N3D=`echo "$NVDEVS" | grep "3D controller" | wc -l`
  NVGA=`echo "$NVDEVS" | grep "VGA compatible controller" | wc -l`

  N=`expr $N3D + $NVGA - 1`
  for i in `seq 0 $N`; do
    mknod -m 666 /dev/nvidia$i c 195 $i
  done

  mknod -m 666 /dev/nvidiactl c 195 255

else
  exit 1
fi

/sbin/modprobe nvidia-uvm

if [ "$?" -eq 0 ]; then
  # Find out the major device number used by the nvidia-uvm driver
  D=`grep nvidia-uvm /proc/devices | awk '{print $1}'`

  mknod -m 666 /dev/nvidia-uvm c $D 0
else
  exit 1
fi
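
Save the script under any name you like (nvidia-devices.sh below is just a placeholder) and run it as root:

chmod +x nvidia-devices.sh
sudo ./nvidia-devices.sh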

Then verify that the files are present:

ls -l /dev/nvidia*

You should see something like this:

crw-rw-rw- 1 root root 195,   0 Apr 14 00:29 /dev/nvidia0
crw-rw-rw- 1 root root 195, 255 Apr 14 00:29 /dev/nvidiactl
crw-rw-rw- 1 root root 195, 254 Apr 14 00:29 /dev/nvidia-modeset
crw-rw-rw- 1 root root 241,   0 Apr 14 00:29 /dev/nvidia-uvm

Again, if you are looking for more information, the relevant part is available on Nvidia's page here.

Install CUDA package

To download CUDA, go to developer.nvidia.com/cuda-90-download-archive and get the runfile installer.

I installed version 9.0, but if you wish to install a different version, you can download it from here.

Once the runfile is downloaded, execute:

cd ~/Downloads
sudo sh cuda_9.0.176_384.81_linux.run

Note that the exact name of your runfile may be slightly different.

Then press q (to quit the EULA pager), type accept, and answer n, y, [enter], y, y and [enter] to configure the installation. We decline the bundled graphics driver (n) because we will install a newer driver from a PPA later. My output looked like this:

Logging to /tmp/cuda_install_2986.log
Using more to view the EULA.
End User License Agreement
--------------------------

...


Do you accept the previously read EULA?
accept/decline/quit: accept

Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 384.81?
(y)es/(n)o/(q)uit: n

Install the CUDA 9.0 Toolkit?
(y)es/(n)o/(q)uit: y

Enter Toolkit Location
 [ default is /usr/local/cuda-9.0 ]:

Do you want to install a symbolic link at /usr/local/cuda?
(y)es/(n)o/(q)uit: y

Install the CUDA 9.0 Samples?
(y)es/(n)o/(q)uit: y

Enter CUDA Samples Location
 [ default is /home/ntsoi ]:

Installing the CUDA Toolkit in /usr/local/cuda-9.0 ...
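
As a side note, the runfile installer also supports a non-interactive mode via command-line flags (run it with --help to see the exact options your version accepts); a sketch that mirrors the answers above:

# Silent install of the toolkit and samples, skipping the bundled driver
sudo sh cuda_9.0.176_384.81_linux.run --silent --toolkit --samples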

Install updates for CUDA

From the same page where you downloaded the CUDA runfile, also download all available patch updates. In my case, two had been released. To install them, run:

sudo sh cuda_9.0.176.1_linux.run
sudo sh cuda_9.0.176.2_linux.run

Then, execute ldconfig:

sudo ldconfig

Switch to a text console by pressing Ctrl+Alt+F1, log in and stop your display manager (gdm in my case; on a stock Ubuntu 16.04 it is lightdm, so adjust the service name accordingly):

sudo service gdm stop

Next, add the following exports to /etc/profile.d/cuda.sh:

export PATH=/usr/local/cuda/bin:$PATH 
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
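
The exports take effect on your next login; to pick them up immediately and confirm that the CUDA compiler is reachable, you can run:

source /etc/profile.d/cuda.sh
nvcc --version
# Should report: Cuda compilation tools, release 9.0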

Install the graphics driver

To install the driver, execute:

sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt-get update
sudo apt install nvidia-384

Note: you can check whether a driver version newer than 384 is available and install that instead.

Install CUDA Profile Tools Interface

Install CUPTI with:

sudo apt-get install -y libcupti-dev

And add the following line to /etc/profile.d/cuda.sh:

export LD_LIBRARY_PATH=${LD_LIBRARY_PATH:+${LD_LIBRARY_PATH}:}/usr/local/cuda/extras/CUPTI/lib64

So the file should now contain:

export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH:+${LD_LIBRARY_PATH}:}/usr/local/cuda/extras/CUPTI/lib64
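
The CUPTI path added above is the copy bundled with the toolkit; you can confirm that the library actually exists there:

ls /usr/local/cuda/extras/CUPTI/lib64/libcupti.so*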

Reboot

Finally, reboot your system:

sudo reboot
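
Once the system is back up, a quick sanity check that the driver loaded correctly:

nvidia-smi
# Should list your GPU together with the installed driver version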

Verify the installation

Run a basic test to check that you are able to use CUDA and that your GPU is correctly detected:

cd ~/NVIDIA_CUDA-9.0_Samples/1_Utilities/deviceQuery
make
./deviceQuery

You should see output similar to this:

./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GTX 1060 with Max-Q Design"
  CUDA Driver Version / Runtime Version          9.0 / 9.0
  CUDA Capability Major/Minor version number:    6.1
  Total amount of global memory:                 6073 MBytes (6367739904 bytes)
  (10) Multiprocessors, (128) CUDA Cores/MP:     1280 CUDA Cores
  GPU Max Clock rate:                            1342 MHz (1.34 GHz)
  Memory Clock rate:                             4004 Mhz
  Memory Bus Width:                              192-bit
  L2 Cache Size:                                 1572864 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.0, CUDA Runtime Version = 9.0, NumDevs = 1
Result = PASS
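
Another sample worth running as a second check is bandwidthTest; it lives next to deviceQuery and should also finish with Result = PASS:

cd ~/NVIDIA_CUDA-9.0_Samples/1_Utilities/bandwidthTest
make
./bandwidthTest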

Install cuDNN

Download cuDNN from developer.nvidia.com/cudnn - the runtime, dev and doc deb packages - and install them:

sudo dpkg -i libcudnn7_7.1.2.21-1+cuda9.0_amd64.deb
sudo dpkg -i libcudnn7-dev_7.1.2.21-1+cuda9.0_amd64.deb
sudo dpkg -i libcudnn7-doc_7.1.2.21-1+cuda9.0_amd64.deb

Then verify the installation by running:

cp -r /usr/src/cudnn_samples_v7/ $HOME
cd  $HOME/cudnn_samples_v7/mnistCUDNN
make clean && make
./mnistCUDNN

You should see output along these lines:

cudnnGetVersion() : 7102 , CUDNN_VERSION from cudnn.h : 7102 (7.1.2)
Host compiler version : GCC 5.4.0
There are 1 CUDA capable devices on your machine :
device 0 : sms 10  Capabilities 6.1, SmClock 1341.5 Mhz, MemSize (Mb) 6072, MemClock 4004.0 Mhz, Ecc=0, boardGroupID=0
Using device 0

Testing single precision
Loading image data/one_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm ...
Fastest algorithm is Algo 1
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.027296 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.031744 time requiring 3464 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.043616 time requiring 57600 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.168960 time requiring 2057744 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.235520 time requiring 203008 memory
Resulting weights from Softmax:
0.0000000 0.9999399 0.0000000 0.0000000 0.0000561 0.0000000 0.0000012 0.0000017 0.0000010 0.0000000
Loading image data/three_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000000 0.0000000 0.9999288 0.0000000 0.0000711 0.0000000 0.0000000 0.0000000 0.0000000
Loading image data/five_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 0.9999820 0.0000154 0.0000000 0.0000012 0.0000006

Result of classification: 1 3 5

Test passed!

Testing half precision (math in single precision)
Loading image data/one_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm ...
Fastest algorithm is Algo 1
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.025536 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.029504 time requiring 3464 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.038912 time requiring 28800 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.152576 time requiring 2057744 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.249728 time requiring 203008 memory
Resulting weights from Softmax:
0.0000001 1.0000000 0.0000001 0.0000000 0.0000563 0.0000001 0.0000012 0.0000017 0.0000010 0.0000001
Loading image data/three_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000000 0.0000000 1.0000000 0.0000000 0.0000714 0.0000000 0.0000000 0.0000000 0.0000000
Loading image data/five_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 1.0000000 0.0000154 0.0000000 0.0000012 0.0000006

Result of classification: 1 3 5

Test passed!

Install TensorFlow

Choose whether you want to install TensorFlow inside a virtualenv (which I recommend) or directly in the system.

With virtualenv

To use virtualenv, execute:

sudo apt-get install python-pip python-dev python-virtualenv
mkdir .pyenv
cd .pyenv
virtualenv --system-site-packages ds
source ds/bin/activate
pip install pip -U
pip install --upgrade tensorflow-gpu
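
Assuming you ran the commands above from your home directory, the environment lives in ~/.pyenv/ds; remember to activate it again in every new terminal before using TensorFlow:

source ~/.pyenv/ds/bin/activate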

Or without virtualenv

sudo apt-get install python-pip python-dev
pip install pip -U
pip install --upgrade tensorflow-gpu

Verify installation

Run python in the terminal and then type import tensorflow to check that the package loads correctly.
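
For a slightly stronger check, you can also create a session, which makes TensorFlow print the devices it sees; a sketch using the TensorFlow 1.x API that pip installs here (the startup log should mention a /device:GPU:0 entry for your card):

python -c "import tensorflow as tf; sess = tf.Session(); print(tf.__version__)"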

Summary

That's all. Now you should be able to use TensorFlow with your GPU and train awesome deep neural networks! I hope this article was helpful. If so, do not hesitate to leave a comment and share it on Twitter, Facebook, LinkedIn or whichever social media you prefer.