How to install CUDA and TensorFlow GPU on Ubuntu 16.04
I remember when, 4 years ago, I was trying to configure CUDA on a laptop with Ubuntu 14.04 and Nvidia Optimus technology - it was quite a tough process. Several times I messed things up so badly that it was easier to reinstall the whole OS for a fresh start than to try to undo everything.
Luckily, times have changed and nowadays it's much easier and better documented. I was really positively surprised by how much the CUDA-related tooling has matured over time. So... let's start.
Initial checks
Make a few initial checks to ensure that everything is in place:
GPU device
Execute in a terminal:
lspci | grep -i nvidia
And you should see something similar to this:
01:00.0 VGA compatible controller: NVIDIA Corporation Device 1c20 (rev a1)
Linux version
Run:
uname -m && cat /etc/*release
The expected architecture is x86_64 (the release files should also confirm that you are running Ubuntu 16.04):
x86_64
GCC
Run:
gcc --version
And check if your system has GCC installed. In my case, the output looked like:
gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Preparation
Kernel headers
Install kernel headers:
sudo apt-get update
sudo apt-get install -y linux-headers-$(uname -r)
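To make sure the headers for your running kernel were actually installed, you can run a quick check (not part of the official steps, just a sanity check):
dpkg -l | grep linux-headers-$(uname -r)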
Remove old Nvidia stuff
First, locate what you have already installed:
dpkg --get-selections | grep cuda | egrep 'install$'
The command above may return nothing, or a list of packages like:
...
cuda-cufft-dev-7-5 install
cuda-curand-7-5 install
cuda-curand-dev-7-5 install
cuda-cusolver-7-5 install
cuda-cusolver-dev-7-5 install
cuda-cusparse-7-5 install
cuda-cusparse-dev-7-5 install
cuda-driver-dev-7-5 install
cuda-license-7-5 install
...
For each package found, run:
sudo dpkg --remove <package-name>
# Example:
sudo dpkg --remove cuda-curand-7-5
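Optionally, you can list any previously installed Nvidia driver packages in the same way and remove them too if you want a completely clean start (this is my own extra check, not part of Nvidia's instructions):
dpkg --get-selections | grep -i nvidia | egrep 'install$'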
If you want to read more about the initial checks and preparations we have just made, you can find the relevant documentation on Nvidia's website here.
Install CUDA
Disable Nouveau driver
First, check if your system uses it. Run:
lsmod | grep nouveau
If it prints anything, it means you have to disable the driver. To do so, create the file /etc/modprobe.d/blacklist-nouveau.conf and add the following lines to it:
blacklist nouveau
options nouveau modeset=0
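If you prefer to do it from the terminal, something like this should work (just a sketch of one way to create the file):
sudo tee /etc/modprobe.d/blacklist-nouveau.conf > /dev/null <<'EOF'
blacklist nouveau
options nouveau modeset=0
EOF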
When the file is saved, regenerate the kernel initramfs:
sudo update-initramfs -u
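The change takes effect only after a reboot (we will reboot later in this guide anyway). Afterwards, re-running the check from above should print nothing:
lsmod | grep nouveau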
Check device nodes
Check that the device files /dev/nvidia* exist and have the correct (0666) file permissions. If not, use the following script to fix it:
#!/bin/bash

/sbin/modprobe nvidia

if [ "$?" -eq 0 ]; then
  # Count the number of NVIDIA controllers found.
  NVDEVS=`lspci | grep -i NVIDIA`
  N3D=`echo "$NVDEVS" | grep "3D controller" | wc -l`
  NVGA=`echo "$NVDEVS" | grep "VGA compatible controller" | wc -l`
  N=`expr $N3D + $NVGA - 1`

  # Create a device node for each detected GPU.
  for i in `seq 0 $N`; do
    mknod -m 666 /dev/nvidia$i c 195 $i
  done

  # Create the control device node.
  mknod -m 666 /dev/nvidiactl c 195 255
else
  exit 1
fi

/sbin/modprobe nvidia-uvm

if [ "$?" -eq 0 ]; then
  # Find out the major device number used by the nvidia-uvm driver.
  D=`grep nvidia-uvm /proc/devices | awk '{print $1}'`
  mknod -m 666 /dev/nvidia-uvm c $D 0
else
  exit 1
fi
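To use it, save the script under any name (for example create-nvidia-devices.sh - a name I just made up) and run it as root:
chmod +x create-nvidia-devices.sh
sudo ./create-nvidia-devices.sh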
Verify again that the files are present:
ls -l /dev/nvidia*
You should see something like this:
crw-rw-rw- 1 root root 195, 0 Apr 14 00:29 /dev/nvidia0
crw-rw-rw- 1 root root 195, 255 Apr 14 00:29 /dev/nvidiactl
crw-rw-rw- 1 root root 195, 254 Apr 14 00:29 /dev/nvidia-modeset
crw-rw-rw- 1 root root 241, 0 Apr 14 00:29 /dev/nvidia-uvm
Again, if you are looking for more information, the relevant part is available on Nvidia's page here.
Install CUDA package
To download CUDA, go to developer.nvidia.com/cuda-90-download-archive and get the runfile installer.
I have installed version 9.0, but if you wish to install another one, you can download it from here.
Once the runfile is downloaded, execute:
cd ~/Downloads
sudo sh cuda_9.0.176_384.81_linux.run
Note that the exact name of your runfile may be slightly different.
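Optionally, you can verify the download against the MD5 checksum published on the download page (the exact value depends on which runfile you fetched):
md5sum cuda_9.0.176_384.81_linux.run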
Then press q, accept, n, y, [enter], y, y and [enter] to configure the installation settings. My output looked like:
Logging to /tmp/cuda_install_2986.log
Using more to view the EULA.
End User License Agreement
--------------------------
...
Do you accept the previously read EULA?
accept/decline/quit: accept
Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 384.81?
(y)es/(n)o/(q)uit: n
Install the CUDA 9.0 Toolkit?
(y)es/(n)o/(q)uit: y
Enter Toolkit Location
[ default is /usr/local/cuda-9.0 ]:
Do you want to install a symbolic link at /usr/local/cuda?
(y)es/(n)o/(q)uit: y
Install the CUDA 9.0 Samples?
(y)es/(n)o/(q)uit: y
Enter CUDA Samples Location
[ default is /home/ntsoi ]:
Installing the CUDA Toolkit in /usr/local/cuda-9.0 ...
Install updates for CUDA
From the same location where you found the runfile for CUDA, also download all available patch updates. In my case, two of them had been released. To install them, type:
sudo sh cuda_9.0.176.1_linux.run
sudo sh cuda_9.0.176.2_linux.run
Then, execute ldconfig:
sudo ldconfig
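At this point you can also check which toolkit version (including the patch level) ended up installed. On my machine the runfile installer creates a version.txt file under the toolkit directory; if your layout differs, this file may not exist:
cat /usr/local/cuda/version.txt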
Switch to text mode by pressing Ctrl+Alt+F1, log in and run:
sudo service gdm stop
Next, add the following exports to /etc/profile.d/cuda.sh:
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
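After the next login (or after sourcing the file in the current shell), the CUDA compiler should be reachable, which is an easy way to confirm that the PATH change works:
source /etc/profile.d/cuda.sh
nvcc --version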
Install graphics driver
To install the driver, execute:
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt-get update
sudo apt install nvidia-384
Note: you can check whether a newer driver version than 384 is available.
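Once the driver is installed (and the kernel module is loaded - the reboot that comes a few steps later takes care of that), nvidia-smi should list your GPU:
nvidia-smi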
Install CUDA Profiling Tools Interface (CUPTI)
Install CUPTI with:
sudo apt-get install -y libcupti-dev
And add the following line to /etc/profile.d/cuda.sh:
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH:+${LD_LIBRARY_PATH}:}/usr/local/cuda/extras/CUPTI/lib64
So your file should now contain:
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH:+${LD_LIBRARY_PATH}:}/usr/local/cuda/extras/CUPTI/lib64
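You can also check that the CUPTI libraries are actually present at the location the export points to (assuming the runfile installed them under the toolkit's extras directory, which it does by default):
ls /usr/local/cuda/extras/CUPTI/lib64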
Reboot
At the end, reboot your system:
sudo reboot
Verify the installation
Run a basic test to check that you can use CUDA and that your GPU is correctly detected.
cd ~/NVIDIA_CUDA-9.0_Samples/1_Utilities/deviceQuery
make
./deviceQuery
You should see a similar output:
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "GeForce GTX 1060 with Max-Q Design"
CUDA Driver Version / Runtime Version 9.0 / 9.0
CUDA Capability Major/Minor version number: 6.1
Total amount of global memory: 6073 MBytes (6367739904 bytes)
(10) Multiprocessors, (128) CUDA Cores/MP: 1280 CUDA Cores
GPU Max Clock rate: 1342 MHz (1.34 GHz)
Memory Clock rate: 4004 Mhz
Memory Bus Width: 192-bit
L2 Cache Size: 1572864 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Supports Cooperative Kernel Launch: Yes
Supports MultiDevice Co-op Kernel Launch: Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.0, CUDA Runtime Version = 9.0, NumDevs = 1
Result = PASS
Install cuDNN
Download cuDNN from developer.nvidia.com/cudnn - the runtime, dev and doc deb files. Then install them:
sudo dpkg -i libcudnn7_7.1.2.21-1+cuda9.0_amd64.deb
sudo dpkg -i libcudnn7-dev_7.1.2.21-1+cuda9.0_amd64.deb
sudo dpkg -i libcudnn7-doc_7.1.2.21-1+cuda9.0_amd64.deb
Then, verify the installation by running:
cp -r /usr/src/cudnn_samples_v7/ $HOME
cd $HOME/cudnn_samples_v7/mnistCUDNN
make clean && make
./mnistCUDNN
You should see output similar to this:
cudnnGetVersion() : 7102 , CUDNN_VERSION from cudnn.h : 7102 (7.1.2)
Host compiler version : GCC 5.4.0
There are 1 CUDA capable devices on your machine :
device 0 : sms 10 Capabilities 6.1, SmClock 1341.5 Mhz, MemSize (Mb) 6072, MemClock 4004.0 Mhz, Ecc=0, boardGroupID=0
Using device 0
Testing single precision
Loading image data/one_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm ...
Fastest algorithm is Algo 1
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.027296 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.031744 time requiring 3464 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.043616 time requiring 57600 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.168960 time requiring 2057744 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.235520 time requiring 203008 memory
Resulting weights from Softmax:
0.0000000 0.9999399 0.0000000 0.0000000 0.0000561 0.0000000 0.0000012 0.0000017 0.0000010 0.0000000
Loading image data/three_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000000 0.0000000 0.9999288 0.0000000 0.0000711 0.0000000 0.0000000 0.0000000 0.0000000
Loading image data/five_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 0.9999820 0.0000154 0.0000000 0.0000012 0.0000006
Result of classification: 1 3 5
Test passed!
Testing half precision (math in single precision)
Loading image data/one_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm ...
Fastest algorithm is Algo 1
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.025536 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.029504 time requiring 3464 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.038912 time requiring 28800 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.152576 time requiring 2057744 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.249728 time requiring 203008 memory
Resulting weights from Softmax:
0.0000001 1.0000000 0.0000001 0.0000000 0.0000563 0.0000001 0.0000012 0.0000017 0.0000010 0.0000001
Loading image data/three_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000000 0.0000000 1.0000000 0.0000000 0.0000714 0.0000000 0.0000000 0.0000000 0.0000000
Loading image data/five_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 1.0000000 0.0000154 0.0000000 0.0000012 0.0000006
Result of classification: 1 3 5
Test passed!
Install TensorFlow
Choose whether you want to install TensorFlow inside a virtualenv (I suggest this way) or directly in the system.
With virtualenv
To use virtualenv, execute:
sudo apt-get install python-pip python-dev python-virtualenv
mkdir .pyenv
cd .pyenv
virtualenv --system-site-packages ds
source ds/bin/activate
pip install pip -U
pip install --upgrade tensorflow-gpu
Or without virtualenv
sudo apt-get install python-pip python-dev
pip install pip -U
pip install --upgrade tensorflow-gpu
Verify installation
Run python in the terminal and then type import tensorflow to check that the library loads correctly.
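To additionally confirm that TensorFlow really sees the GPU (this assumes the TensorFlow 1.x release that pip installs at the time of writing - the API differs in later versions), you can run:
python -c "import tensorflow as tf; print(tf.test.is_gpu_available())"
The log lines should mention your GPU and the command should print True.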
Summary
That's all. Now you should be able to use TensorFlow with your GPU and train awesome Deep Neural Networks! I hope this article was helpful. If so, do not hesitate to leave a comment and share it on Twitter, Facebook, LinkedIn or whichever social media you prefer.