BIOS configuration
...Yes, my computer is old enough to still have a BIOS.
My configuration: the monitor is connected to the integrated graphics card (IGC), and the GPU is used only for computations.
However, the BIOS may activate the GPU instead of the IGC.
So, check the BIOS settings (maybe with the GPU unplugged) and find the "video" section.
Make sure that the graphics card parameter is NOT set to AUTO, but points directly to the IGC.
Setup the OS
OK, here comes the trickier part: installing the driver and configuring the system so that everything works.
I'm using GNU/Linux, Debian 10.
As you may know, NVidia drivers are proprietary, not open-source.
However, there is an open-source version which may be activated by default when you install the operating system.
It's called "nouveau".
This version is less performant than the proprietary NVidia driver, so we shall install the proprietary one.
Installing it will temporarily break the video settings, so we'll have to fix them by editing the xorg.conf file.
Step 0: lspci
Before starting, check that your system recognizes the GPU:
$ lspci | grep VGA
00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor Graphics Controller (rev 09)
01:00.0 VGA compatible controller: NVIDIA Corporation GK104 [GeForce GTX 760] (rev a1)
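A GPU that drives no display may be listed by lspci as a "3D controller" instead of a "VGA compatible controller", in which case the grep above would miss it. A minimal sketch of a slightly wider filter follows; the function name list_gpus is my own, and it simply reads lspci output on standard input.

```shell
# list_gpus: keep only the graphics devices from `lspci` output read on stdin.
# Matches both display adapters and compute-only ("3D controller") devices.
list_gpus() {
    grep -E 'VGA compatible controller|3D controller'
}
```

You would use it as: lspci | list_gpus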
Step 1: disable nouveau drivers
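First, we remove any NVidia packages that were installed automatically.
In Debian-based distributions:
$ sudo apt-get remove --purge 'nvidia-*'
$ sudo nvidia-uninstall
The quotes stop the shell from expanding the pattern against files in the current directory. The nvidia-uninstall command exists only if a driver was previously installed from NVidia's ".run" installer, so it is fine if it is not found.
Note that this may be a bit of a problem if your IGC is also from NVidia...
But I guess that's hardly ever the case.
Not sure, though.
The nouveau drivers are die-hard, so we make sure that they are not loaded by blacklisting them.
Go to the directory "/etc/modprobe.d" and take a look.
You may already have a file "blacklist-nouveau.conf" or similar.
If not, we create one (the filename doesn't matter as long as it ends in ".conf"; every such file in this directory is read when modules are loaded):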
$ sudo vim /etc/modprobe.d/blacklist-nouveau.conf
... and add these lines to the file:
blacklist nouveau
blacklist lbm-nouveau
options nouveau modeset=0
alias nouveau off
alias lbm-nouveau off
Now, drivers are loaded early during the boot process, from the initramfs.
So, for the blacklist to take effect, you also need to update the initramfs.
This is done by:
$ sudo update-initramfs -u
This should be enough.
The changes take effect after rebooting.
So, reboot your system (or, if you prefer, use the rmmod command to unload the module right away, which only works while nothing is using it).
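After the reboot it is worth confirming that the blacklist actually worked. A small sketch, assuming the usual lsmod output format with the module name in the first column; the function name nouveau_loaded is my own.

```shell
# nouveau_loaded: succeed (exit 0) if a module called "nouveau" appears in
# `lsmod`-style output read from stdin, fail (exit 1) otherwise.
# Only the first column is checked, so modules that merely list nouveau
# as a user (e.g. "ttm ... nouveau") do not count.
nouveau_loaded() {
    awk '$1 == "nouveau" { found = 1 } END { exit !found }'
}
```

For example: lsmod | nouveau_loaded && echo "nouveau is still loaded"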
Step 2: Install NVidia drivers
First of all, on Debian, make sure that you have the linux-headers installed.
The installer compiles a kernel module against them, so they need to match the kernel you are running.
You can use:
$ sudo apt install linux-headers-amd64
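The metapackage above follows the newest kernel in the repository, which is not necessarily the one currently running. A quick check can be sketched as below, assuming that the headers package provides the /lib/modules/RELEASE/build link, as it does on my Debian system. The function name headers_present and its optional second argument are my own additions.

```shell
# headers_present RELEASE [MODULES_ROOT]: succeed if the kernel headers for
# RELEASE are installed, i.e. MODULES_ROOT/RELEASE/build exists.
# MODULES_ROOT defaults to /lib/modules; the override is there only so the
# check can be tried against a scratch directory.
headers_present() {
    [ -e "${2:-/lib/modules}/$1/build" ]
}
```

For example: headers_present "$(uname -r)" || sudo apt install "linux-headers-$(uname -r)"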
OK, you can now download the NVidia drivers from the website, selecting your own GPU.
Make sure you select the 64-bit version (unless you have a 32-bit machine).
Note that NVidia updates the drivers quite frequently, fixing bugs and adding features.
Download the latest one.
In my case, the file is called "NVIDIA-Linux-x86_64-440.82.run".
Installing it is pretty easy.
Make the file executable:
$ chmod +x ./NVIDIA-Linux-x86_64-440.82.run
The driver has to be installed from outside the graphics system (X).
So, press CTRL-ALT-F1 to leave X and get a text terminal.
To get back to X, you press CTRL-ALT-F7 on many systems.
Note that on some systems, CTRL-ALT-F1 is actually X.
Log in and then kill the X system by:
$ sudo service lightdm stop
# Alternatively, you could have gone to the runlevel 3:
$ sudo init 3
I hope you didn't have anything important running in X.
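If you are not sure which display manager your system runs (lightdm is just what I have), Debian records the default one in /etc/X11/default-display-manager. A sketch that reads it; the function name display_manager is my own, and the file location is an assumption that holds on Debian-style systems.

```shell
# display_manager [FILE]: print the name of the default display manager.
# FILE holds the full path of its executable (e.g. /usr/sbin/lightdm) and
# defaults to Debian's /etc/X11/default-display-manager.
display_manager() {
    basename "$(cat "${1:-/etc/X11/default-display-manager}")"
}
```

Then the stop command becomes: sudo service "$(display_manager)" stop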
We are ready to run the NVidia installer.
Since I want to use the integrated graphics card for the monitor and the GPU only for the computations, I tell the installer to skip the OpenGL files.
Run (as root, which the installer requires):
$ sudo ./NVIDIA-Linux-x86_64-440.82.run --no-opengl-files
This step should run just fine, unless the nouveau driver is still loaded.
You can accept the options that it suggests.
Reboot the system.
If everything went as expected, X is quite likely broken now.
Keep reading.
Step 3: set the xorg.conf
At this point, when the system boots, it will probably not show the login screen but a black screen with a blinking cursor.
This is probably because the NVidia installer set up the xorg.conf file, which tells the graphics system X how to configure the outputs.
So, we need to fix this file.
Luckily, it's quite simple, and the file is quite intuitive.
First, press CTRL-ALT-F1 again (or another F-key) and log in on the terminal.
Check the PCI configuration with lspci | grep VGA:
$ lspci | grep VGA
00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor Graphics Controller (rev 09)
01:00.0 VGA compatible controller: NVIDIA Corporation GK104 [GeForce GTX 760] (rev a1)
This tells us that the integrated card is at PCI address 00:02.0 and the GPU at 01:00.0.
In the format of the xorg.conf file, the IGC is at PCI:0@0:2:0.
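One thing to watch: lspci prints the bus, device and function numbers in hexadecimal, while xorg.conf expects them in decimal. For my IGC the two coincide, but they would not for a card on, say, bus 0a. Here is a minimal sketch of the conversion, assuming PCI domain 0 and the bus@domain notation used in this post; the function name to_busid is my own.

```shell
# to_busid ADDR: convert an lspci address "bb:dd.f" (hexadecimal) into the
# xorg.conf form "PCI:bus@domain:device:function" (decimal).
# Assumption: the device sits in PCI domain 0.
to_busid() {
    bus=${1%%:*}      # text before the colon, e.g. "00"
    rest=${1#*:}      # remainder, e.g. "02.0"
    dev=${rest%%.*}   # text before the dot, e.g. "02"
    fn=${rest#*.}     # text after the dot, e.g. "0"
    printf 'PCI:%d@0:%d:%d\n' "0x$bus" "0x$dev" "0x$fn"
}
```

For the IGC above, to_busid 00:02.0 prints PCI:0@0:2:0.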
So, you should now modify the xorg.conf file:
$ sudo vim /etc/X11/xorg.conf
Mine looks like the following.
First of all, there is a "ServerLayout" section defining two screens, screen 0 and screen 1.
The first screen is the actual one.
We define a second one just to map it to the GPU.
Then, we define one "Screen" for the IGC (called "intel") and one for the GPU (called "nvidia") using the Section "Screen" keyword, and say that each Screen is mapped to a different "Device".
We also create two Devices, one for the IGC ("intel") and one for the GPU ("nvidia").
For the GPU, we don't really need to put the correct address, I guess, since X will not be managing it.
Just check out the file, it's quite easy:
# Configuration for /etc/X11/xorg.conf
Section "ServerLayout"
    Identifier "Layout0"
    Screen 0 "intel"
    Screen 1 "nvidia"
    InputDevice "Keyboard0" "CoreKeyboard"
    InputDevice "Mouse0" "CorePointer"
EndSection
Section "Files"
EndSection
# ========== MOUSE AND KEYBOARD ================
Section "InputDevice"
    # generated from default
    Identifier "Mouse0"
    Driver "mouse"
    Option "Protocol" "auto"
    Option "Device" "/dev/psaux"
    Option "Emulate3Buttons" "no"
    Option "ZAxisMapping" "4 5"
EndSection
Section "InputDevice"
    # generated from default
    Identifier "Keyboard0"
    Driver "kbd"
EndSection
# ========== Integrated card and GPU ================
# This device maps to the IGC
Section "Device"
    Identifier "intel"
    Driver "intel"
    BusID "PCI:0@0:2:0"
    Option "AccelMethod" "SNA"
EndSection
# This device maps to the GPU
Section "Device"
    Identifier "nvidia"
    Driver "nvidia"
    BusID "PCI:0@1:0:0"
    Option "ConstrainCursor" "off"
EndSection
# This Screen calls the IGC device
Section "Screen"
    Identifier "intel"
    Device "intel"
EndSection
# This Screen calls the GPU device
Section "Screen"
    Identifier "nvidia"
    Device "nvidia"
    Option "AllowEmptyInitialConfiguration" "on"
    Option "IgnoreDisplayDevices" "CRT"
EndSection
So, now reboot and, magically, everything should be working!
You can check that the installation went well, and see the status of your GPU, by running:
$ nvidia-smi
Sun Apr 12 13:15:17 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.82       Driver Version: 440.82       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 760     Off  | 00000000:01:00.0 N/A |                  N/A |
| 29%   30C    P8    N/A /  N/A |     12MiB /  1999MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0                    Not Supported                                       |
+-----------------------------------------------------------------------------+
As you can see, the GPU is properly recognized, although several fields show N/A for this type of GPU
(NVidia reports them only for higher-end GPUs).
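If you want the same information in a script rather than in a table, nvidia-smi can also print selected fields as CSV through its --query-gpu and --format options. The helper below is only a sketch for picking one field out of such a line; the name gpu_field is my own, and some fields may come back as "[Not Supported]" on a card like mine.

```shell
# gpu_field N: print the N-th comma-separated field of every line read from
# stdin, with leading and trailing spaces removed.
gpu_field() {
    awk -F',' -v n="$1" '{ gsub(/^ +| +$/, "", $n); print $n }'
}
```

For example, this prints just the driver version: nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv,noheader | gpu_field 2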
Installing the software
The "compute capability" of a CUDA-capable GPU describes the specifications of the GPU and the set of instructions that it is able to run.
As of April 2020, TensorFlow is compiled by default to support GPUs with compute capability 3.5 or higher.
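Once you have looked up the compute capability of your card (NVidia lists it per model on its CUDA pages), the comparison against the 3.5 threshold can be sketched as follows. The function name cc_at_least is my own, and it assumes both values are written as MAJOR.MINOR.

```shell
# cc_at_least HAVE WANT: succeed if compute capability HAVE (e.g. 3.0) is
# greater than or equal to WANT (e.g. 3.5). Major and minor parts are
# compared as integers, so 3.10 would count as newer than 3.5.
cc_at_least() {
    have_major=${1%%.*}; have_minor=${1#*.}
    want_major=${2%%.*}; want_minor=${2#*.}
    [ "$have_major" -gt "$want_major" ] ||
        { [ "$have_major" -eq "$want_major" ] && [ "$have_minor" -ge "$want_minor" ]; }
}
```

For my GTX 760: cc_at_least 3.0 3.5 || echo "need to build TensorFlow from sources"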
GPU with compute capability ≥ 3.5
If that's your case, then you just have to install the GPU-enabled version of TensorFlow for Python.
You can set up a Conda (Anaconda/Miniconda) environment, or install it with pip:
$ pip install tensorflow-gpu
And you will be good to go.
Now load TensorFlow from your python3 script and enjoy!
GPU with compute capability 3.0
My GPU has compute capability 3.0, which is not supported by default by TensorFlow.
So, I will build TensorFlow from the sources and specify that option.
Check out the instructions on the TensorFlow web page.
Basically, this compiles TensorFlow from the sources and creates a package that you can then install into your Python environment.
Compiling TensorFlow on a low-spec computer takes forever, let me tell you.
In short, the procedure becomes:
### Install pre-requisites
$ pip install -U --user pip six numpy wheel setuptools mock 'future>=0.17.1'
$ pip install -U --user keras_applications --no-deps
$ pip install -U --user keras_preprocessing --no-deps
### Need Go language, for bazelisk
$ sudo apt-get install golang-go
### Install bazelisk
$ git clone https://github.com/bazelbuild/bazelisk.git
$ cd bazelisk
$ ./build.sh
$ cd ..
### Create an alias to bazelisk called "bazel", for your simplicity
### (put the proper path here and reference to the proper binary)
$ echo "alias bazel='~/bazelisk/bin/bazelisk-linux-amd64'" >> ~/.bash_aliases
Then, we need to install CUDA.
You'll need the CUDA SDK, which you can download from the NVidia CUDA website.
Unfortunately, the CUDA package shipped with Debian is built to go with Debian's own packaging of the NVidia driver rather than with the driver installed from the ".run" file, so we need to download it from the website.
There is no direct support for Debian, but there are Ubuntu packages, which will do just fine.
Just download the ".run" file, since messing with the repositories may not work.
Follow the instructions on the CUDA website.
So, you need to go to runlevel 3 again to install this...
At this point the installer offered to install the driver bundled with CUDA; I accepted, and this replaced the NVidia driver installed in Step 2. I'll keep it this way.
OK, it's time to download and compile TensorFlow.
After cloning the repository and before running ./configure, you need to modify the configure.py file by changing the default compute capability.
Put 3.0 in place of 3.5.
So: clone, edit configure.py, and then configure:
$ git clone https://github.com/tensorflow/tensorflow.git
$ cd tensorflow
$ ./configure
During the configuration you will be asked some questions.
Make sure you enable CUDA support at the proper stage.
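As far as I can tell, the configure script also takes some of its answers from environment variables, which saves retyping them on every attempt. The lines below are a sketch under that assumption: TF_NEED_CUDA and TF_CUDA_COMPUTE_CAPABILITIES were read by configure.py in the releases from around this time, but check your own copy before relying on them.

```shell
# Assumption: the checked-out configure.py reads these variables instead of
# prompting for them. If it does not, answer the prompts by hand.
export TF_NEED_CUDA=1                      # enable CUDA support
export TF_CUDA_COMPUTE_CAPABILITIES=3.0    # build for compute capability 3.0
```

Set them in the same shell, before running ./configure.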
Then, compile with bazelisk.
Notice that bazel is quite memory-hungry, and if you have a limited system, compiling TensorFlow becomes an issue.
I have 4 GB of RAM and, after 4 hours of compilation, it crashed with a "critical error: out of memory".
Passing only the --local_ram_resources=2048 flag did not help: the compilation crashed again.
The amount of time it takes is largely due to the system swapping very often (check with htop).
So, you should give a set of additional options, limiting the number of jobs and the available RAM:
$ bazel build --config=opt --config=cuda --local_ram_resources=2048 \
    --local_cpu_resources=2 \
    --jobs=2 \
    --ram_utilization_factor=30 \
    //tensorflow/tools/pip_package:build_pip_package
With this setup, it took some 9.5 hours (...) but it compiled successfully!
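The 2048 I passed to --local_ram_resources is simply half of my 4 GB, expressed in megabytes. If you want to derive a similar figure for your own machine, here is a sketch that reads it from /proc/meminfo. The function name ram_budget_mb is my own, and "half of the total" is an arbitrary starting point rather than a recommendation from the bazel documentation.

```shell
# ram_budget_mb: read /proc/meminfo-style text on stdin and print half of
# MemTotal, converted from kB to MB and rounded down.
ram_budget_mb() {
    awk '/^MemTotal:/ { print int($2 / 1024 / 2) }'
}
```

For example: bazel build --local_ram_resources="$(ram_budget_mb < /proc/meminfo)" followed by the other options shown above.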
At this point, you will have a directory bazel-bin in the current working directory, and inside it you will find an executable called build_pip_package.
We use it to build a package for pip in the /tmp directory:
$ ./bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
And finally you can install it with pip by:
$ pip install /tmp/tensorflow_pkg/whatevernameithas.whl
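The actual wheel name depends on the TensorFlow version, the Python version and the platform, so I don't spell it out here. If you rebuild several times, a small helper can pick the most recent wheel for you. This is only a sketch; the function name newest_wheel is my own, and it assumes file names without newlines.

```shell
# newest_wheel DIR: print the most recently modified .whl file in DIR.
# Fails (exit 1) and prints nothing if DIR contains no wheel.
newest_wheel() {
    wheel=$(ls -t "$1"/*.whl 2>/dev/null | head -n 1)
    [ -n "$wheel" ] && printf '%s\n' "$wheel"
}
```

For example: pip install "$(newest_wheel /tmp/tensorflow_pkg)"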
Note that you may need to use python3.5, since python3.7 would not work.
ANOTHER OPTION: using Theano
OK so, another option is using Theano.
First, as of now it needs a Python version of at least 3.4 but below 3.6.
I have 3.7 installed, so let's use conda to create a virtual environment with Python 3.5.
Install Miniconda (or Anaconda) first.
Then create a virtual environment with:
$ conda create --name neural_net_theano_py3p5 python=3.5
$ conda activate neural_net_theano_py3p5
Now let's install some packages that Theano needs.
You'll need the CUDA SDK, installed from the ".run" file on the NVidia CUDA website exactly as described for TensorFlow above (including the trip to runlevel 3).
Then, you need to install the cuDNN libraries (CUDA Deep Neural Network), which are used by Theano and TensorFlow.
You'll need to create a profile, log in to the NVidia website and then follow the instructions to download and install.
The installation boils down to downloading a tar.gz file and copying files to the proper locations, in /usr/local/cuda/...
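The copying step can be sketched as a small function. It assumes that the archive has been unpacked into a directory containing include/ and lib64/ (mine unpacked into a directory called cuda), which may differ for other cuDNN releases. The function name install_cudnn is my own.

```shell
# install_cudnn SRC DEST: copy the cuDNN headers and libraries from the
# unpacked archive SRC (containing include/ and lib64/) into the CUDA tree
# DEST, keeping symbolic links as links, then make them world-readable.
install_cudnn() {
    mkdir -p "$2/include" "$2/lib64" &&
    cp -P "$1"/include/cudnn*.h "$2/include/" &&
    cp -P "$1"/lib64/libcudnn* "$2/lib64/" &&
    chmod a+r "$2"/include/cudnn*.h "$2"/lib64/libcudnn*
}
```

Since the destination belongs to root, define and run it from a root shell, for example: install_cudnn ./cuda /usr/local/cuda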
Ok, once this is done, let's try to install Theano:
$ conda install numpy scipy mkl
$ conda install theano pygpu
You will need one more step before running it, since it will probably not find the header files for the cuDNN library.
Create a file .theanorc in your home directory, with:
[global]
device = cuda
floatX = float32
[dnn]
include_path=/usr/local/cuda-10.2/include
library_path=/usr/local/cuda-10.2/lib64
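The two paths depend on where your CUDA version ended up, so here is a sketch that writes the same file for any CUDA root. The function name write_theanorc is my own, and note that it overwrites the target file without asking.

```shell
# write_theanorc CUDA_ROOT OUTFILE: write a Theano configuration whose [dnn]
# section points at the include/ and lib64/ directories under CUDA_ROOT.
# OUTFILE is overwritten if it already exists.
write_theanorc() {
    cat > "$2" <<EOF
[global]
device = cuda
floatX = float32
[dnn]
include_path=$1/include
library_path=$1/lib64
EOF
}
```

For my setup: write_theanorc /usr/local/cuda-10.2 ~/.theanorc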
At this point, Theano should work.